Apple Silicon Macs get a performance boost thanks to better unified memory usage.
See full article...
In reply to: "newbie question: Are there security and privacy issues with this? Thanks"
Seems like a perfectly good newbie question to me.
In reply to: "newbie question: Are there security and privacy issues with this? Thanks"
None. This solves all the security and privacy issues with cloud based AI.
In reply to: "Also remember that Macs top out at 256GB RAM and AMD tops out at 128GB RAM, and both can be clustered up to 4x machines."
Macs (currently) go to 512GB with the Ultra-3. I just sold one for twice what I paid for it. I'm still expecting the M5 Ultras to support 512GB as well when they launch, even if the largest M3 on sale right now is 256GB.
In reply to: "None. This solves all the security and privacy issues with cloud based AI."
Nope. Maybe what you say is sarcastic, but if not: tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens, and anything else.
In reply to: "Nope. Maybe what you say is sarcastic, but if not: tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens, and anything else. Saying everything will stay local is untrue."
So, a newbie question here:
Can an air-gapped computer run a local AI?
Gather around the campfire, children. The old one is about to tell a tale of times gone by: did you know that in the ancient days, RAM was affordable and commonplace?
In reply to: "So, a newbie question here: Can an air-gapped computer run a local AI?"
Yes, all the inference happens locally. Make sure you have good documentation accessible in your environment if what you are trying to do is coding or documentation. You also can't match the performance of a frontier model run by one of the big providers (in either speed or quality), but it can still be very useful.
In reply to: "Not for Mac users!"
Apple RAM pricing makes sense--for the Studio series. Unlike PC memory, which exceeds JEDEC spec to varying degrees, the Studio line has memory far faster than anything a desktop PC can even POST with, never mind run stably (800GB/second?!). The only way a PC gets remotely close to, or surpasses, 800GB/s is a workstation/server Threadripper or Epyc system with quad-channel or higher memory, which costs more. So yes they're expensive--but you're getting a faster desktop product for the money.
In reply to: "Apple RAM was affordable? Do tell."
Meh. Don't nitpick. Myths and other tall tales have always been told around the campfire!
In reply to: "Yes, all the inference happens locally. Make sure you have good documentation accessible in your environment if what you are trying to do is coding or documentation. You also can't match the performance of a frontier model run by one of the big providers (in either speed or quality), but it can still be very useful."
So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited.
In reply to: "So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited."
Well, it depends on what model you're wanting to run and what you're wanting to do with it--and the hardware you have to throw at the problem. There are low-end models that will run "fine" on an 8GB GPU and be pretty fast. But if you're wanting to run a fat 120b-parameter model, you need to have a lot of memory to throw at it. Bad metaphor time--a Chromebook is a pretty awful computer, but if all you're doing is email and watching YouTube it is "fine".
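
A rough sense of why model size matters so much: the weights alone take parameter count times bytes per parameter. A quick back-of-the-envelope sketch (the 4-bit quantization and the 20% runtime overhead are illustrative assumptions, not figures from the thread):

    # Approximate memory needed to hold a model's weights, plus a rough
    # allowance for KV cache and runtime buffers.
    def needed_gb(params_billion: float, bits_per_param: float, overhead: float = 1.2) -> float:
        weight_bytes = params_billion * 1e9 * bits_per_param / 8
        return weight_bytes * overhead / 1e9

    for size_b in (8, 32, 120):
        print(f"{size_b:>3}B params @ 4-bit: ~{needed_gb(size_b, 4):.0f} GB")
    # An 8B model at 4-bit (~5 GB) fits on an 8GB GPU; a 120B model (~70 GB)
    # is why people reach for 96-128GB+ of unified memory.
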
In reply to: "I've got a 128GB M5 Max laptop coming April 9th"
Not sure why you're being downvoted. I ordered the same earlier today with a delivery date of 4/20.
In reply to: "Apple RAM was affordable? Do tell."
Back in the days of Intel iMacs you could use your own memory. My last one had 128 GB... mostly for use with VMs.
In reply to: "What tool(s) would you recommend in place of Ollama on macOS? LM Studio or some other option?"
I use vLLM to load and test models locally. I also use it to deploy and serve models on external infra. It has been solid for our testing.
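
For anyone wondering what the vLLM route looks like in practice, a minimal sketch of its offline Python API (the model name is just an example; assumes vLLM is installed and there is enough memory for the chosen model):

    # Load a model once and run local batch inference with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model id, swap in your own
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["Explain unified memory in one paragraph."], params)
    print(outputs[0].outputs[0].text)
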
In reply to: "So, a newbie question here: Can an air-gapped computer run a local AI?"
I started working with local models recently and this was one of my questions. I used the local LLMs while in airplane mode (no network) and they worked as expected.
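
A simple way to reproduce that airplane-mode test is to force the tooling into offline mode so any accidental network call fails loudly. A sketch using Hugging Face transformers (assumes the example model has already been downloaded into the local cache):

    # Refuse all network access: the libraries will only read the local cache.
    import os
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"

    from transformers import pipeline

    chat = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")  # example id, must be cached
    result = chat("Does anything here need the internet?", max_new_tokens=64)
    print(result[0]["generated_text"])
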
In reply to: "Not necessarily. Training the models in the first place is significantly harder than running them. Orders of magnitude harder. So, companies still need humongous datacenters to iterate more and more refined models. After the AI bubble pops or deflates, some, but not all, of those AI datacenters (both the training and the inference ones) will be repurposed to HPC, VDI or normal cloud, depending on the hardware within."
It is unclear how much demand there will be for new models for desktop use.
In reply to: "I would not call it "almost as good." The things that Claude Opus 4.6 and GPT 5.4 can do is insane."
Definitely depends on the use case. I have a local model I use for local home assistant control, and it has been good enough for the last few months (meets spousal certification).
In reply to: "I was hoping that the wait time on a Mac Studio (end of June in Australia) meant there was a new model on the way. Now I'm guessing it's just AI demand ruining yet another thing."
OpenAI and Oracle are running out of steam and Microslop is getting cold...
In reply to: "So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited."
Not sure what you mean - I'm running Qwen 2.5 at 32 billion parameters and a... 4-bit? quant (I'm not sitting at that machine right now, so I'm not 100% positive) on a 4090 at home, and it works fine isolated from the internet. It's using thirty-ish GB of disk, which isn't exactly tiny, but it's a half to a third the size of a modern AAA game, so it's not exactly enormous either.
In reply to: "Apple RAM pricing makes sense--for the Studio series. Unlike PC memory, which exceeds JEDEC spec to varying degrees, the Studio line has memory far faster than anything a desktop PC can even POST with, never mind run stably (800GB/second?!). The only way a PC gets remotely close to, or surpasses, 800GB/s is a workstation/server Threadripper or Epyc system with quad-channel or higher memory, which costs more. So yes they're expensive--but you're getting a faster desktop product for the money. Catch being--that is the Studio. The Mini and other devices have much more JEDEC-spec-like memory, like you'd find in a PC. Which is absolutely highway robbery."
And to: "Oh, the Apple system is faster, but it is a much more expensive machine. Those top Studio RAM configs are so expensive to upgrade because it is ~800GB/second memory, which is 4x faster than Strix Halo. The memory speed isn't an AMD problem--it is a PC problem. Strix Halo is unique because it is one of the only ways to actually get guaranteed 8000 MT/s (AKA 200GB/second) memory in a desktop without buying workstation gear. You can buy 8000+ MT/s claimed RAM kits--but good luck finding a PC CPU platform that can reliably hit those speeds and timings and be production stable. The only other way is a Threadripper with quad-channel memory--and just the mainboard and the CPU will cost more than the Strix Halo."
Sorry, but you two are wrong. There's nothing special about Apple's memory; they are the same memory chips everyone else is buying. To get that bandwidth they use the same trick as GPUs: a large number of memory channels (64 16-bit channels). While there is no desktop or workstation CPU with similar bandwidth, there is a server-class CPU with it: Intel's Granite Rapids AP, for example the Xeon 6979P, which can use MRDIMM DDR5 at 8800 MT/s in a 12-channel configuration. The workstation version has at most 8 channels at the same memory speed, though.
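
To put rough numbers on those bandwidth claims: peak bandwidth is just transfer rate times bus width. A back-of-the-envelope sketch (the MT/s figures and bus widths below are nominal assumptions for illustration, not measured numbers):

    # Peak theoretical memory bandwidth = transfers/second * bytes per transfer.
    def peak_gb_per_s(mt_per_s: float, bus_width_bits: int) -> float:
        return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9

    configs = {
        "Studio-class: 64 x 16-bit LPDDR5 @ 6400 MT/s": (6400, 64 * 16),
        "Strix Halo-class: 256-bit LPDDR5X @ 8000 MT/s": (8000, 256),
        "Desktop dual-channel DDR5 @ 6000 MT/s (128-bit)": (6000, 128),
        "12-channel MRDIMM DDR5 @ 8800 MT/s (768-bit)": (8800, 12 * 64),
    }
    for name, (mts, width) in configs.items():
        print(f"{name}: ~{peak_gb_per_s(mts, width):.0f} GB/s")
    # Prints roughly 819, 256, 96, and 845 GB/s -- which is where the ~800GB/s
    # Studio figure and the Granite Rapids AP comparison above come from.
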
Yeah, "doesn't have access to all of the internet's information" is what I'm getting at.Not sure what you mean - I'm running Qwen 2.5 at 32 billion parameters and a...4 bit? quant (I'm not sitting at that machine right now, so I'm not 100% positive) on a 4090 at home, and it works fine isolated from the internet. It's using thirty-ish GB of disk, which isn't exactly tiny, but it's a half to a third the size of a modern AAA game, so it's not exactly enormous either.
I've not run into any particular limitations, at least for my uses - some hobby JS coding, some TTRPG scenario development, some fiction writing. It's not as good as the frontier models, unsurprisingly, but it's plenty good enough to be useful.
Unless the limitation you're talking about is "doesn't have access to all of the internet's information," in which case...yeah. Being disconnected does impose that limitation.
In reply to: "Yeah, "doesn't have access to all of the internet's information" is what I'm getting at. I don't doubt that a single, powerful computer can locally run sophisticated AI applications. I'm just thinking that the only data the AI has to work with are local files, which (I surmise) limits the AI's range of output."
I suppose it depends what you mean by "sophisticated AI applications". The only LLM application I can think of off the top of my head that can't be done locally is "deep research", which requires an internet connection and the ability to circumvent rate limiting and other bot mitigations. What are you talking about?
In reply to: "Yeah, "doesn't have access to all of the internet's information" is what I'm getting at. I don't doubt that a single, powerful computer can locally run sophisticated AI applications. I'm just thinking that the only data the AI has to work with are local files, which (I surmise) limits the AI's range of output."
Copies of Wikipedia are available for downloading. I'm not familiar with the specifics, but entities are doing this around the clock.
In reply to: "It says "Please make sure you have a Mac with more than 32GB of unified memory." Not sure if it will work with exactly 32GB."
I ran a 405b-parameter model on my Linux computer with only 128GB RAM and 500GB swap. It works.
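
Some rough arithmetic on why that needs 500GB of swap (the quantization levels are assumptions for illustration):

    # Weight footprint of a 405B-parameter model at different precisions.
    PARAMS = 405e9
    for label, bits in (("FP16", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label}: ~{PARAMS * bits / 8 / 1e9:.0f} GB of weights")
    # Roughly 810, 405, and 203 GB -- all well past 128GB of RAM, so the runtime
    # pages weights in from swap. It runs, but expect very low token rates.
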
I still hope and wish for a future that is local. The stupid datacenters sucking up all the RAM are ruining it, but I really do think the road forward is local, home-based LLM evaluation servers. Then they can safely hold your entire life - your calendar, your texts, your emails, your search history - and really be your personal assistant, toolchain, and more. We just need slightly more powerful local setups, and someone to figure out how to get a frontier-quality model onto your local system, either through piracy, distillation, or some sort of licensing agreement.
In reply to: "I haven't played with Ollama per se. It does look like it's got more features than what Apple includes with the MLX code, which makes sense, as more of the scripts in the MLX setup are basically demos. I'm kinda surprised that Ollama only wants to run a single model, though. Using mlx-lm you can have most models from Hugging Face running as a local chatbot in two or three commands."
Ollama supports a ton of different models. It is only limited to a single model (so far) when it comes to supporting this new feature, and only in preview.
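
For the mlx-lm path mentioned in the quote above, a minimal sketch (assumes Apple Silicon, `pip install mlx-lm`, and the model id here is just an example from the mlx-community organization):

    # Download (or reuse a cached) MLX-converted model and generate locally.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # example id
    reply = generate(model, tokenizer, prompt="What is unified memory good for?", max_tokens=128)
    print(reply)
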
In reply to: "So, a newbie question here: Can an air-gapped computer run a local AI?" (which was itself replying to: "Nope. Maybe what you say is sarcastic, but if not: tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens, and anything else. Saying everything will stay local is untrue.")
Yes. Also, the comment you are replying to is conflating risks with tools you may use that connect to Ollama (i.e. openclaw) with privacy issues with Ollama proper.
In reply to: "So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited."
Upvoted because you are trying to understand. I don't get the downvotes.
In reply to: "Yes. Also, the comment you are replying to is conflating risks with tools you may use that connect to Ollama (i.e. openclaw) with privacy issues with Ollama proper."
I look forward to openclaw on an air-gapped computer using social engineering to cross the air gap.
You can use Ollama without connecting it to tools (depends on your use case). You can use Ollama on an air-gapped computer. You can even use Ollama with tools on an air-gapped computer, although that obviously won't support any internet use cases.
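
To make the air-gapped Ollama case concrete, a minimal sketch using the official Python client (assumes Ollama is installed and the example model was pulled before the machine was isolated):

    # Talk to the local Ollama daemon (127.0.0.1:11434 by default); nothing
    # in this exchange leaves the machine.
    import ollama

    response = ollama.chat(
        model="llama3.2",  # example model name, already pulled locally
        messages=[{"role": "user", "content": "Do you work without internet access?"}],
    )
    print(response["message"]["content"])
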