Running local models on Macs gets faster with Ollama’s MLX support

ZippyPeanut · Apr 1, 2026

Unimportant said:
newbie question:
Are there security and privacy issues with this?

Thanks

Seems like a perfectly good newbie question to me.

I can't see that there would be any security and privacy issues with something that is run entirely locally and doesn't reach out to the internet. But who really knows? Is there any surreptitious code running in the background that reports activity that we don't want reported? But I doubt that this presents much of a privacy/security threat.

My follow-up question is, Why would anyone downvote the OP's question, a question that's prefaced by "newbie question"?

Edit: Added verb

OrangeCream · Apr 1, 2026

Unimportant said:
newbie question:
Are there security and privacy issues with this?

Thanks

None. This solves all the security and privacy issues with cloud based AI.

Still Breathing · Apr 1, 2026

williamyf said:
Also remmeber that Macs top out at 256GB RAM and AMD tops out at 128GB RAM, and both can be clustered up to 4x machines.

Macs (currently) go to 512GB with the Ultra-3. I just sold one for twice what I paid for it. I'm still expecting the M5 Ultras to support 512GB as well when they launch even if the largest M3 on sale right now is 256GB

RecycledHandle · Apr 1, 2026

OrangeCream said:
None. This solves all the security and privacy issues with cloud based AI.

Nope. Maybe what you say is sarcastic but else, tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens and anything else.

Saying everything will stay local is untrue.

ZippyPeanut · Apr 1, 2026

RecycledHandle said:
Nope. Maybe what you say is sarcastic but else, tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens and anything else.

Saying everything will stay local is untrue.

So, a newbie question here:

Can an air-gapped computer run a local AI?

TigerAway · Apr 1, 2026

Lexus Lunar Lorry said:
Gather around the campfire, children. The old one is about to tell a tale of times gone by: did you know that in the ancient days, RAM was affordable and commonplace?

Apple RAM was affordable? Do tell.

RecycledHandle · Apr 1, 2026

ZippyPeanut said:
So, a newbie question here:

Can an air-gapped computer run a local AI?

Yes, all the inference happens locally. Make sure you have good doc accessible in your environment if what you try to do is coding or documentation. You also can't match the performance of a frontier model running by one of the big providers (neither in speed or quality). But it still can be very useful.

Tactical Finesse · Apr 1, 2026

TigerAway said:
Not for Mac users

Apple RAM pricing makes sense--for the Studio series. Unlike PC memory which is varying degrees of surpassing JEDEC spec...The Studio run has had memory far beyond what any desktop PC in terms of speed can even POST never mind have stable (800GB/second??!). The only way a PC gets remotely close, or surpasses 800GB/s, are workstation.server threadripper or Epyc systems with quad channel or higher memory which cost more. So yes they're expensive--but you're getting a faster desktop product for the money.

Catch being--that is the Studio. The Mini and other devices have much more JEDEC spec like memory you'd find in a PC. Which is absolutely highway robbery.

ZippyPeanut · Apr 1, 2026

TigerAway said:
Apple RAM was affordable? Do tell.

Meh. Don't nitpick. Myths and other tall tales have always been told around the campfire!

ZippyPeanut · Apr 1, 2026

RecycledHandle said:
Yes, all the inference happens locally. Make sure you have good doc accessible in your environment if what you try to do is coding or documentation. You also can't match the performance of a frontier model running by one of the big providers (neither in speed or quality). But it still can be very useful.

So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited.

Tactical Finesse · Apr 1, 2026

ZippyPeanut said:
So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited.

Well, it depends on what model you're wanting to run and what you're wanting to do with it--and the hardware you have to throw at the problem. There are low-end models that will run "fine" on an 8GB GPU and be pretty fast. But if you're wanting to run a fat 120b parameter model, you need to have a lot of memory to throw at it. Bad metaphor time--a Chromebook is a pretty awful computer but if all you're doing is email and watching YouTube it is "fine".

On the PC side...Strix Halo platforms are nowhere near the fastest---but for $1500-$2000ish bucks you cannot beat the memory allocation available for inference. If you have $4000, a Mac Studio is a faster system with a similar or more RAM available.

PBG4 Dude · Apr 1, 2026

pond-iridium.2q said:
I’ve got a 128GB M5 Max laptop coming April 9th

Not sure why you’re being downvoted. I ordered the same earlier today with a delivery date of 4/20.

xyzzy01 · Apr 1, 2026

TigerAway said:
Apple RAM was affordable? Do tell.

Back in the days of Intel iMacs you could use your own memory. My last one had 128 GB... Mostly for use with VMS.

SuperOuss · Apr 1, 2026

zogus said:
What tool(s) would you recommend in place of Ollama on macOS? LMStudio or some other option?

I use vLLM to load and test models locally. I also use it to deploy and serve models on external infra. It has been solid for our testing

When I tried ollama, I encountered a degradation of the performance the more the model was used (locally and on external infra, lots of different models)

PBG4 Dude · Apr 1, 2026

ZippyPeanut said:
So, a newbie question here:

Can an air-gapped computer run a local AI?

I started working with local models recently and this was one of my questions. I used the local LLMs while in airplane mode (no network) and they worked as expected.

ZippyPeanut · Apr 1, 2026

PBG4 Dude said:
Not sure why you’re being downvoted. I ordered the same earlier today with a delivery date of 4/20.

Here you go, one for you and one for the OP:

Resistance · Apr 1, 2026

williamyf said:
Not necesarily. Training the models in the first place is significantly harder than running them. Orders of magnitude harder. So, companies still need humongous datacenters to iterate more and more refined models.

After the AI bubble pops or deflates, some, but not all of those AI datacenters (both the training and the inference ones) will be repurposed to HPC, VDI or normal cloud, depending on the hardware within.

It is unclear how much demand there will be for new models for desktop use.

To my knowledge, training and inference datacenters are pretty much the same.

Both VDI and "normal cloud" datacenters are so different that (when considering TCO) it may well be cheaper to scrap an AI datacenter and build a new datacenter somewhere else than retrofit an AI datacenter.

As for HPC assuming your workload is suitable for an AI datacenter as is (a big assumption), you're still going to have problems. AI GPUs are rapidly depreciating assets, both in that they quickly become outdated, and that they wear out. Also, there appear to be a vast quantity more than could ever economically be used for HPC.

Reusing the planned AI datacenter capacity on any meaningful scale is a wild fantasy. Doing so with existing capacity is also unlikely.

salinmooch · Apr 1, 2026

Ralf The Dog said:
I would not call it, "almost as good." The things that Claude Opus 4.6 and GPT 5.4 can do is insane.

Definatley depends on the use case. I have a local model I use for local home assistant control and it has been good enough for the last few months (meets spousal certification).

I really use it so the rest of the family can use casual requests to query the house and not need to be to pedantic, and I don't need it to do a conversation or coding or web related request. Exposing entities carefully and defining intents makes for a tight, flexible, and private system.

Dano40 · Apr 1, 2026

Eldorito said:
I was hoping that the wait time on a mac studio (end of June in Australia) meant there was a new model on the way. Now I'm guessing it's just AI demand ruining yet another thing.

OpenAI and Oracle are running out of steam and Microslop is getting cold……

Control Group · Apr 1, 2026

ZippyPeanut said:
So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited.

Not sure what you mean - I'm running Qwen 2.5 at 32 billion parameters and a...4 bit? quant (I'm not sitting at that machine right now, so I'm not 100% positive) on a 4090 at home, and it works fine isolated from the internet. It's using thirty-ish GB of disk, which isn't exactly tiny, but it's a half to a third the size of a modern AAA game, so it's not exactly enormous either.

I've not run into any particular limitations, at least for my uses - some hobby JS coding, some TTRPG scenario development, some fiction writing. It's not as good as the frontier models, unsurprisingly, but it's plenty good enough to be useful.

Unless the limitation you're talking about is "doesn't have access to all of the internet's information," in which case...yeah. Being disconnected does impose that limitation.

DKlimax · Apr 1, 2026

Tactical Finesse said:
Apple RAM pricing makes sense--for the Studio series. Unlike PC memory which is varying degrees of surpassing JEDEC spec...The Studio run has had memory far beyond what any desktop PC in terms of speed can even POST never mind have stable (800GB/second??!). The only way a PC gets remotely close, or surpasses 800GB/s, are workstation.server threadripper or Epyc systems with quad channel or higher memory which cost more. So yes they're expensive--but you're getting a faster desktop product for the money.

Catch being--that is the Studio. The Mini and other devices have much more JEDEC spec like memory you'd find in a PC. Which is absolutely highway robbery.

Tactical Finesse said:
Oh the Apple system is faster but it is a much more expensive machine. Those top Studio RAM configs are so expensive to RAM upgrade because it is ~800GB/second memory which is 4x faster than strix halo.

Which the memory speed isn't an AMD problem--it is a PC problem. Strix Halo is unique because it is one of the only ways to actually get guaranteed 8000MT/s AKA 200GB/second memory in a desktop sans buying workstation gear. You can buy 8000+ MT/s claimed RAM kits--but good luck finding a PC CPU platform that can reliably hit those speeds and timings and be production stable.The only other way is a Threadripper with quad channel memory--which just the mainboard and the CPU will cost more than the strix halo.

Sorry, but you two are wrong. There's nothing special about Apple's memory. They are still same memory chips others are buying. To get that bandwidth they are using same trick as GPUs. Large number of memory channels. (64 16-bit channels) While there is no desktop or workstation CPU with similar bandwidth, there is server-class CPU with such bandwidth. Granite Rapids AP from Intel. Example: Xeon 6979P It can use MRDIMM DDR5 at 8800MT/s in 12 channel configuration. Workstation version has at most 8 channels with same memory speed, though.

ZippyPeanut · Apr 1, 2026

Control Group said:
Not sure what you mean - I'm running Qwen 2.5 at 32 billion parameters and a...4 bit? quant (I'm not sitting at that machine right now, so I'm not 100% positive) on a 4090 at home, and it works fine isolated from the internet. It's using thirty-ish GB of disk, which isn't exactly tiny, but it's a half to a third the size of a modern AAA game, so it's not exactly enormous either.

I've not run into any particular limitations, at least for my uses - some hobby JS coding, some TTRPG scenario development, some fiction writing. It's not as good as the frontier models, unsurprisingly, but it's plenty good enough to be useful.

Unless the limitation you're talking about is "doesn't have access to all of the internet's information," in which case...yeah. Being disconnected does impose that limitation.

Yeah, "doesn't have access to all of the internet's information" is what I'm getting at.

I don't doubt that a single, powerful computer can locally run sophisticated AI applications. I'm just thinking that the only data the AI has to work with are local files, which (I surmise) limits the AI's range of output.

Resistance · Apr 1, 2026

ZippyPeanut said:
Yeah, "doesn't have access to all of the internet's information" is what I'm getting at.

I don't doubt that a single, powerful computer can locally run sophisticated AI applications. I'm just thinking that the only data the AI has to work with are local files, which (I surmise) limits the AI's range of output.

I suppose it depends what you mean by "sophisticated AI applications", the only LLM one I can think of off the top of my head that can't be done locally is "deep research", which requires an internet connection and the ability to circumvent rate limiting and other bot mitigations. What are you talking about?

Unimportant · Apr 1, 2026

ZippyPeanut said:
Yeah, "doesn't have access to all of the internet's information" is what I'm getting at.

I don't doubt that a single, powerful computer can locally run sophisticated AI applications. I'm just thinking that the only data the AI has to work with are local files, which (I surmise) limits the AI's range of output.

Copies of Wikipedia are available for downloading. I’m not familiar with the specifics but entities are doing this around the clock

That’s a start for a set of general data

kmcmurtrie · Apr 1, 2026

Tamerz said:
It says "Please make sure you have a Mac with more than 32GB of unified memory."

Not sure if it will work with exactly 32GB.

I ran a 405b parameter model on my Linux computer with only 128GB RAM and 500GB swap. It works.

It also needed 30+ hours to answer complex questions, so it's all a matter of how long you're willing to wait

uhuznaa · Apr 2, 2026

Taircron said:
I still hope and wish for a future that is local. The stupid datacenters sucking up all the RAM is ruining it, but I really do think the road forward is local, home-based LLM evaluation servers. Then they can safely hold your entire life - your calendar, your texts, your emails, your search history - and really be your personal assistant, toolchain, and more. We just need slightly more powerful local setups, and someone to figure out how to get a frontier-quality model onto your local system, either through piracy, distillation, or some sort of licensing agreement.

While I share the sentiment (personal AI assistants are only really useful when they have access to as much of your personal data as possible and this is not something I would want to do with any LLM running out there) this will NOT help the RAM situation. Any local/personal setup will be idle most of the time while cloud based systems can run at basically 100% utilization.

The AI/privacy situation is heavily leaning towards giving up on privacy, for reasons. And just look at the good old Cloud for personal data storage and syncing: This is actually quite easy and cheap to do locally and STILL most people prefer to throw everything just at the services Google, MS, Apple or others offer, just for convenience.

I mean, if your email, calendars and documents live at Google anyway, why not use Google's AI on them then?

Even in the best case any halfway useful local setup will be expensive enough that hardly anyone will bother with it.

reyan · Apr 2, 2026

TheShark said:
I haven't played with Ollama per se. It does look like it's got more features than what Apple includes with the MLX code. Which makes sense as more of the scripts in the MLX setup are basically demos. I'm kinda surprised that Ollama only wants to run a single model though. Using mlx-lm you can have most models from Hugging Face running as a local chatbot in two or three commands.

Ollama supports a ton of different models. It is only limited to a single model (so far) when it comes to supporting this new feature, and only in preview.

I have 30+ models installed using ollama on my M2 Max Studio at the moment.

reyan · Apr 2, 2026

RecycledHandle said:
Nope. Maybe what you say is sarcastic but else, tools will search anything on your computer and may make web searches with that to solve your problem. It can be your username, your code with your access tokens and anything else.

Saying everything will stay local is untrue.

Is that really a problem with ollama per se? As opposed to using ollama as the AI back-end to various tools that you separately decide to give access to the internet?

If you use ollama with, let's say, open webui to run it as a chatbot only, it is not making requests to the internet. If you use ollama from CLI, same thing.

Yes, installing openclaw on your primary system with all of your data, using local models which are less capable (and more prone to doing dumb things), probably increases your risks. I'm not sure it is fair to say this is an ollama issue though, as opposed to an openclaw issue.

(BTW, for those who are interested in using openclaw without giving it access to everything on your mac, there are numerous workarounds. Docker/Podman locally as an example. Right now, I run openclaw on a dedicated raspberry pi which still uses the mac studio for AI processing by sending API calls across my LAN. It uses my Mac's GPU and unified memory for processing AI tokens, but otherwise only has access to what is on the Raspberry Pi itself. ie. nothing)

reyan · Apr 2, 2026

ZippyPeanut said:
So, a newbie question here:

Can an air-gapped computer run a local AI?

Yes. Also, the comment you are replying to is conflating risks with tools you may use that connect to Ollama (ie. openclaw) with privacy issues with ollama proper.

You can use ollama without connecting it to tools (depends on your use case). You can use ollama on an air gapped computer. You can even use ollama with tools on an air-gapped computer, although that obviously won't support any internet use cases.

reyan · Apr 2, 2026

ZippyPeanut said:
So, the entire dataset needs to also be on the local machine. Thus any local AI would seem to be very specialized and limited.

Upvoted because you are trying to understand. I don't get the downvotes.

To answer: Sort of. The model itself will have some capabilities it "learned" from its training data, which doesn't require the reference material (aside from the model itself) to be local. You can also use tools like openclaw to give your local AI the ability to reach out to the internet (although that will defeat or weaken the privacy stance you were asking about earlier).

Overall though, you are not going to get the speed, model capability and breadth of access to reference information by running a model locally, versus the huge models running in the cloud. If your requirements are light and you only need it to reference private data, it could still meet the mark.

numerobis · Apr 3, 2026

What's the J/compute ratio for home compute versus giant datacenter compute?

I assume it's slightly worse, though maybe the lower volume of cooling at home makes up for it (indeed, in the heating season, any electricity use I have at home is free, I'd otherwise just be lighting up the radiators).

numerobis · Apr 3, 2026

reyan said:
Yes. Also, the comment you are replying to is conflating risks with tools you may use that connect to Ollama (ie. openclaw) with privacy issues with ollama proper.

You can use ollama without connecting it to tools (depends on your use case). You can use ollama on an air gapped computer. You can even use ollama with tools on an air-gapped computer, although that obviously won't support any internet use cases.

I look forward to openclaw on an air-gapped computer using social engineering to cross the air gap.

Running local models on Macs gets faster with Ollama’s MLX support

Ars Legatus Legionis

Ars Legatus Legionis

Ars Centurion

Wise, Aged Ars Veteran

Ars Legatus Legionis

Ars Tribunus Militum

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Ars Legatus Legionis

Ars Legatus Legionis

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Ars Praetorian

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Ars Legatus Legionis

Wise, Aged Ars Veteran

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Legatus Legionis

Ars Praefectus

Ars Legatus Legionis

Wise, Aged Ars Veteran

Ars Praetorian

Ars Centurion

Ars Tribunus Angusticlavius

Ars Praetorian

Ars Praetorian

Ars Praetorian

Ars Praetorian

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius