ARMageddon

byrningman · Mar 3, 2026

Presumably the new approach means that Apple is making smaller, cheaper dies. It’s a shame that it hasn’t resulted in lower prices for the MBPs. But I guess RAM/SSD costs are soaring, and likely TSMC’s prices too. Has the AIpocalypse averted ARMageddon?

dmsilev · Mar 3, 2026

The individual dies are smaller, which I'm sure helps yields, but the total area is probably larger than what it would be for a monolithic chip made on the same process. You still need all of the same functional blocks, and you have to add whatever logic and external interfacing is needed to drive the inter-die communications. Also, now every Pro and Max chip needs one of those silicon interposer things, rather than just the Ultras.

Advantages are that you can make a 2-chip module whose total size is bigger than the single-reticle limit, you have more flexibility in binning chips and managing yields, and you at least open the pathway towards larger ensembles of dies in a single package.
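
To put rough numbers on the yield argument above, here is a minimal sketch using the textbook Poisson yield model (yield = exp(-defect density × area)). The defect density, the die areas, and the 10% interconnect overhead are all made-up illustrative values, not figures for any real Apple or TSMC part, and it assumes dies are tested before they are paired up.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

# All values below are assumptions chosen for illustration only.
D0 = 0.1                 # assumed defect density, defects per cm^2
monolithic_area = 400.0  # hypothetical single big die, mm^2
chiplet_area = 220.0     # hypothetical half-size die plus ~10% inter-die overhead, mm^2

y_mono = poisson_yield(monolithic_area, D0)
y_chiplet = poisson_yield(chiplet_area, D0)

# Wafer area spent per good product, assuming only known-good dies get paired.
silicon_per_good_mono = monolithic_area / y_mono
silicon_per_good_pair = 2 * chiplet_area / y_chiplet

print(f"monolithic yield: {y_mono:.3f}, chiplet yield: {y_chiplet:.3f}")
print(f"mm^2 per good product: monolithic {silicon_per_good_mono:.1f}, "
      f"two-die {silicon_per_good_pair:.1f}")
```

With these assumed inputs the two-die design uses more silicon in total (440 vs 400 mm²) but less wafer area per good product, which is the trade-off described above. The interposer and packaging cost is not modeled here.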

byrningman · Mar 4, 2026

I guess the logic behind the new Pro/Max CPU scheme is that there are few workloads that fully use more than 6 “Super” cores, and that those workloads that do are super parallel in a way that the new “performance” cores actually give you more compute per watt and/or die space? I’ve seen claims that the new “performance” cores (I prefer to think of them as “workhorse” cores) give 70% of the performance of the super cores. So presumably they do so while using less than 70% of the watts and/or transistors (ideally both).
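
The break-even reasoning above can be sketched in a few lines. The 70% figure is the unverified claim quoted in the post; the relative power values are purely hypothetical, since no power numbers have been given in this thread.

```python
def perf_per_watt_ratio(relative_perf, relative_power):
    """Efficiency of a core relative to a super core (1.0 means equal perf/W)."""
    return relative_perf / relative_power

claimed_perf = 0.70  # claimed performance relative to a super core (unverified)

# Hypothetical relative power draws; none of these are measured values.
for assumed_power in (0.5, 0.7, 0.9):
    ratio = perf_per_watt_ratio(claimed_perf, assumed_power)
    if abs(ratio - 1.0) < 1e-9:
        verdict = "same"
    elif ratio > 1.0:
        verdict = "better"
    else:
        verdict = "worse"
    print(f"power {assumed_power:.0%} of a super core -> "
          f"{ratio:.2f}x perf/W ({verdict})")
```

The same ratio works for die area: swap relative power for relative area to get performance per mm².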

Chris FOM · Mar 4, 2026

I wouldn’t be surprised if thermals also play a role. It’s not easy to get an actively cooled Arm Mac to throttle, but lighting up the whole chip is the one way to do it. The 14” MBP with a Max chip was particularly vulnerable to this under highly threaded workloads that could saturate the chip. Having a small number of cores that sacrifice efficiency for all-out single-threaded/lightly threaded performance, plus a large number of somewhat slower but much more tightly power-optimized cores for sustained heavily multithreaded performance (still much faster than the efficiency cores, which put efficiency above all other considerations), makes some degree of sense. Apple has used different clock speeds for its performance cores depending on how many are active for years; this just lets them further optimize that approach.

Bonusround · Mar 4, 2026

With M5 Apple have chosen to increase overall CPU core count and implement a third core variant. But whither simultaneous multithreading?
It is possible this new performance core implements SMT as a strategy of catering to wide, multithreaded workloads. Do we think this is likely?

El Capitano · Mar 4, 2026

Bonusround said:
It is possible this new performance core implements SMT as a strategy of catering to wide, multithreaded workloads. Do we think this is likely?

No.

wrylachlan · Mar 4, 2026

Bonusround said:
It is possible this new performance core implements SMT as a strategy of catering to wide, multithreaded workloads. Do we think this is likely?

No.

Bonusround · Mar 5, 2026

M5 series: super cores are 10-wide @ 4.6GHz, performance cores 7-wide @ 4.4GHz, efficiency cores 6-wide @ 3GHz.

nytta0 · Mar 5, 2026

Bonusround said:
M5 Pro/Max: super cores are 10-wide @ 4.6GHz, performance cores 7-wide @ 4.4GHz.

Interesting, thanks! And what was the M4 Pro/Max?

wrylachlan · Mar 5, 2026

Bonusround said:
M5 series: super cores are 10-wide @ 4.6GHz, performance cores 7-wide @ 4.4GHz, efficiency cores 6-wide @ 3GHz.

Is this your guess or has it been reported somewhere?

Bonusround · Mar 5, 2026

wrylachlan said:
Is this your guess or has it been reported somewhere?

As has been 'reported'. Were it personal speculation I would make that clear.

To be specific, this comes from Quinn Nelson's recent video where he cites the following from Baidu:

According to Quinn these folks on Baidu have been accurate for several chip generations now.

educated_foo · Mar 5, 2026

Bonusround said:
M5 series: super cores are 10-wide @ 4.6GHz

Ten-wide?! The branch prediction, speculative execution, and prefetch behind that must be insane.

Bonusround · Mar 5, 2026

educated_foo said:
Ten-wide?! The branch prediction, speculative execution, and prefetch behind that must be insane.

Yeah, it's pretty impressive. The big cores on M4 are also 10-wide and run at... I think 4.4 GHz, but check my work on that.

nytta0 · Mar 5, 2026

@Bonusround, did I miss some of your older posts on that? That's great info on the M5 cores, and I'd just like a comparison with the M4 Pro/Max. I can't seem to find the info elsewhere easily.

Bonusround · Mar 5, 2026

Just one post up.

nytta0 · Mar 5, 2026

Oh, thanks. And I didn't notice all the edits to the other posts.

And do you know about the M4 Pro/Max efficiency cores vs the new M5 Pro/Max performance cores?

Edit:
In any case, I guess I just need to wait a few days for reviews and measurements. There will be plenty of data and benchmarks then.

Bonusround · Mar 5, 2026

nytta0 said:
Oh, thanks. And I didn't notice all the edits to the other posts.

And do you know about the M4 Pro/Max efficiency cores vs the new M5 Pro/Max performance cores?

I don't, but Howard at Eclectic Light can help:

Like P cores, E cores can be set to run at any of 5 values between the minimum of 1,020 MHz and maximum of 2,592 MHz (1.0-2.6 GHz). When running macOS, cluster frequency is set by macOS at a kernel level; other operating systems may offer more direct control. This range of frequencies is significantly narrower than that of E cores in the M3, which range between 744-2,748 MHz.

E cores idle at 1,020 MHz, and although they can be shut down altogether, that’s exceptional given the steady demand for macOS background threads to be run on them. Nevertheless, powermetrics still reports their ‘down’ residencies separately from idle residencies.

No idea on M4 e-core instruction window size.

Bonusround · Mar 6, 2026

M4: Performance = 10-wide @ 4.4GHz, Efficiency = 6-wide (est) @ 2.6GHz
M5: Super = 10-wide @ 4.6GHz, Performance = 7-wide @ 4.4GHz

M4 Max: 12P + 4E = (12*10*4.4 + 4*6*2.6) = 590.4
M5 Max: 6S + 12P = (6*10*4.6 + 12*7*4.4) = 645.6

Just some stupid math.
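
For anyone who wants to play with the numbers, the tally above can be reproduced with a short sketch. The width and clock values are the reported or estimated figures from this thread, not confirmed specs, and the "score" is just cores × decode width × GHz; it is not a benchmark and ignores IPC, caches, memory bandwidth, and sustained clocks.

```python
def width_clock_score(clusters):
    """Sum of cores * decode width * clock (GHz) over all CPU clusters."""
    return sum(count * width * ghz for count, width, ghz in clusters)

# (core count, decode width, clock in GHz), using the thread's reported/estimated values.
m4_max = [(12, 10, 4.4), (4, 6, 2.6)]  # 12 performance + 4 efficiency
m5_max = [(6, 10, 4.6), (12, 7, 4.4)]  # 6 super + 12 performance

m4_score = width_clock_score(m4_max)
m5_score = width_clock_score(m5_max)

print(f"M4 Max: {m4_score:.1f}")
print(f"M5 Max: {m5_score:.1f}")
print(f"M5 / M4: {m5_score / m4_score:.3f}")
```

By this crude measure the M5 Max comes out roughly 9% ahead of the M4 Max.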

The Limey · May 23, 2026

An interesting article on interposer technology and how AI spend is driving investment in this technology. I can’t help but see consequences for us: https://www.chipstrat.com/p/advanced-packaging-intels-emib-vs

stevenkan · May 23, 2026

Bonusround said:
I don't, but Howard at Eclectic Light can help:

When running macOS, cluster frequency is set by macOS at a kernel level; other operating systems may offer more direct control.

No idea on M4 e-core instruction window size.

Is Asahi Linux running on these yet?

The Limey · Jun 2, 2026

First die shots of the M5 and some initial analysis at the Anandtech forums: https://forums.anandtech.com/threads/apple-silicon-soc-thread.2587205/page-495

Bonusround · Jun 2, 2026

Grabbed the full-res shots for those without an anandtech forum login.

M5 Pro:

dmsilev · Jun 2, 2026

Nice!

The obvious next question is what does the Max GPU die look like. One could certainly imagine that the Max die is what TSMC actually fabs, 2x what we're seeing here for the Pro (with the second copy rotated 180 degrees), and then either cut in half to make two Pro chips or left intact to make 1 Max. If that's the case, then the Max has an "extra" UltraFusion port, which of course immediately suggests daisy-chaining a couple of the things together...

wrylachlan · Jun 2, 2026

So I think this tells us a few things about the M5 Ultra.

The connection is face to face. And the existing CPU die has one connection side. So to get to 2X Max, either a) the Max GPU tile has 2 connection sides to allow a CPU<->GPU<->GPU<->CPU configuration, b) there’s a new ultra-specific tile with 2 connection faces allowing GPU<->CPU<->GPU or c) there’s a new massive GPU tile that allows for CPU<->GPU<->CPU.

At this point my money is on a new tile. GPU is probably easier to lay out than CPU. And since GPU is really the core asset of the Ultra, why stop at 80 GPU blocks? A dedicated GPU tile for the Ultra could be essentially as big as they want it up to the reticle limit - say 120 GPU blocks? 160?

Bonusround · Jun 2, 2026

All the memory controllers are on the GPU die(s). With the GPU–CPU–GPU option that means some memory accesses would need to run across the CPU die, so I don't think b) is a likely option. My vote is for c).

dmsilev · Jun 2, 2026

Another option might be C-G-G, one common CPU die and two Max GPUs.
