dmsilev

Ars Tribunus Angusticlavius
7,434
Subscriptor
The individual dies are smaller, which I'm sure helps yields, but the total area is probably larger than what it would be for a monolithic chip made on the same process. You still need all of the same functional blocks, and you have to add whatever logic and external interfacing is needed to drive the inter-die communications. Also, now every Pro and Max chip needs one of those silicon interposer things, rather than just the Ultras.

Advantages are that you can make a 2-chip module whose total size is bigger than the single-reticule limit, you have more flexibility in binning chips and managing yields, and you at least open the pathway towards larger ensembles of dies in a single package.
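The yield trade-off above can be put into numbers with the textbook Poisson yield model (probability a die has zero defects = exp(-area × defect density)). This is a minimal sketch; the defect density, die area, and per-tile interface overhead are all hypothetical numbers, not Apple or TSMC figures. It also ignores interposer/packaging yield loss and the binning of partly defective dies.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)

# Hypothetical inputs, chosen only for illustration.
D0 = 0.001          # defects per mm^2 (0.1 per cm^2)
MONO_AREA = 400.0   # one monolithic die, mm^2
OVERHEAD = 20.0     # extra area per tile for die-to-die interface logic, mm^2
TILE_AREA = MONO_AREA / 2 + OVERHEAD  # each of the two tiles, mm^2

mono_yield = poisson_yield(MONO_AREA, D0)
tile_yield = poisson_yield(TILE_AREA, D0)

# Wafer area consumed per working product. Assumes each tile is tested
# before packaging, so a defective tile is thrown away on its own.
mono_cost = MONO_AREA / mono_yield
chiplet_cost = 2 * TILE_AREA / tile_yield

print(f"monolithic: yield {mono_yield:.3f}, {mono_cost:.0f} mm^2 per good chip")
print(f"two tiles:  yield {tile_yield:.3f}, {chiplet_cost:.0f} mm^2 per good chip")
```

With these made-up numbers the two-tile version uses more total silicon per package but can still consume less wafer area per good product, which is the "smaller dies help yields" point; different defect densities or overheads can flip the result.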
 

byrningman

Ars Tribunus Militum
2,255
Subscriptor
I guess the logic behind the new Pro/Max CPU scheme is that there are few workloads that fully use more than 6 “Super” cores, and that the workloads that do are so highly parallel that the new “performance” cores actually give you more compute per watt and/or per unit of die area? I’ve seen claims that the new “performance” cores (I prefer to think of them as “workhorse” cores) deliver 70% of the performance of the Super cores. So presumably they do so while using less than 70% of the watts and/or transistors (ideally both).
 
I wouldn’t be surprised if thermals also play a role. It’s not easy to get an actively cooled Arm Mac to throttle, but lighting up the whole chip is the one way to do it. The 14” MBP with a Max chip was particularly vulnerable to this under highly threaded workloads that could saturate the chip. It makes some sense to have a small number of cores that sacrifice efficiency for all-out single-threaded/lightly threaded performance, plus a larger number of somewhat slower cores that are much more tightly power-optimized for sustained, heavily multithreaded work (but still much faster than the efficiency cores, which put efficiency above all other considerations). Apple has varied its performance-core clock speeds depending on how many cores are active for years; this just lets them push that approach further.
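The "70% of the performance for less than 70% of the power/area" reasoning can be sketched as a toy model: fill a fixed power and area budget with one core type and run a perfectly parallel workload on all of them. Only the 0.7 relative performance comes from the claims above; the power and area fractions for the workhorse core, the budgets, and the `CoreType` helper are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CoreType:
    name: str
    perf: float   # relative per-core performance (Super core = 1.0)
    power: float  # relative sustained power per core (Super core = 1.0)
    area: float   # relative die area per core (Super core = 1.0)

def perf_per_watt(core: CoreType) -> float:
    """Performance per unit of power, relative to a Super core."""
    return core.perf / core.power

def parallel_throughput(core: CoreType, power_budget: float, area_budget: float) -> float:
    """Throughput on a perfectly parallel workload when we build as many whole
    cores as fit in both the power and the area budget and run all of them."""
    n_cores = math.floor(min(power_budget / core.power, area_budget / core.area))
    return n_cores * core.perf

# Hypothetical figures; power and area fractions for the workhorse are made up.
SUPER = CoreType("Super", perf=1.0, power=1.0, area=1.0)
WORKHORSE = CoreType("Performance", perf=0.7, power=0.5, area=0.5)

POWER_BUDGET = 10.0  # in units of one Super core's sustained power
AREA_BUDGET = 10.0   # in units of one Super core's area

for core in (SUPER, WORKHORSE):
    print(core.name, perf_per_watt(core),
          parallel_throughput(core, POWER_BUDGET, AREA_BUDGET))
```

The break-even point is a core whose power (or area) fraction equals its performance fraction; below that, lighting up more, slower cores wins on sustained multithreaded work, which is also where a thermally limited chassis hurts most.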
 

Bonusround

Ars Tribunus Militum
2,951
Subscriptor
Is this your guess or has it been reported somewhere?
As has been 'reported'. Were it personal speculation I would make that clear.

To be specific, this comes from Quinn Nelson's recent video where he cites the following from Baidu:

[Screenshot of the Baidu post]


According to Quinn these folks on Baidu have been accurate for several chip generations now.
 

Bonusround

Ars Tribunus Militum
2,951
Subscriptor
Oh, thanks. And I didn't notice all the edits to the other posts.

And do you know about the M4 Pro/Max efficiency cores vs the new M5 Pro/Max performance cores?

I don't, but Howard at Eclectic Light can help:
Like P cores, E cores can be set to run at any of 5 values between the minimum of 1,020 MHz and maximum of 2,592 MHz (1.0-2.6 GHz). When running macOS, cluster frequency is set by macOS at a kernel level; other operating systems may offer more direct control. This range of frequencies is significantly narrower than that of E cores in the M3, which range between 744-2,748 MHz.

E cores idle at 1,020 MHz, and although they can be shut down altogether, that’s exceptional given the steady demand for macOS background threads to be run on them. Nevertheless, powermetrics still reports their ‘down’ residencies separately from idle residencies.

No idea on M4 e-core instruction window size.
 

dmsilev

Ars Tribunus Angusticlavius
7,434
Subscriptor
Nice!

The obvious next question is what the Max GPU die looks like. One could certainly imagine that the Max die is what TSMC actually fabs, 2x what we're seeing here for the Pro (with the second copy rotated 180 degrees), and then either cut in half to make two Pro chips or left intact to make 1 Max. If that's the case, then the Max has an "extra" UltraFusion port, which of course immediately suggests daisy-chaining a couple of the things together...
 

wrylachlan

Ars Legatus Legionis
15,058
Subscriptor
So I think this tells us a few things about the M5 Ultra.

The connection is face to face. And the existing CPU die has one connection side. So to get to 2X Max, either a) the Max GPU tile has 2 connection sides to allow a CPU<->GPU<->GPU<->CPU configuration, b) there’s a new Ultra-specific tile with 2 connection faces allowing GPU<->CPU<->GPU or c) there’s a new massive GPU tile that allows for CPU<->GPU<->CPU.
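The constraint behind options a) to c) is that in a linear chain of tiles, the two end tiles need one connection side and every interior tile needs two. A minimal sketch that checks each option; only the CPU tile's single side comes from the post, and the tile names and side counts for the hypothetical tiles are placeholders for the options being discussed.

```python
# Hypothetical connection-side counts per tile type.
PORTS = {
    "CPU": 1,        # existing CPU tile: one die-to-die side
    "GPU": 1,        # Pro/Max GPU tile as currently seen
    "GPU_2SIDE": 2,  # option a): Max GPU tile with two sides
    "ULTRA_CPU": 2,  # option b): Ultra-specific tile with two faces
    "BIG_GPU": 2,    # option c): large Ultra-only GPU tile
}

def chain_is_buildable(chain: list[str], ports: dict[str, int]) -> bool:
    """True if every tile in a linear chain has enough connection sides:
    end tiles need one link, interior tiles need two."""
    if len(chain) < 2:
        return True
    last = len(chain) - 1
    for i, tile in enumerate(chain):
        links_needed = 1 if i in (0, last) else 2
        if ports[tile] < links_needed:
            return False
    return True

CONFIGS = {
    "today's tiles": ["CPU", "GPU", "GPU", "CPU"],
    "a)": ["CPU", "GPU_2SIDE", "GPU_2SIDE", "CPU"],
    "b)": ["GPU", "ULTRA_CPU", "GPU"],
    "c)": ["CPU", "BIG_GPU", "CPU"],
}

for label, chain in CONFIGS.items():
    print(label, " <-> ".join(chain), chain_is_buildable(chain, PORTS))
```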

At this point my money is on a new tile. GPU is probably easier to lay out than CPU. And since GPU is really the core asset of the Ultra, why stop at 80 GPU blocks? A dedicated GPU tile for the Ultra could be essentially as big as they want it up to the reticle limit - say 120 GPU blocks? 160?
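A back-of-the-envelope way to size a reticle-limited GPU tile is to take the roughly 26 mm × 33 mm (858 mm²) exposure field of standard lithography scanners, reserve some area for fixed blocks (memory controllers, media engines, die-to-die interfaces), and divide the rest by the area of one GPU block. This is only a sketch: the per-block area and the fixed overhead used below are hypothetical placeholders, not measured M5 figures.

```python
RETICLE_MM2 = 26 * 33  # 858 mm^2 full exposure field

def max_gpu_blocks(block_mm2: float, fixed_mm2: float, usable_fraction: float = 1.0) -> int:
    """Largest whole number of GPU blocks that fits on one reticle-limited die
    after reserving area for fixed-function and interconnect logic."""
    usable = RETICLE_MM2 * usable_fraction - fixed_mm2
    if usable <= 0:
        return 0
    return int(usable // block_mm2)

# Hypothetical: 4 mm^2 per GPU block, 150 mm^2 of fixed logic.
print(max_gpu_blocks(4.0, 150.0))
# Leave some margin below the absolute field size.
print(max_gpu_blocks(4.0, 150.0, usable_fraction=0.9))
```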