All the tidbits we’ve gleaned about the Mac Studio, Studio Display, and M1 Ultra

You're currently viewing only ERIFNOMI's posts.

ERIFNOMI

Ars Legatus Legionis
18,134
I am actually really pleasantly surprised by the Mac Studio. It is a lot of power in a small and comparatively inexpensive package. I think a lot of people will be buying these.

Obviously it would be better if it had upgradable RAM and SSD, but still, a powerful little package at a not crazy price.

I'm not sure I've got the numbers exactly right, but the 800GB/s of memory bandwidth that the M1 Ultra has would require 20 DDR5 DIMM slots.

There are benefits to integration, after all.

YMMV, of course, and not all use cases require that level of memory bandwidth.
The M1 Ultra should have a 1024-bit memory bus (the M1 Max is 512-bit, and the Ultra is two Maxes). If you could get LPDDR5 on DIMMs, that would be sixteen 64-bit channels running at 6400MT/s.
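For anyone checking the math: peak bandwidth is just bus width times transfer rate. A minimal Python sketch, assuming decimal gigabytes and using DDR5-4800 as the comparison DIMM (that speed grade is an assumption, not something stated in the thread):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s.

    bits per transfer / 8 = bytes per transfer; times MT/s gives MB/s;
    divide by 1000 for GB/s.
    """
    return bus_width_bits * transfer_rate_mts / 8 / 1000


# M1 Ultra as described above: 16 channels x 64 bits = 1024 bits at 6400 MT/s.
ultra_gbs = peak_bandwidth_gbs(16 * 64, 6400)

# A single 64-bit DDR5-4800 DIMM (assumed speed grade for the comparison).
ddr5_dimm_gbs = peak_bandwidth_gbs(64, 4800)

dimms_needed = ultra_gbs / ddr5_dimm_gbs

print(f"M1 Ultra peak:      {ultra_gbs:.1f} GB/s")
print(f"One DDR5-4800 DIMM: {ddr5_dimm_gbs:.1f} GB/s")
print(f"DIMMs to match:     {dimms_needed:.1f}")
```

At DDR5-4800 that works out to a bit over 21 DIMMs, in the same ballpark as the 20 quoted above; faster DDR5 would need fewer.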
 

ERIFNOMI

Ars Legatus Legionis
18,134
...
While a user-serviceable SSD wouldn't really change the power equation, the RAM being on package is part of how these chips are so potent. User-upgradeable RAM slots would significantly constrain the machine's performance. ...

There's nothing preventing Apple from having some on-package RAM and also providing expansion slots for more RAM.

If the RAM in the slots is slower, that's fine. NUMA systems are already well-understood.

You say this as if it would be trivial for them to completely redesign their memory model to support multiple tiers of RAM. OK...
Memory access across the M1 Max is already not entirely homogeneous. I imagine on the Ultra it'll be even "worse." The cores are in clusters that effectively share memory bandwidth. A single core can't suck down the full 400GB/s (on the Max), and the CPU alone can't hit that mark either, even with all cores chugging along. That bandwidth is there for the GPU and CPU together. The links to the core clusters saturate before the memory controllers do.
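A toy model of that point: what a group of agents can actually pull is capped by whichever is smaller, the DRAM peak or the sum of their own links into the fabric. The per-cluster link limits and the three-CPU-cluster layout below are hypothetical placeholders chosen to show the shape of the argument, not measured M1 Max figures:

```python
def achievable_bandwidth_gbs(dram_peak_gbs: float, link_caps_gbs: list[float]) -> float:
    """Peak a set of agents can draw: limited by DRAM or by their own links."""
    return min(dram_peak_gbs, sum(link_caps_gbs))


# M1 Max: 512-bit bus at 6400 MT/s -> 409.6 GB/s theoretical peak.
M1_MAX_DRAM_PEAK_GBS = 512 * 6400 / 8 / 1000

# Hypothetical link limits (illustrative only, not measurements).
CPU_CLUSTER_LINK_GBS = 100.0
GPU_LINK_GBS = 400.0
CPU_CLUSTERS = [CPU_CLUSTER_LINK_GBS] * 3

cpu_only = achievable_bandwidth_gbs(M1_MAX_DRAM_PEAK_GBS, CPU_CLUSTERS)
cpu_plus_gpu = achievable_bandwidth_gbs(M1_MAX_DRAM_PEAK_GBS, CPU_CLUSTERS + [GPU_LINK_GBS])

print(f"CPU clusters alone: {cpu_only:.1f} GB/s (links saturate first)")
print(f"CPU + GPU:          {cpu_plus_gpu:.1f} GB/s (DRAM saturates first)")
```

With any per-link numbers where the CPU links sum to less than the DRAM peak, the CPU alone can't reach the headline figure, which is the behavior described above.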

I wouldn't be too worried about memory that's not on the package. People are putting way too much of the M1 family's performance on the RAM being physically close to the CPU. There is definitely an advantage there, but it isn't responsible for all of the performance of this SoC.
 

ERIFNOMI

Ars Legatus Legionis
18,134
While a user-serviceable SSD wouldn't really change the power equation, the RAM being on package is part of how these chips are so potent. User-upgradeable RAM slots would significantly constrain the machine's performance.

And it would go against the entire ethos of these M1 chips. The whole point is that the RAM is UNIFIED memory. It's an order of magnitude faster than standard RAM sticks, and can be accessed by the GPU and CPU simultaneously without any copying.

Adding RAM sticks would introduce a second tier of memory that would have to sit below the unified memory, so it would pretty much just act as a CPU-only high-speed buffer before hitting the SSD. 260-pin DDR4 is only about 20 GB/s, so 1/20th the speed of the unified memory (worse because of the copy penalty), and only ~3x as fast as the SSD. Just not worth it other than for cost savings (which Apple has never cared about).
It's not running an order of magnitude faster than RAM anywhere else. It's 6400MT/s, which is the spec for LPDDR5 and is achievable with very good DDR5, which is still pretty new. It's about twice as fast as DDR4.

That 800GB/s number comes from the absolutely massive width you get by combining all the memory controllers. Comparing a 1024-bit-wide memory bus to a single 64-bit DDR4 DIMM is so misleading it might as well be lying.
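Splitting the headline number into its two factors makes this concrete. A sketch comparing one DDR4-3200 DIMM (an assumed speed grade) against one 64-bit slice of the Ultra's LPDDR5-6400 bus and against the whole 1024-bit bus:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return bus_width_bits * transfer_rate_mts / 8 / 1000


ddr4_dimm = peak_bandwidth_gbs(64, 3200)       # one DDR4-3200 DIMM (assumed grade)
lpddr5_channel = peak_bandwidth_gbs(64, 6400)  # one 64-bit slice of the Ultra's bus
ultra_total = peak_bandwidth_gbs(1024, 6400)   # the whole 1024-bit bus

per_channel_ratio = lpddr5_channel / ddr4_dimm  # what the memory itself buys you
width_ratio = ultra_total / lpddr5_channel      # what the wide bus buys you
headline_ratio = ultra_total / ddr4_dimm        # the misleading comparison

print(f"Per-channel speedup over DDR4: {per_channel_ratio:.1f}x")
print(f"Speedup from bus width alone:  {width_ratio:.1f}x")
print(f"Headline vs. one DIMM:         {headline_ratio:.1f}x")
```

The memory itself is about 2x faster per channel; the other 16x comes purely from width.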
 

ERIFNOMI

Ars Legatus Legionis
18,134
...

It seems to me like you're splitting hairs. The fact is that the M-series chips have memory speeds much closer to GPU memory than traditional RAM sticks. Yes, the CPU itself can't fully saturate the bus, but real world workloads readily can. It also has the benefit of mutual access by the CPU and GPU, something a standard RAM stick doesn't have.
Each individual channel is not significantly faster than DDR5 and is normal for fast LPDDR5. What makes those huge bandwidth numbers is the wide bus. It's fast because there are 16 channels. The same would be true of a CPU with socketed memory if it had 16 channels.

There's no reason socketed memory couldn't be unified. It's unified because the CPU and the GPU share the same memory controller and the system is designed to unify the memory. They could just as easily say the last 8GB is GPU only and treat it like separate memory. Living on a DIMM or on some interposer next to the CPU doesn't change that.
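To illustrate that unification is a policy rather than a property of where the chips sit, here is a hypothetical physical memory map in Python. `MemoryMap`, the 8GB carve-out, and the `backing` field are made up for illustration; the point is that `owner()` never looks at `backing`:

```python
from dataclasses import dataclass

GiB = 1024 ** 3


@dataclass(frozen=True)
class MemoryMap:
    """Hypothetical physical memory map shared by CPU and GPU."""
    total_bytes: int
    gpu_reserved_bytes: int  # carve-out at the top of memory; 0 = fully unified
    backing: str             # "on-package" or "DIMM"; deliberately unused below

    def owner(self, phys_addr: int) -> str:
        """Return which agents may use this physical address."""
        if not 0 <= phys_addr < self.total_bytes:
            raise ValueError(f"address {phys_addr:#x} out of range")
        if phys_addr >= self.total_bytes - self.gpu_reserved_bytes:
            return "gpu-only"
        return "shared"


unified_dimms = MemoryMap(64 * GiB, 0, "DIMM")
split_on_package = MemoryMap(64 * GiB, 8 * GiB, "on-package")

addr = 60 * GiB
print(unified_dimms.owner(addr))     # shared
print(split_on_package.owner(addr))  # gpu-only
```

Socketed memory with no carve-out comes out unified, and on-package memory with a carve-out comes out split.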
 

ERIFNOMI

Ars Legatus Legionis
18,134
...
Asking because I have no idea about memory systems: wouldn't a competent memory manager just move anything that's latency-constrained up to the on-package memory, anyhow?

A memory manager would put frequently-accessed pages in the on-package memory.

You would still want minimum latency when accessing pages from the slotted memory.
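A minimal sketch of the simplest version of that idea: count accesses per page and give the on-package tier to the most-used pages. This is a static, frequency-based (LFU-style) placement over a made-up trace; real kernels track access bits over time and migrate pages as they heat up or cool down, and nothing here reflects how macOS or any particular OS actually does it:

```python
from collections import Counter


def place_pages(access_trace: list[int], fast_capacity_pages: int) -> tuple[set[int], set[int]]:
    """Split pages into (fast tier, slow tier) by access frequency."""
    counts = Counter(access_trace)
    hot = {page for page, _ in counts.most_common(fast_capacity_pages)}
    cold = set(counts) - hot
    return hot, cold


def fast_tier_hit_rate(access_trace: list[int], hot_pages: set[int]) -> float:
    """Fraction of accesses that would be served from the fast tier."""
    hits = sum(1 for page in access_trace if page in hot_pages)
    return hits / len(access_trace)


# Hypothetical trace of page numbers; pages 1 and 2 are the hot ones.
trace = [1, 1, 1, 2, 2, 3, 4, 1, 2, 5]
hot, cold = place_pages(trace, fast_capacity_pages=2)
print(sorted(hot), sorted(cold))       # [1, 2] [3, 4, 5]
print(fast_tier_hit_rate(trace, hot))  # 0.7
```

The remaining 30% of accesses in this toy trace are the ones that would still pay the slotted-memory latency, which is why you'd want that latency kept low.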

OK. Follow up if you don't mind: what sort of workloads would require low-latency access to a sufficiently large pool of data, that moving pages in and out of on-package RAM becomes a significant enough performance constraint that there would be a real-world impact vs swapping to a fast SSD? Are these workloads common for the intended market of this machine?

Also, why add the extra cost, the pin connections off the package, etc., and the big one: the increased power required to drive those pins, the board traces, the DIMM slots, and the DIMMs themselves?

Now, maybe, just maybe, in a Mac Pro you would pay the price in cost and power to do that, but in these systems there is little obvious reason to when you already have such substantial on-package memory available (working with very low latency and lower-power connections).
The Mac Studio starts at $2k. The cheapest one with the M1 Ultra is $4k. Maxed out it's $8k. I think this market would be able to handle the cost of some DIMMs.

That’s true, but for the work this is expected to do, 128GB is plenty. For the upcoming Mac Pro? Well, that will be a very interesting bit of speculation. I believe it will be based on the upcoming M2; at least, I hope so. If so, it’s the second generation of the base technology. Possibly Apple has been able to double the memory bandwidth, either with double the lines or double the bandwidth per line. Either is certainly possible. They could also double the amount of RAM per chip, so begin with 16GB rather than 8GB, and double after that to 256GB per Ultra equivalent. Maybe the Mac Pro would have two, or even four, of those. How much memory is really needed at the top end? Would 1TB be enough?
Apple didn't invent some fancy new RAM here. They're using standard LPDDR5. They're not going to double that clock. And they're already using a 1024-bit bus. That's where 800GB/s comes from (16 channels, 64 bits per channel, 6400MT/s). They might make the bus even wider, but it's pretty ludicrously large already. If they make it wider, it would have to come with a whole hell of a lot of cores, both CPU and GPU, to actually take advantage of it. A single core cluster is already limited to somewhere around 200GB/s. But if you have enough die space for a bigger bus, it's because you smashed even more of these smaller dies onto an interposer, so they're going to get that bus width for free with the added cores, unless they need more of that die space for interconnects when they go beyond two dies.
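A rough way to see why a wider bus needs more consumers: divide the bus's peak by what one cluster can draw. The ~200GB/s per-cluster figure is the rough number from the post, and treating every CPU or GPU block as an identical "cluster" is a simplifying assumption:

```python
import math


def bus_peak_gbs(bus_width_bits: int, transfer_rate_mts: float = 6400) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return bus_width_bits * transfer_rate_mts / 8 / 1000


def clusters_to_saturate(bus_width_bits: int, per_cluster_gbs: float = 200.0) -> int:
    """Minimum number of ~per_cluster_gbs consumers needed to use the whole bus."""
    return math.ceil(bus_peak_gbs(bus_width_bits) / per_cluster_gbs)


for width in (512, 1024, 2048, 4096):
    print(f"{width:>5}-bit bus: {bus_peak_gbs(width):7.1f} GB/s, "
          f"needs >= {clusters_to_saturate(width)} clusters")
```

Each doubling of width roughly doubles the cluster count, which is why a wider bus only makes sense if it arrives together with more dies.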
 

ERIFNOMI

Ars Legatus Legionis
18,134
...

He was talking about the memory bus capacity; it doesn't -need- to imply doubling the clock rate of the memory itself. It does potentially imply extra hardware beyond the memory that allows two LPDDR5 parts to be multiplexed onto lanes running at a higher rate.

Anyway, I suspect it is more likely Apple would go with higher-density LPDDR5 modules instead of increasing lanes/rate, since I don't expect we will see enough of a performance boost in the M2 to need more memory bandwidth.

Still, the need for DIMMs and off-package memory is really fairly fringe in workflows, given the current and likely future memory available on package... fringe enough that I am not sure we will see it happen. Again, the Mac Pro may be the special one that gets it, at a nontrivial price, for those who really need that fringe case supported on a macOS device.
They said "double the memory bandwidth."

I don't doubt Apple will never go back to socketed memory. Why would they? They can sell you memory at outrageous prices and if you underbuy the first time around, your only option is to pony up and buy a second system to upgrade.
 

ERIFNOMI

Ars Legatus Legionis
18,134
...

Yeah, double the bandwidth, and that could be done with more lanes to more LPDDR5 modules or with faster lanes to hardware that connects to more LPDDR5 modules. I think the former is more likely than the latter, but again, as you noted, increased memory bandwidth is not likely needed in the M2 timeframe unless something surprising happens with compute capabilities.

Anyway, are Apple's prices for LPDDR5 crazy? They look not that far out of line with market prices for top-end memory, and then factor in supply chain issues doing fun things with prices, etc.
I literally said either wider bus or faster clock. Those are your two options (or some combo) to double your bandwidth.
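Both routes are the same product: width times transfer rate. A sketch with the Ultra as the baseline; the 12800MT/s part is hypothetical, twice LPDDR5's 6400MT/s:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return bus_width_bits * transfer_rate_mts / 8 / 1000


baseline = peak_bandwidth_gbs(1024, 6400)  # M1 Ultra as described in the thread

options = {
    "wider bus (2048-bit @ 6400MT/s)": peak_bandwidth_gbs(2048, 6400),
    "faster clock (1024-bit @ 12800MT/s, hypothetical)": peak_bandwidth_gbs(1024, 12800),
}

for name, bw in options.items():
    print(f"{name}: {bw:.1f} GB/s ({bw / baseline:.1f}x)")
```

Any mix works as long as the product doubles, e.g. 1.5x the width at 4/3 the rate.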

Apple has historically charged out the ass for memory and storage, even when you had the option of upgrading yourself. LPDDR5 doesn't come in a socketable form, so comparing prices directly is hard. Right now they're probably close to DDR5 prices of the same speed because DDR5 is brand spanking new and prices for everything are fucked. We'll see how prices look over time.

Their storage prices are still pretty ridiculous though.
 