The M1 Ultra should have a 1024b memory bus (the M1 Max is 512b, Ultra is two Maxes). If you could get DIMMs of LPDDR5, that would be 16 channels running at 6400MT/s.
I am actually really pleasantly surprised by the Mac Studio. It is a lot of power in a small and comparatively inexpensive package. I think a lot of people will be buying these.
Obviously it would be better if it had upgradable RAM and SSD, but still, a powerful little package at a not crazy price.
I'm not sure I've got the numbers exactly right, but the 800GB/s of memory bandwidth that the M1 Ultra has would require 20 DDR5 DIMM slots.
There are benefits to integration, after all.
YMMV, of course, and not all use cases require that level of memory bandwidth.
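For anyone who wants to check the DIMM-count estimate, here is a rough sketch of the arithmetic. It assumes each DDR5 DIMM presents 64 data bits in total and uses theoretical peak rates only; the helper names and the list of DDR5 speed grades are my own choices for illustration, not anything from Apple.

```python
import math

def dimm_bandwidth_gbs(mt_per_s, bus_bits=64):
    """Theoretical peak of one DIMM in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9

def dimms_needed(target_gbs, mt_per_s):
    """Smallest whole number of DIMMs, one per channel, that reaches target_gbs."""
    return math.ceil(target_gbs / dimm_bandwidth_gbs(mt_per_s))

# 800 GB/s is the quoted M1 Ultra figure; the speed grades are assumed examples.
for speed in (4800, 5600, 6400):
    per_dimm = dimm_bandwidth_gbs(speed)
    print(f"DDR5-{speed}: {per_dimm:.1f} GB/s per DIMM, {dimms_needed(800, speed)} DIMMs")
```

So the answer depends on the speed grade: roughly 21 DIMMs at DDR5-4800, dropping to 16 at DDR5-6400, and each one would need its own channel.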
Memory access across the M1 Max is already not entirely homogeneous. I imagine on the Ultra it'll be even "worse." The cores are in clusters that effectively share memory bandwidth. A single core can't suck down a full 400GB/s (on the Max) and the CPU alone can't hit that mark either, even with all cores chugging along. That bandwidth is there for the GPU and CPU together. The link to the cores saturates before the memory controllers do....
While user serviceable SSD wouldn't really change the power equation, the RAM being on package is part of how these chips are so potent. User addressable RAM slots would significantly constrict the machine performance. ...
There's nothing preventing Apple from having some on-package RAM and also providing expansion slots for more RAM.
If the RAM in the slots is slower, that's fine. NUMA systems are already well-understood.
You say this as if it would be trivial for them to completely redesign their memory model to support multiple tiers of RAM. OK...
It's not running an order of magnitude faster than RAM anywhere else. It's 6400MT/s which is spec for LPDDR5 and is achievable with very good DDR5 which is still pretty new. It's about twice as fast as DDR4.
While user serviceable SSD wouldn't really change the power equation, the RAM being on package is part of how these chips are so potent. User addressable RAM slots would significantly constrict the machine performance.
And it would go against the entire ethos of these M1 chips. The whole point is that the RAM is UNIFIED memory. It's an order of magnitude faster than standard RAM sticks, and can be accessed by the GPU and CPU simultaneously without any copying.
Adding RAM sticks would introduce a second tier of memory that would have to sit below the unified memory, so it would pretty much just act as a CPU-only high-speed buffer before hitting the SSD. 260-pin DDR4 is only about 20 GB/s, so 1/20th the speed of the unified (worse because of the copy penalty), and only ~3x as fast as the SSD. Just not worth it other than for cost savings (which Apple has never cared about).
Each individual channel is not significantly faster than DDR5 and is normal for fast LPDDR5. What makes those huge bandwidth numbers is the wide bus. It's fast because there are 16 channels. The same would be true of a CPU with socketed memory if it had 16 channels.
That 800GBps number is because of the absolutely massive width from combining all the memory controllers. Comparing a 1024b wide memory bus to a single DDR4 DIMM of 64b might as well be lying.
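To put numbers on the width-versus-speed point, here is a small sketch of the peak-bandwidth arithmetic. The 16 channels, 64b per channel and 6400MT/s are the figures used in this thread; the DDR4-3200 DIMM is an assumed comparison point, and the function name is made up for illustration. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(channels, bits_per_channel, mt_per_s):
    """Theoretical peak in GB/s: total bus bytes per transfer times transfers per second."""
    bus_bytes = channels * bits_per_channel / 8
    return bus_bytes * mt_per_s * 1e6 / 1e9

m1_ultra = peak_bandwidth_gbs(16, 64, 6400)    # the full 1024b bus of LPDDR5-6400
one_channel = peak_bandwidth_gbs(1, 64, 6400)  # a single 64b slice of that bus
ddr4_dimm = peak_bandwidth_gbs(1, 64, 3200)    # assumption: one DDR4-3200 DIMM

print(f"M1 Ultra, 1024b bus:    {m1_ultra:.1f} GB/s")
print(f"one 64b LPDDR5 channel: {one_channel:.1f} GB/s")
print(f"one DDR4-3200 DIMM:     {ddr4_dimm:.1f} GB/s")
print(f"whole bus vs one DIMM:  {m1_ultra / ddr4_dimm:.0f}x")
print(f"per channel vs one DIMM: {one_channel / ddr4_dimm:.0f}x")
```

Under those assumptions the whole bus comes out around 32x a single DDR4 DIMM, but channel for channel it is only about 2x. The rest of the gap is purely the number of channels.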
It seems to me like you're splitting hairs. The fact is that the M-series chips have memory speeds much closer to GPU memory than traditional RAM sticks. Yes, the CPU itself can't fully saturate the bus, but real world workloads readily can. It also has the benefit of mutual access by the CPU and GPU, something a standard RAM stick doesn't have.
Apple didn't invent some new fancy RAM here. They're using standard LPDDR5. They're not going to double that clock. And they're already using a 1024b bus. That's where 800GB/s comes from (16 channels, 64b per channel, 6400MT/s). They might make the bus even wider, but it's pretty ludicrously large already. If they make it wider, it would have to come with a whole hell of a lot of cores, both CPU and GPU, to actually take advantage of it. A single core cluster is already limited to somewhere around 200GB/s. But if you have enough die space for a bigger bus, it's because you smashed even more of these smaller dies onto an interposer, so they're going to get that bus width for free with the added cores. Unless they need more for interconnects when they go to more than 2 dies.
The Mac Studio starts at $2k. The cheapest one with the M1 Ultra is $4k. Maxed out it's $8k. I think this market would be able to handle the cost of some DIMMs....
Asking because I have no idea about memory systems: wouldn't a competent memory manager just move anything that's latency-constrained up to the on-package memory, anyhow?
A memory manager would put frequently-accessed pages in the on-package memory.
You would still want minimum latency when accessing pages from the slotted memory.
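As a toy illustration of what "put frequently-accessed pages in the on-package memory" could look like, here is a minimal sketch. Everything in it is hypothetical: the `TieredMemory` class, the plain access counter and the periodic `rebalance` step are simplifications I made up, and a real OS would work from page tables and hardware access bits rather than anything this crude.

```python
from collections import Counter

class TieredMemory:
    """Toy two-tier model: a small fast tier (on-package) and a large slow tier (slotted)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # number of pages that fit in the fast tier
        self.hits = Counter()               # access count per page
        self.fast = set()                   # pages currently resident in the fast tier

    def access(self, page):
        """Record an access and report which tier served it."""
        self.hits[page] += 1
        return "fast" if page in self.fast else "slow"

    def rebalance(self):
        """Promote the most frequently accessed pages; everything else stays in the slow tier."""
        hottest = self.hits.most_common(self.fast_capacity)
        self.fast = {page for page, _ in hottest}

mem = TieredMemory(fast_capacity=2)
for page in ["a"] * 5 + ["b"] * 3 + ["c"]:
    mem.access(page)
mem.rebalance()
print(sorted(mem.fast))   # the two hottest pages
print(mem.access("c"))    # a cold page is still served from the slow tier
```

Note that the cold page still gets served from the slow tier until the next rebalance, which is why the latency of the slotted memory still matters.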
OK. Follow up if you don't mind: what sort of workloads would require low-latency access to a sufficiently large pool of data, that moving pages in and out of on-package RAM becomes a significant enough performance constraint that there would be a real-world impact vs swapping to a fast SSD? Are these workloads common for the intended market of this machine?
Also, why add the extra cost, the pin connections out of the package, etc., and the big one... the increase in power requirements to drive those pins, the board lanes, the DIMM slots, and the DIMMs themselves?
Now maybe, just maybe, in a Mac Pro you would pay the price in cost and power to do that, but in these systems there is little obvious reason to when you already have such substantial on-package memory available (working with very low latency and lower-power connections).
That’s true, but for the work this is expected to do, 128GB is plenty. For the upcoming Mac Pro? Well, that will be a very interesting bit of speculation. I believe it will be based on the upcoming M2; at least, I hope so. If so, it’s the second generation of the base technology. Possibly Apple has been able to double the memory bandwidth either with double the lines, or double the bandwidth per line. Either is certainly possible. They could also double the amount of RAM per chip. So begin with 16GB rather than 8GB, and double after that to 256GB per Ultra equivalent. Maybe the Mac Pro would have two, or even four of those. How many GB is really needed at the top end? Would 1TB be enough?
They said "double the memory bandwidth."
He was talking about the memory bus capacity. It doesn't need to imply doubling the clock rate of the memory itself; it could instead imply extra hardware alongside the memory that allows two LPDDR5 modules to be multiplexed onto lanes running at a higher rate.
Anyway I suspect it is more likely Apple would go with higher density LPDDR5 modules instead of increasing lanes/rate, since I don't suspect we will see enough of a boost in performance in the M2 to need an increase in memory bandwidth.
Still, the need for DIMMs and off-package memory is fairly fringe in real workflows given the current and likely future memory available on package... fringe enough that I am not sure we will see it happen. Again, the Mac Pro may be the special case that gets it, at a nontrivial price, for those who really need that fringe case supported on a macOS device.
I literally said either wider bus or faster clock. Those are your two options (or some combo) to double your bandwidth.
I don't doubt Apple will never go back to socketed memory. Why would they? They can sell you memory at outrageous prices and if you underbuy the first time around, your only option is to pony up and buy a second system to upgrade.
Yeah, double the bandwidth, and that could be done by more lanes to more LPDDR5 modules or by faster lanes to hardware that connects to more LPDDR5 modules. I think the former is more likely than the latter, but again, as you (and I) noted, increased memory bandwidth is not likely needed in the M2 timeframe unless something surprising happens with compute capabilities.
Anyway, are Apple's prices for LPDDR5 crazy? They don't look that far out of line with market prices for top-end memory, especially once you factor in supply chain issues doing fun things with prices, etc.