> I've yet to meet someone who actually wants a Copilot+ system. Like, I guess in theory they must exist somewhere. But I haven't seen one in person.

Given the recent MS news that they banned "Microslop" on their Discord and then shut it down, we need a similar rebrand of Copilot to express our disdain.
I'm not sold on the use case for an NPU in a desktop. For a mobile device without a discrete GPU I can see it being useful for running AI functions that don't require much performance, but on the desktop, only the very bottom tier of systems is likely to lack a discrete GPU.
Last I heard, NPUs are FAR weaker at AI tasks than pretty much any GPU. So the only use case I could see would be running a local AI model that requires more memory than the VRAM on your GPU. But for a local model with that sort of demand, will an NPU have enough performance to even be relevant? Can I run that 64GB local model that my 5090 can't run due to memory constraints on my NPU if I have 64GB of system RAM? I tend to doubt it (see the napkin math below). (I don't have a 5090, just making my point.)
So from my perspective, either an NPU can run a bigger model than a GPU if you have enough system RAM, in which case it's potentially relevant, or it can't, in which case it's a waste of silicon that I would rather not pay for.
Are they planning to continue selling desktop CPUs without an NPU? If so, then OK, non-issue. If not, then we're looking at AMD raising prices for wasted silicon.
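Napkin math for the 64GB question above - a sketch assuming common quantization levels and ignoring KV-cache and runtime overhead (which push real requirements higher):

```python
# Rough model-memory sizing. Bytes per parameter are assumptions:
# ~2.0 for fp16, ~0.5 for 4-bit quantization; overhead is ignored.
def model_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param  # billions of params * bytes each = GB

print(model_gb(7, 0.5))    # ~3.5 GB  -> a 4-bit 7B model fits almost anywhere
print(model_gb(70, 2.0))   # ~140 GB  -> fp16 70B fits neither 32 GB VRAM nor 64 GB RAM
print(model_gb(70, 0.5))   # ~35 GB   -> 4-bit 70B: too big for a 5090's 32 GB,
                           #             but it does fit in 64 GB of system RAM
```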
Yeah, I get the feeling that this is almost entirely hype based. AI is all the rage right now, so this lets AMD do a press release that basically says "see, we're selling more AI-related thingies." Yay. It's fine if you want to make money on AI, but if that's your goal you probably need to convince more hyperscalers to fork over the big money for your data center GPUs. So far they're making some progress on that front, but a lot less than I think most of their investors were expecting.

The NPU is weaker, but supposedly far more efficient for the imagined use case. The idea is that an integrated NPU is able to perform a relatively basic LLM/inference-related task more efficiently than the usual "APU" design of CPU + iGPU. On paper, sure, this checks out. In reality... I think there is little utility for the average consumer or business in the imagined scenario of accelerating compute for a small local model efficiently.
Why is it useless? Well, the tiny local models that could make sense would have to be highly tailored for any real utility at a size that can fit (like an open 7B model or less). A story in The Economist last month covered this and explained (with survey data) that only a small number of workers use "AI" daily, and that the vast majority of these users only use LLMs via cloud providers, as a glorified search/reference system. The tiny local models which fit within 4-16 GB are terrible for this particular task - few are web-search/agent enabled, plus small models lack context and hallucinate more.
So the stated goal of power efficiency for a local model is all but pointless when the average model that might actually provide utility is larger than the device can realistically run. Generally speaking, I think you are correct in that NPUs are wasted silicon. This functionality would make more sense baked into a power-hungry iGPU on all of these processors - even if that scenario means less efficiency for the rare user that actually needs the NPU functionality. At least the silicon would be more likely to be utilized in that scenario.
> Can't wait for the day marketing stops using the 'AI' moniker...

Somebody, somewhere, in the marketing equivalent of the Skunkworks is desperately trying to think up the next buzzword / phrase so they can be ahead of the curve.
Techpowerup said:
> To carve out the Ryzen AI 7 450 series processor models, AMD configured the silicon with 4 "Zen 5" and 4 "Zen 5c" cores. The Ryzen AI 5 440 series chips are configured with 3 "Zen 5" and 3 "Zen 5c" cores. The Ryzen AI 5 435 series chips come with 2 "Zen 5" and 4 "Zen 5c" cores.
So they sacrificed a ton of potentially useful die area (meaning full Zen 5 cores, and/or GPU compute units to match or exceed the 8700G [edit: and/or an increase in L3 over the 5000G, ffs]) for that fucking NPU.

According to Techpowerup, AMD are announcing non-PRO AI versions as well (per the attached screenshots, one from a different page).
> Can't wait for the day marketing stops using the 'AI' moniker...

The Kool-Aid has been flowing among the corporate tech world, but if you peel off the layer of hype, the hysteria is not far below it.
Click to Do has been pretty useful. Similar to Google's Circle to Search.
I think NPUs will be useful for offloading "AI" OCR, voice recognition, and video processing. Not every use of "AI" needs to be LLM chatbots, and Microsoft could integrate things like better OCR into search and provide actually useful improvements.
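As a sketch of what that kind of offload looks like in practice with ONNX Runtime - the model file and input shape here are hypothetical placeholders, and the QNN provider assumes a Snapdragon-class NPU with the onnxruntime-qnn build installed:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical OCR/text-detection model exported to ONNX; the file name
# and 640x640 input shape are made up for illustration.
MODEL_PATH = "text_detector.onnx"

# Ask for the NPU first (Qualcomm's QNN provider), then the GPU via
# DirectML, then plain CPU. Filtering against get_available_providers()
# keeps the same script runnable on machines without an NPU.
preferred = ["QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]
session = ort.InferenceSession(MODEL_PATH, providers=providers)

image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # dummy input
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: image})
print("executed with:", session.get_providers()[0])
```

The point of the provider list is exactly the "not every use needs an LLM" argument: the same small vision or speech model runs on whatever accelerator is present, and the NPU just makes it cheaper in power.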
> Given the recent MS news that they banned "Microslop" on their Discord and then shut it down, we need a similar rebrand of Copilot to express our disdain.
I vote CoPlop from Microslop to really drive the knife in.
> Somebody, somewhere, in the marketing equivalent of the Skunkworks is desperately trying to think up the next buzzword / phrase so they can be ahead of the curve.

AMD missed the boat when they didn't brand these RAISIN processors.
> I have a copilot plus ryzen 350. I'm still trying to figure out how to disable all of the bullshit from copilot.

That should be straightforward - just install Linux.
> Since it looks like we're going to be saddled with NPUs in our hardware from now on, is there anything useful (i.e. NOT AI) that they can be used for?

The singular use case that has impressed me is generative fill in photo editing tools. The fill tool is a lot faster and more accurate in Paint running on my ARM Surface laptop than Photoshop CS6 ever was running on my x86 desktop. I believe the generative fill feature of Paint is only available if you have an NPU in your CPU.
> Since it looks like we're going to be saddled with NPUs in our hardware from now on, is there anything useful (i.e. NOT AI) that they can be used for?

Apple's been using them for loads of stuff since they first added them to the iPhone back in 2017. Face ID was the first main use, but for instance Metal 4 uses the NPU to do DLSS-style upscaling instead of having the GPU do it - since you probably aren't using the NPU for much anyway when playing a game. But x86 won't be able to do that, since you have the GPU and NPU on opposite sides of a slow PCIe bus and then have to ship the output back across it, because the monitor is attached on the other side again.
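For a rough sense of the copy penalty being described - assuming an optimistic ~32 GB/s for PCIe 4.0 x16 and a 4K RGBA frame:

```python
# Back-of-the-envelope: what shipping one frame across PCIe costs.
frame_bytes = 3840 * 2160 * 4      # ~33 MB per 4K RGBA frame
pcie_bps = 32e9                    # ~32 GB/s, optimistic PCIe 4.0 x16 peak
round_trip_s = 2 * frame_bytes / pcie_bps   # frame out to NPU, result back
frame_budget_s = 1 / 60            # 60 fps budget

print(f"copy time: {round_trip_s*1e3:.2f} ms of a {frame_budget_s*1e3:.1f} ms frame budget")
# -> ~2 ms round trip: not fatal at 60 fps, but it's pure overhead that a
#    unified-memory design (just pass a pointer) never pays, and it gets
#    proportionally worse at higher frame rates.
```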
> Waste of die space.

How much die space is wasted?
> The NPU is weaker, but supposedly far more efficient for the imagined use case. The idea is that an integrated NPU is able to perform a relatively basic LLM/inference ...

Err, why not use the NPU for other ML stuff? Why are we discussing LLMs here? Running an LLM on an NPU is kinda dumb for all the reasons that you explained so well. So why even consider it.
> Err, why not use the NPU for other ML stuff? Why are we discussing LLMs here? ...

If they really wanted to ramp up AI use for local models on desktops and laptops, what they really need to do IMO is develop a new memory interface connecting both the CPU and discrete GPU that allows for a unified memory architecture. So your 5090 would come with zero memory and would just use whatever system RAM you have. That might be 16GB for a budget system or it might be 256GB for a high end system. THAT would allow running local LLM models big enough to actually be good, even on lower performance GPUs. The performance of your GPU would basically determine how fast you got your result, while the amount of memory would determine which model you could run, and those two things would be unrelated to each other. I don't really see a situation where an NPU is relevant for an AI model big enough to give good results. Can it run Copilot? Probably. Is Copilot worth using? Not from what I've noticed.
> I mean there could be also lots of useful AI. The issue is really the software stack, how much RAM these NPUs can access, and how fast that RAM is. I would really like to see better AI in games to make NPCs more believable, or just AI enemies ...be better.

People seem to hate this line of reasoning, but the only upgrade I could see to justify a PS6 over the 5 is, ironically, a beefy ass NPU.

For Christ's sake, AoE IV's AI is NOT that much better than AoE 1's... just play a map with a puddle and it will build the Spanish Armada in it. If MS was serious about pushing NPUs, I bet there would be a lot they could do in their own games to make good use of them.

Instead, they decided to take screenshots of my password and credit card details. I guess that... I will avoid NPUs then?
> Can't wait for the day marketing stops using the 'AI' moniker...

I'm still waiting for the i- prefix to go away. It's been over 20 years. I'm still waiting.
> How much die space is wasted?

If it's similar to Strix Point, it looks roughly comparable to about 3.5x full-fat Zen 5 cores, or 4x RDNA 3.5 WGPs (so 8x CUs), or 16MB of L3 cache.
> If they really wanted to ramp up AI use for local models on desktops and laptops, what they really need to do IMO is develop a new memory interface connecting both the CPU and discrete GPU that allows for a unified memory architecture. ...

That's literally what Apple did. And MLX will split up tasks between GPU and NPU depending on which is more suitable, or use both. NPUs are still more efficient at those kinds of things computationally, they just generally don't have the memory bandwidth that the GPU has. Fix that bandwidth problem, and the NPU will be pretty clearly better. Apple CPUs are dual channel for the base, 4x for Pro, 8x for Max, and 16x for Ultra, with all cores having equal access. That's why they have the option to use the NPU for upscaling, since the GPU just passes it a pointer, and the NPU just passes one back. No need to copy over a PCI bus.
> I mean there could be also lots of useful AI. The issue is really the software stack, how much RAM these NPUs can access, and how fast that RAM is. ...

That's not a problem with NPU RAM access or speed. It's a function of game developers either not wanting to spend the time to develop an AI model trained on the game, or more likely, them not yet having had time to do it. A AAA game has about a 6-year development cycle. The first public version of ChatGPT was less than 3½ years ago. You're not going to see the first AAA game designed with proper AI in mind for another 3 years.
> I knew a guy that was experimenting with (what was then called) machine learning and games a decade ago. It independently adapted to changes in both player tactics and game settings. E.g. adjust a rifle's stats, and it would use it more or less depending on effectiveness. The problem was that people don't actually want it. It behaved optimally, not realistically. The AI would learn to aggressively min-max its playstyle. Imagine a pro player that leaned hard on cheese, and was supernaturally able to time exploits. It was hard to tune it down, too. Make it too dumb to figure out the exploit and it would also fail to understand what the player was doing and how to respond. It was computationally cheaper, and easier to tune, with conventional heuristic methods. Models are larger these days, but I think the underlying issue remains the same: LLMs are plausible because people just see the end product. Interactive game ML/AI immerses people in how the AI makes the proverbial sausage, and the steps it takes to get to the end will break the suspension of disbelief.

This doesn't sound like a fundamental problem with machine-learned game AI. Rather, the statement that it was performing "optimally" makes it sound like your acquaintance was simply training it against the wrong goal. After all, maximizing performance is rarely the ultimate objective for the computer player in any game; rather, it's being just good enough to give the human player a challenge.
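As a toy illustration of the "wrong goal" point - all names and numbers here are hypothetical, not the experiment described above:

```python
# If the reward is raw winning, a learned bot min-maxes toward superhuman
# play. Shaping the reward around a target challenge level makes "be
# beatable" the optimum instead. Purely illustrative numbers.
def shaped_reward(bot_won: bool, player_win_rate: float,
                  target_player_win_rate: float = 0.45) -> float:
    # A small term still rewards basic competence...
    base = 1.0 if bot_won else -1.0
    # ...but the dominant term penalizes drifting from the difficulty
    # target, so exploit-heavy superhuman play scores poorly.
    challenge_penalty = abs(player_win_rate - target_player_win_rate)
    return 0.2 * base - 1.0 * challenge_penalty

print(shaped_reward(True, 0.05))   # bot stomps players: negative reward
print(shaped_reward(False, 0.45))  # bot loses but hits the target: near zero
```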
> For the folks questioning why you'd want an NPU machine over a dedicated video card, it's all about that big pool of RAM to run much larger AI models locally. People aren't buying these things to game on, nor are they buying them for speed. Nor are they buying them for Copilot+, no matter what Microsoft pays manufacturers to put on the outside of the box. Most of the people on this site are allergic to anything AI so this obviously doesn't appeal to them, but these NPU devices have been fantastic for running home compute without having to pay the Nvidia tax.

It's a big pool of RAM, but on any consumer PC it's a slow pool of RAM. You're around 100GB/s, vs 1.5TB/s on the GPU - or higher. PCs have a couple of serious tradeoffs they need to solve here - either move the GPU on package to eliminate the PCIe bottleneck, or increase memory bandwidth, potentially at the loss of expandability. That's why Apple did both, allowing for a unified memory space, much higher memory bandwidth, and no penalty for copying data from CPU to GPU. It came at the expense of not being able to upgrade your GPU or RAM, but you get hardware that can run much larger models locally than you can on x86 (where the limited RAM on the GPU incurs such a cost to copy across PCIe), because you have that big pool - and on something like a Mac Studio, you have 8x the memory bandwidth of these announced products.
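As a sanity check on why that bandwidth gap matters - assuming a 70B model quantized to 4 bits, and the usual first-order rule that generating one token streams the whole model through memory once:

```python
# Bandwidth-bound upper bound on token rate: tokens/sec ~ bandwidth / model size.
model_params = 70e9
bytes_per_param = 0.5                        # assumed 4-bit quantization
model_bytes = model_params * bytes_per_param # ~35 GB

for name, bw in [("dual-channel DDR5 (~100 GB/s)", 100e9),
                 ("high-end GDDR7 (~1.5 TB/s)", 1.5e12)]:
    print(f"{name}: ~{bw / model_bytes:.1f} tokens/sec upper bound")
# -> ~2.9 tok/s vs ~43 tok/s for the same model: the gap is purely bandwidth.
```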
> That's literally what Apple did. And MLX will split up tasks between GPU and NPU depending on which is more suitable, or use both. ...

Yeah, if they tweak things so that the NPU and GPU can work together, then I can see an NPU being a value add. But not if it's either/or. And yeah, Apple is kind of the proof of concept. If they can do it, then obviously it can be done. That would open up options for better customization of PCs, especially desktops where it's easier to mix and match parts. A gamer might want a 5090 and only bother with 32GB of system memory. Someone wanting to play around with AI on a shoestring budget might opt for a 5070 but spring for 128GB of unified memory, allowing them to make use of larger models even if performance was a bit lower.
> I've yet to meet someone who actually wants a Copilot+ system. Like, I guess in theory they must exist somewhere. But I haven't seen one in person.
There are two problems with Copilot, IMO.
Dammit I came to post this. Do I really have to be reading Ars at 5:30 am to not get ninja'd?
> If they really wanted to ramp up AI use for local models on desktops and laptops, what they really need to do IMO is develop a new memory interface connecting both the CPU and discrete GPU that allows for a unified memory architecture. ...

Once you start ramping up GPU compute power, you quickly reach the point where LLM inference is limited by memory bandwidth.

I don't really expect this to happen, though, because it would potentially cannibalize data center GPU sales if you could slap 500+GB of system memory into a desktop and run a data center sized LLM on a much cheaper 5090. Slower, sure, but $3-5k for a 5090 is a lot cheaper than $30-50k for a data center Blackwell GPU.
> If it's similar to Strix Point, it looks roughly comparable to about 3.5x full-fat Zen 5 cores, or 4x RDNA 3.5 WGPs (so 8x CUs), or 16MB of L3 cache.

If those annotations are correct, I'm surprised.

https://www.techpowerup.com/325035/amd-strix-point-silicon-pictured-and-annotated
> Dammit I came to post this. Do I really have to be reading Ars at 5:30 am to not get ninja'd?

Yes you do, unless you are prepared to move to a different time zone. Get ahead of the game by living in Nova Scotia!
> If those annotations are correct, I'm surprised.

Part of the cost of chiplets. They give you versatility, but cost you die area.

Looks like their NPU is quite a bit larger than Apple's, despite similar advertised performance (TOPS). But still, it only amounts to 7% of the die area. I don't think AMD is charging anybody extra for that 7%.
> That's not a problem with NPU RAM access or speed. It's a function of game developers either not wanting to spend the time to develop an AI model trained on the game, or more likely, them not yet having had time to do it. ...

Several indie games have already experimented with AI for NPCs.