FSR 4.1 finally comes to older Radeons

AMD finally made it official: FSR 4.1 upscaling will come to Radeon 7000 this summer, and to Radeon 6000 next year. Many sources, for example Videocardz.

An interesting side effect of this is that AMD can then turn TSMC's 7nm and 5nm wafers into products that are contemporarily somewhat competitive, but still somewhat affordable ... if it weren't for RAMageddon.
 

evan_s

Ars Tribunus Angusticlavius
7,503
Subscriptor
It feels like another case where they got pressured into doing the right thing, just like letting 5000 series chips come to older AM4 motherboards. IMO the 7000 series discrete GPUs not getting anything was annoying and frustrating, but the thing that really hurt was stuff like Strix Halo, which was their best iGPU solution. Beyond that, they don't seem to be in any hurry to actually release RDNA4 iGPUs. That was their current best product in the category, still stuck on pretty poor FSR3 upscaling or on XeSS from a competitor, and seemingly staying that way for the foreseeable future. That really isn't a great look for that product.
 

IceStorm

Ars Legatus Legionis
26,255
Moderator
AM4 was a "how do we compress our CPUIDs to fit onto 16 MB BIOS chips" problem. It's really not the same as "we have a working INT8 version of FSR, we gave it away for free wink wink, but we won't officially support it for one to two years".

The leaked INT8 version worked out of the box with RDNA3, but not with RDNA2. People had to do work to get INT8 working on RDNA2, but it now does - without AMD's help.

Hardware Unboxed plans to test the homegrown driver against AMD's official drivers once they finally arrive.

MLID believes AMD had no choice but to release FSR4 for older cards because their social media presence was being poisoned by comments. He lists several videos having nothing to do with Radeon, all of which have 60% hate comments about FSR4, not to mention every tweet getting a deluge of "Where is FSR4 for RDNA3?" comments. A couple weeks ago, AMD turned off comments on their YouTube videos. He thinks that's when they finally started considering restarting the "paused" FSR4 INT8 work for PC. It's not like they have to do a lot of work - Cerny's team did most of it for PSSR 2.0.

He believes the about-face was in the past day or two. He has a source that attended a meeting for AMD partners (as in AIBs) on May 14, and someone asked if RDNA3 would get FSR 4 support. The AMD rep said it probably never would. A few hours later Jack Huynh announces it.

The good news is that devs he has talked to since the announcement indicate they will be adding FSR4 support now that the lion's share of RDNA cards will support it. He also confirms he has a source that says that it will work with RDNA 3 and 3.5 APUs shortly after RDNA3 desktop (before the end of the year). He also speculates that FSR4 may help Xbox Series X meet/beat regular PS5 performance. The Xbox Series X has significantly (2.3x) better INT8 performance than the PS5. His sources say that PS5 SDK updates stopped at FSR 2.2. While 3.1 can be added, it takes a lot of work. Microsoft, on the other hand, has support for FSR 3.1 in their SDK already.

His evidence for why the older consoles would get FSR 4.1? The Steam Deck. He references a DF video where they tested it, and it offered the same performance as FSR 3.1 (2x over stock 720p), but with FSR4 quality.
 
The good news is that devs he has talked to since the announcement indicate they will be adding FSR4 support now that the lion's share of RDNA cards will support it.
Let's hope that the acceptance of FSR4 by game studios does actually see the promised increase. And that AMD actually learns the right lesson from that.

And yes, Valve is of course the one party that stands the most to gain from this move. After AMD themselves ... but I don't see AMD having this flash of insight.

I mean, look how AMD reaped maximal negative backlash from dragging their feet, and now they get a little more backlash from proving beyond doubt that they could have delivered this kind of improvement much earlier, as it is mostly a software thing. On top of that: the APU debacle, where RDNA4 is not coming to the iGPU market segment anytime soon, which up to now implied that FSR4 was not part of those products either.

IMHO this is more evidence that the little engineer who leaked FSR4 knows more about how AMD is being perceived than the actual decision makers at AMD.
 

mpat

Ars Tribunus Angusticlavius
6,645
Subscriptor
I always assumed that FSR4 was coming to RDNA 3.5 at some point, as they won’t be making iGPUs with RDNA 4. RDNA 3 seemed likely, but RDNA 2 was a bit of a surprise as that one lacks WMMA (presumably they’re doing it with DP4a). Obviously it makes sense for the consoles, but I suppose the rest of us RDNA 2 users are no longer something AMD cares about.
AMD finally made it official: FSR 4.1 upscaling will come to Radeon 7000 this summer, and to Radeon 6000 next year. Many sources, for example Videocardz.

An interesting side effect of this is that AMD can then turn TSMC's 7nm and 5nm wafers into products that are contemporarily somewhat competitive, but still somewhat affordable ... if it weren't for RAMageddon.
Will be interesting to see what Valve can do with the 7400 in their latest machine with FSR 4.
 
The rumor mill has weighed in and claims that neither the installed base nor the public riot has had much influence on AMD's decision. Not even Valve, Sony, and Microsoft. Instead, so the story goes, AMD allegedly realized that their upcoming next generation APUs would gain actual value from FSR4, especially against ARM (not just Apple's).

After hearing this take, I am thinking this could be even more cynical. AMD wants to have an A.I. upscaler so that they can say "A.I." in public more often. That would motivate a wide rollout even all the way back to Radeon 6000 series. Because AMD is so committed ... not to the little customers, but to A.I. :\
 
I think it's just that AMD doesn't have the manpower. There's one AMD employee for every four Intel/nVidia employees, despite having the same product portfolio as both combined.
With respect to Nvidia, this argument sounds good. With respect to Intel, AMD employees appear so much more efficient and competent ... of course, Intel's problems are largely caused by historical leadership, not by the rank and file folks.

For GPU software support, AMD is definitely at a notable disadvantage compared to Nvidia. Not so much when compared to Intel's GPU efforts. I am pretty sure that ATI, and later AMD's Radeon group, regarded themselves as a hardware company, and software was a secondary concern.

Not an afterthought, quite the contrary, but ATI/AMD never seemed to solve software problems by throwing more people at them. For example, ATI/Radeon has always put a lot of effort into generic optimizations: shader compilers, schedulers, synchronization primitives, caching strategy, that sort of thing. But ATI/AMD never had an army of people to do game-specific tweaks and tuning, so they could only do that for a few major releases each year, at the most.

I guess the decision makers at ATI were hardware people, or they viewed game specific tweaks as cheats or as benchmarketing of little lasting value.
 

IceStorm

Ars Legatus Legionis
26,255
Moderator
With respect to Intel, AMD employees appear so much more efficient and competent
Eh, I don't think that's true. Most of that is catering to very specific market segments and brushing issues with the rest under the rug. It's like Thread Director vs Xbox Game Bar - whenever AMD can foist responsibility for something onto another party, they do. Intel tends to keep things like that in-house, even if the finance bros in charge let the engineering division degrade to "C" team engineers.

For GPU software support, AMD is definitely at a notable disadvantage compared to Nvidia. Not so much when compared to Intel's GPU efforts.
Oh, I would count AMD 3rd out of the three. They're slower on the game specific tweaks than even Intel is. They are more stable than the disaster that nVidia's "vibe coded" drivers have become, but when there's a problem, they're slow to react (Subnautica 2 with the 9060 XT is a recent example).

I am pretty sure that ATI, and later AMD's Radeon group, regarded themselves as a hardware company, and software was a secondary concern.
I agree, which is why the RX 580 had ~30% more transistors than the 1060, but performed the same. AMD throws hardware at a problem that software could solve. It's great for mining as mining can make full use of all those transistors, but it's bad for gaming.

This was fine in the past (using hardware to compensate for a lack of driver engineering). It's not fine now, and it stopped being acceptable a few years ago once AMD was on firm financial footing. Still, it plays into their strategy of hiding behind #2 in many metrics/behaviors.
 

mpat

Ars Tribunus Angusticlavius
6,645
Subscriptor
I agree, which is why the RX 580 had ~30% more transistors than the 1060, but performed the same. AMD throws hardware at a problem that software could solve. It's great for mining as mining can make full use of all those transistors, but it's bad for gaming.
Nvidia made a fantastic efficiency improvement with Maxwell and then just shrank it to Pascal (and arguably not even that, since Maxwell was designed for TSMC 20nm and backported to 28nm when TSMC cancelled most of the 20nm node. 16nm was TSMC 20nm adapted to FinFET, so Nvidia got "back" to the process they should have had for Maxwell). Meanwhile Polaris was essentially a straight shrink of Tonga (Polaris 10 being Tonga + 1 CU in each SE), and Tonga had only minor updates since Tahiti (7970). Tahiti was ludicrously inefficient compared even to Pitcairn (7870), but for some reason AMD liked that floorplan and used it again and again. It worked for them because they got their 14nm chips cheap from GF, because GF was extremely late. AMD's fix to all this was the one-two of Vega and RDNA1, which fixed their core inefficiencies and moved them to the great TSMC 7nm instead of the terrible GF/Samsung 14nm.

And honestly this should be in the "old hardware" thread by now, because it is ancient history.
 

IceStorm

Ars Legatus Legionis
26,255
Moderator
9070 XT has 53.9B transistors.

5070 Ti has 45.6B transistors.

AMD is using 18% more transistors, on a newer process node/refinement, to barely match nVidia. They don't even have an efficiency benefit - they had to push those transistors so hard that the 9070 XT uses an extra ~60W over a 5070 Ti system.

It's not "ancient history", it's still happening. AMD should invest in drivers.
 

evan_s

Ars Tribunus Angusticlavius
7,503
Subscriptor
I don't think it's just drivers. The GPU architecture is also part of it. Some of it may be that their architecture just isn't as good as Nvidia but it might also be different choices made in designing those architectures. Nvidia also has the resources and volume to design and manufacture more chips up and down the stack. AMD only has 2 chips this generation and they share a lot of design. The 9070/9070 xt are very much like a stretched 9060xt die. Architecture choices or issues can make it a lot harder for drivers to keep things utilized well.

HUB also noted in their recent 9070 XT vs 5070 Ti retest that the 9070 XT typically did better in console ports, so there's probably also a bit of an optimization contribution. Nvidia is the dominant player for discrete desktop GPUs, so games targeting that primarily get more testing and optimization towards Nvidia's cards, and AMD or Intel suffer just because they are different, not necessarily because they aren't or can't be as efficient. For consoles AMD is the dominant player, providing the GPU on both the PlayStation and Xbox consoles, so you see a bit of the opposite situation. Nvidia also provides lots of help on the optimizing front.

The thing that makes the 9070XT vs 5070Ti comparison even worse is the 9070xt is the top bin for that chip while the 5070ti is the lower bin and the top bin is the 5080.
 

IceStorm

Ars Legatus Legionis
26,255
Moderator
For consoles AMD is the dominant player, providing the GPU on both the PlayStation and Xbox consoles, so you see a bit of the opposite situation.
No, you don't. The Switch and Switch 2 are nVidia. Switch sold ~154M units. It was the best selling console out of the PS4/Xbox One era. Switch 2, in under a year, has cleared ~19M units. nVidia is in the best selling console.

AMD cannot even get their acts together to support FSR4 DLL injection on Xbox PC Game Pass titles, which you would think would be a no-brainer, but then you remember that AMD can't code their way out of a paper bag.

Nvidia also provides lots of help on the optimizing front.
It's all software. Either AMD can step up and provide support, they can step up and try to fix it in their drivers, or they can continue being 2nd place.

The thing that makes the 9070XT vs 5070Ti comparison even worse
How does this matter? AMD used more transistors and more power to barely equal nVidia's offerings.

You can turn around and reverse the comparison with the next product down the stack - the 5070. The 5070 is the top tier GB205 consumer product, while the 9070 is a cut down Navi 48. 9070 is ~10% faster than the 5070, but the die is a whopping 35% larger.

It all comes back to a lack of manpower on the software side for AMD.
 
9070 XT has 53.9B transistors.

5070 Ti has 45.6B transistors.

AMD is using 18% more transistors, on a newer process node/refinement, to barely match nVidia. They don't even have an efficiency benefit - they had to push those transistors so hard that the 9070 XT uses an extra ~60W over a 5070 Ti system.

It's not "ancient history", it's still happening. AMD should invest in drivers.
AMD does need better drivers and software, but that won't always save you. Kepler cut back on transistors from Fermi, which was hardware rich. Today Fermi can (barely) support parts of DX12; Kepler cannot. Sorry for the ancient history lesson, but those who don't learn from history are doomed to repeat its failures.
 
The Switch and Switch 2 are nVidia. Switch sold ~154M units. It was the best selling console out of the PS4/Xbox One era. Switch 2, in under a year, has cleared ~19M units. nVidia is in the best selling console.
Hmm, Switch 1 is based on silicon that was too underpowered even for smartphones. Nvidia was sitting on a mountain of unsellable chips, so Nintendo was getting a very good deal. Switch 1 did incentivize lots of optimization, but not so much optimization to Nvidia specifically as to the Switch's very limited resources overall.

Switch 2 might change things a bit, once it has reached critical mass and developers can abandon Switch 1. And yes, that point looks to be reached sooner rather than later - with RAMmageddon possibly causing some delays.
 
Hmm, Switch 1 is based on silicon that was too underpowered even for smartphones. Nvidia was sitting on a mountain of unsellable chips, so Nintendo was getting a very good deal. Switch 1 did incentivize lots of optimization, but not so much optimization to Nvidia specifically as to the Switch's very limited resources overall.

Switch 2 might change things a bit, once it has reached critical mass and developers can abandon Switch 1. And yes, that point looks to be reached sooner rather than later - with RAMmageddon possibly causing some delays.
No, Switch 1, or rather the Tegra silicon it is based on, was too powerful for smartphones, not underpowered. It would cook in a closed-in smartphone style housing. Even phone chipsets can't sustain high performance for long due to the heat problem. I doubt even the best smartphones today could run the Switch 1 library as well as the Switch 1 itself does, let alone power smart cars, be a TV entertainment box and everything else the Tegra hardware has done besides being a Switch 1.
 

Scandinavian Film

Ars Tribunus Militum
1,541
Subscriptor++
9070 XT has 53.9B transistors.

5070 Ti has 45.6B transistors.

AMD is using 18% more transistors, on a newer process node/refinement, to barely match nVidia. They don't even have an efficiency benefit - they had to push those transistors so hard that the 9070 XT uses an extra ~60W over a 5070 Ti system.

It's not "ancient history", it's still happening. AMD should invest in drivers.
Transistor counts can be misleading, since there are different methods you can use to count them, and that's before you get into trade-offs between the different transistor types, which affect both performance per transistor and transistor density. You could just as well note that the 5070 Ti (378 mm²) is using 6% more die area to "barely match" the 9070 XT (357 mm²). The reality is that both companies are investing in the performance of their drivers, they just have different approaches to their hardware design.
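For anyone who wants to check the two competing percentages, here is a quick sketch using only the figures quoted in this thread. The density number at the end is an added illustration, not something either poster claimed, and it inherits all the counting caveats mentioned above. Both rows describe the full die, even though the 5070 Ti ships with part of it disabled.

```python
# Figures as quoted in the thread: transistor counts in billions, die area in mm².
chips = {
    "9070 XT": {"transistors_b": 53.9, "die_mm2": 357.0},
    "5070 Ti": {"transistors_b": 45.6, "die_mm2": 378.0},
}


def pct_more(a, b):
    """Return by how many percent a exceeds b."""
    return (a / b - 1.0) * 100.0


def density_mtr_per_mm2(chip):
    """Millions of transistors per mm² (added illustration, full-die numbers)."""
    return chip["transistors_b"] * 1000.0 / chip["die_mm2"]


amd = chips["9070 XT"]
nv = chips["5070 Ti"]

# The "18% more transistors" claim: AMD count relative to Nvidia count.
transistor_gap = pct_more(amd["transistors_b"], nv["transistors_b"])

# The "6% more die area" counterpoint: Nvidia area relative to AMD area.
area_gap = pct_more(nv["die_mm2"], amd["die_mm2"])

print(f"9070 XT transistors vs 5070 Ti: +{transistor_gap:.1f}%")
print(f"5070 Ti die area vs 9070 XT:    +{area_gap:.1f}%")
for name, chip in chips.items():
    print(f"{name}: {density_mtr_per_mm2(chip):.0f} MTr/mm²")
```

Both percentages check out against the quoted figures. They just measure different things, which is why the same pair of chips can be used to argue in either direction.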
 
Yes, in addition to delivering too little performance, Tegra also required too much power. It missed the target smartphone market on both fronts.
Nonsense, you can't have Tegra's performance without needing more power. There is a reason why mobile phones thermal throttle and can't sustain their peak performance for more than brief periods while the Switch 1 can sustain its performance over time... Turns out Tegra didn't need the mobile phone market in order to be successful anyway, it found multiple other markets for itself where it did very well. Switch 1 is the 2nd best selling game console of all time, beaten only by the Playstation 2.
 
I'll wait until my timeline re-connects to that other one where most smartphones are using the unbeatable Nvidia silicon. After that nexus event has happened, I'll consider continuing this thread of debate.
Hmm, Switch 1 is based on silicon that was too underpowered even for smartphones. Nvidia was sitting on a mountain of unsellable chips, so Nintendo was getting a very good deal. Switch 1 did incentivize lots of optimization, but not so much optimization to Nvidia specifically as to the Switch's very limited resources overall.

Switch 2 might change things a bit, once it has reached critical mass and developers can abandon Switch 1. And yes, that point looks to be reached sooner rather than later - with RAMmageddon possibly causing some delays.
You were the one making the claims about Tegra/switch 1, not me... I merely pointed out that nvidia didn't need the smartphone market for Tegra and that modern phones would struggle to match the switch playing the same games. I could get snarky on the mountain of unsellable chips claim and bring up the ATi r200/rv200, but I'll avoid going there.
 

MadMac_5

Ars Praefectus
3,996
Subscriptor
To try and wrestle this back on topic, I am curious how FSR 4.1 will work on RDNA 2 hardware like the Steam Deck, the consoles, or the 6800 XT/6900 cards. From what I recall the INT8 hardware in those chips is pretty minimal to do things like global illumination ray-tracing, and they'll likely work best at "Balanced" or "Performance" settings. I'm wondering if the 6800 XT or 6900 can just brute-force their way through to keep a performance boost in Quality mode, though?
 

mpat

Ars Tribunus Angusticlavius
6,645
Subscriptor
To try and wrestle this back on topic, I am curious how FSR 4.1 will work on RDNA 2 hardware like the Steam Deck, the consoles, or the 6800 XT/6900 cards. From what I recall the INT8 hardware in those chips is pretty minimal to do things like global illumination ray-tracing, and they'll likely work best at "Balanced" or "Performance" settings. I'm wondering if the 6800 XT or 6900 can just brute-force their way through to keep a performance boost in Quality mode, though?
Main issue is that RDNA 2 doesn’t have WMMA, and FSR 4 is all about that. What AMD will have to do is hack it up with DP4a (which more or less calculates a matrix multiplication one result value at a time). RDNA 2 also has a pretty good utilization, unlike RDNA 3 where the dual issue functionality is barely used so there is plenty of hardware idle, so I expect FSR 4 to eat quite a lot of performance on RDNA 2.
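To illustrate the "one result value at a time" point, here is a minimal Python sketch of how an INT8 matrix multiply can be assembled from a DP4a-style operation (a 4-element int8 dot product accumulated into a 32-bit integer). The function names are made up for illustration, the emulation ignores 32-bit overflow behavior, and this is not how AMD's shaders are actually written.

```python
def dp4a(a4, b4, acc):
    """Emulate a DP4a-style op: dot product of four int8 values plus an accumulator.

    Hypothetical helper for illustration; real hardware packs the four int8
    values into one 32-bit register and wraps or saturates on overflow.
    """
    assert len(a4) == 4 and len(b4) == 4
    return acc + sum(int(x) * int(y) for x, y in zip(a4, b4))


def matmul_dp4a(A, B):
    """Multiply int8 matrices A (M x K) and B (K x N); K must be a multiple of 4.

    Each output element needs its own chain of dp4a calls, which is the
    "one result value at a time" behavior. A WMMA-style instruction would
    instead produce a whole tile of outputs per instruction.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    assert K % 4 == 0
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0
            for k in range(0, K, 4):
                row_chunk = A[i][k:k + 4]                 # 4 values from row i of A
                col_chunk = [B[k + t][j] for t in range(4)]  # 4 values from column j of B
                acc = dp4a(row_chunk, col_chunk, acc)
            C[i][j] = acc
    return C


A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]
print(matmul_dp4a(A, B))
```

For an M x N output with inner dimension K this takes M * N * K / 4 separate dp4a operations, all competing with regular shader work for the same ALUs.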
 
I am curious how FSR 4.1 will work on RDNA 2 hardware like the Steam Deck, the consoles, or the 6800 XT/6900 cards.
Steam Deck will have the most trouble, because of limited RAM bandwidth. Good AI models contain quite a bit of information (in the form of coefficients or weights) and those need to be accessed repeatedly.

Radeon 6000 will, IMHO, be fine in the end; the hardware is sufficiently capable - but I don't envy the engineers who have to tune the shader code in order to realize as much of the potential as possible.
 

evan_s

Ars Tribunus Angusticlavius
7,503
Subscriptor
I think the 6000 series and RDNA 2 iGPUs will just get less performance benefit at a given quality level than RDNA 3 or 4 based stuff will. That's pretty much unavoidable. You gain performance because rendering is quicker at a lower resolution but you burn some of that savings in the time it takes to complete the upscaling process. With less Int 8 performance available for the RDNA 2 based architectures the time the upscaling will take will be longer so net benefit will be lower. I wouldn't be surprised to see little to no benefit in quality mode. Balanced or performance mode will probably still be worth using and at least having the option will be nice.

Look at the DLSS 4.5 on Nvidia 2000 and 3000 series cards as a similar type of situation. The older cards don't have the performance acceleration for the newer format and heavier model and the resulting benefits aren't great.
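The trade-off described above can be put into a toy model. All the millisecond figures below are hypothetical, and the model assumes render time scales linearly with pixel count, which real games only approximate. The per-axis scale factors (1.5x for Quality, 2.0x for Performance) are the usual FSR presets.

```python
def frame_time_ms(native_ms, axis_scale, upscale_ms):
    """Estimated frame time with upscaling enabled.

    native_ms:  frame time when rendering at output resolution
    axis_scale: per-axis upscale factor (1.5 = Quality, 2.0 = Performance)
    upscale_ms: fixed cost of the upscaling pass on this GPU (hypothetical)
    Assumes render cost is proportional to pixel count (a simplification).
    """
    pixel_fraction = 1.0 / (axis_scale ** 2)
    return native_ms * pixel_fraction + upscale_ms


def speedup(native_ms, axis_scale, upscale_ms):
    """Ratio of native frame time to upscaled frame time (>1 means faster)."""
    return native_ms / frame_time_ms(native_ms, axis_scale, upscale_ms)


NATIVE_MS = 20.0  # hypothetical 50 fps at native resolution
for gpu, cost_ms in [("fast INT8 path", 2.0), ("slow INT8 path", 10.0)]:
    for mode, scale in [("Quality", 1.5), ("Performance", 2.0)]:
        s = speedup(NATIVE_MS, scale, cost_ms)
        print(f"{gpu:15s} {mode:12s} speedup x{s:.2f}")
```

With a cheap upscaling pass, Quality mode keeps most of the gain. With an expensive pass, Quality mode barely beats native while Performance mode still comes out ahead, which matches the expectation for RDNA 2.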
 

IceStorm

Ars Legatus Legionis
26,255
Moderator
The reality is that both companies are investing in the performance of their drivers, they just have different approaches to their hardware design.
Neither is investing in their drivers. nVidia farmed out driver code to LLMs in 2025, which is why they've all been garbage since January, 2025. AMD just doesn't have enough people working on drivers in general. AMD is perfectly happy just trailing the efforts that nVidia puts forth. nVidia has checked out, so AMD's just treading water.

Main issue is that RDNA 2 doesn’t have WMMA, and FSR 4 is all about that. What AMD will have to do is hack it up with DP4a
AMD doesn't have to do a goddam thing. FSR4 INT8 runs fine on my RX 6700 XT, today, using OptiScaler, in Pragmata, ~12hrs after Pragmata launched. It both looks significantly better than the built-in FSR3 option and runs faster than native.
 
Wasn't the Tegra X1 aimed more at tablets rather than smartphones? It was pretty powerful for its time, but the TDP was such that you'd have to seriously underclock it to fit in a phone. The competition at the time would have been the Apple A8 and A9 later in the year. Apple would have dominated in single core but the X1 had 4 performance cores and 4 efficiency cores compared to Apple's dual core designs. In terms of GPU, I don't have a direct comparison, but Maxwell was an absolutely stonking architecture for the time so I'd be surprised if Apple truly beat it, especially since the A8 wasn't all that great.
 

mpat

Ars Tribunus Angusticlavius
6,645
Subscriptor
IceStorm, please check your quotes. I did not write the first bit you quoted.

(There is a bug with multiquotes in the forum software, and I think you have hit it. Just keep an eye on it, please)
Wasn't the Tegra X1 aimed more at tablets rather than smartphones? It was pretty powerful for its time, but the TDP was such that you'd have to seriously underclock it to fit in a phone. The competition at the time would have been the Apple A8 and A9 later in the year. Apple would have dominated in single core but the X1 had 4 performance cores and 4 efficiency cores compared to Apple's dual core designs. In terms of GPU, I don't have a direct comparison, but Maxwell was an absolutely stonking architecture for the time so I'd be surprised if Apple truly beat it, especially since the A8 wasn't all that great.
Nvidia made a lot of chips in that era that turned out to draw too much power, which got them retroactively designated for things other than phones. It was pretty obvious that they were just putting makeup on a pig.

Apple’s GPUs were indeed not particularly powerful back then, but it is hard to compare when the approach to how to design a GPU is so different.
 
Nvidia made a lot of chips in that era that turned out to draw too much power, which got them retroactively designated for things other than phones. It was pretty obvious that they were just putting makeup on a pig.
So they drew too much power for a phone, but were there better performing SOCs, graphics wise, for the same power envelope? When they were designing chips like the X1, there was still a lot of anticipation that tablets would be a bigger market than they turned out to be, so a chip with a powerful GPU would seem to be playing to NVIDIA's strengths.

To give a rough comparison, the X1 had double the raw theoretical TOPS of the Apple A9. The A9X, which went in the iPad Pro, was comparable in theoretical numbers. However, as you pointed out, the design philosophies were different, so it's hard to say how that translated in actual game performance, for example.