OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

pug fugly · Feb 12, 2026

Affordable GPUs soon? Maybe?
please?

CatNamedHugs · Feb 12, 2026

Aka: the cancer has spread to a new species of hardware.

NoReallyJustSaying · Feb 12, 2026

I'd like to know what actual serious work can be done with a 128K token window. I don't have anything in our code base that would qualify -- outside of trivial services that a monkey could maintain anyway.

Hell, that kind of window barely allows for our main database schema for crying out loud.

mschira · Feb 12, 2026

$200/month
Wow.
Well somehow you need to create the concept of large income amounts if you want to continue justify massive money burn AI companies are doing.

koolraap · Feb 12, 2026

The Cerebras’ Wafer Scale Engine 3 is huge!

viscious+ars2 · Feb 12, 2026

is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

**RTS** · Feb 12, 2026

viscious+ars2 said:
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

They have redundancy and employ yield improvement techniques so that they can ship a viable product even though portions of each chip are defective.

MailDeadDrop · Feb 12, 2026

viscious+ars2 said:
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

Maybe. Wafer-scale "chips" can have enormous amounts of programmable redundancy. This happens even with individual dies in more normal situations.

coma24 · Feb 12, 2026

So... still not useful for much of anything but hobby coding and in-house temporary tools then? Got it, call me when it can actually do something useful

Rector · Feb 12, 2026

ChatGPT: You have a bug in your code. You should be doing this:

uint32_t new = (old >> 22) << 2;

But you did this:

uint32_t new = (old >> 20) & 0xFFC;

Me: Are you sure that's a bug?

ChatGPT: Yes it's a bug. By accident, your code produces the correct result, but you can't rely on that all the time.

Me: It wasn't an accident.

ChatGPT: Ok, sorry. Then it's not a bug.

iollmann · Feb 12, 2026

CatNamedHugs said:
Aka: the cancer has spread to a new species of hardware.

This was inevitable. ASICs were always going to beat GPUs just as they have always beat GPUs for other problem areas. It just takes a decade or so for the algorithms to settle down to the point that you can use a ASIC rather than a more programmable thing like a GPU.

We note however that we’ve moved
from cnns to llms and it isn’t entirely clear what the next AI neural network model will be, so ASICs will always be trailing until that settles down. Presumably by the time this has been reduced down to robots, the asics will be in play. They can’t afford to be burning multiple kW on compute, because they are probably on battery.

Varste · Feb 12, 2026

"With this baby we can burn $1 billion every day!"

pauleyc · Feb 12, 2026

I'm sure this diversification will only incentivize Nvidia to invest further billions into OpenAI (on top of those $20bn...providing that happens).

Varste · Feb 12, 2026

mschira said:
$200/month
Wow.
Well somehow you need to create the concept of large income amounts if you want to continue justify massive money burn AI companies are doing.

Per this Techcrunch article, that $200/mo plan loses money because people are using it too much.

chaos215bar2 · Feb 12, 2026

viscious+ars2 said:
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

The trick would be to design the chip such that every part is redundant and can be disabled if there's a defect. Tricky, but certainly not impossible.

It's still going to be a massively expensive chip, but it should be extremely rare that you actually have to discard an entire wafer once the early production issues are worked out.

The real key for adoption, aside from price, will be just how flexible the programming model is and how difficult it is to put into production.

iollmann · Feb 12, 2026

odikweos said:
Bigger die size means more rejects from the wafer, or a high acceptance of defects. That's.. a huge die size. They can't be making many fully functional chips of that size. There just aren't enough lines at TSMC.

You could design a single control region and multiple compute areas that can be individually fused off. As long as the control region is good, then you can bin the parts according to available compute clusters.

NoReallyJustSaying · Feb 12, 2026

Rector said:
ChatGPT: You have a bug in your code. You should be doing this:

Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

No shit, this happened -- we actually tried this after Ars reported on this a while back.

In essence, you cannot let the LLM have write access to your unit tests. So you either write them yourself from scratch, or have them be generated and then have to monitor whether they cover whatever the LLM spits out. Which means you have to fully review every line from scratch anyway, and...

Wait, wasn't this bullshit supposed to make me MORE productive?

By Grabthar's Hammer.... .... ... ....

What a savings.

Bongle · Feb 12, 2026

pug fugly said:
Affordable GPUs soon? Maybe?
please?

They're all still using the extremely-limited space on TSMC's best node.

The challenge for gaming GPU's pricing is that top-end units are going to need the smallest features to maximize performance, and so they're always going to be competing against higher-demand crap like AI chips (2026) or bitcoin miners (2021).

If a vendor can make $300k profit by buying a wafer of GPU chips but $1M profit by buying a wafer of AI chips, then GPU chip supply will remain limited.

NoReallyJustSaying · Feb 12, 2026

iollmann said:
Presumably by the time this has been reduced down to robots

I'm sorry, I have to type this through tears of guffawing laughter.

When do you figure this will be?

GFKBill · Feb 12, 2026

NoReallyJustSaying said:
I'm sorry, I have to type this through tears of guffawing laughter.

When do you figure this will be?

It's already here

https://80.lv/articles/furby-hooked-to-chatgpt-reveals-its-plan-to-take-over-the-world

Random_stranger · Feb 12, 2026

NoReallyJustSaying said:
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

Ok, so our in-house "trained n our codebase" Augment DOES have SOME value. It can search our codebase / make simple changes - and the "smarter auto-correct" saves some time. Copying functionality and having it auto-insert the new variables into the new debug statement (a bit like having a spreadsheet auto-adjust relatively-indexed cells when copying a formula) IS helpful.

But then I tried to as it: In the following method, I need to you create and instance of "X" and forward it into this function call, and then down to the next level function call.

The first iteration was very off. I gave it some additional comments, and it was "I see now that I was wrong. ok, let me look up how X works, I see now, let me insert X... I see now that I was wrong. ok, let me look up how X works.. I see now that I was wrong..." after 4 or 5 iterations, I stopped it and wrote it by hand.

Mardaneus · Feb 12, 2026

Subheading:

OpenAI’s new GPT‑5.3‑Codex‑Spark is 15 times faster at coding than its predecessor.

What is the cost for a similar program on this iteration compared to the predecessor?
It doesn't really matter if it is faster if the cost to generate an answer don't drop by an order of magnitude.

Note: Back of napkin math, you've been warned

That is based on Oracle having a -220% margin on the 2025 generation of NVIDIA LLM GPU . Which translates to about a price increase for use to about 4.5 times, current cost * 3.2 * 1.4 (28.5% margin which is fairly low, do note margin is based on sales price not own costs), seeing that GPU time is a commodity this would roughly be the price anyone would have to pay if renting time.
That would balloon costs for OpenAI to over $25 billion (based on that at least 75% of their costs is GPU time). If they then want the same margin as Oracle they reach a cost that is ~10x their current revenue.

NoReallyJustSaying · Feb 12, 2026

Random_stranger said:
The first iteration was very off. I gave it some additional comments, and it was "I see now that I was wrong. ok, let me look up how X works, I see now, let me insert X... I see now that I was wrong. ok, let me look up how X works.. I see now that I was wrong..." after 4 or 5 iterations, I stopped it and wrote it by hand.

That's because you asked it to do something it hadn't done before. "Ooh, it can crank out a crypto exchange platform in Rust in 20 minutes!" "Oh, you mean like these 20 open-source ones it is cribbing off of?"

LLMs are very good at cranking out trivial shit. The real question is how bad your dev team is when that is a productivity boost.

NoReallyJustSaying · Feb 12, 2026

Mardaneus said:
If they then want the same margin as Oracle they reach a cost that is ~10x their current revenue.

-- IF they stop right now and don't train any new models. Yeah, it's that hilarious.

NoReallyJustSaying · Feb 12, 2026

Random_stranger said:
It can search our codebase / make simple changes - and the "smarter auto-correct" saves some time. Copying functionality and having it auto-insert the new variables into the new debug statement (a bit like having a spreadsheet auto-adjust relatively-indexed cells when copying a formula) IS helpful.

Side note: tools like ReSharper have been happily doing this, and more, for about two decades now. No LLM required, this is called proper static code analysis. For old folks, it's semantic linting.

Resistance · Feb 12, 2026

koolraap said:
The Cerebras’ Wafer Scale Engine 3 is huge!

View attachment 128196

Wafer scale tech has always excited me, one day they will be stacking them multiple cm high with internal cooling channels.

Resistance · Feb 12, 2026

NoReallyJustSaying said:
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

No shit, this happened -- we actually tried this after Ars reported on this a while back.

In essence, you cannot let the LLM have write access to your unit tests. So you either write them yourself from scratch, or have them be generated and then have to monitor whether they cover whatever the LLM spits out. Which means you have to fully review every line from scratch anyway, and...

Wait, wasn't this bullshit supposed to make me MORE productive?

By Grabthar's Hammer.... .... ... ....

What a savings.

Once a LLM produces a bad output best practice is to scrap that context window or revert it to an earlier state. Model and tool providers should really work to encourage users to do this, or to partially automate the process.

mrkite77 · Feb 12, 2026

NoReallyJustSaying said:
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.

Reminds me of this:

fuzzyfuzzyfungus · Feb 12, 2026

viscious+ars2 said:
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

It's not the sort of thing that gets a public price sheet; but estimates are 2-3 million per unit for the the CS-3, which is a single one of the wafer scale CPUs along with cooling and I/O.

There's some amount of redundancy so that they don't have to toss entire wafers for individual defects; but not at all cheap. Probably a fair bit of NFE in the price tag, given how niche it is; but an entire 5nm wafer with suitably low defects per system is just pricy.

SittingDuc · Feb 12, 2026

Dinner-plate sized chips, using an entire wafer. Okay, cool. Can it run Crysis?

Given the prices Cerberus is throwing around, would that make it the most expensive Crysis-playing-thing ever?

NoReallyJustSaying · Feb 12, 2026

fuzzyfuzzyfungus said:
estimates are 2-3 million per unit for the the CS-3

fuzzyfuzzyfungus said:
not at all cheap

I award you one "understatement of the week" award. Seriously, 15 times faster! You mean 15 times faster than something like an H200, which costs about 500 times less?

By Grabthar's Hammer....

NoReallyJustSaying · Feb 12, 2026

Resistance said:
Once a LLM produces a bad output best practice is to scrap that context window or revert it to an earlier state. Model and tool providers should really work to encourage users to do this, or to partially automate the process.

1. LLM shits out a metric shit-ton of code for the low low price of a nice car
2. Senior devs have to spend significant time evaluating it for the low low price of several nice cars;

It turns out the code is utter shite from the get-go

3. Automated processes trash the entire thing and have it start from scratch; GOTO 1
4. ???
5. Profit!

IronHam · Feb 12, 2026

NoReallyJustSaying said:
I'd like to know what actual serious work can be done with a 128K token window. I don't have anything in our code base that would qualify -- outside of trivial services that a monkey could maintain anyway.

Hell, that kind of window barely allows for our main database schema for crying out loud.

I don’t know why you’re being downvoted. You’re right.

John.Flick · Feb 12, 2026

NoReallyJustSaying said:
I'd like to know what actual serious work can be done with a 128K token window. I don't have anything in our code base that would qualify -- outside of trivial services that a monkey could maintain anyway.

Hell, that kind of window barely allows for our main database schema for crying out loud.

Personally I use VSCode and Gemini 3 pro for my "i'd usually delegate this to a junior developer" prompts. I give it the right files for context and some plain english psuedo code to do a small task at a time while I'm thinking about the next step. I think most serious professionals use the models in this way. The "magic bullet, I one shotted another TODO list or Minesweeper clone" stuff is non-sense, ignore it, but as for helping accelerate your work the tooling they've implemented in VSCode is pretty mature at this point and I get good results from Gemini. Think of it as your pair programmer who is eager to do the grunt work while you figure out the big picture. We work primarily in PHP/Laravel on the backend and React-Typescript-Tailwinds on the front end and get few hallucinations and good results without needing to be super verbose in the prompting at this point. Our backend monolith is quite big but I always give it a head start by including the relevant files in my prompt.

One of the things it was very useful for and saved us a ton of time was rapidly building out a new custom front end component library using Tailwinds 4 and popular, mature React libraries like ReactSelect. We paid for the Tailwinds license (they deserve it and none of this would work as well as it does without Tailwind). You can do things like "Using the installed ReactSelect library, create a controlled form component called DropdownSelect that works in a similar way the current Dropdown component. It should have similar props and use Headless were relevant" and let it rip. You're going to be impressed how well it correlates what you already have into something similar but new. This is essentially ALL these things do.

Another recent example: A new API endpoint where users answer question forms in steps (6 steps 20 questions each). Each step submits their 20 answers. I simple prompted it to write the method mergeAnswers(Survey $section), giving it a few other things for context like "use question_id for the add or update" and it wrote it. If you're using types/interfaces this will all work a lot better because those give it more context. I then quickly iterated on it like "ensure the answers are sorted by question_id before saving" and it'll quickly write that without you needing to reference the docs yourself.

Fancy auto-complete is probably the worst tool at this point. I think about disabling it all the time, or at least getting good at enabling it with a hotkey only when I'm ready for it to attempt it. For instance you might type out a well named atomic function with the arguments then turn it on and wait to see if it can guess what you intend to do. It's sitting there guessing wrong constantly otherwise and it's super annoying.

Asbestos Muffins · Feb 12, 2026

kind of impressed cerebras actually worked, AI researchers tried to do this back decades ago but couldn't make the chips. the yield was atrocious

Asbestos Muffins · Feb 12, 2026

viscious+ars2 said:
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

ya the last company that tried this couldn't make it work but that was like the pentium era

fractl · Feb 12, 2026

Asbestos Muffins said:
kind of impressed cerebras actually worked, AI researchers tried to do this back decades ago but couldn't make the chips. the yield was atrocious

Even Cerebras' first-gen chips had the ability to map around broken tiles. They aren't getting perfect yield, but all they need is enough tiles to be a competitive product. It's not like Nvidia is getting a lot of perfect chips with their reticle-limited GPUs, either, so being able to make viable products from less-than-perfect parts is crucial.

TetsFR · Feb 12, 2026

You may have mentioned the aquisition (ok not an aquisition legally but in reality it is) of Groq by Nvidia last year to dev fast inference as well. Let's see what they deliver on the back of that, bearing in mind that those chips, do not support training capabability, neither light training tasks like Lora fine tuning I believe. So depending on what capabilities next gen SOTA AI models leverage, we will see what design makes more sense. But Nvidia has hedged itself with its blood sucking deal on Groq talents and tech.

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

Ars Tribunus Militum

Account Banned

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Tribunus Militum

Ars Centurion

Ars Praetorian

Ars Scholae Palatinae

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Praetorian

Ars Praetorian

Ars Praetorian

Ars Praefectus

Ars Scholae Palatinae

Wise, Aged Ars Veteran

Ars Praefectus

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Praefectus

Ars Tribunus Militum

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Legatus Legionis

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Smack-Fu Master, in training

Ars Centurion

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Praefectus

Ars Scholae Palatinae