OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

koolraap

Ars Tribunus Militum
2,236
Cerebras' Wafer Scale Engine 3 is huge!

 
Upvote
197 (198 / -1)
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.
They have redundancy and employ yield improvement techniques so that they can ship a viable product even though portions of each chip are defective.
 
Upvote
167 (167 / 0)

MailDeadDrop

Ars Scholae Palatinae
1,139
Subscriptor
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.
Maybe. Wafer-scale "chips" can have enormous amounts of programmable redundancy. This happens even with individual dies in more normal situations.
 
Upvote
75 (76 / -1)

Rector

Ars Tribunus Militum
1,570
Subscriptor++
ChatGPT: You have a bug in your code. You should be doing this:

uint32_t new = (old >> 22) << 2;

But you did this:

uint32_t new = (old >> 20) & 0xFFC;

Me: Are you sure that's a bug?

ChatGPT: Yes it's a bug. By accident, your code produces the correct result, but you can't rely on that all the time.

Me: It wasn't an accident.

ChatGPT: Ok, sorry. Then it's not a bug.
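
For the record, the two forms really are equivalent for an unsigned 32-bit old: both pull bits 22-31 down to bits 2-11 with the bottom two bits cleared. A minimal C check, exhaustive over all inputs (variable names are mine, since "new" belongs to the original snippet):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t old = 0;
    do {
        uint32_t a = (old >> 22) << 2;     /* shift bits 22..31 into bits 2..11 */
        uint32_t b = (old >> 20) & 0xFFC;  /* same bits, isolated with a mask   */
        assert(a == b);
    } while (++old != 0);                  /* wraps past 0xFFFFFFFF back to 0   */
    puts("equivalent for every 32-bit input");
    return 0;
}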
 
Upvote
153 (157 / -4)

iollmann

Ars Scholae Palatinae
1,301
Aka: the cancer has spread to a new species of hardware.
This was inevitable. ASICs were always going to beat GPUs, just as they have always beaten GPUs in other problem areas. It just takes a decade or so for the algorithms to settle down to the point that you can use an ASIC rather than a more programmable thing like a GPU.

We note, however, that we've moved from CNNs to LLMs, and it isn't entirely clear what the next AI neural network model will be, so ASICs will always be trailing until that settles down. Presumably by the time this has been reduced down to robots, the ASICs will be in play. They can't afford to be burning multiple kW on compute, because they are probably on battery.
 
Upvote
82 (89 / -7)
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.
The trick would be to design the chip such that every part is redundant and can be disabled if there's a defect. Tricky, but certainly not impossible.

It's still going to be a massively expensive chip, but it should be extremely rare that you actually have to discard an entire wafer once the early production issues are worked out.

The real key for adoption, aside from price, will be just how flexible the programming model is and how difficult it is to put into production.
 
Upvote
48 (48 / 0)

iollmann

Ars Scholae Palatinae
1,301
Bigger die size means more rejects from the wafer, or a high acceptance of defects. That's.. a huge die size. They can't be making many fully functional chips of that size. There just aren't enough lines at TSMC.
You could design a single control region and multiple compute areas that can be individually fused off. As long as the control region is good, then you can bin the parts according to available compute clusters.
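
A toy sketch of that binning step, with made-up cluster counts and cutoffs (this is just the idea, not Cerebras' or anyone's actual scheme):

#include <stdbool.h>
#include <stdio.h>

#define NUM_CLUSTERS 64              /* hypothetical compute clusters per die */

/* Bin a die by how many compute clusters survived wafer test. The single
   control region has no spare, so a defect there scraps the die. */
int bin_die(bool control_ok, const bool cluster_ok[NUM_CLUSTERS])
{
    if (!control_ok)
        return -1;                   /* unusable */

    int good = 0;
    for (int i = 0; i < NUM_CLUSTERS; i++)
        good += cluster_ok[i];       /* defective clusters are simply fused off */

    if (good >= 60) return 0;        /* top bin: full-fat SKU */
    if (good >= 48) return 1;        /* mid bin: reduced-cluster SKU */
    if (good >= 32) return 2;        /* salvage bin */
    return -1;                       /* too little working silicon to ship */
}

int main(void)
{
    bool clusters[NUM_CLUSTERS];
    for (int i = 0; i < NUM_CLUSTERS; i++)
        clusters[i] = (i % 16 != 0); /* pretend every 16th cluster failed: 60 good */
    printf("bin: %d\n", bin_die(true, clusters));  /* prints "bin: 0" */
    return 0;
}

Real parts would presumably bin on memory and fabric links too, not just cluster count.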
 
Upvote
27 (27 / 0)
ChatGPT: You have a bug in your code. You should be doing this:
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

No shit, this happened -- we actually tried this after Ars reported on this a while back.

In essence, you cannot let the LLM have write access to your unit tests. So you either write them yourself from scratch, or have them be generated and then have to monitor whether they cover whatever the LLM spits out. Which means you have to fully review every line from scratch anyway, and...

Wait, wasn't this bullshit supposed to make me MORE productive?

By Grabthar's Hammer.... .... ... ....

What a savings.
 
Upvote
197 (202 / -5)

Bongle

Ars Praefectus
4,486
Subscriptor++
Affordable GPUs soon? Maybe?
please?
They're all still using the extremely limited space on TSMC's best node.

The challenge for gaming GPU pricing is that top-end units are going to need the smallest features to maximize performance, so they're always going to be competing against higher-demand crap like AI chips (2026) or bitcoin miners (2021).

If a vendor can make $300k profit by buying a wafer of GPU chips but $1M profit by buying a wafer of AI chips, then GPU chip supply will remain limited.
 
Upvote
75 (75 / 0)

Random_stranger

Ars Praefectus
5,383
Subscriptor
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

Ok, so our in-house "trained on our codebase" Augment DOES have SOME value. It can search our codebase / make simple changes - and the "smarter auto-correct" saves some time. Copying functionality and having it auto-insert the new variables into the new debug statement (a bit like having a spreadsheet auto-adjust relatively-indexed cells when copying a formula) IS helpful.

But then I tried to ask it: in the following method, I need you to create an instance of "X" and forward it into this function call, and then down to the next-level function call.

The first iteration was very off. I gave it some additional comments, and it was "I see now that I was wrong. ok, let me look up how X works, I see now, let me insert X... I see now that I was wrong. ok, let me look up how X works.. I see now that I was wrong..." after 4 or 5 iterations, I stopped it and wrote it by hand.
 
Upvote
49 (51 / -2)

Mardaneus

Ars Tribunus Militum
2,054
Subheading:
OpenAI’s new GPT‑5.3‑Codex‑Spark is 15 times faster at coding than its predecessor.
What is the cost for a similar program on this iteration compared to the predecessor?
It doesn't really matter if it is faster if the cost to generate an answer doesn't drop by an order of magnitude.

Note: Back of napkin math, you've been warned ;)
That is based on Oracle having a -220% margin on the 2025 generation of NVIDIA LLM GPUs, which translates to a price increase for users of about 4.5 times: current cost * 3.2 * 1.4 (a 28.5% margin, which is fairly low; do note margin is based on sales price, not own costs). Seeing that GPU time is a commodity, this is roughly the price anyone would have to pay when renting time.
That would balloon costs for OpenAI to over $25 billion (based on at least 75% of their costs being GPU time). If they then want the same margin as Oracle, they reach a cost that is ~10x their current revenue.
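
Spelled out, the multiplier above follows from those two margin figures alone (same back-of-napkin numbers, nothing else assumed):

#include <stdio.h>

int main(void)
{
    double oracle_margin = -2.20;                /* -220%: the claim that Oracle rents below cost */
    double cost_multiple = 1.0 - oracle_margin;  /* so true cost ~= 3.2x today's rental price     */
    double target_margin = 0.285;                /* a fairly low 28.5% margin on the sales price  */
    double price_multiple = cost_multiple / (1.0 - target_margin);

    printf("break-even cost:   %.2fx current price\n", cost_multiple);   /* 3.20x          */
    printf("price with margin: %.2fx current price\n", price_multiple);  /* ~4.48x, i.e. ~4.5x */
    return 0;
}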
 
Upvote
15 (16 / -1)
The first iteration was very off. I gave it some additional comments, and it was "I see now that I was wrong. ok, let me look up how X works, I see now, let me insert X... I see now that I was wrong. ok, let me look up how X works.. I see now that I was wrong..." after 4 or 5 iterations, I stopped it and wrote it by hand.
That's because you asked it to do something it hadn't done before. "Ooh, it can crank out a crypto exchange platform in Rust in 20 minutes!" "Oh, you mean like these 20 open-source ones it is cribbing off of?"

LLMs are very good at cranking out trivial shit. The real question is how bad your dev team is when that is a productivity boost.
 
Upvote
91 (94 / -3)
It can search our codebase / make simple changes - and the "smarter auto-correct" saves some time. Copying functionality and having it auto-insert the new variables into the new debug statement (a bit like having a spreadsheet auto-adjust relatively-indexed cells when copying a formula) IS helpful.
Side note: tools like ReSharper have been happily doing this, and more, for about two decades now. No LLM required; this is called proper static code analysis. For old folks, it's semantic linting.
 
Upvote
94 (94 / 0)

Resistance

Wise, Aged Ars Veteran
548
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
...

No shit, this happened -- we actually tried this after Ars reported on this a while back.

In essence, you cannot let the LLM have write access to your unit tests. So you either write them yourself from scratch, or have them be generated and then have to monitor whether they cover whatever the LLM spits out. Which means you have to fully review every line from scratch anyway, and...

Wait, wasn't this bullshit supposed to make me MORE productive?

By Grabthar's Hammer.... .... ... ....

What a savings.
Once an LLM produces a bad output, best practice is to scrap that context window or revert it to an earlier state. Model and tool providers should really work to encourage users to do this, or to partially automate the process.
 
Upvote
58 (59 / -1)

mrkite77

Ars Tribunus Militum
1,782
Oh, I can do you one better:

Me: you're failing the unit test.
Claude: let me fix that.
Me: you just fudged the unit test.
Claude: let me fix that.
Me: you just fudged the unit test in a different way. Revert the unit test and fix the code.
Claude: let me fix that.
Me: you just fudged the unit test in yet another different way.
Reminds me of this:
 
Upvote
13 (14 / -1)
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.

It's not the sort of thing that gets a public price sheet, but estimates are $2-3 million per unit for the CS-3, which is a single one of the wafer-scale CPUs along with cooling and I/O.

There's some amount of redundancy so that they don't have to toss entire wafers for individual defects, but it's not at all cheap. Probably a fair bit of NRE in the price tag, given how niche it is; but an entire 5nm wafer with suitably low defects per system is just pricey.
 
Upvote
32 (32 / 0)
Once a LLM produces a bad output best practice is to scrap that context window or revert it to an earlier state. Model and tool providers should really work to encourage users to do this, or to partially automate the process.
1. LLM shits out a metric shit-ton of code for the low low price of a nice car
2. Senior devs have to spend significant time evaluating it for the low low price of several nice cars;

It turns out the code is utter shite from the get-go

3. Automated processes trash the entire thing and have it start from scratch; GOTO 1
4. ???
5. Profit!
 
Upvote
26 (37 / -11)

IronHam

Smack-Fu Master, in training
39
I'd like to know what actual serious work can be done with a 128K token window. I don't have anything in our code base that would qualify -- outside of trivial services that a monkey could maintain anyway.

Hell, that kind of window barely allows for our main database schema for crying out loud.
I don’t know why you’re being downvoted. You’re right.
 
Upvote
15 (24 / -9)
I'd like to know what actual serious work can be done with a 128K token window. I don't have anything in our code base that would qualify -- outside of trivial services that a monkey could maintain anyway.

Hell, that kind of window barely allows for our main database schema for crying out loud.
Personally I use VSCode and Gemini 3 Pro for my "I'd usually delegate this to a junior developer" prompts. I give it the right files for context and some plain-English pseudocode to do a small task at a time while I'm thinking about the next step. I think most serious professionals use the models in this way. The "magic bullet, I one-shotted another TODO list or Minesweeper clone" stuff is nonsense, ignore it, but as for helping accelerate your work, the tooling they've implemented in VSCode is pretty mature at this point and I get good results from Gemini. Think of it as your pair programmer who is eager to do the grunt work while you figure out the big picture.

We work primarily in PHP/Laravel on the backend and React/TypeScript/Tailwind on the front end, and get few hallucinations and good results without needing to be super verbose in the prompting at this point. Our backend monolith is quite big, but I always give it a head start by including the relevant files in my prompt.

One of the things it was very useful for, and saved us a ton of time on, was rapidly building out a new custom front-end component library using Tailwind 4 and popular, mature React libraries like ReactSelect. We paid for the Tailwind license (they deserve it, and none of this would work as well as it does without Tailwind). You can do things like "Using the installed ReactSelect library, create a controlled form component called DropdownSelect that works in a similar way to the current Dropdown component. It should have similar props and use Headless where relevant" and let it rip. You're going to be impressed by how well it correlates what you already have into something similar but new. This is essentially ALL these things do.

Another recent example: a new API endpoint where users answer question forms in steps (6 steps, 20 questions each). Each step submits their 20 answers. I simply prompted it to write the method mergeAnswers(Survey $section), giving it a few other things for context like "use question_id for the add or update", and it wrote it. If you're using types/interfaces this will all work a lot better because those give it more context. I then quickly iterated on it with things like "ensure the answers are sorted by question_id before saving" and it'll quickly write that without you needing to reference the docs yourself.
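
The merge itself is nothing exotic, by the way. Our real method is PHP/Eloquent (roughly an update-or-create keyed on question_id plus a sort), but this plain-C analogue with made-up types shows the shape of what I asked for:

#include <stddef.h>
#include <stdlib.h>

typedef struct { int question_id; int value; } Answer;

static int by_question_id(const void *a, const void *b)
{
    return ((const Answer *)a)->question_id - ((const Answer *)b)->question_id;
}

/* Add-or-update the incoming step's answers into the existing set, keyed by
   question_id, then keep everything sorted by question_id before saving.
   Returns the new count; existing[] must have room for both sets. */
size_t merge_answers(Answer *existing, size_t n_existing,
                     const Answer *incoming, size_t n_incoming)
{
    for (size_t i = 0; i < n_incoming; i++) {
        size_t j = 0;
        while (j < n_existing && existing[j].question_id != incoming[i].question_id)
            j++;
        if (j == n_existing)
            n_existing++;            /* add: this question wasn't answered yet */
        existing[j] = incoming[i];   /* update (or fill the newly claimed slot) */
    }
    qsort(existing, n_existing, sizeof *existing, by_question_id);
    return n_existing;
}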

Fancy auto-complete is probably the worst tool at this point. I think about disabling it all the time, or at least getting good at enabling it with a hotkey only when I'm ready for it to attempt something. For instance, you might type out a well-named atomic function with the arguments, then turn it on and wait to see if it can guess what you intend to do. Otherwise it's sitting there guessing wrong constantly, and it's super annoying.
 
Upvote
42 (44 / -2)
is this really a wafer sized chip? Because that sounds…difficult and expensive…and very difficult. A manufacturing or packaging defect is going to be very expensive when you have to throw the entire wafer away.
Ya, the last company that tried this couldn't make it work, but that was like the Pentium era.
 
Upvote
4 (4 / 0)

fractl

Ars Praefectus
3,511
Subscriptor
Kind of impressed Cerebras actually worked; AI researchers tried to do this decades ago but couldn't make the chips. The yield was atrocious.
Even Cerebras' first-gen chips had the ability to map around broken tiles. They aren't getting perfect yield, but all they need is enough tiles to be a competitive product. It's not like Nvidia is getting a lot of perfect chips with their reticle-limited GPUs, either, so being able to make viable products from less-than-perfect parts is crucial.
 
Upvote
21 (22 / -1)

TetsFR

Ars Scholae Palatinae
910
You may have mentioned the acquisition (OK, not an acquisition legally, but in reality it is) of Groq by Nvidia last year to develop fast inference as well. Let's see what they deliver on the back of that, bearing in mind that those chips do not support training, nor even light training tasks like LoRA fine-tuning, I believe. So depending on what capabilities next-gen SOTA AI models leverage, we will see which design makes more sense. But Nvidia has hedged itself with its blood-sucking deal on Groq talent and tech.
 
Upvote
2 (4 / -2)