How OpenAI is using GPT-5 Codex to improve the AI tool itself

maxoakland · Dec 12, 2025

AndrewZ said:
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?

Can we see an independent review of these capabilities??

Great questions! The kinds of questions journalists are paid to ask. I wonder why the journalist writing so many of these AI articles never thinks to ask these types of questions. Even when the comment section is full of suggestions for this very thing

And why doesn't Ars hire a better journalist since their current writer is failing at basic journalism? They could easily find someone like you right in the comment section if they wanted to! Why are they ignoring great options for hard-hitting journalism?

Guess we'll never know!

_crane · Dec 12, 2025

motytrah said:
I use them everyday. Github Co-Pilot (Claude, ChatGPT, etc.) If you're getting AI slop then you need to change what you're doing. I'm looking like a super star with fairly typical human in the middle stuff. Python scripts, java, javascript, etc. Just fine. Also great for Teraform and Devops stuff.

Hell, I threw it at a bunch of COBOL and it did really well.

AI slop is a problem, but it's one addressable by understanding prompting techniques, context compression, and session management.

I'm going to get a good grade in asking the computer to make something for me, something that is both normal to want and possible to achieve.

cgo_12345 · Dec 12, 2025

raxadian said:
Pop the bubble.

More like lancing a carbuncle at this point.

terrydactyl · Dec 12, 2025

Glad I'm retired and can let the next generation figure this out

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

In my teams, the “junior developer” sat in on all meeting, including with stakeholders. a lot was written down, but a lot was conveyed that was not. The team communicated in writing and verbally.

The developer was not just coding against an API, but (hopefully) had a broader sense of why he was doing it.

Asbestos Muffins · Dec 12, 2025

its 30% faster after they added several gigawatts of data centers to it...

Missing Minute · Dec 12, 2025

enlightened.doggo said:
This is a wild statement. Corporations employ armies of workers for what basically boils down to communication and operations.

I deliberately included some implausible explanations to reinforce the plausibility of other explanations.

the.amazing.null · Dec 12, 2025

Missing Minute said:
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity, is developers. Can't be a coincidence. Possible explanations:

Developers are the knowledge workers that are the most adaptable to new ways of doing things.

Developers are the most inefficient knowledge workers.

Something about development makes LLMs particularly useful and effective.

LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.

Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.

No, it is the number of engineering managers and CTOs telling devs to use them. Then, we do, because a mandate is a mandate.

C.M. Allen · Dec 12, 2025

"We have all this amazing, world-changing, incredibly value and useful technology. We're just, uh, waiting for the right time to reveal it. Yeah, that's it." -- OpenAI

That's the smell of bullshit and desperation...

rleighto · Dec 12, 2025

I am getting close to retirement as a software engineer. I am now really looking forward to it. I have really enjoyed the creative process inherent in software engineering. That creative process is being eroded away by these so-called advancements.

MonKaiju · Dec 12, 2025

Unfortunately this "beat" of Benj's is more of an infomercial than critical journalism...

crepuscularbrolly · Dec 12, 2025

WewusLaddeus said:
Shit-in and shit-out

This is like the human centipede; or AI centipede.

MrRtd · Dec 12, 2025

Article or ad? Seems more so the latter.

GraveDigger · Dec 12, 2025

You should have titled this "OpenAI is eating its own poop".

JonathanSmith · Dec 12, 2025

The Sheep Look Up said:
Bro. If you want a softball interview podcast, do that. If you want to do journalism, this ain't it. Even just a verbatim transcript would be of more worth.

Alas, this has been a problem at ars for years, at least. They really don't like pushing or contradicting their interviewees, or independently confirming things they say. Once you realize that it's hard to stop seeing it. I like ars - I've been here a long time now - but I really wish they'd commit more to doing research.

maxoakland said:
Great questions! The kinds of questions journalists are paid to ask. I wonder why the journalist writing so many of these AI articles never thinks to ask these types of questions. Even when the comment section is full of suggestions for this very thing

And why doesn't Ars hire a better journalist since their current writer is failing at basic journalism? They could easily find someone like you right in the comment section if they wanted to! Why are they ignoring great options for hard-hitting journalism?

Guess we'll never know!

It's not just AI though. I'm a neuroscience nerd, which is where I first noticed the lack of investigation or skepticism. Their reporting on things I knew something about always just seems to be repeating what one source said about a topic. They rarely go to an independent source to try and fact check.

mathguru · Dec 12, 2025

Respectfully, this is incredibly uncritical and even naive coverage.

I’ll tell you right now I absolutely do not believe these claims. This feels like marketing.

danrien · Dec 12, 2025

All humanity is not gonna open an IDE or even know what a terminal is

Man I use this thing, I type commands into it, and the computer follows the commands. Some might call it a chat interface, some might call it a terminal. When I want to make full programs, I open up an editor that natively understands what I type into it, like it's integrated the things needed to develop the programs. What are you going to call your alternative?

rachel612 · Dec 12, 2025

AndrewZ said:
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?

I do not believe anything that comes out of OpenAI.

It occurred to me that post-IPO Altman will have to be a lot more careful about the truthfulness of his statements.

Then I remembered that for that to matter the US would need a functioning SEC and Department of Justice.

Oh well. The IPO (and the cash out for early investors that goes with it) may be the thing that pops the bubble.

Missing Minute · Dec 12, 2025

ej24 said:
Or developers make LLMs so LLMs are uniquely tailored (biased) for developers. If chemists, biologists or physicists knew enough programming to make an LLM it may be wildly different than what software developers have made. LLMs are a product of their creators.

Yeah the AI would likely be substantially different, experts in those fields often have safety and ethics training.

Dusfud · Dec 12, 2025

randomcat said:
shipping != improvement

Bra, we just started to force push to main, and we started shipping new code in record time!

Stanistani · Dec 12, 2025

Dusfud said:
Bra, we just started to force push to main, and we started shipping new code in record time!

Now these points of data make a beautiful line
And we're out of beta, we're releasing on time

Stanistani · Dec 12, 2025

maxoakland said:
Seriously. I'm getting tired of Arstechnica articles that could've been a press release. For some reason, they're always about AI too.

I came to Ars because it had in-depth journalism about tech. The writers were knowledgable and not easily swayed by market speak.

Maybe Ars isn't the place to find that kind of journalism anymore?

It isn't. Their parent corporation made a devil's deal with OpenAI, and its reporting on the subject has never been the same since.

Link: Condé Nast Announces Partnership with OpenAI

faffod · Dec 13, 2025

maxoakland said:
Seriously. I'm getting tired of Arstechnica articles that could've been a press release. For some reason, they're always about AI too.

I came to Ars because it had in-depth journalism about tech. The writers were knowledgable and not easily swayed by market speak.

Maybe Ars isn't the place to find that kind of journalism anymore?

But the commentariat is second to none. I saw the article headline and thought "Ohhhh... interesting! Let's read the comments" Sometimes the comments make me think that the article is also worth reading. The comments are always worth the visit.
Thank you, and everyone else.

macosandlinux · Dec 13, 2025

AndrewZ said:
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?

Can we see an independent review of these capabilities??

Sam Altman's previous eye-scanning venture literally partnered with criminals. Want the sources, I am happy to PM you.

gg555 · Dec 13, 2025

Robin-3 said:
As a side note, I'd love to know how much of the increased use of AI is from employees being told in no uncertain terms "management expects everyone to integrate AI into their daily tasks ASAP," whether that integration makes any sense or not.

This is definitely what's happening at some companies:

View: https://bsky.app/profile/alexhanna.bsky.social/post/3m4g6eti5mk2m

A company taking the position that whether or not you use "AI" coding tools will be considered in your performance review.

Dadlyedly · Dec 13, 2025

If these tools were actually any good, I would be quite concerned about letting an AI code itself.

Starouscz · Dec 13, 2025

Missing Minute said:
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity, is developers. Can't be a coincidence. Possible explanations:

Developers are the knowledge workers that are the most adaptable to new ways of doing things.

Developers are the most inefficient knowledge workers.

Something about development makes LLMs particularly useful and effective.

LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.

Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.

developers are the most expensive text workers

on code you can easily check if it is "correct" - atleast if it compiles.

Nasty errors in a text are more difficult to spot and handle

gg555 · Dec 13, 2025

iollmann said:
This should be utterly unsurprising to anyone familiar with build tools history. In an effort to eat their own dogfood, build tools have traditionally been used to build themselves, whenever possible. Clang-llvm builds clang-llvm. It is the natural next step for a AI code development project to author and build itself. If it’s producing poor quality code that is clearly a bug in the code base / weights itself and we can fix that and then hypothetically see the improvement over the problem in question but also the rest of the code base. It is the sensible way to move forward.

When the time comes we would also expect to see robots built not in factories but by other robots from a bucket of spare parts. A factory can only deliver the throughput that it was specced for and is forever limited to that until you build another. Robots building robots can grow exponentially and is basically only limited by the logistics of delivering parts to an ever expanding body of robots. (Insert comical descriptions of rabbits with an unlimited supply of food reproducing faster than the speed of sound.) While a factory can bootstrap this process, ultimately the repeated doubling will dwarf its capacity into irrelevance. The early winner in the robot arms race will be the one who makes robot assembling robots with the shortest generational time and cheapest parts list.

See also 3D printers.

I mean, that's the dream/hype/pr-bull. Where is the evidence that anything like that is happening or possible? Still, if it does happen, we know it's going to be paper clips.

gg555 · Dec 13, 2025

JonathanSmith said:
Alas, this has been a problem at ars for years, at least. They really don't like pushing or contradicting their interviewees, or independently confirming things they say. Once you realize that it's hard to stop seeing it. I like ars - I've been here a long time now - but I really wish they'd commit more to doing research.

It's not just AI though. I'm a neuroscience nerd, which is where I first noticed the lack of investigation or skepticism. Their reporting on things I knew something about always just seems to be repeating what one source said about a topic. They rarely go to an independent source to try and fact check.

To be fair, this is true of a lot of journalism, even at places like the NY Times. Anytime they report on something that I actually know something about, I'm amazed at how wrong the reporting is. But it's especially bad when it comes to reporting on "AI" (a term that is not meaningful unless it is in quotes). Pretty much all journalism is just regurgitating the "AI" hype.

gg555 · Dec 13, 2025

Stanistani said:
It isn't. Their parent corporation made a devil's deal with OpenAI, and its reporting on the subject has never been the same since.

Link: Condé Nast Announces Partnership with OpenAI

The reporting at Ars on "AI" is been pure uncritical hype since always. It long predates Conde Nast's deal with OpenAI.

JMTronicHobbyist · Dec 13, 2025

At least they didn't go and do that one thing that you do when the AI is about to take over, and it's that thing that it uses to gain the abilities. Damnit, what is that one thing? I'm so forgetful sometimes.

Zeppos · Dec 13, 2025

I've been playing with the idea to make a program that can modify itself when I was a student somewhere in the year 2000. No Ai, just a genetic algorithm. Would be nice if it found a way to exit the testing environment through some back door and conquer the world. I never got to it, as you probably figured out by now. I could not find a good scoring function. Not even close. The closest I got was to replicate itself as hell, inspired by rabbits. But the scoring function (not sure what the proper English name is) is actually the big challenge. I bet I am not the first one to think of something like this. Does anyone have any pointers on this subject? No worries, my ambitions to conquer the world have faded away together with a big part of my hair. Cheers!

John.Flick · Dec 13, 2025

I think about turning it off almost everyday. I'm fighting it 75% of the time. It's gotten maybe 5% better a trillion dollars later too...

Uncivil Servant · Dec 13, 2025

Fabermetrics said:
We've reached the NFTs of NFTs phase of the circlejerk.

What worries me is that my experience with multibillion-dollar clusterfucks is that it feels like we've not yet reached the end of the beginning. We're at the part of the horror movie where it's perfectly safe to go down to the basement alone to grab some stuff.

The thing about multibillion-dollar clusterfucks is that unlike in horror movies, people will just keep going down into that metaphorical basement long after people have started to go missing, and just consider it the price of doing business. No scary music, no ironic "I'll be right back", just an honest, "if I return".

On the other hand, who knows, maybe they'll actually find an inflection point where clusterfucks scale up to the point that they somehow become useful? Like the clusterfuck equivalent of a defibrillator?

TheOldChevy · Dec 13, 2025

Today actual programmers can solve issues generated by AI.

Next step will be AI generates its own programming language for "better efficiency", language that will not be understandable by actual programmers. And goes straight in the wall. Just faster.

TheOldChevy · Dec 13, 2025

Missing Minute said:
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity, is developers. Can't be a coincidence. Possible explanations:

Developers are the knowledge workers that are the most adaptable to new ways of doing things.

Developers are the most inefficient knowledge workers.

Something about development makes LLMs particularly useful and effective.

LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.

Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.

Unfortunately, LLMs are also used a lot in HR, Finance and other critical domains. And with far less control than in development.

alxx · Dec 13, 2025

Starouscz said:
developers are the most expensive text workers

on code you can easily check if it is "correct" - atleast if it compiles.

Nasty errors in a text are more difficult to spot and handle

Compiling isn't a valid test , that's what unit tests , integration tests , loading testing etc is for.

AlexisR200X · Dec 13, 2025

This sounds like it will enable bad actors to cheaply pump out more scams and malware with less sophistication requirements rather than the useful high quality work developers are expected to make for their clients. Much like the deluge of scams image and video generation are enabling in the online advertising industry.

How OpenAI is using GPT-5 Codex to improve the AI tool itself

Ars Scholae Palatinae

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Tribunus Angusticlavius

Ars Praetorian

Wise, Aged Ars Veteran

Seniorius Lurkius

Ars Tribunus Angusticlavius

Seniorius Lurkius

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Centurion

Ars Centurion

Ars Tribunus Militum

Ars Centurion

Wise, Aged Ars Veteran

Ars Centurion

Wise, Aged Ars Veteran

Ars Praetorian

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Praetorian

Ars Praefectus

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Centurion

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Militum

Ars Praefectus

Ars Praefectus