How OpenAI is using GPT-5 Codex to improve the AI tool itself


maxoakland

Ars Scholae Palatinae
1,309
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?

Can we see an independent review of these capabilities??
Great questions! The kinds of questions journalists are paid to ask. I wonder why the journalist writing so many of these AI articles never thinks to ask these types of questions, even when the comment section is full of suggestions for this very thing.

And why doesn't Ars hire a better journalist since their current writer is failing at basic journalism? They could easily find someone like you right in the comment section if they wanted to! Why are they ignoring great options for hard-hitting journalism?

Guess we'll never know!
 
Upvote
38 (52 / -14)

_crane

Wise, Aged Ars Veteran
214
I use them every day. GitHub Copilot (Claude, ChatGPT, etc.). If you're getting AI slop then you need to change what you're doing. I'm looking like a superstar with fairly typical human-in-the-middle stuff. Python scripts, Java, JavaScript, etc. Just fine. Also great for Terraform and DevOps stuff.

Hell, I threw it at a bunch of COBOL and it did really well.

AI slop is a problem, but it's one addressable by understanding prompting techniques, context compression, and session management.
I'm going to get a good grade in asking the computer to make something for me, something that is both normal to want and possible to achieve.
 
Upvote
-9 (9 / -18)

terrydactyl

Ars Tribunus Angusticlavius
7,871
Subscriptor
Glad I'm retired and can let the next generation figure this out

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”
In my teams, the "junior developer" sat in on all meetings, including those with stakeholders. A lot was written down, but a lot was conveyed that was not. The team communicated in writing and verbally.

The developer was not just coding against an API, but (hopefully) had a broader sense of why he was doing it.
 
Upvote
78 (78 / 0)
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity is developers. Can't be a coincidence. Possible explanations:
  1. Developers are the knowledge workers that are the most adaptable to new ways of doing things.
  2. Developers are the most inefficient knowledge workers.
  3. Something about development makes LLMs particularly useful and effective.
  4. LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.
  5. Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.
No, it is the number of engineering managers and CTOs telling devs to use them. Then, we do, because a mandate is a mandate.
 
Upvote
61 (61 / 0)
Bro. If you want a softball interview podcast, do that. If you want to do journalism, this ain't it. Even just a verbatim transcript would be of more worth.
Alas, this has been a problem at Ars for years, at least. They really don't like pushing or contradicting their interviewees, or independently confirming things they say. Once you realize that, it's hard to stop seeing it. I like Ars - I've been here a long time now - but I really wish they'd commit more to doing research.

Great questions! The kinds of questions journalists are paid to ask. I wonder why the journalist writing so many of these AI articles never thinks to ask these types of questions, even when the comment section is full of suggestions for this very thing.

And why doesn't Ars hire a better journalist since their current writer is failing at basic journalism? They could easily find someone like you right in the comment section if they wanted to! Why are they ignoring great options for hard-hitting journalism?

Guess we'll never know!
It's not just AI though. I'm a neuroscience nerd, which is where I first noticed the lack of investigation or skepticism. Their reporting on things I knew something about always just seems to be repeating what one source said about a topic. They rarely go to an independent source to try and fact check.
 
Last edited:
Upvote
74 (74 / 0)

danrien

Wise, Aged Ars Veteran
172
Subscriptor
All humanity is not gonna open an IDE or even know what a terminal is
Man I use this thing, I type commands into it, and the computer follows the commands. Some might call it a chat interface, some might call it a terminal. When I want to make full programs, I open up an editor that natively understands what I type into it, like it's integrated the things needed to develop the programs. What are you going to call your alternative?
 
Upvote
13 (13 / 0)

rachel612

Ars Centurion
383
Subscriptor++
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?
I do not believe anything that comes out of OpenAI.

It occurred to me that post-IPO Altman will have to be a lot more careful about the truthfulness of his statements.

Then I remembered that for that to matter the US would need a functioning SEC and Department of Justice.

Oh well. The IPO (and the cash out for early investors that goes with it) may be the thing that pops the bubble.
 
Upvote
64 (64 / 0)

Missing Minute

Wise, Aged Ars Veteran
1,386
Or developers make LLMs so LLMs are uniquely tailored (biased) for developers. If chemists, biologists or physicists knew enough programming to make an LLM it may be wildly different than what software developers have made. LLMs are a product of their creators.
Yeah the AI would likely be substantially different, experts in those fields often have safety and ethics training.
 
Upvote
33 (33 / 0)

Stanistani

Smack-Fu Master, in training
86
Seriously. I'm getting tired of Arstechnica articles that could've been a press release. For some reason, they're always about AI too.

I came to Ars because it had in-depth journalism about tech. The writers were knowledgeable and not easily swayed by market speak.

Maybe Ars isn't the place to find that kind of journalism anymore?
It isn't. Their parent corporation made a devil's deal with OpenAI, and its reporting on the subject has never been the same since.

Link: Condé Nast Announces Partnership with OpenAI
 
Upvote
94 (98 / -4)

faffod

Ars Praetorian
562
Subscriptor
Seriously. I'm getting tired of Arstechnica articles that could've been a press release. For some reason, they're always about AI too.

I came to Ars because it had in-depth journalism about tech. The writers were knowledgeable and not easily swayed by market speak.

Maybe Ars isn't the place to find that kind of journalism anymore?
But the commentariat is second to none. I saw the article headline and thought, "Ohhhh... interesting! Let's read the comments." Sometimes the comments make me think that the article is also worth reading. The comments are always worth the visit.
Thank you, and everyone else.
 
Upvote
123 (123 / 0)
OpenAI has a long history of "grand exaggeration" when it comes to AI's supposed capabilities and achievements. Could it be that they are pumping up their coming IPO?

Can we see an independent review of these capabilities??
Sam Altman's previous eye-scanning venture literally partnered with criminals. If you want the sources, I'm happy to PM you.
 
Upvote
28 (28 / 0)

gg555

Ars Scholae Palatinae
1,146
As a side note, I'd love to know how much of the increased use of AI is from employees being told in no uncertain terms "management expects everyone to integrate AI into their daily tasks ASAP," whether that integration makes any sense or not.
This is definitely what's happening at some companies:

View: https://bsky.app/profile/alexhanna.bsky.social/post/3m4g6eti5mk2m

A company taking the position that whether or not you use "AI" coding tools will be considered in your performance review.
 
Upvote
29 (29 / 0)

Starouscz

Ars Scholae Palatinae
860
Subscriptor
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity is developers. Can't be a coincidence. Possible explanations:
  1. Developers are the knowledge workers that are the most adaptable to new ways of doing things.
  2. Developers are the most inefficient knowledge workers.
  3. Something about development makes LLMs particularly useful and effective.
  4. LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.
  5. Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.
Developers are the most expensive text workers.

On code you can easily check if it is "correct" - at least if it compiles.

Nasty errors in a text are more difficult to spot and handle.
 
Upvote
-15 (9 / -24)

gg555

Ars Scholae Palatinae
1,146
This should be utterly unsurprising to anyone familiar with the history of build tools. In an effort to eat their own dogfood, build tools have traditionally been used to build themselves whenever possible: Clang/LLVM builds Clang/LLVM. It is the natural next step for an AI code-development project to author and build itself. If it's producing poor-quality code, that is clearly a bug in the code base/weights itself; we can fix that and then hypothetically see the improvement not only on the problem in question but also across the rest of the code base. It is the sensible way to move forward.

When the time comes we would also expect to see robots built not in factories but by other robots from a bucket of spare parts. A factory can only deliver the throughput that it was specced for and is forever limited to that until you build another. Robots building robots can grow exponentially and is basically only limited by the logistics of delivering parts to an ever expanding body of robots. (Insert comical descriptions of rabbits with an unlimited supply of food reproducing faster than the speed of sound.) While a factory can bootstrap this process, ultimately the repeated doubling will dwarf its capacity into irrelevance. The early winner in the robot arms race will be the one who makes robot assembling robots with the shortest generational time and cheapest parts list.

See also 3D printers.
I mean, that's the dream/hype/pr-bull. Where is the evidence that anything like that is happening or possible? Still, if it does happen, we know it's going to be paper clips.
 
Upvote
11 (12 / -1)

gg555

Ars Scholae Palatinae
1,146
Alas, this has been a problem at Ars for years, at least. They really don't like pushing or contradicting their interviewees, or independently confirming things they say. Once you realize that, it's hard to stop seeing it. I like Ars - I've been here a long time now - but I really wish they'd commit more to doing research.


It's not just AI though. I'm a neuroscience nerd, which is where I first noticed the lack of investigation or skepticism. Their reporting on things I knew something about always just seems to be repeating what one source said about a topic. They rarely go to an independent source to try and fact check.
To be fair, this is true of a lot of journalism, even at places like the NY Times. Anytime they report on something that I actually know something about, I'm amazed at how wrong the reporting is. But it's especially bad when it comes to reporting on "AI" (a term that is not meaningful unless it is in quotes). Pretty much all journalism is just regurgitating the "AI" hype.
 
Upvote
36 (37 / -1)


Zeppos

Ars Tribunus Militum
2,864
Subscriptor
I played with the idea of making a program that could modify itself back when I was a student, somewhere around the year 2000. No AI, just a genetic algorithm. It would have been nice if it had found a way to exit the testing environment through some back door and conquer the world. I never got there, as you have probably figured out by now.

I could not find a good scoring function (I think the proper English term is "fitness function"). Not even close. The closest I got was something that replicated itself like hell, inspired by rabbits. But the scoring function is actually the big challenge. I bet I am not the first one to think of something like this. Does anyone have any pointers on this subject?

No worries, my ambitions to conquer the world have faded away, together with a big part of my hair. Cheers!
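For what it's worth, here's a minimal sketch of the textbook version in Python, with the scoring ("fitness") function written out explicitly. The target string, population size, and mutation rate are toy values I made up for illustration, not a recommendation:

```python
import random

random.seed(0)  # deterministic for demonstration

TARGET = "HELLO WORLD"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(candidate):
    # The scoring function: count characters matching the target, position by position.
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rate=0.05):
    # Each character has a small chance of being replaced at random.
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else c
        for c in candidate
    )

def evolve(pop_size=200, generations=500):
    # Start from a fully random population.
    population = [
        "".join(random.choice(ALPHABET) for _ in TARGET)
        for _ in range(pop_size)
    ]
    for gen in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(TARGET):
            return population[0], gen
        # Elitism: keep the fitter half, refill with mutated copies of survivors.
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return population[0], generations

best, gen = evolve()
print(best, gen)
```

The interesting part is exactly the one you ran into: with a fitness function this explicit, the population converges on the target; with a vague one ("survive and copy yourself"), you get rabbits.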
 
Upvote
12 (13 / -1)

Uncivil Servant

Ars Scholae Palatinae
4,667
Subscriptor
We've reached the NFTs of NFTs phase of the circlejerk.

What worries me is that my experience with multibillion-dollar clusterfucks is that it feels like we've not yet reached the end of the beginning. We're at the part of the horror movie where it's perfectly safe to go down to the basement alone to grab some stuff.

The thing about multibillion-dollar clusterfucks is that unlike in horror movies, people will just keep going down into that metaphorical basement long after people have started to go missing, and just consider it the price of doing business. No scary music, no ironic "I'll be right back", just an honest, "if I return".

On the other hand, who knows, maybe they'll actually find an inflection point where clusterfucks scale up to the point that they somehow become useful? Like the clusterfuck equivalent of a defibrillator?
 
Upvote
33 (33 / 0)

TheOldChevy

Ars Tribunus Militum
1,538
Subscriptor
It's interesting that despite there being countless different jobs that require composing text, the one job that appears to have most widely adopted LLMs to increase productivity is developers. Can't be a coincidence. Possible explanations:
  1. Developers are the knowledge workers that are the most adaptable to new ways of doing things.
  2. Developers are the most inefficient knowledge workers.
  3. Something about development makes LLMs particularly useful and effective.
  4. LLMs don't actually provide enough advantages to justify the incredibly wide adoption they have seen and developers are particularly vulnerable to believing that a given technology improves output when it doesn't actually do so.
  5. Developers aren't using LLMs as widely as it seems and instead they are just the loudest about it.
Unfortunately, LLMs are also used a lot in HR, Finance and other critical domains. And with far less control than in development.
 
Upvote
31 (31 / 0)

alxx

Ars Praefectus
4,982
Subscriptor++
Developers are the most expensive text workers.

On code you can easily check if it is "correct" - at least if it compiles.

Nasty errors in a text are more difficult to spot and handle.
Compiling isn't a valid test; that's what unit tests, integration tests, load testing, etc. are for.
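A toy illustration of the point (the `average` helper is made up for this example, and it's Python rather than a compiled language): code can pass the "it parses/compiles" bar and still be wrong in a way a one-line test catches immediately.

```python
def average_buggy(values):
    # Parses and byte-compiles cleanly -- "it compiles" tells you nothing here.
    return sum(values) / (len(values) + 1)

def average(values):
    # The version a minimal unit test forces you to write.
    return sum(values) / len(values)

# compile() is perfectly happy with the buggy expression...
compile("sum(values) / (len(values) + 1)", "<buggy>", "eval")

# ...but a one-line test tells the two apart instantly.
assert average([2, 4, 6]) == 4
assert average_buggy([2, 4, 6]) != 4
```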
 
Upvote
49 (49 / 0)
This sounds like it will enable bad actors to cheaply pump out more scams and malware with lower sophistication requirements, rather than the useful, high-quality work developers are expected to produce for their clients. Much like the deluge of scams that image and video generation are enabling in the online advertising industry.
 
Upvote
31 (31 / 0)