New Codex features include the ability to use your computer in the background

This is to compete with Anthropic Cowork; MS also included similar functionality in Copilot (Copilot Cowork).

Based on my experience with Codex, we moved it into its own container so it couldn't do things it shouldn't. It was interesting to watch it install software from the internet and then delete files without asking for approval.

IIRC the sandboxing implementation for Windows Codex isn't as strong as the Linux and Mac versions.
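
For anyone curious what "its own container" can look like in practice, here is a minimal sketch in Python that only builds a `docker run` command line; it does not run anything. The image name `example/agent-image:latest`, the `/workspace` mount point, and the `sandbox_command` helper are all hypothetical, and this is not Codex's own sandbox, just one plausible way to limit what a tool can touch on the host.

```python
import shlex
from pathlib import Path


def sandbox_command(project_dir, image, agent_cmd, allow_network=True):
    """Build a `docker run` argv that exposes only project_dir to the container.

    All names here are illustrative assumptions, not part of any agent's CLI.
    """
    project = Path(project_dir).resolve()
    argv = [
        "docker", "run", "--rm",             # remove the container when it exits
        "--read-only",                        # container root filesystem is read-only
        "--tmpfs", "/tmp",                    # scratch space that vanishes on exit
        "--cap-drop", "ALL",                  # drop all Linux capabilities
        "--volume", f"{project}:/workspace",  # the only host path the container sees
        "--workdir", "/workspace",
    ]
    if not allow_network:
        # Blocks downloads entirely; a tool that calls a hosted model
        # would need network access, so this is off by default.
        argv += ["--network", "none"]
    argv.append(image)
    argv += list(agent_cmd)
    return argv


if __name__ == "__main__":
    cmd = sandbox_command(
        "./agent-work",                # hypothetical siloed folder
        "example/agent-image:latest",  # hypothetical image name
        ["agent", "--help"],           # hypothetical command inside the image
        allow_network=False,
    )
    print(shlex.join(cmd))
```

The design choice is that the host decides what is reachable (one folder, optionally no network) instead of relying on the tool to ask for approval.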
 

worldeight

Seniorius Lurkius
12
Subscriptor


I trust my buddy more than this…
 
I switched to Codex last month and the improvement has been noticeable.

Our company used to work in conjunction with a team in India and Europe. After Codex, we haven't used the India team since, even though we still have a few months on contract. Europe is handling VR and design work, which we haven't figured out a workflow for on Codex yet.

Note, debugging has seriously improved since last year... Codex was able to catch several bugs we left in our code and seriously speed up our sloppy runtime.

Now I just wish it wasn't blowing a hole in our budget. Codex cost is now several times higher than AWS. On the other hand, one dev has produced a million lines of code since we started and cleared our entire features queue... for the first time ever, admin staff is silent :LOL:.
So what you're saying is...is that at least one of your developers is going to be sacked for poor work quality when the entire thing melts down because no one audited it for bugs before hitting prod.
 

PghMike4

Smack-Fu Master, in training
93
I switched to Codex last month and the improvement has been noticeable.

Our company used to work in conjunction with a team in India and Europe. After Codex, we haven't used the India team since, even though we still have a few months on contract. Europe is handling VR and design work, which we haven't figured out a workflow for on Codex yet.

Note, debugging has seriously improved since last year... Codex was able to catch several bugs we left in our code and seriously speed up our sloppy runtime.

Now I just wish it wasn't blowing a hole in our budget. Codex cost is now several times higher than AWS. On the other hand, one dev has produced a million lines of code since we started and cleared our entire features queue... for the first time ever, admin staff is silent :LOL:.
My experience with Claude Code is that someone needs to review every change it makes. Who exactly is reviewing your dev’s One Million lines of code?

Or is this an Austin Powers thing?
 
My experience with Claude Code is that someone needs to review every change it makes. Who exactly is reviewing your dev’s One Million lines of code?

Or is this an Austin Powers thing?
NGL.

I actually had some consultants do a presentation on a data processing system at work. It had lots of usability problems (it was still being designed). There was also a whole problem with being far too trusting of user inputs and leading the witness in said inputs when you absolutely shouldn't (never trust the witness, so to speak). But their idea of coding input data was to have an AI do it. Uh--okay. Their second step of data processing was, not joking, and they were 100% serious, to have a second, different AI check the first AI's homework.

And that...was last year. And amazingly--these guys doing this cross-country Zoom presentation didn't get laughed out of the room. Guessing all the other SMEs in the Zoom were too mortified to say what they thought. Because it was all SMEs.
 
I switched to Codex last month and the improvement has been noticeable.

Our company used to work in conjunction with a team in India and Europe. After Codex, we haven't used the India team since, even though we still have a few months on contract. Europe is handling VR and design work, which we haven't figured out a workflow for on Codex yet.

Note, debugging has seriously improved since last year... Codex was able to catch several bugs we left in our code and seriously speed up our sloppy runtime.

Now I just wish it wasn't blowing a hole in our budget. Codex cost is now several times higher than AWS. On the other hand, one dev has produced a million lines of code since we started and cleared our entire features queue... for the first time ever, admin staff is silent :LOL:.
You.. my friend, are why we can't have nice things.
 
i've been trying to brush up my skills for a personal project and man, it's never been so frustrating to learn something now. before, i just had to take part in the humiliation ritual that was to ask for help on stack overflow. now, people just tell me to use [insert LLM here] and it pisses me the fuck off, not because i think i'm above using the tools, but because when i do, it doesn't help much. plus, i find the idea of having to learn how to craft the perfect prompt to receive the result that i want is beyond exhausting; it feels like a waste of time and i learn absolutely nothing. i don't wanna vibe code something i can't maintain later, man. the issue could be that i'm just really stupid, yes, but also i hate that the zeitgeist now is removing as much of your agency as possible. sure, allow the bot to control your computer entirely, who cares.

everything sucks. i'm tired, boss. anyway, rant over. godspeed to the brave souls who use this stuff, i guess.
 
So what you're saying is...is that at least one of your developers is going to be sacked for poor work quality when the entire thing melts down because no one audited it for bugs before hitting prod.
We haven't had issues so far, this is all internal workflows...

I am handling the agent stack so I do have a bunch of little agents running around stress testing everything and logging for the main model to pick up.
 
My experience with Claude Code is that someone needs to review every change it makes. Who exactly is reviewing your dev’s One Million lines of code?

Or is this an Austin Powers thing?
I tried using Claude and also found it needed far too much handholding..

GPT 5.4 high is great. I don't know how the other model worked, but try it! Burns through tokens.

I wouldn't try to design a website, but building processes for staff... stuff that employees used to kludge together in Zapier... the LLM makes it a breeze.

This is to compete with Anthropic Cowork; MS also included similar functionality in Copilot (Copilot Cowork).

Based on my experience with Codex, we moved it into its own container so it couldn't do things it shouldn't. It was interesting to watch it install software from the internet and then delete files without asking for approval.

IIRC the sandboxing implementation for Windows Codex isn't as strong as the Linux and Mac versions.

wow, Codex has never done that to me, but I've got it running in its own AWS sandbox that is siloed from production... the only thing it does locally is keep a copy of the code.

These are all internal tools though ...

Just not having to type commands in the AWS CLI makes it worth its weight in gold.

Absolutely pricey though! It's costing about as much as our team in India used to... so we will transfer that budget.
 
i've been trying to brush up my skills for a personal project and man, it's never been so frustrating to learn something now. before, i just had to take part in the humiliation ritual that was to ask for help on stack overflow. now, people just tell me to use [insert LLM here] and it pisses me the fuck off, not because i think i'm above using the tools, but because when i do, it doesn't help much. plus, i find the idea of having to learn how to craft the perfect prompt to receive the result that i want is beyond exhausting; it feels like a waste of time and i learn absolutely nothing. i don't wanna vibe code something i can't maintain later, man. the issue could be that i'm just really stupid, yes, but also i hate that the zeitgeist now is removing as much of your agency as possible. sure, allow the bot to control your computer entirely, who cares.

everything sucks. i'm tired, boss. anyway, rant over. godspeed to the brave souls who use this stuff, i guess.

You don't have to craft the perfect prompt, you need to think critically, like you're writing a research paper or like a lawyer preparing a brief.

Break down what you want done, do some research on possible methods of doing it because the LLM may be too literal and miss something easier to implement...

Then build out an outline for yourself. Once you have that, create a step-by-step workflow for the LLM, making sure that it saves each step into its schema markdown file. Also have it keep a separate schema file for shell access. Then you can run Docker and give it access to a siloed folder on your hard drive or on a virtual machine somewhere.

After that, as long as you made your outline comprehensive, you can keep the LLM on track. Do one step at a time; it will lose its context quickly without the schema and markdown files. I find it a better method than having dozens of different markdown files in different chats... I focus the LLM on what it needs to work on.
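
The step-tracking idea above can be sketched roughly as follows. This is a minimal illustration, assuming the "schema markdown file" is a plain checklist; the checklist format, the `PLAN.md` file name, and the helper names `render_plan`, `next_step`, and `save_plan` are hypothetical and not a feature of Codex or Claude Code.

```python
from pathlib import Path


def render_plan(title, steps, done):
    """Render the outline as a markdown checklist; `done` holds 1-based step numbers."""
    lines = [f"# {title}", ""]
    for number, step in enumerate(steps, start=1):
        mark = "x" if number in done else " "
        lines.append(f"- [{mark}] {number}. {step}")
    return "\n".join(lines) + "\n"


def next_step(markdown):
    """Return the first unchecked step, or None when every step is checked off."""
    prefix = "- [ ] "
    for line in markdown.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def save_plan(path, markdown):
    """Write the checklist to disk so a fresh session can reload it."""
    Path(path).write_text(markdown, encoding="utf-8")


if __name__ == "__main__":
    steps = ["Research possible methods", "Write the outline", "Implement step one"]
    plan = render_plan("Personal project", steps, done={1})
    save_plan("PLAN.md", plan)  # hypothetical file name
    print(next_step(plan))
```

Keeping the state in one file means each new chat only needs the checklist and the single step it should work on, instead of the whole history.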


I prefer Codex but Claude Code is probably easier to start with. You should go on the forum and visit programmer's symposium... no one there will dismiss your questions.
 

MikeWise1618

Seniorius Lurkius
32
Subscriptor++
For anyone who is paying attention, Claude Code is far ahead of Codex and OpenAI is playing catch-up, and not likely to get there either. The talent and momentum is at Anthropic now. And other places.

The whole article looks like a bought and paid-for ad for OpenAI, and the fact that it doesn't mention that similar features have been available on competing (Perplexity, Anthropic) and open-source (too many to list, but let's start with OpenClaw) platforms for months already is lamentable.
 
For anyone who is paying attention, Claude Code is far ahead of Codex and OpenAI is playing catch-up, and not likely to get there either. The talent and momentum is at Anthropic now. And other places.

The whole article looks like a bought and paid-for ad for OpenAI, and the fact that it doesn't mention that similar features have been available on competing (Perplexity, Anthropic) and open-source (too many to list, but let's start with OpenClaw) platforms for months already is lamentable.
you're behind the times... Anthropic kneecapped Opus (maybe to divert resources to Mythos and the government). Search on reddit, you'll read the litany of complaints.
 
We haven't had issues so far, this is all internal workflows...
You had a request queue for internal workflows that took a million lines of code to empty?

Oh no, let me guess... you have a multi-tenant setup and your colleague had your LLM crank out (almost) identical scripts to manage them, right? Or make (semi) identical code changes to 50 different repos or branches?

You are absolutely fucked. I say this with love. Run away while you can.
 
Break down what you want done, do some research on possible methods of doing it because the LLM may be too literal and miss something easier to implement...

Then build out an outline for yourself. Once you have that, create a step-by-step workflow for the LLM, making sure that it saves each step into its schema markdown file. Also have it keep a separate schema file for shell access. Then you can run Docker and give it access to a siloed folder on your hard drive or on a virtual machine somewhere.

After that, as long as you made your outline comprehensive, you can keep the LLM on track. Do one step at a time; it will lose its context quickly without the schema and markdown files. I find it a better method than having dozens of different markdown files in different chats... I focus the LLM on what it needs to work on.
If doing research, writing an outline, setting up guardrails, setting up separate sandboxes, then riding herd on the AI step by step by step, revising your outline, code reviewing and code reviewing and updating and updating and reviewing.... if doing all of that is faster than just making the damned change, what the hell were you doing before? Did you write code with your elbows based on smoke signals?

As I've mentioned elsewhere, we've had ALL the models have a crack at making a relatively minor change/addition (5 tables added, 5 affected, add 15 procs, change 10, maybe 15 ts files -- you know, nothing to sneeze at, but something a single dev could do in a week or two). We wound up throwing 12 weeks of work at it, between senior architects, senior devs, and recent "AI expert" hires. After the 9th revision was still mostly unusable, we decided just yesterday to throw it all out. I just spent a few hours looking at the database side (because of course all of a sudden the schedule is farked and it needs to be done yesterday), and I can keep about 80%. It's about a toss-up what's faster: redoing it from scratch the way I would have done it anyway, or fixing the most glaring issues in that remaining 20%. It would just NOT do what we asked, prompting and iterating be damned.
 
For anyone who is paying attention, Claude Code is far ahead of Codex and OpenAI is playing catch-up, and not likely to get there either. The talent and momentum is at Anthropic now. And other places.
And next week, it may be the other way around. Or maybe one works better for some problems and sucks at others.

That's why we did this one comprehensive test at work, because every other time we tried and reached out for what we did wrong, the replies ALWAYS came down to "you're using the wrong model, idiot". We gave all the prime ones a shot at our problem, and they all sucked. Some more than others, and all of them in subtly different ways, but they all sucked.

That being said, anyone letting Grok near their code (politics aside) is clinically insane.
 
I actually had some consultants do a presentation on a data processing system at work.
If you think that's bad, a big part of the push for AI in our company is because one of our competitors demonstrated a feature that they coded using AI (or at least, that's what they said).

Our app has had that feature for over a decade.
 
As I've mentioned elsewhere, we've had ALL the models have a crack at making a relatively minor change/addition (5 tables added, 5 affected, add 15 procs, change 10, maybe 15 ts files -- you know, nothing to sneeze at, but something a single dev could do in a week or two). We wound up throwing 12 weeks of work at it, between senior architects, senior devs, and recent "AI expert" hires. After the 9th revision was still mostly unusable, we decided just yesterday to throw it all out. I just spent a few hours looking at the database side (because of course all of a sudden the schedule is farked and it needs to be done yesterday), and I can keep about 80%. It's about a toss-up what's faster: redoing it from scratch the way I would have done it anyway, or fixing the most glaring issues in that remaining 20%. It would just NOT do what we asked, prompting and iterating be damned.
I am not experiencing the issues that you have and neither is anyone else on our team. I suspect our workflow is very different since you're working with TypeScript. We tried to make some on-the-fly changes in our design and it was less work to just pass it on to the EU team.

Still, I can't understand how 5 tables and 15 ts files could break GPT 5.4 High in Codex. Working with Codex has made us hyper-organized with our code... otherwise it will break more than it fixes.
 
You had a request queue for internal workflows that took a million lines of code to empty?

Oh no, let me guess... you have a multi-tenant setup and your colleague had your LLM crank out (almost) identical scripts to manage them, right? Or make (semi) identical code changes to 50 different repos or branches?

You are absolutely fucked. I say this with love. Run away while you can.

Yeah we decided to make a custom agentic stack for all the employees, a custom reference model that ingested all of our data (tens of thousands of pages of text and images) and semi-autonomous agents that are updating each other based on some very strict criteria.

The employees can use the agents for specific use cases where we used to use automations/Zapier, but the great thing is that now the agents send us reports on what everyone is up to. We also migrated our customer support to agents, with daily reports sent to a team member who confirms the responses... no mistakes so far. There are dozens more agents we've built and I keep coming up with more ideas. Some of our work requires critical analysis and I am testing a few agents that I coded with some unusual decision-making mechanisms to see if we can get them to be even more autonomous. The agents have taken about 70% of the workload off our team, who can now focus on what they are supposed to be doing: face time with clients and administration. I think 60-70% is the sweet spot.

It's so exciting... I haven't felt this way since I was 16 and trying to build a media server in Windows NT without any documentation. It's the wild wild west out there but for the first time in years, anything feels possible.

For me, it's not AGI, it's not OpenAI or Claude, it's about making programming exciting again.
 
Upvote
-3 (1 / -4)
i've been trying to brush up my skills for a personal project and man, it's never been so frustrating to learn something now. before, i just had to take part in the humiliation ritual that was to ask for help on stack overflow. now, people just tell me to use [insert LLM here] and it pisses me the fuck off, not because i think i'm above using the tools, but because when i do, it doesn't help much. plus, i find the idea of having to learn how to craft the perfect prompt to receive the result that i want is beyond exhausting; it feels like a waste of time and i learn absolutely nothing. i don't wanna vibe code something i can't maintain later, man. the issue could be that i'm just really stupid, yes, but also i hate that the zeitgeist now is removing as much of your agency as possible. sure, allow the bot to control your computer entirely, who cares.

everything sucks. i'm tired, boss. anyway, rant over. godspeed to the brave souls who use this stuff, i guess.
Yes, I was trying to use the new hotness of Claude Code for developing custom actions for a complex InstallShield Basic MSI setup and it kept giving me stale answers that might have worked back in InstallShield 2010, even though I included InstallShield Premier 2025 in my prompts.

"Oh, right. That error tells us that you need to do these new steps instead. Of course they won't work either but after I've wasted another half-hour of your time I'll give you a third set of steps that somehow works even less!"

It doesn't help that Revenera's own online KB articles are years out of date or incomplete, for example the KB for passing properties to a deferred custom action.
 

hatfarm

Ars Scholae Palatinae
1,134
Subscriptor++
I'm a Senior Software Engineer, and honestly, printers are one of the things I'd rather just be rid of entirely. They rarely seem to work, and I've had one get straight-up compromised (I only knew because nothing on my router worked unless I disconnected the printer). I'm fairly automated here, but I don't run anything that has to call outside of my home, so it's definitely limited.
 
NGL.

I actually had some consultants do a presentation on a data processing system at work. It had lots of usability problems (it was still being designed). There was also a whole problem with being far too trusting of user inputs and leading the witness in said inputs when you absolutely shouldn't (never trust the witness, so to speak). But their idea of coding input data was to have an AI do it. Uh--okay. Their second step of data processing was, not joking, and they were 100% serious, to have a second, different AI check the first AI's homework.

And that...was last year. And amazingly--these guys doing this cross-country Zoom presentation didn't get laughed out of the room. Guessing all the other SMEs in the Zoom were too mortified to say what they thought. Because it was all SMEs.

With those recommendations, I'd ask said consultants why they expected to be paid. An AI can apparently do their job, too.
 