The Ars Technica AI coding agent test: Minesweeper edition

While I agree AI can't literally be a boss, because nobody wants to report to a bot, I disagree that bosses aren't affected.

Middle management is on the chopping block for two reasons.
1) The number of managers you need is proportional to the number of workers you have.
2) If you look at the job-stealing benchmark, GDPval, you'll see that many (most?) of the tasks are actually middle-management tasks: creating PowerPoints, spreadsheets, checklists, training docs, proposals, purchasing decisions, etc.

Really, I think AI is coming for lots of jobs that we don't bother talking about, because they aren't special to us the way coding and art are...

Jobs realistically on the chopping block right now: customer service, help desk, legal aides, proofreaders, transcribers, translators, webpage designers, call centers... basically any semi-predictable, low-stakes desk job.

And in the long run, a lot of other jobs are at risk too, like accounting, which nobody talks about because nobody cares about it.
I don't see it. The systems don't have any judgment. Only people have that. And no one is actually working on new tech that will have judgment.

Sure, we'll need fewer accountants per unit of accounting work, but we won't dispense with the expertise. And in middle management, it's the same story.
 
I don't see it. The systems don't have any judgment. Only people have that. And no one is actually working on new tech that will have judgment.

Sure, we'll need fewer accountants per unit of accounting work, but we won't dispense with the expertise. And in middle management, it's the same story.
Details of what the systems can and can't theoretically do aside, the work is all somewhat fungible.

Fewer per unit means fewer people employed, and/or new work being invented for them to do.
 

artimusprime

Wise, Aged Ars Veteran
102
Currently, the best way to use these products is not to one-shot, but to take advantage of their ability to design and think through a problem, then work with the tool to generate a list of test cases, and then methodically work through that list. You should also have a specific context-management strategy; these tools are not smart enough to manage context on their own. Working this way, I'm able to get really good results very quickly.

To me, this experiment basically was: let's use a tool in a non-optimal way and then judge it.
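A minimal sketch of that test-list workflow applied to the article's Minesweeper task. The `count_adjacent_mines` helper and the grid-of-strings board format are hypothetical, chosen only for illustration: the test cases are agreed up front, then the implementation is checked against them one at a time.

```python
# Hypothetical board format: a list of strings, where '*' marks a mine.
def count_adjacent_mines(grid: list[str], row: int, col: int) -> int:
    """Count mines in the (up to) 8 cells surrounding (row, col)."""
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            # Stay inside the board; edge and corner cells have fewer neighbours.
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] == "*":
                count += 1
    return count


# The agreed test list, written before the implementation and worked through one case at a time.
TEST_CASES = [
    # (description, grid, row, col, expected)
    ("empty board", ["...", "...", "..."], 1, 1, 0),
    ("all eight neighbours mined", ["***", "*.*", "***"], 1, 1, 8),
    ("corner cell only sees three neighbours", [".*", "**"], 0, 0, 3),
    ("cell's own mine is not counted", ["*.", ".."], 0, 0, 0),
]

if __name__ == "__main__":
    for description, grid, row, col, expected in TEST_CASES:
        result = count_adjacent_mines(grid, row, col)
        status = "ok" if result == expected else f"FAIL (got {result})"
        print(f"{description}: {status}")
```

Each case pins down one behaviour (empty board, full surround, board edges, not counting the cell itself), which gives the agent a concrete checklist instead of a one-shot prompt.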
 
Currently, the best way to use these products is not to one-shot, but to take advantage of their ability to design and think through a problem, then work with the tool to generate a list of test cases, and then methodically work through that list. You should also have a specific context-management strategy; these tools are not smart enough to manage context on their own. Working this way, I'm able to get really good results very quickly.

To me, this experiment basically was: let's use a tool in a non-optimal way and then judge it.
Which IMHO is still a valid use case. Not all code is rigorously implemented with CI/CD or even separate test/prod environments. In the workplace, one big use case for a lot of these tools will be from business people who make a tool to automate some process they do daily or even things like office automation.

These kinds of tools are so big in my company they call them end user tools. LLMs will expand the kinds of tools that get built like this and none of them will start out with a well thought out plan or any kind of test-driven development.

And that's not even considering home use cases or small office tools. A ton of use will be prompts like in this article. Something like "make me an app that lets me track X" and then they'll iterate through features that they want to add.
 
I pasted the prompt from the article verbatim into Gemini Pro (3), and it gave me the code for a single HTML file containing the HTML, CSS, and JavaScript. The game functioned perfectly well and handled flagging with long presses on mobile and right-clicks on desktop. The bonus feature it added was a shield hidden in one random safe square. When you click that square, the shield becomes active (it shows in the header at the top). With the shield active, if you make one mistake by clicking a bomb, it automatically protects you from that bomb and just flags the square instead. So it's like a one-time-use "get out of jail" card that you first have to find in a safe square. It was a nice touch. The sounds also worked well, though they were the typical primitive beeps and blips.
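A minimal sketch of how a one-time shield like that could be wired into the reveal logic. This is not Gemini's actual output (which isn't shown here); the `Game` and `Cell` classes and the string results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    mine: bool = False
    shield: bool = False  # hypothetical: one safe cell hides the shield pickup
    revealed: bool = False
    flagged: bool = False


@dataclass
class Game:
    cells: dict[tuple[int, int], Cell]
    shield_active: bool = False
    lost: bool = False

    def reveal(self, pos: tuple[int, int]) -> str:
        cell = self.cells[pos]
        if cell.revealed or cell.flagged or self.lost:
            return "ignored"
        if cell.mine:
            if self.shield_active:
                # One-time protection: consume the shield and flag the mine instead of exploding.
                self.shield_active = False
                cell.flagged = True
                return "shielded"
            self.lost = True
            cell.revealed = True
            return "boom"
        cell.revealed = True
        if cell.shield:
            # Picking up the shield arms it for the next mistake only.
            cell.shield = False
            self.shield_active = True
            return "picked up shield"
        return "safe"


if __name__ == "__main__":
    game = Game({(0, 0): Cell(shield=True), (0, 1): Cell(mine=True), (1, 0): Cell(mine=True)})
    print(game.reveal((0, 0)))  # picked up shield
    print(game.reveal((0, 1)))  # shielded
    print(game.reveal((1, 0)))  # boom
```

The shield is just a boolean that the mine branch checks before ending the game, which is why it only saves you once.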
 
Details of what the systems can and can't theoretically do aside, the work is all somewhat fungible.

Fewer per unit means fewer people employed, and/or new work being invented for them to do.
Yes and no. If the economy is a crystal frozen in time that never expands or contracts, then yes. If things change (e.g. more jobs or fewer jobs, due to events, ancillary inventions, or tastes), then no.

For example, 200 years ago the number of jobs in software was zero. Tomorrow, who knows? Things change.
 

42Kodiak42

Ars Scholae Palatinae
1,488
If one person can do that much work how is the work valuable? Sounds like this is going to turn into outsourced work like everything else with a handful of people running the show. What's going to happen to all the juniors? The web developers? The people who just spent $10k+ to go to a bootcamp with the promise of a job? All that is dead now and if you can't see that I don't know what to tell you. I'm not trying to discredit the people at Ars since I like Ars but I did just read an article where two journalists produced 99% working code from a minimal prompt. This will be a bloodbath that will leave a handful at the end.
Bear in mind, the journalists asked an AI to do something akin to a learning exercise with no productive value. The minimalism of the prompts indicates this more than it builds a strong case for the AI. "Clone Minesweeper" is not a realistic set of requirements, and it gives the AI an opportunity to evade the actual problem solving involved in software development.

What this test leaves totally absent is requirements refinement. The AI was given a task that involves no need to interface with a human being whatsoever, no need to break down a problem written in English, no need to deal with a customer who isn't aware of what is and isn't reasonable in terms of software development.

This test really only shows the "best-case-scenario" for whether an AI can write functional code, but is way too detached from what makes programming a valuable skill. I'm a software engineer myself, and writing code that I already know how to write is a menial part of my job. Most of my work involves figuring out what my software actually needs to accomplish in the real world and understanding an incredibly vast project so I can modify it to make it accomplish those objectives.

With no need to refine requirements and minimal need to figure out how to accomplish a real-world effect, I cannot take this as an indication of AI's ability to work on anything more advanced than the reinvention of common software tasks. Minesweeper is just too well known and easily referenced for its recreation to be evidence of the AI completing valuable legwork. The surprise features range from utterly trivial to ill-conceived: the sorts of features that would be tacked on because they're easy. All of this is, at best, a project for a sophomore software student, meant to teach more than to demonstrate ability.

A demonstration of AI being able to replace a software developer will at least require it to refine requirements with a layman to create a project that hasn't been publicly dissected a million times already.
 

jparsly

Smack-Fu Master, in training
1
Used to play a lot of minesweeper. Does anyone remember that there was a way to cheat?
It involved typing XYZZY and some specific actions with the mouse. Once the cheat was activated, a pixel in the top corner of the screen (not the game window) would light up when the mouse was hovering over a square that concealed a mine.
 

S4WRXTTCS

Ars Scholae Palatinae
1,394
You guys should really start including the helpline number on these articles. As someone who's been coding for my entire life these stories make me feel so depressed and really make me think that I've wasted my life on nothing. I wish I would have built cabinets or just stuck with that retail job.
I share a lot of your feelings. But there is a different way of looking at this.

You participated in an industry that documented so well and accomplished so much that you enabled the code to make the code.

This is not an accomplishment that any cabinet maker can claim.

At this point why would we not tax AI coding agents to give a fixed income salary to old coders?
 

EarlD

Smack-Fu Master, in training
39
Subscriptor
it's also the best moment in history to start your own software business!
A great place to start your own software business would be to create software to support Volunteer Fire Departments.

Check out this recent article about the lack of options in this area:
Link
 
A great place to start your own software business would be to create software to support Volunteer Fire Departments.

Check out this recent article about the lack of options in this area:
Link
What a random tip. I have family who are volunteer firefighters in CT. I should ask what kind of software they need.
 

NCG_Mike

Wise, Aged Ars Veteran
117
Unrelated, kind of: I asked ChatGPT if it could reverse engineer the C64 game “Drop Zone”. It offered to look at the 6502 code and the interrupt (VBL) code, work out how to export the graphics, create a player to emulate the sound effects, and convert it all to C++ using Cocos as a framework.

I suspect, with guidance and a month, it could get it working.
 

zogus

Ars Tribunus Angusticlavius
7,269
If it's a codebase you intend to work with for a long time, you should care because AI is more likely to get tripped up by complex unmaintainable code.

Your compiler example isn't analogous because the compiler operates on your high level code as its source of truth. The problem with AI code is that the output becomes the input next time. That invites rot and drift away from the original intent. Maybe if you wrote and maintained a spec and the AI worked from scratch every time, the code quality wouldn't really matter.
I explained to my colleagues that I’d be on board with vibe coding for internal applications once they invent vibe product management. As it stands, vibe coded stuff seems headed for exactly the same pit in hell where all those write-once-read-never VBA apps ended up, except the pit is much deeper this time.
 

rr6013

Ars Scholae Palatinae
691
AI isn't going away for computer code. BUT, I think corporations are making the same mistakes they made in the 2000s with offshore and H1B all over again.

They aren't investing in college grads. It lays a trap that screws the industry 5-10 years down the road. Especially since a lot of Gen X can retire early. You need the trusted and experienced people to oversee this all.

Instead of replacing, it should augment.

But if you want to keep AI from replacing all the workforce the biggest thing you can do is make sure AI output isn't able to get copyright or trademark claims.
Colleges and universities failed businesses… I'm no fan of higher education and no booster of business. I went through the CompSci program in the early '70s, the mainframe era, and took what was offered at the local college: a 13th-dimensional calculus researcher on sabbatical teaching programming languages.
That was pre-Unix. In 1991, I went back to Colombia to refresh on the Unix bits.
Notably, studies of experienced coders found that people who used AI tools felt more productive, but were actually slower than people who didn't use AI "assistance."
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

In other words, if you're working on a serious project, AI "vibe coding" turns out to be a placebo that fools your brain's dopamine system into thinking you're more productive.
WHAM!!

Immediately, AI hits the proverbial wall: "12 lines of code per programmer per day."

This article highlights where the technology operates best: at a low level of granularity, i.e. Minesweeper vs. a functional description.
Modularity is the ecosystem AI existentially favors. That may foster an organizing efficiency multiplier, i.e. reuse for future integration with developer environments.
As a test-case building block, that's my useful takeaway. How language and namespace dependencies are projecting a new "patent" system for software is AI's limiting orthogonal factor.
 

stifle

Wise, Aged Ars Veteran
164
Subscriptor
What was the actual cost of this? I see it mentioned a private subscription was used, but I never saw any figures. One of the things (among many) that I find lame about AI coding is that it's pay to play.
The cost to the environment was higher than the dollar cost.
 

MilesArcher

Ars Centurion
340
Subscriptor
You expected Claude to come up with a whole deep learning/reinforcement learning pipeline on its own?

That seems... optimistic.
You kind of miss the point. I expected it to steal one that it found elsewhere on GitHub. But the main point is that it claimed to do so, appeared to run, and wasn't useful.
 

SraCet

Ars Legatus Legionis
17,208
You kind of miss the point. I expected it to steal one that it found elsewhere on GitHub. But the main point is that it claimed to do so, appeared to run, and wasn't useful.
You're right, I wasn't aware that those were your points.

Are there a lot of RL pipeline examples to steal from on GitHub? I just googled it and found one for Connect 4 specifically, but it's unclear how well (or if) it works.

What you're asking for is pretty complicated stuff. If anything is sub-optimal about the neural network architecture, or the hyperparameters, you're not going to get a good result.

I just asked ChatGPT what neural network architecture it would use for Connect 4 and it's assuring me that three convolutional layers is plenty, which strikes me as extremely suspect, since that wouldn't even allow information to propagate from one side of the 7x6 board to the other side...
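The receptive-field arithmetic behind that suspicion, as a quick sketch. It assumes stride-1 3x3 convolutions with no pooling or dilation (an assumption, since the exact architecture ChatGPT proposed isn't given): each such layer extends an output cell's view by one cell in every direction, so three layers give a 7x7 window, and a corner cell can only see 3 columns away, not the 6 needed to reach the opposite edge.

```python
def receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Width of the input window one output cell can see after stacking
    `num_layers` stride-1 convolutions of odd size `kernel` (no pooling, no dilation)."""
    return 1 + num_layers * (kernel - 1)


def reach(num_layers: int, kernel: int = 3) -> int:
    """How many cells away from itself an output cell can see in each direction."""
    return num_layers * (kernel // 2)


BOARD_COLS, BOARD_ROWS = 7, 6  # standard Connect 4 board

for layers in (3, 6):
    # A corner cell must see (BOARD_COLS - 1) cells away to reach the opposite edge.
    sees_across = reach(layers) >= BOARD_COLS - 1
    size = receptive_field(layers)
    print(f"{layers} layers: receptive field {size}x{size}, corner sees opposite edge: {sees_across}")
```

Caveat: this only covers the convolutional stack. AlphaZero-style networks typically end in fully connected policy/value heads, which do mix information from the whole board, so the three-layer conv stack alone doesn't settle whether the network can reason globally.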
 

zogus

Ars Tribunus Angusticlavius
7,269
Yeah... It's called vibe-playing. You don't need the game to play it.
That’s ancient news. The classic open-source roguelike Angband gained a “Borg mode” all the way back in 1996, with the goal of programming a bot that can finish the game unsupervised. At the time, players joked that the game was so advanced you no longer needed to play it.
 
I'd like to see a similar experiment with more iteration. LLMs seem okayish at spitting out somewhat reasonable code on the first go around but basically every LLM I've tried (and similar results in discussions with other people and online content) points to them slowly destroying the programs when asked to debug themselves. Usually after one or two fixes they just start degrading existing features until the program eventually stops working entirely and the AI enters a death spiral.
 
This is the same way I feel about generative AI and image generation.

I have to come to terms with the fact that this stuff has gotten really good. On the back of stealing everyone's work, but that ship has sailed. It will generate the fuck out of some images for you.

But when you actually want to do client work, with specific needs and revisions and long term thinking?

Suddenly it starts to get super wonky. Things that would be easy with a layered Photoshop file can become a dance of "no, just change that part, stop fucking up the rest of it".

To really use it well you basically need to also be good at Photoshop (or whatever similar software). A smart user could use prompt generation to build things in pieces and composite and adjust in layers and be able to stay responsive.

I suspect coding will be similar, in that yes, it can "make a program", but if you want something that's actually useful in the long term it will need a programmer who can drive and oversee and do things in pieces etc.

To go beyond that would require a generational leap in tech that we have not seen yet, and may or may not ever happen.
Bolding mine.
That shit pisses me off more than any computer game ever has. Spend an hour getting a pic aaaaalmoooost right and suddenly there's a random ice cream truck and the dwarf king's face has gone from beard to Wookie vagina.

I'm just an occasional/light hobbyist, but yeah, I do more work in GIMP 2 than in my prompt engineering. And I've actually been trying out different ways to build mosaic-type/layered pics, as you mentioned (thanks for the validation on that).

Back on topic: my coding skills are exactly equal to "use Notepad to edit computer game files that open in Notepad / use WeiDU to make my own NPCs for Baldur's Gate 1&2 / figure out what works through trial and error."

But even I look at someone just copy/pasting generated code (or frikken legal documents you're handing to a judge, FFS) without checking it first and think, "god, I hope they don't breed."
 
If one person can do that much work how is the work valuable? Sounds like this is going to turn into outsourced work like everything else with a handful of people running the show. What's going to happen to all the juniors? The web developers? The people who just spent $10k+ to go to a bootcamp with the promise of a job? All that is dead now and if you can't see that I don't know what to tell you. I'm not trying to discredit the people at Ars since I like Ars but I did just read an article where two journalists produced 99% working code from a minimal prompt. This will be a bloodbath that will leave a handful at the end.
I get the stress, more than most since I've been homeless before. Many places have free legal help, and you would be surprised how long you can tie it up in court just to keep a roof over your head a little bit longer, so if you haven't already I strongly advise looking into that.

But when it comes down to it: Can you think of a single person on their deathbed that ever uttered "I wish I spent more time working" as they shuffled off this mortal coil?
 

Bobb Ansig

Smack-Fu Master, in training
65
I get the stress, more than most since I've been homeless before. Many places have free legal help, and you would be surprised how long you can tie it up in court just to keep a roof over your head a little bit longer, so if you haven't already I strongly advise looking into that.

But when it comes down to it: Can you think of a single person on their deathbed that ever uttered "I wish I spent more time working" as they shuffled off this mortal coil?
The problem is not time spent working, which I'd like to reduce myself, but time one can get paid for working. AI (and robotics, and conventional programs) are going to continue to displace human labor. Will they generate new job opportunities as they do so? Will these be as many as the ones lost? Will humans be better than machines at the new jobs? Or will they just go immediately to machines?

My answers are Some, No, Mostly not, and Mostly yes.

Things will be worst across the board for new entrants to fields like coding, art, writing, teaching, long-haul trucking, radiology, and so on. Senior experts / seasoned pros will be needed for some time to come in most of those fields, certainly long enough for senior incumbents to reach retirement, but AI is going to hoover up the entry-level work at a rate that may actually eliminate the entry level in multiple present-day professions. There is a coming crisis for new graduates, and nothing resembling a plan to address it in any nation I'm aware of.
 
The problem is not time spent working, which I'd like to reduce myself, but time one can get paid for working. AI (and robotics, and conventional programs) are going to continue to displace human labor. Will they generate new job opportunities as they do so? Will these be as many as the ones lost? Will humans be better than machines at the new jobs? Or will they just go immediately to machines?

My answers are Some, No, Mostly not, and Mostly yes.

Things will be worst across the board for new entrants to fields like coding, art, writing, teaching, long-haul trucking, radiology, and so on. Senior experts / seasoned pros will be needed for some time to come in most of those fields, certainly long enough for senior incumbents to reach retirement, but AI is going to hoover up the entry-level work at a rate that may actually eliminate the entry level in multiple present-day professions. There is a coming crisis for new graduates, and nothing resembling a plan to address it in any nation I'm aware of.
Personally, I'm hoping they displace enough human labor that we finally get the point and just move past this outdated "everybody has to have a job or starve in the streets" crap. I always kinda figured the whole point of technology was to do our work for us, and with fiat currency literally able to be produced at will in any denominations or amounts needed (thus the "fiat" part), the scarcity of $ is artificial and arbitrary.

I seriously doubt it will happen, because deep down we Homo sapiens are not Good or Evil (we're just mammals), and we're probably going to go extinct soon because we're also pretty stupid mammals that generally only think about ourselves and the content of our own lives.

But I'm also one of those weirdos who considers the short-term future to be "the next 100 million years". And I'm autistic, so beyond the immediate "gotta eat to live" concept, y'all's weird obsession with jobs is honestly just insane to me.

And thus, if your job is the most important thing in your life, you should probably try to get a better life.

And finally, to all those dipshit libertarians out there that are extremely offended by these beliefs: :ROFLMAO:
 
You guys should really start including the helpline number on these articles. As someone who's been coding for my entire life these stories make me feel so depressed and really make me think that I've wasted my life on nothing. I wish I would have built cabinets or just stuck with that retail job.
Why? Just because some AI built a game that has been implemented hundreds of times over in publicly available open-source codebases? Come on, now...
 
As a programmer, you have a lot of options to change industries. Most IT infrastructure has scripting support. PowerShell is easy to pick up. Scripting for SANs, VMware, Citrix, Azure, Ansible: all of these are available to you.

Don't freak out just yet. You can scaffold things with AI, but trying to make fine-grained changes to any of these games with prompts alone will eventually fail and result in non-working code. You've probably got 2 or 3 years before someone solves the context window problem. I've changed industries 5 times since the 2000s internet bubble, riding various other bubbles. Be adaptable, and look for oblique ways to expand your skills. Understanding what good code looks like, how APIs work, the RESTful nature of the internet, etc. lets you understand modern systems in a way that non-programmers don't.
hold on... you think SCRIPTING is safe? BWAHAHAHAHAHAHAHA
 

Bobb Ansig

Smack-Fu Master, in training
65
Personally, I'm hoping they displace enough human labor that we finally get the point and just move past this outdated "everybody has to have a job or starve in the streets" crap. I always kinda figured the whole point of technology was to do our work for us, and with fiat currency literally able to be produced at will in any denominations or amounts needed (thus the "fiat" part), the scarcity of $ is artificial and arbitrary.

I seriously doubt it will happen, because deep down we Homo sapiens are not Good or Evil (we're just mammals), and we're probably going to go extinct soon because we're also pretty stupid mammals that generally only think about ourselves and the content of our own lives.

But I'm also one of those weirdos who considers the short-term future to be "the next 100 million years". And I'm autistic, so beyond the immediate "gotta eat to live" concept, y'all's weird obsession with jobs is honestly just insane to me.

And thus, if your job is the most important thing in your life, you should probably try to get a better life.

And finally, to all those dipshit libertarians out there that are extremely offended by these beliefs: :ROFLMAO:
I'm with you, though I'm much more concerned with humanity surviving over the next several centuries. Beyond that, technological and external events are so uncertain that planning is impossible.

IME, humanity is inherently greedy and lacking in empathy. As a race, we have known what is "good" for millennia, and yet our society would collapse into violent anarchy overnight but for systems of police, courts, and prisons.

It's pretty simple, IMO. On an individual level, treat everyone as you would wish to be treated yourself (The Golden Rule). On a societal level, design systems of government and laws as if you had no idea if you were going to be poor / weak or rich / powerful (Veil of Ignorance). But humanity doesn't even come close to this, and we're only able to approach it in a few places where there is plentiful wealth on a per-capita basis.
 

OrvGull

Ars Legatus Legionis
11,931
That’s ancient news. The classic open-source roguelike Angband gained a “Borg mode” all the way back in 1996, with the goal of programming a bot that can finish the game unsupervised. At the time, players joked that the game was so advanced you no longer needed to play it.
There was also Progress Quest, the granddaddy of modern idle games.
 