OpenAI’s math breakthrough played to AI’s strengths

graylshaped · 2026-06-01T12:54:30-0400

wildsman said:
You joking? They are the ones that did it!

The guy I quoted confidently wrote that 'they used a specialised model' - he provided no source and yet you swallowed that whole.

I provide the only source you can trust and you question it.

If you don't believe OpenAI, why even believe that a model did this? Maybe a mathematician did it and they just took credit for it - in that case, this whole article by Ars is based on a press release.

Conspiracy theorists are truly nutcases...

What source did you provide? All I see is that this was a general-purpose model trained for this task. Many people believe that represents the best use of this technology: bespoke models specifically and exhaustively trained on curated data to perform well-considered and defined tasks.

The chain-of-thought document describes the model taking a largely brute-force approach, pulling up various existing theories that fit the problem and running through the exercise, in a manner that calls to mind Tsimerman's comment that his work on an approach like this was so tedious it "wasn't worth the effort." Lo and behold, here the model proves its true value: it does not value its time the way humans do and will keep mechanically plugging away until something seems to fit. According to the article and the links, with all the resources and tokens OpenAI can throw at it, it found the answer only half the time during its trials.

This is a worthy milestone in the development of these models; the downside is that it will of course lead more people to mistakenly conclude that adding more monkeys to the group throwing things at the wall to see what sticks is "thinking." And heaven knows OpenAI can use the $500 prize for proving why Erdős himself struggled to prove his conjecture, even while falling short of solving the question itself.

edit: I see, as I work my way through the comments, that MilanKraft also calls out the infinite monkeys analogy...

graylshaped · 2026-06-01T13:24:49-0400

Uncivil Servant said:
On the one hand, this sounds a lot like the old spam post you used to see on poorly-moderated fora along the lines of "my friend quit her job to work from home, and now she makes over $5,000 a month using this one simple trick!"

I strongly suspect there is a reason why it sounds like spam marketing fluff.

graylshaped · 2026-06-01T13:31:12-0400

Uncivil Servant said:
Dude named after Lenin posts about his desire to "get rid of people". If you're not trolling, then I'd strongly suggest some introspection.

What I found interesting was the confidence of the dude named after Lenin that he was among the enlightened, and therefore suited to cast judgment on others' perspectives about how society's resources are best employed.

graylshaped · 2026-06-01T13:32:30-0400

wildsman said:
It's a spectrum of abilities and it's very jagged - a genius physicist or surgeon might struggle to understand basic things only slightly adjacent to their field.

Let's see if you have the ability to read your own words and apply introspection.

graylshaped · 2026-06-01T13:43:48-0400

SomewhereAroundBarstow said:
... I'm not anti-AI. I'm anti-abuse of AI.

This seems to me like a perfect use case for AI. It extends human capabilities and human knowledge. No one was able to solve this problem until they used AI as a tool to help do it.

What I object to is using AI to do things that humans not only can do but which are expressions of what it means to be human.

I'll subscribe to this newsletter, with a note to the editor: it is less that I object to using these tools to do those things than that I object to misrepresenting what using those tools means, AND to ignoring that our choice to use them without consideration affects who we are.

I can:
1) Paint something; take a photograph; look at my pantry and employ my life experience to put together a tasty dinner from what's there;
2) Buy a picture or photograph, or a paint-by-number set; use GIMP to manipulate and mask the errors in my photography; look up a recipe or buy one of those pre-assembled kits and cook dinner with it;
3) Commission a painting or photograph (whether my patronage goes to a human artist or a generative model is irrelevant to this point, though certainly relevant to society) and have it revised to the taste I failed to express when originally requesting it because I didn't know what I really wanted; go to a restaurant and have someone cook for me; buy a package at the store and stick it in the microwave, etc...

All of these things are valid approaches to obtaining a product of creation. How we feel about which tier of approach we choose is meaningful in both an individual and a collective sense.

graylshaped · 2026-06-01T13:50:05-0400

Uncivil Servant said:
You're taking as a given that these LLMs think like us, that they are indistinguishable from us, and that they have our capabilities and better.

And you cannot consider why people are dismissing your ideas? You've created a godhead and you can't even recognize it.

Hey, it's Zen. Enlightenment will come to us when we embrace ~~nirvana~~ the singularity.

graylshaped · 2026-06-01T13:52:47-0400

Albino_Boo said:
The real question is if AI recommends pineapple pizza because pineapple pizza is the best pizza

Now THAT's poking the bear.

graylshaped · 2026-06-01T13:58:55-0400

S_T_R said:
Economics calls this the law of diminishing marginal return.
...
Extracting further improvements from the existing paradigm is going to get exponentially more expensive if they can't figure out a new avenue. They've already reached diminishing returns on training and recall phases of the process. I'm not sure where they're going to find more juice to squeeze.

Acknowledged and glossed over even in detailed discussions with the major players, the plan has always been:

1) Maximize what can be achieved using current methods and all available resources;
2) [Discontinuity Breakthrough on the order of discovering Heechee technology]
3) Profit and World Joy for All!

graylshaped · 2026-06-01T14:13:19-0400

bugsbony said:
Check the last paragraph:

"At the same time, we haven’t fully explored what current models can achieve in math. Soon after OpenAI’s announcement, University of Michigan postdoc Xiao Ma found that GPT-5.5 was also able to prove Erdős wrong if given a small hint. If a generally available model could disprove this famous conjecture and no one noticed, what other discoveries could happen today that no one has thought to try?"

It's not that much of a leap to think that the next gen general model can do that without a hint.

Think about that. "After it was announced," other models could get there if pointed in the right direction. Future models will have that knowledge in their training data. Incremental improvement is to be expected, if not required just to keep pace, which is why training of models shouldn't be considered a one-time start-up outlay. It is an ongoing (and opaque) component of operating cost, measured over lifecycles that are pretty darn short.

graylshaped · 2026-06-01T14:17:43-0400

GenericAnimeBoy said:
Do I need to adjust my understanding of decidability, or are the LLM salesmen just obfuscating things with handwaving and "do it in an AI model"?

Is this before or after we spend eight pages of comments regurgitating debates in other threads about what definition of "understanding" we start with before people who really want us to accept that "AI" is Just Like Real Folks move the goalposts on us?

Which some will do to avoid admitting the answer to your question, at the end of the day, is "the latter."

[ducks]

graylshaped · 2026-06-01T14:34:20-0400

wildsman said:
You didn't look up the thread?

Two people put this without any sources:

I asked what source YOU provided. Are you referring to saying "RTFA"? The article AND the links in it all say it's an unreleased general-purpose model, but say nothing else, and I suggest you are likely overstating how generic this model and its training may have been. Very few rational people are going to accept that this one time OpenAI is giving us all the information.

graylshaped · 2026-06-01T14:35:21-0400

hanharal said:
OpenAI said it.

Proof: Planar Point Sets with Many Unit Distances
Model's chain of thought (abridged): Rewritten Chain of Thought for the Solution to the Unit Distance Problem
Remarks by mathematicians: Remarks on the Disproof of the Unit Distance Conjecture

Disproof and proof are different things. The article and the ones you link to explain this.

graylshaped · 2026-06-01T14:38:16-0400

hanharal said:
First it was autocomplete.

Pretty amazing that autocomplete could solve a mathematical problem which has stumped mathematicians for 80 years.

Dunno about you, but I had a lot of math and many of the questions involved filling in a blank of some sort.

edit: in no way am I suggesting that filling in many blanks is trivial. I am saying, and have alluded to this in comments above this one, that the ability to frame the right question, and to know whom to ask it, is often far more important. Yes, that acknowledges the value in so-called "prompt engineering" AND the value in knowing which knife to use to pare the apple and which knife to use to debone the trout.

graylshaped · 2026-06-01T14:45:53-0400

StevenTMuc said:
Agreed. The reaction to, in essence, a million monkeys smashing the keys on typewriters 24/7 putting out a paragraph from Romeo and Juliet word for word is a bit over the top imo, especially considering what it is costing us to have those monkeys typing away 24/7.

It's weird that they don't mention anywhere how many tries this took (I'm willing to bet it wasn't a one-shot).

Somewhere in there is a reference to a table provided by OpenAI indicating it was a fifty/fifty process, which I read to mean that across an unspecified number n of trials, it found this answer about half the time and failed the other half. Also unspecified is who did the initial validation before declaring Eureka! and sending this out to the mathematicians whose evaluations subsequently became part of OpenAI's PR campaign. The mathematicians' remarks also referred to buzz in that community over the model's apparent success before it went wide.

graylshaped · 2026-06-01T14:52:41-0400

idontevenexercise said:
Great write-up.

Not to lessen the accomplishments of the engineers (and presumably mathematicians) who created this internal OpenAI model, but there is a key phrase in your article that sticks out: "But it didn’t pioneer any genuinely new techniques."

You are right, of course. And this to me indicates a distinct possibility as it relates to these LLMs that many people are so enamored with. It's possible that it actually cannot create anything genuinely new, not now, and maybe not ever - just mashups of what it has been trained on.

I am going to stun people who want to put me in a box of their design and say something positive about these tools: They are excellent at identifying possible connections that represent a true synthesis, in the spirit of what Johansson called the Medici Effect.

What they have yet to show is that they have any ability to tell if the work they bring home adds true value, or if it's something only a parent would hang on their refrigerator.

graylshaped · 2026-06-01T15:05:24-0400

wildsman said:
Are you missing that the poster I was responding to made a confident statement about the type of llm without providing any source?

Boy you guys are really a cult now aren't you? You won't even try to look at anyone in-group...

You won't even look at what YOU wrote. You said you provided a source. Which source? I found none offered by you in this thread other than the "RTFA" comment. The "FA" does not go as far as to say what you assert, either.

graylshaped · 2026-06-01T15:19:08-0400

nitsujmai said:
This reads as cope. Assuming it was 50/50, it still accomplished the end goal and served as an incredibly useful tool for the humans working on the problem. These models have not plateaued and clever people will continue to use them to advance science and math. Just as humans have always used tools.

This is why I equate these comments to anti-vaxxers. The mRNA vaccines have side effects: "Not perfect, ban it!!" "Scary new tech made by evil pharma lizard people!". Meanwhile an mRNA vaccine just brought us to striking distance of curing pancreatic cancer. We are just scratching the surface with generalized LLMs. Maybe it leads nowhere, but there is enough evidence now that makes it worth exploring.

A "cope." Noted for future reference.

OpenAI has been vague on details here. What we can glean is that the process they used had a failure rate of fifty percent. With the same data and the same model, getting the right answer from a tool capable of producing it was a coin flip, both before and after they knew it could be done. The target did not move. The tool was the same. Did the model they used even identify "This one works. This one does not"? We don't know. The overall process, which tends to be what I care more about, did make that determination, and this is a milestone. Absolutely, this was a problem that, for longer than my lifetime, was impractical to solve using other methods.

I applaud the developers of the tool and the mathematicians guiding it, validating it, and helping us understand what this means, and ask the reasonable question: "Can we know more about the true cost of this coin flip tool?"

I see this success in that context, rather than from the context of "Eat it, anti-AI people!"

Sorry if that harshes your mellow.

graylshaped · 2026-06-01T15:26:15-0400

nitsujmai said:
Wasn't there just an article about how AI didn't change its stance when presented with new evidence? And commenters like you were using it as evidence of your view? And now the same people are in denial of this well written article, even going so far as to pull an unrelated analogy from 18 years ago? So funny.

https://meincmagazine.com/ai/2026/05/...en-after-explicit-warnings-that-theyre-false/

That's not at all what that article was saying. It was saying when provided with assumptions identified as false, the models were prone to using the false assumptions in forming their output without regard for their lack of validity.

graylshaped · 2026-06-01T15:30:17-0400

nitsujmai said:
It's happening slowly, and bifurcating into those interested in learning and those that autocomplete the comment box with nonsense they hear in their 'information' bubble.

There are two kinds of people in this world: Those that insist on putting everybody into boxes.

nitsujmai said:
I applaud Ars for doing a deep dive on one of the many fascinating stories showing the usefulness of these tools in augmenting human progress.

Agreed!

graylshaped · 2026-06-01T16:18:33-0400

CyrixInstead said:
I wonder what the token budget actually was?

According to the article, "maximum":

... an OpenAI chart revealed that even with the maximum token budget, the internal model solves the problem only half of the time.
