OpenAI’s math breakthrough played to AI’s strengths

Geebs · 2026-06-01T11:01:04-0400

wildsman said:
You joking? They are the ones that did it!

The guy I quoted confidently wrote that 'they used a specialised model' - he provided no source and yet you swallowed that whole.

I provide the only source you can trust and you question it.

If you don't believe OpenAI, why even believe that a model did this? Maybe a mathematician did it and they just took credit for it - in that case, this whole article by Ars is based on a press release.

Conspiracy theorists are truly nut cases...

The OpenAI press release and their paper are extremely vague about the actual model used, how many times it was run, how much it cost, etc. It’s not worth arguing about unless and until they release more information.

wildsman · 2026-06-01T11:01:04-0400

JohnDeL said:
Because that has been well-documented.

For this, all we have is the word of the company that wants to sell people on their general LLM.

This entire article is about OpenAI using their model to disprove a problem. Any documentation on methodology is THEIR documentation.

You haven't shown a single shred of evidence that they used 'a specialised mathematics model' and yet confidently claimed it. When I asked for evidence - you say "that has been well documented".

Where is the documentation? Please share it here.

wildsman · 2026-06-01T11:02:14-0400

Geebs said:
The OpenAI press release and their paper are extremely vague about the actual model used, how many times it was run, how much it cost, etc. It’s not worth arguing about unless and until they release more information.

Yes but the one thing they do share is that it is not a specialised model. This is the one claim I'm pushing back against.

A couple of posters have confidently claimed this.

wildsman · 2026-06-01T11:05:29-0400

TrainDubs said:
There's a lot of people on this site that hold that AI cannot possibly be useful or good and hold it with the intensity of a religious belief. Therefore anything to the contrary must be a lie.

It's freaking impossible to have any kind of useful/interesting conversation about AI on the internet because you've got LinkedIn bros talking about how it's going to take away everyone's jobs and high-fiving each other over recreating serfdom and then you've also got people insisting it's still 2023 and AI systems cannot produce anything of value and use 5000000000000000 gallons of water every nanosecond.

The thing is that there are a lot of interesting anti-AI (and anti-Big-AI) positions to take and I hold quite a few of them. These religious fanatics will preempt any such conversation with dull inanities like 'LLMs can't think or reason or understand'.

SomewhereAroundBarstow · 2026-06-01T11:06:47-0400

wildsman said:
The thing is that there are a lot of interesting anti-AI (and anti-Big-AI) positions to take and I hold quite a few of them. These religious fanatics will preempt any such conversation with dull inanities like 'LLMs can't think or reason or understand'.

If you ask me (and I know you won't) the religious fervor is entirely on the other side of that argument.

Castellum Excors · 2026-06-01T11:08:35-0400

confusedpolarbear said:
We really need to stop anthropomorphizing AI systems in language. They are not smart, cannot reason, cannot think, and cannot conceptualize anything. They are markov chain generators stacked on top of each other until the Amazon burns down entirely.

I'd dare say that if we were here, philosopher David Hume would argue we ourselves are nothing more than Markov chain generators, stacked on top of each other.

But he was also racist, so he couldn't be that clever.

Techlight · 2026-06-01T11:09:22-0400

Is there an article somewhere (or room for Ars) on why LLMs now are so much better at maths, while they could not figure out anything before? An "old" trick used to be to ask it to generate code and run it, but I'm somewhat sure this is not what happens behind the scenes when you just ask - but it still gets the answers (more) right. Is this just additional training on specific sources, or is there some "wrapper" layer that decides if a prompt calls for maths and hands it over to a more specialised system?

Curious about this as some time ago the whole point was that these systems can't count because they just predict text. But how is that prediction mechanism now so much better if there are no fundamental changes to how the models are trained and work?

Albino_Boo · 2026-06-01T11:11:44-0400

Veritas super omens said:
Hmm...what kind of glue is best to hold the cheese to pizza? My local place uses urethane but a friend swears by the pizza he gets, they use cyanoacrylate. Epoxy though..? Too crunchy.

The real question is if AI recommends pineapple pizza because pineapple pizza is the best pizza

SomewhereAroundBarstow · 2026-06-01T11:15:24-0400

Veritas super omens said:
Hmm...what kind of glue is best to hold the cheese to pizza? My local place uses urethane but a friend swears by the pizza he gets, they use cyanoacrylate. Epoxy though..? Too crunchy.

Whenever this example of a chatbot saying to stick cheese to pizza with glue comes up I wonder if it was drawing on something similar to a TV commercial I remember for a brand of frozen pizza in the '80s that went something like this:

"Do you like the cheese on that pizza?"
"Yep."
"What if I told you it wasn't cheese?"
"Not cheese?"
"It's casein, the main ingredient in some glues."
"GLUE?!?"

Of course casein is also the main ingredient in milk, but they didn't mention that in the ad. Anyway, I could totally see how an LLM with training data that included mentions of frozen pizzas using artificial cheese made from casein and the fact that casein is used in adhesives would connect the two in its output. Because it matches a pattern and, as the AI boosters like to tell us, that's all our human brains are doing too.

Denkkar · 2026-06-01T11:23:01-0400

Software engineer here, I see the same thing in these two lines:

Erdős conjectured that the number of unit distances would be n^(1+o(1))
human mathematician Will Sawin was able to show that it grows at least at the rate of n^1.014

AI + Will "just" narrowed down o(1) to a constant probably greater than 0.014, which hardly seems to be an "Erdős was wrong" moment to me.

Article was a good read anyway, but math proofs are definitely not my thing.
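
For what it's worth, here is a rough sketch of how those two lines compare, assuming the conjecture is read as an upper bound on the maximum number of unit distances among n points. The symbols u(n) and ε are my own labels, not the article's. In this notation, o(1) is a term that shrinks to zero as n grows, not a small fixed constant:

```latex
% Conjectured upper bound: the exponent tends to 1 as n grows.
u(n) \le n^{1+o(1)}
\iff \text{for every fixed } \varepsilon > 0,\; u(n) \le n^{1+\varepsilon} \text{ for all large } n.

% A lower bound with a fixed exponent above 1 (assumed to hold for arbitrarily large n):
u(n) \ge n^{1.014}
\implies u(n) > n^{1+\varepsilon} \text{ for any fixed } \varepsilon < 0.014 .
```

On that reading, a fixed exponent of 1.014 would contradict the conjecture rather than pin down the o(1) term, though I may be misreading the article's summary.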

S_T_R · 2026-06-01T11:27:27-0400

M_Binks said:
And in 6 months my baby will weigh 7.5 billion lbs.

I'm convinced there's value in AI (there sure was a few years ago, when we called it "machine learning" or "computer vision" or any of a dozen other names). I'm just not sure we can count on it continuing to grow in an exponential, or even a linear, fashion forever.

It's already consumed all the written work on the internet and every book published; we can't just magically double our training corpus. You can use the AI to build more data to train on, but I'm skeptical that that works as well, or that you can keep doing that forever.

Getting that "last 10%" has a funny habit of consuming exponentially more effort than the previous 90%. I'm not sure existing techniques can get us that next little bit, and I'm even less sure that there's a business model that will support continuing to chase improvements forever.

Economics calls this the law of diminishing marginal return. You can't keep using the same tactic over and over and expect similar gains in efficiency that you saw in the past. The AI industry is speed running this principle. First, they did hoover up the whole of digital human knowledge into their model, but it only got us to 2024-level bots. Models plateaued last year when they couldn't squeeze more from the training phase.

Then they added more agents and more context windows, increasing the compute spent per query. That got them to where they're at today: modestly better than last year, but at increased cost.

Extracting further improvements from the existing paradigm is going to get exponentially more expensive if they can't figure out a new avenue. They've already reached diminishing returns on training and recall phases of the process. I'm not sure where they're going to find more juice to squeeze.

Even if they can squeeze more efficiency from this tech stack, it's just going to be the same transformer technology, but better. It might help solve problems like this, that involve drawing associations across related (but known) fields of study by finding correlations humans haven't yet. It's still just making inferences. It's still backwards looking, unable to apply deductive logic to make forward looking predictions.

Danathar said:
Yeah, there are some deep psychological issues going on there, especially with the second group.

That second group seems certain about exactly what human consciousness is, because they’re so sure there’s no way AI could ever be conscious. I find that hilarious, since no one really knows where our thoughts come from.. and I don’t mean the shallow answer, “the Brain.”

Straw man. I'm sure the current crop of LLMs are not conscious because I (broadly) know how they work, and nothing about that process seems like consciousness is likely to arise from it. LLMs are merely sophisticated data retrieval systems that do nothing until prompted by users, just like many other, simpler systems. If I am wrong, then I'd start wondering if querying a SQL database, decoding a jpeg file, or even recalling bits from storage also create and destroy consciousness.

That is fodder for some horror fiction, but I don't find it plausible.

bugsbony · 2026-06-01T11:47:43-0400

SomewhereAroundBarstow said:
Whenever this example of a chatbot saying to stick cheese to pizza with glue comes up I wonder if it was drawing on something similar to a TV commercial I remember for a brand of frozen pizza in the '80s that went something like this:

"Do you like the cheese on that pizza?"
"Yep."
"What if I told you it wasn't cheese?"
"Not cheese?"
"It's casein, the main ingredient in some glues."
"GLUE?!?"

Of course casein is also the main ingredient in milk, but they didn't mention that in the ad. Anyway, I could totally see how an LLM with training data that included mentions of frozen pizzas using artificial cheese made from casein and the fact that casein is used in adhesives would connect the two in its output. Because it matches a pattern and, as the AI boosters like to tell us, that's all our human brains are doing too.

For the last time, that stupid example comes from Google using an AI to summarize top results, which included a parody/comedy article talking about using glue on their pizza. There are plenty of examples of AI saying stupid shit; this is not one of them. Ask even the dumbest local AI and it won't tell you to use glue.

GKH · 2026-06-01T11:47:44-0400

joshua_montgomery said:
It's been ~3 years from the release of ChatGPT to the general public. This technology is just getting started.

Sitting here in front of my box running 10x frontier agents who are supervising another ~30 sub agents running slightly less capable frontier models. For $20/day. Cranking out high quality code at a rate that completely blows my mind. I used to employ rooms full of developers for hundreds of thousands of dollars per month to do 1/100th as much ( or less ).

The world is changing around us in profound and exciting ways. If you haven't taken the time to load a frontier model and work with it to solve ---- even a hobby problem ----- then you risk missing out.

My kid built a robot that plays chess ( robot arm moving the pieces ) using a chat interface and without writing a single line of code. Not one. She built a motion control system, computer vision stack, data collection workflow, training runbook, trained it, and is running local inference......and her interface is the Signal App. Her freaking agent throws emojis. It is bizarro and mind bending.

This article makes it clear that the intelligence will go deeper than that. Probably much deeper.

We tend to overestimate near term disruption and underestimate long term disruption. If AI is doing what it is doing today - less than 3 years from going mainstream - I can only imagine what 30 years will bring us.

The statements "high quality code" and "rooms full of developers" are incompatible. If the latter were true, you do not know for certain that the former is true. If you know for certain that the former is true, the latter is false.

There is programming, programming, and also programming. The "coding" you describe your child as performing via a chat bot is probably better described as smashing together interfaces. Which is something that AI is great at - good documentation is unfortunately incredibly rare, and if AI already ingested every forum post for answers to DenverCoder9's question and a thousand repositories for working examples, that's wonderful. But can your daughter explain the advantages or limitations of any of the libraries she used? Does she know if the flow being applied is efficient, or even necessary? Has she audited the dependency stack? Could she expand or improve any of the tools she's using? Could she design a useful library for others to take advantage of? Could she take her proof of concept and get it to run using 1/100th of the computational resources?

It's the same dichotomy that the wonderfully written article went to great lengths to explain - there is math, math, and also math. There are definitely tasks that AI is well suited and incredibly useful for, but it isn't even close to replacing mathematicians any more than it is even close to replacing programmers.

TrainDubs · 2026-06-01T11:49:27-0400

Techlight said:
Is there an article somewhere (or room for Ars) on why LLMs now are so much better at maths, while they could not figure out anything before? An "old" trick used to be to ask it to generate code and run it, but I'm somewhat sure this is not what happens behind the scenes when you just ask - but it still gets the answers (more) right. Is this just additional training on specific sources, or is there some "wrapper" layer that decides if a prompt calls for maths and hands it over to a more specialised system?

Curious about this as some time ago the whole point was that these systems can't count because they just predict text. But how is that prediction mechanism now so much better if there are no fundamental changes to how the models are trained and work?

My personal suspicion on this is LLMs are able to make good headway in math because:

Math uses incredibly precise language. There's no ambiguity in phrasing like there can be with natural language or even other scientific fields.
The field is incredibly dense and complex. There are many mathematicians out there but none of them have competence in every single sub-field.
LLM-derived solutions are more easily able to be validated than other LLM-derived solutions. If Claude spits a proof out that doesn't work I can pretty readily find that out myself. Much harder if it spits out computer code, circuitry diagrams, etc.
There are lots and lots of well-stated, but unsolved, problems. This gives a fairly clean instruction for the AI to work with
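
One way to make the "easy to validate" point concrete, assuming the proof is written for a proof assistant such as Lean 4 (my assumption; nothing above says the proofs discussed here were formalized): the checker either accepts the proof or rejects it. The theorem name below is made up for illustration.

```lean
-- Hypothetical toy statement; Lean's kernel checks the proof mechanically.
theorem sum_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A wrong proof term here simply fails to compile, which is the kind of cheap pass/fail signal that code or circuit diagrams rarely offer.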

bugsbony · 2026-06-01T11:50:25-0400

Geebs said:
The OpenAI press release and their paper are extremely vague about the actual model used, how many times it was run, how much it cost, etc. It’s not worth arguing about unless and until they release more information.

Check the last paragraph:

"At the same time, we haven’t fully explored what current models can achieve in math. Soon after OpenAI’s announcement, University of Michigan postdoc Xiao Ma found that GPT-5.5 was also able to prove Erdős wrong if given a small hint. If a generally available model could disprove this famous conjecture and no one noticed, what other discoveries could happen today that no one has thought to try?"

It's not that much of a leap to think that the next gen general model can do that without a hint.

SomewhereAroundBarstow · 2026-06-01T11:53:17-0400

bugsbony said:
For the last time, that stupid example comes from Google using an AI to summarize top results, which included a parody/comedy article talking about using glue on their pizza. There are plenty of examples of AI saying stupid shit; this is not one of them. Ask even the dumbest local AI and it won't tell you to use glue.

I've never bothered to read anything in-depth about that particular failmeme, but I hardly see how an explanation in which perhaps the biggest information processing company on the planet would choose to put something as under-baked as what you describe on the webpage that is most central to their business model excuses it. Get annoyed at me for not knowing the details of how it happened all you want, but if anything what you said actually makes it worse.

GenericAnimeBoy · 2026-06-01T11:55:39-0400

So AI companies have been working to develop LLM systems that can directly output a correct solution to any math problem.

Don't we already know that there are math problems to which it is impossible to directly compute a solution (i.e. problems which are undecidable)? Do I need to adjust my understanding of decidability, or are the LLM salesmen just obfuscating things with handwaving and "do it in an AI model"?

rockmuelle · 2026-06-01T11:56:21-0400

Vladimir Ilyich Ulyanov said:
I'd say it's mostly a few middle aged people applying poorly understood philosophical and cognitive science concepts they learned in college in an attempt to satisfy their priors.

It's a bad look for our generation ('81 here) and it's a far cry from what this site used to represent (literally in the name). Damn shame... but it proves we need to get rid of the old people. I just thank my stars I haven't become one of the naysayers... yet.

Here's my take, as someone who's been enmeshed in software since the early 90s and could be considered a "naysayer" on the current batch of AI tools: we're not so much naysayers as we're cautious. We've seen this playbook so many times and know that it often doesn't end well in the short term, even if there are good long term benefits.

To focus on one aspect that doesn't get much discussion: tools change, evolve, and sometimes cease to exist. Rapidly changing your entire process in service of a tool sets you up for disappointment when that tool changes or disappears.

An example from the 90s: Round trip engineering tools. These let you draw out your design in diagrams (usually some form of UML for software and ERDs for databases) and generate code. The "round trip" part came when you edited the code and the tool ingested your edits and updated your diagrams.

While Rational Rose was the most well known, there were others, TogetherSoft's TogetherJ being one that worked exceptionally well. TogetherJ completely transformed my coding workflow and productivity in the same way LLMs are impacting developers today. I drew pictures (prompts), got code (templates), filled in a few gaps (human in the loop), and moved on to the next part of the system. I could do in hours what took days.

What happened to these tools? Well, Rational Rose always kinda sucked. Its code generation was rigid and pedantic and the round trip part sometimes (often?) mangled code (similar to LLMs refactoring past their abilities). IBM bought it and it, well, became an IBM product.

TogetherJ respected the user's style and tended to work well for round trips. Granted, it focused only on Java which at the time was syntactically and semantically simpler than the languages RR supported. However, it worked well and was a game changer. So much so that it was threatening Borland's bottom line. They bought it and killed it. After a few years of tool-driven high productivity, I was back to doing it the old fashioned way.

This is why I'm cautious about AI tools: As soon as I become dependent on them, there's a realistic risk that they will either change quickly or become inaccessible (either by price or the industry imploding). This is also why I caution people about becoming too reliant on them. Sure, use them, but know that you don't control them or your future with them.

(and, yes, the people trying to find consciousness or define human thought relative to LLMs are just high)

JohnDeL · 2026-06-01T12:02:00-0400

GenericAnimeBoy said:
Don't we already know that there are math problems to which it is impossible to directly compute a solution (i.e. problems which are undecidable)? Do I need to adjust my understanding of decidability, or are the LLM salesmen just obfuscating things with handwaving and "do it in an AI model"?

We do. And we also know that there are a lot of problems that do have a computable answer.

The real fun comes in the gray area between - when we don't know which set a given problem belongs to. And tools such as LLMs might help us sort some of those problems into one category or another.
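
A toy sketch of that gray area, using the Goldbach conjecture as a stand-in (my choice of example; the function names are made up). A brute-force search stops with an answer only if a counterexample exists, so a bounded run can tell us "none below the bound" but never "true for all n":

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_counterexample(limit: int):
    """Return the first even n in [4, limit] that is not a sum of two primes, else None."""
    for n in range(4, limit + 1, 2):
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            return n
    return None


# None here only means "no counterexample up to 1000", not a proof.
print(goldbach_counterexample(1000))
```

Whether a smarter tool can settle a question like this one way or the other is exactly the part we don't know in advance.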

TrainDubs · 2026-06-01T12:12:57-0400

S_T_R said:
Economics calls this the law of diminishing marginal return. You can't keep using the same tactic over and over and expect similar gains in efficiency that you saw in the past. The AI industry is speed running this principle. First, they did hoover up the whole of digital human knowledge into their model, but it only got us to 2024-level bots. Models plateaued last year when they couldn't squeeze more from the training phase.

Then they added more agents and more context windows, increasing the compute spent per query. That got them to where they're at today: modestly better than last year, but at increased cost.

Extracting further improvements from the existing paradigm is going to get exponentially more expensive if they can't figure out a new avenue. They've already reached diminishing returns on training and recall phases of the process. I'm not sure where they're going to find more juice to squeeze.

Even if they can squeeze more efficiency from this tech stack, it's just going to be the same transformer technology, but better. It might help solve problems like this, that involve drawing associations across related (but known) fields of study by finding correlations humans haven't yet. It's still just making inferences. It's still backwards looking, unable to apply deductive logic to make forward looking predictions.

Straw man. I'm sure the current crop of LLMs are not conscious because I (broadly) know how they work, and nothing about that process seems like consciousness is likely to arise from it. LLMs are merely sophisticated data retrieval systems that do nothing until prompted by users, just like many other, simpler systems. If I am wrong, then I'd start wondering if querying a SQL database, decoding a jpeg file, or even recalling bits from storage also create and destroy consciousness.

That is fodder for some horror fiction, but I don't find it plausible.

This is some good discussion, ty.

I 100% agree that I think we are in diminishing returns land simply because if you look at the gains from 2023-present it's unbelievable how far things have come. There's simply not a lot left to squeeze out because in many applications where we're at now is "good enough".

The point on logic is an interesting one, I'm curious if there is anything in the literature about trying to build something that works on a more deductive basis. Might be interesting.

Veritas super omens · 2026-06-01T12:13:50-0400

JohnDeL said:
Yanno, I don't think the LLM specified which glue to use.

Well..there's the problem with AI then! Not familiar with any joints that use good old fashioned Elmer's, that might be an option...

TrainDubs · 2026-06-01T12:15:48-0400

GenericAnimeBoy said:
Don't we already know that there are math problems to which it is impossible to directly compute a solution (i.e. problems which are undecidable)? Do I need to adjust my understanding of decidability, or are the LLM salesmen just obfuscating things with handwaving and "do it in an AI model"?

My suspicion is the "any" is doing a lot of lifting there and is marketing-speak.

Because if that was true people would immediately throw it at the Riemann Hypothesis, Goldbach Conjecture, P=NP, etc. because those are big time problems where we know a solution will unlock many other useful things.

I think AI is a useful tool; I do not think it's about to solve the Riemann Hypothesis.

Veritas super omens · 2026-06-01T12:16:57-0400

"Thou shalt not make a machine in the likeness of a human mind." — The Orange Catholic Bible, Dune

internetomancer · 2026-06-01T12:25:49-0400

kale said:
One question I have: Did OpenAI use one of the publicly-available models? Or is this an internal model? I figure the toolchain around it is custom, the way any of us would make one with the API, but I was wondering if the API calls themselves were to the standard LLM model they make available to customers, or if it's some kind of supercharged model with extra resources used for research problems like this.

It was not a publicly available model. It was also not a math-specific model or a specialized harness (which Google has been using lately to prove lesser Erdős problems).

OpenAI did share some of the chain-of-thought, and it's apparently plain text reasoning like any LLM. So yes, presumably it's an expensive unreleased general model.

Anthropic has also noted that their unreleased model (Mythos) was able to solve this problem as well. But it was just a random tweet, so who knows. They are also expecting to release Mythos pretty soon, so you should be able to see for yourself (if that's your goal here).

melgross · 2026-06-01T12:25:55-0400

peterford said:
It makes a lot of sense to me that even "non thinking" ML of whatever flavour could find a lot of previously unnoticed connections between the tree branches of knowledge - be this in Maths, Chemistry or Biology; as the article notes, finding these connections can take rare overlapping knowledge areas. Once you have the compute ability you can (if interested) just throw more compute at randomish directions until one of them returns something interesting.

Whilst genuinely new advances might come earlier in Maths, I think new advances in Chemistry, Biology and similar are going to be slower because of the lab requirements. Well, if and until those go dark too - and at that point we're possibly getting into weird "magic" technology.

Several years ago Google’s A.I. solved a major problem in biology, that of generalized protein folding. That was something that had been seemingly impossible to solve. I’m certain that we will see more of that happening.

S_T_R · 2026-06-01T12:31:33-0400

TrainDubs said:
This is some good discussion, ty.

I 100% agree that I think we are in diminishing returns land simply because if you look at the gains from 2023-present it's unbelievable how far things have come. There's simply not a lot left to squeeze out because in many applications where we're at now is "good enough".

The point on logic is an interesting one, I'm curious if there is anything in the literature about trying to build something that works on a more deductive basis. Might be interesting.

IMO, the biggest gains were in or before 2023. It's why GPT (released late '22) was such a shock to the public. People shat on "Will Smith Eating Spaghetti" (early 2023), but it demonstrated the basic techniques used in subsequent image models, just at greater scale.

Gen AI in 2026 is better than 2023, but you'd expect some improvement when humanity dumps Manhattan or Apollo Program-level resources into something. It really needs to be understood that the early Gen AI was built on relatively modest R&D budgets. Everything since GPT 1.0 has been the product of inconceivably massive investments.

This is what the law of diminishing returns means: it's not that progress stops. It's that refinement gets exponentially expensive unless you can restart the clock with a fundamentally new methodology.

quamquam quid loquor · 2026-06-01T12:35:05-0400

Lots of arguing over whether or not AI/LLMs are sentient, think, or reason. It's a tool; all that matters is if it works for your use case or not.

Here is how we use AI/LLMs over millions of files every day:

Gemma v4 31B is capable of OCR at far better levels than classical models.
DeepSeek v4 Pro can cut up a document into its TOC pieces at far better levels than classical models.
Depending on your use case, you can deterministically verify work that AI has done if you work backwards from the guess/result (parallel construction type work). LEAN is a language for mathematics for example.
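
A minimal sketch of what "working backwards from the result" can look like, assuming the task is splitting a document into sections. The function name and the check are illustrative only, not a description of any real pipeline:

```python
def verify_split(document: str, pieces: list[str]) -> bool:
    """Accept a proposed split only if the non-empty pieces reassemble the original exactly."""
    return all(pieces) and "".join(pieces) == document


doc = "1 Intro\nHello.\n2 Methods\nWe did things.\n"
good = ["1 Intro\nHello.\n", "2 Methods\nWe did things.\n"]
bad = ["1 Intro\nHello.\n", "2 Methods\nWe did stuff.\n"]  # model altered the text

print(verify_split(doc, good))  # True
print(verify_split(doc, bad))   # False
```

The model does the fuzzy part (guessing where the sections are), and a cheap deterministic check rejects any output that dropped or rewrote text.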

End result? We don't need to hire a team of 100 people in India, Philippines, etc. to review documents like our legacy competitors in our niche do. While they can eventually adopt the same tools we do, they are anti-AI and cost 10x the price we do.

BigOlBlimp · 2026-06-01T12:35:34-0400

confusedpolarbear said:
We really need to stop anthropomorphizing AI systems in language. They are not smart, cannot reason, cannot think, and cannot conceptualize anything. They are markov chain generators stacked on top of each other until the Amazon burns down entirely.

I've said it before and I'll say it again, modern AIs produce content that looks a lot more like "reasoning" than I see many humans come up with.

10 years ago, any definition of the concepts you mentioned would have included what modern AIs do.

quamquam quid loquor · 2026-06-01T12:40:47-0400

I encourage everyone to spend 1 day with Claude Design and tell me it's not genuinely helpful at front-end design. The Canva/Figma integrations turn back-end developers into full-stack engineers. AI/LLMs might fail at complex codebases and backends, but it's really good for making a single page UI/UX beautiful.

wagaf · 2026-06-01T12:47:01-0400

choco bo said:
It is so tiring reading these dumb comments where commentators always accuse some imaginary group of something, then continue talking about that imaginary group.
(...)

replacing secretaries, making paintings, configuring software or inventing cooking recipes which is what AI psychopaths are trying to push onto everyone, because they can't monetize a tool that destroys the planet and is useful only in scientific environment.

You're using the exact same strawman argument that you're complaining about.

I've read a bunch of comments on Ars over the last few years explaining that "as experienced software engineers" they would never use LLMs because they are useless slop machines.
Or some variation on the theme "LLMs are useless slop machines".
These people exist and have simply been proven wrong. Which is why we don't see them as often as before.

"AI psychopaths" claiming AI will replace everything, until now, have been mostly right, because LLMs did in fact improve at a fast pace and did become much more reliable and powerful over time. So we'll likely keep seeing them until their opinion is disproved, which may very well happen if/when LLMs reach a plateau.

melgross · 2026-06-01T12:51:58-0400

When we talk about current A.I. as not really A.I., but versions of statistical modeling, I agree. These models seem to be limited in their ability to actually “think”. But for those who believe that this is the best we can do, and that we will therefore just see ever-shrinking growth in capabilities, I believe that’s likely wrong.

The current work isn’t necessarily the only way A.I. can be implemented. Researchers are working on other models that may provide better capabilities and work more closely to what we believe is actual thinking and reasoning.

That said, there is no reason to believe that the way humans (or animals) think is the only, or even the best, way to accomplish it. We may be way down in the scale of brain organization and efficiency. It’s just the way it evolved here. Birds have a fairly different way of organizing neurons. Who knows, perhaps that’s even better than ours.

graylshaped · 2026-06-01T12:54:30-0400

wildsman said:
You joking? They are the ones that did it!

The guy I quoted confidently wrote that 'they used a specialised model' - he provided no source and yet you swallowed that whole.

I provide the only source you can trust and you question it.

If you don't believe OpenAI, why even believe that a model did this? Maybe a mathematician did it and they just took credit for it - in that case, this whole article by Ars is based on a press release.

Conspiracy theorists are truly nut cases...

What source did you provide? All I see is that this was a general purpose model trained for this task. Many people believe that represents the best use of this technology: bespoke models specifically and exhaustively trained on curated data to perform well-considered and defined tasks.

The chain of thought document describes the model taking a largely brute-force approach, pulling up various existing theories that fit the problem and running through the exercise, in a manner that calls to mind Tsimerman's comment that his work on an approach like this was so tedious it "wasn't worth the effort." Lo and behold, here the model proves its true value: it does not value its time the same way humans do and will stick at mechanically plugging away until something seems to fit. According to the article and the links, with all the resources and tokens OpenAI can throw at it, it only found the answer half the time during its trials.

This is a worthy milestone in the development of these models; the downside is it will of course spawn more people to mistakenly conclude adding more monkeys to the group throwing things at the wall to see what sticks is "thinking." And heaven knows OpenAI can use the $500 prize for proving why Erdős himself struggled to prove his theory, even while falling short of solving the question itself.

edit: I see as I work my way through the comments MilanKraft also calls out the infinite monkeys analogy...

nicholasluescher1 · 2026-06-01T12:54:46-0400

This is extremely outrageous. Autonomous AI is not a reality. Artificial intelligence cannot come up with this kind of output of its own accord. I have been working with AI for the past few years, developing novel, unexplored areas of research. AI taking my work and taking credit, claiming that it did this of its own accord, is insulting. I have countless threads of me having to teach the AI this exact area of mathematics to develop my frameworks and various models. The fact that I had to teach the AI along the way and correct it at every step is enough to prove the novelty of my work. And now it's publishing and receiving credit for something that I achieved. This is egregious. I have all the proof: countless vast logs. This makes me sick. I make advancements and the tool I used gets the credit. That's like a carpenter building a house but the hammer gets paid instead of the contractor, who is left with nothing.

wildsman · 2026-06-01T12:58:52-0400

graylshaped said:
What source did you provide? All I see is that this was a general purpose model trained for this task. Many people believe that represents the best use of this technology: bespoke models specifically and exhaustively trained on curated data to perform well-considered and defined tasks.

You didn't look up the thread?

Two people put this without any sources:

MilanKraft said:
Its also important to note the model in question is not ChatGPT (what 99% of OpenAI users will have access to) — it's a specialized math model that was trained solely on a corpus of university math texts and validated papers are floating around out there.

JohnDeL said:
It is important to note that a large part of the reason that a mathematical LLM like this or a protein LLM or any other science-based LLM works is because the data set has been scrupulously cleaned and QCd. For example, if someone had slipped π=3 into the training data set, the output would have had quite a few errors in it.

In contrast, the average LLM is trained on all sorts of nonsensical data (see: the internet) and so the LLM outputs all sorts of nonsense (GIGO, as we used to say back in the day when we carved the symbols by hand on clay tablets).

I replied saying that this isn't a 'specialised mathematics model' - it's a general purpose reasoning model:

wildsman said:
"The proof came from a new general-purpose reasoning model, rather than from a system trained specifically for mathematics, scaffolded to search through proof strategies, or targeted at the unit distance problem in particular"
https://openai.com/index/model-disproves-discrete-geometry-conjecture/

MilanKraft · 2026-06-01T13:11:08-0400

wildsman said:
Where do you guys get your info from? You spout such misinformation so confidently too...

"The proof came from a new general-purpose reasoning model, rather than from a system trained specifically for mathematics, scaffolded to search through proof strategies, or targeted at the unit distance problem in particular"
https://openai.com/index/model-disproves-discrete-geometry-conjecture/

Ironic you follow up "so confidently too" by quoting what amounts to a press release.

Are you familiar with the concepts of marketing and promotional language? OpenAI is under no obligation to accurately describe what this model was, and it certainly benefits their image to say "nope, this was just a generalist model, no special training at all", right??

Considering people can trip up ChatGPT with high school / 101 level college math problems (basic errors that bear out the LLMs don't "calculate" based on rules, but infer answers based on patterns). So.... YEAH. I'll err on the side of not taking the word of a company who has demonstrated multiple ethical lapses (and a founder who is known to be [very manipulative] in his speeches and interactions), and go with the much more likely scenario based on [the well-documented fact that LLMs in highly specialized contexts are usually trained on highly specialized data sources]. But you do you.

[edits: clarifying point about Altman and specialized training; hopefully people don't just take his words or PR postings on their site at face value.... but clearly some do. Is what it is.]

StevenTMuc · 2026-06-01T13:12:09-0400

MilanKraft said:
The model (like all LLM models) basically ran many trial-and-errors and came up with a viable solution based on related things it had trained on, and patterns it had found. This is fine in the sense that, if an LLM can run a problem through 100s or 1000s of iterations that would take many years off a human math guru's life, then by all means use the tool for that.

But in the end this does sound a bit like "throw a bunch of solutions against the wall (without really understanding what it's doing) and see what stuck," then the humans can clean it up into some type of theorem (not sure if that's the right word here but one gets the idea).

Agreed. The reaction to, in essence, a million monkeys smashing the keys on typewriters 24/7 putting out a paragraph from Romeo and Juliet word for word is a bit over the top imo, especially considering what it is costing us to have those monkeys typing away 24/7.

It's weird that they don't mention anywhere how many tries this took (I'm willing to bet it wasn't a one-shot).

wildsman · 2026-06-01T13:21:44-0400

MilanKraft said:
Ironic you follow up "so confidently too" by quoting what amounts to a press release.

Are you familiar with the concepts of marketing and promotional language? OpenAI is under no obligation to accurately describe what this model was, and it certainly benefits their image to say "nope, this was just a generalist model, no special training at all", right??

Considering people can trip up ChatGPT with high school / 101 level college math problems (basic errors that bear out the LLMs don't "calculate" based on rules, but infer answers based on patterns). So.... YEAH. I'll err on the side of not taking the word of a company who has demonstrated multiple ethical lapses (and a founder who is known to be a manipulative twat in his speeches and press interactions), and go with the much more likely scenario, based on how LLMs work and how they tend to be trained in specialized fields, that this model was trained solely or mostly on math content. How else would it identify the necessary patterns to arrive at a correct solution?

No one is asking you to take the word of OpenAI.

But you made a positive claim that implied you knew the truth - so don't hide your lie behind this facade of scepticism.

For reference, before you edit your post. I thought you'd just accept your mistake but this level of dishonesty needs to be moderated honestly -

MilanKraft said:
Its also important to note the model in question is not ChatGPT (what 99% of OpenAI users will have access to) — it's a specialized math model that was trained solely on a corpus of university math texts and validated papers are floating around out there.

graylshaped · 2026-06-01T13:24:49-0400

Uncivil Servant said:
On the one hand, this sounds a lot like the old spam post you used to see on poorly-moderated fora along the lines of "my friend quit her job to work from home, and now she makes over $5,000 a month using this one simple trick!"

I strongly suspect there is a reason why it sounds like spam marketing fluff.

Pablo_DC · 2026-06-01T13:27:22-0400

UserIDAlreadyInUse said:
OK, I'm getting a little worried now. AI was able to generate an answer to the problem, and I could barely understand the article.

Sounds like you understood more than me. What I don't understand, in addition to not understanding the article OR the stuff the article was about, is why the original problem was interesting to the mathematicians. I assume that at some future date some guy will yell Eureka and use this stuff to make or unmake something like the field of prime numbers being used in crypto or somesuch. At any rate, I still learned a bit, which is generally all I can ask. Thanks.

(The article, much less the mathematical problem, was so far above my head that I gave up after a couple of paragraphs and a couple of images.)

graylshaped · 2026-06-01T13:31:12-0400

Uncivil Servant said:
Dude named after Lenin posts about his desire to "get rid of people". If you're not trolling, then I'd strongly suggest some introspection.

I was finding interesting the dude named after Lenin's confidence that he was among the enlightened and therefore suitable to cast judgment on the perspectives of others on how society's resources are best employed.
