To be pedantic, LLMs and neural nets are just as “smart” as any other smart technology. They perform reasoning as well, and if conception can be given by novel rearrangement, then certainly their outputs lead to new concepts. It’s not anthropomorphizing to describe them in these ways. “Them” refers to inanimate objects as well as people. This is a really clean take on AI that doesn’t talk about them feeling; it might be different if we were talking about AI being proud of its accomplishments.
We really need to stop anthropomorphizing AI systems in language. They are not smart, cannot reason, cannot think, and cannot conceptualize anything. They are Markov chain generators stacked on top of each other until the Amazon burns down entirely.
So it did the equivalent of a uniquely-wide metastudy that still required actual intelligence to have any (potential) value? Am I wrong or is the actual innovation the multi-domain breadth of the human-prompted search?
The AI model cleverly applied existing ideas drawn from several subfields of mathematics to create a full proof. But it didn’t pioneer any genuinely new techniques. The result has since been cleaned up and extended by human mathematicians.
Hopefully mathematics don't end up like software engineering, where people thought this too, but the trend seems to be towards Warhammer 40k:
Terence Tao seems to think that this is the new research pattern: human experts converse with models, models propose constructions - experts verify, interpret, and either expand on it (often in conversation with these models) - and finally, they verify and integrate them into the field.
C'mon man. Mathematicians fight about shit all the time. They're no less susceptible to hype and marketing than anyone else.
Mathematicians can be are about as cold eyed and analytical as one can get and if they’re saying that this was done then it was as far as I’m concerned.
Yes, I've used that term before as well to refer to this tendency. I'm happy to see that more people are seeing the same thing - Terence Tao used exactly that term as well ('god of the gaps') in his latest paper:
It looks as if human intelligence is going the way of God.
Theology has increasingly led to the "god of the gaps" - attempting to develop god-based explanations for things we cannot yet explain, only for them to get explained one after another.
The uniqueness of human intelligence seems to be going the same way, with one barrier after another falling.
The questions have to be: is there some bastion of human creativity that is beyond the reach of AI? If Baroque art or Beethoven had never existed, would some future AI somehow reproduce them? Would we get Haydn's Creation by sticking a copy of the Bible and Catholic liturgy into the training data? And would it matter if we didn't?
I would like the answers to be yes, no, no, yes. But I can't prove it, and perhaps some self-serving future AI will provide convincing reasons why I am wrong.
"And as AI performance continues to advance, such a human-chauvinistic viewpoint risks degenerating into an increasingly untenable “god of the gaps” philosophy, in which an ever-shrinking list of qualities are touted as indicators of essential human achievement that AI is still not yet able to replicate."
Yes this is a serious risk. I'll give you a real life analogy with chess - at one point, grandmasters used to be good enough to verify moves suggested by AI but since AlphaZero, this is no longer the case.
Hopefully mathematics don't end up like software engineering, where people thought this too, but the trend seems to be towards Warhammer 40k:
- The boss tells the engineseer to do something
- The engineseer chants holy mantras to the sacred machine
- The machine produces something that no human understands
Jfc the snark of people who get pulled in by marketing without knowing how the underlying technology works is boundless.
The truth hurts feelings, apparently.
I said “can be” !!! Not “ARE”.
C'mon man. Mathematicians fight about shit all the time. They're no less susceptible to hype and marketing than anyone else.
Unless I missed a comment somewhere, you're mischaracterizing most of the "LLMs can't reason" sentiments here. While many speak out here, that's because they're knowledgeable about the domain, rather than just being some rando Farcebook group where people are against something to be against it / to be edgy / whatever stupid motives they have.
I’m finding what is interesting is the human reaction to all this. The people who are extremely anti-AI are just shaking their heads left and right saying “this didn’t happen. [snip]... It can’t reason it can’t reason it can’t reason.”
well… It can and it did.
I am as skeptical about AI as a next person, but I’m not about to ignore evidence [snip]
You've gotta get hoovered up by one of the behemoths like Goldman who've tried to do that and reported near-zero benefits.
It's been ~3 years from the release of ChatGPT to the general public. This technology is just getting started.
Sitting here in front of my box running 10x frontier agents who are supervising another ~30 sub agents running slightly less capable frontier models. For $20/day. Cranking out high quality code at a rate that completely blows my mind. I used to employ rooms full of developers for hundreds of thousands of dollars per month to do 1/100th as much ( or less ).
Yes, but isn’t that basically what people do too? They try different methods and see which one works. How is this fundamentally different?
IMHO you're mischaracterizing most of the "LLMs can't reason" comments. Nobody is saying "it didn't happen," they're saying things like this happen in a context different than an LLM understanding and reasoning in the way humans do. And while the author here did an OK job of not anthropomorphizing the model (I think only one allusion to "reasoning"), it's objectively true these companies are misleading people about what these systems are (and are not) when naming their features, functions, and speaking publicly about them (obviously to drive up capital infusions and ultimately as large an IPO as possible).
Here is my overall take on what this article is saying:
The model (like all LLM models) basically ran many trial-and-errors and came up with a viable solution based on related things it had trained on, and patterns it had found. This is fine in the sense that, if an LLM can run a problem through 100s or 1000s of iterations that would take many years off a human math guru's life, then it's OK for them to use that tool.
But in the end this does sound a bit like "throw a bunch of solutions against the wall and see what stuck, then the humans can clean it up into some type of theorem (not sure if that's the right word here but one gets the idea)."
It's also important to note the model in question is not ChatGPT; it's a specialized math model that was trained on whatever corpus of university math texts and validated papers are floating around out there. Which is also fine, but in the end it still is not (to borrow a ridiculous phrase used by Sam Altman a couple years ago that brings my comment full circle) "a math PhD in your pocket". It doesn't "know math" per se, it simply finds patterns in a very specialized set of training data. Again this is not useless, but it is also not "working the problem" in the way humans do AFAICT.
But but... Daddy Altman promised me I could turn off my brain
It is important to note that a large part of the reason that a mathematical LLM like this or a protein LLM or any other science-based LLM works is because the data set has been scrupulously cleaned and QCd. For example, if someone had slipped π=3 into the training data set, the output would have had quite a few errors in it.
In contrast, the average LLM is trained on all sorts of nonsensical data (see: the internet) and so the LLM outputs all sorts of nonsense (GIGO, as we used to say back in the day when we carved the symbols by hand on clay tablets).
And, unlike a person, an LLM is incapable of deleting training data that is erroneous. As a result, those bad inputs end up creating bad outputs; sometimes in obvious ways, sometimes in not so obvious ones.
And that is why LLMs are good as research tools but not for much more. Because in research, the user is usually smart enough to know the limitations of the LLM and wise enough not to take its advice about using glue to hold the cheese on pizza. But in general use, those two qualifiers are more the exception than the rule.

Yes, but isn’t that basically what people do too? They try different methods and see which one works. How is this fundamentally different?
It is fundamentally different because the human mathematicians actually understand the rules, theorems, and other concepts [in a given math domain], and literally think-and-apply their way through the variables and boundaries of a problem. IOW, they are aware of what they are doing and WHY the rules they are applying work.
Yes, but isn’t that basically what people do too? They try different methods and see which one works. How is this fundamentally different?
Umm... No. This wasn't a 'mathematical LLM'
It is important to note that a large part of the reason that a mathematical LLM like this or a protein LLM or any other science-based LLM works is because the data set has been scrupulously cleaned and QCd....
In contrast, the average LLM is trained on all sorts of nonsensical data (see: the internet) and so the LLM outputs all sorts of nonsense (GIGO, as we used to say back in the day when we carved the symbols by hand on clay tablets).
Where do you guys get your info from? You spout such misinformation so confidently too...
Its also important to note the model in question is not ChatGPT (what 99% of OpenAI users will have access to) — it's a specialized math model that was trained solely on a corpus of university math texts and validated papers are floating around out there.
Uh... you said both? Might wanna re-read that sentence.
I said “can be” !!! Not “ARE”.
And in 6 months my baby will weigh 7.5 billion lbs.
We tend to overestimate near term disruption and underestimate long term disruption. If AI is doing what it is doing today - less than 3 years from going mainstream - I can only imagine what 30 years will bring us.