New results suggest Mythos' cyber threat isn't "a breakthrough specific to one model."
In a recent interview with the Core Memory podcast, OpenAI CEO Sam Altman criticized what he calls “fear-based marketing”
Every time one of you guys perpetuates the idea (intentionally or otherwise)...

> "general improvements in long-horizon autonomy, reasoning, and coding,"

...some of us are going to remind you that LLMs can't reason by any plausible definition. When you do this, you are basically acting as a PR amplifier for whichever LLM developer you're writing about. Until there is a truly novel development in this space that indicates actual reasoning going on, please stop. I'm beggin' ya.
(Sadly, I expect this kind of thing from television news outlets, or more generalist online outlets who don't know any better, but given the technical chops of its staff, Ars should be better than this.)
> Some people argue, incorrectly in my opinion, that the world is flat. So should Ars give them the benefit of the doubt? Or should we keep an open mind?

The problem is that a lot of people have trouble distinguishing between reasoning and a description of reasoning. And a lot of people argue (incorrectly, in my opinion) that there isn't a difference between the two.
However, I think this is a quote from AISI that you're taking objection to? So it's not really Ars's fault here, other than perhaps a lack of challenge to a contentious idea.
OpenAI CEO Sam Altman criticized what he calls “fear-based marketing” in promoting limited releases for certain AI models. While he said he’s “sure Mythos is a great model for cybersecurity,” he added that “it is clearly incredible marketing to say, ‘We have built a bomb. We are about to drop it on your head. We will sell you a bomb shelter for $100 million.’”
> Do any of these "models" do anything valuable yet?

Apple just released their latest software update with claude.md, so it's safe to say Apple is using it internally to write their code. They just removed it when caught, though, to cover their tracks.
Some people argue, incorrectly in my opinion, that the world is flat. So should Ars give them the benefit of the doubt? Or should we keep an open mind?
> Pot, meet kettle. He was peddling the same fear-based marketing not too long ago himself.

It's almost as if they are all sociopathic liars, grifters, and hypocrites. Who could possibly have guessed?
> “There will be a lot more rhetoric about models that are too dangerous to release,” Altman continued. “There will also be very dangerous models that will have to be released in different ways.”

"Fear-based marketing is bad as long as I'm not the one doing it. If I'm doing it, it's totally fine." - Sam Altman, probably.
> Benchmarks aren't reality. Hype also isn't reality. But the tendency to assume that because A and B are similar on benchmarks, they are similar in the real world is not a very safe tendency, IMO.

Eh, they're more similar than if one scores 60% on the tests and the other scores 0%. Example: several years ago I compiled clFFT for both Nvidia and Qualcomm GPUs. I ran the tests that come with the clFFT code. Nvidia passed all the tests. Qualcomm only passed half of them, making it unusable -- not very similar. (The Qualcomm GPU's barrier function seemed to be broken, so anything requiring a local thread barrier -- syncthreads in CUDA -- couldn't be trusted. clFFT needs the barrier when processing FFTs with sizes that aren't powers of 2.)
> Do any of these "models" do anything valuable yet?

I've been using it a bit at work, and it has made writing tests and some functions a bit easier. It's pretty good at spelunking areas you're unfamiliar with, much faster than you can. But you still have to verify all of what it does.
> Do any of these "models" do anything valuable yet?

When hooked up to a harness like Claude Code, VS Studio, or Claude Cowork, people use this to conduct lots of common white-collar employment tasks. I use Claude Cowork every day to build Excel models and distill insights into PPT decks.
> Pot, meet kettle. He was peddling the same fear-based marketing not too long ago himself.

There's some speculation that the whole "too dangerous to release" line was (aside from hype) an excuse to cover that Mythos is too expensive to release (i.e., they'd lose money a lot faster than they are on the current models).
> Eh, they're more similar than if one scores 60% on the tests and the other scores 0%. Example: several years ago I compiled clFFT for both Nvidia and Qualcomm GPUs. I ran the tests that come with the clFFT code. Nvidia passed all the tests. Qualcomm only passed half of them, making it unusable -- not very similar. (The Qualcomm GPU's barrier function seemed to be broken, so anything requiring a local thread barrier -- syncthreads in CUDA -- couldn't be trusted. clFFT needs the barrier when processing FFTs with sizes that aren't powers of 2.)

For many years I worked in this industry, and for several years I worked literally on the team that made the benchmarks fast and also made code run fast for important industry customers. We typically led the industry in the late 2000s in benchmark scores for things like Viewperf, which is as far as I'll go toward naming the company. The two types of work (and even, frankly, the code, though this was kept pretty quiet) were completely separate. Do with that information what you will.
> For many years I worked in this industry, and for several years I worked literally on the team that made the benchmarks fast and also made code run fast for important industry customers. The two types of work (and even, frankly, the code, though this was kept pretty quiet) were completely separate. Do with that information what you will.

That's why customer "vibes" are more important these days. Sadly, anything measured will be gamed. Even sadder is the rampant bot astroturfing trying to sway public opinion on vibes.
> Do any of these "models" do anything valuable yet?

In my opinion, the issue isn't whether they can do anything valuable in isolation (they can). The issue is whether what they can do is, on balance, worth the total cost to society.
> Pot, meet kettle. He was peddling the same fear-based marketing not too long ago himself.

Of course he was, and now he's saying, "Sure, we'll release a model that can hack corporate networks, but don't worry about it." At least Anthropic pretends to be responsible.
> In my opinion, the issue isn't whether they can do anything valuable in isolation (they can). The issue is whether what they can do is, on balance, worth the total cost to ~~society~~ the elite.

FTFY. The cost to society should matter, but it does not. It only matters that it makes someone money. If cost to society were enough to stop stupid things, then we wouldn't still be polluting the atmosphere or burning down the rainforest.
Do any of these "models" do anything valuable yet?
> When hooked up to a harness like Claude Code, VS Studio, or Claude Cowork, people use this to conduct lots of common white-collar employment tasks. I use Claude Cowork every day to build Excel models and distill insights into PPT decks.
I basically don't need analyst-level employees anymore.
> Apple just released their latest software update with claude.md, so it's safe to say Apple is using it internally to write their code. They just removed it when caught, though, to cover their tracks.

Awesome, so macOS can't be registered for copyright protection anymore in the US, since they aren't disclosing which parts are generated by an LLM.
> Once we have Gen AI, anyone not in the supply chain for semiconductors or working at a frontier lab will be a cost to society, and we will have to think long and hard about whether we want to bear that cost.

AI people projecting that is why we're almost at the point of pitchforks and torches at new data center builds. Despite no new proof that Gen AI is even possible.
You can't say good things about AI in general and OpenAI in particular here on Ars.

It's sad how far Anthropic has fallen. Their developer perception and goodwill have plummeted over the past month. They lied about nerfing 4.6, they lied about the capabilities of 4.7, and their API is consistently unstable.

Dario hyped up Mythos like it was the second coming of Skynet, when the truth came out that 5.5 xhigh is literally better than Mythos at cybersecurity.

Not only is Codex 5.5 xhigh better than Opus 4.7, the plan limits and API token price are way more generous. I can run Codex 5.5 xhigh all day, with 10 chats going simultaneously, and get nowhere near the $200 plan limits. I only keep my Claude max20 subscription around for Claude Design.
Once we have Gen AI, anyone not in the supply chain for semiconductors or working at a frontier lab will be a cost to society and we will have think long and hard about whether we want to bear that cost.
> Maybe with all these advancements, these models will finally be able to find the code I have open in my editor on the first try.

If you cannot figure out how to use these agents for coding, I don't know what to say.
> I've been using it a bit at work, and it has made writing tests and some functions a bit easier. It's pretty good at spelunking areas you're unfamiliar with, much faster than you can. But you still have to verify all of what it does.

Even on a mature, large codebase, they do inexplicable things sometimes. When building some new UI elements, 90% of the time it follows the patterns in the existing code. 10% of the time it does... something else. It's like a talented junior programmer with ADD. Mostly it's brilliant, but sometimes you're just like, "WTF is this non-working bullshit?" And you don't really know which one you'll get for any particular prompt: the brilliant, or the bullshit.
> You can't say good things about AI in general and OpenAI in particular here on Ars.

Thank you. I have been here for decades and I have never seen such bias against a new technology, never mind one of the most important advances in computer science... ever.
Everything that LLMs stand for is evil/exploitation/bad/ugly by definition according to the audience here.
Never mind that millions of people nowadays use LLMs to simply diagnose and treat themselves to stay alive. Never mind that even tens of millions of Americans cannot afford healthcare.
"LLMs are bad, period."
"LLMs don't reason, period."
"LLMs only synthesize and hallucinate and they do it by using what human beings have created."
"If you use LLMs, you're extremely incompetent and don't deserve your job."
Now you can leave proper comments here. "Oh, and Sam Altman is the worst."
BTW is AlphaFold also ... bad? And a Nobel prize given for it? Ah, never mind. That's a bit confusing. And all the other AI applications.
BTW, here's a nice news piece:
https://www.science.org/content/article/ai-starting-beat-doctors-making-correct-diagnoses
Too bad Ars won't run a story on it.
> Maybe with all these advancements, these models will finally be able to find the code I have open in my editor on the first try.

Sure, and when the AI finds it, there's a 50% chance it will just delete all your code.
> Thank you. I have been here for decades and I have never seen such bias against a new technology, never mind one of the most important advances in computer science... ever.

It's mostly defensiveness about the economic/social devaluation of skills built over decades that are a part of their core identity.
There are so many interesting stories about how various bio/physics/chem labs, pharma, aerospace, and energy-grid companies are modifying and using LLMs. The different ways that next-token prediction can be harnessed are completely unexpected and fascinating. It doesn't matter if it is technically "reasoning" or "thinking." But at Ars there is no exploration of that; it's just an echo chamber of psychosis.