New research shows highly inconsistent performance on a variety of physical reasoning tasks.
> Experience with confabulating LLMs has also shown there's often a large gap between a model generating a correct result some of the time and an upgraded model generating a correct result all of the time.
As a friend said, LLMs are great for demos but terrible for production. Some coworkers proudly showed off their vibe-coded app, but then quietly admitted that it would take a year of work to get the app into a state where they would be willing to carry an on-call pager to support it.
> an upgraded model generating a correct result all of the time.
And has this happened at some point?
"are on a path to becoming unified, generalist vision foundation models." But digging into the actual results of those experiments, the researchers seem to be grading today's video models on a bit of a curve and assuming future progress will smooth out many of today's highly inconsistent results.
> One of the many crucial gaps between any “AI” and actually intelligent creatures is embodiment. This is more than simply “simulating a body” or even “having a body”. This is about being anchored to the world such that visceral changes to the world (which are constant) produce meaningful feedback to the entity in question.
> This embedded feedback is vital because our intelligence isn’t just our conscious thought process. It flowers from our social and physical interactions with the actual world.
> “AI” has nothing like that at all.
To me, the missing aspect is having a non-black-box model of the world that can inform results and be updated as necessary. It doesn’t have to interact with the world, just be able to understand it.
Kyle Orland said:
> For the rest, the researchers write that "a success rate greater than 0 suggests that the model possesses the ability to solve the task."
If you give me twelve chances to answer a maths problem and I only solve it once, does that suggest I have the ability to solve the task?
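To put rough numbers on that question, here is a back-of-the-envelope sketch in Python. It assumes (my assumption, not something the paper states) that each of the 12 attempts is independent and shares one fixed per-attempt success probability; the function names are just illustrative.

```python
# Back-of-the-envelope check on the "success rate greater than 0" criterion.
# ASSUMPTION (not from the paper): the 12 attempts are independent and share
# one fixed per-attempt success probability p.

def success_rate(successes: int, trials: int) -> float:
    """Observed fraction of attempts that succeeded."""
    return successes / trials


def p_at_least_one(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    # Complement of "every one of the k attempts fails".
    return 1 - (1 - p) ** k


rate = success_rate(1, 12)
print(f"per-attempt rate: {rate:.3f}")  # per-attempt rate: 0.083
print(f"P(>=1 success in 12): {p_at_least_one(rate, 12):.3f}")  # P(>=1 success in 12): 0.648
```

Under that assumption, a model that solves the task only about 8% of the time would still clear the "greater than 0" bar in roughly two out of three 12-attempt runs, so a single success in twelve says very little about reliability.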
> If it was thought impossible a few years ago that you would ever solve a math problem, then yeah, actually solving it once is a huge advancement.
Wow… sprinting with those goalposts, and still can’t get there before the ball does.
> They do not "reason" about the world at large at all. No extension of this current tech will get them to reasoning.
Arguably, reasoning is an emergent property we don't understand, so it might well emerge from whatever additional things they decide to do with the current tech.
> Feels a bit like someone gave the infinite monkeys on infinite typewriters a bit of a nudge in the right direction, but still not going to get Shakespeare
Which is unfortunate for the AI, since his works are public domain.
If it was thought impossible a few years ago that you would ever solve a math problem, then yeah, actually solving it once is a huge advancement.
> It's a crazy bubble full of accounting tricks to keep it afloat. There's a flowchart of the relationships between AI companies, but it is so absurdly complex that it's unreadable, so here's a table:
> [Attached table]
> It's all money going in circles, and there isn't enough VC money or budget available for these companies to build the infrastructure they say they need to build.
Does this have eigenvalues and eigenvectors?
> And has this happened at some point?
It’s coming soon, we promise! We just need another $500 billion to get there.
> I agree with you, I just want to play devil's advocate here: do human beings perform complex tasks correctly the first time, every time, without instructions? Imagine asking a five-year-old to fry an egg without breaking the yolk, for instance. Would they get it right 3/5 times? 3/10? How many times would they break the yolk when they crack the egg? And you know even the successful ones will probably have some shell.
Five-year-olds can generalize, and they can learn by watching others. Also, once five-year-olds learn to do something, they tend to be able to do it repeatedly. There is a developmental effect where people backslide in certain skills, but if a child's failures are stochastic, they should see a developmental expert or a neurologist.
> Hey, it got it right a couple of times, you just have to spam that CREATE button in Suno or reprompt Midjourney 132 times to get what you want....
AI CEO: "Man, we'll be as profitable as casinos! An addictive product that only produces a happy output intermittently, but keeps people clicking and clicking!"
> If it was thought impossible a few years ago that you would ever solve a math problem, then yeah, actually solving it once is a huge advancement.
This is "solving" problems the way Google's old "I'm Feeling Lucky" button "solves" search queries, except it's like a search engine that fails 11 out of 12 times.
> While the researchers acknowledge that Veo 3's performance is "not yet perfect," they point to "consistent improvement from Veo 2 to Veo 3" in suggesting that future video models "will become general-purpose foundation models for vision, just as LLMs have for language." And the researchers do have some data on their side for this argument.
Oh, fuck you...
> When asked to model a Bunsen burner turning on and burning a piece of paper, it similarly failed nine out of 12 times.
That video was patently wrong, too. Paper burns outward from the flame, from the middle to the edges, not evenly from the bottom up, not that slowly, and with a lot more ash and blow-off from the hot gases in the center.
> To me, the missing aspect is having a non-black-box model of the world that can inform results and be updated as necessary. It doesn’t have to interact with the world, just be able to understand it.
From my perspective, it seems that interacting with the world just is part of understanding the world. I totally agree about the non-black-box aspect, however.
> No, they are 2D models that we extrapolate 3D from (well, kind of). Here is a good test: "a man punching his fist through a monitor." Try it in a few models. You might find it difficult to make "an arrow going through a laptop monitor" or similar, too. It can layer things on top of each other, but not intersect them, since it's... a 2D model.
They were mostly trained on pictures, not on the real world of which the pictures are a flat projection. They have no actual concept of what the objects look like in 3D space.
My oversimplification.
The hands with the ball? Seriously, one of the most telling giveaways it's AI is the slow speed of motion. AI can't handle 4K video, or even video at any resolution, in REAL TIME SPEED. So you get "robotic hands tossing ball in the air on the moon" footage, which would also be nonsense.
> It's a crazy bubble full of accounting tricks to keep it afloat. There's a flowchart of the relationships between AI companies, but it is so absurdly complex that it's unreadable, so here's a table:
> [Attached table]
> It's all money going in circles, and there isn't enough VC money or budget available for these companies to build the infrastructure they say they need to build.
This is the thing that really pisses me off about all this. As I've mentioned here before, I own and operate an IT consulting business, albeit one that's just me nowadays, and I used to own and operate a small beverage company with a couple dozen employees, including some drivers. In either of these businesses, if I got caught pulling the same sort of circular bullshit with finances, I'd have been charged with fraud, and rightfully so. I really don't understand how this stuff is different. The claimed end game of making literally more money than the entire world's GDP is not possible! That should absolutely be prosecuted as fraud. It's OK, though, because the folks doing it are wealthy?! Christ, I hate this timeline so much.
> I prompted Gemini (Veo) with the following:
> "Create a photo realistic video of a person shaking a rope. The rope is tied to a tree at waist height and is about 20 feet away"
> Here is a description of the result.
> -- There was a segment of rope tied to a tree at roughly waist height. The "other end" of the rope was lying on the ground, not connected to anything.
> -- There was a person holding a segment of rope that was stretched out in front of them, and they were standing about 20 feet away from the tree. The "other end" of their rope was also lying on the ground, not connected to anything.
> -- The tree was orthogonal to the direction the person was facing.
> -- As the person shook their segment of rope (which basically just flopped around on the ground in front of them), the segment tied to the tree was also flopping around.
> So much for physical accuracy...
> Postscript: I tried my prompt a second time and it got it correct... so much for consistency!
Well, those are hard instructions to follow. You asked for a person shaking a rope that is 20 feet away.
> I agree with you, I just want to play devil's advocate here: do human beings perform complex tasks correctly the first time, every time, without instructions? Imagine asking a five-year-old to fry an egg without breaking the yolk, for instance. Would they get it right 3/5 times? 3/10? How many times would they break the yolk when they crack the egg? And you know even the successful ones will probably have some shell.
Except fixed neural nets like these don't learn from their mistakes. They entirely lack that feedback loop.