The anonymous chatbot that mystified and frustrated experts was OpenAI's latest model.
See full article...
> 57.15%

I don't play any competitive sports that use Elo rankings, but if I'm understanding this table correctly, a 50-point gap is more like "the new model was judged as better 7.15% of the time".
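For reference, that 57.15% figure falls straight out of the standard logistic Elo expected-score formula; a quick check in Python (illustrative only, not from the article):

```python
# Standard Elo expected-score formula: the probability-like share of
# "wins" for a player rated `diff` points above their opponent.
def elo_expected(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(elo_expected(50) * 100, 2))  # a 50-point gap -> 57.15
print(round(elo_expected(0) * 100, 2))   # equal ratings  -> 50.0
```

So a 50-point edge means the higher-rated side is expected to win about 57% of pairwise matchups, i.e. 7 points above a coin flip, which is the commenter's point.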
...
> How do you think humans learn? They train on existing content. Should an artist have to pay Getty Images every time they see one of its pictures somewhere online?

No, but AI can come up with them for cheaper than a Getty Images subscription!*
* because they trained the model on Getty Images without paying**
** okay, I don't condone it, but I can kind of understand this one
> boy there really is a stock image for almost anything, eh?

Well, there is now...
> Wait... there's such a thing as a chatbot ... leaderboard? wtf?

Wait till you see the Human leaderboard that they keep.
Have to disagree. The thing is, that's already happened. And not in the last year, or the last few years; it's been the case pretty much since, well, since people.
> Chess has objective outcomes. What is the objective test for chatbots?

Fair point, but I don't think using Elo for gymnastics would be weird, and that has subjective scoring like this system. Granted, the judges are trained, but since LLMs are supposed to work for untrained users, it doesn't seem weird to me to evaluate them based on the response/experience of untrained users, at least as one benchmark. (Obviously there should be safety evaluation by the manufacturers.)
> Wait till you see the Human leaderboard that they keep.

Ah, you must be talking about China?
> Wait... there's such a thing as a chatbot ... leaderboard? wtf?

It's going to be much weirder in a couple of years when the chatbots form their own Elo rankings of people.
> Have to disagree.
> From the time I was born in the mid-60's until - oh, the explosion of right-wing media - the vast majority of Americans agreed on general facts.

You seem to be talking about the explosion of cable media, which preceded the rise of right-wing media and which initially was mainly radio. The universal cable-channel support for the Iraq war shows the Washington consensus still had a vice grip on public dialog at that time; dissenters like Donahue were kicked off the air. The thing that really changed this was the internet and social media.
> Have to disagree.
> From the time I was born in the mid-60's until - oh, the explosion of right-wing media - the vast majority of Americans agreed on general facts.

The technology is now there to absolutely maximize fragmentation. Because of the stochastic nature, each query to an LLM produces a unique answer, even for identical input queries. Even if we simply disregard intentional biasing, we're all about to get served at least slightly different information from now on (if ChatGPT replaces, or becomes the interface to, classic search). And think about what happens if everyone can start generating personal entertainment media...
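The "identical queries, different answers" point is easy to illustrate: decoders sampling with temperature > 0 draw the next token at random from the model's distribution, so two identical calls can diverge. A toy sketch (not a real LLM; the vocabulary and probabilities are made up):

```python
import random

# Made-up next-token distribution standing in for an LLM's softmax output.
VOCAB = ["good", "great", "fine", "excellent"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def sample_next(rng: random.Random) -> str:
    # With temperature > 0 the next token is drawn from the distribution
    # rather than taken greedily, so separate runs can differ.
    return rng.choices(VOCAB, weights=WEIGHTS, k=1)[0]

# Two "API calls" with the same prompt but different random states.
print("The model is", sample_next(random.Random(1)))
print("The model is", sample_next(random.Random(7)))
```

Real services do expose a seed or a temperature-0 mode in some APIs, but the default chat experience samples, which is what drives the fragmentation worry above.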
> What's wrong with Homework Gimp™? Some random mommy blog rated it the best learning aid of the year.

“Bring out the gimp”.
> And what is the Elo of an average human?

Especially with a deadline of no more than a minute or two in which to research and compose their answer.
> Have to disagree.
> From the time I was born in the mid-60's until - oh, the explosion of right-wing media - the vast majority of Americans agreed on general facts.

I do think that the limited number of TV channels had a huge impact on society. But I'm not sure it was all that rosy either. I wasn't born then, but I heard about this whole McCarthy thing, and, as a Bob Dylan fan, I often think about the lyrics of "Talkin’ John Birch Paranoid Blues".
> That is, the new bot could be drastically better (or worse!) at solving differential equations, but given that most people won't ask about something that hard...

Chatbots are a big step down in the realm of automated differential equation solvers.
> Gotta say, that might be the weirdest stock image I have seen all year.

Perfect metaphor for the de-humanizing aspect of this technology.
> I mean... for the photoshoot they didn't even bother to zip up the suit.

I noticed that.
> I noticed that.

Now that I think about it, they probably had trouble breathing. That kind of fabric is somewhat airtight unless it is stretched enough, and it looks plenty loose.
> Chatbots are a big step down in the realm of automated differential equation solvers.

Sure, but the bot can use one of those on the backend. Or even the main chatbot could delegate the task to a different bot that is specifically trained on using sci/math packages.
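That delegation idea can be sketched in a few lines. Here the "backend" is just a forward-Euler integrator and the routing rule is a hypothetical keyword check (no real LLM or sci/math package involved):

```python
def euler_solve(f, y0, t0, t1, steps=1000):
    """Forward-Euler integration of dy/dt = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def answer(question: str) -> str:
    # Hypothetical router: hand anything that looks like an ODE to the
    # numeric backend instead of generating an answer token-by-token.
    if "dy/dt" in question:
        # Hard-coded demo problem: dy/dt = y, y(0) = 1, so y(1) = e.
        return f"y(1) ~= {euler_solve(lambda t, y: y, 1.0, 0.0, 1.0):.4f}"
    return "(small talk)"

print(answer("Solve dy/dt = y with y(0) = 1"))  # close to e = 2.71828
```

Production "tool use" works the same way in spirit: the model emits a structured call, a deterministic solver does the math, and the model only phrases the result.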
> Wait till you see the Human leaderboard that they keep.

Eurovision?
> I don't understand one thing -- if "AI experts" are "frustrated" about the non-transparency and non-scientific aspects of the LMSYS process, then stop complaining about your hurt feelings and make your own objective test. Make it now. Nobody is stopping you. We can have many, many tests. It's probably going to cost money and would require effort and careful planning, but making a better version of things always does. Versus just criticizing something others have built, tempting though that might be.
>
> And once you release your new Awesome Fully Scientific Chatbot Scoring System, tell everyone about it, explain how much better it is, and you will be in charge of "the vibe check".
>
> If you can't make one because of a never-ending fight over what would be "an objective" test, then that's your problem. Consider "good enough" aspects versus elusive perfect ones. The public will gladly use a test that is "good enough" instead of waiting 10+ years for researchers to finish duking it out over methodology.

More than that, they're being stupid. People can read something like Seeing Like a State all they like in school, then take away absolutely nothing from it...
Have to disagree.
From the time I was born in the mid-60's until - oh, the explosion of right-wing media - the vast majority of Americans agreed on general facts. This might have been a unique time period, but it was the norm for many. IMO this was due to the limited means of media distribution. There were only 3 TV channels, and those 3 channels expressed similar attitudes about the rule of law, the value of democracy, etc.
Chess has objective outcomes. What is the objective test for chatbots?
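For what it's worth, arena-style leaderboards sidestep the lack of an objective outcome by treating each human preference vote as a game result and running Elo-style updates over the votes. A minimal sketch (the ratings, K-factor, and votes are made up for illustration):

```python
def expected(r_a: float, r_b: float) -> float:
    # Standard Elo expected score for player A against player B.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie."""
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

a, b = 1000.0, 1000.0
for vote in (1.0, 1.0, 0.5):   # hypothetical votes: A wins, A wins, tie
    a, b = update(a, b, vote)
print(round(a), round(b))      # A drifts above B after the votes
```

The "objective" part is only the tally of which answer voters preferred; whether that measures quality or just likability is exactly the dispute in the article.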
> The steam loom wasn't better; it was lower cost per unit. Sure, it cranked out an inferior product, but the bit of that cost drop that got passed on to the consumer meant they were willing to tolerate it. Where do you think it's gonna go when the labor cost can be dropped to near-zero for what are perceived as pure cost positions, no matter how badly quality degrades? How low are you willing to see customer satisfaction drop if it saves you 98% of the cost of an entire division?

I'd kind of expect customer-service-like positions to improve in quality. At least after they figure out how to use the technology.
> I'd kind of expect customer-service-like positions to improve in quality.

Why? At least with a human, you can occasionally push them out of the scripted loop. A chatbot simply can't. "Sorry, even though we clearly screwed you and something sapient can easily determine this edge case is beyond the pale, this is what the policy says. If you'd like to further dispute this outcome, please e-mail sitandspin@utilitycompany.com between the hours of 3pm and 5pm on alternating Thursdays. Is there anything else I can help you with? Is there anything else I can help you with? Is there anything else I can help you with?" Think infuriating IVR, but with less shouting single keywords and more shouting conversationally.
> Why? At least with a human, you can occasionally push them out of the scripted loop. A chatbot simply can't. "Sorry, even though we clearly screwed you and something sapient can easily determine this edge case is beyond the pale, this is what the policy says. If you'd like to further dispute this outcome, please e-mail sitandspin@utilitycompany.com between the hours of 3pm and 5pm on alternating Thursdays. Is there anything else I can help you with? Is there anything else I can help you with? Is there anything else I can help you with?" Think infuriating IVR, but with less shouting single keywords and more shouting conversationally.

I suspect, with a little effort, you could get an LLM to identify what counts as "beyond the pale" better than a typical trained monkey. The question is more how it would be rolled out.
> I suspect, with a little effort, you could get an LLM to identify what counts as "beyond the pale" better than a typical trained monkey.

I'd be curious to know which LLMs responded to the phrase "beyond the pale" with "that's racist".
> It's not so much that these things are not and will never be useful. That's absurd. Tied in to what you're saying, what we're going to see is bean counters and C-suite types that have huffed the hype cycle and are going to deploy unfit technology that's "good enough", and that will become the new baseline. How much of customer service has migrated from IVR to chatbots?

I'm just waiting for the day a midsize company uses an AI as its CEO just barely successfully enough that the owners of larger companies start eyeing their CEOs and questioning whether the hundreds of millions of dollars for a single human is worth it.
The steam loom wasn't better; it was lower cost per unit. Sure, it cranked out an inferior product, but the bit of that cost drop that got passed on to the consumer meant they were willing to tolerate it. Where do you think it's gonna go when the labor cost can be dropped to near-zero for what are perceived as pure cost positions, no matter how badly quality degrades? How low are you willing to see customer satisfaction drop if it saves you 98% of the cost of an entire division?
> Plus at least a computer won't sexually harass employees, adding more cost savings.

Oh, sweet summer child.
An impressive achievement nobody was really asking for. Nobody actually wants to talk to a fucking chatbot, even if it can chuckle at probabilistically determined times. It's a novelty and a toy, but there's only so many times you can crank the music box before the clown popping out the top loses its punch. So what does it do? AI is a tool. It's part of a workflow. It's not the workflow itself, it's not the product, and it's not the goal - except in the minds of weirdos like Sam Altman and Marc Andreessen. So once everybody gets bored with it, what does GPT-4o do?
> How old are you? Are you willing to learn from experience?
> In MY time people have said
> - no-one would be willing to talk into a bluetooth headset
> - no-one would be willing to use videochat
> - no-one would be willing to talk to their phone (eg Siri, then things like Alexa)

False equivalencies. Those are all technologies that fill a need. A bluetooth headset offers convenience. Videochat has always had plenty of appeal, for obvious reasons. Siri lets you control your phone (badly) if your hands aren't free. With the exception of Siri, which in my experience people mostly tolerate to send texts while driving, all those examples serve a concrete need and use case. They do something. What does a chuckling chatbot... do?
> To insist that talking to a device à la ChatGPT will never work because of the reasons you give shows a remarkably clueless attitude toward the history of technology. I'm prepared to entertain serious arguments about the value (or not) of ChatGPT, but I'm not going to listen further to an argument that's essentially "this is a new UI, and new UIs will never succeed, QED".

Good thing I'm not making that argument, then. I don't think ChatGPT is bullshit because it's a new UI. I think it's bullshit because it's not actually a UI at all. You're not interfacing with anything but a mindless probabilistic generator of bullshit that sounds vaguely like it's written or spoken by a human, until it doesn't. Businesses keep getting bitten in the ass trying to use it for customer service because it keeps giving customers wrong information. Lawyers get reamed out by judges because the model makes up nonexistent precedent. Students get accused of plagiarism. Every model hallucinates at some point, and every one of them is functionally hamstrung by it.
> My point is that a glorified chatbot is not actually filling a need.

For whom? "I need all the money, and those pesky minimum wage laws keep these workers cutting into my profit. If only there were a way..."
How do you think humans learn? They train on existing content. Should an artist have to pay Getty Images every time they see one of its pictures somewhere online?
> With the exception of Siri, which in my experience people mostly tolerate to send texts while driving, all those examples serve a concrete need and use case. They do something. What does a chuckling chatbot... do?

In this case, it's obviously being sold as a new Alexa, which is already a chatbot that chuckles.