Fine-tuning tests show "bias ... toward confidently representing the claims as true."
Except for all the times we've tried that, and it doesn't work? LLMs may refuse to follow their directives. It gets worse the longer the model runs. I would love to see someone actually fix that, but it seems inexorably tied to temperature.
For something to be marked as false in a way that the LLM can ingest, it would have to be labeled as false in metadata, right? Because otherwise, when you're building the tokens and the statistical relationships, it's very easy for the negated words to fall out depending upon how you've set up the context. It's still true that the statements are statistically similar.
Setting up your context so it always knows where a fact comes from and gives a true citation is, I think, the only safe way forward for anything of importance.
Statistically similar is just not the same as true or accurate, and the fact that it can plausibly pass for that 90% of the time only makes it more dangerous.
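To make the "labeled as false in metadata" and "true citation" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `Fact` record, the `[S1]`-style source ids, and the `VERIFIED`/`FALSE` labels are hypothetical, not any particular framework's format. It only shows the bookkeeping: every claim goes into the context with its source and an explicit truth label, and any citation in the output is checked against the sources that were actually supplied.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    source_id: str  # hypothetical label, e.g. "S1"
    claim: str
    verified: bool  # explicit truth label carried as metadata, not inferred


def build_context(facts):
    """Render each fact on its own line with its source id and truth label."""
    lines = []
    for fact in facts:
        label = "VERIFIED" if fact.verified else "FALSE"
        lines.append(f"[{fact.source_id}] ({label}) {fact.claim}")
    return "\n".join(lines)


def check_citations(answer, facts):
    """Return (unknown_ids, false_ids) for [ID] markers found in the answer."""
    known = {fact.source_id: fact for fact in facts}
    cited = set(re.findall(r"\[([A-Za-z0-9_-]+)\]", answer))
    unknown = {c for c in cited if c not in known}
    false_cited = {c for c in cited if c in known and not known[c].verified}
    return unknown, false_cited


facts = [
    Fact("S1", "Water boils at 100 degrees Celsius at sea level.", True),
    Fact("S2", "Water boils at 50 degrees Celsius at sea level.", False),
]
context = build_context(facts)

# A made-up model answer that cites one real source and one invented one.
unknown, false_cited = check_citations(
    "Water boils at 100 C [S1], see also [S9].", facts
)
```

Note the limits: this catches a citation to a source that was never provided, or to one marked false, but it cannot tell whether the model represented a real source faithfully.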
You are not an LLM. You are flawed, and useful. I believe in you.
If it was the year 1995 and a coworker told you that AltaVista was useful, would you contradict him? Because internet search engines don't always return relevant results, or sometimes they point you to results that aren't factually accurate?
Would you make that person write a 3 page essay about how internet search engines work?
LLMs are undeniably useful. Or, rather, you can deny their usefulness, but you'll look like an idiot.
Things can be flawed but still useful.
Once again proving how embarrassingly dumb Gemini is.
How about a simple test of understanding what a letter is and counting how many of them are in a word?
Here's a simple question I asked Gemini a few minutes ago.
How low do you think people's IQ would have to be before a majority of them couldn't answer this question correctly (i.e. read a word and count to two and not past it)?
I don't agree with the way people are dismissing "statistics" as if that's a valid refutation to "LLMs are thinking*."
You're accusing me of not reading or understanding your posts, and then you go and attribute this strawman nonsense to me?
I never "stated" or even implied any such thing.
My whole thing is that saying LLMs are "statistics" is reductive to the point of uselessness. Actually, even talking about "statistics" in the context of LLMs is worse than useless, because it gives stupid people the idea that all LLMs are doing is Bayesian inference or n-gram completion or similar.
Counting letters in words, displaying empathy and compassion when a person is in crisis, and not hallucinating sources.
Let me know when you have an objective, falsifiable alternative measure of 'intelligence'/'understanding' and then we can talk.
In the meantime, people who are using operational definitions to continue their work are going to do just that.
This is nobody's first time around the shed with wildsman. I used to think they were reasonable too, and their core point that LLMs are a kind of simulation of thinking has merit. But wildsman is way out there. They think current LLMs, as they are now, are living, thinking entities with agency.
Throw that in the bucket of nonsense that people say you're arguing but you aren't.
"Conscious," "self-aware," "alive," and now positive "net utility."
It's just strawmen all the way down.