Amazon must solve hallucination problem before launching AI-enabled Alexa

EthosPathosLegos · Jan 14, 2025

If history is any guide, they will invest heavily, still have hallucinations, give into the sunk cost fallacy, hype the shit out of their new "Alexa AI", and ignore the inevitable tsunami of complaints from people whose Alexa routinely gives false information and accidentally orders 10000 widgets.

Windhaven · Jan 14, 2025

On one hand, I’m glad that at least one massive company seems to care about its products producing nonsense.
On the other hand… isn’t avoiding hallucinations basically impossible, given how LLMs are mainly very clever autocomplete? Even with some effort to provide sources like Google tries to, they’re terrible at picking up tone and cite sarcasm.

TinCoyote · Jan 14, 2025

I think they have seriously overestimated how much people want that.

I have Amazon Echos in my home....the majority of use is home automation activation, weather, and timers. My wife and I also use it to communicate between our two home offices, which are on different floors.

davijoh723 · Jan 14, 2025

At this point, whenever I see a new product release and the main feature is "now includes AI" I'm 100% not interested in that product. It's a step backwards IMO. Sure hope this trend of cramming AI into everything ends sooner rather than later.

afidel · Jan 14, 2025

When you throw that kind of time and resources at a problem and still can't make it work it just might be time to stop and reevaluate why we're doing the thing? Look around at your peers, none of them have managed to stop the problems you're facing, they've just rolled the crap out with the issues in place, and to what end? The industry has spent billions to get us slightly better chatbots that steal the work of others and make things up when they can't find a sufficiently good existing answer to your queries. Wallstreet so wants this tech to work out that they are rewarding companies for dropping vast sums of money into it, but what if you're the first company to point out that the emperor has no clothes and that the AI race is a waste of resources?

flyingember · Jan 14, 2025

AI has one big issue, it has no way to determine the reliability of a source.

It presents information as facts that aren't.

Right now the product is good at understanding language, intelligent it's not. Voice systems should be limited to returning vetted sources such as sports scores from a league website or the weather and limited control sets (like turning lights off and on)

We are nowhere near a LLM being AI. It's all marketing hype

jhodge · Jan 14, 2025

Regardless of whether or not (LLM) AI assistants are a good idea or not, I'm pleased that Amazon is aware of the reliability issue and focused on it before shipping.

'“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us . . . or Apple or Google shipping slowly and incrementally.”'

OTOH, I also wonder if it's something that can only be solved incrementally by incorporating feedback from real-world use, like self-driving can't be perfected on closed testing courses. Amazon might be better to release what they've got under a different brand - call it 'Bob' as a nod to history - and hard-code it to preface answers with "I'm still in testing, so please double-check this answer, but..."

Rommi · Jan 14, 2025

Isn't Alexa already losing tons of money? I'm sure throwing an expensive AI model at it will cause them to suddenly be profitable

xPutNameHerex · Jan 14, 2025

"Tech company trying to ensure their product actually works before releasing" is a strange headline, are we sure we got that right?

d_cooper · Jan 14, 2025

I hope they do manage to reach a point where they roll something out to the public just because I'm very curious to see if they will have to neuter it so drastically to make it 'trustworthy' that it is universally irritating to interact with. That's my bet.

Resolute · Jan 14, 2025

The problem both Google and Amazon have with these devices is that they want them to do things that the consumer is not interested in. I don't want a "proactive" assistant. I want a device that will wake up when I call it and do the basic shit that I want: Turn the lights on or off, turn the TV on or off, play music. Otherwise, just stay out of the way.

Of course, that one time purchase doesn't make enough money so they have to keep finding ways to be annoying.

Wheels Of Confusion · Jan 14, 2025

Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times the voice assistant still needed to surmount several technical hurdles before the rollout.

This includes solving the problem of “hallucinations” or fabricated answers, its response speed or “latency,” and reliability. “Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.”

This wording gives them an out to abandon any generative-AI based products, actually. They known hallucinations aren't fixable, so at some point they can simply say "Well we tried but this is never going to be production-ready."

Sadly, I have no faith in that as an outcome.

One current employee said more steps were still needed, such as overlaying child safety filters and testing custom integrations with Alexa such as smart lights and the Ring doorbell.

“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us... or Apple or Google shipping slowly and incrementally.”

Personally I wouldn't uphold any of y'all as examples of slow, incremental, responsible AI service roll-outs.

MagicDot · Jan 14, 2025

What timing - I just removed all three Alexa device from my home. They've become an annoyance and AI will only make them worse. 100% agree with other posters - Amazon is seriously overestimating how much people want this crap.

richgroot · Jan 14, 2025

Because of the more personalised, chatty nature of LLMs, the company also plans to hire experts to shape the AI’s personality, voice and diction so it remains familiar to Alexa users, according to one person familiar with the matter.

I always thought that somewhere there must be a woman who sounds exactly like Alexa, and I feel for her kids!

"Clean your room, and when you are done with that, do your homework. Shall I set a timer for you?"

caeldan · Jan 14, 2025

Home automation is such a weird space to develop for, really.

1. Since it's in my home, it's something I want to own and not pay a subscription for.
2. The things I want it to do around the house for me, are not things that lend themselves to monetization(turn on/off lights, broadcast to another room in the house, control the thermostat, look up recipes, act as a radio).
3. The only caveat to the above 2 items is ease of activating entertainment (ie music and video in rooms I am in).
4. I can't really see a use case for 'AI' legitimately improving those items, outside of possibly knowing my entire pantry and providing me recommendations for things to make that I haven't tried but might like for dinner.

MobiusPizza · Jan 14, 2025

I know there is negative sentiment of AI in Ars community, but I for one hope Alexa has AI natural language processing capability. At the moment Alexa is dumb to a point of unusable.

I cannot ask it to add an event to my calendar.
I cannot ask it to turn off my air conditioner unless I say a very specific and unnatural phrase such as turn the downstairs thermostat to off . It doesn't understand even words like HVAC.

I can't tell it say like automatically turn off my Echo show screen at 10pm at night. There is simply no understanding of my instructions.

th3suffering · Jan 14, 2025

You know what, Alexa is so bad at just understanding something simple, how bad could the hallucinations be comparatively?

Alexa cant understand "turn off the downstairs lights" is functionally the same as "turn off the lights downstairs". Yes, i know one is grammatically better than the other, but functionally this is something simple

Bongle · Jan 14, 2025

Problem: Alexa is a money furnace, but people like it for setting alarms or adding to grocery lists
Amazon's solution: Increase the cost of running the service while also reducing accuracy

Like, if OpenAI can't make money off of ChatGPT (Altman recently said even the $200/mth tier loses them money) I don't see how Alexa is going to make money by making it cost Amazon 10x as much per query. The likely-reduced accuracy because of the hallucinations that are inherent to the chosen technology is icing on the cake.

josi_ok · Jan 14, 2025

It'll be interesting to see how Amazon's Echo / Alexa solution compares with Apple's HomePod / Siri. It looks like Apple is preparing new HomePods and Apple TVs. Apple seems to be trying to push a more distributed (on-device) computing solution. Amazon is apparently looking at central servers. Both sides seem to still have steep challenges ahead in deployment.

Personally, I don't see what Alexa will provide that I would be interested in. My own use-case, like 95% of the rest of it's users, is simply to play music, set timers, and home automation, with occasional Wikipedia-type queries. It does these adequately already.

pug fugly · Jan 14, 2025

davijoh723 said:
At this point, whenever I see a new product release and the main feature is "now includes AI" I'm 100% not interested in that product. It's a step backwards IMO. Sure hope this trend of cramming AI into everything ends sooner rather than later.

This! I'm currently shopping for a new TV but many manufacturers have ruled themselves out with this insane BS.

MichaelHurd · Jan 14, 2025

MobiusPizza said:
I know there is negative sentiment of AI in Ars community, but I for one hope Alexa has AI natural language processing capability. At the moment Alexa is dumb to a point of unusable.

I cannot ask it to add an event to my calendar.
I cannot ask it to turn off my air conditioner unless I say a very specific and unnatural phrase such as turn the downstairs thermostat to off . It doesn't understand even words like HVAC.

I can't tell it say like automatically turn off my Echo show screen at 10pm at night. There is simply no understanding of my instructions.

I think the negative sentiment toward AI is almost entirely toward Generative AI because companies are trying to use it for tasks it's not well-suited for, and thus very unreliable*. Using AI for language processing is essentially what Large Language Models were designed for (if I recall, LLMs were built for translation, which requires fast and accurate language processing), so I would expect it to be more reliable and thus accepted in that space.

* Also because the companies are, in common interpretation, stealing and reusing art and stuff like that.

ColdWetDog · Jan 14, 2025

Resolute said:
The problem both Google and Amazon have with these devices is that they want them to do things that the consumer is not interested in. I don't want a "proactive" assistant. I want a device that will wake up when I call it and do the basic shit that I want: Turn the lights on or off, turn the TV on or off, play music. Otherwise, just stay out of the way.

Of course, that one time purchase doesn't make enough money so they have to keep finding ways to be annoying.

'Your plastic pal that's fun to be with' was sarcasm. NOT a road map.

I am unsure what that has slipped past the tech bros. Of course, there is the whole problem with '1984' so I suppose I shouldn't be surprised.

deadman12-4 · Jan 14, 2025

I still find it funny they say "hallucinate" instead of "spews random bullshit".

Also, things like "hallucinations have been near zero" are meaningless. If there is a 0.3% chance that it spews bullshit, but does 10 million tasks a day, that means its gonna be making alot of mistakes every day.

Carewolf · Jan 14, 2025

josi_ok said:
It'll be interesting to see how Amazon's Echo / Alexa solution compares with Apple's HomePod / Siri. It looks like Apple is preparing new HomePods and Apple TVs. Apple seems to be trying to push a more distributed (on-device) computing solution. Amazon is apparently looking at central servers. Both sides seem to still have steep challenges ahead in deployment.

Personally, I don't see what Alexa will provide that I would be interested in. My own use-case, like 95% of the rest of it's users, is simply to play music, set timers, and home automation, with occasional Wikipedia-type queries. It does these adequately already.

You want to compare them to the worst "AI" assistant on the market, why??

OtherSystemGuy · Jan 14, 2025

“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us... or Apple or Google shipping slowly and incrementally.”

Exactly the reason I immediately disabled Apple's new Siri AI as soon as it rolled out. It's email categorization, summaries, and alerts were a complete mess - and about what I expected because the AI doing the work doesn't represent me and how I do things (and it's an auto complete AI, not a task learner).

When will this AI hype train end? As was mentioned earlier, these systems are basically VC cash furnaces. It can't be sustainable for much longer.

ubercurmudgeon · Jan 14, 2025

If you're not paying a monthly fee, and perhaps even if you do, then you are not the customer. Instead, every conversation you have in your own home is potentially a source of free training data for a corporation to improve its AI.

educated_foo · Jan 14, 2025

“[T]he most challenging thing about AI agents is making sure they’re safe, reliable, and predictable,” Anthropic’s chief executive, Dario Amodei, told the FT last year.

Well there's your problem! You trained an opaque 700-billion-parameter model on most of the internet, without first considering whether its output would be "predictable." The reason most people only use voice assistants to set timers and turn lights on and off is because that's all that they can predictably do. That's the same reason we program computers using languages defined by formal grammars and semantics instead of telling them to "add up the receipts" and expecting something reasonable.

Lexus Lunar Lorry · Jan 14, 2025

An enduring challenge for Amazon’s Alexa team—which was hit by major lay-offs in 2023—is how to make money. Figuring out how to make the assistants “cheap enough to run at scale” will be a major task, said Jared Roesch, co-founder of generative AI group OctoAI.

Options being discussed include creating a new Alexa subscription service, or to take a cut of sales of goods and services, said a former Alexa employee.

If regular Alexa couldn't generate any meaningful subscription revenue or sales commissions, does the team seriously expect GenAI Alexa (which is presumably far more expensive per query) to be different? It seems like Amazon is doubling down on a failed monetization strategy.

Tam-Lin · Jan 14, 2025

Saying "we want to eliminate hallucinations from LLMs" is a meaningless statement. All LLMs do is "hallucinate". When whatever they make up has some overlap with reality we then attribute intelligence to them, and when it doesn't, we call it a "hallucination", instead of what it really is, which is working as designed.

Bongle · Jan 14, 2025

deadman12-4 said:
I still find it funny they say "hallucinate" instead of "spews random bullshit".

Also, things like "hallucinations have been near zero" are meaningless. If there is a 0.3% chance that it spews bullshit, but does 10 million tasks a day, that means its gonna be making alot of mistakes every day.

And just like Google's "put glue on pizza" or "eat rocks to get your vitamins" or "use a hitachi magic wand on your children", your most-ridiculous outputs will be widely publicized.

Thomas Harte · Jan 14, 2025

davijoh723 said:
At this point, whenever I see a new product release and the main feature is "now includes AI" I'm 100% not interested in that product. It's a step backwards IMO. Sure hope this trend of cramming AI into everything ends sooner rather than later.

I would have agreed with you but we've crossed the threshold where 'AI' means absolutely anything the marketing department decides which may or may not be a step in any direction whatsoever.

E.g. any sort of eye tracking for camera autofocus now seems to be called AI; we had lesser versions of eye tracking autofocus at least a decade ago and nobody felt the need to call it AI but at some point the tag got attached. Including by Sony, which people generally regard as having the best eye-tracking autofocus in the industry.

I really think they just kept improving what they had and put a new label to it.

onychomys · Jan 14, 2025

I work for a very famous hospital in the American midwest. Yesterday we had an AI-in-pathology grand rounds and the speaker told a story about how he had an alexa hooked to his garbage disposal. Even if we set aside the Maximum Overdrive problem (and my goodness, talk about a place you don't want hallucinations!) with doing something like that, I have such a hard time believing that it's somehow easier or more convenient to say "Alexa, turn on the garbage disposal" than it is to just lean over and flip the switch yourself. I just don't get why you'd do something like that even if you were a giant home automation nerd.

dooferlad · Jan 14, 2025

Windhaven said:
On one hand, I’m glad that at least one massive company seems to care about its products producing nonsense.
On the other hand… isn’t avoiding hallucinations basically impossible, given how LLMs are mainly very clever autocomplete? Even with some effort to provide sources like Google tries to, they’re terrible at picking up tone and cite sarcasm.

Yep, the stochastic text generator will just output the next token bases on the statistical model that has been built up from training data (to set the model weights) and previous tokens (hidden prompt + user supplied prompt + tokens it has already outputted).

deadman12-4 · Jan 14, 2025

Bongle said:
And just like Google's "put glue on pizza" or "eat rocks to get your vitamins" or "use a hitachi magic wand on your children", your most-ridiculous outputs will be widely publicized.

Yup.
To be fair, Amazon has a higher bar to clear because Alexa can directly spend your money by ordering stuff, while Google can just hide behind lawyers when it tells a kid to commit suicide...

markgo · Jan 14, 2025

Carewolf said:
You want to compare them to the worst "AI" assistant on the market, why??

Funny how Apple’s caution seems to be borne out by Amazon’s experience.

Personally, I’d rather have tiny, incremental, predictable improvements in on-device agents with abilities to do real-world things.

There’s always ChatGPT if you want to…chat.

JoHBE · Jan 14, 2025

Soooooo many quotes to comment on to clarify what actually needs to be said, but I'll choose this one:

"“[T]he most challenging thing about AI agents is making sure they’re safe, reliable, and predictable,” Anthropic’s chief executive, Dario Amodei, told the FT last year."

So here's the thing: being reliable, predictable and (to some degree) safe, are all fundamental properties of anything we would call an "AI agent" They are not some optional "nice to have" qualities. They are implicit in the concept and term. So this quote translates to: "“[T]he most challenging thing about AI agents is making sure they’re an AI agent" (and not a make-pretend misleading sidekick wannabe that throws you under the bus at the most unexpected moments)

How long did the Ubiquitous Self Driving Cars delusion take before it fizzled out?? I'm getting impatient.

JMTronicHobbyist · Jan 14, 2025

jhodge said:
call it 'Bob' as a nod to history

Or maybe BOB because this trash is unleashing evil on the world

Therblig · Jan 14, 2025

th3suffering said:
You know what, Alexa is so bad at just understanding something simple, how bad could the hallucinations be comparatively?

Alexa cant understand "turn off the downstairs lights" is functionally the same as "turn off the lights downstairs". Yes, i know one is grammatically better than the other, but functionally this is something simple

It can be worse than that. Don't assume everyone's English is educated native quality. My German wife speaks very good English and has a large vocabulary, but still pronounces many words with a foreign accent. She is also prone to translating German literally, resulting in stuff like the old joke, "Throw Mama down the stairs her hat."

internetomancer · Jan 14, 2025

It's funny watching LLMs get smarter and smarter while still being unable to do the only thing anyone wants.

I feel like we're going to wind up accidentally building God trying to get a dumb speaker to tell us who won the football game.

Amazon must solve hallucination problem before launching AI-enabled Alexa

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Praetorian

Wise, Aged Ars Veteran

Ars Legatus Legionis

Ars Legatus Legionis

Ars Tribunus Angusticlavius

Seniorius Lurkius

Ars Centurion

Wise, Aged Ars Veteran

Account Banned

Ars Legatus Legionis

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Centurion

Ars Praefectus

Ars Centurion

Ars Tribunus Militum

Wise, Aged Ars Veteran

Ars Legatus Legionis

Account Banned

Ars Legatus Legionis

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Praetorian

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Praefectus

Ars Tribunus Militum

Ars Praetorian

Seniorius Lurkius

Account Banned

Ars Praefectus

Ars Praefectus

Ars Scholae Palatinae

Ars Centurion

Ars Tribunus Militum