Amazon must solve hallucination problem before launching AI-enabled Alexa

EthosPathosLegos

Smack-Fu Master, in training
63
If history is any guide, they will invest heavily, still have hallucinations, give in to the sunk cost fallacy, hype the shit out of their new "Alexa AI", and ignore the inevitable tsunami of complaints from people whose Alexa routinely gives false information and accidentally orders 10,000 widgets.
 
Upvote
194 (195 / -1)

Windhaven

Smack-Fu Master, in training
53
On one hand, I’m glad that at least one massive company seems to care about its products producing nonsense.
On the other hand… isn’t avoiding hallucinations basically impossible, given how LLMs are mainly very clever autocomplete? Even with some effort to provide sources, like Google tries to, they’re terrible at picking up tone and will cite sarcasm as if it were sincere.
 
Upvote
170 (174 / -4)

afidel

Ars Legatus Legionis
18,164
Subscriptor
When you throw that kind of time and resources at a problem and still can't make it work, it just might be time to stop and reevaluate why you're doing the thing. Look around at your peers: none of them have managed to stop the problems you're facing, they've just rolled the crap out with the issues in place, and to what end? The industry has spent billions to get us slightly better chatbots that steal the work of others and make things up when they can't find a sufficiently good existing answer to your queries. Wall Street so wants this tech to work out that it is rewarding companies for dropping vast sums of money into it, but what if you're the first company to point out that the emperor has no clothes and that the AI race is a waste of resources?
 
Upvote
105 (111 / -6)
AI has one big issue: it has no way to determine the reliability of a source.

It presents information as facts that aren't.

Right now the product is good at understanding language; intelligent it's not. Voice systems should be limited to returning vetted sources, such as sports scores from a league website or the weather, and limited control sets (like turning lights off and on).

We are nowhere near an LLM being AI. It's all marketing hype.
 
Upvote
95 (101 / -6)

jhodge

Ars Tribunus Angusticlavius
8,661
Subscriptor++
Regardless of whether (LLM) AI assistants are a good idea, I'm pleased that Amazon is aware of the reliability issue and focused on it before shipping.

'“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us . . . or Apple or Google shipping slowly and incrementally.”'

OTOH, I also wonder if it's something that can only be solved incrementally by incorporating feedback from real-world use, the way self-driving can't be perfected on closed test courses. Amazon might do better to release what they've got under a different brand - call it 'Bob' as a nod to history - and hard-code it to preface answers with "I'm still in testing, so please double-check this answer, but..."
 
Upvote
12 (13 / -1)
The problem both Google and Amazon have with these devices is that they want them to do things that the consumer is not interested in. I don't want a "proactive" assistant. I want a device that will wake up when I call it and do the basic shit that I want: Turn the lights on or off, turn the TV on or off, play music. Otherwise, just stay out of the way.

Of course, that one time purchase doesn't make enough money so they have to keep finding ways to be annoying.
 
Upvote
90 (92 / -2)

Wheels Of Confusion

Ars Legatus Legionis
75,398
Subscriptor
Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times the voice assistant still needed to surmount several technical hurdles before the rollout.

This includes solving the problem of “hallucinations” or fabricated answers, its response speed or “latency,” and reliability. “Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.”
This wording gives them an out to abandon any generative-AI based products, actually. They know hallucinations aren't fixable, so at some point they can simply say, "Well, we tried, but this is never going to be production-ready."

Sadly, I have no faith in that as an outcome.

One current employee said more steps were still needed, such as overlaying child safety filters and testing custom integrations with Alexa such as smart lights and the Ring doorbell.

“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us... or Apple or Google shipping slowly and incrementally.
Personally I wouldn't uphold any of y'all as examples of slow, incremental, responsible AI service roll-outs.
 
Upvote
31 (32 / -1)

richgroot

Smack-Fu Master, in training
62
Subscriptor++
Because of the more personalised, chatty nature of LLMs, the company also plans to hire experts to shape the AI’s personality, voice and diction so it remains familiar to Alexa users, according to one person familiar with the matter.

I always thought that somewhere there must be a woman who sounds exactly like Alexa, and I feel for her kids!

"Clean your room, and when you are done with that, do your homework. Shall I set a timer for you?"
 
Upvote
17 (17 / 0)

caeldan

Ars Scholae Palatinae
1,084
Home automation is such a weird space to develop for, really.

1. Since it's in my home, it's something I want to own and not pay a subscription for.
2. The things I want it to do around the house for me are not things that lend themselves to monetization (turn on/off lights, broadcast to another room in the house, control the thermostat, look up recipes, act as a radio).
3. The only caveat to the above 2 items is ease of activating entertainment (ie music and video in rooms I am in).
4. I can't really see a use case for 'AI' legitimately improving those items, outside of possibly knowing my entire pantry and providing me recommendations for things to make that I haven't tried but might like for dinner.
 
Upvote
63 (63 / 0)

MobiusPizza

Ars Scholae Palatinae
1,363
I know there is negative sentiment toward AI in the Ars community, but I for one hope Alexa gets AI natural language processing capability. At the moment Alexa is dumb to the point of being unusable.

I cannot ask it to add an event to my calendar.
I cannot ask it to turn off my air conditioner unless I say a very specific and unnatural phrase such as "turn the downstairs thermostat to off." It doesn't even understand words like HVAC.

I can't tell it to, say, automatically turn off my Echo Show screen at 10pm at night. There is simply no understanding of my instructions.
 
Upvote
20 (29 / -9)
You know what, Alexa is so bad at just understanding something simple, how bad could the hallucinations be comparatively?

Alexa can't understand that "turn off the downstairs lights" is functionally the same as "turn off the lights downstairs". Yes, I know one is grammatically better than the other, but functionally this is something simple.
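For what it's worth, that word-order complaint shouldn't even require an LLM. Here's a minimal sketch of a naive, order-insensitive keyword matcher (the intent names and stopword list are invented for illustration, and this is not how Alexa actually works) that treats both phrasings identically:

```python
# Naive keyword-set intent matcher: word order is discarded, so
# "turn off the downstairs lights" and "turn off the lights downstairs"
# resolve to the same intent. Purely illustrative.
INTENTS = {
    frozenset({"turn", "off", "lights", "downstairs"}): "downstairs_lights_off",
    frozenset({"turn", "on", "lights", "downstairs"}): "downstairs_lights_on",
}
STOPWORDS = {"the", "please", "my"}

def match_intent(utterance):
    # Reduce the utterance to its set of meaningful words.
    words = frozenset(utterance.lower().split()) - STOPWORDS
    return INTENTS.get(words)  # None if no intent matches exactly

print(match_intent("turn off the downstairs lights"))  # -> downstairs_lights_off
print(match_intent("turn off the lights downstairs"))  # -> downstairs_lights_off
```

Real assistants use trained intent classifiers rather than literal word-set lookup, but the point stands: this class of phrasing variation is a solved problem without generative AI.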
 
Upvote
35 (39 / -4)

Bongle

Ars Praefectus
4,461
Subscriptor++
Problem: Alexa is a money furnace, but people like it for setting alarms or adding to grocery lists
Amazon's solution: Increase the cost of running the service while also reducing accuracy


Like, if OpenAI can't make money off of ChatGPT (Altman recently said even the $200/month tier loses them money), I don't see how Alexa is going to make money while costing Amazon 10x as much per query. The likely-reduced accuracy from the hallucinations inherent to the chosen technology is icing on the cake.
 
Upvote
47 (48 / -1)

josi_ok

Ars Centurion
279
Subscriptor
It'll be interesting to see how Amazon's Echo / Alexa solution compares with Apple's HomePod / Siri. It looks like Apple is preparing new HomePods and Apple TVs. Apple seems to be trying to push a more distributed (on-device) computing solution. Amazon is apparently looking at central servers. Both sides seem to still have steep challenges ahead in deployment.

Personally, I don't see what Alexa will provide that I would be interested in. My own use case, like that of 95% of its users, is simply to play music, set timers, and do home automation, with occasional Wikipedia-type queries. It does these adequately already.
 
Upvote
3 (4 / -1)

pug fugly

Ars Tribunus Militum
1,715
At this point, whenever I see a new product release and the main feature is "now includes AI" I'm 100% not interested in that product. It's a step backwards IMO. Sure hope this trend of cramming AI into everything ends sooner rather than later.
This! I'm currently shopping for a new TV but many manufacturers have ruled themselves out with this insane BS.
 
Upvote
34 (34 / 0)

MichaelHurd

Wise, Aged Ars Veteran
145
Subscriptor
I know there is negative sentiment toward AI in the Ars community, but I for one hope Alexa gets AI natural language processing capability. At the moment Alexa is dumb to the point of being unusable.

I cannot ask it to add an event to my calendar.
I cannot ask it to turn off my air conditioner unless I say a very specific and unnatural phrase such as "turn the downstairs thermostat to off." It doesn't even understand words like HVAC.

I can't tell it to, say, automatically turn off my Echo Show screen at 10pm at night. There is simply no understanding of my instructions.
I think the negative sentiment toward AI is almost entirely toward Generative AI because companies are trying to use it for tasks it's not well-suited for, and thus very unreliable*. Using AI for language processing is essentially what Large Language Models were designed for (if I recall, LLMs were built for translation, which requires fast and accurate language processing), so I would expect it to be more reliable and thus accepted in that space.

* Also because the companies are, in common interpretation, stealing and reusing art and stuff like that.
 
Upvote
37 (37 / 0)

ColdWetDog

Ars Legatus Legionis
14,402
The problem both Google and Amazon have with these devices is that they want them to do things that the consumer is not interested in. I don't want a "proactive" assistant. I want a device that will wake up when I call it and do the basic shit that I want: Turn the lights on or off, turn the TV on or off, play music. Otherwise, just stay out of the way.

Of course, that one time purchase doesn't make enough money so they have to keep finding ways to be annoying.
'Your plastic pal that's fun to be with' was sarcasm. NOT a road map.

I am unsure how that slipped past the tech bros. Of course, there is the whole problem with '1984', so I suppose I shouldn't be surprised.
 
Upvote
28 (29 / -1)

Carewolf

Ars Legatus Legionis
10,364
It'll be interesting to see how Amazon's Echo / Alexa solution compares with Apple's HomePod / Siri. It looks like Apple is preparing new HomePods and Apple TVs. Apple seems to be trying to push a more distributed (on-device) computing solution. Amazon is apparently looking at central servers. Both sides seem to still have steep challenges ahead in deployment.

Personally, I don't see what Alexa will provide that I would be interested in. My own use case, like that of 95% of its users, is simply to play music, set timers, and do home automation, with occasional Wikipedia-type queries. It does these adequately already.
You want to compare them to the worst "AI" assistant on the market, why??
 
Upvote
-11 (2 / -13)

OtherSystemGuy

Ars Scholae Palatinae
1,284
Subscriptor++
“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added. “That’s why you see us... or Apple or Google shipping slowly and incrementally.”

Exactly the reason I immediately disabled Apple's new Siri AI as soon as it rolled out. Its email categorization, summaries, and alerts were a complete mess - and about what I expected, because the AI doing the work doesn't represent me and how I do things (and it's an autocomplete AI, not a task learner).

When will this AI hype train end? As was mentioned earlier, these systems are basically VC cash furnaces. It can't be sustainable for much longer.
 
Upvote
40 (40 / 0)
“[T]he most challenging thing about AI agents is making sure they’re safe, reliable, and predictable,” Anthropic’s chief executive, Dario Amodei, told the FT last year.
Well there's your problem! You trained an opaque 700-billion-parameter model on most of the internet, without first considering whether its output would be "predictable." The reason most people only use voice assistants to set timers and turn lights on and off is because that's all that they can predictably do. That's the same reason we program computers using languages defined by formal grammars and semantics instead of telling them to "add up the receipts" and expecting something reasonable.
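The formal-grammar point can be made concrete with a toy sketch (the grammar here is invented for illustration): a parser built on a defined grammar either parses a command exactly or fails loudly, and never improvises an answer.

```python
import re

# A strict command grammar as a regex: the set of accepted inputs is
# fully enumerable, so behavior is predictable by construction.
COMMAND = re.compile(r"^turn (on|off) the (lights|tv)$")

def parse(utterance):
    m = COMMAND.match(utterance.lower())
    if m is None:
        # Fail loudly instead of guessing -- the opposite of a
        # generative model's "always produce something plausible".
        raise ValueError(f"unrecognized command: {utterance!r}")
    return {"action": m.group(1), "device": m.group(2)}

print(parse("Turn off the lights"))  # -> {'action': 'off', 'device': 'lights'}
# parse("add up the receipts") raises ValueError rather than hallucinating.
```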
 
Upvote
54 (55 / -1)

Lexus Lunar Lorry

Ars Scholae Palatinae
846
Subscriptor++
An enduring challenge for Amazon’s Alexa team—which was hit by major lay-offs in 2023—is how to make money. Figuring out how to make the assistants “cheap enough to run at scale” will be a major task, said Jared Roesch, co-founder of generative AI group OctoAI.

Options being discussed include creating a new Alexa subscription service, or to take a cut of sales of goods and services, said a former Alexa employee.
If regular Alexa couldn't generate any meaningful subscription revenue or sales commissions, does the team seriously expect GenAI Alexa (which is presumably far more expensive per query) to be different? It seems like Amazon is doubling down on a failed monetization strategy.
 
Upvote
32 (33 / -1)

Tam-Lin

Ars Scholae Palatinae
825
Subscriptor++
Saying "we want to eliminate hallucinations from LLMs" is a meaningless statement. All LLMs do is "hallucinate". When whatever they make up has some overlap with reality we then attribute intelligence to them, and when it doesn't, we call it a "hallucination", instead of what it really is, which is working as designed.
 
Upvote
58 (59 / -1)

Bongle

Ars Praefectus
4,461
Subscriptor++
I still find it funny they say "hallucinate" instead of "spews random bullshit".

Also, things like "hallucinations have to be near zero" are meaningless. If there is a 0.3% chance that it spews bullshit, but it does 10 million tasks a day, that means it's gonna be making a lot of mistakes every day.
And just like Google's "put glue on pizza" or "eat rocks to get your vitamins" or "use a hitachi magic wand on your children", your most-ridiculous outputs will be widely publicized.
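A quick sketch of the arithmetic behind that point, using the hypothetical 0.3% and 10-million-queries figures from the comment above:

```python
# Even a "near zero" per-query hallucination rate yields a large
# absolute number of bad answers at assistant scale.
def expected_daily_errors(error_rate, queries_per_day):
    """Expected number of hallucinated responses per day."""
    return round(error_rate * queries_per_day)

print(expected_daily_errors(0.003, 10_000_000))  # -> 30000
```

Thirty thousand bad answers a day, any one of which might be the next widely publicized screenshot.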
 
Upvote
23 (24 / -1)
At this point, whenever I see a new product release and the main feature is "now includes AI" I'm 100% not interested in that product. It's a step backwards IMO. Sure hope this trend of cramming AI into everything ends sooner rather than later.
I would have agreed with you but we've crossed the threshold where 'AI' means absolutely anything the marketing department decides which may or may not be a step in any direction whatsoever.

E.g., any sort of eye tracking for camera autofocus now seems to be called AI; we had lesser versions of eye-tracking autofocus at least a decade ago and nobody felt the need to call it AI, but at some point the tag got attached. Including by Sony, which people generally regard as having the best eye-tracking autofocus in the industry.

I really think they just kept improving what they had and put a new label to it.
 
Upvote
24 (25 / -1)

onychomys

Ars Praetorian
461
Subscriptor++
I work for a very famous hospital in the American midwest. Yesterday we had an AI-in-pathology grand rounds, and the speaker told a story about how he had an Alexa hooked to his garbage disposal. Even if we set aside the Maximum Overdrive problem (and my goodness, talk about a place you don't want hallucinations!) with doing something like that, I have such a hard time believing that it's somehow easier or more convenient to say "Alexa, turn on the garbage disposal" than it is to just lean over and flip the switch yourself. I just don't get why you'd do something like that even if you were a giant home automation nerd.
 
Upvote
46 (46 / 0)

dooferlad

Seniorius Lurkius
9
Subscriptor
On one hand, I’m glad that at least one massive company seems to care about its products producing nonsense.
On the other hand… isn’t avoiding hallucinations basically impossible, given how LLMs are mainly very clever autocomplete? Even with some effort to provide sources like Google tries to, they’re terrible at picking up tone and cite sarcasm.
Yep, the stochastic text generator will just output the next token based on the statistical model that was built up from training data (which set the model weights) and the previous tokens (hidden prompt + user-supplied prompt + tokens it has already output).
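That autoregressive loop can be sketched with a toy bigram table standing in for the learned weights (the table and tokens here are invented, and real models condition on the whole context rather than just the last token):

```python
import random

# Toy next-token sampler: each step draws from a conditional
# distribution over possible next tokens. Nothing in the loop checks
# whether the emitted text is true -- only whether it is probable.
BIGRAMS = {
    "<start>": [("the", 0.6), ("a", 0.4)],
    "the":     [("lights", 0.5), ("weather", 0.5)],
    "a":       [("timer", 1.0)],
    "lights":  [("<end>", 1.0)],
    "weather": [("<end>", 1.0)],
    "timer":   [("<end>", 1.0)],
}

def generate(seed=0):
    rng = random.Random(seed)
    tokens = ["<start>"]
    while tokens[-1] != "<end>":
        candidates, weights = zip(*BIGRAMS[tokens[-1]])
        # Sample the next token from the conditional distribution.
        tokens.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(tokens[1:-1])

print(generate())  # e.g. "a timer"
```

Hallucination isn't a malfunction of this loop; it's the loop doing exactly what it was built to do when the most probable continuation happens to be false.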
 
Upvote
7 (8 / -1)
And just like Google's "put glue on pizza" or "eat rocks to get your vitamins" or "use a hitachi magic wand on your children", your most-ridiculous outputs will be widely publicized.
Yup.
To be fair, Amazon has a higher bar to clear because Alexa can directly spend your money by ordering stuff, while Google can just hide behind lawyers when it tells a kid to commit suicide...
 
Upvote
10 (10 / 0)

markgo

Ars Praefectus
3,776
Subscriptor++
You want to compare them to the worst "AI" assistant on the market, why??
Funny how Apple’s caution seems to be borne out by Amazon’s experience.

Personally, I’d rather have tiny, incremental, predictable improvements in on-device agents with abilities to do real-world things.

There’s always ChatGPT if you want to…chat.
 
Upvote
17 (17 / 0)

JoHBE

Ars Praefectus
4,132
Subscriptor++
Soooooo many quotes to comment on to clarify what actually needs to be said, but I'll choose this one:

"“[T]he most challenging thing about AI agents is making sure they’re safe, reliable, and predictable,” Anthropic’s chief executive, Dario Amodei, told the FT last year."

So here's the thing: being reliable, predictable, and (to some degree) safe are all fundamental properties of anything we would call an "AI agent". They are not some optional "nice to have" qualities. They are implicit in the concept and term. So this quote translates to: "[T]he most challenging thing about AI agents is making sure they're an AI agent" (and not a make-pretend, misleading sidekick wannabe that throws you under the bus at the most unexpected moments).

How long did the Ubiquitous Self Driving Cars delusion take before it fizzled out?? I'm getting impatient.
 
Upvote
28 (28 / 0)

Therblig

Ars Centurion
371
Subscriptor++
You know what, Alexa is so bad at just understanding something simple, how bad could the hallucinations be comparatively?

Alexa cant understand "turn off the downstairs lights" is functionally the same as "turn off the lights downstairs". Yes, i know one is grammatically better than the other, but functionally this is something simple
It can be worse than that. Don't assume everyone's English is educated native quality. My German wife speaks very good English and has a large vocabulary, but still pronounces many words with a foreign accent. She is also prone to translating German literally, resulting in stuff like the old joke, "Throw Mama down the stairs her hat."
 
Upvote
23 (23 / 0)