Amazon must solve hallucination problem before launching AI-enabled Alexa

JMTronicHobbyist · Jan 14, 2025

Alexa is already good at making fart noises, how can AI help?

benpocalypse · Jan 14, 2025

But Apple doesn't?

josi_ok · Jan 14, 2025

Carewolf said:
You want to compare them to the worst "AI" assistant on the market, why??

Apple and Amazon are two of the biggest players in home voice assistants (hardware and software). Both have struggled for years to develop a (arguable) compelling voice product. Each is taking a very different approach.
It's still very early in the AI wars. One can argue that there is no market for more advanced products, but these companies seem to believe otherwise and they have the resources to continue to evolve.

Carewolf · Jan 14, 2025

markgo said:
Funny how Apple’s caution seems to be borne out by Amazon’s experience.

Personally, I’d rather have tiny, incremental, predictable improvements in on-device agents with abilities to do real-world things.

There’s always ChatGPT if you want to…chat.

Well, Apple are definitely being smarter burning less money on a money-losing industry, but at least Alexa has a bigger cash-fire

D

dpx · Jan 14, 2025

I, for one, just want to know how much it's going to cost. I use my Alexa for time, date, weather, cooking measurements, reminders, and timers, and that's pretty much it. Alexa refuses to play my saved music collection, tiering to their paid services, which was something it used to do, but now been blocked and monetized, seriously leading me to seek FOSS solutions, but I haven't found one yet. All this Alexa AI stuff means is a new subscription, and i'm not paying.

spartan56 · Jan 14, 2025

deadman12-4 said:
I still find it funny they say "hallucinate" instead of "spews random bullshit".

Also, things like "hallucinations have been near zero" are meaningless. If there is a 0.3% chance that it spews bullshit, but does 10 million tasks a day, that means its gonna be making alot of mistakes every day.

Yes, I'm glad Ars at least told it straight in the article but if the entire media could start saying "blatantly false BS" instead of "hallucinations" that would be great. Would help this dumb bubble pop quicker hopefully too.

Epimetheus_Secundus · Jan 14, 2025

Given the current state of the technology, I have very low confidence that they can achieve their goals

JoHBE · Jan 14, 2025

Tam-Lin said:
Saying "we want to eliminate hallucinations from LLMs" is a meaningless statement. All LLMs do is "hallucinate". When whatever they make up has some overlap with reality we then attribute intelligence to them, and when it doesn't, we call it a "hallucination", instead of what it really is, which is working as designed.

THIS needs to be hammered into everyone's head again and again. The fundamentals on which it works, don't contain any "mistakes" that can be "fixed" or "improved" to specifically address that aspect. You gotta start from scratch, and whatever successes you had over the last 5 years, are possibly and probably not very predictive.

LonM · Jan 14, 2025

“The reliability is the issue—getting it to be working close to 100 percent of the time,” the employee added.

I expect the technology I buy to work 100% of the time. Not close to it. All the time*. I would be very unhappy if my spreadsheet software only gave me the right answers 99.999% of the time, if my e-reader showed me only 99.999% of the words in a book I'd bought.

More to the point, how do you even define "working right" for an LLM-based chatbot when it's whole existence is just making up plausible sounding replies?

(*Yes, I understand there are issues like networked services, where it's never going to be guaranteed 100%, but that's obviously not what they're referring to here)

bfrantz · Jan 14, 2025

It does seem like Amazon is going overboard in their ambitions here. I personally don't want a more "agentic" Alexa, for the same reason that I already don't use it to place Amazon orders for me (something I assume it can do decently well now) - I don't trust it to do anything important without the ability for me to easily verify its actions before it takes them, at which point it's more tedious than just doing that thing on my phone.

Where I think GenAI could make Alexa better is mainly around its ability to understand more varied types of queries and translate them to existing capabilities, or to answer random "google search" type questions with a bit more detail and natural-sounding delivery. Obviously it will still have the potential to give wrong answers, but that's no different than any other chatbot and having the ability to interact with a chatbot via Echo devices is something I could see being handy. But I'd see this as a new "skill," not a fundamental change in how it works for everything else. For stuff like playing songs, setting reminders, etc., the main thing that I'd like to see improved is its reliability at hearing the "wake" word and ability to understand what I'm saying and filter out noise.

Regarding the economics, I'd be willing to spend a couple bucks a month for what it already does, if that's what it takes to keep Amazon from killing it. It's genuinely useful for a limited set of things, and I'd take incremental improvements in those areas over a super expensive overhaul that brings a bunch of features I probably won't use. Or let them provide two versions, and keep the cheap/free one around alongside the new LLM-powered one that costs like $10/mo. My only concern with that is whether the old version will degrade if developers stop maintaining its skills, etc. because their focus is on the new one.

NinjaNerd56 · Jan 14, 2025

So I have a fair amount of time with ChatGPT, and a TON with Siri and Alexa.

All of them are actually pretty terrible, although I have had good outcomes with ChatGPT writing projects.

If we get a decent shopping list from Alexa, it is a win. I do have a slew of Routines and Skills that are useful, but those are single dimension, specific use things.

Siri, even with Apple Intelligence, is a train wreck. Voice search on Apple TV is our only regular use case.

35 years ago, I wrote a white paper on voice recognition after some work at CMU in the Speech Consortium. I created the phrase “Star Trek Syndrome” to illustrate the difficulties in making general purpose interactions work well, if at all.

Perhaps I should update and release it again…

ColdWetDog · Jan 14, 2025

dpx said:
I, for one, just want to know how much it's going to cost. I use my Alexa for time, date, weather, cooking measurements, reminders, and timers, and that's pretty much it. Alexa refuses to play my saved music collection, tiering to their paid services, which was something it used to do, but now been blocked and monetized, seriously leading me to seek FOSS solutions, but I haven't found one yet. All this Alexa AI stuff means is a new subscription, and i'm not paying.

For some time Siri has been unable to play regular old playlists - especially in CarPlay mode. Now it has 'improved' its behavior by playing near hits from Apple Music - which I don't subscribe to. Doesn't do this all of the time, sometimes it just plays any ol random song.

So I've seen a clear digression in Siri's capabilities over the years. And that's before AI, presumably. What joy. But goes along with the rest of the timeline, I suppose.

Edgar Allan Esquire · Jan 14, 2025

I'm a little cynical in that I think the motivation is less about product quality and more the fear that people could talk their sales bot into lower than specified prices for goods. There were a few "Amazon's pricing screw up" feeds I'd see back in the day trying to catch the ~30 minute window things could be off by a factor of 10.

DarthSlack · Jan 14, 2025

internetomancer said:
It's funny watching LLMs get smarter and smarter while still being unable to do the only thing anyone wants.

I feel like we're going to wind up accidentally building God trying to get a dumb speaker to tell us who won the football game.

Because LLMs aren't getting any smarter, they're just getting bigger. And all that means is the statistics are improved, nothing else.

Nifty'sPapa · Jan 14, 2025

I really liked how Alexa as more focused that the Google Assistant. While it couldn't answer questions very well, it was good at playing music. However, in the last year, it's gotten worse at that for me. As someone who hates covers, that's all Alexa ever wants to play. Will this resolve that? Time will tell. In the interim, the device remains a vector to SiriusXM.

mygeek911 · Jan 14, 2025

The hallucination problem I want solved is the bogus shipping times. My last few packages from them have been shipped a day or two AFTER they stated they would be delivered.

tRexx · Jan 14, 2025

I am afraid the AI will eventually think that the solution to any of my problems is to buy a product from Amazon.

Cloudgazer · Jan 14, 2025

Alexa - play Arvo Pärt on Apple Music
Playing - I have a pet on Apple Music!

Hopefully Smarter · Jan 14, 2025

The AI industry is under no obligation to fix its hallucination problem.

Sheep Disorder · Jan 14, 2025

JoHBE said:
THIS needs to be hammered into everyone's head again and again. The fundamentals on which it works, don't contain any "mistakes" that can be "fixed" or "improved" to specifically address that aspect. You gotta start from scratch, and whatever successes you had over the last 5 years, are possibly and probably not very predictive.

So you are saying that everything's a hammer (referrring to Abraham Maslow's quote)?

Quote is rather pertinent to this discussion.

hambone · Jan 14, 2025

Prasad, the former chief architect of Alexa, said last month’s release of the company’s in-house Amazon Nova models—led by his AGI team—was in part motivated by the specific needs for optimum speed, cost, and reliability, in order to help AI applications such as Alexa “get to that last mile, which is really hard.”

This is an interesting quote.

You can kind of characterize the current limits on AI as an "uncanny valley" problem.

Whether it is trying to ace self-driving cars or write a genuinely insightful essay, AI sort of performs in the "reasonably OK-ish" range. But it's that last 10% that makes all the difference and creates all the value, and there AI pretty regularly falls short.

I see this as a sort of asymptotic problem that might have a very long and expensive development tail.

That said, AI is absolutely dominating the slop content generation market, so it has that going for it. :biggreen:

schnackenpfefferhausen · Jan 14, 2025

View: https://www.youtube.com/watch?v=g3j9muCo4o0

When all you have is a hammer, everything looks like a nail.

Uncivil Servant · Jan 14, 2025

This seems like such a strange decision because AWS already has excellent algorithms and data crunching to return factual and useful information. Somehow they manage to take my Amazon library that's split between evolutionary biology, feminist sci-fi/fantasy, military history, and biblical archaeology, and they still manage to find useful suggestions.

LLMs aren't very useful for what Amazon does best with data. At most, it could be the equivalent of XUI, vocal presentation of info from the useful algorithms.

So if I ask Alexa "play me something so funky it sounds like the sax player's soul is trying to fly out his mouth through the reed", use the existing algorithms to determine what I likely want to hear, and use the LLM part as wrapping, the part that lets me know which venue the Tedeschi Trucks Band was playing.

LLMs are not "thinking machines". They are "speaking machines". That's the only way to market them, and the cost-effectiveness becomes obviously bad.

EricM2 · Jan 14, 2025

Quite generally AI must solve its consistency problem.
Software used to be predictable. If you threw a problem at a classic program 10.000 times, you got 10.000 times the same result. If not, we used to call this a bug.
Reliably deterministic behavior made software a good compagnion for creative, but unreliable wetware - us humans.
AI currently brings in some creative aspects, but combines it with non-deterministic behavior like hallucinations or chaotic changes in responses.
So using AI means combining creative, but unreliable wetware with creative, but unreliable software.
I don't see this going commercially anywhere serious, until the reliablity aspect is really solved.

jdale · Jan 14, 2025

This includes solving the problem of “hallucinations” or fabricated answers, its response speed or “latency,” and reliability. “Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.”

We're almost ready, all we have to do is solve the #1 problem facing the industry, a problem that arises directly from the very foundations of the technological approach we are using.

NetworkElf · Jan 14, 2025

All of my Alexa devices have been becoming less reliable over the past couple of years. I originally thought that it was my Internet connectivity, but I was able to eliminate that as an issue. If they can't reliably play rain and thunder sounds at night, they are not going to live through an AI upgrade.

freeskier93 · Jan 14, 2025

MobiusPizza said:
I know there is negative sentiment of AI in Ars community, but I for one hope Alexa has AI natural language processing capability. At the moment Alexa is dumb to a point of unusable.

I cannot ask it to add an event to my calendar.
I cannot ask it to turn off my air conditioner unless I say a very specific and unnatural phrase such as turn the downstairs thermostat to off . It doesn't understand even words like HVAC.

I can't tell it say like automatically turn off my Echo show screen at 10pm at night. There is simply no understanding of my instructions.

I've been testing OpenAI (gpt-4o-mini) with Home Assistant and have been really impressed so far. For actual home control and getting home information it does a really good job of figuring out what I want. I'm currently using Google Home devices and so far in my testing OpenAI seems far better. Next step is testing the new Home Assistant Voice Hardware to see how well that works. I'm hoping if it works well, I can finally ditch Google Home.

The main problem is having to pay for OpenAI, but it's pretty damn cheap. I'm thinking less than $10 a year, which I'm willing to pay for right now until local LLMs become viable.

Hoptimist · Jan 14, 2025

The AI hype seems like a corporate form of mass hysteria. Overwhelming FOMO, all must have it, regardless of any concept of ROI. There are plentiful and good uses for AI, but the corporate/marketing types are prioritizing the least useful, most glamorous, and most difficult tasks. AI (or rather ML) is already producing things people value (ex. photos), but incremental expansion to well suited tasks seems secondary.

Alexa, please give me instructions for how to pound sand? Alexa, please demonstrate these instructions.

arstechnica.unknowing564 · Jan 14, 2025

Alexa is an amazing alarm clock/radio with weather/traffic functionality. I neither need nor want anything beyond that.

Uncivil Servant · Jan 14, 2025

Therblig said:
It can be worse than that. Don't assume everyone's English is educated native quality. My German wife speaks very good English and has a large vocabulary, but still pronounces many words with a foreign accent. She is also prone to translating German literally, resulting in stuff like the old joke, "Throw Mama down the stairs her hat."

It's even worse than that: English is a lot like Arabic in being an imperial language with classical forms everyone learns in school but also multiple mutually-incomprehensible dialects. US Mid-Atlantic and UK Received Pronunciation are what most of us think of when we think of "American" and "British" accents. It's almost certainly ensured that civil servants in both countries can communicate freely and easily without language barriers, which tends to reinforce said dialects among the elites.

The cynic in me expects these voice assistants to recognize Mid-Atlantic, BBC RP, NorCal/Left Coast, and some Indian English dialects perfectly, and categorize Appalachian and Tidewater as "non-English" simply because they've drifted less from 17th century English than the others.

I can't even imagine how they'd handle various types of African American Vernacular, especially the irregular usage of the verb "to be".

Bash · Jan 14, 2025

Tam-Lin said:
Saying "we want to eliminate hallucinations from LLMs" is a meaningless statement. All LLMs do is "hallucinate". When whatever they make up has some overlap with reality we then attribute intelligence to them, and when it doesn't, we call it a "hallucination", instead of what it really is, which is working as designed.

Agreed -- the fact that they call this an "AGI" group at Amazon means that everything they do is tained by a hearty serving of bullshit. I do believe an LLM can be part of a useful voice assistant, but AGI is not coming from us adding another 100GB of VRAM and some clever filtering rules to make a better LLM.

A.P. · Jan 14, 2025

Generative AI makes things up. The whole thing is hallucinations -- that's just what it does and what it's for. "Paint me a picture of a duck that looks like Groucho Marx." Hallucination accepted. If you wanted something that gives correct answers you'd have used a database instead. The whole idea that you'd build a generative AI system to answer questions and then decide "we just need to solve the hallucination problem first" means you've completely lost the thread.

adamsc · Jan 14, 2025

Resolute said:
The problem both Google and Amazon have with these devices is that they want them to do things that the consumer is not interested in. I don't want a "proactive" assistant. I want a device that will wake up when I call it and do the basic shit that I want: Turn the lights on or off, turn the TV on or off, play music. Otherwise, just stay out of the way.

They’re all stuck in an uncanny valley: many people would like the dream they’re chasing of having a true personal assistant and would even pay for that, but that requires true AGI which is going to require significant research breakthroughs beyond LLMs. If you don’t have AGI, people are going to pay only what an unreliable assistant is worth: nothing.

Apple can afford to run Siri as an iPhone/Watch/AirPod feature because they do more work on the high-end hardware you purchase from them periodically, and Google might be able to do that for Pixel devices (sales volume is iffy), but Alexa probably isn’t selling enough devices at a profit to pay for heavy server usage and Amazon isn’t making anything like enough margin from extra sales to Alexa users to cover the difference.

adamsc · Jan 14, 2025

Lexus Lunar Lorry said:
If regular Alexa couldn't generate any meaningful subscription revenue or sales commissions, does the team seriously expect GenAI Alexa (which is presumably far more expensive per query) to be different? It seems like Amazon is doubling down on a failed monetization strategy.

They need it to be true. They’ve invested billions in R&D on voice assistants, smart speakers, etc. but it hasn’t paid off. Wall Street is thirsty for Great Depression-scale layoffs, and Amazon’s valuation assumes tech leadership so their management’s personal net worth is linked to investors believing it’ll work out for them.

Thomas Harte · Jan 14, 2025

Uncivil Servant said:
The cynic in me expects these voice assistants to recognize Mid-Atlantic, BBC RP, NorCal/Left Coast, and some Indian English dialects perfectly, and categorize Appalachian and Tidewater as "non-English" simply because they've drifted less from 17th century English than the others.

We got an Echo Dot for $5 on a punt circa 2018 because it was bundled with something or another. My pronunciation is about as straightforward and uninflected as southern England can produce and my history in publishing means that I chide myself if I even so much as split an infinitive. That's despite having spent now almost half my adult life in the US; people even ask when my natural-born-American child moved here because he's done such an excellent job of learning my British accent.

Alexa couldn't even understand the prompt for a cookie recipe that I recited directly from its literature. Due to its general uselessness we passed on the Echo Dot to somebody else shortly thereafter.

So either the speech recognition is calibrated by geolocation or Alexa was having a really bad few days.

dpx · Jan 14, 2025

LonM said:
I expect the technology I buy to work 100% of the time. Not close to it. All the time*. I would be very unhappy if my spreadsheet software only gave me the right answers 99.999% of the time, if my e-reader showed me only 99.999% of the words in a book I'd bought.

More to the point, how do you even define "working right" for an LLM-based chatbot when it's whole existence is just making up plausible sounding replies?

(*Yes, I understand there are issues like networked services, where it's never going to be guaranteed 100%, but that's obviously not what they're referring to here)

tfw you realize even motherboards are chips which are networked together by soder, all systems are networked systems, none of them, so far, have been guaranteed to be accurate 100% of the time, whether it be pc's, healthcare, or space systems. not one.

Heart of Dawn · Jan 14, 2025

Consumers don't want it. It costs a fortune to run (not to mention the environmental impacts of using all that water and electricity). There is absolutely no way to ever get it to stop making up bullshit, let alone become remotely intelligent.

But the tech companies keep pushing it anyway because it must be the "next big thing™," in order to sustain the myth that they can always have infinite growth.

Even once this bubble bursts, they are going to do exactly the same with the next one.

Jim Salter · Jan 14, 2025

Today, it was my turn to play the "why is the writing in this piece so terrible?" / scroll up and see it's syndicated / disappointed "Oh. That's why" game.

graylshaped · Jan 14, 2025

TinCoyote said:
I think they have seriously overestimated how much people want that.

I have Amazon Echos in my home....the majority of use is home automation activation, weather, and timers. My wife and I also use it to communicate between our two home offices, which are on different floors.

Exactly. And with respect to a quote from the article--"The complexity comes from Alexa users expecting quick responses as well as extremely high levels of accuracy"--given that Alexa often struggles to complete those simple tasks, I have not ever expected a high level of accuracy, and do not see me ever relying on any automated "agent" to take unilateral action on my behalf.

Amazon must solve hallucination problem before launching AI-enabled Alexa

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Centurion

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Praetorian

Ars Scholae Palatinae

Ars Praefectus

Ars Centurion

Seniorius Lurkius

Ars Praefectus

Ars Legatus Legionis

Ars Praefectus

Ars Legatus Legionis

Ars Praetorian

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Legatus Legionis

Account Banned

Ars Tribunus Militum

Ars Praefectus

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Centurion

Ars Legatus Legionis

Ars Scholae Palatinae

Ars Centurion

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Praefectus

Ars Praefectus

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Legatus Legionis

Ars Legatus Legionis