Rivian apologizes to customers after infotainment-bricking OTA update

Illusive Man

Ars Scholae Palatinae
629
This is one of the largest gaps between tech-oriented people and regular people who want reliable cars. OTA updates are often mentioned in the context of recalls that don't require owners to do anything or bring the car to the dealer.

But as with so many things in software, it seems to just reduce the manufacturer's rigorous validation and testing prior to release and push the beta out into the real world.
 

adamsc

Ars Praefectus
4,291
Subscriptor++
The real issue, IMHO, isn't fat-fingering a release; it's not having an A/B update mechanism so they can fail over to the last known good image. Come on, Rivian, this is high-availability embedded engineering 101.

Yeah, this is like an interview challenge to see how many areas you can find where a cut corner made the problem worse. No rollbacks, a deployment process one typo away from disaster, some unspecified issues around certificate management, etc. I hope they don’t just crucify the person who kicked off the release and instead start thinking about systemic protection.
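For anyone who hasn't run into A/B updates, here's a minimal Python sketch of the idea. It's a toy model, not Rivian's actual architecture: the slot names, version strings, and the `boots_ok` callback (standing in for "did the new image come up") are all hypothetical. The update always goes into the slot that isn't running, and the device only commits to it if it boots; otherwise it falls back to the last known good image.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ABDevice:
    """Toy A/B (dual-slot) firmware model; hypothetical, not Rivian's design."""
    slots: Dict[str, Optional[str]] = field(default_factory=lambda: {"A": "v1", "B": None})
    active: str = "A"               # slot currently booted
    last_known_good: str = "A"      # slot most recently confirmed to boot
    pending: Optional[str] = None   # slot holding a not-yet-confirmed update

    def inactive_slot(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self, version: str) -> None:
        # Write the update to the slot we are NOT running from,
        # so the running (known-good) image is never overwritten.
        target = self.inactive_slot()
        self.slots[target] = version
        self.pending = target

    def reboot(self, boots_ok: Callable[[str], bool]) -> str:
        # Try the pending slot once; if it fails, fall back to last known good.
        if self.pending is not None:
            candidate, self.pending = self.pending, None
            if boots_ok(self.slots[candidate]):
                self.active = self.last_known_good = candidate
            else:
                self.active = self.last_known_good
        return self.slots[self.active]
```

With this layout a bad image costs one failed boot rather than a dead screen, because the previous image is still sitting untouched in the other slot. The price is roughly twice the flash storage for the system image.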
 

Sarty

Ars Tribunus Angusticlavius
7,976
But as with so many things in software, it seems to just reduce the manufacturer's rigorous validation and testing prior to release and push the beta out into the real world.
I remember back when you bought a game, it came on a CD, then you typically installed it and it ran out of the box!
 

jhodge

Ars Tribunus Angusticlavius
8,737
Subscriptor++
As someone in software dev, I really don't want modern fast-coding/fast-testing/fast-update culture to be part of my car. Give me super-stable code from decades ago, probably developed on a PDP-11 or VAX cluster, that's well proven.
But this gives a whole new meaning to 'rolling release'! ;)
 

MHStrawn

Ars Scholae Palatinae
1,435
Subscriptor
I'm so excited to be arriving at the day where my car can achieve the same stability as a typical Windows laptop!
This is exactly why I am stubbornly sticking to dumb devices for most things. I see no reason for my refrigerator or dishwasher or microwave or television screen or washer/dryer or water heater to be connected to the internet.

All that does is introduce unnecessary complexity and dependencies. I know many embrace the benefits of such advances and more power to you. But I admit to being a Luddite in this area.
 

S4WRXTTCS

Ars Scholae Palatinae
1,393
This is exactly why I am stubbornly sticking to dumb devices for most things. I see no reason for my refrigerator or dishwasher or microwave or television screen or washer/dryer or water heater to be connected to the internet.

All that does is introduce unnecessary complexity and dependencies. I know many embrace the benefits of such advances and more power to you. But I admit to being a Luddite in this area.

What drives me nuts is that these IoT devices advertise ad hoc Wi-Fi access points by default.

My new fridge has one
My new washer and dryer each have one
My new EV charger has one

To my knowledge, none of them really depend on those access points. I'm simply annoyed that they show up in every Wi-Fi scan.

I figured that if I added the EV charger to my Wi-Fi network, its access point would go away, but it didn't. Apparently it keeps an always-available ad hoc network for reconfiguration.
 

Jeff S

Ars Legatus Legionis
11,233
Subscriptor++
In other words, Rivian is telling us they are utterly incompetent w/o telling us they are utterly incompetent.

There should be no possible mechanism in a well-designed production software update distribution system to distribute builds that haven't gone through the entire QA process. Builds that haven't been through the full lifecycle, and been signed off on at every stage of it, should not be POSSIBLE to "fat finger" out to the production fleet of cars that people are driving. Full stop.

If you want to sell people devices that can kill them, their passengers, other drivers, cyclists, pedestrians, people sitting on a restaurant/cafe patio, or people inside a building that your car runs into, you need to act like adults, even if that costs a bit more money.

This is a catastrophic failure of their software lifecycle management. It doesn't seem to have been severe enough that it killed or disabled anyone, but this is the canary in the coal mine telling them their software update system is very badly designed.
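One way to make "not POSSIBLE to fat finger" concrete is a promotion gate that checks sign-offs in code instead of trusting whoever picks the build. A minimal Python sketch, assuming hypothetical stage names (`unit_tests`, `hil_bench`, `internal_fleet`, `qa_signoff`) and a hypothetical `Build` record; a real pipeline would pull sign-offs from its own CI system rather than from a set in memory.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical lifecycle stages, in order; every one must be signed off.
REQUIRED_STAGES = ("unit_tests", "hil_bench", "internal_fleet", "qa_signoff")


class PromotionError(Exception):
    """Raised when a build is not eligible for the production fleet."""


@dataclass
class Build:
    build_id: str
    signed_off: Set[str] = field(default_factory=set)  # stages that passed


def promote_to_production(build: Build) -> str:
    # Intended as the only path to production: refuse anything missing a
    # sign-off, so picking the wrong build ID fails loudly instead of shipping.
    missing = [stage for stage in REQUIRED_STAGES if stage not in build.signed_off]
    if missing:
        raise PromotionError(f"{build.build_id} lacks sign-off for: {', '.join(missing)}")
    return build.build_id
```

The gate only helps if nothing else can write to the production channel; if there's a side door (a script, a console, a manual upload), the fat finger just moves there.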
 

Jeff S

Ars Legatus Legionis
11,233
Subscriptor++
I don't own a Rivian, but I did read the thread.

This was an optional update. Folks got a push notification on their phone that an update was available, and they pushed the "go ahead and update" button. So it was a choice on the user's part that led to this situation, to some extent. One could safely ignore the update notifications and simply continue using their car like a normal car.
People should have a reasonable expectation that any update made available, even if optional, won't break their car. It should be safe to click the "Apply Optional Update" button for a CAR.
 

Jeff S

Ars Legatus Legionis
11,233
Subscriptor++
a "fat finger". This tells me their CI/CD process is flawed. The only buttons deployment engineers should push are the "Create new release" buttons in GitHub (or other solution), and the rest happens via automation (including cert generation, etc.). To cut a release or deploy software, you should never need to type configuration-like things into a terminal or web browser via a keyboard.
I sure as heck hope that any new release the engineer creates in GitHub (or wherever) goes through stages of QA BEFORE the release is made available to the public. It shouldn't go straight from an engineer clicking "merge branch and release" to being installed in cars.
 

ranthog

Ars Legatus Legionis
15,378
People should have a reasonable expectation that any update made available, even if optional, won't break their car. It should be safe to click the "Apply Optional Update" button for a CAR.
Realistically, if the first two groups they pushed to had been a set of test systems and then vehicles still owned by Rivian, the problem could have been found by the manufacturer before any customer saw it, albeit perhaps at the cost of some presale problems.
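The staged approach ranthog describes is usually called a ring or canary rollout. A minimal Python sketch, assuming three hypothetical rings in that order and a simple per-ring failure count standing in for real vehicle telemetry:

```python
from typing import Dict, List

# Hypothetical rings, smallest and most expendable first.
RINGS = ["test_bench", "company_owned", "customers"]


def staged_rollout(failures_by_ring: Dict[str, int],
                   size_by_ring: Dict[str, int],
                   max_failure_rate: float = 0.0) -> List[str]:
    """Return the rings the update reached before it was halted.

    failures_by_ring stands in for real telemetry (an assumption here):
    how many vehicles in each ring reported a failed update.
    """
    reached = []
    for ring in RINGS:
        reached.append(ring)
        rate = failures_by_ring.get(ring, 0) / size_by_ring[ring]
        if rate > max_failure_rate:
            break  # stop before the next, larger ring ever sees the update
    return reached
```

With the default `max_failure_rate=0.0`, a single failed install on the test bench stops the rollout before any customer vehicle is offered the update. In practice each ring would also soak for a while before the next one starts.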
 

vought1221

Ars Scholae Palatinae
864
Subscriptor++
a "fat finger". This tells me their CI/CD process is flawed. The only buttons deployment engineers should push are the "Create new release" buttons in GitHub (or other solution), and the rest happens via automation (including cert generation, etc.). To cut a release or deploy software, you should never need to type configuration-like things into a terminal or web browser via a keyboard.
Part of the problem may be something I ran across when building images for testing back in the Android L days.

Their CI may be fine, but if you send an update that overwrites a previous bundle and don’t test the +1 OTA (the update that comes after it), your customers are gonna get screwed.
It's too bad that it's still 2001 or so, that having enough flash memory to retain both the updated version and a working copy of the prior version (to fall back to in the event of boot failures) is unrealistically expensive crazy talk, and that all you can do is hold your breath and hope that you've not made any mistakes during your manual build process.

Oh, wait, no; it's 2023 and you can't even buy a Chromebook so cheap and awful that it doesn't work that way. My mistake, carry on.
To be fair, at least until Android 6, an OTA update for that OS meant multiple partitions of bullshit to deal with when sending a signed OTA. Rolling back meant resetting to the read-only image stored at the factory, or holding your breath and trying to update that recovery image.

Not an easy process, but at least a well-documented one.
 

fongquardt

Smack-Fu Master, in training
1
I own a Rivian. Updates are opt-in, via the dashboard or via a mobile app. All these people hit the button within an hour (or less?) of its release. Rivian seems to have yanked it as soon as they heard/saw complaints online and via their service agents.

No idea how it wasn't caught by their CI/CD pipeline. I don't know if they've ever done a big open RCA (root cause analysis) after an event.

Updates range from gimmicky and fun (a Halloween theme if you opt in), to new modes and ratios for suspension and towing, to Bluetooth changes for door unlocks, better integration with EV mapping for routes, etc. They've almost always pushed the truck into a better place than when I originally bought it.

I do corporate IT, so I usually let any update soak for a week anyway.

The truck isn't perfect and they really need to get more service centers, but it's been the best vehicle I've owned.
 

vought1221

Ars Scholae Palatinae
864
Subscriptor++
I sure as heck hope that any new release the engineer creates in GitHub (or wherever) goes through stages of QA BEFORE the release is made available to the public. It shouldn't go straight from an engineer clicking "merge branch and release" to being installed in cars.
That QA should always include +1 updates as well, to check signing and any boot-level code that handles the OTA process.

Don’t send OTAs that could create an inability to OTA again.
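A +1 check can be sketched as: install the candidate update on a test device, then try to install a dummy next update on top of it. This is a toy Python model under stated assumptions: HMAC from the standard library stands in for real asymmetric code signing, the key names are hypothetical, and it assumes the update carries the trust key used to verify the *next* OTA, which is exactly the kind of thing a +1 test catches if it's wrong.

```python
import hashlib
import hmac

# HMAC stands in for real asymmetric code signing, just to keep this stdlib-only.
RELEASE_KEY = b"hypothetical-release-key"


def sign(image: bytes, key: bytes = RELEASE_KEY) -> bytes:
    return hmac.new(key, image, hashlib.sha256).digest()


class Device:
    def __init__(self, trusted_key: bytes) -> None:
        self.trusted_key = trusted_key  # key the installed image uses to verify OTAs
        self.image = b""

    def apply_ota(self, image: bytes, signature: bytes, key_in_image: bytes) -> bool:
        # Verify against the key trusted by the CURRENT image, not the incoming one.
        if not hmac.compare_digest(sign(image, self.trusted_key), signature):
            return False
        self.image = image
        self.trusted_key = key_in_image  # the new image brings its own trust anchor
        return True


def plus_one_check(update: bytes, key_in_update: bytes) -> bool:
    """Install the candidate update, then confirm a dummy N+1 still installs on top."""
    dev = Device(RELEASE_KEY)
    if not dev.apply_ota(update, sign(update), key_in_update):
        return False
    probe = update + b"+1"  # stand-in for the next release, signed the normal way
    return dev.apply_ota(probe, sign(probe), key_in_update)
```

If the candidate ships the wrong key, update N installs fine and the failure only shows up on N+1, which is why testing N on its own isn't enough.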
 

ranthog

Ars Legatus Legionis
15,378
That QA should always include +1 updates as well, to check signing and any boot-level code that handles the OTA process.

Don’t send OTAs that could create an inability to OTA again.
Realistically, for something like this you'd want the system to be able to detect a failed update and roll back. This type of arrangement can have issues with properly detecting a failed update, but in the worst case you're stuck in the same situation they are in today, or you have updates that fail to apply because of the fail-safe.

This type of system isn't easy to implement if your underlying architecture wasn't designed to support it. But I'd think that for OTA updates on a car it would eventually save you a lot of trouble if a bad update ever did go out.
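Detecting a failed update is commonly done with a boot-attempt counter plus a health check that must explicitly confirm the new image. A minimal Python sketch, with hypothetical names and an arbitrary limit of three attempts; as ranthog says, the weak point is the `healthy()` check itself: if it passes a broken image you're stuck, and if it's too strict, good updates get rolled back.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BootState:
    """Hypothetical state a bootloader would persist across reboots."""
    trial_slot: str = "B"      # freshly installed, not yet confirmed
    good_slot: str = "A"       # last image known to work
    attempts_left: int = 3     # boots allowed before giving up on the trial slot
    confirmed: bool = False


def choose_slot(state: BootState) -> str:
    if state.confirmed:
        return state.trial_slot
    if state.attempts_left <= 0:
        return state.good_slot  # out of attempts: roll back
    state.attempts_left -= 1    # each unconfirmed boot uses up one attempt
    return state.trial_slot


def after_boot(state: BootState, slot: str, healthy: Callable[[], bool]) -> None:
    # Only an explicit passing health check commits the update. If the image
    # crashes or hangs before reaching this point, nothing is confirmed and
    # the counter keeps running down toward rollback.
    if slot == state.trial_slot and not state.confirmed and healthy():
        state.confirmed = True


def simulate(healthy: Callable[[], bool], boots: int = 5) -> List[str]:
    """Return which slot was booted on each of several consecutive boots."""
    state = BootState()
    slots = []
    for _ in range(boots):
        slot = choose_slot(state)
        after_boot(state, slot, healthy)
        slots.append(slot)
    return slots
```

A broken update gets three tries and then the car quietly goes back to the old image, instead of staying bricked until a technician shows up.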
 
Yes, yes, I am a truly ancient fuddy-duddy with his professional roots in super-conservative aerospace, and yes, bills of materials are a thing, and and and————


These should not even be running on the same god damned computer systems.
With apologies to the author, I've got to agree with Sarty. I'm also an old fart, working in the airline industry, but heavily into virtualization now. VMs or no VMs, Infotainment and InstrumentPanel should not be running on the same computer system, so that they can't affect each other. Also, for the critical InstrumentPanel at least, there should be multiple copies (again on completely separate hardware) ready to take over if the primary stops responding.
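The "multiple copies ready to take over" idea boils down to heartbeat-based failover. A toy Python sketch, assuming two hypothetical display units named `primary` and `standby` that each report a heartbeat, and an arbitrary 200 ms timeout; in a real car this logic would live in independent hardware or a safety-certified RTOS, not in application code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterSupervisor:
    """Toy failover between two instrument-cluster units (hypothetical names)."""
    timeout_s: float = 0.2
    last_beat: Dict[str, float] = field(default_factory=dict)
    active: str = "primary"

    def heartbeat(self, unit: str, now: float) -> None:
        self.last_beat[unit] = now

    def alive(self, unit: str, now: float) -> bool:
        beat = self.last_beat.get(unit)
        return beat is not None and now - beat <= self.timeout_s

    def tick(self, now: float) -> str:
        # If the active unit has gone quiet, hand over to the other one,
        # but only if the other unit is itself still sending heartbeats.
        if not self.alive(self.active, now):
            other = "standby" if self.active == "primary" else "primary"
            if self.alive(other, now):
                self.active = other
        return self.active
```

Note that the supervisor itself becomes a single point of failure unless it, too, runs on hardware the infotainment system can't take down.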
 

ranthog

Ars Legatus Legionis
15,378
This snafu doesn't really make me any happier about the trend of depending on touchscreen buttons for basic functionality (heating? windshield wipers?) or using just one screen for infotainment and basic info (Volvo EX30, anyone?).
I agree that I'd want separation, but it needs to go beyond just the screen. In this case, having a separate screen for the instrument panel didn't help. Having separate systems for the infotainment and the instrument panel is the feature that matters.
 
This article underlines one of my core sources of cognitive dissonance: I love having cool shiny new things, but I hate needing to rely on cool shiny new things.

Corollary, often observed in these stories about bleeding edge tech:

It’s really cool when it’s a new futuristic toy no one else has.
It’s really a hassle after it becomes a required device/operation you can no longer avoid and must manage for your family.

(See: OTA updates, smartphones, touchscreens, cloud connections, AI, security/authentication, etc.)
 

Happy Medium

Ars Tribunus Militum
2,165
Subscriptor++
Might have something to do with the fact that regular standby just isn’t reliable enough. Previously you’d see a lot of people carrying around their computers with lids crooked open because they don’t trust it’ll come back from standby quickly (or at all). Not sure Modern Standby has completely fixed this, but I sure understand what they’re trying to fix.
Ahahaha! Modern Standby! Reliable! Great joke! ... Oh, if you're serious: no, Modern Standby is FAR less reliable than S3 sleep, at least on all the recent hardware I've seen. It's the worst of all worlds: you don't know when your laptop will just fail to wake up when you open it again, AND it can sometimes completely run down your battery while asleep, oh, AND it will sometimes download Windows updates and restart your computer without your consent, closing all your windows in the process! So fun!
 

jtwrenn

Ars Tribunus Militum
2,585
Oh Jeezus tap-dancing christ. I know it's off topic, but I hate modern standby with an incredible passion. It doesn't actually even make your laptop reactivate any faster (if anything it may be even more laggy on wakeup) and can just randomly use up all your battery. The fact that MS is actively trying to make it harder and harder to use regular old standby is just incredibly infuriating.
Totally agree on standby. Hell, I've pretty much stopped closing my laptop without shutting it down first. These things boot so fast now that it doesn't matter 9 times out of 10, and suddenly it's perfectly stable.


The idea of OTA auto-updates on cars is terrifying to me. Make them OTA on request, or at a time of my choosing. Give me an annoying "you have updates waiting" reminder, whatever, but not this random, whenever-they-decide bullshit.
 

Kasoroth

Ars Praefectus
4,054
Subscriptor++
I remember back when you bought a game, it came on a CD, then you typically installed it and it ran out of the box!
That's not quite how I remember it, and I still have my original Battlecruiser 3000 AD disc as evidence.
Daggerfall had quite a few issues too.
Even when games came on CDs, there were plenty of bugs. Downloading patches over a 56k dial-up connection wasn't really fun either.

Rose-colored glasses might be pleasant to look through, but there's no way in hell I'd trade in modern game downloading and patching through Steam for what we dealt with in 1996.
 

rivertrip

Ars Scholae Palatinae
875
You could view the current trend in corporate apologies (“oopsie-daisy! We totally derped sowwy!”) charitably as them now accepting responsibility rather than issuing non-apology apologies, but I can’t help but see it as an acknowledgement that they recognize that corporate accountability is a myth, so it doesn’t matter if you cop to fucking up.
Rivian didn’t assume accountability for the consequences of their mistake. They just said they know your car doesn’t work, and they intend to fix it at some unspecified time. Couldn’t go to your job or pick up your child from daycare? That’s on you.

And the “fat finger” explanation obviously is a lie.
 

Nogami

Ars Scholae Palatinae
880
Couldn’t go to your job or pick up your child from daycare? That’s on you.

And the “fat finger” explanation obviously is a lie.

Apparently you can still drive it, just without the bells and whistles. If that's a dealbreaker, that's up to the individual driver.

How is it obviously a lie?
 
I own a Rivian. Updates are opt-in, via the dashboard or via a mobile app. All these people hit the button within an hour (or less?) of its release. Rivian seems to have yanked it as soon as they heard/saw complaints online and via their service agents.

No idea how it wasn't caught by their CI/CD pipeline. I don't know if they've ever done a big open RCA (root cause analysis) after an event.

Updates range from gimmicky and fun (a Halloween theme if you opt in), to new modes and ratios for suspension and towing, to Bluetooth changes for door unlocks, better integration with EV mapping for routes, etc. They've almost always pushed the truck into a better place than when I originally bought it.

I do corporate IT, so I usually let any update soak for a week anyway.

The truck isn't perfect and they really need to get more service centers, but it's been the best vehicle I've owned.
The point is that Rivian are the ones who should let it soak for a week, not customers.

Nothing in this update is so important that it can't wait another week.
 

omarsidd

Ars Praefectus
4,175
Subscriptor
Apart from their obvious specific mistake, there's also the mistake of pushing an untested image to production. Rivian doesn't have very many models, so it should be easy to require that each notably changed image be validated (at least in passing) on every hardware model they have.

A problem with CI/CD is that the "kids" (young developers) have decided the magic build pipeline is authoritative, so there's no longer any degree of "no, this won't ship until you've tested it yourself". Any old-school release methodology that included an actual test phase, QA, or even the simplest sanity spot-check would have prevented this kind of "can see at a glance" failure.
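The "every model of hardware" requirement is easy to enforce mechanically. A minimal Python sketch, assuming hypothetical variant labels and a record of which image was hand-validated on which variant; the actual spot-check is still done by a person, the code just refuses to ship until one has been recorded for every variant.

```python
from typing import Dict, List, Set

# Hypothetical hardware variant labels; a real list would come from the fleet database.
FLEET_VARIANTS = {"truck_v1", "suv_v1", "van_v1"}


def untested_variants(image: str, validations: Dict[str, Set[str]]) -> List[str]:
    """Variants with no recorded hands-on validation of this image."""
    return sorted(FLEET_VARIANTS - validations.get(image, set()))


def may_ship(image: str, validations: Dict[str, Set[str]]) -> bool:
    # Block the release until every variant has been spot-checked at least once.
    return not untested_variants(image, validations)
```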
 

Dr Gitlin

Ars Legatus Legionis
24,914
Ars Staff
With apologies to the author, I've got to agree with Sarty. I'm also an old fart, working in the airline industry, but heavily into virtualization now. VMs or no VMs, Infotainment and InstrumentPanel should not be running on the same computer system, so that they can't affect each other. Also, for the critical InstrumentPanel at least, there should be multiple copies (again on completely separate hardware) ready to take over if the primary stops responding.

AFAIK that’s not how the auto industry approaches it, and you can run a real-time OS in a hypervisor and still get ASIL-D certification.

https://www.prnewswire.com/news-rel...ity-level-asil-d-certification-300969956.html
 

Nilt

Ars Legatus Legionis
21,839
Subscriptor++
I don't own a Rivian, but I did read the thread.

This was an optional update. Folks got a push notification on their phone that an update was available, and they pushed the "go ahead and update" button. So it was a choice on the user's part that led to this situation, to some extent. One could safely ignore the update notifications and simply continue using their car like a normal car.
Do you really think we need to start asking users to consider whether they want to install updates or not? I, for one, remember the bad old days when updates were virtually never installed and all manner of shitheels took advantage of that. Nope, I much prefer a secure computing environment. The freaking solution for this "problem" is robust testing, required for these freaking death machines before updates are published at all. The only reason nobody appears to have died this time is that it bricked the cars. Next time we may not be so fortunate.
 

m0nckywrench

Ars Tribunus Angusticlavius
7,659
I saw some reports that this made things like the speedometer inaccessible. Is that what the other screen is? The main dashboard?! Yeah... I think I'll keep buying cars that don't get updates pushed down from the internet... at least as long as it's possible to do so.
I'd not mind if it were aviation-quality hardware and software, but terrestrial consumer dogshit (to put it gently) is in a different, inferior league, intended to annoy mechanics and bewilder non-techie owners.

BTW, if you're serious, it's easy to keep vehicles for decades, so I do. Settle on what you want, learn everything worth knowing about it, learn to do your own maintenance, and score and store spare parts, for example electronics. This saves gobs of cash over time, and the more you learn, the less dependent you become, not just with cars but with other DIY. Self-service salvage yards are great money savers. I'll use all the techniques I use for ICE vehicles when I buy a used BEV, then drive the hell out of that, too.
 