> It's worth pointing out that you can use any old Windows install bootable drive/disk, and press Shift+F10 (F8 on custom WinPE bootable drives) to get a command prompt window. That doesn't automatically delete the file, of course, but if (like some of our systems) newer hardware isn't supported by the default WinPE/WinRE images you have, it's possible to load the needed drivers and then delete the file yourself, just as an example.

I think the problem there is you still need the BitLocker key to access the encrypted drive contents; otherwise you're going to be doing a nuke-and-pave reinstall.
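For anyone doing this by hand, the sequence from a WinPE command prompt looks roughly like the following. The driver .inf path and the recovery password are placeholders, the BitLocker step only applies to encrypted volumes, and the file pattern is the one named in CrowdStrike's public remediation guidance:

```text
rem Load any storage driver WinPE is missing (path is hypothetical)
drvload D:\drivers\storage\vmd.inf

rem Unlock the encrypted OS volume first, if BitLocker is in use
manage-bde -unlock C: -RecoveryPassword 111111-222222-333333-444444-555555-666666-777777-888888

rem Delete the bad channel file named in CrowdStrike's guidance
del "C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"
```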
"While software updates may occasionally cause disturbances, significant incidents like the CrowdStrike event are infrequent," wrote Microsoft VP of Enterprise and OS Security David Weston in a blog post. "We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services."
> "Flight delays and cancellations were no longer front-page news"

I have a co-worker who's been stuck since Saturday. He might get home mid-week.
Delta may not be front page news, but with 700 flights cancelled today, they should be.
> I'm not in IT so I can't comment on the technical issues, but as a lowly comms hack - this statement is just yeccccchhh. Just overly workshopped garbage.
> "These big events are rare. As many as 8.5 million Windows devices have been impacted. That's a small percent of Windows machines, but CrowdStrike's importance to key businesses has painfully multiplied the impact. We've developed a tool to more easily fix the machines that can't be fixed with 10-15 reboot attempts, and more tools and fixes are in the works. Stay tuned."
> See, it's easy to sound like a human and do the same thing. Cut the corporate filler and ass-covering and ship the sentence. It builds trust and isn't so subtly unsettling.

Maybe I've been in corporate too long; these read approximately the same to me.
> https://www.techradar.com/pro/secur...k-down-windows-following-crowdstrike-incident
> I tend to agree with Microsoft here.
> The government has an important anti-trust role to play, but having a role to play and playing it well are not the same thing.

I disagree. It's difficult because CrowdStrike is a security provider, and they need to be perfectly rigorous in their own checks (do they not send their code to an in-house test machine first...?). But at the same time, Windows having complete control of kernel-level software would not necessarily stop attacks at that level. In fact, I'd argue you would have both 1) fewer eyes on the kernel code, and thus less ability to catch attacks, and 2) 100% of the reporting responsibility left in Microsoft's hands, and we've seen how well that goes recently...
> Nice of MS to help fix the fuck up that Crowdstrike made of things. I'm glad they are trying to help their customers instead of just blaming the offending party.

They had to. Nothing says "ooh, look, buy Apple" like every screen the public sees being a BSOD. Worse yet, you'll have some corporate-type idiot wanting to switch without realizing what the implications could be.
> https://www.techradar.com/pro/secur...k-down-windows-following-crowdstrike-incident
> I tend to agree with Microsoft here.
> The government has an important anti-trust role to play, but having a role to play and playing it well are not the same thing.

That is one heck of a claim, given Microsoft could have simply created the types of APIs necessary for this type of service, which are vendor-independent.
> Hopefully this is also a wake up call to the affected companies to make sure their disaster recovery plans actually work. This won't be the last time something like this happens.

I mean, if the recovery plan for something like this is to call in all the admins and have them go desk to desk fixing PCs, that is still going to take time. That doesn't mean your recovery plans are bad, just that some recoveries are more painful than others. The costs of this disruption are still likely a lot lower than the cost to many companies of, say, storing backup images of every user desktop that they could recover to in a rare case like this.
> David Plummer, former Microsoft programmer, has an interesting top-level review of the incident for those who want a bit more information.
> The crux of it is that CrowdStrike uses a kernel-level driver that essentially parses the definition file - which is essentially a script file - at ring-0 level. Worse, it does not do any sanity checking on the file beforehand. And the driver is marked as 'necessary to boot', which catches you in a boot loop when it crashes. Just to rub salt in the wound, the CrowdStrike update ignored any staging instructions set up by the administrator, so it got pushed to /every/ machine on the network. Thus, rather than a few computers being affected, every computer that used CrowdStrike crashed.
> A manually instigated safe-mode boot will get you out of the boot-crash loop by bypassing the CrowdStrike driver, allowing you to delete the broken update. But this requires physical access, something made difficult by the sheer number of machines affected and the difficulty in reaching some of them, which makes for a slow clean-up.

Dave's video was a really nice overview; I watched it last night.
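For what it's worth, even a few lines of pre-parse sanity checking would have rejected the bad channel file before a ring-0 parser ever ran. A minimal user-space sketch in Python; the `CSDEF` magic and the size bounds are made-up stand-ins, not CrowdStrike's actual format:

```python
MAGIC = b"CSDEF"  # hypothetical magic header for a definitions file
MIN_SIZE, MAX_SIZE = 64, 64 * 1024 * 1024  # made-up plausibility bounds

def sane_definitions(blob: bytes) -> bool:
    """Reject obviously broken definition blobs before parsing them."""
    if not (MIN_SIZE <= len(blob) <= MAX_SIZE):
        return False  # truncated or absurdly large
    if not blob.startswith(MAGIC):
        return False  # wrong or missing header
    if blob.count(0) == len(blob):
        return False  # all null bytes (the reported failure mode)
    return True
```

A driver would do the same checks (plus a signature check) before handing the blob to its parser; failing closed here means "skip this update", not "crash the machine".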
Credit where credit is due, Microsoft is kind-of coming in clutch here. I was at Whole Foods yesterday and saw a tech manually reflashing a Self-Checkout PoS and I was thinking that automating this by using a flash-drive would probably be the best in-the-middle approach.
The only "better" way would be if they could just automate the entire BitLocker section as well, considering that they control the Secure Boot signing keys.
> That is one heck of a claim, given Microsoft could have simply created the types of APIs necessary for this type of service, which are vendor-independent.

Perhaps it wouldn't be the case anymore, but I can totally see how doing that in the past could have been a performance nightmare. AV/anti-malware is already excellent at bogging a system down to its knees; now imagine that with the added overhead of interprocess messaging between kernel-space and user-space drivers.
> I think the problem there is you still need the BitLocker key to access the encrypted drive contents, otherwise you're going to be doing a nuke-and-pave reinstall.

Every procedure I've seen requires the recovery key though, not just the "regular" user key. That was one of the limitations we hit: IT took my laptop to fix, but they didn't yet have access to the BitLocker recovery key system and couldn't use my own PIN to unlock it (they even had me try typing it in the command window to attempt unlocking; it didn't work). I could put my PIN in to reach the BSOD... but not do any kind of repair.
The "easy" fix documented by both CrowdStrike (whose direct fault this is) and Microsoft (which has taken a lot of the blame for it in mainstream reporting, partly because of an unrelated July 18 Azure outage that had hit shortly before) is to boot into safe mode and delete the offending file.
> According to The Register, CrowdStrike's failure to adequately sanitise input extends to the Linux version of the Falcon sensor as well as Windows. I think the failure also implies that they aren't utilising fuzzing to test their software.
> Given that the CrowdStrike CEO was CTO at McAfee in 2010 when they were responsible for a similar incident, I suggest watching George Kurtz's future career trajectory and avoiding software from any company that employs him!

It's kind of incredible that they blew up tons of Linux systems a few months ago and nobody really realizes it. It also shows that this isn't some Microsoft/Windows-exclusive problem (I'm so sick of the "lol winowz bad" snark); it's just a risk of doing anything this close to the kernel, which is sometimes necessary.
> I have a co-worker who's been stuck since Saturday. He might get home mid-week.

One of my coworkers in India received a hand-written boarding pass!
> Nice of MS to help fix the fuck up that Crowdstrike made of things. I'm glad they are trying to help their customers instead of just blaming the offending party.

I think they know that if they get into that game it will end with a lot of customers moving things to other operating systems. There's nothing inherently complex about signage and kiosks where it HAS to be Windows.
> https://www.techradar.com/pro/secur...k-down-windows-following-crowdstrike-incident
> I tend to agree with Microsoft here.
> The government has an important anti-trust role to play, but having a role to play and playing it well are not the same thing.

This is extraordinarily misleading, as it would just mean Defender would have to dogfood the same out-of-kernel APIs that other EDR vendors would get in order to remain compliant in the EU.
> IMO, CrowdStrike need to clearly and transparently explain how this happened.
> - exactly what is their pre-release test protocol?
> - was that protocol followed in this case?
> - if so, how could they have missed an issue of this magnitude?
> - if not, why is it possible to bypass testing?
> ...and most importantly:
> - how will they be modifying their test protocol to ensure that this can not happen again?
> I need to hear this because I can't understand how any sort of modern continuous integration and testing software development process could have shipped this to customers. It didn't only trigger in rare conditions - it killed absolutely vanilla Windows installations, and did so consistently. Do they really run an environment where they ship code (ok, definition updates in this case) to customers without even a limited internal deployment to a test farm first?
> Now that the acute issue is on its way to resolution, we all deserve some answers.

I hope there's industry demand for 3rd-party validation from kernel-driver experts.
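The "test farm first" point can be made concrete: even without a farm, running the real parser against the exact artifact about to ship, in a throwaway process, turns "parser dies on this file" into a blocked release instead of a blue-screened fleet. A sketch of the idea; the parser here is a trivial stand-in of mine, not CrowdStrike's:

```python
import multiprocessing
import sys

def parse_definitions(blob: bytes) -> None:
    # Stand-in for the sensor's channel-file parser: a malformed blob
    # kills the process, the way the real driver killed the kernel.
    if not blob.startswith(b"CSDEF"):
        sys.exit(1)

def release_gate(blob: bytes) -> bool:
    """Run the shipping parser on the shipping bytes in an isolated
    process; ship only if the process survives."""
    proc = multiprocessing.Process(target=parse_definitions, args=(blob,))
    proc.start()
    proc.join()
    return proc.exitcode == 0
```

The process boundary is the whole point of the design: a crash in the parser is contained and observable, so the gate fails closed.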
> Do they really run an environment where they ship code (ok, definition updates in this case) to customers without even a limited internal deployment to a test farm first?

It appears they pushed a zeroed-out or otherwise corrupted definitions file. Perhaps a placeholder file.
> https://www.techradar.com/pro/secur...k-down-windows-following-crowdstrike-incident
> I tend to agree with Microsoft here.
> The government has an important anti-trust role to play, but having a role to play and playing it well are not the same thing.

That's a very weak argument. If Microsoft were able to lock down the kernel, then the CrowdStrike functionality could only be offered by Microsoft. What makes you think Microsoft engineers could never make a mistake like the one made in this case?
> It's kind of incredible that they blew up tons of Linux systems a few months ago and nobody really realizes it. It also shows that this isn't some Microsoft/Windows-exclusive problem (I'm so sick of the "lol winowz bad" snark); it's just a risk of doing anything this close to the kernel, which is sometimes necessary.

I understand why some software has to operate close to the metal, but Windows and Linux operate with only two protection rings when x64 CPUs support four. Perhaps there's an argument for device drivers (and the Falcon agent was written as a device driver) to sit in a ring above the kernel but below user space. Increasing amounts of software are being run in kernel space (the WireGuard VPN, e.g.), partly because the transition from user space to kernel space takes too long, so reducing that transition time would also need to be looked at. This is very definitely not a quick fix, but we must look at ways of improving software reliability and security.
> It appears they pushed a zeroed-out or otherwise corrupted definitions file. Perhaps a placeholder file.
> If the definitions file pushed to customers was sent out in error (or full of errors), no amount of pre-push testing may have prevented this. Of course, they could push in stages, with the first stage going to heavily monitored systems. I imagine they will start doing this now.
> The real question is why their parser didn't reject the improper file. Not only did the parser not require securely signed code, it appears to have had no validation whatsoever... this for a parser running in ring zero of the kernel.
> Further, some have suggested that the definitions file may have been that in name only, in that it may include code that the parser executes, again in ring zero of the kernel. Executable code that is pushed out to customers multiple times each day.
> CrowdStrike customers should be demanding detailed answers to each of these questions. There is now quite a lot of competition in this market.

The laid-down definitions file that ended up on disk was corrupted in some manner. You can get multiple copies of the same bad 291*.*32.sys file and see that each has a different byte sequence.
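The specific "file of null bytes" failure is also the cheapest thing to screen for before any parser touches the blob. A toy Python check; the 99% threshold is an arbitrary choice of mine:

```python
def looks_zeroed(data: bytes, threshold: float = 0.99) -> bool:
    """Flag a blob that is empty or almost entirely null bytes."""
    if not data:
        return True
    return data.count(0) / len(data) >= threshold
```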
> It appears they pushed a zeroed-out or otherwise corrupted definitions file. Perhaps a placeholder file.
> If the definitions file pushed to customers was sent out in error (or full of errors), no amount of pre-push testing may have prevented this. Of course, they could push in stages, with the first stage going to heavily monitored systems. I imagine they will start doing this now.
> The real question is why their parser didn't reject the improper file. Not only did the parser not require securely signed code, it appears to have had no validation whatsoever... this for a parser running in ring zero of the kernel.
> Further, some have suggested that the definitions file may have been that in name only, in that it may include code that the parser executes, again in ring zero of the kernel. Executable code that is pushed out to customers multiple times each day, often to prevent zero-day exploits.
> The internal testing of such frequent updates must be... challenging. Given the lack of parser validation and the frequency of these updates, a catastrophe like this was likely inevitable.
> CrowdStrike customers should be demanding detailed answers to each of these questions. There is now quite a lot of competition in this market.

They have telemetry from devices. We know exactly how many are affected. Rollout should cease if there's a problem.
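The staged-push-plus-telemetry idea is simple to express. A deterministic sketch in Python, where the ring structure, the 1% failure threshold, and the `push`/`health_check` callbacks are all assumptions of mine rather than anything CrowdStrike documents:

```python
from typing import Callable, Iterable, List

def staged_rollout(
    rings: Iterable[List[str]],           # e.g. [internal], [canaries], [everyone]
    push: Callable[[str], None],          # deploy the update to one host
    health_check: Callable[[str], bool],  # telemetry: did the host survive?
    max_failure_rate: float = 0.01,
) -> bool:
    """Push ring by ring; halt the rollout if any ring's failure rate
    exceeds the threshold, so a bad update never reaches later rings."""
    for ring in rings:
        for host in ring:
            push(host)
        failures = sum(not health_check(h) for h in ring)
        if ring and failures / len(ring) > max_failure_rate:
            return False  # abort: later rings never see the update
    return True
```

With this shape, a definitions file that bricks the first (heavily monitored) ring stops there instead of reaching 8.5 million machines.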
> According to The Register, CrowdStrike's failure to adequately sanitise input extends to the Linux version of the Falcon sensor as well as Windows. I think the failure also implies that they aren't utilising fuzzing to test their software.
> Given that the CrowdStrike CEO was CTO at McAfee in 2010 when they were responsible for a similar incident, I suggest watching George Kurtz's future career trajectory and avoiding software from any company that employs him!

Yeah, this particular incident didn't impact Linux systems, but they did have a similar event with the Linux software a month or two ago. Which is fucking wild.
> This is extraordinarily misleading, as it would just mean Defender would have to dogfood the same out-of-kernel APIs that other EDR vendors would get in order to remain compliant in the EU.

Do they require that of Apple with Gatekeeper and the other built-in Mac AV versus competitors? I know it's been a long-running complaint how gimped/limited Mac AV is compared to Windows, precisely because of this.

But Microsoft won't do it, because it doesn't think there should be a security boundary between admin and kernel - a security boundary that's required when implementing a replacement API.
> And these magic thumb drives are distributed how?

You download the script from the link in the article (https://techcommunity.microsoft.com...with-crowdstrike-issue-impacting/ba-p/4196959), plug a blank thumb drive into your PC, and run the script. Then you distribute the thumb drives to whoever in your organization needs them.