CrowdStrike update causing a BSOD error and mass outages
Based on multiple reports, a significant portion of the tech industry has been disrupted today due to blue screens caused by the latest CrowdStrike update. Numerous fintech and banking companies are reportedly facing outages, and Visa payWave services are down. This issue is affecting almost every Windows user who uses the CrowdStrike cybersecurity solution. It impacts both Windows desktops and servers, meaning airlines, railways, 911 services, radio stations, banks, insurance companies, and others are currently unable to function normally.
A workaround is available while CrowdStrike investigates the cause of this major issue. However, there's a caveat: if your company follows IT best practices, employees shouldn't have admin-level accounts, so editing the System folders themselves won't be an option. In that case, the only recourse is to contact the IT department or wait for further updates from CrowdStrike. An official CrowdStrike Reddit thread is available for more information.
Workaround for those who can access Safe Mode and edit the System folder:
- Boot Windows into Safe Mode or the Windows Recovery Environment.
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate the file matching C-00000291*.sys and delete it.
- Boot the host normally.
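For IT teams who need to repeat the steps above on many machines, the file lookup and deletion can be sketched in a few lines of Python. This is only an illustration, not an official CrowdStrike tool: the function name remove_matching_files and its dry_run parameter are our own, and it assumes Python is available on the host and that you have admin rights, which is often not the case in Safe Mode or the Windows Recovery Environment. The directory and file pattern are the ones from the workaround.

```python
from __future__ import annotations

from pathlib import Path

# Directory and pattern taken from the workaround steps above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FILE_PATTERN = "C-00000291*.sys"


def remove_matching_files(
    driver_dir: Path = DRIVER_DIR,
    pattern: str = FILE_PATTERN,
    dry_run: bool = True,
) -> list[Path]:
    """Return the files matching `pattern` in `driver_dir`.

    The files are deleted only when dry_run is False (hypothetical helper).
    """
    matches = sorted(p for p in driver_dir.glob(pattern) if p.is_file())
    for path in matches:
        if not dry_run:
            path.unlink()  # requires admin rights on a real host
    return matches


if __name__ == "__main__":
    # Default is a dry run: list what would be deleted, delete nothing.
    for path in remove_matching_files(dry_run=True):
        print(f"would delete: {path}")
```

Running it with dry_run=True first lets you confirm that only the intended file matches before anything is removed.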
Important! We would like to emphasize that this is not Microsoft's fault (we have seen people blaming them); the cause is third-party software that runs on Microsoft operating systems.
The Reason Behind the Outage
Everyone wants to know how and why this happened. Currently, even CrowdStrike may not know the actual cause, but we can speculate on what could have been done to avoid such a scenario.
We believe this problem could most likely have been avoided with thorough integration and compatibility testing, which would have kept this kind of regression from reaching production and end users in the first place.
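To make the idea concrete, here is a minimal sketch of a pre-release gate: a candidate update is installed on a set of staging hosts, and the release is approved only if every host comes back healthy. The names StagingHost, release_gate, and the install_and_check callback are hypothetical, and this says nothing about how CrowdStrike's actual pipeline works; it simply assumes you have some way to install an update on a test machine and confirm that it still boots.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class StagingHost:
    """A test machine in the staging environment (hypothetical model)."""
    name: str
    os_version: str


def release_gate(
    hosts: Iterable[StagingHost],
    install_and_check: Callable[[StagingHost], bool],
) -> tuple[bool, list[str]]:
    """Install the candidate update on every staging host.

    `install_and_check` is supplied by the caller and should return True
    only if the host installed the update, rebooted, and reported healthy.
    Returns (approved, names_of_failed_hosts).
    """
    failed = [host.name for host in hosts if not install_and_check(host)]
    return (not failed, failed)
```

The point of the design is that a single failing host, on any OS version in the test matrix, blocks the release instead of being discovered by end users.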
We've seen similar scenarios quite often during our decade-plus experience in software QA and testing, helping companies keep such issues from reaching the production environment. In this case, it appears a significant regression was missed, causing a BSOD (Blue Screen of Death) for thousands, potentially millions, of people and affecting businesses worldwide.
The other question is why. Bugs can happen and cause outages; that's a known fact. But how did a third-party app cause a BSOD in the Microsoft operating system?
CrowdStrike is a cybersecurity solution that operates at a low level within the Windows operating system, often utilizing kernel-level drivers to monitor system activity. A fatal error leading to a BSOD could occur if:
- The update introduced a conflict with a critical Windows system component.
- There was a bug in the driver code that caused system instability.
- The update made incompatible changes to system memory or critical system structures.
- The software interfered with other security software or drivers in an unexpected way.
- Or maybe something else we haven't thought of.
Such deep integration with the operating system means that any significant issue in CrowdStrike's code could potentially crash the entire system, resulting in a BSOD. This incident highlights the importance of thorough software testing, especially for software that operates at such a fundamental level of the operating system. In addition, this testing should be done in staging environments before each release. If and when CrowdStrike releases an update, we will update the article and let you know.
The bottom line
People rely on various types of software produced by tech companies, and often, their lives depend on the quality of the software. We're here to help you care about your users and the quality of your solution through testing. Drop us a message to book a consultation.