The CrowdStrike gaffe that caused millions of Windows machines to crash with the infamous Blue Screen of Death (BSOD) could have happened to anyone considering how security updates are pushed these days, experts believe.
With security vendors rolling out updates daily, it is important that those updates aren’t rushed and that they go through basic due diligence to ensure something like the CrowdStrike outage doesn’t happen again, according to Chris Steffen, vice president of research at Enterprise Management Associates.
“Companies need to have a better understanding of patch management, and patches should never be deployed directly into a production environment without being first tested in a staging or testing environment,” said Steffen. “Once upon a time, this used to be the case with all patches. But auto-patching and updates occurring daily have made companies more lax about patch risk evaluation.”
Rushing through patches and pushing them directly to global environments has become mainstream, making it likely that another vendor will make the same mistake.
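The staging-first rule Steffen describes can be expressed as a simple gate. The sketch below is only an illustration under stated assumptions: `Patch`, `run_staging_checks`, and `can_promote_to_production` are hypothetical names, and the checks are whatever tests a team runs in its own staging environment. It does not reflect how CrowdStrike or any other vendor actually ships updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Patch:
    """Hypothetical record for an update awaiting deployment."""
    name: str
    tested_in_staging: bool = False
    failures: List[str] = field(default_factory=list)


def run_staging_checks(patch: Patch, checks: List[Callable[[Patch], bool]]) -> Patch:
    """Run every check against the patch and record the names of any that fail."""
    patch.failures = [check.__name__ for check in checks if not check(patch)]
    patch.tested_in_staging = True
    return patch


def can_promote_to_production(patch: Patch) -> bool:
    """A patch is eligible only if it was tested in staging and nothing failed."""
    return patch.tested_in_staging and not patch.failures
```

A patch that never went through `run_staging_checks` is rejected by default, which mirrors the point that skipping the staging step should not be possible by accident.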
Was the CrowdStrike snafu avoidable?
In its initial post-incident review, CrowdStrike revealed that the crashing of its customers’ computers was caused by a flaw in Channel File 291, included in a sensor configuration update released to Windows systems at 04:09 UTC on July 19. The review offered a preliminary explanation of how the flaw was deployed and outlined changes being implemented to prevent future occurrences.
The incident prompted many CIOs to re-evaluate their dependence on cloud-based security software like CrowdStrike, and it raised serious questions about CrowdStrike’s lapse.
“Everything is software and software is everything – it’s more interconnected and interdependent than ever,” said Marcus Merrill, principal test strategist at Sauce Labs. “If the software update release going out there affects not just your users but your users’ users, you must slow-roll the release over a period of hours or days, rather than risk crippling the entire planet with one large update.”
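The slow-roll approach Merrill describes is often implemented as a staged rollout: the update goes to a small slice of machines first, and it only reaches the next, larger slice if the first one stays healthy. The following is a minimal sketch of that idea, not any vendor's actual pipeline. The function name `staged_rollout`, the stage fractions, and the `deploy` and `is_healthy` callbacks are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Iterable[float] = (0.01, 0.10, 0.50, 1.0),
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy to growing slices of hosts; stop as soon as a slice is unhealthy.

    Returns the hosts that received the update.
    """
    updated: List[str] = []
    for fraction in stage_fractions:
        # Cumulative target for this stage, always at least one host.
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(updated):target]
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # A real rollout would wait (hours or days) before checking health.
        failed = [h for h in batch if not is_healthy(h)]
        if len(failed) / len(batch) > max_failure_rate:
            break  # Halt: the remaining hosts never receive the update.
    return updated
```

With 100 hosts and a faulty update, this sketch stops after the first stage, so only one machine is affected instead of all of them. The trade-off is speed: an urgent fix takes longer to reach every machine.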
The risk of not shipping code is always lower than the risk of shutting down the world, Merrill noted, emphasizing that the vulnerabilities fixed in this update were pretty minor by comparison.
“Critical infrastructure might have an EDR or XDR solution slapped onto it just to ‘check the box,’ but the scenario where the provider accidentally breaks the infrastructure isn’t one you ever really think of,” said John Hammond, a principal security researcher at Huntress.
Other vendors aren’t immune
It would be naive to expect a world free of CrowdStrike-like incidents, especially given today’s interconnectivity and dependency. CrowdStrike happened to be the one that slipped up, but it could have been anyone, several experts believe.
“It’s important to note that this is not a security failure,” said Duncan Brown, group vice president of research at IDC. “SaaS-based vendors are making releases daily, so theoretically, this kind of incident could happen more often. It just happened to be CrowdStrike, so the security aspect – at least to some degree – is a red herring. But of course, there is a presumed urgency to security updates, which probably meant that the update was distributed and installed quickly and widely.”
Brown noted that while cloud-based updates are swift and beneficial for addressing security vulnerabilities, they come with an increased risk of incidents similar to the CrowdStrike issue. The alternative to cloud rollouts, an on-premises infrastructure, offers more control for companies but is slower and more costly, he added.
Steffen, too, believed such incidents aren’t too uncommon. “It is hard to berate CrowdStrike too much about the incident, as these kinds of programming errors happen to literally every software vendor,” he said. “There are not really any metrics that discuss how often these sorts of issues occur with a malformed patch or update, but they do happen, and they are not particularly rare. The impact is either not as widespread (CrowdStrike’s reach is significant) or not as severe (taking down entire systems).” It is absolutely critical that vendors supplying patches – especially critical, time-sensitive patches – thoroughly test to ensure that those updates are not causing harm or outages, he added.