
CrowdStrike’s Global Outage Doesn’t Have To Be A Recurring Nightmare

Widespread catastrophic failure is inevitable when companies are so reliant on just a few dominant cloud vendors that may need stricter processes themselves.

Chaos.

One of the most disturbing things about Friday’s devastating global outage of IT systems is how routine such ruinous events have become.

In the last few years, similar glitches at companies like Amazon.com Inc. have temporarily shut down systems across the globe. This latest one stems from a botched software update from cyber security firm CrowdStrike Holdings Inc., whose link to mega customer Microsoft Corp. has caused worldwide problems, including chaos at airports, stock exchanges and hospitals, though a fix has now been deployed.

This time the scale is unprecedented. That should spur Microsoft and other IT firms to do more than simply administer a band-aid. Policy makers could address the world’s over-reliance on just three cloud providers too. Today’s reality, where a single bug can harm millions of people at once, doesn’t have to be the status quo.

There’s a recommendation for you too, dear reader: Do something nice for your IT people today. Bring them donuts, coffee or something stronger if it’s late enough, because they’re in for a rough weekend as resolving Friday’s shutdown becomes a slow, complicated process. Network technicians and engineers have been scrambling to address the blue screen of death that has popped up on Windows computers around the world, effectively making them useless. It’s forced airlines to write their flight times on white boards and issue hand-written paper tickets; one TV news station in Britain was forced to go off the air.

The glitch stems from an update to CrowdStrike’s Falcon software, which, ironically, is designed to prevent harm from viruses and cyber threats and is described as a “tiny, single, lightweight sensor.” CrowdStrike counts Microsoft as a key customer, and, crucially, Falcon has privileged access to the most fundamental core of an operating system like Windows, known as the kernel.

In theory, this is a good idea. If CrowdStrike’s tool didn’t have this access, then any malicious hacker who got root access could simply deactivate CrowdStrike’s anti-virus software and run rampant.

But it’s now obvious there’s a flip side to having that kind of privileged access if CrowdStrike itself makes an error. That’s why blame shouldn’t fall only on CrowdStrike (whose shares had fallen by more than 20% early Friday morning) but also on Microsoft for arguably not designing a more resilient operating system. Damningly, Apple Inc. and Linux operating systems were not affected by the glitch at all, according to a blog post from CrowdStrike on Friday. And neither appears to give Falcon such privileged access to its kernel, a level of access that now looks unwise to grant. Microsoft didn’t respond to a request for comment.

This wasn’t a cyberattack but, like previous outages, the result of the Byzantine complexity of cloud IT processes. The cyber security industry has done a stellar job in the last decade of marketing itself as a salve for all manner of frightening threat actors, but one downside may be that companies have neglected basic IT hygiene as their infrastructure has grown more intricate. “Over the last few years, most of our customers have ended up spending more on cyber security than on IT,” Palo Alto Networks Inc. Chief Executive Officer Nikesh Arora said earlier this year.

One technical solution might go back, naturally enough, to the age-old trick of “turning it off and on again.” Joao Alves, head of engineering at online marketplace Adevinta, tweeted that the tech industry will likely demand that cloud providers “double boot for OS and kernel-modules upgrades.” In plain English, that means restarting a system twice when updating software. The first boot applies the update, and the second makes sure the system is stable before fully activating the changes. At the time of writing, Microsoft hadn’t replied to questions about whether it has such processes in place.
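To make the double-boot idea a little more concrete, here is a toy sketch in Python. It is not how Windows, CrowdStrike or any cloud provider actually handles updates: every name in it (`Machine`, `double_boot_update`, the version labels) is hypothetical, and it assumes a machine that can automatically fall back to its last known-good version when a trial boot crashes. A new version is committed only after it survives two consecutive boots.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Machine:
    """Hypothetical toy model of a computer that remembers its last known-good version."""
    known_good: str
    boot_log: List[str] = field(default_factory=list)

    def boot(self, version: str, boots_cleanly: Callable[[str], bool]) -> bool:
        # Simulate one boot with the given version and record whether it succeeded.
        ok = boots_cleanly(version)
        self.boot_log.append(f"{version}:{'ok' if ok else 'crash'}")
        return ok


def double_boot_update(machine: Machine, new_version: str,
                       boots_cleanly: Callable[[str], bool]) -> bool:
    """Commit new_version only after it survives two consecutive trial boots.

    Boot 1 applies the update on a trial basis; boot 2 confirms the system is
    stable. If either boot crashes, the machine falls back to its last
    known-good version (an assumed capability) instead of crashing again.
    """
    for _ in range(2):
        if not machine.boot(new_version, boots_cleanly):
            machine.boot(machine.known_good, boots_cleanly)  # roll back
            return False
    machine.known_good = new_version  # fully activate the change
    return True


if __name__ == "__main__":
    # Hypothetical health check: any version labelled "broken" crashes on boot.
    def healthy(version: str) -> bool:
        return "broken" not in version

    m = Machine(known_good="v1")
    print(double_boot_update(m, "v2", healthy))         # True
    print(double_boot_update(m, "v3-broken", healthy))  # False
    print(m.known_good)                                 # v2
    print(m.boot_log)  # ['v2:ok', 'v2:ok', 'v3-broken:crash', 'v2:ok']
```

In this toy model the health check always gives the same answer, so the second boot can never disagree with the first. In practice, the point of a second boot is to catch problems that only surface once the updated component is actually running.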

But these are only piecemeal solutions. The bigger problem is the concentration of cloud computing and, by extension, cyber security services, which has left too many companies and organizations vulnerable to a single point of failure. Just three companies dominate the market for cloud computing: Microsoft, Amazon and Alphabet Inc.’s Google. With that much concentration, one minor incident can have global ramifications.

European lawmakers are furthest ahead in addressing the market stranglehold of these so-called hyperscalers, with the European Union’s new Data Act, which aims to lower the cost of switching between cloud providers and improve interoperability.

US lawmakers should get in the game too. Companies in critical sectors like healthcare, finance, transportation and energy often rely on just one cloud provider for their core infrastructure. A new regulation could instead require them to use at least two independent providers for their core operations, or at least ensure that no single provider accounts for more than about two-thirds of their critical IT infrastructure. If one provider has a catastrophic failure, the other can keep things running.
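The two-thirds ceiling proposed above is simple enough to express as a check. The sketch below is illustrative only: the provider names, the threshold treated as an exact cap, and the way workloads are measured (here an arbitrary number per provider, such as critical services hosted) are all assumptions, not part of any existing regulation. Note that any cap below 100% already implies at least two providers, since a single provider would hold the entire workload.

```python
from typing import Dict

# Hypothetical threshold: the column's "about two-thirds", treated as an exact cap.
MAX_SHARE = 2 / 3


def largest_share(workloads: Dict[str, float]) -> float:
    """Return the fraction of critical workloads held by the biggest provider."""
    total = sum(workloads.values())
    if total <= 0:
        raise ValueError("workloads must add up to a positive total")
    return max(workloads.values()) / total


def meets_concentration_cap(workloads: Dict[str, float], cap: float = MAX_SHARE) -> bool:
    """True if no single provider exceeds the cap.

    With any cap below 1.0, this also forces the use of at least two providers.
    """
    return largest_share(workloads) <= cap


if __name__ == "__main__":
    # Illustrative numbers only: critical services per (hypothetical) provider.
    print(meets_concentration_cap({"provider_a": 100}))                   # False
    print(meets_concentration_cap({"provider_a": 70, "provider_b": 30}))  # False
    print(meets_concentration_cap({"provider_a": 60, "provider_b": 40}))  # True
```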

As painful as Friday’s outage has been, it would be a waste not to use it as a catalyst to stop what is fast becoming a recurring nightmare.


This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of “We Are Anonymous.”


©2024 Bloomberg L.P.