What Caused the Global IT Outage Today?

And What We Can Learn From It

We are now a community of 220! Thank you❤️

This newsletter is free and I don’t use paid advertising. I completely rely on organic growth through users who like my content and share it.

So, if you like today’s edition, please take a moment to share this newsletter on social media or forward this email to someone you know.

If this email was forwarded to you, you can subscribe here.

If you want to create a newsletter with Beehiiv, you can sign up here.

The weekend came early for Windows users around the world as a faulty update from CrowdStrike, a cybersecurity company, caused Windows systems to crash and, in many cases, get stuck in a boot loop.

This is the official response from CrowdStrike’s President and CEO, George Kurtz:

What is a bit concerning is the absence of any hint of an apology from Kurtz, who was previously the CEO of Foundstone and the CTO of McAfee.

The impact of the global outage was massive. It disrupted businesses and institutions in multiple countries, throwing airports, airlines, railways, government services, banks, stock exchanges, media houses, and more into chaos.

Some businesses stopped services completely, while others decided the show must go on. IndiGo, an airline from India, issued handwritten boarding passes because the software for printing the passes ran on Windows.

Here’s a post by Akshay Kothari, co-founder of Notion, who was flying from Hyderabad to Kolkata:

Sky News went completely off air early in the day but eventually returned, while the effects of the outage spread like wildfire.

According to aviation analytics firm Cirium, more than 1,000 flights have been cancelled worldwide today, as of 10:30 GMT.

Although CrowdStrike says a fix has been deployed, it is not that simple. To receive the fix, a computer must be connected to the internet, and for that it must start in the first place. Laptops encrypted with BitLocker (a Windows feature for disk encryption) are harder still to recover, since reaching the disk from recovery mode typically requires the BitLocker recovery key.

This grave error has already cost the world millions, if not billions, even though it was not a cyberattack or a hack.

If you’re reading this and your organization uses Windows, your IT team has probably already fixed the issue on your laptop. If your personal laptop is also affected and you don’t have IT support around, Reddit saved the day early with this fix on the CrowdStrike subreddit.

These are the steps to be followed:

  1. Boot Windows into Safe Mode or the Windows Recovery Environment

  2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory

  3. Locate the file matching “C-00000291*.sys”, and delete it.

  4. Boot the host normally.

DISCLAIMER: If you are unsure about any of these steps, get help from someone who knows what they are doing. In the worst case, a mistake can cause you to lose your data.
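
For readers who prefer to see steps 2 and 3 as code, here is a minimal Python sketch of the "locate and delete" part. It is only an illustration: it assumes Python 3.9+ is available on the affected machine (which is often not the case in Safe Mode or the Recovery Environment), and the function names `find_matching_files` and `remove_matching_files` are my own, not anything provided by CrowdStrike or Microsoft. The directory and file pattern are the ones from the steps above. By default it only lists what it would delete.

```python
from pathlib import Path

# Directory and pattern taken from the manual steps above.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_matching_files(directory: Path, pattern: str = PATTERN) -> list[Path]:
    """Return the files directly inside `directory` whose names match `pattern`."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_matching_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """List the matching files; delete them only when dry_run is False."""
    matches = find_matching_files(directory)
    for path in matches:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            path.unlink()
            print(f"Deleted: {path}")
    return matches


if __name__ == "__main__":
    # Safe default: only report what would be removed.
    remove_matching_files(CROWDSTRIKE_DIR, dry_run=True)
```

Running it as-is prints the matching files without touching them. Changing the last line to `dry_run=False` performs the deletion, which needs administrator rights, so the disclaimer above applies here too.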

The Blue Screen of Death (BSoD) is the nickname for the crash screen Windows shows when it hits a fatal error. From mall displays to card payment machines, it was everywhere today.

The outage today is a reminder that no matter how secure and advanced our technology is, things can go very wrong, very quickly. This was not a security breach, but because the disruption lasted so long, the financial losses could match or even exceed those of one.

A faulty update should never be deployed at such a large scale, and some quality assurance or security check may have been skipped at CrowdStrike. This shouldn’t have happened. Ultimately, some employees or teams at CrowdStrike might be in big trouble today.

There’s no way CrowdStrike can take full responsibility for this outage: it would go bankrupt if a few big companies went after it. I don’t expect that to happen. Every tech company knows that today it was CrowdStrike, and tomorrow it could be them.

For CrowdStrike, a company founded in 2011, the future has suddenly become much more uncertain. Its legal and PR teams have just been handed their most difficult assignment ever.

As our world gets more connected, the risk surface for such outages keeps growing. This outage was avoidable, but sometimes tech going down is virtually inevitable. It is also a reminder of how fragile the cloud really is, and why emergency, health, banking, and government systems need multiple backups and highly resilient servers available at all times.

Meanwhile, Mac and Linux users are more smug than ever, but there’s no reason a similar mistake cannot happen on other operating systems too.

If you’re connected to the internet, there’s always some risk and it can never be completely eliminated.

Did you like today’s newsletter? Feel free to reply to this email.

This newsletter is free but you can support me here.

I’d be happy to connect on Medium, LinkedIn and X.
