Lessons Learned From the CrowdStrike Global IT Outage

Intro

This article isn’t a deep dive into the technical reasons the incident occurred or why it was so impactful. Instead, it gives a high-level overview of what happened and explores what small and medium-sized businesses in Australia can learn from it, and how we can better position ourselves to recover from something similar in the future.

What happened (a summary)

In the early hours of Friday, July 19, 2024 (UTC), a faulty update to CrowdStrike’s Falcon sensor left Windows-based computers and servers unable to boot, disabling significant business infrastructure worldwide. Falcon is a next-generation antivirus (NGAV) and Endpoint Detection and Response (EDR) tool from CrowdStrike, a leading cybersecurity vendor that has been at the forefront of cybersecurity defence for many years, alongside players like SentinelOne.

While the final post-incident report hasn’t yet been published by CrowdStrike, it appears that fundamental failures in quality assurance processes allowed a seemingly insignificant mistake to cripple IT infrastructure around the globe. Because of time zone differences, businesses in Australia and New Zealand were among the first affected, and as the rest of the world woke up, the problem escalated, causing widespread chaos.

This will happen again

This isn’t the first time a significant IT outage has been caused by, or distributed through, a supply chain network, and it unfortunately won’t be the last. Similar incidents have happened before, and they are likely to become more frequent as the world grows ever more reliant on globally distributed infrastructure and services. For example:

  • SolarWinds: In 2020, the SolarWinds supply chain attack affected thousands of organisations, including multiple government agencies. Attackers inserted malicious code into the Orion software platform, which was then downloaded by SolarWinds’ customers as part of a routine update.

  • NotPetya: In 2017, the NotPetya malware spread through a compromised update to the Ukrainian accounting package M.E.Doc, crippling multinational companies and causing billions of dollars in damage worldwide.

  • Kaseya: In July 2021, attackers exploited Kaseya’s VSA remote management software, used by managed service providers to administer client systems, to push ransomware to as many as 1,500 downstream businesses.

… and these are just the ones you may have heard of.

It could have been much worse

Supply chain attacks are those in which attackers compromise software updates or other critical components supplied by third parties, delivering malicious changes over established distribution networks such as vendor-managed software updates or back-end system access. These attacks are so dangerous because they ride on pre-established, highly trusted distribution channels that often can’t be managed to the same degree as your own business’s IT systems.

In the case of CrowdStrike, the trigger was essentially a bad virus definition update. How many of us can say that’s something we’re actively managing?

While the CrowdStrike incident was not caused by malicious parties, it’s a clear warning of what could happen if a similar fault were introduced intentionally, and an opportunity to refine our response to future incidents.

What can we do differently?

The answer to this question isn’t a technical one, and it isn’t specific to CrowdStrike or to this incident; it requires us to look inward at our own responses.

The saving grace for many businesses recovering from this incident was the hard-working IT professionals who had the knowledge and skills to apply the required fixes, even under very difficult circumstances, combined with established incident response and disaster recovery processes.
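
On the ground, the fix those professionals applied was widely reported: boot each affected machine into Safe Mode or the Windows Recovery Environment and remove the faulty Falcon channel file. The sketch below, written in Python purely for illustration, shows the core of that cleanup step; the directory and file pattern reflect public reporting at the time, and the vendor’s official remediation guidance should always take precedence over a sketch like this.

    # Illustrative sketch of the publicly reported workaround, not official
    # guidance. Run from Safe Mode / WinRE with administrator rights.
    from pathlib import Path

    # Directory and file pattern as publicly reported during the incident.
    driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")

    for channel_file in driver_dir.glob("C-00000291*.sys"):
        print(f"Removing {channel_file}")
        channel_file.unlink()

Simple as it looks, repeating this by hand across thousands of machines, many with BitLocker-encrypted drives requiring recovery keys, is exactly the kind of work that well-rehearsed incident response processes had to support.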

It’s within our response to this incident that we need to extract the lessons learned and refine our approach to future challenges. So, what are they?

Take ownership of your business continuity and disaster recovery plans

Business continuity and disaster recovery preparation is often treated as fully outsourced when working with external IT providers, but the fact remains that it’s your business on the line, and these policies are yours to own. If you don’t know what your current business continuity plan (BCP) and disaster recovery (DR) plan look like, how can you know whether they’re appropriate for your company?


Work with the right professionals who enable you to make informed decisions

Working with external IT service providers is a great way to quickly bring in the skills and expertise needed to shape and manage your IT, but the best providers aren’t just managing your tech; they’re helping you identify the gaps in policy and procedure that ultimately shape what your tech looks like.


Test and validate plans and processes regularly

Another element that’s often taken for granted is the availability and reliability of backup services, but as the old rule goes: an untested backup isn’t really a backup. Depending on the criticality of your services, yearly tests may suffice; if you run more complex infrastructure, you may need to test your backup technology and evaluate your processes quarterly.
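
To make “testing” a little more concrete, here’s an illustrative Python sketch of one small piece of a backup validation run: restoring a backup to a scratch location and checking every restored file against the source. The paths are placeholders invented for this example, and a real test should also cover application-level restores and your documented recovery runbooks.

    # Illustrative only: check that a restored backup matches its source by
    # comparing SHA-256 hashes file by file. Paths below are placeholders.
    import hashlib
    from pathlib import Path

    def file_hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def verify_restore(source: Path, restored: Path) -> bool:
        ok = True
        for src in source.rglob("*"):
            if not src.is_file():
                continue
            copy = restored / src.relative_to(source)
            if not copy.is_file() or file_hash(copy) != file_hash(src):
                print(f"MISMATCH: {src.relative_to(source)}")
                ok = False
        return ok

    # Example: compare live data against a backup restored to a test volume.
    if verify_restore(Path(r"D:\critical-data"), Path(r"E:\restore-test")):
        print("Restore verified: all files match the source.")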


Remain vigilant for opportunistic attacks

It’s an unfortunate truth that in times of crisis, attackers try to take advantage of the chaos. When emotions run high, it’s easy for our normal defences to be bypassed. During the CrowdStrike incident, there were several reports of fake domains and of people impersonating legitimate IT companies to exploit the situation and gain additional access.
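
One small, concrete form that vigilance can take is being suspicious of lookalike domains, such as the fake CrowdStrike “fix” sites reported during the outage. The toy Python sketch below flags domains that closely resemble, but don’t exactly match, a trusted list; the example domains are invented, and real protection belongs in your email and web filtering rather than in a script like this.

    # Toy example: flag lookalike domains that nearly match a trusted list.
    # The domain names below are invented for illustration.
    from difflib import SequenceMatcher

    TRUSTED = {"crowdstrike.com", "microsoft.com"}

    def looks_suspicious(domain: str, threshold: float = 0.75) -> bool:
        if domain in TRUSTED:
            return False  # exact match with a trusted domain
        # Flag anything that is nearly, but not exactly, a trusted name.
        return any(
            SequenceMatcher(None, domain, trusted).ratio() >= threshold
            for trusted in TRUSTED
        )

    for domain in ("crowdstrike.com", "crowdstrike-fix.com", "crowdstr1ke.com"):
        print(domain, "->", "suspicious" if looks_suspicious(domain) else "ok")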


Summary

In summary, while the CrowdStrike incident wasn’t caused by malice and fixes were quickly released, it has highlighted the need for business leaders to be actively involved in creating and testing disaster recovery plans and procedures, to ensure downtime is kept to a minimum and that the controls put in place by IT professionals remain appropriate for the business.

Business continuity and disaster recovery planning is something we offer to all clients of Aus Advantage. If you would like to learn more, get in touch with our team.

Contact – Aus Advantage