The Fragility of Our Digital World: Lessons from the CrowdStrike Incident

July 23, 2024

Ethan

Dev

Unveiling the Vulnerabilities in Our Digital Infrastructure

This week’s CrowdStrike incident, in which a routine update caused a cascading failure across thousands of critical systems worldwide, is a stark reminder of how fragile and interconnected our digital world is. Although the incident was a misstep, not malice, it exposes how vulnerable our essential services are and how much rides on the software engineers behind them.

Widespread outages should serve as a wake-up call for engineers and developers. If an error can cause this much chaos, imagine the havoc a deliberate attack could unleash. This scenario isn’t just theoretical. The potential for bad actors to exploit these weaknesses is a chilling reality.

The Implications of a Widespread Failure

Simultaneous attacks on critical services, like emergency dispatch and hospital systems during a mass casualty event, could have catastrophic consequences. Time will tell how far this week’s disruption has ultimately reached.

How Common Is This Type of Incident?

Incidents of this type are fairly common. This one was caused by a botched update to CrowdStrike’s Falcon sensor, a popular and well-regarded cybersecurity product. Botched updates happen from time to time. Who among us hasn’t had a new bug appear after updating software overnight? What sets this incident apart is how widespread and dire the impact was.

The faulty update triggered a Blue Screen of Death (BSOD) boot loop on Windows computers, rendering them unusable. The damage was compounded by the update installing automatically on countless machines overnight. To make matters worse, once a computer is stuck in a loop like this, the only option is for technicians to apply the fix manually, computer by computer. This will take days in the best of cases, causing disruption far and wide.

Thousands of companies globally have been crippled, and the recovery process will be painstakingly slow.

The Ripple Effect of a Single Update

This incident underscores the interconnectedness of modern businesses and their reliance on complex software ecosystems.

The software supply chain at a typical company can involve over 10,000 pieces of software and vendors. A single update, like a domino, can topple countless systems. That sheer volume necessitates automated updates and, as we’ve seen, automation isn’t foolproof.

As engineers, we have to accept that bad updates are inevitable, but their consequences don’t have to be. Organizations need to plan for these disruptions and prioritize business continuity. This incident should also prompt us to question the level of trust we place in our vendors and the robustness of our interconnected systems.

A Blueprint for Disaster

This type of issue could affect any software, especially with the rise of malicious open-source packages. If you think a botched update is bad, consider that this incident could become a blueprint for far more insidious software supply chain attacks. Imagine malicious code intentionally distributed under the guise of an update, wreaking havoc on a massive scale that couldn’t be fixed with just another “update.”

Fortifying Our Digital Infrastructure

The stakes are too high to ignore. Once we get everything sorted out, we must use this CrowdStrike incident as a catalyst to fortify our digital infrastructure against both errors and malicious intent. This should make us all stop and consider how interconnected our modern businesses are and what level of trust and reliability should be expected for our infrastructure.

Engineering Best Practices for Preventing Such Incidents

  • Rigorous Testing Before Deployment: Utilize comprehensive automated testing suites and manual testing processes to catch potential issues before they reach production.

  • Incremental Rollouts: Deploy updates incrementally rather than all at once. This minimizes the risk of widespread failures and makes it easier to identify and address issues early.

  • Robust Rollback Mechanisms: Ensure that you have reliable rollback mechanisms in place to revert to previous stable versions quickly in case of an error.

  • Continuous Monitoring and Incident Response: Implement continuous monitoring to detect anomalies as soon as they occur. Have a well-defined incident response plan that engineers can follow to mitigate the impact of any issues.

  • Secure Software Development Lifecycle (SDLC): Incorporate security best practices throughout the development process, from initial design to final deployment, to protect against malicious threats.

  • Cross-functional Collaboration: Foster a culture of collaboration between development, QA, and operations teams to ensure seamless and secure software delivery.
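
To make the incremental rollout, monitoring, and rollback practices above a little more concrete, here is a minimal Python sketch of a staged rollout that checks host health after each stage and reverts if too many hosts fail. It is an illustration only, not how CrowdStrike or any particular vendor deploys updates. The names staged_rollout, RolloutResult, apply_update, is_healthy, and rollback are all hypothetical; in a real system they would be backed by your own deployment and monitoring tooling.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RolloutResult:
    completed: bool         # True if every stage passed its health check
    updated: list[str]      # hosts left on the new version
    rolled_back: list[str]  # hosts reverted after a failed stage


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],     # cumulative, e.g. [0.1, 0.5, 1.0]
    apply_update: Callable[[str], None],  # hypothetical: push the update
    is_healthy: Callable[[str], bool],    # hypothetical: post-update check
    rollback: Callable[[str], None],      # hypothetical: revert one host
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Update hosts stage by stage, stopping at the first unhealthy stage."""
    updated: list[str] = []
    for fraction in stage_fractions:
        # Number of hosts that should be on the new version after this stage.
        target = min(len(hosts), math.ceil(len(hosts) * fraction))
        batch = list(hosts[len(updated):target])
        for host in batch:
            apply_update(host)
        updated.extend(batch)

        # Gate the next stage on the health of the hosts just updated.
        failures = [host for host in batch if not is_healthy(host)]
        if batch and len(failures) / len(batch) > max_failure_rate:
            for host in updated:
                rollback(host)
            return RolloutResult(False, [], updated)
    return RolloutResult(True, updated, [])


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(10)]
    # Simulate a bad update: every host fails its health check.
    result = staged_rollout(
        fleet,
        [0.1, 0.5, 1.0],
        apply_update=lambda host: None,
        is_healthy=lambda host: False,
        rollback=lambda host: None,
    )
    print(result)
```

In this simulated bad update, only the first host out of ten is ever touched before the rollout halts. That containment matters more than the rollback call itself: a machine stuck in a boot loop may not be reachable for a remote revert at all, so limiting how many hosts receive an untested update is what keeps the blast radius small.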

Conclusion

The CrowdStrike incident is a powerful reminder to engineers of the importance of rigorous testing, secure development practices, and robust incident response strategies. By taking proactive steps, engineers can help prevent similar disruptions in the future and fortify the resilience of our digital infrastructure.

Written by Ethan

Cloud Solutions Architect. Full Stack Web Developer. Cloud Enthusiast. Gym rat. I'm a driven, detail-oriented Cloud Solutions Architect based in Pittsburgh, PA. I'm experienced in both networking and software development cycles, and I enjoy designing scalable, flexible, and cost-effective solutions with a focus on end-user experience and business objectives. When I'm not working or at the gym, I enjoy continuous learning, experimenting with new technologies, and sharing what I learn with the community.
