Recently, a major outage hit systems running Windows, including Windows Server. The disruption was caused by a faulty update to CrowdStrike's Falcon sensor software, which led to widespread blue screens of death (BSOD) on servers. The outage paralyzed airports and banking systems worldwide. To fix the affected servers, administrators had to boot each one into Safe Mode and remove the faulty file.
This incident highlights the critical importance of having robust disaster recovery (DR) tools in place, not only to defend against cyberattacks but also to manage unexpected failures like faulty updates.
The Role of Disaster Recovery in Mitigating Downtime
Disaster recovery solutions, such as continuous replication tools, are essential for minimizing downtime and ensuring business continuity. In the event of a failure like the recent CrowdStrike update issue, a DR solution allows administrators to revert a server to its state before the update was applied. For instance, a continuous replication tool can provide an RPO (Recovery Point Objective) of approximately 5 seconds, meaning it captures a checkpoint roughly every 5 seconds, so at most about 5 seconds of changes are lost on recovery. This enables organizations to quickly recover their servers to the most recent stable state.
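To make the checkpoint idea concrete, here is a minimal sketch of how a restore point could be chosen: given a list of checkpoint timestamps, pick the newest one taken before the faulty update was applied. The function name `latest_checkpoint_before`, the timeline, and the update time are all hypothetical and chosen for illustration; this is not the API of any particular DR product.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone


def latest_checkpoint_before(checkpoints, incident_time):
    """Return the newest checkpoint strictly before incident_time, or None."""
    ordered = sorted(checkpoints)
    # Index of the first checkpoint taken at or after the incident.
    idx = bisect_left(ordered, incident_time)
    return ordered[idx - 1] if idx > 0 else None


# Hypothetical timeline: one checkpoint every 5 seconds for one minute.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
checkpoints = [start + timedelta(seconds=5 * i) for i in range(13)]

# Hypothetical moment at which the faulty update was applied.
update_applied = start + timedelta(seconds=42)

restore_point = latest_checkpoint_before(checkpoints, update_applied)
data_loss = update_applied - restore_point

print("Restore to:", restore_point.isoformat())
print("Changes lost (seconds):", data_loss.total_seconds())
```

With a 5-second checkpoint interval, the gap between the restore point and the incident stays under 5 seconds, which is what an RPO of about 5 seconds expresses.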
Benefits of DR Solutions
- Minimal Downtime: With a DR solution, an affected server can be brought back online within minutes, far faster than manually troubleshooting and repairing it.
- Operational Continuity: Once activated in the DR environment, the server continues to function normally, maintaining critical operations while IT staff work on fixing the production environment.
- Reduced Risk and Impact: DR tools help mitigate the risk and impact of faulty updates or other unexpected issues by providing a quick and reliable recovery option.
Case in Point: The CrowdStrike Update Incident
In the recent CrowdStrike incident, if a company had implemented a DR solution like Zerto, recovery would have been much smoother:
- Immediate Recovery: Servers could have been rolled back to a state just before the update, thanks to the frequent checkpoints.
- Quick Resumption of Services: The recovery process would have taken only a few minutes, allowing the affected systems to resume normal operations quickly.
- Concurrent Repairs: With the DR environment maintaining operational continuity, IT teams could have focused on repairing and stabilizing the production servers under far less pressure.
Our Services
At Kyndryl, we offer comprehensive disaster recovery services tailored to various tools and platforms for both on-premises and cloud environments. Our expertise includes the implementation, maintenance, and restoration of systems when needed. By partnering with us, you can ensure that your business is prepared for any unforeseen disruptions and can maintain continuous operations with minimal downtime.
Find out more about Kyndryl Resiliency and Disaster Recovery services:
- Disaster Recovery Plans
- Business Continuity Plan
- Cyber Incident Recovery
- Hybrid Platform Recovery
- Resiliency Orchestration Managed Services
- Incident Recovery Services
- Security and Resiliency
Conclusion
The recent server outage caused by the faulty CrowdStrike update serves as a stark reminder of the importance of disaster recovery tools. These solutions are not just for mitigating cyberattacks but are crucial for handling any unexpected disruptions, including faulty updates. Investing in a robust DR solution can save time, reduce downtime, and ensure business continuity in the face of unforeseen challenges. Kyndryl is here to support your disaster recovery needs, ensuring your systems are always ready to recover quickly and efficiently.