Planning to Fail Over

A few years ago I was asked, “What are you going to do if we need to fail over to our disaster recovery site?” My quick answer was, “I will be working on my resume.” It was not a very politically correct thing to say, but it was honest. We had no failover plan because we had never replicated our production environment. In a disaster, it would have taken days to rebuild.

Like most companies, we had a secure copy of our data backed up nightly at an offsite disaster recovery site. Many people, even CIOs, believe that routine data backups cover the organization in an outage or disaster. However, data backup and disaster recovery are not the same thing.

Failing to deploy high-availability strategies for critical applications will result in a significant loss: downtime, lost data, or damage to your organization’s reputation. In the past two years, we have read about outages that lasted days. The Delta Air Lines and British Airways data center outages lasted nearly three days each and cost more than $100 million in lost revenue.

Backing up your data without a recovery environment is almost the same as not backing it up at all. Furthermore, a recovery environment that does not reflect your production environment is pointless. The servers, operating systems, external and internal connections, storage, and so on must all be in place. Too often, the people, processes, and tools needed to restore and recover are an afterthought. Unfortunately, that means you end up dealing with them for the first time in the middle of an outage or disaster.

Business Continuity and Data Availability

We decided that our most important service is our availability, so we eliminated the disaster recovery site in favor of two active data centers. The main purpose of a disaster recovery site is to recover quickly, so that the business keeps operating with minimal impact on availability for customers. While it may be desirable to recover every application as quickly as possible, that is not likely to happen. Recovery has to be prioritized around the applications and services that are most critical.

With a primary focus on availability, a traditional disaster recovery site is no longer good enough. Critical applications hosted on the mainframe must be accessible at all times. We addressed this by implementing a design that allows the mainframe to move from the primary site to a secondary site, and that takes effect in case of a failure at either location. We tested the design with the help of our team, focusing on the business-critical applications our customers need most.

So, what did we accomplish? We made certain that critical applications can fail over in real time. Being able to fail over protects the organization from data loss, and doing it in real time is essential to keeping your customers’ recovery time short.

Ed Toner is part of the GovLoop Featured Contributor program, where we feature articles by government voices from all across the country (and world!).
