The dizzying pace of technology innovation has given rise to possibilities unimaginable even ten years ago. Cloud services, big data, IoT, mobile apps, and others are providing enterprises new ways of improving their bottom line. What hasn’t kept pace are the policy, people, process and technologies to keep it all running when unexpected events occur. What will you do when disaster strikes? Do you have a business continuity and disaster recovery plan for your business?
The Problem with Disaster Recovery
Business continuity and disaster recovery planning is the ginger-haired foster child of IT. It’s always invited last to the dinner table, ignored by the parents, but yelled at loudest when it fails. The inconsistency in how companies and industries approach it is understandable. Think about it. It’s hard to bring yourself to pay for something you hope you never have to use, and when you do use it, it’s probably because of something awful. Expressing the return on investment for business continuity planning is really hard. Because of this, lots of organizations push off developing business continuity and disaster recovery plans to “later.”
Requirements Driving the Need for Business Continuity and Disaster Recovery
Without question, the federal government has had the greatest impact on business continuity and disaster recovery. Through regulation and law, the government has driven the requirement for disaster recovery, albeit with a noticeable lack of guidance on how. Entities and organizations that serve the “common good” are required to have measures that protect against and recover from disruption. Gramm-Leach-Bliley, Sarbanes-Oxley, and HIPAA all use regulatory pressure to mandate data protection, disaster recovery and business continuity policies and practice. The events of 9/11 were clearly a tipping point, showing how severe a disruption could be and how exposed certain industries were without continuity plans.
A Simple Approach to Business Impact and Risk Assessments
Your company or division has reached the point where an unforeseen event will affect the organization’s core purpose. Not everything you do is core to your business. There are some activities you have to do, but they aren’t why you exist. Have you performed a Business Impact Analysis (BIA) to understand the impact of a disruption on core (or critical) functions and identified potential loss scenarios (or hazards) through a modest Risk Assessment (RA)? Don’t get fancy and try to apply statistical methods. Underwriting and insurance companies use the law of large numbers to create risk models. You don’t have that.
Keep it simple and ask yourself:
- What happens when this fails?
- How long can I wait until it comes back?
- What’s the impact of the delay between failure and restore?
This is the BIA.
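If it helps to see it written down, here is a minimal sketch of a BIA worksheet, assuming you simply record the answers to the three questions for each business function. The `BiaEntry` fields, the `prioritize` helper and the sample entries are all hypothetical and made up for illustration; they are not a standard template.

```python
from dataclasses import dataclass


@dataclass
class BiaEntry:
    """Answers to the three BIA questions for one business function."""
    function: str           # the business function being assessed
    failure_effect: str     # what happens when this fails?
    max_wait_hours: float   # how long can I wait until it comes back?
    delay_impact: str       # what's the impact of the delay?
    core: bool = True       # is this core to why the business exists?


def prioritize(entries):
    """Order entries with core functions first, then shortest tolerable wait."""
    return sorted(entries, key=lambda e: (not e.core, e.max_wait_hours))


# Hypothetical entries; the names and numbers are invented for illustration.
entries = [
    BiaEntry("Payroll", "Staff paid late", 72, "Morale and legal exposure", core=False),
    BiaEntry("Order processing", "No new orders accepted", 4, "Lost revenue every hour"),
    BiaEntry("Customer support line", "Calls go unanswered", 24, "Customer churn"),
]

for entry in prioritize(entries):
    print(f"{entry.function}: restore within {entry.max_wait_hours}h")
```

Sorting on core-first and then on tolerable wait is only one possible ordering. The point is that the list stays short enough to revisit whenever the business changes.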
Staying with the simple theme, there is the risk (something happens, good or bad), a cause (the trigger) and the impact (the outcome). As a result of a “cause,” an “uncertain event” could happen, which might lead to an “effect.”
This is the RA.
Once you can establish a cadence, try to create an easily repeatable process for both BIA and RA. It’s very important to identify and keep pace with changes to business processes and related impact.
Just because there’s risk doesn’t mean you have to create, maintain and test for every risk event. How far you decide to go should be based on how the outcome affects the organization’s core purpose. Contractual and regulatory compliance requirements also dictate the degree of planning required for in-scope systems. A common nomenclature for risk management would look like this:
- Avoid – eliminate any impact of the risk event
- Accept – if it happens, deal with it
- Transfer – move it somewhere else, e.g. the cloud
- Mitigate – take steps to reduce the outcome should the risk event occur
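To keep the RA statements and the four treatments in one place, a small risk register is usually enough. The sketch below is one possible layout, not a prescribed one: the `Risk` and `Treatment` names, the `by_treatment` helper and the sample risks are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    """The four risk handling alternatives from the list above."""
    AVOID = "eliminate any impact of the risk event"
    ACCEPT = "if it happens, deal with it"
    TRANSFER = "move it somewhere else"
    MITIGATE = "take steps to reduce the outcome"


@dataclass
class Risk:
    cause: str            # the trigger
    event: str            # the uncertain event
    effect: str           # the outcome
    treatment: Treatment  # how you have decided to handle it

    def statement(self):
        """Render the risk in cause / uncertain event / effect form."""
        return (f"As a result of {self.cause}, {self.event} could happen, "
                f"which might lead to {self.effect}.")


def by_treatment(risks):
    """Group risks by chosen treatment so nothing is left undecided."""
    grouped = {t: [] for t in Treatment}
    for risk in risks:
        grouped[risk.treatment].append(risk)
    return grouped


# Hypothetical register entries, invented for illustration only.
register = [
    Risk("a regional power failure", "a data center outage",
         "halted order processing", Treatment.MITIGATE),
    Risk("a key supplier going out of business", "a parts shortage",
         "delayed shipments", Treatment.ACCEPT),
    Risk("aging on-premises hardware", "a storage failure",
         "lost customer records", Treatment.TRANSFER),
]

for risk in register:
    print(f"[{risk.treatment.name}] {risk.statement()}")
```

Grouping by treatment makes it easy to see which risks you have consciously accepted and which ones actually need a plan, a budget and a test.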
Have you evaluated the different risk handling alternatives (accept, avoid, etc.) available to you? Business continuity and disaster recovery planning isn’t just IT. Sure, you may experience equipment or software failures and be aware of natural disasters like floods, hurricanes and tornadoes. What about a pandemic, a cyber-hostage situation, or even key employee or supplier turnover? By thinking through the three simple questions, you’ll be able to consider the full range of disruptions and plan your response.
Disaster Recovery Planning is Changing
Much of today’s disaster recovery planning can be traced back to Cold War government preparedness. The secret bunker at the Greenbrier resort exemplifies the 1950s thinking on how to react to an apocalyptic event: bad things will happen, so you must duplicate as much as possible. When you think about IT business continuity planning, the historical mainframe mentality does not apply to today’s decentralization of business functions, IT delivery and outsourced services; “the stuff is everywhere.”
Another problem with legacy disaster recovery and business continuity plans is that they are hard to test, because the event is typically destructive or catastrophic. Because of this, even when the plans are tested, the tests may not fully reveal how effective the plans are. It takes imagination to create reasonable tests and plans for disasters. Business continuity and disaster recovery plans have to change with the business and the technology that supports it. Have you considered how your legacy disaster recovery plan has scaled with your business? Do you have the necessary resources to execute TODAY?
Business Continuity in the Cloud
Yes, workloads are moving to the cloud, which means that some risk is transferred to the provider. What does the provider do with that risk? What happens when the cloud fails?
- March 2017, a typographical error by a cloud provider technician disabled service for hundreds of thousands of customers for hours. A month earlier, the same provider experienced a 4-hour outage to its storage services.
- August 2017, an interactive cloud-based online gaming environment was unavailable, blocking multiplayer games and preventing login even for single-player games.
- October 2017, during routine data center maintenance, an accidental activation of a fire suppression system created a sequence of cascading failures that resulted in a loss of cloud-based compute, storage, backup/recovery and reporting for over 8 hours.
Just because you’ve moved workloads to the cloud doesn’t mean you are insulated from failure. You still must assess the risk, create a plan, practice, test and update.
Can your Data Center Deliver Business Continuity?
Despite highly advanced engineering and N+ “whatever” designs, the data center industry still experiences business-affecting failures. Causes range from “faulty UPS,” “generator failure” and “poorly manufactured switchgear” to “squirrel in transformer.” 77% of 200 CEOs surveyed by the Marsh Group in 2015 expected a failure in their data center.
Time will tell how the data center hubs of Ashburn, Atlanta, Chicago, Dallas and Santa Clara deliver reliability and redundancy for the emerging needs of IoT, IIoT, mobility and services at the edge.
In summary, nothing’s failure-proof. We can’t predict or control nature, and whenever there’s a human involved, there’s a possibility of a mistake. Look at the fundamental purpose of your business and begin to build a list of exposures. Prioritize your list, decide how you would treat each risk, and then create a budget. Don’t forget that business continuity and disaster recovery plans require updating, testing and practice. Ultimately, you will weigh the cost of the risk event against the cost of having a plan to address it.