Recovering from a disaster: A data center checklist
When you developed your data center disaster recovery (DR) plan, you designed it to protect your organization’s investment in information technology, communications and staff. Depending on the nature of the disruption, your data center’s overall integrity may be untouched or the facility could be totally destroyed.
DR plans need to be flexible and scalable to address a broad range of disruption scenarios. In this article, we’ll provide data center checklists with recommended actions you can take in the aftermath of a disaster. These checklists will make recovering from a disaster easier. Keep the data center checklist (or a modified version based on your own requirements) at hand as you review the effects of a disruptive incident on your data center. Once you have completed an initial assessment of the situation and confirmed the location of your staff, begin executing the DR plan.
Data center disaster recovery planning assumptions
A data center disaster recovery plan focuses exclusively on a data center facility and its infrastructure, such as its physical location, construction, security, power sources, environmental systems and its people. Be sure you’ve factored in the operational aspects of your data center as well as the people supporting it. This means addressing the following as you build your DR plan:
- Data center technical and management staff, all shifts
- Data center building (e.g. physical infrastructure, construction, location of entrances and exits, raised floor areas)
- Building location (e.g. access routes, proximity to highways, rail lines and airports; proximity to fuel storage tanks)
- Power generation (e.g. commercial power, backup power systems)
- Power protection (grounding and bonding, lightning arrestors, line conditioners, surge suppressors)
- Environment (e.g. heating, ventilation and air conditioning)
- Critical systems (e.g. servers, power distribution units, VoIP systems, call center systems)
- Network infrastructure (e.g. cabling, connectors, routers, copper and fiber circuits, cable racks)
- Security (physical access and information security)
- Work space (e.g. offices, conference rooms, cubicles, furniture, lighting)
- Fire protection (e.g. fire detectors, smoke detectors, fire extinguishers, FM200 extinguishment systems)
- Building floors and walls (fire-rated walls, raised floors)
- Utilities (e.g. water, power, sewer, communications)
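As a rough illustration, the areas above could be tracked in a simple assessment record so that nothing is skipped during the initial review. This is a minimal sketch, not a prescribed tool: the `Assessment` class, its method names and the status values ("unchecked", "ok", "damaged") are hypothetical; only the area names come from the list above.

```python
from dataclasses import dataclass, field

# Area names follow the planning list above; everything else is illustrative.
AREAS = [
    "Staff", "Building", "Building location", "Power generation",
    "Power protection", "Environment", "Critical systems",
    "Network infrastructure", "Security", "Work space",
    "Fire protection", "Floors and walls", "Utilities",
]

VALID_STATES = ("ok", "damaged")  # hypothetical status values


@dataclass
class Assessment:
    # Every area starts as "unchecked" until someone reports on it.
    status: dict = field(default_factory=lambda: {a: "unchecked" for a in AREAS})

    def record(self, area: str, state: str) -> None:
        """Record the assessed state of one area."""
        if area not in self.status:
            raise KeyError(f"unknown area: {area}")
        if state not in VALID_STATES:
            raise ValueError(f"state must be one of {VALID_STATES}")
        self.status[area] = state

    def outstanding(self) -> list:
        """Areas that have not been assessed yet, in list order."""
        return [a for a in AREAS if self.status[a] == "unchecked"]

    def damaged(self) -> list:
        """Areas reported as damaged, in list order."""
        return [a for a in AREAS if self.status[a] == "damaged"]
```

For example, after calling `record("Utilities", "damaged")` and `record("Staff", "ok")` on a new `Assessment`, `outstanding()` returns the eleven areas still to be reviewed and `damaged()` returns `["Utilities"]`.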
Developing the disaster response
When developing disaster response action steps (the incident response part of a DR plan), you should discuss your ideas with building management (if your firm is a tenant) or facilities management (if the building is your own), as well as IT management. Review your response plan with all appropriate internal and external parties (e.g. first responders) to ensure that you are covering all the bases.
Factor in the following items as part of your design process:
- Relationships with various IT groups, such as the internal technology team, application team and network administrator(s). This ensures that all groups that regularly use the data center’s facilities have input into the disaster response process
- Relationships with external stakeholders, such as vendors and managed service providers
- Relationships with other company offices (if you have them) as they could be an important part of your recovery plan (e.g. providing alternate data center space)
- Relevant infrastructure documents, e.g. building plans, floor plans, system maps, network diagrams and equipment configurations
The following items should be factored into your disaster response:
1. Management’s perception of the most serious data center threats, e.g. fire, human error, loss of power, system failure, security breach. Be aware that initial management assumptions may be wrong, so be prepared to make corrections quickly.
2. Management’s perception of the most serious vulnerabilities to the data center, e.g. outdated backup power systems.
3. Results of previous data center outages and disruptions, how they were handled and lessons learned.
4. Management’s maximum acceptable outage time for a data center disruption.
5. Established industry practices for responding to data center disruptions.
6. Experience and lessons learned from other data center disasters.
7. Data center emergency team(s) that are trained in responding to emergencies.
8. Emergency response capabilities of your primary and alternate data center vendors. If these capabilities have ever been used, did they work properly? Also consider the cost of the services and the status of service contracts.
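Item 4, the maximum acceptable outage time, is the one item in this list that can be checked arithmetically during an incident. The sketch below shows one way to do that, assuming the outage start time is known; the function name `outage_status` and the four-hour limit in the usage note are hypothetical and not taken from any standard.

```python
from datetime import datetime, timedelta


def outage_status(started: datetime, now: datetime,
                  max_acceptable: timedelta) -> dict:
    """Compare elapsed outage time with management's maximum acceptable outage."""
    elapsed = now - started
    remaining = max_acceptable - elapsed
    return {
        "elapsed": elapsed,
        "remaining": remaining,  # negative once the limit has passed
        "exceeded": remaining < timedelta(0),
    }
```

With an illustrative outage that started at 08:00 and a limit of four hours, calling the function at 11:30 reports 30 minutes remaining and `exceeded` as `False`; calling it at 13:00 reports `exceeded` as `True`.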
Data center checklist: General response
The following checklist can be used in the initial response stages of a data center disruption. The nature of the incident will influence which steps you take and in what sequence. For example, response steps for a power outage will probably differ somewhat from those for a fire. Be sure to include these steps in your DR plan.
Scenario 1: Power outage
The previous steps assume that specific plans have been developed for the various situations listed, such as email recovery, hardware and software recovery, data recovery, document recovery and relocation to an alternate data center.
Once the situation has been mitigated and recovery can begin, assess the event: determine what happened, what worked and what didn’t work. Schedule and conduct meetings as often as practical to compile this information, as it may be needed for insurance claims and even possible lawsuits.
Additional data center disaster recovery planning resources
Developing a data center disaster response can be very complex, depending on the amount of detail you elect to include. One way to facilitate this process is to review existing standards and data center practices. Three useful ones are:
When building a data center disaster recovery plan, keep in mind the following actions:
1. Secure senior management support so that your plans can be funded, documented and regularly exercised.
2. Take the data center DR planning process seriously: Plans do not have to be dozens of pages long, but they should contain current and accurate information.
3. Consider using standards as part of the process, such as the ones previously listed.
4. Keep the planning process simple by gathering and organizing the right information.
5. Review results with key departments, such as facilities, to ensure that your assumptions are correct.
Data center disasters can seriously disrupt business operations. While some firms address data center recovery by building a second data center or leasing specially equipped space at a third-party facility, a careful assessment of data center operations and risks is an important starting point in a DR program. With a well-developed disaster recovery plan, especially one with well-defined recovery and restoration steps, damage to a data center can be minimized.
This was last published in May 2011