Preparing a Business Continuity Plan for Drake
12/13/2004 Version 2.2, Paul Morris, CIO

Introduction

The Audit Committee of the Board of Trustees has given the CIO a mandate to lead the creation of a Disaster Recovery Plan for Drake. The plan should cover the protection and restoration of all the information systems, central or departmental, which are critical to Drake’s business operations (finance, student systems, payroll, human resources, etc.). A description of what the plan will contain, and a timetable for its creation, are due on January 10th (i.e., a “plan for a plan”), together with current backup procedures for Drake’s systems. The completed sections of the plan will be due on the dates given in the timetable.

Creating this plan will require a campus-wide effort, since the processes and procedures by which Drake operates are highly decentralized. OIT will produce a plan for recovery of its own operations. But most University processes and many IT resources function outside OIT, and so to ensure business continuity, each department will need to deal with its own area of responsibility.

If there is any material from the 2001 Contingency Plan which can be recycled, we will certainly do so. But that document does not have any of the documentation on alternative business processes or system recovery procedures needed to deal with major outages in OIT, departmental or network services.

Unit Business Continuity

For each unit, there should be a plan covering existing mission-critical processes, and how they use OIT systems, departmental systems, outsourced systems, the campus network and the Internet. The plan should cover how, and for how long, the unit could function if these systems were not available. It should also cover how the unit would recover from a failure of departmental systems for which it is responsible. (OIT may need to modify its own processes and procedures to meet needs arising from departmental plans.)

It will take some time to create these plans. What is needed to meet the Board mandate is, first, a plan and timetable from each department for the creation of its plan, and then a certification by the department head when the plan is complete and tested.
-1-
Alternative scenarios for recovery of systems hosted in Dial

OIT’s recovery plans for a physical loss of Dial will depend to a large extent on what departments specify as the length of time they can function without OIT services:

1. If mission-critical functions can be handled manually by departments for some months, then there would be time to reconstruct Dial, acquire and install new systems, and reload and test software from backups.
2. If the “manual processing” window is two weeks or so, then OIT’s plan will need to include the creation of a “warm site”. This would be a prepared area elsewhere on campus, with space, power, air-conditioning and networking ready to receive new machines as soon as they could be delivered.
3. Extending the notion of a “warm site”, we could purchase duplicates of critical pieces of equipment currently in Dial. These would be placed in the new “warm site” and would be fully configured, ready to be made active in an emergency.
4. If the manual processing window is only a few days, then a “hot” site would be required: a stand-by data center with all critical systems installed and ready to go as soon as the data backup tapes from Dial are loaded.

There are clearly very different costs associated with these different responses, so thoughtful discussions with departments about their needs will be required. Broadly, the reduction in down time grows with the amount spent in advance on site preparation, hardware and software.

Organization for Plan Preparation

This will be a major, campus-wide effort. Policy issues will arise, particularly regarding how much money will be spent on preventive measures, and these will require decisions at the executive level. The work of creating the plan will be done at the operational level by staff in OIT and many other units. We will rely on two existing committees for the detailed work.
The DUSIS Team Leaders group is the best informed about the functions performed by DUSIS, and about how the critical functions could be performed if DUSIS were not available. CAAD consists of a broad group of managers and staff from across the University, and would provide a good place to share information and discuss inter-departmental issues.

Each unit head listed in Appendix A should designate a person to lead the activity for that area, and to act as the contact person. The CIO is available to meet with any department heads or their designees who want more information or assistance, or who have concerns to be addressed. OIT staff as a whole will also be available to offer information and assistance. However, it is the departments who know their own situation, priorities and operations best, and they will need to take responsibility for their unit’s plans.

Appendix B lays out a template for each unit to perform a business impact assessment of threats to its critical business operations, and of the acceptable down time for those operations. This is organized around “critical assets”, which for this purpose means those computer applications which Drake uses for its critical operations. It focuses on information and supporting systems. It does not deal with recovery from disasters such as residence halls or classroom buildings becoming uninhabitable.
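The acceptable down time that units report through Appendix B is what drives the choice among the four Dial recovery scenarios listed earlier. As a rough illustration only, the selection could be sketched as below. The cut-off values in days are assumptions made for this sketch (the text says only “some months”, “two weeks or so” and “a few days”, and gives no window for the duplicate-equipment option), and the names `RecoveryOption` and `recommend_option` are hypothetical.

```python
from enum import Enum


class RecoveryOption(Enum):
    REBUILD = "reconstruct Dial, install new systems, reload from backups"
    WARM_SITE = "warm site: prepared space ready to receive new machines"
    WARM_SITE_WITH_DUPLICATES = "warm site stocked with configured duplicate equipment"
    HOT_SITE = "hot site: stand-by data center with critical systems installed"


# Assumed cut-offs in days; the document gives only approximate wording,
# so these numbers are placeholders to be settled with the departments.
REBUILD_MIN_DAYS = 60
WARM_SITE_MIN_DAYS = 14
DUPLICATES_MIN_DAYS = 7


def recommend_option(manual_windows_days):
    """Pick a scenario from each unit's manual-processing window (in days).

    The unit that can cope for the shortest time sets the requirement.
    """
    if not manual_windows_days:
        raise ValueError("at least one unit must report a manual-processing window")
    shortest = min(manual_windows_days.values())
    if shortest >= REBUILD_MIN_DAYS:
        return RecoveryOption.REBUILD
    if shortest >= WARM_SITE_MIN_DAYS:
        return RecoveryOption.WARM_SITE
    if shortest >= DUPLICATES_MIN_DAYS:
        return RecoveryOption.WARM_SITE_WITH_DUPLICATES
    return RecoveryOption.HOT_SITE


if __name__ == "__main__":
    # Illustrative figures only, not real departmental data.
    windows = {"Unit A": 90, "Unit B": 14}
    print(recommend_option(windows).value)
```

The point of the sketch is that one unit with a short window is enough to push the whole campus toward a more expensive option, which is why the discussions with departments about their real needs matter.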
-2-
Appendix C provides a template for documenting current backup and recovery procedures for departmental servers which contain critical applications and data.

The central university assets supported by the Office of Information Technology (the server farm and its applications running in the Dial Computer Center, the campus network, the Internet connection and the telephone system) are dealt with in separate documents for which OIT is responsible. The audience for the present document is those units who need plans to cover their own critical assets and operations. The following pages indicate the information which each unit should contribute, to be combined into the University’s overall plan.

Immediate and Long-Term Deliverables

Each unit is asked to submit by January 5¹:

• A report on current backup procedures for computers within the unit which contain critical information not stored on OIT servers. A template is given in Appendix C.
• A timetable by which the unit will complete the following elements of its Disaster Recovery Plan:
  o Prioritized List of Critical Assets (items i to vii on page 6)
  o Recovery and risk mitigation plans for systems within the unit (items viii and ix)
  o List of Critical Business Processes, and plan for operating these processes when one of the critical assets is not available (item x)
  o Development of a plan for testing the processing and recovery procedures (item xi)
  o Results of an actual test of the procedures (item xii)

It is highly desirable to develop a timetable that would get the full plan done by August, in time for the annual external audit.
1 This date will give the CIO time to combine all the documents into a single document for transmission to the Trustees on January 10.
-3-
Appendix A: Departments to be covered by Drake Business Continuity Plan
Unit                                           Unit Head          Title
Admissions                                     Leslie Mamoorian   Interim Director
Athletics                                      Dave Blank         Director
Business and Finance                           Victoria Payseur   Vice President
Campus Security                                Hans Hanson        Director
College of Business and Public Administration  Charles Edwards    Dean
Cowles Library                                 Rodney Henshaw     Dean
Dining Services                                Carla Carlson      General Manager
Facilities                                     Jolene Schmidt     Director
Head Start                                     Georgia Sheriff    Director
Health and Counseling Center                   Sentwali Bakari    Dean
Human Resources                                Venessa Macro      Director
Institutional Advancement                      John Willey        Vice President
International Center                           Gretchen Olson     Director
Law School                                     David Walker       Dean
Legal Clinic                                   Suzanne Levitt     Executive Director
Mailing Services                               John Murano        Director
Marketing and Communications                   Brooke Benschoter  Director
Office of Information Technology               Paul Morris        Chief Information Officer
President’s Office                             David Maxwell      President
Provost’s Office                               Ron Troyer         Provost
Residential Life                               Sentwali Bakari    Director
School of Arts and Sciences                    John Burney        Dean
School of Education                            Janet McMahill     Interim Dean
School of Journalism & Mass Communication      Charles Edwards    Dean
School of Pharmacy                             Raylene Rospond    Dean
Student Affairs and Student Excellence         Wanda Everage      Vice Provost
Student Financial Planning                     Susan Ladd         Director
Student Life                                   Sentwali Bakari    Dean
Student Records/Registration                   Wanda Everage      Director
-4-
Appendix B: Template for Departmental Information Assets

Unit Information

Name of Unit:
Head of Unit:
Report prepared by:
Date report completed²:
Approval by Unit Head:
Any general comments:
2 Revision dates should be added over time as the plan is updated.
-5-
Information Required For Each Critical Asset
i. Name of Asset³ (application):
ii. Who owns it (i.e. is responsible for it)⁴:
iii. Who supports it⁵:
iv. Description of software⁶:
v. Description of hardware⁷:
vi. Location (building and room number):
vii. Environmental conditions for departmental hardware⁸:
viii. Plan to recover from loss of physical assets, with the assets listed in the prioritized order in which they would be restored⁹:
ix. Prioritized list of threats¹⁰, plus ways these threats may be prevented or mitigated:
x. List of critical business processes which use this asset, prioritized by importance (and hence by the order in which they would be returned to operation). For each critical business process:
   • Documentation on how the unit would conduct business if the asset were not available¹¹.
   • Estimated length of time for which that process could be continued without the asset.
   • How transaction data would be stored during the outage, and then entered into the asset when it was returned to service.
xi. Plan to test these procedures for operation and recovery.
xii. Results of running the test plan.
3 This may be a central OIT application, one located within the unit, or one which has been outsourced.
4 This will usually be a functional manager, e.g. DUSIS Finance is “owned” by the VP of Business & Finance.
5 This may be OIT, a unit technical person, or an outsourcing company.
6 For local applications: name and version number, associated software (e.g. web server), operating system and version number, vendor contact information in case replacement is needed. Not needed for software supported by OIT in Dial.
7 For local systems: detailed hardware configuration, vendor contact information in case replacement is needed. Not needed for hardware in Dial.
8 Air-conditioning, uninterruptible power supply, fire suppression, physical security, backup devices.
9 Not needed for hardware in Dial.
10 These may include virus and hacker attacks, lost or insecure passwords, physical access to systems (including screens which may display confidential information), and physical threats such as water damage, fire, vandalism and tornados.
11 E.g. if the DUSIS system were successfully attacked by hackers, or the campus network went down so that DUSIS were unavailable, or if there were a fire in the room in which a departmental server were located.
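A unit that keeps its answers to items i to xii in electronic form could check each asset record for gaps before submitting it. The following is a minimal sketch under stated assumptions: the record is assumed to be a simple mapping from item number to the unit’s answer, the names `ITEMS`, `NOT_NEEDED_IN_DIAL` and `missing_items` are hypothetical, and the exemption for assets hosted in Dial follows footnotes 6, 7 and 9.

```python
# Item numbers and short labels taken from the Appendix B template.
ITEMS = {
    "i": "Name of asset (application)",
    "ii": "Owner",
    "iii": "Support",
    "iv": "Description of software",
    "v": "Description of hardware",
    "vi": "Location",
    "vii": "Environmental conditions",
    "viii": "Plan to recover from loss of physical assets",
    "ix": "Prioritized threats and mitigation",
    "x": "Critical business processes",
    "xi": "Test plan",
    "xii": "Test results",
}

# Footnotes 6, 7 and 9 state these are not needed for software or
# hardware supported by OIT in Dial.
NOT_NEEDED_IN_DIAL = {"iv", "v", "viii"}


def missing_items(answers, hosted_in_dial):
    """Return the item numbers still unanswered, in template order."""
    missing = []
    for item in ITEMS:
        if hosted_in_dial and item in NOT_NEEDED_IN_DIAL:
            continue  # exempt under the footnotes
        if not answers.get(item, "").strip():
            missing.append(item)
    return missing


if __name__ == "__main__":
    # Illustrative record only; the answers are placeholders.
    record = {"i": "DUSIS Finance", "ii": "VP of Business & Finance", "iii": "OIT"}
    for item in missing_items(record, hosted_in_dial=True):
        print(f"{item}. {ITEMS[item]}: still to be completed")
```

Such a check only shows that something was written against each item; whether the plan is adequate still depends on the unit head’s review and on the tests called for in items xi and xii.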
-6-
Appendix C: Departmental Computer Backups
The Trustees’ Audit Committee has asked for a report on current data backup and recovery procedures. Please document the following:

• Data being backed up
• Other software being backed up¹²
• Computers being backed up¹³
• Backup schedule¹⁴
• Name of backup software program¹⁵
• Media type¹⁶
• Storage location¹⁷
• Restore procedures¹⁸
• Test plan for backup and restore procedures¹⁹

As the Risk Assessment in Appendix B is conducted, it is likely that other computers and types of data will be identified as critical, and at that time these procedures will need to be expanded.
12 This includes any applications or system software for which storage media are not available. This also includes software which would be installed on a replacement computer, such as the Windows Registry.
13 Information should include identifying information for each computer, and its location.
14 This specifies when full and incremental backups are performed of the data and other software identified above.
15 E.g., Veritas, Roxio, Retrospect.
16 E.g., DLT Tape, DVD, CD, Iomega ZIP.
17 Where storage media are stored.
18 This is a description of how specific data or system files are restored when needed, and how a whole computer system is to be reloaded when needed.
19 Describes how and how often the backup and restore procedures are tested.
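When documenting the backup schedule and restore procedures, it may help to spell out which backup sets are needed to bring a computer back to a given date. The sketch below is illustrative only. It assumes that each incremental backup holds the changes since the previous backup of either kind, which is a common convention but should be confirmed against the backup software actually in use. The names `Backup` and `restore_chain` are hypothetical and do not come from any backup product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    taken_on: date
    kind: str  # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups to load, in order, to restore the state as of target.

    Assumes incrementals record changes since the previous backup, so the
    chain is the latest full backup on or before the target date followed
    by every incremental taken after it, up to the target date.
    """
    usable = sorted(
        (b for b in backups if b.taken_on <= target),
        key=lambda b: b.taken_on,
    )
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists on or before the target date")
    return usable[full_positions[-1]:]


if __name__ == "__main__":
    # Illustrative schedule: weekly full backup, incrementals in between.
    schedule = [
        Backup(date(2004, 12, 5), "full"),
        Backup(date(2004, 12, 6), "incremental"),
        Backup(date(2004, 12, 7), "incremental"),
        Backup(date(2004, 12, 12), "full"),
    ]
    for b in restore_chain(schedule, date(2004, 12, 8)):
        print(b.taken_on.isoformat(), b.kind)
```

A restore test under item 19 would then load exactly this chain onto a spare machine and confirm that the data is usable, which also shows whether every tape or disc in the chain can be found at the documented storage location.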
-7-