A DISASTER RECOVERY DEAL:
ASKING THE RIGHT QUESTIONS
CASE SYNOPSIS
Disasters -- both man-made and "acts of God" such as hurricanes, earthquakes, and
floods -- have been on the rise during the past decade. At the same time, dependence on
computerized information systems and networks has increased. The time frame within
which companies must recover from a disaster and be able to conduct business using these
systems has simultaneously decreased. Disaster recovery for outsourcers is even more
critical because many businesses depend on their ability to recover. In fact, outsourcers
such as Financial Information Services1 (FIS) may impact the national and even
international economy should they fail to recover within the necessary time frame. Over
two hundred banks all over the United States depend on FIS for credit card processing,
transaction processing for checking and other accounts, and management information
systems. Yet FIS has planned to recover only three banks. They are spending less than a
quarter of the money their major competitor is spending on disaster recovery. And even
the large banks seem unaware that a recovery plan is not part of their contract with FIS.
Constance Goodman, Assistant Vice President and Director of Corporate
Information Security and Disaster Recovery Planning, faces an ethical and a legal
dilemma. Should banks go out of business because of FIS' inability to recover? Officers of
the company are liable and can be sued personally if they can be shown to be negligent in
their planning for recovery. She has now faced a year of "stonewalling" with executives at
FIS, and knows that the current CFO is either unwilling to spend the money required for
appropriate disaster recovery or is unconvinced that the expenditure is a necessary cost of
doing business. She faces a complex and difficult decision: to resign this executive position
without another job in sight or to continue on salary while knowing that she will not be
supported in effectively providing for recovering customer systems should a disaster strike.
She could also be sued personally by these customers. She has not yet lied to customers,
but they have not asked the right questions. They have asked questions like, "Do you have
a contract for disaster recovery? With whom?" not questions like, "Does your contract for
disaster recovery match your current configuration? How many hours did it take during
the last test to have my systems running again? When was the last test?"
1
All company and individual names are aliases but the case is factual. The case is based on interviews, presentations, reports,
and letters from the disaster planning group of an organization like FIS.
A DISASTER RECOVERY DEAL:
ASKING THE RIGHT QUESTIONS
Dig the well before you are thirsty.
Chinese proverb
Constance Goodman, Assistant Vice President and Director of Corporate
Information Security and Disaster Recovery Planning for Financial Information Services
(FIS), believed the probability of disaster to be significant. Not only were natural and
man-made disasters on the rise generally and predicted to increase over the next decade,
but FIS was disaster prone in many ways (Table 1).
Table 1
"How to" Disaster Recovery Planning
Business Impact Analysis
Risk Assessment
Identification of Mission Critical Functions
Company Insurance Policies
Experienced, Full-Time DRP leader
Information Backups
Off-site storage
Documentation of Vendor Supplied Products
Coverage for Possible Situations:
employee sabotage, power interruptions, severe, damaging weather (tornadoes,
hurricanes, ice storms, wind, flooding, etc.), fire, other "acts of God" (e.g.,
earthquakes), bombs, crashing aircraft, gas leaks, hazardous transportation spills,
proximity to nuclear power plants, structure problems with older buildings
Executive Protection Plan
DRP Teams
Employee Replacement Program should deaths or disability occur
Media Coverage
Continually Review and Update Plan
Test for Disaster Recovery
The company's data center was near a major expressway. One wreck of a truck carrying
poisonous gas could cause an evacuation. If the evacuation lasted more than three days,
FIS and most of its customers could not sustain their business. And FIS could not recover
the systems at some other location and continue operations. The data center was also close
to a regional airport and on the path of aircraft coming and going. One crash could
disrupt operations. Physical security was lacking as well. A large parking area
underneath the data center was unsecured. One truck with explosives, one disgruntled
employee, and many people's lives and livelihood would be affected. The region was also
susceptible to tornadoes. Yet the senior management of the organization seemed
unmovable in their belief that a disaster was not going to happen to them. Constance had
heard the CFO tell customers that if their credit card processing were disrupted, they
would pay the credit card company to process for them, but Constance did not believe
these companies could handle the volume of accounts now being processed by FIS. She
also knew that FIS' budget for disaster recovery was less than 25% of its major
competitor's budget.
PROBLEMS IN DISASTER RECOVERY PLANNING AT FIS
FIS was growing so rapidly that within a year of signing a contract for test and
recovery with a major vendor, the contract was outdated, and the CFO refused to spend
the money to upgrade the contract. A new system, using DB2, had been introduced and
loading the DB2 tables had increased recovery time significantly since the contract was
negotiated. After twenty-one years in disaster recovery planning and information security,
Constance was aware of the problems in the field (Table 2). Yet, she found herself in a
particularly untenable position for many reasons. She had taken the position because she
saw opportunity. Federal examiners had cited the company for its laxness in this area
and had mandated that FIS rebid its disaster recovery contract. Within six months of
joining the company, she had closed all open audit items, and for the first time in six years
an audit was completed with no items requiring correction. The auditors met with
FIS' Board of Directors and cited her for outstanding work. The contract had been rebid
and another vendor selected. She had successfully argued for moving the DRP up in the
organization and at the time of vendor selection, had reported directly to the Senior Vice
President of Operations, who reported directly to the company President. Then a
reorganization occurred and she reported to a Vice President who reported directly to the
Chief Financial Officer. A year later, she knew that the VP to whom she now reported
was stonewalling her progress and undermining her work with subordinates. He had even
told one subordinate, "I'm going to get rid of her."
Constance was haunted by the notion that the failure to recover would mean job loss
and affect the lives of thousands of people. FIS acted as an outsourcer for credit card
processing and other information systems in over 200 banks. The present plan for disaster
recovery accommodated only three of the 200 banks. FIS provided transaction processing
for checking and other accounts as well as management information systems. She felt the
company had a moral responsibility to see that all banks could continue to operate if their
outsourcer experienced a disaster. She knew she also faced a legal dilemma should a
disaster occur. As an officer of the company, and the officer designated with the
responsibility for disaster recovery, she was liable for recovery and could be sued
personally by client banks should FIS fail to recover. Clearly, Constance had to implement
a sound disaster recovery plan or leave the company. She felt as if the clock were ticking on
a time bomb.
Table 2
On-Going Problems in Disaster Recovery Planning (DRP)
1) Organizational structure -- position of DRP
2) Lack of organization support
3) Lack of internal experience
4) Considered a part-time assignment
5) Questionable vendor practices and contracts
6) Financial limitations
7) Lack of time and resources
8) "Don't Need This" attitude
9) Considered too hard, impossible
Executive positions like hers were rare. Companies have only one DRP leader, if
that, and DRP is usually not placed as high in the organization as at FIS. Until the
reorganization, it was placed even higher than now, but she had retained her position as
a company officer. When she had been applauded for resolving audit problems and
promoted, she had bought a new home and settled in the area. What had looked like a
promising situation with a promising company seemed to have turned into a nightmare
after she recommended that FIS change disaster recovery vendors. Selection of recovery
vendors was a critical issue in recovery planning (Table 3). With the compressed business
cycle experienced by most organizations, business must resume within three days of a
disaster.
Table 3
Critical Issues in Disaster Recovery Planning (DRP)
Business Resumption Planning
Contingency Planning
Recovery Vendors
Computers, Networks, Processes, Printing, Buildings -- Facilities
Sophisticated Backups
Sophisticated Insurance
Experienced DRP Personnel and Managers; Placement in Organization
Disaster Recovery Software
"One Stop" Shopping for Disaster Recovery Facilities, Backups, Insurance, Software
RECOVERY VENDOR SELECTION
Before Constance joined FIS, Requests for Proposals (RFP) had been sent to
disaster recovery vendors. However, these requests needed clarification, according to
vendors, so one of Constance's first actions was to issue a letter of clarification to the
vendors. This letter established the minimal requirements for FIS' current business. In
addition, site visits to vendors' proposed hot site facilities were conducted and vendor
references were checked including an informal survey of present customers. TechShare
had the current contract with FIS and had established a relationship with the CFO
through its services and through the many social functions sponsored by TechShare.
TechShare's original response to the RFP did not meet the minimum requirements set
forth by FIS for recovery. The CPU and DASD configurations proposed would satisfy
current testing needs only and would not allow for growth or even full recovery. Further, TechShare
stated that at the time of a disaster, they would make a "best effort" to obtain a required
second mainframe at a fair market price. No guarantees for full recovery were offered.
The minimal requirements for tape drives were not met. After the letter of clarification,
TechShare did meet the requirements for CPU, DASD, and second mainframe. However,
Constance's department believed and stated in their report that TechShare "cannot
support FIS' growth or stay current with future technological advances in hardware and
software without shifting tremendous costs to FIS . . . In such a scenario, FIS would not
only be paying for its own advances but also the advances of TechShare as well."
The second vendor, BlueSky, "met or exceeded most of the requirements requested
by FIS in the RFP and letter of clarification," according to the DRP department. Their
support also included a Recovery Network Analysis and Business Impact Analysis within
the first six months of the contract. Further, BlueSky offered different configurations for a
test environment and for full recovery, thus lowering the total cost. The third vendor,
SunnyDaze, did not meet the minimal requirements set forth in the RFP. The "CPU
configuration proposed had two hundred MIPS less than FIS' current production
configuration . . . DASD and tape devices did not meet minimum criteria requested." The
same held true after the letter of clarification. The DRP department concluded that
SunnyDaze "cannot meet FIS' current or future recovery needs."
Vendor references were checked for TechShare and BlueSky. Each vendor
provided references in the original RFP. Five questions were asked of each reference:
1. How long have you been a subscriber?
2. How do you like their service?
3. What, if anything, do you dislike about their service?
4. Have you ever used another recovery vendor? If so, which one?
5. What was your experience with that vendor?
TechShare's five references were generally positive with some exceptions. In answer to
what was not liked about their service, one reference reported that during a test,
TechShare's facility experienced a total power failure and TechShare was unable to
recover for four hours. The client had asked for compensation for the four lost hours, and
TechShare had yet to respond to that request. Further, this client added: "We've also
found they are very sneaky about contract stuff. Be careful when reading their contracts.
They will sneak in things to try and trap you without your knowing what you've signed
for." Another reference provided by TechShare also complained about the testing facility:
"They want us to schedule our test time a year in advance . . . They're not easy to work
with regarding trying to test over a weekend." This client had been with TechShare for
seven months, having just left BlueSky because "they made too many small mistakes. It
became too much of an effort for us." Another client who had been with TechShare for ten
years expressed a "concern about their organization in general. About their ability to stay
in business and satisfy our requirements. And, we're concerned about our hot site location
because they have so many subscriptions sold on that facility. We wonder if we could get in
there if we had a disaster, especially one affecting our geographic region."
Table 4
Vendor Comparison
Vendor      Hardware                          Monthly Cost   Annual Cost
TechShare   (2) IBM982, 5.8 TB DASD           $61K           $732K
            (3) Tandem Himalayas
BlueSky     For Testing:                      $46.3K         $555.6K
              (1) IBM982, 3 TB DASD
              (1) Tandem Himalaya
            For Recovery:
              (2) IBM982, 5 TB DASD
              (2) Tandem Himalayas
            Network Recovery Analysis
            Business Impact Analysis
SunnyDaze   (1) IBM982, (1) IBM962,           $47.5K         $570K
            3.4 TB DASD
            (2) Tandem Himalayas
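The annual figures in Table 4 appear to be twelve times the monthly fee. A minimal Python sketch of that arithmetic follows. It assumes twelve equal monthly payments and ignores declaration fees, test-time charges, and annual price increases, none of which are itemized in the table. The names used in the code are illustrative only and do not come from the case materials.

```python
# Monthly hot-site fees from Table 4, in thousands of dollars.
MONTHLY_COST_K = {"TechShare": 61.0, "BlueSky": 46.3, "SunnyDaze": 47.5}


def annual_cost_k(monthly_k: float) -> float:
    """Annual cost in $K, assuming twelve equal monthly payments."""
    return round(monthly_k * 12, 1)


# Annual cost per vendor, in $K.
annual = {vendor: annual_cost_k(m) for vendor, m in MONTHLY_COST_K.items()}

# How much less each vendor costs per year than the incumbent, TechShare, in $K.
savings_vs_techshare = {
    vendor: round(annual["TechShare"] - cost, 1) for vendor, cost in annual.items()
}

for vendor in MONTHLY_COST_K:
    print(
        f"{vendor}: ${annual[vendor]}K per year, "
        f"${savings_vs_techshare[vendor]}K less than TechShare"
    )
```

Price alone does not settle the comparison, since the three proposals cover different hardware configurations.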
BlueSky's references were positive, although two clients noted initial difficulties with the
hot site facility. The second of these noted, however, that BlueSky had offered
free test time as compensation. This second client, who had been with BlueSky for five
years, had also used TechShare and SunnyDaze: "SunnyDaze wouldn't step up to the plate
hardware wise. We were outdistancing SunnyDaze rapidly and they didn't want to keep
up. . . One particular area with TechShare is test time. Test time is a nightmare. You can't
schedule it with them. You have to go over a year out for test time. Also they kept making
stupid mistakes. We had to send out our technical people to help them solve their technical
problems." Another client who had also used SunnyDaze and TechShare said that both
build into their contracts the right to raise prices every year. "BlueSky lives up to their
fixed price contract."
Constance developed a critical factors matrix to evaluate vendors (Table 5).
Table 5
Critical Matrix Requirements
1 = not applicable; 2 = critical requirements not met; 3 = critical requirements not met with
exceptions; 4 = critical requirements satisfied; 5 = critical requirements exceeded
FACTORS                                                       TechShare  BlueSky  SunnyDaze
Network                                                           3         3         2
Tandem Himalayas (computer configuration)                         4         4         3
Annual test time                                                  3         4         2
Mainframe configuration (IBM 982)                                 3         4         3
DASD                                                              3         4         2
Pricing                                                           3         4         3
Ability to schedule a test within 6 months                        2         5         3
Contract termination penalty                                      3         3         3
External Hot Site Logistics (availability, easy access to         2         4         1
  airports, hotels, transportation, traveling distances, etc.)
Internal Hot Site Facility Site (physical security, floor         3         5         1
  plans, customer services and facilities, conference rooms,
  access to technical manuals and technical support
  personnel, condition of facility)
Client References                                                 3         4         1
Remote Facilities (same issues as hot site external and           3         4         1
  internal)
Value Added Services (services bundled with the contract)         3         5         1
Declaration Fee (signing fee in range of $25,000)                 4         5         2
Upgrade Flexibility (upgrade hardware and contract)               3         4         2
Downgrade Flexibility (downgrade hardware and contract)           3         4         1
TOTAL                                                            48        66        31
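The TOTAL row of Table 5 is a plain, unweighted sum of the sixteen factor scores, so every factor counts equally. The short Python sketch below reproduces the totals from the scores as printed in the table and ranks the vendors by total. The data layout and names are illustrative assumptions and are not part of the DRP department's report.

```python
# Scores from Table 5 in row order: (factor, TechShare, BlueSky, SunnyDaze).
SCORES = [
    ("Network", 3, 3, 2),
    ("Tandem Himalayas", 4, 4, 3),
    ("Annual test time", 3, 4, 2),
    ("Mainframe configuration (IBM 982)", 3, 4, 3),
    ("DASD", 3, 4, 2),
    ("Pricing", 3, 4, 3),
    ("Ability to schedule a test within 6 months", 2, 5, 3),
    ("Contract termination penalty", 3, 3, 3),
    ("External Hot Site Logistics", 2, 4, 1),
    ("Internal Hot Site Facility Site", 3, 5, 1),
    ("Client References", 3, 4, 1),
    ("Remote Facilities", 3, 4, 1),
    ("Value Added Services", 3, 5, 1),
    ("Declaration Fee", 4, 5, 2),
    ("Upgrade Flexibility", 3, 4, 2),
    ("Downgrade Flexibility", 3, 4, 1),
]
VENDORS = ("TechShare", "BlueSky", "SunnyDaze")

# Unweighted total per vendor: each factor contributes its raw 1-5 score.
totals = {
    vendor: sum(row[column + 1] for row in SCORES)
    for column, vendor in enumerate(VENDORS)
}

# Vendors ordered from highest to lowest total.
ranking = sorted(totals, key=totals.get, reverse=True)

for vendor in ranking:
    print(f"{vendor}: {totals[vendor]}")
```

Because the sum is unweighted, a one-point difference on "Contract termination penalty" moves the total as much as a one-point difference on "Mainframe configuration".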
Because she then reported directly to the Senior Vice President of Operations, her
recommendations and the analysis upon which they were based were a key factor in
selecting BlueSky as the disaster recovery vendor. Subsequently, the reorganization took
place, and DRP no longer reported to the Senior VP of Operations, who had reported
to the President. Instead, DRP and Constance reported to a Vice President who reported
to the CFO. Soon after, she learned that the CFO had held private meetings with
TechShare.
Within the year, FIS grew more rapidly than expected, but the CFO refused to
upgrade the contract to reflect FIS' added capacity. A new system employing DB2 was
implemented, and preliminary tests indicated that the system's tables could not be loaded
within the three-day time frame for recovering all systems. Constance's memos describing
the dilemma were met with continuing requests for additional information. When
that information was provided, the request was changed or even more
information was requested. But no action was taken to upgrade the BlueSky contract.
Constance began to feel more and more uneasy when dealing with present
customers and potential customers. She feared that they would ask the right questions,
questions like:
How does the mainframe configuration of your disaster recovery vendor's
hot site compare to the one presently used at FIS?
When was your last recovery test?
How long did it take to get our systems fully functional? Can you provide
documentation of our last recovery test?
What is your network recovery plan for us?
But so far, even the larger banks made dangerous assumptions. They asked, "Who is your
disaster recovery vendor?" They assumed that a contract for recovery meant safety.
CASE ANALYSIS QUESTIONS
1. Why is disaster recovery an increasingly important topic?
2. What, if anything, did you learn from reading this case which surprised you?
3. If you were Constance, what would you do now?
4. Would you ask the same questions in your vendor reference survey? Would you ask
questions not included? Please explain briefly the rationale for any revisions or
additions which you would suggest.
5. Study the Critical Matrix Requirements table (Table 5). Are the scores meaningful?
What is the purpose of this exercise? Can you suggest ways to make the scores and
the exercise more meaningful?