Assessing the Robustness of a Vendor by vverge


Assessing the Robustness of a Vendor’s Data Center, Part One: Qualitative Measures
Last updated Aug 11, 2006. Many companies today rely on outside suppliers to provide IT services of one form or another in support of their core mission. This is especially true in the residential mortgage industry, where mortgage companies rely on specialized firms for credit checks, appraisals, inspections, flood information, and past lending history. Each of these support firms depends heavily on its data center to provide current, reliable information to its mortgage clients. Mortgage clients, in turn, are very interested in the reliability and disaster recoverability of these suppliers' data centers. One of my mortgage clients recently asked me to develop a set of survey questions with which we could assess the reliability and recoverability of these supplier data centers. I present these surveys in this two-part article. Part One focuses on the following eight qualitative measures:

       

Physical Characteristics (Table 1)
Power Configurations (Table 2)
Fire Detection and Suppression (Table 3)
Network Operations Center (Table 4)
Network Configuration (Table 5)
Data Configuration (Table 6)
Business Continuity (Table 7)
Customer Support (Table 8)

Each measure consists of a table containing between 5 and 17 questions intended to describe various aspects of a vendor's data center. The vendors received the entire set of tables and returned their responses for review. The eight tables of qualitative questions are shown below. In Part Two I will discuss the quantitative measures.

Table 1 Physical Characteristics

# | Question | Response
1 | How is the building accessed (key card, mantraps, biometrics, etc.)? |
2 | How is the data center accessed (key card, biometrics, etc.)? |
3 | Among employees and customers, who has access to the building and to the data center? |
4 | Do executives, developers, salespeople and operators reside in the same location as the data center? |
5 | Is the data center identified with signage within or outside of the building? |
6 | What is the amount of total data center floor space and how much of it is currently used? |
7 | What is the amount of total raised floor space in the data center and how much of it is currently used? |
8 | What is the depth of raised floor space? |
9 | To what extent does your data center use floor tiles or ladder racks? |
10 | If you use floor tiles, do you use tilt detectors? |
11 | What is the turnover ratio for the data center? |
12 | Is your data center air conditioning system segregated from the rest of the building air conditioning? |

Table 2 Power Configurations

# | Question | Response
1 | How many separate power grid feeds come into the data center? |
2 | What is the maximum capacity (in kVA) of your uninterruptible power supply (UPS) system? |
3 | What percentage of your UPS capacity would be used if activated? |
4 | Which equipment, if any, is not on the UPS (air conditioning, monitors, consoles, printers, etc.)? |
5 | How frequently is maintenance performed on the UPS system? |
6 | How frequently do you test failing over to the UPS system? |
7 | How many banks of batteries do you have to support your UPS system? |
8 | For how many minutes will batteries support your full load? |
9 | How frequently is maintenance performed on your batteries? |
10 | What is the maximum capacity (in kVA) of your backup generator? |
11 | What percentage of your generator capacity would be used if activated? |
12 | How many hours could your generator run on one tank of fuel? |
13 | How often do you conduct tests of running your generator? |
14 | How often do you conduct tests of failing over to your generator? |
15 | Does your commercial street power route through an automated transfer switch? |
16 | Is there a bypass circuit in case the UPS fails? |
17 | How many power outages have you experienced during the past year? |
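Questions 2, 3, 10 and 11 in Table 2 ask for capacity in kVA and for the percentage of that capacity that would be used. When validating a response, a reviewer may want to recompute the percentage from the stated load and capacity. The following is a minimal sketch of that arithmetic; the function name and the sample figures are illustrative assumptions, not values from any supplier's response.

```python
def utilization_percent(load_kva: float, capacity_kva: float) -> float:
    """Return the share of rated capacity consumed by the load, as a percentage.

    Both arguments are assumed to be in kVA; the figures used below are
    hypothetical and only illustrate the calculation.
    """
    if capacity_kva <= 0:
        raise ValueError("capacity must be greater than zero")
    if load_kva < 0:
        raise ValueError("load cannot be negative")
    return load_kva / capacity_kva * 100


# Hypothetical example: a 300 kVA load on a UPS rated at 400 kVA.
print(utilization_percent(300, 400))  # 75.0
```

A figure close to 100 percent suggests little headroom for growth, which is worth raising during the site visit.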

Table 3 Fire Detection and Suppression

# | Question | Response
1 | What type of smoke and fire detection system do you use? |
2 | How often is your smoke and fire detection system inspected? |
3 | How often is your smoke and fire detection system tested? |
4 | What type of fire suppression system do you use (halon, CO2, water, dry chemical, etc.)? |
5 | If using a water sprinkler system, is it active (wet or charged) or is it passive (dry or uncharged)? |
6 | Are there other water pipes directly above your data center? |
7 | Is your fire suppression system integrated with any other environmental systems? |
8 | How close is your nearest fire department? |

Table 4 Network Operations Center

# | Question | Response
1 | What type of network operations center (NOC) do you have? |
2 | What are the typical weekly hours of coverage for your NOC? |
3 | How does your NOC monitor the status of systems, networks and their components? |
4 | How does your NOC monitor incoming requests? |
5 | How does your NOC ensure that requests are being executed in a timely manner? |
6 | How are any exceptions to normal operations in the NOC communicated? |
7 | How are any exceptions to normal operations in the NOC escalated? |
8 | Who handles tickets assigned to the NOC? |
9 | Who has overall responsibility for the operation of your NOC? |
10 | What are the number and experience levels of staff working in your NOC? |
11 | Which environmental readings does the NOC monitor (moisture, temperature, humidity, etc.)? |

Table 5 Network Configuration

# | Question | Response
1 | How many separate voice circuit feeds come into the data center? |
2 | How many separate data network feeds come into the data center? |
3 | To what extent does your network backbone have redundant components (routers, hubs, etc.)? |
4 | What type of load balancing do you use (round robin, load, etc.)? |
5 | To what extent do you use IDS or network security tools? |
6 | Who among engineers, managers and developers can administer your network? |
7 | How are data and development environments segregated on the network? |
8 | Which of your server types (apps, database, web, file, email, etc.) are not clustered? |
9 | Is your clustering of servers in active/active mode or active/passive mode? |
10 | How often do you test your cluster environment? |

Table 6 Data Configuration

# | Question | Response
1 | What are your data backup policies concerning daily and weekly backups, logs and databases? |
2 | How many copies of backups are made and where are they stored? |
3 | How long are backup copies retained and how close is the offsite storage facility? |
4 | Who is authorized to restore data and what is the expected restore time? |
5 | Whom do you use for your offsite storage services? |
6 | How often do you visit and audit your offsite storage provider? |
7 | Who can order a retrieval of an archived tape from your offsite storage provider? |
8 | Who has access to production data? |
9 | To what extent is production data used in the development, testing and staging environments? |

Table 7 Business Continuity

# | Question | Response
To what extent does your business continuity plan (BCP) include the following:
1 | Critical business processes? |
2 | Critical IT applications? |
3 | Critical input dependencies? |
4 | Critical output dependencies? |
5 | Contact information of key recovery team members? |
6 | Contact information of key suppliers? |
7 | Contact information of key customers? |
8 | Contact information of other key individuals? |
9 | Listing of critical telephone circuit numbers? |
10 | Vital records? |
11 | How are vital records protected? |
12 | How many onsite spares of hubs, servers, routers, switches, and load balancers do you maintain? |
13 | How does your BCP specify how support staff is to work with customers during an outage? |
14 | How frequently is your BCP updated? |
15 | How frequently is your BCP tested? |
16 | Does your support staff have access to a hardcopy of the BCP at all times? |
17 | How close are emergency services to your site? |

Table 8 Customer Support

# | Question | Response
1 | What are the typical weekly hours of coverage for your customer support department? |
2 | What are the number and experience levels of staff working in customer support? |
3 | What are your current call volumes on a monthly basis? |
4 | Describe your escalation process. |
5 | What types of background checks are performed on your employees? |

In Part One of this two-part series on assessing the robustness of a vendor's data center, I described eight specific qualitative measures for evaluating such facilities. In this Part Two I discuss the following six quantitative measures:

Web Environment (Table 1)
Development (Table 2)
Database Administration (Table 3)
Security (Table 4)
Operations (Table 5)
Product Quality (Table 6)

These measures can be weighted by the supplier, the client, or both; rated by the suppliers; and then verified by an outside party. In my client's case, the client weighted the measures, and a colleague and I verified the responses by visiting the suppliers' sites. The quantitative measures described here center on the technology and standards that the vendors used in their data centers.

Table 1 Web Environment

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | On what platforms are your applications certified to run? | | | |
2 | What technologies do your web applications use? | | | |
3 | Is there any business logic stored in the web application? | | | |
4 | What web interfaces do you expose (synchronous, asynchronous)? | | | |

Table 2 Development

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | To what degree does Argent maintain ownership of their proprietary changes? | | | |
2 | What user customization features do you offer (workflow, rules, field description, report layout)? | | | |
3 | What is your standard rate for customization? | | | |
4 | What is your customer support model for application support (including customizations)? | | | |
5 | On what language and platform is your system designed to run (development language, architecture, database)? | | | |
6 | What is the architecture of your system? | | | |
7 | Do you have user/admin/developer documentation for your system and its interfaces? | | | |
8 | What integration methods do you support (web service, FTP, API)? | | | |
9 | What data abstraction methods do you support (XML, X.12, fixed length, CSV)? | | | |
10 | What is your plan and process for legacy conversion? | | | |
11 | What makes your user interface easy to use? | | | |

Table 3 Database Administration

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | Do you provide an open access architecture to our data and metadata? | | | |
2 | How do you supply ad-hoc and customized reporting? | | | |
3 | How do you support external data warehousing requirements (real time and batch)? | | | |
4 | How do you handle integration with other systems? | | | |
5 | How is your database optimized (transactions or reporting)? | | | |
6 | How do you replicate your data for recovery? | | | |

Table 4 Security

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | Where do you store user credentials? | | | |
2 | Do you support role-based security? | | | |
3 | Do you integrate with external security repositories (LDAP, AD)? | | | |
4 | Does your application support single sign-on (SSO)? | | | |
5 | What is your transmission security method? | | | |
6 | Is our data stored with other customers' data or in our own repository? | | | |
7 | What type of security logging and reporting do you offer? | | | |
8 | What methods do you have for complying with privacy legislation (protecting names, SSNs, property addresses, PINs, etc.), and how are they audited? | | | |
9 | Are you SAS 70 certified? | | | |

Table 5 Operations

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | To what extent is your BCP comprehensive and up-to-date? | | | |
2 | Do you have a customer-friendly version of your BCP? | | | |
3 | Do you have a second site (web, app, DB servers)? | | | |
4 | What is your availability architecture (load balancers, server config, DB config)? | | | |
5 | How do you test your system's performance (load testing, scaling abilities)? | | | |
6 | What is your current system threshold (how many transactions can you handle)? | | | |
7 | What monitoring capabilities do you have to ensure uptime of your system? | | | |
8 | Is there any client software to roll out (ActiveX, applets, rich client)? | | | |
9 | What is your monthly percent availability (four 9s, five 9s)? | | | |
10 | What are your bandwidth requirements? Based on what volume? | | | |
11 | How do you handle system latency? | | | |
12 | Describe the content and currency of your service level agreements (SLAs). | | | |
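Question 9 in Table 5 asks for monthly availability in terms of "four 9s" or "five 9s". It can help to translate those figures into the downtime they permit. The sketch below does that conversion; the function name is my own, and the 30-day month is an assumption made only to keep the arithmetic simple.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime permitted at a given availability.

    Assumes a month of `days` days (30 by default), which is a
    simplification chosen for illustration.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_percent / 100)


# "Four 9s" is 99.99 percent; "five 9s" is 99.999 percent.
for label, pct in [("four 9s", 99.99), ("five 9s", 99.999)]:
    print(f"{label}: {allowed_downtime_minutes(pct):.2f} minutes per month")
```

Under this assumption, four 9s allows roughly 4.3 minutes of downtime per month and five 9s allows under half a minute, which gives a concrete basis for comparing a supplier's claim against its outage history.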

Table 6 Product Quality

# | Question | Response | Weight (1 to 3) | Rating (1 to 5) | Score
1 | Do you have a customer-friendly version of your SDLC process? | | | |
2 | Do you use version control tools for your system? | | | |
3 | How many stages do you have to move code from development to production? | | | |
4 | Who has rights to move code from testing and staging to staging and production? | | | |
5 | Do you use a defect tracking system to manage bugs and enhancements? | | | |
6 | What type of formal change control process do you use? | | | |
7 | How often do you release changes to production? | | | |
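The Weight, Rating and Score columns in the six tables above lend themselves to a simple tally. The forms do not state how Score is derived, so the sketch below assumes the common convention that Score equals Weight multiplied by Rating, and it reports each measure's total against the maximum possible for its weights. The class name, function name and sample figures are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SurveyItem:
    question: str
    weight: int  # 1 to 3, as in the Weight column
    rating: int  # 1 to 5, as in the Rating column

    def score(self) -> int:
        # Assumption: Score = Weight x Rating (the forms do not define it).
        if not 1 <= self.weight <= 3:
            raise ValueError("weight must be between 1 and 3")
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        return self.weight * self.rating


def summarize(items: list[SurveyItem]) -> tuple[int, int, float]:
    """Return (total score, maximum possible score, fraction achieved)."""
    total = sum(item.score() for item in items)
    maximum = sum(item.weight * 5 for item in items)  # 5 is the top rating
    return total, maximum, total / maximum


# Hypothetical weights and ratings for two Product Quality questions.
items = [
    SurveyItem("Do you use version control tools for your system?", 3, 4),
    SurveyItem("How often do you release changes to production?", 1, 5),
]
total, maximum, fraction = summarize(items)
print(f"{total} of {maximum} ({fraction:.0%})")
```

Reporting the total as a fraction of the maximum makes measures with different numbers of questions, or different weights, comparable across suppliers.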

This concludes the two-part series on assessing the robustness of a vendor's data center. In Part One I described eight specific qualitative measures that focused on the physical facility and on the plans, processes and procedures used to ensure reliable and recoverable operations. In Part Two I discussed six separate quantitative measures that could be weighted and rated by the suppliers, and then verified by an outside party. These quantitative measures centered on the technology and standards in use by the data center.

As I mentioned at the outset, a recent mortgage client of mine asked me to develop this series of surveys to help evaluate the reliability and recoverability of the data centers of several of the client's key suppliers. These suppliers provided the client with required information such as credit checks, appraisals, inspections, flood information, and past lending history. A colleague and I used these assessment forms with four different suppliers. After receiving the completed forms back from each supplier, we visited their data centers to validate their responses. For the most part their responses were valid, although a few needed clarification.

It is my intent for you to use these forms as they currently exist, or to modify them to suit your specific needs. At the very least they can serve to provoke some meaningful discussions with key suppliers concerning the reliability and recoverability of their data centers. Your eventual processed data deserves no less.


								