What is High Availability?
When do you need it?
How do you create it?
How do you measure it?
Lee Hampton-Whitehead
What is High Availability?
“High availability is a system design
protocol and associated
implementation that ensures a certain
absolute degree of operational
continuity during a given
measurement period.”
Source: http://en.wikipedia.org/wiki/High_availability
What is High Availability?
• Minimum service / application outages
• Uptime is not the same as availability
• Planned outage considerations
• Quality of Service
When do you need it?
Any system whose failure may impact a business
function is a candidate for an HA solution
• Internal own-use systems (email or resource
management)
• Customer advisor systems (helpdesk or CRM)
• External customer-facing systems (online shopping or
banking)
When do you need it?
The cost of failure
• Financial
– Loss of orders
– Share Price
– Contractual penalty clauses
• Brand Value
• Customer satisfaction and loyalty
Other considerations
• Regulatory requirements
• Safety Implications
When do you need it?
Business impact analysis is required to assess
• The cost of service outage
• Total Cost of Ownership (TCO)
• Return on Investment (ROI)
• Budget spend
When do you need it?
Risk analysis factors
• Time to recovery
• Data impact of outage or recovery
• Associated hardware, software and resource costs
• Probability of event
• Implications on the business
• Complexity risks
• Third-party solutions
• Pros & cons
Design Sizing Criteria
• Number of transactions
• Size of transactions
• Secondary load considerations
Service Level Agreement
• Agreed availability criteria
• Measurement techniques
• Penalty clauses
• Capacity, performance and usage
• Help desk support
• Contingency
• Costing
Costs
The cost implications of most availability solutions
include, but are not limited to, the following:
• Hardware
• Software
• Network infrastructure
• Training
• Serviceability
• Operational costs
How do you create it?
Redundant Infrastructure
Effective removal of every single point of failure (SPOF)
• Power
• Network (including LAN, WAN, DNS, etc.)
• Server hardware
• Storage and associated interconnect
• Air conditioning
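As a rough illustration of why removing a single point of failure matters, the sketch below combines component availabilities under the usual simplifying assumption that failures are independent. The function names and the component figures are hypothetical and chosen only for illustration.

```python
from math import prod


def series_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(components):
    """Availability of redundant components where any one being up is enough."""
    return 1 - prod(1 - a for a in components)


# Hypothetical figures: 99.9% power, 99.9% network, one 99% server.
single = series_availability([0.999, 0.999, 0.99])

# Same chain, but the server is duplicated (two independent 99% servers).
servers = parallel_availability([0.99, 0.99])
redundant = series_availability([0.999, 0.999, servers])

print(f"single server:    {single:.4%}")
print(f"redundant server: {redundant:.4%}")
```

In this model a chain is never more available than its weakest component, which is why each item in the list above needs its own redundancy.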
How do you create it?
[Diagram: Users, Corporate Network, Load Balancers, Web Servers, SQL Servers]
How do you create it?
Load Balancing
• Typically used for web-based applications
• User requests are distributed across a farm of
servers
• The Load Balancing technology will
redistribute the application traffic in the event
of a server failure
• All servers / hardware actively participate in
delivering the application
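The redistribution behaviour described above can be sketched as a round-robin balancer that skips servers marked as failed. This is a minimal illustration, not how any particular load balancing product works; the class name, method names and server names are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At most len(servers) steps are needed to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
before = [balancer.next_server() for _ in range(3)]
balancer.mark_down("web2")  # simulate a server failure
after = [balancer.next_server() for _ in range(4)]
print(before)
print(after)
```

After the simulated failure, requests continue to flow to the remaining servers without any change on the user's side.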
How do you create it?
Failover Cluster
• Typically used for the database components of
an application
• Can be used for file serving, print serving and
storage solutions where a load-balanced solution may
not be suitable
• A typical failover cluster deployment (Active-Passive)
may result in under-utilised hardware
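A minimal sketch of the Active-Passive idea: the passive node takes over only after the active node misses several consecutive heartbeats. The class, the node names and the threshold of three missed heartbeats are assumptions for illustration, not taken from any specific clustering product.

```python
class ActivePassiveCluster:
    """Two-node cluster in which only the active node serves requests."""

    def __init__(self, active, passive, max_missed=3):
        self.active = active
        self.passive = passive
        self.max_missed = max_missed  # hypothetical failover threshold
        self.missed = 0

    def heartbeat(self, ok):
        """Record one heartbeat result and return the node now serving."""
        if ok:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.failover()
        return self.active

    def failover(self):
        # Swap roles: the former passive node becomes active.
        self.active, self.passive = self.passive, self.active
        self.missed = 0


cluster = ActivePassiveCluster("sql1", "sql2")
serving = [cluster.heartbeat(ok) for ok in (True, False, False, False)]
print(serving)
```

Until a failover happens, the passive node does no application work, which is the under-utilisation noted above.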
How do you create it?
Data Replication / Mirroring
Data is central to any business application
• Log Shipping
• Transactional replication
• Database mirroring
How do you create it?
Corporate Infrastructure
Basic Requirements
• IP Routed Network
• DHCP
• DNS
• AD
How do you create it?
Proactive System Monitoring
• Identify and correct minor / non-service-affecting
problems before they become major
outages
• Baseline server utilisation values
• Application / Service monitoring should be
representative of typical user activities
• HA Testing
How do you create it?
Application Architecture
• Poorly written code can destabilise any HA
implementation
• The application should be able to cope with the
loss of transitory data
• Data consistency: use of transactions
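To illustrate the point about transactions, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table, the account names and the transfer function are hypothetical. The idea is that a multi-step update either completes fully or leaves the data unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; return False if it had to be rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the second call the credit to bob is applied first and then undone when the debit violates the balance check, so no half-finished transfer is ever visible.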
How do you create it?
IT Processes
• Restricted / managed system access
• Software development lifecycle
• Infrastructure life cycle
• Effective service wrap
How do you create it?
Unapproved changes
Availability Comparison
• 95%
36.5 hours/month or 18.25 days/year
• 99.9% (“Three nines”)
43.8 minutes/month or 8.76 hours/year
• 99.99% (“Four nines”)
4.38 minutes/month or 52.6 minutes/year
• 99.999% (“Five nines”)
26.3 seconds/month or 5.26 minutes/year
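The downtime figures above follow from simple arithmetic. The sketch below assumes a 365-day year and treats a month as one twelfth of a year; the function name is hypothetical.

```python
def allowed_downtime_seconds(availability_percent, period_days):
    """Downtime permitted over a period at the given availability level."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * 24 * 3600


for pct in (95, 99.9, 99.99, 99.999):
    per_year = allowed_downtime_seconds(pct, 365)
    per_month = per_year / 12
    print(
        f"{pct}%: {per_month / 60:.2f} minutes/month, "
        f"{per_year / 3600:.2f} hours/year"
    )
```

Each additional nine reduces the permitted downtime by a factor of ten, which is usually reflected in the cost of the solution.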
How do you measure it?
SLA
The service level agreement will define a
good, poor or failed service in terms of
• Service times
• Response times
• Fault resolution times
The SLA will also define how these values are
to be measured
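As a sketch of how an SLA definition might translate into a measurement rule, the function below classifies a single response time as good, poor or failed. The thresholds of 2 and 10 seconds are hypothetical; a real SLA would state its own values.

```python
def classify_response(response_seconds, good_limit=2.0, poor_limit=10.0):
    """Classify one request; None means the request never completed."""
    if response_seconds is None:
        return "failed"
    if response_seconds <= good_limit:
        return "good"
    if response_seconds <= poor_limit:
        return "poor"
    return "failed"


samples = [0.4, 1.9, 6.5, None, 12.0]
print([classify_response(s) for s in samples])
```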
How do you measure it?
Measurement considerations
• User's perspective
• Monitor every component
• Synthetic transactions
• Simplicity
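A synthetic transaction can be as simple as a scripted user action that is timed and recorded as a pass or a fail. The sketch below is one possible shape for this. The function names and the one-second limit are assumptions, and the transaction is a placeholder for a real user journey such as logging in or placing an order.

```python
import time


def run_probe(transaction, max_seconds):
    """Run one synthetic transaction; pass only if it finishes in time without error."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception:
        return False
    return time.perf_counter() - start <= max_seconds


def measured_availability(probe_results):
    """Percentage of synthetic transactions that passed."""
    if not probe_results:
        raise ValueError("no probe results to measure")
    return 100 * sum(1 for ok in probe_results if ok) / len(probe_results)


def healthy_transaction():
    pass  # placeholder for a real user action


def broken_transaction():
    raise ConnectionError("service unreachable")


results = [
    run_probe(healthy_transaction, 1.0),
    run_probe(healthy_transaction, 1.0),
    run_probe(healthy_transaction, 1.0),
    run_probe(broken_transaction, 1.0),
]
print(f"{measured_availability(results):.1f}% of probes passed")
```

Measuring this way reports what a user would have experienced, rather than whether individual servers were up.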
Availability Report
Summary
• Identify business requirements
• Risk assessment
• Decide on ‘appropriate’ solution
• Identify and address any SPOF
• Application design is important to ensure high
availability
• Implement and follow procedures
• Consider the user story
• Rigorous testing
Q&A