


Locking, Blocking & Deadlocking
Database Snapshots, Log Shipping
Mixed Architectures, Feature Interaction
  Locks – granularity/escalation
  Controlling granularity, type and wait time
  Blocking – is it really a problem?
  Minimizing blocking
  Detecting blocking
  When a live-lock becomes a deadlock
  Deadlock resolution
  Unhandled deadlocks
  Avoiding deadlocks
  Deadlock analysis and trace details
Granularity (chosen at compilation)
   Table level
Escalated at execution if resources (to handle all of the
smaller locks) are not available
   Row -> table level (or row -> partition if the table supports partition-level locking)
   Page -> table level (or page -> partition if the table supports partition-level locking)
   Locks do NOT escalate row -> page -> partition/table
       Generally, lock escalation is desirable as it means fewer resources are needed to handle a
       query/transaction, and often the query can complete faster
       However, if you believe lock escalation is causing you grief you can disable it (for the entire
       instance) with a trace flag - see KB Q323630
       SQL Server 2005 has two trace flags: one still allows lock escalation under memory pressure
       (1224), one does not (1211)
       SQL Server 2008 ONLY supports partition-level lock escalation when set at the table level
       through ALTER TABLE
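As a sketch, disabling escalation instance-wide with the trace flags mentioned above might look like this (my understanding from BOL is that 1211 takes precedence if both are set):

```sql
-- Disable lock escalation entirely for the instance (trace flag 1211)
DBCC TRACEON (1211, -1);

-- Or: disable escalation except under lock-memory pressure (trace flag 1224)
DBCC TRACEON (1224, -1);

-- Alternatively, add -T1211 (or -T1224) to the SQL Server startup
-- parameters so the flag survives a service restart
```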
What is ‘lock escalation’?
  When there are too many locks and SQL Server takes a
  lock higher in the hierarchy to save lock memory

    Before escalation:          After escalation:

    Database   S                Database   S
    Table      IX               Table      X
    Page       IX
    Row        X
SQL Server 2005 introduced partitioning, which
some customers use to scale a query workload
  Another common use is to streamline maintenance and
  enable fast range inserts and removals from tables


     [Diagram: Partition 1 on filegroup FG1, Partition 2 on FG2, Partition 3 on FG3]
Lock escalation on partitioned tables reduces
concurrency as the table lock locks ALL partitions

 [Diagram: Query 1's update of Partition 1 escalates its IX lock to a
 table-level lock on the partitioned table, blocking Query 2's update
 of Partition 3 (partitions on filegroups FG1, FG2, FG3)]

Only way to solve this currently is to disable lock escalation
SQL Server 2008 allows lock escalation to the partition level,
allowing concurrent access to other partitions

 [Diagram: with partition-level escalation, Query 1's escalated lock
 covers only Partition 1 of the partitioned table, so Query 2 can
 still update Partition 3 (partitions on filegroups FG1, FG2, FG3)]

Escalation to partition level does not block queries on other partitions
Escalation setting is per-table, set with ALTER TABLE:
   ALTER TABLE mytable SET (LOCK_ESCALATION = AUTO | TABLE | DISABLE)
AUTO: Partition-level escalation if the table is partitioned
TABLE: Always table-level escalation
DISABLE: Don’t escalate unless absolutely necessary

How to tell what setting a table already has?
   SELECT lock_escalation_desc FROM sys.tables WHERE
   name = 'mytablename'
There is no tool (SSMS) support for ALTER TABLE
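Putting the two statements above together (table name is a placeholder):

```sql
-- Allow partition-level escalation on a partitioned table (SQL Server 2008)
ALTER TABLE dbo.mytable SET (LOCK_ESCALATION = AUTO);

-- Verify the current setting
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'mytable';
```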
Locking can be monitored using the DMV sys.dm_tran_locks:
   SELECT * FROM sys.dm_tran_locks
   WHERE [resource_type] <> 'DATABASE'
Table-level escalation will show locks with resource_type = OBJECT

Partition-level escalation will show locks with resource_type = HOBT
Be careful! Lock hints can negatively affect:
   Performance – the query may take longer to run
   Concurrency – the query might run faster but other users must
   wait… is that really faster?
ROWLOCK – use row level locking instead of table or page
   This may not be possible depending on resources and may
   force the query to take longer but give you better concurrency
PAGLOCK – similar to row level but page level instead, this
may be a better choice depending on data distribution
TABLOCK – better for off hour updates when concurrency
might not be a problem
UPDLOCK – requests that an update lock be used
(instead of a shared lock – typically used when you
plan to do modifications within the transaction)
   This can only be used at the row or page level
   Update locks do not exist at any other level
XLOCK – requests an eXclusive lock at the row,
page, or table level
   This can negatively affect concurrency while
   increasing performance for this query
   Used if you want to bypass update locks and
   disallow readers
For more information see “Locking Hints” in the BOL
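A minimal sketch of the UPDLOCK pattern described above – read a row you intend to modify so no other transaction can update it between your read and your write (table and column names are hypothetical):

```sql
BEGIN TRAN;

-- UPDLOCK: take update locks on the rows read, so another transaction
-- cannot acquire an update/exclusive lock on them before our UPDATE;
-- ROWLOCK asks for row-level granularity
SELECT qty
FROM dbo.inventory WITH (UPDLOCK, ROWLOCK)
WHERE product_id = 42;

UPDATE dbo.inventory
SET qty = qty - 1
WHERE product_id = 42;

COMMIT TRAN;
```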
Table-Level Changes
   Set in the FROM clause, per table (no spaces)
     Level 1 – READCOMMITTED (locking)
     Level 1 – READCOMMITTED (versioning)
       Only in 2005 and only if the database option READ_COMMITTED_SNAPSHOT is on
       Can be overridden with READCOMMITTEDLOCK

      JOIN dbo.publishers
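The fragment above appears to come from a pubs-style example; a sketch of mixing per-table hints in one FROM clause (dbo.titles is assumed to be the other table, as in the pubs sample database):

```sql
SELECT t.title, p.pub_name
FROM dbo.titles AS t WITH (READCOMMITTEDLOCK)    -- force locking read committed
JOIN dbo.publishers AS p WITH (READUNCOMMITTED)  -- dirty reads for this table only
    ON t.pub_id = p.pub_id;
```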
Session-Level Changes
   Session level settings impact entire session but can be
   overridden with table-level settings
      Level 0 – READ UNCOMMITTED
      Level 1 – READ COMMITTED
      Level 2 – REPEATABLE READ
      Level 3 – SERIALIZABLE

   Snapshot – only in 2005 and only if the database option is on
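A sketch of the session-level setting, and of a table-level hint overriding it (dbo.tname is the example table used elsewhere in this deck):

```sql
-- Session-level setting; stays in effect until changed
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRAN;
    -- runs at repeatable read
    SELECT COUNT(*) FROM dbo.tname WHERE country = 'USA';

    -- a table-level hint overrides the session setting, for this table only
    SELECT COUNT(*) FROM dbo.tname WITH (READUNCOMMITTED);
COMMIT TRAN;
```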
Default behavior - “Read Committed” (locking)
“Read Committed Isolation Using Row Versioning”
   ALTER DATABASE <database_name>
      SET READ_COMMITTED_SNAPSHOT ON
“Snapshot Isolation”
   ALTER DATABASE <database_name>
      SET ALLOW_SNAPSHOT_ISOLATION ON
Both database options can be turned on as well:

   ALTER DATABASE <database_name> SET READ_COMMITTED_SNAPSHOT ON
   ALTER DATABASE <database_name> SET ALLOW_SNAPSHOT_ISOLATION ON
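A practical sketch of enabling both options (database name is a placeholder; note that READ_COMMITTED_SNAPSHOT needs the database briefly free of other connections, which ROLLBACK IMMEDIATE forces):

```sql
-- Kick out other connections so the option change can complete
ALTER DATABASE MyDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- Can be turned on while users are connected
ALTER DATABASE MyDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Check the current settings
SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = 'MyDatabase';
```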

Default Behavior
  All phenomena – except dirty reads – are possible,
  even within the bounds of a single select query
  In volatile databases, a long running query may
  produce inconsistent results
      Can increase isolation to remove phenomena
      Increasing isolation requires locks to be held
      This can create blocking

  No phenomena are possible in the bounds of a
  single statement
  In volatile databases, a multi-statement transaction
  may yield different results for different access of
  the same data
  Each statement is consistent but only for the
  execution of that statement, not for the life of the
  transaction (if the transaction has multiple statements)
  Each time data is read by a new statement the
  latest committed version is used

In the Context of a Multi-Statement Transaction

  BEGIN TRAN
  Q1 = SELECT count(*) FROM dbo.tname WHERE country = 'USA'
  <other sql>
  Q2 = SELECT count(*) FROM dbo.tname WHERE country = 'USA'
  COMMIT TRAN

 Under RCSI (row versions stored in tempdb):
  • Q1 > Q2, Q1 < Q2 and Q1 = Q2 are all possible
  • Q1 and Q2 are each guaranteed to be accurate as of the
    beginning of their statement, and the “version” of a row
    cannot change for the life of the statement
  • RCSI guarantees the accuracy of the statement, but the data
    can change and read differently by the next statement
  Setting ALLOWS users to ask for snapshot isolation – NOT
  on by default
  ALL phenomena and ALL default locking behavior are
  EXACTLY the same unless you explicitly ask for snapshot isolation
  Once requested, no phenomena are possible in the bounds
  of a transaction running under snapshot isolation
  In volatile databases, a multi-statement transaction will
  always see the transactionally accurate version which
  existed when the transaction started
  Versions must stick around longer, and multi-statement
  transactions may have update conflicts

In the Context of a Multi-Statement Transaction

  SET TRANSACTION ISOLATION LEVEL SNAPSHOT
  BEGIN TRAN
  Q1 = SELECT count(*) FROM dbo.tname WHERE country = 'USA'
  <other sql>
  Q2 = SELECT count(*) FROM dbo.tname WHERE country = 'USA'
  COMMIT TRAN

 Under Snapshot Isolation:
  • Q1 = Q2
  • Rows accessed by Q1 are consistent as of the start of the
    transaction
  • Data is isolated at the start of the transaction, ensuring
    the transaction sees the same data at Q2, even if changed by
    other transactions
  • Uses the row version store in tempdb
  • Rows remain accessible to other transactions
        Locks guarantee consistency
        First incompatible lock request waits
        All other lock requests (even if compatible with locks
        currently held) WAIT
        Why? To prevent lock starvation

        Imagine 3 compatible readers, each holding a shared (S) lock
        on a row. Then an eXclusive (X) lock is requested – it must
        wait. Even subsequent compatible shared (S) lock requests must
        WAIT (queue) behind the incompatible X lock; otherwise readers
        could starve the writer indefinitely.

                                                  What is that starvation called? LIVE lock
At the Expense of Blocking
    No other transactions can invalidate the data set
    through modifications
    Is this always what you want or need?
        Queuing applications typically want locks – if
        someone is modifying that row, we want to move on
        to another row, not see the last transactional state
        Very volatile prices – don’t want to give out the
        last price if it’s currently being updated, so wait…
        (but the update’s fast)
    What about long-running transactions or cases
    where you don’t need absolutely current data?
Write efficient transactions – keep them short and in
one batch
Do not allow interaction in the midst of the batch
Use indexes to help SQL Server find – and lock –
only the necessary data
Consider estimates for long running queries and/or
migrating data to a secondary analysis server
Problems with locking becoming blocking often arise
when transactions are long running (and conflicting)
Dynamic Management Views
System stored procedures
SQL Profiler
Performance Monitor counters
PSS Script - sp_blocker_pss80
Blocked Process Report
Deadlock events and trace flag 1222
SQL Server 2008 adds
  Performance Data Collection
  Extended Events
SSMS Reports – Top Transactions by Blocking
Using DMVs
  sys.dm_tran_locks and sys.dm_os_waiting_tasks
  sys.dm_exec_requests CROSS APPLY sys.dm_exec_sql_text
    Shows the last command submitted
Using system procedures
  sp_who, sp_who2, sp_who3
    Shows whether or not connections are being blocked
    (blk column in sp_who, BlkBy in sp_who2)
    sp_who2 adds
       LastBatch to show you when the last command was submitted from the client
       CPUTime and DiskIO to show you relative activity for the transactions
    sp_who3 is a rewrite of sp_who2 using SQL Server 2005 DMVs
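The DMV approach above can be sketched as a single query showing who is blocked, by whom, and what they are running:

```sql
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS last_command
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;   -- only blocked requests
```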
Programmatically collect process (spid) information by
directly querying master.dbo.sysprocesses or DMVs
SQL Server 2008 adds system data collection
Collect lock information
   SELECT * FROM sys.dm_tran_locks
Use sp_blocker_pss80 to help determine who’s at the
head of the blocking chain
Collect SQL Profiler trace
  Start with Troubleshooting Performance
  Problems in SQL Server whitepaper
Collect Perfmon or Sysmon log
  Consider also capturing counters from inside SQL Server:
  Create a table and populate it with
  INSERT tname
    SELECT GETDATE(), <relevant fields>
    FROM sys.dm_os_performance_counters
    WHERE <sql counter information>
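Filling in the template above – a sketch with a hypothetical capture table and an example counter filter (run the INSERT on a schedule, e.g. from a SQL Agent job):

```sql
-- Hypothetical capture table
CREATE TABLE dbo.counter_history
(
    capture_time  datetime    NOT NULL,
    counter_name  nchar(128)  NOT NULL,
    instance_name nchar(128)  NULL,
    cntr_value    bigint      NOT NULL
);

-- Capture one sample of the lock-wait counters
INSERT dbo.counter_history
SELECT GETDATE(), counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Locks%'
  AND counter_name = 'Lock Waits/sec';
```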
SQL Server Blocked Process Threshold
Set at an instance level
   sp_configure 'blocked process
   threshold', n (secs)
Each spid blocked for n seconds triggers an event
that can be caught by
   Event Notification
   WMI - SQL Agent Alert
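Setting the threshold as described above ('blocked process threshold' is an advanced option, so it has to be made visible first):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise a blocked-process event for any spid blocked 20+ seconds
EXEC sp_configure 'blocked process threshold', 20;
RECONFIGURE;
```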
A general event-handling system
Part of SQL Server 2008
   Used for problem diagnosis and info gathering
   Used to implement auditing
Engine lives in sqlservr.exe
   Consumers can be in-proc or out of proc
   ETW consumer is provided
     Enables correlation between SQL Server and Windows trace
Excellent for troubleshooting resources that have
not been released
Using Extended Events
All resources have similar deadlock handling
Deadlocks can be caused by
  Data locks - locks of rows, pages, partitions, tables, and
  database metadata
  Worker threads - queued tasks waiting for worker threads
  can cause deadlocks
  Memory - two tasks waiting for memory
  Parallel query execution resources
  MARS interleaving (only if MARS is enabled on client)
Perfmon Counter: Deadlocks/Sec
SQL Profiler Events
  Select Locks/Deadlock Graph
  Save all deadlock graphs to a single file, or one file per graph
  Saved in XML format with XDL extension
  Displayed graphically in profiler
     Or can be analyzed using XML APIs
Extended Events
   Writes deadlock graph
Trace flags 1204 and 1222
   Writes deadlock info to SQL Server log
By default, SQL Server chooses the least expensive
victim to roll back
 If desired, you can influence this choice with
 SET DEADLOCK_PRIORITY (choose me, choose me):
    SQL Server 2005 - set to -10 through 10 or
    LOW/NORMAL/HIGH (can also set with a variable)
The deadlock victim receives error 1205:
    Your transaction (<tran id>) was deadlocked with
    another process and has been chosen the
    deadlock victim. Rerun your transaction.
You must handle the deadlock case by either:
   Re-submitting the request (automatically)
   Asking the user to resubmit the request
   Cancelling the request and returning the user to
   the appropriate screen to start again (not very
   user friendly)
   But something should be done!
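A sketch of the automatic-resubmit option using TRY/CATCH (table and column names are placeholders; the back-off delay is an arbitrary choice):

```sql
DECLARE @retries int;
SET @retries = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.tname SET country = 'USA' WHERE id = 1;
        COMMIT TRAN;
        BREAK;  -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRAN;

        IF ERROR_NUMBER() = 1205 AND @retries > 1
        BEGIN
            -- deadlock victim: back off briefly, then retry
            SET @retries = @retries - 1;
            WAITFOR DELAY '00:00:00.5';
        END
        ELSE
        BEGIN
            -- not a deadlock (or out of retries): re-raise to the caller
            DECLARE @msg nvarchar(2048);
            SET @msg = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);
            BREAK;
        END
    END CATCH
END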
Deadlocks caused by distributed transactions
invoked by stored procedures are not detected
   A distributed transaction created with 4-part
   naming CAN be detected. It’s only undetected when
   RPC calls create the deadlock.
Try to prevent them from happening altogether!
The strange deadlock case of duplicate
  Lock Hashes
Minimize blocking
  More effective queries
  More effective indexes
  Add or Remove lock hints
Access resources in the same order
MARS connections for client applications
Add retry logic in application if error 1205 returned
Consider versioning isolation levels
Architectural Overview
Implementation and Management
Monitoring and Diagnosis

Software solution for high availability
Per-database, not per-server
Replays log data against a second server
Released in SQL Server 2005 Service Pack 1
Now very popular
Avoids the single point of failure in clustering

Increased data protection
Increased availability
Improved upgrade availability
Must implement full recovery model
Multiple mirrors not permitted
Not intended for large numbers of databases
  Increased overhead in log stream traffic (Log stream compression
  in SQL Server 2008 has improved this story)
  Unlike clustering, can end up with some databases failed over,
  some not
  Particular concern for multi-database applications
Cannot mirror system databases
High performance (asynch)
High safety (synch)
High safety with automatic failover (synch
with witness)
  Disaster recovery scenarios
  Servers separated by large distances
  Avoid small errors affecting principal
  Do not configure a witness in this mode
  (possible but pointless and introduces risk)
  Consider log shipping instead (eg multiples)
  Avoiding data loss
  Performance impact
Tradeoff safety vs performance

            Synchronous commit sequence (Principal -> Mirror, each with Data and Log):
              1. Commit issued on the principal
              2. Write to log (principal)
              3. Log written (principal)
              4. Transfer log records to the mirror
              5. Write to log (mirror)
              6. Log written (mirror)
              7. Acknowledgement (mirror to principal)
              8. Acknowledgement (principal to client)
Can specify “Failover Partner” in the connection string
Mirror partner details are requested and cached
on connection (2005)
Mirror partner details are requested, cached
and persisted to the local registry (2008)
The connection string property is only used when the
primary cannot be contacted and there are no cached
details of the partner
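For ADO.NET clients the property looks like this (server and database names are placeholders):

```
Data Source=PrincipalServer;Failover Partner=MirrorServer;
Initial Catalog=MyDatabase;Integrated Security=True;
```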
Mirror cannot be directly accessed
Database snapshots can be created on the mirror
Reporting can take place against the snapshots
Significant practical issues in
  refreshing the snapshots
  directing reports to the correct snapshot
Snapshot cannot be restored to the mirror
Snapshots have performance implications
When configuring Database Mirroring and
Failover Clustering together, use Failover
Clustering for the local failover and
Database Mirroring for disaster recovery
Do not use mirroring auto failover in this
case, and trigger failover to the mirror
Configuring both together will likely result in
the mirror taking over before the clustering
failover, leaving clients connected to the
disaster recovery location
Log stream compression
Automated Page Repair
Incoming log stream processing
Transparent client redirection
Mirror partners and/or witness may not be
located in same domain
  Create a certificate on each server
  Back up the certificate on each server
  Import the certificates from the other servers on each server
  Use certificates for authentication in setup
  Ensure that the endpoint configuration uses a
  compatible encryption algorithm
    Eg: Witness on Windows Server 2000 does not
    support AES
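A compressed sketch of the certificate steps on one partner (names, paths and port are placeholders; the mirror-image commands run on the other servers):

```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE CERTIFICATE HostA_cert WITH SUBJECT = 'HostA mirroring endpoint';

-- Export for the other partners to import
BACKUP CERTIFICATE HostA_cert TO FILE = 'C:\certs\HostA_cert.cer';

-- (on each server, create a login/user per partner and import its cert:
--  CREATE CERTIFICATE HostB_cert AUTHORIZATION HostB_user
--      FROM FILE = 'C:\certs\HostB_cert.cer';)

CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE HostA_cert,
        ENCRYPTION = REQUIRED ALGORITHM AES,  -- must be supported by all partners
        ROLE = ALL);
```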
Must be in synchronous mode
Manually failing over in asynchronous mode
requires a temporary change to synchronous
Alternatively, “Force Service” allows data loss
Can be initiated from SSMS or via ALTER DATABASE
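The two ALTER DATABASE forms mentioned above (database name is a placeholder):

```sql
-- Manual failover: run on the principal, requires synchronous mode
ALTER DATABASE MyDatabase SET PARTNER FAILOVER;

-- Forced service with possible data loss: run on the mirror
-- when the principal is unavailable
ALTER DATABASE MyDatabase SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
```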
We must always avoid the split brain scenario
(two principals)
If no quorum (2 of the 3 of principal, partner,
witness), service must stop
Very important that the witness isn’t on the same
server as either the primary or mirror partner
Only occurs when synchronized
Important to avoid lack of synchronization
Automated failover is then not guaranteed even
with all three servers installed
Network issues could stop the necessary communication
sys.database_mirroring catalog view
Most are NULL when database not online
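A sketch of querying the catalog view for the mirrored databases only:

```sql
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_safety_level_desc,
       mirroring_partner_name
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;   -- non-mirrored databases have NULLs
```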
Log Send Queue Rate
Log Bytes Redone from Cache/sec
Log Bytes Sent from Cache/sec
Log Compressed Bytes Rcvd/sec (2008)
Log Compressed Bytes Sent/sec (2008)
Log Harden Time (ms)
Log Remaining for Undo KB
Log Scanned for Undo KB
Mirrored Write Transactions/sec
Log Send Flow Control Time (ms)
Log Receive Flow Control Time (ms)
Send/Receive Ack Time
(From SQLCAT Team testing)
Enterprise Edition only (for SQL Server 2008)
Detection and “correction” of corrupt pages
Pages on both principal and partner can be repaired
Repair happens in the background
Client still sees the error and needs to retry
No current indication that a retry might help
If another read attempt is made while repairing, an 829 is returned
Status/errors/logs can be seen in:
   msdb.dbo.suspect_pages table
   tracing “Database Suspect Data Page” event class
   SQL Server error log
Temporary Repair -> Something is probably broken!
One potential high availability solution
Standby instance on a different node
In case of failover:
  Active transactions are rolled back
  Connections are broken and need to retry
  Reconnect to other node
Single copy of databases
Built on top of Windows Clustering
  Automatic detection and failover
  Up to 8 servers in the cluster (nodes) 2005 or 16 for 2008
  One node at a time “owns” the service (such as SQL
  Server), including its resources (such as shared disk, IP
  address, and network name)
  2005 required certified hardware (KB 309395)
  2008 has a cluster validation tool
Cluster Groups
  Collection of logically grouped cluster resources
  Contain related information for applications such as SQL Server
Multi-instance clusters
  Each virtual server requires its own resources
  Typically assign each to dedicated ports to
  avoid firewall issues
  Memory management needs to be considered
  Servers need to be able to handle all instances
  that could be failed to the node
More flexibility for cluster configuration
N+1 is the common scenario
  Eg 4 nodes -> 3 active, 1 warm standby
  N+1 is the recommended configuration
All nodes active also possible
  Eg 4 nodes -> 4 active
  Careful planning required
Analysis Services clusterable and multi-instance
Coexistence of SQL Server 2000 and 2005,
including WOW support (KB 327518)
Connectivity layer replaced by SQL Native Client
Standard Edition offered 2 node clustering
Up to 25 instances
  Subject to Edition/resources limitation, quorum type
Support for mount points (mounted volumes) for
additional storage
iSCSI support
Command-line only/unattended setup
Ability to rename the failover clustering instance
without reinstall
  But can’t rename default instance to named instance
Separate Full-text resource per instance
Encryption keys automatically distributed across nodes
Support for Majority Node Set as a quorum type
  SQL Server still needed its data and log on shared drives
Surface Area Configuration special to Failover
Clustered instance
  TCP and Named Pipes are enabled, not off by default as
  with standalone
  SQL Browser was enabled, not off by default as with standalone
HCL requirement replaced by Cluster
Validation requirement
Windows Server 2008 Clustering
  Support for 16 nodes
  New Quorum model (witness can be a
  dedicated LUN or small disk, does not need
  drive letter)
  Editing subnet mask during setup supported
  (connections from multiple networks)
  DHCP and IPV6 Support
  GPT Disks
GUID Partition Table (GPT) Disks
  • Based on LBA
  • Single partition > 2TB
  • More reliable – redundant copies of the header and partition table
Cluster setup and deployment improvements
  More enterprise-friendly
  Unattended installation and service packs via
  command line options
  Support for cluster-prepared instance deployment
  Improved error reporting and handling
Rolling upgrades help avoid downtime
  Upgrade strategy has changed from 2005
  Must upgrade one node at a time
Cluster Validation Tool
Each instance requires at least 1 named
Mount points (mounted volumes) supported
Disk subsystem must be shareable
  Local disks not appropriate
Maximum number of instances
  Up to 25 with MNS (local) quorum
  Up to 24 with shared quorum
  Why ever have so many instances?
    Consider capacity on failover scenarios!
    Plan ahead!
In earlier versions, quorum drive was a potential
point of failure
Windows Server 2003 introduced majority node
set (MNS) - changes voting method to allow
majority vote (KB838612)
SQL Server 2008 supports MNS
Disk usually allocated 500MB (as minimum
effective NTFS partition size)
MNS not recommended for two node clusters
  Majority of the nodes must be online for the cluster to
  come online
  Commonly used for 5+ nodes in a cluster
Failure of disk subsystem
Disk subsystem is shared between nodes
Entire solution must be on the Geographic Cluster
Hardware Compatibility List
Private and public networks must appear as non-routed
LANs (commonly based on VLANs)
Round trip latency no greater than 500ms
Cluster interconnect must appear as a standard LAN
Storage technologies must preserve single disk semantics
Quorum disk must be replicated real time synchronously
across sites
Avoid copper interconnect
Customers found difficulty obtaining
hardware on compatibility list
CVT is now the test for compatibility
If it passes, it is supportable
Some components not cluster aware
  Reporting Services
  Notification Services (2005)
  Integration Services (possible to create a cluster
  manually see
Analysis Services (AS) can be clustered
  Keep completely separate from database engine
  Place AS in a single group with its own disk and IP
NOTE: If sharing the group with Database, they will affect
  each other. This should be completely avoided.
Windows Event Logs
Cluster log
SQL Server Setup log
Failover cluster troubleshooting in Books
Setup tools on appropriate systems
Manual IP port
Additional cluster disks
Check the “Affect the group” property of the SQL
Server and Analysis Services resources
Verify failover
Using DMVs and ServerProperty to find out
more about the configuration
  select * from sys.dm_io_cluster_shared_drives
  select * from sys.dm_os_cluster_nodes
Microsoft Confidential—Preliminary Information Subject to Change
Introduced in 2005
Read-only, static view of a database
Multiple snapshots can exist per database
Always reside on the same server as the source database
Use NTFS sparse files
Is created with
regardless of database setting or model
If the primary database becomes unavailable, all
snapshots of it are unavailable also
Operate entirely at the data page level
Pages only exist in the sparse file once modified
in the primary (copy on write)
Reads take a page from the snapshot if it exists
there; otherwise from the source database

(Images from Books Online)
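Creating a snapshot is plain DDL (database, logical file name and path are placeholders; NAME must match the source database's logical data file name):

```sql
CREATE DATABASE AdventureWorks_snap_20080901
ON ( NAME = AdventureWorks_Data,
     FILENAME = 'C:\Snapshots\AW_data_20080901.ss' )
AS SNAPSHOT OF AdventureWorks;
```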
Maintaining historical data for reporting
Offloading reporting in mirroring
Safeguarding against administrative errors
Protection during major updates or bulk operations
Unit testing in development (can revert from a snapshot)
  Reverting is possible only when a single snapshot exists
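Reverting from a snapshot, as described above (names are placeholders; any other snapshots of the source must be dropped first):

```sql
RESTORE DATABASE AdventureWorks
FROM DATABASE_SNAPSHOT = 'AdventureWorks_snap_20080901';
```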
The snapshot feature has performance implications
  Remember that every query will read the
  unchanged pages from the underlying database
  Also, changed pages must be copied to the
  snapshot database
  Pages are copied only the first time a page is modified
  Subsequent changes to the same page are not copied again
Source database cannot be dropped, detached or restored
Performance is reduced
Files cannot be dropped from the primary (or the snapshot)
Source database must be online
If creating on a mirror, it must be synchronized
Cannot snapshot master, model, tempdb
Not available on FAT32 or RAW partitions
Full-text catalogs are not propagated from the source
Security is based on creation time
Can only be created on the primary when log shipping
Currently no easy option for refreshing a snapshot
Important to choose appropriate naming for
snapshots, including creation date/time
Need to drop and recreate snapshot
Issues for connected reporting apps, etc.
Automates process of:
  Backup transaction log
  Copy to another server
  Restore the log on the other server
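The three automated steps above, done by hand, are just log backup, copy, restore (names and paths are placeholders):

```sql
-- 1. On the primary: back up the transaction log
BACKUP LOG MyDatabase
TO DISK = '\\backupserver\logship\MyDatabase_1.trn';

-- 2. Copy the file to the secondary (a file-copy job)

-- 3. On the secondary: restore, leaving the database able to
--    accept further log restores
RESTORE LOG MyDatabase
FROM DISK = 'D:\logship\MyDatabase_1.trn'
WITH NORECOVERY;
-- (or WITH STANDBY = 'D:\logship\undo.tuf' for read-only access)
```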
Can have multiple secondaries
Database must be in full recovery or bulk logged
Use either log shipping or backup jobs – not both
  Using both together does not make sense and will result
  in transaction log backup files alternatively dispersed
  between both locations. Log backups always reset the
  log truncation point for the next backup.
Specify a backup location that does not reside on
the same server as the primary or secondary
Manually restore the initial backup for large databases
Use a frequent backup, copy, and restore interval
  Practical frequency is 1 minute when using the SQL
  Agent job scheduling subsystem. Many choose to run at
  5 minute interval. Default is 15 minutes.
Have a plan for logins, jobs, etc which are stored
outside the log shipped database
Use a separate monitoring server and configure
alerts so that you are aware of problems when they occur
  Also probably smart to configure a separate job to watch
  your log and file share space
There is no automated client failover
  Must have a plan to redirect clients
  Best practice: ensure the client stores the
  connection string outside of the application to
  make it easy to change
Zero data loss after a failover if all primary
logs are applied to the secondary
  If you lose logs and bring the secondary online,
  you will lose the data associated with the missing logs
Recover the database
  This is equivalent to “force service” in database mirroring
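Bringing the secondary online after applying the last available log (database name is a placeholder):

```sql
RESTORE DATABASE MyDatabase WITH RECOVERY;
```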
Use the tablediff tool against the primary and
secondary after the principal is back online to
identify the data that is out of sync
If the secondary gets far out of sync with the
primary, it may be faster to restore a full
backup than to apply all of the outstanding
log files
Table partitioning might help here
Log shipping supports a delay parameter
If you catch a database error within the delay
interval, the secondary can be used to help recover
In this case, it is helpful to have the standby
option configured. Keep restoring log until
just before the error occurred
Obviously, it can be difficult in practice to
know when to stop applying log
Setting up log shipping without removing existing
backup jobs
Not granting the agent accounts read/write access
to the share and filesystems configured
Selecting the UI option to generate a full backup for
a large principal database during setup (blocks the
UI until complete)
Not copying the required permissions, logins, and
other dependencies to the secondary server
Specifying the file locations – only the backup
location requires a network share, all other jobs can
operate on local files
Not all SQL Server features work with all
other SQL Server features !
Important to understand workable and
supportable combinations
Not supported
Already much-requested combination
May be addressed in future versions
Failover clustering an alternative now
Full-text catalogs that are associated with the
principal database are automatically created
on the mirror when it is initially restored
during DBM setup
Subsequent data changes that affect the
index are not logged
After failover, initiate a full population of the full-text catalog
    Applications which depend on DTC will not
    fail over seamlessly
   Consider Failover Clustering instead
Applications which rely on multiple user
databases are not good candidates for
database mirroring
Consider Failover Clustering instead
Log Shipping and Database Mirroring are
100% compatible:
  Case 1: Log shipping from serverA -> serverB
  and also mirroring at the same time
  Case 2: Log shipping from serverA -> serverB
  and mirroring to serverC
Replication -> Supported
Database Snapshots -> Supported
Full-Text -> Supported
Replication -> Supported
Important to understand required Service Level
Agreements in any planning
Critical that clients understand and agree to these
What level of data loss can you tolerate?
   Clients will always say “none” but can’t afford that
This affects
   Backup strategy
   Recovery strategy
   Technology used
   Level of redundancy
64 bit servers now commonplace
Availability of 64 bit drivers still a concern in
some areas eg: ODBC drivers
Tools are 32 bit and run on WOW64
Tools should normally be run from a 32 bit
client system (performance can be impacted
on 64 bit)
64 bit systems shine when large amounts of
memory are required
  Multiple operating system images supporting separate
  independent applications running simultaneously on the same
  computer system
  Strong hardware-enforced isolation between the VMs

One Physical Server
  [Diagram: a Root Partition and several Child Partitions (one
  running a Server workload) sharing the machine's Devices,
  Processors and Memory]
The Hypervisor runs directly on the hardware
    2nd-generation virtualization technology in Intel VT-x with EPT and AMD-V
    with NPT chips accelerates VM performance
OSs and applications run in partitions on top of the Hypervisor
Enlightenment provides a fast path which avoids device emulation
    Integration services provide VSCs, VSPs and the VMBus
Hyper-V Role in Windows Server 2008 Server Manager
   Logical Processor: one logical
   computing engine in the OS,
   application and driver view
   Virtual Processor: virtual logical
   processor (up to 8:1 overcommit of LPs)
% Processor Time
   % Processor Time counters in
   either guest or root may not be accurate
Hyper-V Processor Counters
   The best way to get true measure
   of CPU utilization
       Hyper-V Logical Processor: Total
       CPU time across entire server
       Hyper-V Virtual Processor: CPU
       time for each guest virtual processor
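The distinction matters because guest % Processor Time is scaled to virtual processors, not physical ones. An illustrative calculation with made-up sample values (the counter names are real Hyper-V counters; the numbers and the 8:1 ratio are hypothetical samples):

```python
# Hypothetical samples of
# "\Hyper-V Hypervisor Logical Processor(N)\% Total Run Time"
# on a 4-LP host; each value is percent busy for one logical processor.
lp_total_run_time = {"LP0": 80.0, "LP1": 60.0, "LP2": 40.0, "LP3": 20.0}

# Host-wide utilization is the average across all logical processors.
host_util = sum(lp_total_run_time.values()) / len(lp_total_run_time)
print(f"Host CPU utilization: {host_util:.1f}%")

# A guest's own % Processor Time can read high while host consumption
# stays low: a VP reporting 100% busy may map to as little as 1/8 of
# one LP when all eight VPs sharing that LP are contending.
guest_vp_pct = 100.0
min_lp_share = guest_vp_pct / 8
print(f"Guest VP at {guest_vp_pct}% -> as little as {min_lp_share:.1f}% of one LP under 8:1 sharing")
```

Hence the slide's advice: trust the Hyper-V Logical/Virtual Processor counters, not the in-guest % Processor Time, when sizing a host.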
Consolidation Increases the Importance of High
Availability Features
  Consolidation increases the cost of a single system failure
  Increasing focus on planned outages vs. unplanned outages
  A focus on utilization increasingly puts passive HADR
  hardware in a negative light
Windows Fail-Over Clustering - Host Level Clustering
Windows Fail-Over Clustering - Guest Level Clustering
Step 1: Snapshot VM memory
   Copy partition memory from the source VM to the destination
[Diagram: source and destination physical servers; the source child partition's memory is copied to the destination over the network while both hosts see LUN 1 and LUN 2 on shared storage]
Step 2: Copy changed pages from the source VM to the destination
[Diagram: pages dirtied during the initial copy are re-copied from the source child partition to the destination]
Step 3: Storage connections are migrated from the source
        VM to the destination VM
[Diagram: the destination child partition takes over the connections to LUN 1 and LUN 2 on shared storage]
Step 4: Network connections are migrated from the source VM
        to the destination VM
[Diagram: client network connections move from the source physical server to the destination]
Step 5: Destination VM is brought online,
        source VM is taken offline
[Diagram: the destination child partition now owns the memory, storage connections, and network connections; the source child partition is removed]
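The five steps above amount to an iterative pre-copy: memory is copied while the source VM keeps running, then successively smaller sets of changed pages are re-sent until the remainder is small enough to move during the final switchover. A toy simulation of that convergence (Python; the page counts, dirty fraction, and threshold are made-up tuning knobs, not Hyper-V internals):

```python
def live_migrate(total_pages, dirty_fraction=0.10,
                 blackout_threshold=1000, max_rounds=10):
    """Simulate iterative pre-copy: each round re-sends the pages the
    still-running VM dirtied during the previous round's transfer."""
    to_send = total_pages          # round 1: the full memory snapshot
    rounds = 0
    while to_send > blackout_threshold and rounds < max_rounds:
        rounds += 1
        print(f"round {rounds}: copying {to_send} pages")
        # While those pages were in flight, the VM dirtied a fraction.
        to_send = int(to_send * dirty_fraction)
    # Final pass: pause the source VM, copy the remainder, migrate
    # storage and network connections, bring the destination online.
    print(f"switchover: copying final {to_send} pages")
    return rounds

live_migrate(total_pages=1_000_000)  # converges in 3 pre-copy rounds
```

The `max_rounds` guard matters: a VM dirtying pages faster than the network can copy them never converges, and the migration must either fall back to a longer pause or abort.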