WSV202_ Building a business critical system – technology - MSDN by hcj


Critical business function: a vital function (such as production and sales) without which a firm cannot operate or remain viable. If a critical business function is interrupted, a firm could suffer serious financial, legal, or other damages or penalties.

* Business Dictionary: http://www.businessdictionary.com/definition/critical-business-function.html
Power Management
• Timer coalescing
• Tick skipping
• Core parking
• Report power consumption to OS via ACPI
• Accessible via WMI (reading/writing of power plans; active plan can be changed remotely)

Virtualization
• SLAT
• VMQ
• Jumbo Frames
• Intel VT

Scalability
• 256 Logical Processors
• Turbo Boost
• Quickpath
• 16 MB L3 Cache (7400)
• Multi-site manageability

RAS
• Memory Mirroring: writes to 2 locations to compensate for DRAM failure
• Memory Sparing: predicts a failing DIMM and copies data to a spare DIMM
• I/O Hot plug
• MCA Recovery
• WHEA: root cause

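The Memory Mirroring entry under RAS can be pictured with a toy model. The sketch below is only an illustration in Python, not how a memory controller implements it: the class `MirroredMemory` and its methods are hypothetical names, and a real platform mirrors in hardware, transparently to software.

```python
class MirroredMemory:
    """Toy model of memory mirroring: every write goes to two banks."""

    def __init__(self, size):
        self.primary = [0] * size
        self.mirror = [0] * size
        self.failed = set()  # addresses whose primary copy is unreadable

    def write(self, addr, value):
        # Mirroring: the same value is written to 2 locations.
        self.primary[addr] = value
        self.mirror[addr] = value

    def fail_primary(self, addr):
        # Simulate a DRAM failure that affects the primary copy only.
        self.failed.add(addr)

    def read(self, addr):
        # Fall back to the mirror copy when the primary has failed.
        if addr in self.failed:
            return self.mirror[addr]
        return self.primary[addr]


mem = MirroredMemory(8)
mem.write(3, 42)
mem.fail_primary(3)
print(mem.read(3))  # 42, served from the mirror copy
```

The cost of this design is capacity: every value occupies two locations, so usable memory is halved in exchange for surviving the failure.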
Built-In Redundancy & Failover Throughout the Platform

Socket Redundancy & Failover
• Dynamic OS Assisted Processor Socket Migration*
• Electronically Isolated (Static) Partitioning

Memory Redundancy & Failover
• Inter-socket Memory Mirroring
• Intra-socket Memory Mirroring
• Intel® SMI Lane Failover
• Intel® SMI Clock Fail Over
• Intel® SMI Packet Retry
• Memory DIMM and Rank Sparing
• Dynamic Memory Migration
• Fail Over from Single DRAM Device Failure (SDDC)
• Recovery from Single DRAM Device Failure (SDDC) plus random bit error

Intel® QPI Redundancy & Failover
• QPI Self-Healing
• QPI Clock Fail Over
• Intel QPI Packet Retry

[Diagram: four NHM-EX sockets, each with attached Memory, linked by Intel® QPI to two IOHs (PCI Express* 2.0) and an ICH10]

Intel® QPI = Intel® QuickPath Interconnect
Intel® SMI = Intel® Scalable Memory Interconnect
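The Packet Retry items listed for Intel® SMI and Intel® QPI follow a general pattern: detect a corrupted packet with a checksum and send it again. The Python sketch below illustrates that pattern only. The CRC32 checksum, the retry limit, and every function name are assumptions made for the example; the slides do not describe the actual link-layer protocol.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    # Append a 4-byte CRC32 so the receiver can detect corruption.
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_packet(packet: bytes):
    # Return the payload if the CRC matches, otherwise None.
    payload, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def send_with_retry(payload: bytes, link, max_retries: int = 3):
    """Resend the packet until it arrives intact or retries run out."""
    for attempt in range(1 + max_retries):
        received = check_packet(link(make_packet(payload)))
        if received is not None:
            return received, attempt
    raise RuntimeError("link failed: retries exhausted")


def flaky_link(failures: int):
    # Hypothetical link that corrupts the first `failures` packets.
    state = {"left": failures}

    def link(packet: bytes) -> bytes:
        if state["left"] > 0:
            state["left"] -= 1
            return bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit
        return packet

    return link


data, retries = send_with_retry(b"cache line", flaky_link(2))
print(data, retries)  # b'cache line' 2
```

When the retries run out, the sketch raises an error; on the platform described here, that is the point where features such as lane failover or self-healing would take over.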
Machine Check Architecture Recovery
First Machine Check Recovery in Xeon®-based Systems
Previously seen only in RISC, mainframe, and Itanium-based systems

[Diagram: banks of DDR3 DRAM with registers (REG), attached through SMB and SMI]

• Normal Status With Error Prevention
  – Patrol Scrubber scans memory for errors
• Error Detected*
  – HW Correctable Errors: Error Corrected
  – Un-correctable Error: Error Contained
    Error information passed to OS / VMM
    Bad memory location flagged so data will not be used by OS or applications
• System Recovery with OS
  – System works in conjunction with OS or VMM to recover or restart processes and continue normal operation

Allows Recovery From Otherwise Fatal System Errors
*Errors detected using Patrol Scrub or Explicit Write-back from cache
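The recovery flow on the Machine Check Architecture Recovery slide (scrub, correct what hardware can correct, contain what it cannot, let the OS or VMM restart only the affected work) can be sketched as a small simulation. This is an illustrative Python model under stated assumptions: `Page`, `Fault`, `Report`, and `patrol_scrub` are hypothetical names, and page-level ownership by a single process is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Fault(Enum):
    NONE = 0
    CORRECTABLE = 1    # hardware can fix it (for example via ECC)
    UNCORRECTABLE = 2  # data is lost and must be contained


@dataclass
class Page:
    owner: str               # process that uses this page
    fault: Fault = Fault.NONE
    retired: bool = False    # flagged so it is never used again


@dataclass
class Report:
    corrected: int = 0
    retired_pages: list = field(default_factory=list)
    restarted: set = field(default_factory=set)


def patrol_scrub(pages):
    """Scan all pages once and apply the recovery flow to each fault."""
    report = Report()
    for index, page in enumerate(pages):
        if page.fault is Fault.CORRECTABLE:
            page.fault = Fault.NONE           # error corrected in hardware
            report.corrected += 1
        elif page.fault is Fault.UNCORRECTABLE:
            page.retired = True               # error contained: flag location
            report.retired_pages.append(index)
            report.restarted.add(page.owner)  # OS/VMM restarts only the owner
    return report


pages = [
    Page("db"),
    Page("web", Fault.CORRECTABLE),
    Page("batch", Fault.UNCORRECTABLE),
]
report = patrol_scrub(pages)
print(report.corrected, report.retired_pages, sorted(report.restarted))
# 1 [2] ['batch']
```

The point of the model is the last line of output: only the process that owned the bad page is restarted, while the others continue normal operation.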
Windows Hardware Error Architecture (WHEA): introduced in Windows Server 2008*
• Better root cause analysis
  – Error reporting via common error record format, richer data
    content (e.g. FRU info)
  – Platform and OS flows are well integrated, which allows both
    to contribute information to the log
• Better support for hardware error recovery
  – Built in infrastructure for error injection
  – Platform Specific Hardware Error Driver (PSHED) Plugins allow for
    platform participation in error recovery

• Error avoidance with health monitoring
  – Allows for applications to register for hardware error event notification
  – PFA apps can be used to monitor platform health
• WHEA enhancements on Intel® Architecture in Windows Server 2008* R2
  – Support for Nehalem-EX MCA recoverable errors
  – Corrected Machine Check Interrupt (CMCI) error handling support
    Intel® server processors codename Nehalem-EX
[Diagram: CPU with Cores (Core0 to Core7) and UnCore (LLC, Memory Controller, Link), plus Memory. New data arrives in the LLC, an error is detected on the explicit write-back (EWB), and the write-back (WB) data goes to Memory with a poison tag.]

• EWB error detected at the LLC
• Data stored with poison bit
• Broadcast MCE to all threads
• Log the error
  – MCi_Status.Valid = 1
  – MCi_Status.EN = 1 (Error enabled)
  – MCi_Status.UC (uncorrected error) = 1
  – MCi_Status.PCC (Process context corrupt) = 0
  – MCi_Status.OVER (overflow) = 0
  – MCi_Status.MCA_error_codes indicates which error is detected
  – MCG_Status.RIPV = 1
  – MCi_Status.ADDRV = 1
  – MCi_Status.MISCV = 1
  – MCi_Status.MSCOD = poison (model specific)
• System Software recovers the error
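The MCi_Status flags logged on this slide can be checked in software once the register value is read. The Python sketch below decodes a 64-bit status value and tests it against the flag combination shown above. The bit positions are an assumption taken from the commonly documented IA32_MCi_STATUS layout, not from the slide; the helper names and the sample code values are hypothetical.

```python
# Bit positions follow the commonly documented IA32_MCi_STATUS layout
# (an assumption; the slide names the flags but not their positions).
FLAG_BITS = {
    "Valid": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57,
}


def decode_mci_status(value: int) -> dict:
    """Split a 64-bit MCi_Status value into the fields named on the slide."""
    fields = {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["MSCOD"] = (value >> 16) & 0xFFFF   # model-specific error code
    fields["MCA_error_code"] = value & 0xFFFF  # architectural error code
    return fields


def matches_slide_recovery_case(fields: dict, ripv: int) -> bool:
    """True when the flags equal the recoverable case listed on the slide."""
    return (
        fields["Valid"] == 1 and fields["EN"] == 1 and fields["UC"] == 1
        and fields["PCC"] == 0 and fields["OVER"] == 0
        and fields["ADDRV"] == 1 and fields["MISCV"] == 1
        and ripv == 1
    )


def build_status(set_flags, mscod=0, mca_error_code=0) -> int:
    """Helper for the example: assemble a status value from flag names."""
    value = (mscod << 16) | mca_error_code
    for name in set_flags:
        value |= 1 << FLAG_BITS[name]
    return value


# Hypothetical register value; the two code values are placeholders.
status = build_status(["Valid", "EN", "UC", "ADDRV", "MISCV"],
                      mscod=0x0010, mca_error_code=0x0001)
fields = decode_mci_status(status)
print(hex(status))
print(matches_slide_recovery_case(fields, ripv=1))  # True
```

PCC = 0 together with RIPV = 1 is what makes the case recoverable: the processor context is intact and execution can restart at the interrupted instruction, so system software can act on the poisoned data instead of halting.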
MCA Predictive Failure Notification
Example: OS Initiates Fail-over to Spares

[Diagram: CPU with Cores (Core0 to Core7) and UnCore (LLC, Memory Controller, Link), plus Memory; new data arrives in the LLC; numbered callouts mark the steps below]

1. Memory error is detected and corrected
2. Corrected error count is incremented
3. Error count exceeds threshold
4. Uncore issues CMCI to the OS handler

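The four steps of MCA Predictive Failure Notification amount to a counter with a threshold and a callback. The Python sketch below models that logic only; `CorrectedErrorCounter`, `os_handler`, the threshold of 3, and the DIMM label are all hypothetical, and resetting the count after signaling is an assumption of the example.

```python
class CorrectedErrorCounter:
    """Counts corrected errors and signals the OS past a threshold."""

    def __init__(self, threshold, on_cmci):
        self.threshold = threshold
        self.on_cmci = on_cmci  # stands in for the OS CMCI handler
        self.count = 0

    def record_corrected_error(self, dimm):
        # Step 1 has already happened: hardware detected and fixed the error.
        self.count += 1                     # step 2: count is incremented
        if self.count > self.threshold:     # step 3: count exceeds threshold
            self.on_cmci(dimm, self.count)  # step 4: CMCI to the OS handler
            self.count = 0                  # assumption: start a new window


failed_over = []


def os_handler(dimm, count):
    # Example policy from the slide: the OS initiates fail-over to a spare.
    failed_over.append(dimm)


counter = CorrectedErrorCounter(threshold=3, on_cmci=os_handler)
for _ in range(4):
    counter.record_corrected_error("DIMM_A1")
print(failed_over)  # ['DIMM_A1']
```

Because every error counted here was already corrected, the notification is predictive: the OS can move data to a spare before an uncorrectable error occurs.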
Mission Critical Applications
• Business Applications: Enterprise Applications; Line Of Business (LOB) Custom Applications
• Microsoft Server Applications: Database; Collaboration; Communication
• Virtualization Platform: Hyper-V™
• Management Platform

Microsoft Virtualization = Windows Server 2008 R2 Hyper-V + System Center

[Diagram: The Virtual / Process view (Virtual Machine 1, Virtual Machine 3) beside The Physical / real view (Physical Memory Pages), with the Hypervisor and Operating System underneath]

Scale Up Configuration
• Server stack: LOB Apps, Windows Server App Fabric, SQL Server 2008 R2, Windows Server 2008 R2
• Storage: SAN for SQL and Files
• Connection: Fiber Optic channel to SAN

Scale Out Configuration
• Application servers (HP BL465)
  – CICS COBOL apps on Micro Focus Server EE, Windows Server 2008
  – LOB Apps on Windows Server App Fabric, Windows Server 2008 R2
• Database server: SQL Server 2008 R2 on Windows Server 2008 R2
• Network: Dual Gigabit Ethernet on PCIe bus
• Storage: SAN based for SQL and Files
• Connection: Fiber Optic channel to SAN

Scale Out Virtualized
• Windows Server 2008 R2 Hyper-V Virtualization Server hosting three virtual machines, each running LOB Apps on an App Server on Windows Server 2008 R2
• Database server: SQL Server 2008 R2 on Windows Server 2008 R2
• Network: Dual Gigabit Ethernet on PCIe bus
• Storage: SAN based SQL and files
• Connection: Fiber Optic channel to SAN

• Backup & Disaster Recovery
• Hardware Provisioning
• Performance and Health Monitoring
• Deployment, Patching and State Mgmt
• Mobile Device Management
• Virtual Workload Provisioning

SunGard
• 1024-Core Computing Grid running Windows Server 2008 and SQL Server 2008
• Asset Liability Management (ALM)
• Near Linear scalability

bwin
• 30,000 Transactions per Second at peak
• 1 Million bets per day
• 100 Terabytes of data

Siemens
• PLM system supports 5,000 concurrent users
• Gained 50% of space through compression

SunGard - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000006391
bwin - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004138
Siemens - http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004826
Windows Server 2008 R2 and SQL Server 2008 R2 are mission critical
Hardware partners provide scale-up and resilient platforms
Windows Server + Intel Xeon 7500 can detect and recover from hardware errors
Democratizing Mission Critical
Deploying, Virtualizing, and Managing Linux and UNIX with Hyper-V
Manage Your Enterprise from a Single Seat: Windows PowerShell Remoting
Next Generation VDI with Microsoft RemoteFX
Lighting Up Nehalem EX with Windows Server 2008 R2
Implementing High Availability
Windows Server 2008 R2 Failover Clustering
www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn

								