Examples for Concrete Techniques: Hard Disks - TU Chemnitz

Examples for Concrete Techniques: Hard Disks (Motivation, RAID, S.M.A.R.T.)

Dependable Systems
Winter term 2012/2013

Chapter 7
Examples for Concrete Techniques: Hard Disks

Prof. Matthias Werner
Operating Systems Group

  7.1 Motivation
  Hard Disks

             Hard disks are (still) frequently used components of computer systems
             Especially due to their mechanical components, hard disks have
             comparatively high failure rates and age early
                      Study: failure rate of 8.6% in the third year
             Two techniques are considered here
                      RAID: fault masking
                      S.M.A.R.T.: fault diagnosis

  7.2 RAID
  Introduction

             RAID: Redundant array of inexpensive disks, or: Redundant
             array of independent disks
             Redundancy by spreading data across several disks
             Tolerates disk failures
             Needs at least two disks
             I/O performance increased by parallel access
             Originally [PGK88]
             Here: original types and commonly used variations
             m = f(n) always denotes the number of disks necessary to get n
             times the capacity of a single disk






  RAID 0


             Striped array without fault tolerance
             Requires n disks
             Advantages
                      High performance (theoretically n times that of a single disk)
                      Simple design
             Disadvantages
                      No fault tolerance
                      Reliability lower than that of a single disk
                      Failure of one disk leads to complete data loss
             Use cases
                      Temporary data requiring very high bandwidth
                      Video production
                      Compilation of large software projects
                      ...
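A minimal sketch of how striping spreads data and why RAID 0 is less reliable than a single disk. It assumes a stripe unit of exactly one block (real arrays use larger chunks) and independent disks with identical reliability R; neither assumption comes from the slides, and the function names are illustrative. Block b lands on disk b mod n at offset b div n, and since the array survives only if every disk survives, its reliability is R^n.

```python
"""RAID 0 sketch: block striping and series reliability (illustrative only)."""


def raid0_locate(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a stripe unit of exactly one block; real arrays use larger chunks.
    """
    if n_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    return block % n_disks, block // n_disks


def raid0_reliability(r_disk: float, n_disks: int) -> float:
    """Series system: the array survives only if every disk survives.

    Assumes independent failures, all disks with the same reliability r_disk.
    """
    return r_disk ** n_disks


if __name__ == "__main__":
    # Consecutive logical blocks rotate over the disks, enabling parallel access
    for b in range(8):
        disk, offset = raid0_locate(b, 4)
        print(f"block {b} -> disk {disk}, offset {offset}")
    # Four disks with R = 0.9 each: the array is clearly worse than one disk
    print(f"R_array = {raid0_reliability(0.9, 4):.4f}")
```

Because R < 1, R^n shrinks with every added disk, which is exactly the "reliability lower than that of a single disk" point above.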





  RAID 1


             Mirroring of data
             Requires m = 2n disks
             Advantages
                      Either one write or two independent reads can be served in parallel
                      Increased read performance (similar to RAID 0)
                      Simple design
                      Tolerates failure of one disk
                      No reconstruction of data necessary
             Disadvantages
                      High overhead (100%)
             Use cases
                      Applications requiring very high reliability
                      Usually used for smaller amounts of data
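A mirrored pair can be modeled as a parallel system: data is lost only if both copies fail. This small sketch, assuming independent failures and no repair during the considered time, contrasts that reliability with the 100% storage overhead noted above; the function names are illustrative.

```python
"""RAID 1 sketch: mirrored pair reliability versus a single disk (illustrative)."""


def mirror_reliability(r_disk: float, copies: int = 2) -> float:
    """Parallel system: data survives if at least one copy survives.

    Assumes independent failures and no repair during the mission time.
    """
    return 1.0 - (1.0 - r_disk) ** copies


def storage_overhead(copies: int = 2) -> float:
    """Extra raw capacity relative to user data (1.0 means 100 %)."""
    return float(copies - 1)


if __name__ == "__main__":
    for r in (0.9, 0.99):
        print(f"R_disk={r}: R_mirror={mirror_reliability(r):.4f}")
    print(f"overhead: {storage_overhead():.0%}")
```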






  RAID 2

             Uses a Hamming code ECC to spread data among disks
             (ECC = error correction code)
             Word-wise correction
             Number of disks depends on the code
             Advantages
                      Design simpler than RAID 5
                      Correction of single disk failure
                      High data rates
             Disadvantages
                      High overhead (decreases with number of disks)
                      High entry cost
                      No commercial implementation
             Use cases
                      Not used in industry

  RAID 3

             [Figure: RAID 3 with byte-wise parity generation]
             Disk 1      Disk 2      Disk 3      Disk 4
             A           B           C           parity A-C
             D           E           F           parity D-F
             ...         ...         ...         ...
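The parity column in the RAID 3 figure is usually the bitwise XOR of the data units in the same stripe; the slides do not name the function, so XOR is an assumption here (it is also the usual choice for the block-wise parity of RAID 4 and 5). The sketch computes the parity of one stripe and rebuilds a single lost unit from the survivors; the byte strings stand in for A, B, C and the function names are illustrative.

```python
"""Sketch of XOR parity as used for RAID 3/4/5 stripes (illustrative only)."""
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long units."""
    if len(a) != len(b):
        raise ValueError("stripe units must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(units: list[bytes]) -> bytes:
    """Parity unit of a stripe: XOR of all data units."""
    return reduce(xor_bytes, units)


def rebuild(survivors: list[bytes]) -> bytes:
    """Recover the single missing unit (data or parity).

    XOR of all remaining units of the stripe, including the parity, yields
    the missing one, because x ^ x == 0.
    """
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    a, b, c = b"\x0f\xa0", b"\x33\x01", b"\x55\xff"
    p = parity([a, b, c])              # stored on Disk 4
    print(rebuild([a, c, p]) == b)     # Disk 2 lost
```

Any single unit of a stripe can be rebuilt this way, but a second loss in the same stripe cannot, which is why RAID 3 to 5 tolerate exactly one disk failure.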





  RAID 3 (cont.)

             Stripe set with an additional disk for byte-wise parity
             Requires n + 1 disks
             Advantages
                      High data rates
                      Low overhead
                      Disk failure has low impact on performance
             Disadvantages
                      Complex design
                      Transaction rate equals that of a single disk
                      Unequal distribution of accesses to disks (the parity disk is
                      accessed much more often)
             Use cases
                      Applications requiring high reliability
                      Rarely used today

  RAID 4

             [Figure: RAID 4 with block-wise parity generation]
             Disk 1      Disk 2      Disk 3      Disk 4
             A           B           C           parity A-C
             D           E           F           parity D-F
             ...         ...         ...         ...





  RAID 4 (cont.)

             Similar to RAID 3, but using block-wise parity instead of byte- or
             word-wise parity
             Requires n + 1 disks
             Advantages
                      Similar to RAID 3
             Disadvantages
                      Similar to RAID 3
             Use cases
                      Applications requiring high reliability
                      Rarely used today

  RAID 5

             [Figure: RAID 5 with block-wise parity generation]
             Disk 1        Disk 2        Disk 3      Disk 4
             A             B             C           parity A-C
             parity D-F    D             E           F
             G             parity G-I    H           I
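The RAID 5 figure rotates the parity block from stripe to stripe. The sketch below regenerates that layout under a placement rule read off the figure (an assumption: real implementations use several different rotation schemes); the function name is illustrative.

```python
"""RAID 5 sketch: rotating parity placement, rule read off the figure above."""


def raid5_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Return one row of labels per stripe, one column per disk.

    Assumed rule (it reproduces the figure): the parity of stripe s sits on
    disk (n_disks - 1 + s) % n_disks; data units fill the other disks in order.
    """
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    if n_stripes * (n_disks - 1) > 26:
        raise ValueError("labels are single letters A-Z in this sketch")
    letters = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    rows = []
    for s in range(n_stripes):
        parity_disk = (n_disks - 1 + s) % n_disks
        row = [next(letters) for _ in range(n_disks - 1)]
        # Label uses the first and last data unit of this stripe, e.g. "parity A-C"
        row.insert(parity_disk, f"parity {row[0]}-{row[-1]}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(4, 3):
        print("  ".join(f"{cell:<11}" for cell in row))
```

Over any n_disks consecutive stripes, each disk holds exactly one parity block, which is where the "equally distributed accesses" advantage of RAID 5 comes from.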






  RAID 5 (cont.)

             Striping with distributed (interleaved) parity
             Parity is spread across all disks
             Requires n + 1 disks
             Advantages
                      High data rates
                      Low overhead
                      Equally distributed accesses
             Disadvantages
                      Disk failure has medium impact on performance
                      Complex design
                      Complex rebuild
             Use cases
                      Applications requiring high reliability
                      Most versatile RAID level
                      Most often used today

  RAID 6

             [Figure: RAID 6 with block-wise parity generation 1 and 2]
             Disk 1          Disk 2          Disk 3          Disk 4
             A               B               parity 1 A-B    parity 2 A-B
             parity 2 C-D    C               D               parity 1 C-D
             parity 1 E-F    parity 2 E-F    E               F
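The figure's "parity 1" and "parity 2" come from two different parity algorithms, which the slides do not specify. One widely used choice, for example in the Linux md driver, is P = XOR of the data blocks and Q = a Reed-Solomon syndrome over GF(2^8) with generator g = 2 and reduction polynomial 0x11d. The sketch assumes that scheme, ignores the rotating placement shown in the figure, and uses illustrative function names; it computes P and Q and rebuilds two lost data blocks.

```python
"""RAID 6 sketch: P (XOR) and Q (Reed-Solomon over GF(2^8)) parity, assumed scheme."""
from typing import List, Optional, Tuple

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, reduction polynomial (assumption)
GEN = 2       # generator g of the multiplicative group


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (shift-and-add, reduce by POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, since a^255 == 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def pq_parity(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P = D_0 ^ D_1 ^ ...,  Q = g^0*D_0 ^ g^1*D_1 ^ ... (byte-wise)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(GEN, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: List[Optional[bytes]], p: bytes, q: bytes) -> List[bytes]:
    """Rebuild exactly two lost data blocks (marked None) from P and Q.

    Solves D_x ^ D_y = P' and g^x*D_x ^ g^y*D_y = Q', where P' and Q' are
    P and Q with the surviving data blocks folded out.
    """
    x, y = [i for i, d in enumerate(data) if d is None]  # ValueError unless two
    gx, gy = gf_pow(GEN, x), gf_pow(GEN, y)
    inv = gf_inv(gx ^ gy)  # nonzero because g^x != g^y for x != y < 255
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        p_rest, q_rest = p[k], q[k]
        for i, block in enumerate(data):
            if block is not None:
                p_rest ^= block[k]
                q_rest ^= gf_mul(gf_pow(GEN, i), block[k])
        dx[k] = gf_mul(q_rest ^ gf_mul(gy, p_rest), inv)
        dy[k] = p_rest ^ dx[k]
    rebuilt = list(data)
    rebuilt[x], rebuilt[y] = bytes(dx), bytes(dy)
    return rebuilt


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xaa\x55", b"\x00\xff"]
    p, q = pq_parity(disks)
    print(recover_two([disks[0], None, disks[2], None], p, q) == disks)
```

XOR alone cannot distinguish which two blocks are missing; weighting each block with a distinct power g^i in Q gives a second, independent equation, so two unknowns can be solved.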




  RAID 6 (cont.)

             Similar to RAID 5, but using two different parity algorithms to
             generate two sets of parity blocks
             Parity is spread across all disks
             Requires n + 2 disks
             Advantages
                      Higher reliability than RAID 5
                      High data rates
                      Low overhead (compared to mirroring)
                      Equally distributed accesses
             Disadvantages
                      Lower write performance
                      Higher overhead than RAID 5
                      Complex design
                      High controller overhead
             Use cases
                      Applications requiring high reliability
                      Starting to replace RAID 5 in many applications

  Variations and Other Aspects

             Nested RAID levels
                      RAID 1+0
                      RAID 0+1
             Implementation issues
                      Hardware RAID
                      Software RAID
                      RAID implementations found on some mainboards
             Performance considerations



  RAID 1+0

             Stripe set of mirrors
             Sometimes called RAID 10
             Requires 2n disks
             Advantages
                      Combines the advantages of RAID 1 (redundancy) and RAID 0
                      (performance)
                      Simple design
                      High data rates
             Disadvantages
                      High overhead
             Use cases
                      Applications requiring high performance and high reliability






  RAID 0+1

             Mirrored stripe sets
             Sometimes called RAID 01
             Requires 2n disks
             Advantages
                      High performance
                      Tolerates single disk failure (then degrades to RAID 0)
                      Simple design
             Disadvantages
                      High overhead
             Use cases
                      Applications requiring high performance and some reliability
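The four-disk comparison raised on the next slide can be worked with reliability block diagrams. Assuming independent disk failures, identical disk reliability R and no repair: RAID 1+0 is a series connection of two mirrored pairs, R_10 = (1 - (1 - R)^2)^2, while RAID 0+1 is a parallel connection of two striped pairs, R_01 = 1 - (1 - R^2)^2. The sketch evaluates both diagrams with generic series/parallel helpers (names are illustrative).

```python
"""Reliability block diagram sketch: RAID 1+0 vs. RAID 0+1 with four disks."""
from math import prod


def series(*rs: float) -> float:
    """All blocks must work."""
    return prod(rs)


def parallel(*rs: float) -> float:
    """At least one block must work."""
    return 1.0 - prod(1.0 - r for r in rs)


def raid10(r: float) -> float:
    # Stripe (series) over two mirrored pairs (parallel)
    return series(parallel(r, r), parallel(r, r))


def raid01(r: float) -> float:
    # Mirror (parallel) of two stripe sets (series)
    return parallel(series(r, r), series(r, r))


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(f"R={r}: RAID 1+0={raid10(r):.5f}  RAID 0+1={raid01(r):.5f}")
```

Expanding both expressions gives R_10 - R_01 = 2R^2(1 - R)^2, which is never negative: under these assumptions RAID 1+0 is at least as reliable as RAID 0+1. Intuitively, after a first failure RAID 0+1 loses a whole stripe set, so any failure on the other side is fatal, whereas RAID 1+0 only fails if the partner of the failed disk fails.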






  RAID 1+0 vs. RAID 0+1

             With four disks, which one is better?
                      Goal: high reliability
                      Modeling using reliability block diagrams
                      Assumption: all disks have the same reliability
             → Calculate as an exercise!

  Hardware RAID vs. Software RAID

             Hardware RAID
                      RAID algorithm is computed by the controller or an external device
                      Operating system “sees” a single disk device
                      No impact on performance of the main processor
                      RAID array usually only usable with the original controller or a
                      compatible one
                      Hardware design optimized for RAID usage
                      Usually operates on whole disks
             Software RAID
                      RAID algorithm executed by the OS (or one of its drivers) on the
                      main processor
                      Performance of the main processor is affected by RAID
                      RAID driver creates an array out of several devices
                      RAID array can be used with other controllers that use the same
                      drive geometry, as long as the driver is compatible
                      Hardware usually not optimized for RAID usage (number of disks per
                      controller, disks sharing an I/O channel, ...)
                      Usually operates on partitions



  Mainboard RAIDs

             Many modern mainboards feature “RAID” controllers
             Usually not hardware RAID
             RAID algorithms are executed on the main processor by a driver using
             specialized hardware
             Acceleration by specialized hardware depends on the implementation
             Extreme cases are pure software (cheap) and pure hardware
             (expensive) implementations
             Caution: The array usually cannot be used with mainboards of other
             types!

  Performance

             Performance of RAID depends on all components along the data path
                      Disks and the topology of disks (shared bus vs. point-to-point)
                      Controller
                      Bus system of the computer (topology, bandwidth)
                      I/O performance of the system
             → Performance depends on the weakest part!
             Software RAID:
                      Performance of the main processor
                      Trade-off between the performance needed for RAID and for the
                      rest of the system
             Hardware RAID:
                      Is the interface fast enough for the maximum bandwidth of the
                      chosen RAID configuration?




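"Performance depends on the weakest part" can be written as a recursive minimum over the data path: a component passes upward the smaller of its own capacity and the sum of what its inputs deliver. The sketch encodes the poor-design and good-design examples from the following slide as trees. The tree shapes are read from that slide; 150 MB/s per 1.5 Gbit/s SATA link and 250 MB/s per PCI-e x1 controller are the usual effective rates behind its 600 MB/s and 500 MB/s totals, which the slide does not state explicitly. The class and function names are illustrative.

```python
"""Bottleneck sketch: usable RAID throughput along a tree-shaped data path."""
from dataclasses import dataclass, field
from math import inf


@dataclass
class Node:
    """A component with a capacity in MB/s and the components feeding it."""
    name: str
    capacity: float
    children: list["Node"] = field(default_factory=list)


def throughput(node: Node) -> float:
    """Data a node can pass upward: limited by itself and by its inputs."""
    if not node.children:
        return node.capacity
    return min(node.capacity, sum(throughput(c) for c in node.children))


def disk(i: int) -> Node:
    return Node(f"disk{i}", 80)


# Poor design: two disks per 100 MB/s PATA channel, both channels behind 133 MB/s PCI
poor = Node("PCI bus", 133, [
    Node("PATA 0", 100, [disk(0), disk(1)]),
    Node("PATA 1", 100, [disk(2), disk(3)]),
])

# Good design: one 150 MB/s SATA link per disk, two links per 250 MB/s PCI-e x1 controller
good = Node("system", inf, [
    Node("PCI-e x1 #0", 250, [Node("SATA 0", 150, [disk(0)]),
                              Node("SATA 1", 150, [disk(1)])]),
    Node("PCI-e x1 #1", 250, [Node("SATA 2", 150, [disk(2)]),
                              Node("SATA 3", 150, [disk(3)])]),
])

if __name__ == "__main__":
    print("poor:", throughput(poor), "MB/s of 320 MB/s raw disk bandwidth")
    print("good:", throughput(good), "MB/s of 320 MB/s raw disk bandwidth")
```

In the poor design the PCI bus caps the array at 133 MB/s; in the good design every stage has headroom, so the disks themselves (4 x 80 MB/s) are the limit.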


  Performance (cont.)

             Example of poor design:
                      Four EIDE disks delivering 80 MB/s each
                      → Overall throughput: 320 MB/s
                      Each pair shares a 100 MB/s PATA channel
                      → Overall throughput: 200 MB/s
                      Each PATA channel is connected to a PCI controller
                      → Overall throughput: 133 MB/s
                      The PCI bus limits the theoretical peak throughput
             Only one third of the disk bandwidth is usable in parallel operation.
                      Reality: much slower (overhead, other devices use PCI too, ...)

             Example of good design:
                      Four SATA disks delivering 80 MB/s each
                      → Overall throughput: 320 MB/s
                      Each device uses a 1.5 Gbit/s SATA 1 link
                      → Overall throughput: 600 MB/s
                      Two devices share a PCI-e x1 controller
                      → Overall throughput: 500 MB/s
             In parallel operation, the total disk bandwidth can be used.

  Reliability

             Assumption (fault model):
                      A disk fault results in a read error
                      The controller recognizes the read error and locates the faulty disk
                      The RAID continues to work in degraded mode
             Problem
                      In reality, disk faults often result in errors only when the
                      affected data is accessed
                      Faults on seldom-used parts of a disk may exist for a long time
                      without resulting in an error
                      Result: The risk of data loss increases, as the next error forces
                      a rebuild of the RAID and the hidden fault then causes a second error
             Solution: “Disk scrubbing”


  Disk Scrubbing

           Idea:
           Access all sectors of all disks of the RAID system at regular intervals
           Implemented by forced rebuilding of the array
                    Detects hidden faults (→ error while accessing a disk)
                    Administrator replaces the affected disk
                    Reduces the probability of a double disk error
                    Good practice: once per month
           Linux Software-RAID:
           echo check > /sys/block/mdX/md/sync_action

  Spare disks

           Problem:
           Most RAID implementations have very low reliability in degraded mode
                    Fast repair is necessary → human activity
                    Hardware RAID and some Software RAIDs: Hot swap is possible
                    But: a low MTTR leads to high cost
           Solution: Spare disks
                    Additional disk(s) that are not part of the array
                    In case of failure, the controller uses the spare disk (cold standby) instead of the failed disk and starts reconstruction of the array
                    Results in an intact array
                    The defective disk is replaced later and becomes the new spare disk
           But: Using a RAID level with higher fault tolerance is better, due to the critical phase during reconstruction
           E.g., RAID 5 + Spare vs. RAID 6
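The scrubbing idea can be sketched for the simplest redundant layout, a two-way mirror (RAID 1). This is a minimal model under stated assumptions, not the Linux md implementation: each disk is a Python list of sector contents, a latent (hidden) sector fault is modelled as `None`, and a bad sector is repaired by rewriting it from the intact mirror copy. The function name `scrub_mirror` is hypothetical.

```python
# Minimal disk-scrubbing sketch for a two-way mirror (RAID 1).
# Assumptions (illustrative, not from the lecture): a disk is a list of
# sector contents, None models an unreadable (latent) sector, and repair
# means rewriting the sector from the other copy.
from typing import List, Optional

Sector = Optional[bytes]  # None = latent sector error


def scrub_mirror(disk_a: List[Sector], disk_b: List[Sector]) -> List[int]:
    """Read every sector on both disks, repair latent errors from the
    mirror copy, and return the indices that could not be repaired."""
    unrecoverable: List[int] = []
    for i, (a, b) in enumerate(zip(disk_a, disk_b)):
        if a is None and b is None:
            unrecoverable.append(i)  # double fault in the same sector: data lost
        elif a is None:
            disk_a[i] = b  # hidden fault on disk A, repaired from disk B
        elif b is None:
            disk_b[i] = a  # hidden fault on disk B, repaired from disk A
    return unrecoverable


a = [b"x", None, b"z"]
b = [b"x", b"y", None]
print(scrub_mirror(a, b))  # both hidden faults are repaired, nothing is lost
```

Running the scrub regularly keeps single hidden faults from accumulating until a real disk failure turns them into the double fault that the unrecoverable list reports.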

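Why RAID 6 is preferred over RAID 5 + Spare can be made concrete with a rough calculation of the critical reconstruction phase. The sketch below assumes six disks in both setups (RAID 5 with five disks plus one spare, RAID 6 with six disks), independent failures with a constant failure rate 1/MTTF, and it ignores latent sector errors. The MTTF and rebuild time in the example are made-up values, not figures from the lecture.

```python
# Rough comparison of data-loss probability during the rebuild window.
# Assumptions (illustrative): independent disk failures with exponentially
# distributed lifetimes (rate 1/MTTF), latent sector errors ignored,
# RAID 5 with 5 disks + 1 spare vs. RAID 6 with 6 disks.
import math


def p_fail(rebuild_hours: float, mttf_hours: float) -> float:
    """Probability that one disk fails within the rebuild window."""
    return 1.0 - math.exp(-rebuild_hours / mttf_hours)


def p_loss_raid5(n_remaining: int, p: float) -> float:
    """Degraded RAID 5: any further failure during the rebuild loses data."""
    return 1.0 - (1.0 - p) ** n_remaining


def p_loss_raid6(n_remaining: int, p: float) -> float:
    """RAID 6 after one failure: data is lost only if two or more of the
    remaining disks fail before the rebuild completes."""
    p_none = (1.0 - p) ** n_remaining
    p_one = n_remaining * p * (1.0 - p) ** (n_remaining - 1)
    return 1.0 - p_none - p_one


p = p_fail(rebuild_hours=24, mttf_hours=500_000)  # assumed values
print(f"RAID 5 + spare (4 disks left): {p_loss_raid5(4, p):.2e}")
print(f"RAID 6         (5 disks left): {p_loss_raid6(5, p):.2e}")
```

The spare only shortens the time until the rebuild starts; during the rebuild itself the degraded RAID 5 has no redundancy left, while RAID 6 still tolerates one more failure.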



  7.3 S.M.A.R.T.
  What is S.M.A.R.T.?

           S.M.A.R.T. = Self-Monitoring, Analysis and Reporting Technology
           Technology to predict hard disk failures based on runtime observation of parameters
           Implemented in the firmware of the disks
           Supported by BIOS/EFI and operating system
           S.M.A.R.T. tools are available for all common platforms
           Linux: smartmontools (smartctl)

  Idea

           Hard disks in many cases do not fail suddenly
           Failure is caused by slow changes of components
                    Mechanical wear-out
                    Accelerated aging due to high temperature
                    “Normal” aging
           Changes have an impact on measurable parameters
           Prediction by observation and analysis of these parameters
                    Field data helps to increase prediction quality
                    But: Individual disks may behave differently!
                    Problem: New technologies
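The idea of predicting failures from slowly changing parameters can be illustrated with a simple trend extrapolation. This is a hypothetical sketch, not a manufacturer's prediction algorithm: it assumes chronologically ordered samples of one counter (for example the reallocated sector count), a linear trend, and an operator-chosen `limit`; the function name `hours_until_limit` is made up for illustration.

```python
# Hypothetical trend-based failure prediction for one S.M.A.R.T. counter.
# Assumptions: samples are (operating hour, raw value) pairs in chronological
# order, the trend is linear, and `limit` is chosen by the operator.
from typing import List, Optional, Tuple


def hours_until_limit(samples: List[Tuple[float, float]],
                      limit: float) -> Optional[float]:
    """Fit value = intercept + slope * t by least squares and return the
    hours from the last sample until the fitted line reaches `limit`,
    or None if there is no increasing trend."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        return None  # all samples at the same time: no trend
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None  # counter is not growing
    intercept = mean_v - slope * mean_t
    t_hit = (limit - intercept) / slope
    return max(0.0, t_hit - samples[-1][0])


# Reallocated sectors grow by 1 per hour; limit 50 is reached 30 h after t = 20.
print(hours_until_limit([(0, 0), (10, 10), (20, 20)], limit=50))
```

As the slide notes, such a model is only as good as its assumptions: individual disks and new technologies may not follow the observed trend.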






  Parameters

           Use of quite varying parameters
           Attributes suggested in the ATA-3 draft never became part of the standard
           Typical parameters include, amongst others:
                    Seek Error Rate = rate of seek (head positioning) errors
                    Hardware ECC Recovered = corrected read bit errors
                    Throughput Performance
                    Spin Up Time
                    Reallocated Sector Count = number of defective (remapped) sectors
                    Temperature
                    Calibration Retry Count

  Interface

           Part of the standard since ATA-3
           Also for other interfaces (SATA, SCSI)
           ATA-3
                    Draft: list of 30 attributes
                    Values: 1–254
                    Threshold
           ATA-4
                    Only “OK” or “NOT OK”
           ATA-5
                    Error log
                    Interface for self-test
           ATA-6
                    Selective tests
           ATA-8
                    Draft suggested attributes again, but they are still not part of the standard
                    Alternative (non-SMART) way to determine temperature
           Many implementations remain backward compatible by still implementing attribute lists
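Since ATA-4 the standardized result is only “OK” or “NOT OK”. On Linux this verdict is printed by `smartctl -H <device>`. The sketch below maps that text to a boolean; it assumes the wording smartctl commonly prints for ATA drives (“SMART overall-health self-assessment test result: PASSED”) and SCSI drives (“SMART Health Status: OK”). Because the exact output can differ between smartctl versions and device types, unrecognized text yields `None` instead of a guess. The function name `health_ok` is hypothetical.

```python
# Sketch: reduce `smartctl -H` output to the ATA-4 style OK / NOT OK verdict.
# Assumption: the verdict line uses one of the two prefixes below; other
# wording is reported as None (unknown) rather than guessed.
from typing import Optional

PREFIXES = ("SMART overall-health self-assessment test result:",
            "SMART Health Status:")


def health_ok(report: str) -> Optional[bool]:
    for line in report.splitlines():
        line = line.strip()
        for prefix in PREFIXES:
            if line.startswith(prefix):
                verdict = line[len(prefix):].strip().upper()
                return verdict.startswith(("PASSED", "OK"))
    return None  # no verdict line found


print(health_ok("SMART overall-health self-assessment test result: PASSED"))
```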


  S.M.A.R.T. Attributes

           Value (1–253) is calculated from raw data based on
                    Statistical data
                    Experience of the manufacturer
           Interpretation
                    value > threshold: drive OK
                    Else: failure expected within the next 24 hours, or drive beyond its expected lifetime
           Raw data: manufacturer-dependent
           Often:
                    Logging of extreme values
                    Manufacturer-defined attributes

  Self-tests

           Two types:
                    Short (1–2 minutes)
                    Long (approx. one hour)
                    Starting with ATA-6/7 also selective tests
           Executed in parallel to normal operation of the drive
                    May reduce performance
                    Introduce additional stress on the drive
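The interpretation rule (value > threshold: drive OK) can be applied to the attribute table that `smartctl -A <device>` prints. The sketch assumes the usual column layout of that table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the `SAMPLE` text is illustrative and not real drive data. Raw values are kept as strings because their format is manufacturer-specific.

```python
# Sketch: evaluate normalized S.M.A.R.T. attribute values against thresholds.
# Assumptions: `table` follows the `smartctl -A` column layout; SAMPLE below
# is illustrative, not data from a real drive.
from dataclasses import dataclass
from typing import List


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int   # normalized value
    worst: int   # worst normalized value seen so far
    thresh: int  # manufacturer threshold
    raw: str     # raw data, manufacturer-specific format

    @property
    def ok(self) -> bool:
        # Interpretation from the slide: value > threshold means "drive OK".
        return self.value > self.thresh


def parse_attributes(table: str) -> List[Attribute]:
    attrs: List[Attribute] = []
    for line in table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(Attribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=" ".join(fields[9:]),
        ))
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 1520
194 Temperature_Celsius     0x0022   036   050   000    Old_age   Always   -           36 (Min/Max 18/50)
"""

for attr in parse_attributes(SAMPLE):
    status = "OK" if attr.ok else "threshold reached"
    print(f"{attr.attr_id:3d} {attr.name:<24} {status}")
```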

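The three self-test types correspond to smartctl's `-t short`, `-t long` and `-t select,START-END` options, and finished results are read with `-l selftest`. The sketch below only builds the argument lists; it assumes smartmontools is installed, and the device path (for example `/dev/sda`) is a placeholder supplied by the caller. Actually starting a test, e.g. with `subprocess.run(argv)`, usually requires root privileges.

```python
# Sketch: build smartctl command lines for the self-test types on the slide.
# Assumptions: smartmontools is installed; the device path is a placeholder.
# The functions only build argv lists and do not run anything.
from typing import List, Optional, Tuple


def selftest_argv(device: str, kind: str,
                  lba_range: Optional[Tuple[int, int]] = None) -> List[str]:
    if kind in ("short", "long"):
        return ["smartctl", "-t", kind, device]
    if kind == "select":
        # Selective test over a range of logical block addresses.
        if lba_range is None:
            raise ValueError("selective test needs an LBA range")
        start, end = lba_range
        return ["smartctl", "-t", f"select,{start}-{end}", device]
    raise ValueError(f"unknown self-test type: {kind}")


def selftest_log_argv(device: str) -> List[str]:
    # Tests run in the background; results appear in the self-test log.
    return ["smartctl", "-l", "selftest", device]


print(" ".join(selftest_argv("/dev/sda", "short")))
```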





  Using S.M.A.R.T.

           S.M.A.R.T. supports system-wide failure prediction and prevention
           S.M.A.R.T. delivers warnings but takes no actions
                    → User or operating system is responsible for actions
           S.M.A.R.T. is not infallible
                    Failure may not occur, although a warning was issued
                    Failure may occur without a warning

  Field Study by Google:

           More than 100,000 disks from different manufacturers
           80–400 GB, PATA and SATA
           Disks up to five years old
           Continuous operation: 24/7 in server rooms
           Burn-in at beginning (not considered in study)
           Source (including figures used here):
           http://research.google.com/archive/disk_failures.pdf
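Because S.M.A.R.T. only delivers warnings, the reaction has to be defined by the operating system or the administrator. The following is a hypothetical policy sketch: the action strings are placeholders for site-specific procedures, not features of any tool, and the inputs are assumed to come from a health check and an attribute evaluation.

```python
# Hypothetical reaction policy: S.M.A.R.T. reports, the OS/user acts.
# The action strings are placeholders for site-specific procedures.
from typing import List, Optional


def actions_for(health_ok: Optional[bool], failing_attrs: List[str]) -> List[str]:
    if health_ok is None:
        # No usable S.M.A.R.T. data: the disk's state is unknown.
        return ["investigate missing S.M.A.R.T. data"]
    if not health_ok or failing_attrs:
        # A warning is no certain prediction, but acting early is cheap.
        return ["notify administrator", "verify backups", "schedule disk replacement"]
    # No warning does not guarantee survival, so redundancy (RAID)
    # and backups remain necessary anyway.
    return []


print(actions_for(True, ["Reallocated_Sector_Ct"]))
```

The empty result for a healthy report deliberately triggers nothing, reflecting that S.M.A.R.T. is not infallible: a disk may still fail without any warning.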






  Field Study by Google: Criteria

           Annual failure rate (AFR) depends on...
                    Age of disks
                    Load
                    Temperature
                    S.M.A.R.T. values
           Manufacturer is not considered
           But: Only minor dependence on disk model

  Field Study: Age of Disks
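The annual failure rate used as the study's main criterion relates failures to the accumulated operating time of the disk population. The sketch assumes one common definition (failures per disk-year of observed operation, with a year of 24/7 operation counted as 8760 hours); the numbers in the example are made up for illustration and are not results of the study.

```python
# Sketch: annual failure rate (AFR) from field data.
# Assumption: AFR = failures per disk-year of observed operation,
# with one disk-year = 8760 hours of 24/7 operation.
HOURS_PER_YEAR = 24 * 365


def afr(failures: int, disk_hours: float) -> float:
    """Return the AFR as a fraction (0.02 means 2 % per year)."""
    disk_years = disk_hours / HOURS_PER_YEAR
    if disk_years <= 0:
        raise ValueError("no observed operating time")
    return failures / disk_years


# Made-up example: 1000 disks running 24/7 for one year, 20 failures.
print(f"{afr(20, 1000 * HOURS_PER_YEAR):.1%}")
```

Normalizing by operating time rather than by the number of disks lets populations with different sizes and observation periods, such as the age groups in the study, be compared directly.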






  Field Study: Load of Disks

  Field Study: Temperature






  Field Study: S.M.A.R.T. Attributes

  Predictability

           Correlation to actual failures exists
           But:
                    56% of failed disks do not show errors in the four most important counters
                    36% of failed disks do not have any error in their S.M.A.R.T. data
           Conclusion:
                    S.M.A.R.T. alone is not sufficient for reliable failure prediction
                    Other data (e.g., from the operating system or performance data) has to be considered too
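The two percentages translate directly into upper bounds on the recall (the share of failed disks flagged before failure) of any predictor that raises an alarm only when S.M.A.R.T. data shows an error. Failed disks without any such signal are missed no matter how the thresholds are tuned:

```latex
\text{recall}_{\text{four counters}} \le 1 - 0.56 = 0.44,
\qquad
\text{recall}_{\text{all S.M.A.R.T. data}} \le 1 - 0.36 = 0.64
```

So even a perfect S.M.A.R.T.-only predictor would miss at least about a third of the failures in this population.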






  Literature



            [PGK88]                David A. Patterson, Garth Gibson, and Randy H. Katz. “A case
                                   for redundant arrays of inexpensive disks (RAID)”. In: SIGMOD
                                   ’88: Proceedings of the 1988 ACM SIGMOD international
                                   conference on Management of data. Chicago, Illinois, United
                                   States: ACM, 1988, pp. 109–116. isbn: 0-89791-268-3. doi:
                                   http://doi.acm.org/10.1145/50202.50214

            [PWB07]                Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz A. Barroso.
                                   “Failure Trends in a Large Disk Drive Population”. In:
                                   Proceedings of the 5th USENIX Conference on File and Storage
                                   Technologies (FAST’07). 2007, pp. 17–28. url:
                                   http://www.usenix.org/events/fast07/tech/pinheiro.html



