Parity Lost and Parity Regained FAST Talk

            Parity Lost and Parity Regained

   Andrew Krioukov, Lakshmi N. Bairavasundaram,
   Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
   University of Wisconsin - Madison

   Garth R. Goodson, Kiran Srinivasan, Randy Thelen

                Bare-bones RAID
•   Stripe data across multiple drives
•   Store redundant parity data
•   Can reconstruct data with any single disk failure
•   Will RAID protect data in all single failure cases?

          Data 1    Data 2    Data 3    Parity

            A         B         C       P(ABC)
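
The reconstruction property on this slide can be illustrated with a minimal sketch, assuming parity is the bytewise XOR of the data blocks. The slides do not name the parity function; XOR is the usual choice for single-parity RAID. The helper `xor_blocks` and the four-byte blocks are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Toy stripe: three data blocks and their parity P(ABC).
a, b, c = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([a, b, c])

# The disk holding C fails completely: rebuild C from the survivors.
rebuilt_c = xor_blocks([a, b, parity])
print(rebuilt_c == c)  # True
```

The same call rebuilds any one missing block, which is why a single complete disk failure is survivable.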


          Bare-bones RAID Problems
 • Stripe contains file ABC consisting of 3 blocks
 • RAID has redundancy to recover data
 • RAID does not detect corruption
           Data 1   Data 2    Data 3      Parity

 RAID        A        B        @#$%       P(ABC)
 Stripe                    (C corrupted)

Read file ABC → Return Corrupt File
      Bare-bones RAID Problems
• RAID cannot detect partial disk failures:
  – Corruptions
  – Torn writes
  – Lost writes
  – Misdirected writes
• RAID only protects against
  – Complete disk failures
  – Errors reported by the disk (e.g. Latent Sector
    Errors)
     Data Protection Techniques
• Need improvements to bare-bones RAID
  – Techniques needed to help detect errors
• Checksums are common
  – Many kinds: block, sector, parent checksums
• Which types of checksums are used?
• We examined real systems to determine their
  protection schemes
            Enterprise RAID Systems
  • Mixed bag of protections
                  Scrub   Sector  Block   Parent  Write   Phys    Logical Write
                          Cksum   Cksum   Cksum   Verify  Ident   Ident   Stamp
Dell Powervault   √       √                                               √
Hitachi Thunder   √       √                       √
NetApp ONTAP      √               √               √       √       √
Sun ZFS           √                       √
                 Question
• Which errors do these systems protect against?


• How can we ensure complete data protection?


• Need a method to identify all corruption and data
  loss scenarios in a design
       Model Checking Solution
• Create a model of the storage system design using
  primitives
• Checker exhaustively searches space of all
  possible states
  – Start with clean RAID stripe
  – Apply single disk error
  – Apply any number of disk operations (e.g. write)
• Identifies all possible data loss scenarios
            Results Summary
• Applied model checking to enterprise RAID
  system designs
• For all designs, a single error can cause data
  loss
• Identified a common problem: parity pollution
  – Partial disk failure goes undetected
  – The erroneous data is used to compute parity
  – Recovery is no longer possible
• Presented a design that protects against all
  single failures
                    Outline
•   Introduction
•   Background: Storage Errors
•   Model Checking Approach
•   Data Protection Design & Analysis
•   Conclusion




                Storage Errors
• Latent Sector Errors
  – Data is inaccessible
  – Explicit error code returned
  – Affect 19% of nearline, 2% of enterprise disks in 2
    years [Bairavasundaram et al. SIGMETRICS’07]
• Corruptions
  – Data is silently corrupted
  – Affect 0.6% of nearline and 0.06% of enterprise
    disks in 17 months [Bairavasundaram et al. FAST’08]
• Reality: Partial disk failures happen
         Storage Errors (Cont’d)
• Torn Write
  – Only part of a block is written
  – Some sectors are lost
  – Write returns success code

      Write B → Success    (block on disk: part A, part B)

• Lost Writes
  – Write returns success code
  – Data not reflected on disk

      Write B → Success    (block on disk: still A)
        Storage Errors (Cont’d)
• Misdirected Writes
  – Write goes to wrong location
    (either wrong block or wrong disk)
  – Combination of lost write
    and corruption

      Overwrite A → A’: Success
      (on disk: A is unchanged, B is overwritten by A’)
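
The three write failures above can be mimicked with a small fault-injecting disk. This is a toy sketch, not any real driver API: the class `FaultyDisk`, the `fault` and `wrong_blk` parameters, the sector count, and the choice that a torn write lands its first half are all assumptions made for illustration.

```python
SECTORS_PER_BLOCK = 8  # assumption: e.g. a 4 KB block of 512-byte sectors

class FaultyDisk:
    """Toy disk whose writes always report success, even when they go wrong."""

    def __init__(self, num_blocks):
        self.blocks = [[f"old{b}"] * SECTORS_PER_BLOCK for b in range(num_blocks)]

    def write(self, blk, sectors, fault=None, wrong_blk=None):
        if fault == "lost":
            pass                                      # nothing reaches the media
        elif fault == "torn":
            half = SECTORS_PER_BLOCK // 2             # assumption: first half lands
            self.blocks[blk][:half] = sectors[:half]
        elif fault == "misdirected":
            self.blocks[wrong_blk] = list(sectors)    # wrong block is overwritten
        else:
            self.blocks[blk] = list(sectors)
        return True                                   # success code in every case

    def read(self, blk):
        return list(self.blocks[blk])

disk = FaultyDisk(num_blocks=2)
new = ["new"] * SECTORS_PER_BLOCK

ok = disk.write(0, new, fault="misdirected", wrong_blk=1)
# Block 0 still holds old data (a lost write); block 1 was clobbered (a corruption).
print(ok, disk.read(0)[0], disk.read(1)[0])  # True old0 new
```

The caller sees the same success code in every case, which is what makes these failures partial and silent.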
       Modeling Storage System
• Use primitives to describe:
  – On-disk layout in terms of sectors
  – Data protections
• Checker uses built-in models:
  – Storage errors
  – Disk operations (e.g. Read/Write)
  – Basic RAID functionality


             Model Checking
• Assumptions
  – Single RAID stripe
  – Single storage error
  – Single parity protection
  – Data disks are interchangeable
• Apply error followed by any number of disk
  operations
• Generate state diagram with all data loss states
                   State Diagram Example
   • Bare-bones RAID state diagram


   [State diagram.
    States: Clean, Parity Error, Disk x Error, Polluted Parity, Corrupt Data.
    Transition labels: Corrupt(p), Torn(p), Lost(p), Misdir(p);
    Corrupt(x), Torn(x), Lost(x), Misdir(x);
    Wadd(), Wadd(x+), Wadd(!x), Wsub(x+), W(x+), R(x).]
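
The search described in the model checking slides (clean stripe, one error, then any number of operations) can be sketched as a breadth-first exploration. This is not the authors' checker: it is a toy model under stated assumptions, with two data disks, one-bit blocks, XOR parity, only corruption and lost-write errors, and only write operations. The names `errors`, `writes`, `classify`, and `explore` are hypothetical, and the labels `Wsub(i)` and `Wadd(i)` here mean a subtractive or additive parity write to disk `i`.

```python
from collections import deque
from itertools import product

N = 2            # data disks in the toy stripe (assumption)
VALUES = (0, 1)  # one-bit blocks keep the state space tiny

def xor(bits):
    out = 0
    for bit in bits:
        out ^= bit
    return out

def put(blocks, i, v):
    return blocks[:i] + (v,) + blocks[i + 1:]

def errors(state):
    """Single storage errors applied to a clean stripe."""
    disk, parity, want = state
    for i in range(N):
        yield f"Corrupt({i})", (put(disk, i, disk[i] ^ 1), parity, want)
        # Lost write: parity and intended contents change, the data block does not.
        v = disk[i] ^ 1
        yield f"Lost({i})", (disk, parity ^ disk[i] ^ v, put(want, i, v))
    yield "Corrupt(p)", (disk, parity ^ 1, want)

def writes(state):
    """Fault-free writes, with parity computed both ways."""
    disk, parity, want = state
    for i, v in product(range(N), VALUES):
        new_disk, new_want = put(disk, i, v), put(want, i, v)
        # Subtractive: uses the old on-disk block i and the old parity.
        yield f"Wsub({i})", (new_disk, parity ^ disk[i] ^ v, new_want)
        # Additive: uses the other on-disk data blocks.
        others = xor(disk[j] for j in range(N) if j != i)
        yield f"Wadd({i})", (new_disk, others ^ v, new_want)

def classify(state):
    disk, parity, want = state
    if disk == want:
        return "clean" if parity == xor(disk) else "parity error"
    # A data block is wrong; parity can repair it only if parity is still good.
    return "polluted parity" if parity == xor(disk) else "disk error"

def explore():
    clean = ((0,) * N, 0, (0,) * N)
    queue = deque((state, [label]) for label, state in errors(clean))
    seen, found = set(), {}
    while queue:
        state, trace = queue.popleft()
        if state in seen:
            continue
        seen.add(state)
        found.setdefault(classify(state), trace)  # shortest trace per class
        for label, nxt in writes(state):
            queue.append((nxt, trace + [label]))
    return found

for kind, trace in explore().items():
    print(f"{kind:16} <- {' -> '.join(trace)}")
```

In this toy model the shortest route to polluted parity is a corruption on one disk followed by an additive parity write to the other disk, because the additive write reads the bad block to compute the new parity.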
         Data Protection Design
• Need fault tolerance for all partial failures
• Bare-bones RAID handles latent sector errors
  and complete disk failures
• Corruption is the next most common failure
• Add protections cumulatively until design has
  complete protection
                  Protections
Protections in red will be discussed in the talk
• Scrubbing
• Sector checksums
• Block checksums
• Parental checksums
• Write verify
• Physical identity
• Logical identity
• Version mirroring
                Checksums
• Checksum per data block

         A
      cksum(A)

• Checksum per sector (block A split into sectors a1, a2, …)

         a1
       ck(a1)
         a2
       ck(a2)
         …

• Parent checksum
  – Checksum stored in parent inode
                    Checksum Example
  • Corruption scenario is now fixed
            Data 1     Data 2        Data 3      Parity

               A          B           @#$%       P(ABC)
            cksum(A)   cksum(B)      cksum(C)    cksum(P)
                                  (C corrupted)

Read file ABC → Perform reconstruction (recovers C)
File is valid
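
The read path in this example can be sketched as follows. It is a minimal illustration under assumptions: CRC-32 stands in for the unspecified checksum function, parity is bytewise XOR, and the checksums are kept in a separate list for brevity. `read_block` and `xor_blocks` are hypothetical names.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def read_block(stripe, cksums, parity, i):
    """Return block i, rebuilding it from parity if its checksum fails."""
    if zlib.crc32(stripe[i]) == cksums[i]:
        return stripe[i]
    others = [blk for j, blk in enumerate(stripe) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if zlib.crc32(rebuilt) != cksums[i]:
        raise IOError("data loss: reconstruction does not match checksum")
    return rebuilt

stripe = [b"AAAA", b"BBBB", b"CCCC"]
cksums = [zlib.crc32(blk) for blk in stripe]  # CRC-32 is an assumption
parity = xor_blocks(stripe)

stripe[2] = b"@#$%"                           # silent corruption of Data 3
print(read_block(stripe, cksums, parity, 2))  # b'CCCC'
```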
          Checksum Problems
• Great for protecting against corruption errors
• Fails to protect when data and checksum are
  lost together:
  – Lost write (with any type of checksums)
  – Torn write (only with sector checksums)
• Parity pollution can occur
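
A small sketch of the two blind spots listed above, assuming CRC-32 as the checksum and a two-sector block. The sector size, the helper names, and the choice that a torn write lands only its first sector are all illustrative assumptions.

```python
import zlib

SECTOR = 4  # bytes per sector in this toy example (assumption)

def split(block):
    return [block[i:i + SECTOR] for i in range(0, len(block), SECTOR)]

def sector_checksums_ok(stored):
    """stored: list of (sector, checksum) pairs, one checksum per sector."""
    return all(zlib.crc32(sec) == ck for sec, ck in stored)

def block_checksum_ok(stored):
    """stored: (list of sectors, one checksum over the whole block)."""
    secs, ck = stored
    return zlib.crc32(b"".join(secs)) == ck

old, new = b"aaaabbbb", b"ccccdddd"
old_pairs = [(s, zlib.crc32(s)) for s in split(old)]
new_pairs = [(s, zlib.crc32(s)) for s in split(new)]

# Torn write: only the first sector of the new block reaches the disk.
torn_sectors = new_pairs[:1] + old_pairs[1:]
torn_block = (split(new)[:1] + split(old)[1:], zlib.crc32(new))

# Lost write: nothing reaches the disk, so old data and old checksum remain.
lost_block = (split(old), zlib.crc32(old))

print(sector_checksums_ok(torn_sectors))  # True: torn write goes unnoticed
print(block_checksum_ok(torn_block))      # False: torn write is detected
print(block_checksum_ok(lost_block))      # True: lost write goes unnoticed
```

Each sector of a torn block is internally valid, so per-sector checksums pass; a lost write leaves a block that is stale but self-consistent, so no checksum stored with the data can flag it.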



  Checksum Problems – Lost Write
• Block checksums
          Data 1     Data 2       Data 3     Parity

             A          B            C       P(ABC’)
          cksum(A)   cksum(B)     cksum(C)   cksum(P)
                                (Lost Write)

Overwrite C→C’ (write to Data 3 is lost; parity becomes P(ABC’))
Read file ABC’
  Return data (ABC)
Return Corrupt Data (C instead of C’)
                 Write Verify
• Attempt to solve lost write problem
• Costly solution, expect good protection
• Procedure:
  1. Write data to disk
  2. Read back to verify
  3. If lost write detected, write again
     or remap to new location

     Overwrite C→C’: Lost Write (disk still holds C, cksum(C))
     Read back (C)
     Lost write detected, write C’ again: Success (disk holds C’, cksum(C’))
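
The three-step procedure above might look like this in code. It is a hedged sketch: `LossyDisk` is a hypothetical test double that drops a fixed number of writes, `write_verify` and `max_attempts` are invented names, and the remap step is only signalled by an exception rather than implemented.

```python
class LossyDisk:
    """Toy disk that silently drops its next `drops` writes."""

    def __init__(self, drops):
        self.blocks = {}
        self.drops = drops

    def write(self, blk, data):
        if self.drops > 0:
            self.drops -= 1          # lost write, yet success is reported
        else:
            self.blocks[blk] = data
        return True

    def read(self, blk):
        return self.blocks.get(blk)

def write_verify(disk, blk, data, max_attempts=3):
    """Write, read back, and rewrite on mismatch. Returns the I/O count."""
    ios = 0
    for _ in range(max_attempts):
        disk.write(blk, data)        # step 1: write data to disk
        ios += 1
        readback = disk.read(blk)    # step 2: read back to verify
        ios += 1
        if readback == data:
            return ios
        # step 3: mismatch means the write was lost, so loop and write again
    raise IOError("write could not be verified; remap to a new location")

disk = LossyDisk(drops=1)
disk.blocks[3] = b"C"
print(write_verify(disk, 3, b"C'"))  # 4
```

Even when nothing goes wrong, every write costs one extra read, so the I/O count per write is at least two.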
         Write Verify Problems
• Protects against lost writes
• Susceptible to misdirected writes
  – Cannot detect/recover the overwritten data




     Write Verify – Misdirected Write
                    Data 1      Data 2     Data 3     Parity

 Stripe 1              X           Y          Z       P(XYZ)

 Stripe 2              A           B          C       P(ABC)
                    cksum(A)    cksum(B)   cksum(C)   cksum(P)

 Initially…
 Overwrite X→X’: Misdirected Write lands on A’s block in stripe 2,
                 which now holds X’ and cksum(X’); parity of stripe 1
                 becomes P(X’YZ)
   Read back X
   Lost, Re-write X’
 Later…
   Read file ABC
   Return Corrupt Data (A has been corrupted)
             Physical Identity
• Protection against misdirected writes
• Store disk & block number of destination in
  each block


            Block 1:   A    1      (Data, Block Number)
            Block 2:   B    2

Overwrite Block 1: A → A’
      Misdirected Write lands on Block 2, which now holds (A’, 1)
      Read Block 2
    Returned (A’, 1)
                          Block num does not match (1≠2)
                          Misdirected Write Detected
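
A minimal sketch of the identity check in this example, assuming each block is stored together with the disk and block number it was meant for. The dictionary layout and the names `make_block`, `read_checked`, and `MisdirectedWrite` are hypothetical.

```python
class MisdirectedWrite(Exception):
    pass

def make_block(data, disk_no, blk_no):
    """Embed the intended destination alongside the data."""
    return {"data": data, "disk": disk_no, "blk": blk_no}

def read_checked(blocks, disk_no, blk_no):
    """Return the data only if the stored identity matches the location read."""
    block = blocks[blk_no]
    if (block["disk"], block["blk"]) != (disk_no, blk_no):
        raise MisdirectedWrite(
            f"block {blk_no} carries identity ({block['disk']}, {block['blk']})")
    return block["data"]

# Disk 0 with blocks 1 and 2, as in the slide.
blocks = {1: make_block("A", 0, 1), 2: make_block("B", 0, 2)}

# The overwrite of block 1 with A' is misdirected and lands on block 2.
blocks[2] = make_block("A'", 0, 1)

try:
    read_checked(blocks, 0, 2)
except MisdirectedWrite as err:
    print("detected:", err)
```

Note that block 1 still holds the stale A with a matching identity, so this check catches the overwritten victim but not the missing write at the intended location.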
             Problem Solved?
• Write verify with block checksums and
  physical identity offers complete protection
• But… twice the I/O cost!
• Need a more efficient solution




              Logical Identity
• Less expensive protection against lost writes
• Store file identifier (e.g. inode number) in
  each data block
• Test that file identifier
  matches on a read

                 A
             cksum(A)
              File 0

       Overwrite File 0
         with File 1 (X): Lost Write
            Read File 1: Logical ID does not match.
                         Lost Write Detected
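
The read-time test above can be sketched in a few lines, assuming each block is stored as a (data, file identifier) pair. The block number 7, the `block_map` dictionary, and `read_file_block` are hypothetical.

```python
def read_file_block(disk, blk_no, file_id):
    """Return the block's data only if it belongs to the file being read."""
    data, owner = disk[blk_no]
    if owner != file_id:
        raise IOError(f"logical ID mismatch (File {owner} != File {file_id})")
    return data

disk = {7: ("A", 0)}   # block 7 holds A, tagged with File 0

# File 1 overwrites block 7 with X, but the write is lost: the disk is unchanged.
block_map = {1: [7]}   # file system metadata now says File 1 owns block 7

try:
    read_file_block(disk, block_map[1][0], file_id=1)
except IOError as err:
    print(err)  # logical ID mismatch (File 0 != File 1)
```

The check needs to know which file is being read, so it can run on a file read but not inside the RAID layer.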
       Logical Identity Problem
• Cannot be verified when re-computing parity
  – Not reading a file

• Parity pollution may occur




            Parity Pollution Example
                      Data 1      Data 2      Data 3      Parity

   What should          A’          B’          C’        P(A’B’C’)
   be on the disk     File 2      File 2      File 1

   On disk              A’          B’          C         P(A’B’C)
                     cksum(A’)   cksum(B’)   cksum(C)     cksum(P)
                      File 2      File 2      File 0

Write File 1
 C→C’, New Parity P(ABC’); Lost Write on Data 3
Later… Write File 2
Overwrite AB →A’B’
Parity: Read Data 3 (stale C), giving P(A’B’C)
 Parity consistent with invalid data
Later… Read File 1
 Logical ID mismatch (File 0 ≠ File 1)
 Reconstruct… Data is consistent!
 Report Data Loss
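
The sequence in this example can be replayed with XOR arithmetic. This is a sketch under assumptions: parity is bytewise XOR, the first write updates parity subtractively, the second recomputes it additively, and the two-byte block contents are placeholders (`A0` for A, `A1` for A’, and so on).

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

disk = [b"A0", b"B0", b"C0"]   # Data 1..3 as stored
parity = xor_blocks(disk)      # P(ABC)

# Write File 1: C -> C'. Parity is updated, but the data write is lost.
new_c = b"C1"
parity = xor_blocks([parity, disk[2], new_c])   # P(ABC')
still_recoverable = xor_blocks([disk[0], disk[1], parity]) == new_c

# Later, write File 2: A,B -> A',B'. Parity is recomputed additively,
# which reads the stale C from Data 3.
disk[0], disk[1] = b"A1", b"B1"
parity = xor_blocks(disk)                       # P(A'B'C): polluted

rebuilt = xor_blocks([disk[0], disk[1], parity])
print(still_recoverable, rebuilt)  # True b'C0'
```

Right after the lost write, parity could still regenerate C’. Once the stale block is folded into new parity, reconstruction only returns the stale C, so the loss can be detected but not repaired.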
                  Version Mirroring
•   Lost write protection
•   Verifiable at RAID level
•   Store a version number in each data block
•   Mirror the version numbers on parity disk
•   Version numbers verified on read

            A           B          C       P(ABC)
         cksum(A)    cksum(B)   cksum(C)   cksum(P)
           Ver0        Ver0      Ver0       0,0,0

            Parity Pollution Solved
                      Data 1      Data 2      Data 3      Parity

   What should          A’          B’          C’        P(A’B’C’)
   be on the disk      Ver1        Ver1        Ver1

   On disk, after       A           B           C         P(ABC’)
   the lost write    cksum(A)    cksum(B)    cksum(C)     cksum(P)
                       Ver0        Ver0        Ver0        0,0,1

   On disk, after       A’          B’          C’        P(A’B’C’)
   reconstruction    cksum(A’)   cksum(B’)   cksum(C’)    cksum(P)
                       Ver1        Ver1        Ver1        1,1,1

Write File 1
 C→C’, New Parity; Lost Write on Data 3
Later… Write File 2
Overwrite AB →A’B’
Parity: Read Data 3
 Version mismatch (Data 3 holds Ver0, parity records Ver1)
Reconstruct Data 3 from A, B, P(ABC’), giving C’
Write A’, B’ and P(A’B’C’)
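
The same scenario with version mirroring can be sketched as below. This is an illustrative model, not the authors' design in detail: the `Stripe` class, its additive-only `write`, and the `lost` parameter are hypothetical, parity is assumed to be bytewise XOR, and only the case where a data block is older than its mirrored version is handled (a lost parity write is not modeled).

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

class Stripe:
    def __init__(self, blocks):
        self.data = [[blk, 0] for blk in blocks]  # [contents, version] per disk
        self.parity = xor_blocks(blocks)
        self.mirror = [0] * len(blocks)           # versions mirrored on parity disk

    def read(self, i):
        """Verified read: a stale version triggers reconstruction from parity."""
        blk, ver = self.data[i]
        if ver < self.mirror[i]:                  # data block missed a write
            others = [d[0] for j, d in enumerate(self.data) if j != i]
            blk = xor_blocks(others + [self.parity])
            self.data[i] = [blk, self.mirror[i]]  # repair the block in place
        return blk

    def write(self, updates, lost=()):
        """Additive-parity write of {index: contents}; `lost` lists dropped writes."""
        full = [updates[i] if i in updates else self.read(i)
                for i in range(len(self.data))]
        self.parity = xor_blocks(full)
        for i, blk in updates.items():
            self.mirror[i] += 1
            if i not in lost:                     # a lost write never reaches the disk
                self.data[i] = [blk, self.mirror[i]]

s = Stripe([b"A0", b"B0", b"C0"])
s.write({2: b"C1"}, lost={2})      # Write File 1: C -> C', data write lost
s.write({0: b"A1", 1: b"B1"})      # Write File 2: reads Data 3, detects mismatch
print(s.read(2), s.mirror)         # b'C1' [1, 1, 1]
```

Because the version check runs on every read, including the reads done to compute parity, the stale block is repaired before it can pollute the new parity.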
     Problem Solved… Efficiently
• Version mirroring with block checksums and
  physical identity provides complete protection
• Use with logical identity for efficiency
• More efficient than write verify




                   Conclusion
• Applied model checking to real system designs
  – For all designs, a single error can cause data loss
  – Parity pollution is a common problem
  – Version mirroring is a key technique for offering
    complete and efficient data protection

• Partial failures are complex, with no obvious data
  protection solution
  – Model checking is useful
  ADvanced Systems Laboratory
      www.cs.wisc.edu/adsl




        Advanced Technology Group
http://www.netapp.com/company/research/

				