					Cooperative regenerating codes
for distributed storage systems

            Kenneth Shum
    (Joint work with Yuchong Hu)
            22nd July 2011
             Multiple node failures
• Large-scale storage system
      – Google data center, example from Kannan’s talk.
      – 800,000 servers, failure rate = 4% per year
      – Repair in 2 days
      – Mean number of failed servers in 2 days = 175.
• The lazy-repair policy in TotalRecall
      – A repair process is triggered only after the number
        of failed nodes has reached a certain threshold.
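
The figure of 175 failed servers follows directly from the numbers quoted above. A minimal sketch of the arithmetic, assuming failures are spread uniformly over a 365-day year (the variable names are illustrative only):

```python
# Back-of-the-envelope check of the figures quoted on the slide.
servers = 800_000            # number of servers (from the slide)
annual_failure_rate = 0.04   # 4% of servers fail per year (from the slide)
repair_days = 2              # repair window in days (from the slide)

# Assumption: failures are spread uniformly over a 365-day year.
expected_failures = servers * annual_failure_rate * repair_days / 365
print(round(expected_failures))
```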

Jul, 2011                    kshum                        2
            Jointly repair multiple failures
[Figure: surviving storage nodes send data to several newcomers; the newcomers also exchange data with each other.]
Can we further reduce the repair-bandwidth?
Hu et al. (JSAC, Feb 2010)
             Distributed storage (erasure coding)
                                          Wu, Dimakis ISIT09
File: A1, A2, B1, B2
Node 1: A1, A2
Node 2: B1, B2
Node 3: A1+B1, 2A2+B2
Node 4: 2A1+B1, A2+B2
[Figure: a data collector downloads from the storage nodes to recover the file.]
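
The storage layout on this slide can be checked mechanically. The sketch below tests whether the packets held by any two of the four nodes span the whole file (A1, A2, B1, B2). Two things are assumptions and not stated on the slide: that two nodes should suffice for the data collector, and that arithmetic is done over the prime field GF(5). The helper names `rank_mod_p` and `any_two_nodes_decode` are hypothetical.

```python
from itertools import combinations

P = 5  # assumed prime field size; the slide does not state the field

# Each stored packet is a coefficient vector over (A1, A2, B1, B2).
NODES = [
    [(1, 0, 0, 0), (0, 1, 0, 0)],  # A1, A2
    [(0, 0, 1, 0), (0, 0, 0, 1)],  # B1, B2
    [(1, 0, 1, 0), (0, 2, 0, 1)],  # A1+B1, 2A2+B2
    [(2, 0, 1, 0), (0, 1, 0, 1)],  # 2A1+B1, A2+B2
]

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), computed by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def any_two_nodes_decode():
    """True if the four packets of every pair of nodes have full rank 4."""
    return all(rank_mod_p(NODES[i] + NODES[j]) == 4
               for i, j in combinations(range(4), 2))

print(any_two_nodes_decode())
```

Rank 4 for a pair of nodes means the four downloaded packets determine A1, A2, B1, B2 uniquely.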
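             Naive Repair
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the node storing A1, A2 is replaced by a newcomer, which regenerates A1, A2 by downloading from the surviving nodes.]
4 packets required.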
          Repair with "code alignment"
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: a newcomer regenerates A1, A2 by downloading 3 packets from the surviving nodes.]
3 packets required.
Solve:
  P1 = A1 + 2A2
  P2 = 2A1 + A2
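
One way to arrive at the two equations P1 and P2 with only 3 downloaded packets is sketched below. It assumes, beyond what the slide states, that each surviving node sends the sum of its two stored packets, so that the unwanted part B1+B2 is the same in every packet and can be cancelled. It also assumes arithmetic over GF(5); over GF(3) the 2x2 system for A1, A2 would be singular. The function name `repair_first_node` is hypothetical.

```python
P = 5  # assumed prime field; the slide does not state the field

def repair_first_node(a1, a2, b1, b2, p=P):
    """Regenerate (A1, A2) from 3 packets, one per surviving node."""
    # Assumed helper transmissions: each node sends the sum of its two packets.
    from_node2 = (b1 + b2) % p                       # B1 + B2
    from_node3 = ((a1 + b1) + (2 * a2 + b2)) % p     # A1 + 2A2 + (B1 + B2)
    from_node4 = ((2 * a1 + b1) + (a2 + b2)) % p     # 2A1 + A2 + (B1 + B2)

    # Cancel the aligned interference B1 + B2.
    p1 = (from_node3 - from_node2) % p               # P1 = A1 + 2A2
    p2 = (from_node4 - from_node2) % p               # P2 = 2A1 + A2

    # Solve [[1, 2], [2, 1]] (A1, A2)^T = (P1, P2)^T by Cramer's rule.
    det_inv = pow((1 * 1 - 2 * 2) % p, -1, p)
    a1_hat = ((p1 - 2 * p2) * det_inv) % p
    a2_hat = ((p2 - 2 * p1) * det_inv) % p
    return a1_hat, a2_hat

print(repair_first_node(3, 4, 1, 2))
```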
     Multiple failures, separate repair
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the nodes storing (B1, B2) and (2A1+B1, A2+B2) fail; each newcomer is repaired separately from the surviving nodes.]
8 packets in total
4 packets per newcomer
  Multiple failures, cooperative repair (I)
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: two newcomers jointly regenerate (B1, B2) and (2A1+B1, A2+B2); one link in the figure is labeled B1,B2.]
6 packets in total
3 packets per newcomer
  Multiple failures, cooperative repair (II)
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the surviving nodes send A1, A1+B1, A2 and 2A2+B2 to the two newcomers, which exchange packets to regenerate (B1, B2) and (2A1+B1, A2+B2).]
6 packets in total
3 packets per newcomer
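
The count of 6 packets can be reproduced with the four packets labeled in the figure (A1, A1+B1, A2, 2A2+B2). The sketch below uses one assignment that is consistent with those labels; which newcomer receives which packet, and which packets are exchanged, are assumptions, as is the field GF(5). The function name `cooperative_repair` is hypothetical.

```python
P = 5  # assumed prime field; the slide does not state the field

def cooperative_repair(a1, a2, b1, b2, p=P):
    """Jointly regenerate (B1, B2) and (2A1+B1, A2+B2) with 6 packets."""
    # Phase 1 (assumed split): 2 packets per newcomer from surviving nodes.
    got_1 = (a1, (a1 + b1) % p)            # newcomer 1: A1 and A1+B1
    got_2 = (a2, (2 * a2 + b2) % p)        # newcomer 2: A2 and 2A2+B2

    # Local computation at each newcomer.
    new_b1 = (got_1[1] - got_1[0]) % p          # (A1+B1) - A1 = B1
    two_a1_b1 = (got_1[1] + got_1[0]) % p       # (A1+B1) + A1 = 2A1+B1
    new_b2 = (got_2[1] - 2 * got_2[0]) % p      # (2A2+B2) - 2A2 = B2
    a2_b2 = (got_2[1] - got_2[0]) % p           # (2A2+B2) - A2 = A2+B2

    # Phase 2 (assumed): newcomer 1 sends 2A1+B1, newcomer 2 sends B2.
    exchanged = 2
    packets = len(got_1) + len(got_2) + exchanged
    return (new_b1, new_b2), (two_a1_b1, a2_b2), packets

print(cooperative_repair(3, 4, 1, 2))
```

Each newcomer receives 2 packets from surviving nodes and 1 from the other newcomer, which matches the 3 packets per newcomer stated on the slide.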
                Outline of the talk
• Is it optimal in terms of repair-bandwidth?
• What is the tradeoff between storage and
  repair-bandwidth for cooperative repair?
• Can we achieve the Pareto-optimal operating
  points on the tradeoff curve by linear network
  coding?
      – Exact repair
      – Functional repair

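                Information flow graph
[Figure: information flow graph. A source S feeds five storage nodes, each drawn as a pair (In_i, Out_i), i = 1, ..., 5. Two newcomers, 6 and 7, are each drawn as a triple (In, Mid, Out), are connected to surviving nodes, and are connected to each other. A data collector connects to Out vertices.]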
   Is this regenerating code optimal?
Nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the cooperative repair (II) scheme, in which the surviving nodes send A1, A1+B1, A2 and 2A2+B2 to the two newcomers.]
6 packets in total
3 packets per newcomer
                           First cut
[Figure: information flow graph with four storage nodes (In_i, Out_i), two newcomers (In, Mid, Out) and a data collector, showing the first cut.]
$B \le 4\beta_1$

                               Second cut
[Figure: information flow graph with storage nodes Out1 to Out4, two stages of newcomers (In, Mid, Out) and a data collector, showing the second cut.]
$B \le 2 + \beta_1 + \beta_2$
       A linear programming problem
• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
• Subject to
      $4 \le 4\beta_1$
      $4 \le 2 + \beta_1 + \beta_2$
      $\beta_1, \beta_2 \ge 0$
[Figure: feasible region in the $(\beta_1, \beta_2)$ plane.]
$\beta_1 = 1,\ \beta_2 = 1$
At least 3 packets
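
The bound of 3 packets can be verified numerically. The sketch below assumes the two cut constraints read 4 ≤ 4β1 and 4 ≤ 2 + β1 + β2 (with B = 4 and 2 packets stored per node), and solves the small linear program by enumerating the intersection points of the constraint boundaries. This brute-force vertex enumeration is used here only because the problem has two variables and a bounded optimum; it is not a general LP solver, and the names `CONSTRAINTS`, `OBJECTIVE` and `solve_lp` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as p*x + q*y >= t, with x = beta_1 and y = beta_2.
CONSTRAINTS = [
    ((4, 0), 4),   # first cut:  4 <= 4*beta_1
    ((1, 1), 2),   # second cut: 4 <= 2 + beta_1 + beta_2
    ((1, 0), 0),   # beta_1 >= 0
    ((0, 1), 0),   # beta_2 >= 0
]
OBJECTIVE = (2, 1)  # repair bandwidth per newcomer: 2*beta_1 + beta_2

def solve_lp():
    """Return (minimum value, beta_1, beta_2) over all feasible vertices."""
    best = None
    for ((a, b), r), ((c, d), s) in combinations(CONSTRAINTS, 2):
        det = a * d - b * c
        if det == 0:
            continue  # parallel boundaries have no intersection point
        # Intersection of a*x + b*y = r and c*x + d*y = s (Cramer's rule).
        x = Fraction(r * d - b * s, det)
        y = Fraction(a * s - r * c, det)
        if all(p * x + q * y >= t for (p, q), t in CONSTRAINTS):
            value = OBJECTIVE[0] * x + OBJECTIVE[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best

print(solve_lp())
```

The same result follows by hand: 2β1 + β2 = β1 + (β1 + β2) ≥ 1 + 2 = 3.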
    Non-homogeneous download traffic
[Figure: information flow graph, first cut. The edges from the surviving nodes into the two newcomers are labeled a, b, c, d.]
$B \le a + b + c + d$

                  Non-homogeneous traffic
[Figure: information flow graph, second cut, with storage nodes Out1 to Out4, two stages of newcomers (In, Mid, Out) and a data collector. The edges into the second pair of newcomers are labeled e, f, g, h, i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
$B \le 2 + g + i$
            The same LP problem
• Minimize
• Subject to
[Figure: plot with tick marks at 1 on both axes.]
At least 3 packets
   TRADEOFF BETWEEN
   STORAGE AND REPAIR-BANDWIDTH

   Storage vs Repair-bandwidth  (S., ICC 2011; Kermarrec, Le Scouarnec and Straub, NetCod 2011.)
[Figure: storage per node (100 to 140) versus repair bandwidth per failed node (120 to 180), with one curve for one-by-one repair and one for repairing 3 newcomers jointly. File size = 420, d = 8, k = 4.]
                            Fair comparison?
repair degree = 8
One-by-one repair: number of connections per newcomer = 8
Cooperative repair: number of connections per newcomer = 8+2
[Figure: newcomers connecting to the surviving nodes under each scheme.]
                                      MBCR and MSCR
[Figure: storage per node (100 to 140) versus repair bandwidth per failed node (120 to 180), with curves for one-by-one repair and cooperative repair. Two points on the cooperative curve are marked: minimum bandwidth cooperative repair (MBCR) and minimum storage cooperative repair (MSCR).]
                            How much can we improve?
[Figure: storage per node (450 to 500) versus repair bandwidth per failed node (480 to 550), with curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 2275, d = 30, k = 5.]
When d is large, joint repair does not have a significant advantage over one-by-one repair.
                                  How much can we improve?
[Figure: storage per node (150 to 200) versus repair bandwidth per failed node (180 to 260), with curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 616, d = 8, k = 4.]
Repair-bandwidth reduction is more prominent when d is not so large.
   AN EXPLICIT CONSTRUCTION FOR
   MINIMUM-BANDWIDTH
   COOPERATIVE REPAIR
   An explicit construction for MBCR
                                         (S., Hu, ISIT 2011.)
     Require d = k, r = n–d
• B = 8 information packets
• n = 4 nodes
• Each node stores 5 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 2
• No. of helpers for each failed node = d = 2
• Minimum repair-bandwidth
• Storage per node
                                         Min-Bandwidth point
[Figure: storage per node (3.5 to 6) versus repair bandwidth per failed node (5 to 9) for repairing 2 new nodes cooperatively, with the minimum-bandwidth point marked.]
              Data Distribution
8 data packets: A, B, C, D, E, F, G, H
Node 1: A, B, C, D, F+G
Node 2: C, D, E, F, H+A
Node 3: E, F, G, H, B+C
Node 4: G, H, A, B, D+E
("+" denotes XOR.)
5 packets per node: 4 systematic, 1 parity-check
                      Data collection
Nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
[Figure: a data collector recovers A, B, C, D, E, F, G, H. In the example shown, it obtains A, B, C, D, E, F, F+G and H+A from the first two nodes.]
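
Whether every pair of nodes suffices for data collection can be checked by computing ranks over GF(2), since "+" is XOR. The sketch below represents each packet as a bitmask over the 8 data packets; the helper names `packet` and `rank_gf2` are hypothetical, and the check covers only this 4-node example with k = 2.

```python
from itertools import combinations

NAMES = "ABCDEFGH"

def packet(expr):
    """Encode a packet such as 'A' or 'F+G' as a bitmask over A..H."""
    mask = 0
    for name in expr.split("+"):
        mask ^= 1 << NAMES.index(name)
    return mask

NODES = [
    [packet(e) for e in ("A", "B", "C", "D", "F+G")],
    [packet(e) for e in ("C", "D", "E", "F", "H+A")],
    [packet(e) for e in ("E", "F", "G", "H", "B+C")],
    [packet(e) for e in ("G", "H", "A", "B", "D+E")],
]

def rank_gf2(masks):
    """Rank over GF(2) of a list of bitmask vectors (XOR basis)."""
    basis = []
    for v in masks:
        for b in basis:
            v = min(v, v ^ b)   # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)

# A data collector can decode from nodes i and j if their 10 packets span
# all 8 data packets, i.e. the rank is 8.
decodable = {(i + 1, j + 1): rank_gf2(NODES[i] + NODES[j]) == 8
             for i, j in combinations(range(4), 2)}
print(decodable)
```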
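                          Exact Repair
How to repair?
[Figure: the nodes storing (A, B, C, D, F+G) and (E, F, G, H, B+C) fail. The two newcomers are helped by the surviving nodes (C, D, E, F, H+A) and (G, H, A, B, D+E), and exchange the packets B+C and F+G with each other.]
Total repair-bandwidth = 10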
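
The total of 10 packets can be reproduced by simulating this repair with XOR on integers. The exchange of B+C and F+G is taken from the slide. The choice of which systematic packets each newcomer downloads from the two helpers is an assumption that is consistent with the total of 10. The function name `repair_nodes_1_and_3` is hypothetical.

```python
def repair_nodes_1_and_3(data):
    """Exact repair of the nodes storing (A,B,C,D,F+G) and (E,F,G,H,B+C)."""
    A, B, C, D, E, F, G, H = data
    # Surviving helpers.
    node2 = {"C": C, "D": D, "E": E, "F": F, "H+A": H ^ A}
    node4 = {"G": G, "H": H, "A": A, "B": B, "D+E": D ^ E}

    # Phase 1 (assumed): each newcomer downloads 2 packets from each helper.
    new1 = {"C": node2["C"], "D": node2["D"], "A": node4["A"], "B": node4["B"]}
    new3 = {"E": node2["E"], "F": node2["F"], "G": node4["G"], "H": node4["H"]}
    sent = 8

    # Phase 2 (from the slide): the newcomers exchange B+C and F+G.
    b_plus_c = new1["B"] ^ new1["C"]   # computed by newcomer 1, sent to newcomer 3
    f_plus_g = new3["F"] ^ new3["G"]   # computed by newcomer 3, sent to newcomer 1
    sent += 2

    new1["F+G"] = f_plus_g
    new3["B+C"] = b_plus_c
    return new1, new3, sent

new1, new3, sent = repair_nodes_1_and_3([1, 2, 3, 4, 5, 6, 7, 8])
print(sent, sent // 2)   # total packets, packets per newcomer
```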
                          Exact Repair
How to repair?
[Figure: the nodes storing (C, D, E, F, H+A) and (E, F, G, H, B+C) fail. The two newcomers are helped by the surviving nodes (A, B, C, D, F+G) and (G, H, A, B, D+E), and exchange packets with each other.]
Total repair-bandwidth = 10
   AN EXPLICIT CONSTRUCTION FOR
   MINIMUM-STORAGE COOPERATIVE
   REPAIR
   An explicit construction for MSCR
                                               (S. ICC 2011.)
   Require d = k
• B = 6 information packets
• n nodes
• Each node stores 2 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 3
• No. of helpers for each failed node = d = 3
• Minimum repair-bandwidth
• Storage per node
                                      The min-storage point
[Figure: storage per node versus repair bandwidth per failed node, both axes from 1 to 7, with curves for non-cooperative and cooperative repair. Inset: a data collector (DC).]
k = 3, d = 3, r = 2, B = 6
storage cost per node = 2
repair bandwidth per node = 4
                         Data retrieval
MDS code with dimension k=3
[Figure: the source data is encoded into two codewords, and each storage node holds 2 symbols. A data collector downloads from storage nodes and decodes.]
                       Repair: phase 1
[Figure: the source data is encoded into two codewords across the storage nodes. Two storage nodes are lost. Each of the two newcomers downloads from the surviving storage nodes and decodes.]
                            Repair: phase 2
[Figure: each of the two newcomers re-encodes, and the newcomers exchange data with each other.]
Repair bandwidth per node = 8/2 = 4
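
The two-phase repair can be sketched with a concrete MDS code. Everything specific below is an assumption made for illustration and is not given on the slides: a Reed-Solomon code over GF(7) is used as the MDS code, there are n = 5 nodes with evaluation points 1 to 5, node x stores one symbol from each of the two codewords, and the first newcomer handles codeword 1 while the second handles codeword 2. The function names `encode`, `interpolate` and `repair` are hypothetical.

```python
P = 7                       # assumed prime field size
POINTS = [1, 2, 3, 4, 5]    # assumed: n = 5 nodes, node x uses evaluation point x

def encode(msg, x, p=P):
    """Evaluate the polynomial msg[0] + msg[1]*x + msg[2]*x^2 over GF(p)."""
    return sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p

def interpolate(points, x, p=P):
    """Lagrange interpolation over GF(p): decode from k = 3 symbols and
    re-encode at position x in one step."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def repair(stored, failed, helpers):
    """stored[x] = (symbol of codeword 1, symbol of codeword 2) at node x."""
    f1, f2 = failed
    # Phase 1: newcomer f1 downloads codeword-1 symbols from the d = 3 helpers,
    # newcomer f2 downloads codeword-2 symbols from the same helpers.
    got1 = [(h, stored[h][0]) for h in helpers]
    got2 = [(h, stored[h][1]) for h in helpers]
    sent = len(got1) + len(got2)
    # Phase 2: each newcomer re-encodes its codeword at both failed positions
    # and sends the other newcomer the one symbol it is missing.
    c1 = {f: interpolate(got1, f) for f in failed}
    c2 = {f: interpolate(got2, f) for f in failed}
    sent += 2
    return {f1: (c1[f1], c2[f1]), f2: (c1[f2], c2[f2])}, sent

msg1, msg2 = [1, 2, 3], [4, 5, 6]          # B = 6 information packets
stored = {x: (encode(msg1, x), encode(msg2, x)) for x in POINTS}
rebuilt, sent = repair(stored, failed=(1, 2), helpers=[3, 4, 5])
print(rebuilt == {1: stored[1], 2: stored[2]}, sent // 2)
```

Under these assumptions each newcomer receives 3 packets in phase 1 and 1 packet in phase 2, which matches the repair bandwidth of 8/2 = 4 per node.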
            The construction is optimal
[Figure: storage per node versus repair bandwidth per failed node, both axes from 1 to 7, with curves for non-cooperative and cooperative repair. Inset: a data collector (DC).]
k = 3, d = 3, r = 2, B = 6
storage cost per node = 2
repair bandwidth per node = 4
   EXISTENCE OF COOPERATIVE
   REGENERATING CODES UNDER
   FUNCTIONAL REPAIR
              Existence of optimal linear
            regenerating codes in general
                                                (S., Hu, Netcod 2011.)
• Sustainable storage system
      – Will it work after arbitrarily many repairs?
• Technical difficulty: The information flow
  graph is unbounded.
• Can we work over a fixed finite field, for
  an unlimited number of regenerations?
      – Yes if we can construct an exact regenerating code.
      – The answer is also “yes” for cooperative functional
        repair in general.
                           Trellis structure
[Figure: trellis with stages 0, 1, 2, ...]
Message vector m (row vector)
  Stage 0: m T0        (T0 is the "transfer matrix" in stage 0)
  Stage 1: m T0 T1     (T1 is the "transfer matrix" in stage 1)
  Stage 2: m T0 T1 T2  (T2 is the "transfer matrix" in stage 2)
            Flow in information flow graph
                                                  5                             
                                                                                 4
                           In1              Mid1       Out1
            Out1
                                    0
                                    1                                                                  DC
    5              2
                   2
                                    1
                                    1                              1
                                                                   2
        5                                          3
                                                   5           0
                                                               2
S           Out2              In2           Mid2       Out2                                      
                                                                                                 4
                                        
        4
        5          2
                   2
                          2                                        2
                                                                   2             3
                                                                                           4
                                                                                            5
            Out3                                       Out3            In3           Mid3       Out3
                                                               2
                                                               1             1
    5                                                                        0
    4          2
                       The cut-set bound                                     1
                       says that the cut                                                    5
            Out4                                       Out4            In4           Mid4       Out4
                       capacity is at least 8.                                   
                       Can we construct
                       a flow with value 8?
              Cross-sectional flow pattern
[Figure: information flow graph from S to DC with vertices In, Mid and Out, annotated with flow values on the edges crossing each stage boundary; the numeric labels are not recoverable.]
       A recursive construction of flow
[Figure: one segment of the graph between stage s and stage s+1, with vertices In1, Mid1, Out1, In2, Mid2, Out2, Out3 and Out4; flows g1, ..., g4 enter on the left and h1, ..., h4 leave on the right.]

      1. Identify a set of cross-section flow patterns, say H.
      2. For any cross-section flow pattern (h1, h2, h3, h4) in H at
         stage s+1, we can find a flow in this segment of the graph
         such that (g1, g2, g3, g4) is also in H.
      3. Each pattern corresponds to a submatrix of the transfer matrix.
      4. By the Schwartz-Zippel lemma, we can find local encoding
         vectors so that the determinants of all these submatrices are
         non-zero, if the finite field is sufficiently large.
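Step 4 can be illustrated with a minimal Python sketch, under stated assumptions. Here the "patterns" are simply all k-column subsets of a random k x n coefficient matrix over GF(p), which is a stand-in for the set H and not the construction from the talk; the function names and the parameters k = 2, n = 4, p = 257 are hypothetical.

```python
import itertools
import random


def det_mod_p(matrix, p):
    """Determinant over GF(p), p prime, by Gaussian elimination."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0  # no pivot in this column: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def all_minors_nonzero(matrix, k, p):
    """Check that every k x k column submatrix is nonsingular over GF(p)."""
    n = len(matrix[0])
    for cols in itertools.combinations(range(n), k):
        sub = [[row[c] for c in cols] for row in matrix]
        if det_mod_p(sub, p) == 0:
            return False
    return True


def random_coefficients(k, n, p, rng, max_tries=1000):
    """Draw random k x n matrices until all required minors are non-zero."""
    for _ in range(max_tries):
        matrix = [[rng.randrange(p) for _ in range(n)] for _ in range(k)]
        if all_minors_nonzero(matrix, k, p):
            return matrix
    raise RuntimeError("no suitable matrix found; try a larger field")


coeffs = random_coefficients(2, 4, 257, random.Random(0))
print(coeffs)
```

The product of the required determinants is a non-zero polynomial in the coefficients, so by the Schwartz-Zippel lemma a uniformly random draw fails with probability at most (total degree)/p; a larger field makes a failed draw rarer.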

                       Summary
• Multiple node failures in medium-scale to
  large-scale storage systems
• Formulation as a linear program
• Functional repair: a linear regenerating code
  over a fixed finite field that matches the cut-
  set bound on repair bandwidth exists.
• Exact repair: two families of explicit code
  constructions
      – Minimum-bandwidth point: d=k, r = n – d
      – Minimum-storage point: d=k, r arbitrary

                                References
•    Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage
     via interference alignment, ISIT, Jul, 2009.

•    Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage
     systems from multiple losses with network coding, IEEE J. Sel. Areas Commun., vol. 28,
     no. 2, pp. 268-275, Feb, 2010.

•    K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC,
     Jun, 2011.

•    A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures
     with Coordinated and Adaptive Regenerating Codes, Netcod, Jul, 2011.

•    K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative
     Regenerating Codes, Netcod, Jul, 2011.

•    K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative
     Regenerating Codes for Distributed Storage Systems, ISIT, Aug, 2011.