Cooperative regenerating codes for distributed storage systems



Kenneth Shum

(Joint work with Yuchong Hu)

22nd July 2011

Multiple node failures

• Large-scale storage system

– Google data center, example from Kannan’s talk.

– 800,000 servers, failure rate = 4% per year

– Repair in 2 days

– Mean number of failed servers in 2 days = 175.

• The lazy-repair policy in TotalRecall

– A repair process is triggered only after the number of failed nodes has reached a certain threshold.
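The mean of 175 follows if failures are independent and spread evenly over the year: expected failures = servers × annual failure rate × (repair window / 365 days). A minimal sketch under that assumption (the function name is ours):

```python
def expected_failures(servers: int, annual_rate: float, window_days: float) -> float:
    """Mean number of servers failing within a repair window.

    Assumes independent failures at a constant annual rate and a 365-day year.
    """
    return servers * annual_rate * window_days / 365


mean = expected_failures(800_000, 0.04, 2)
print(f"{mean:.1f}")  # about 175 failed servers in a 2-day window
```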



Jul, 2011 kshum 2

Jointly repair multiple failures

[Figure: storage nodes send data to the newcomers, which also exchange data among themselves (Hu et al., JSAC, Feb 2010).]

Can we further reduce the repair-bandwidth?

Distributed storage (erasure coding)

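Wu, Dimakis ISIT09

[Figure: a file of four packets A1, A2, B1, B2 is stored on four nodes holding (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2); a data collector recovers the file from any two nodes.]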

Naive Repair

[Figure: the node holding A1, A2 fails; the newcomer downloads enough packets to decode the whole file and then regenerates A1, A2.]

4 packets required.

Repair with "code alignment"

[Figure: the node holding A1, A2 fails; the newcomer downloads one packet from each of the three surviving nodes.]

3 packets required. Solve:

P1 = A1 + 2A2

P2 = 2A1 + A2
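A minimal sketch of the alignment repair, reading the figure's node contents as (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2). The slide names neither the field nor the packet each helper sends, so both are assumptions here: arithmetic is modulo the prime 5 (so that the determinant -3 of the 2 × 2 system is invertible), and each surviving node sends the sum of its two packets, which makes the unwanted B1+B2 term align and cancel. Function names are hypothetical.

```python
P = 5  # assumed field GF(5); the slide leaves the field unspecified


def encode(a1, a2, b1, b2):
    """Contents of the four nodes, as read from the figure."""
    return [
        (a1 % P, a2 % P),
        (b1 % P, b2 % P),
        ((a1 + b1) % P, (2 * a2 + b2) % P),
        ((2 * a1 + b1) % P, (a2 + b2) % P),
    ]


def repair_node1(nodes):
    """Regenerate node 1 (A1, A2) from one packet per surviving node: 3 packets in total."""
    # Assumed schedule: each helper sends the sum of its two packets.
    s, t3, t4 = (sum(nodes[i]) % P for i in (1, 2, 3))
    # s = B1+B2, t3 = A1+2A2+B1+B2, t4 = 2A1+A2+B1+B2: the B terms are aligned.
    p1 = (t3 - s) % P  # P1 = A1 + 2A2
    p2 = (t4 - s) % P  # P2 = 2A1 + A2
    # Solve [[1, 2], [2, 1]] (A1, A2)^T = (P1, P2)^T by Cramer's rule; det = -3.
    inv_det = pow(-3 % P, -1, P)
    a1 = inv_det * (p1 - 2 * p2) % P
    a2 = inv_det * (p2 - 2 * p1) % P
    return a1, a2


print(repair_node1(encode(1, 2, 3, 4)))  # (1, 2)
```

Over GF(3) the same schedule would fail because -3 = 0, which is why a field where 3 is invertible is assumed.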

Multiple failures, separate repair

8 packets in total, 4 packets per newcomer

[Figure: two of the four nodes fail; each newcomer is repaired separately from the two surviving nodes.]

Multiple failures, cooperative repair (I)

6 packets in total, 3 packets per newcomer

[Figure: two of the four nodes fail; the two newcomers download from the two surviving nodes and exchange packets with each other.]

Multiple failures, cooperative repair (II)

6 packets in total, 3 packets per newcomer

[Figure: a second cooperative repair scheme with the same traffic.]
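The packet counts above can be checked with a small sketch. Which packets the figures send is only partly legible, so the schedule below is our own reconstruction, not necessarily the one drawn: the nodes holding (A1, A2) and (B1, B2) fail, each newcomer takes one packet from each of the nodes holding (A1+B1, 2A2+B2) and (2A1+B1, A2+B2), and the newcomers then swap one decoded packet. Arithmetic modulo 5 and the function name are assumptions.

```python
P = 5  # assumed field GF(5)


def cooperative_repair(node3, node4):
    """Rebuild (A1, A2) and (B1, B2) from the two surviving nodes.

    Returns both regenerated nodes and the number of packets transferred.
    """
    (x1, x2), (y1, y2) = node3, node4  # (A1+B1, 2A2+B2) and (2A1+B1, A2+B2)
    # Phase 1, newcomer for (B1, B2): downloads A1+B1 and 2A1+B1, solves A1 and B1.
    a1 = (y1 - x1) % P
    b1 = (x1 - a1) % P
    # Phase 1, newcomer for (A1, A2): downloads 2A2+B2 and A2+B2, solves A2 and B2.
    a2 = (x2 - y2) % P
    b2 = (y2 - a2) % P
    # Phase 2: A1 is sent to the first newcomer, B2 to the second (1 packet each way).
    traffic = 2 + 2 + 1 + 1
    return (a1, a2), (b1, b2), traffic


a1, a2, b1, b2 = 1, 2, 3, 4
node3 = ((a1 + b1) % P, (2 * a2 + b2) % P)
node4 = ((2 * a1 + b1) % P, (a2 + b2) % P)
print(cooperative_repair(node3, node4))  # ((1, 2), (3, 4), 6)
```

Each newcomer receives 2 packets from the surviving nodes and 1 from its partner, matching 3 packets per newcomer.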

Outline of the talk

• Is this cooperative repair optimal in terms of repair-bandwidth?

• What is the tradeoff between storage and repair-bandwidth for cooperative repair?

• Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?

– Exact repair

– Functional repair

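Information flow graph

[Figure: information flow graph with source S, storage nodes In1/Out1 to In5/Out5, newcomers In6/Mid6/Out6 and In7/Mid7/Out7, and a data collector. Each download edge from a surviving node carries $\beta_1$ packets and each edge between the two newcomers carries $\beta_2$ packets.]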

Is this regenerating code optimal?

6 packets in total, 3 packets per newcomer

[Figure: the cooperative repair scheme (II) shown again.]

First cut

[Figure: a cut in the information flow graph for a data collector connected to the two newcomers; the cut crosses the four download edges from the surviving nodes.]

$B \le 4\beta_1$



Second cut

[Figure: another cut through the information flow graph.]

$B \le 2 + \beta_1 + \beta_2$

A linear programming problem

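• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)

• Subject to

$4 \le 4\beta_1$

$4 \le 2 + \beta_1 + \beta_2$

$\beta_1, \beta_2 \ge 0$

The first constraint gives $\beta_1 \ge 1$ and the second gives $\beta_1 + \beta_2 \ge 2$, so $2\beta_1 + \beta_2 = \beta_1 + (\beta_1 + \beta_2) \ge 3$.

⇒ At least 3 packets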
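With only two variables, the LP can be solved by checking the vertices of the feasible region. A minimal sketch with exact fractions (the function name is ours, and a general LP solver would be the natural choice for larger instances). Each constraint is written as a·β1 + b·β2 ≥ rhs, using B = 4 and 2 packets stored per node from the example:

```python
from fractions import Fraction
from itertools import combinations


def solve_2var_lp(c, constraints):
    """Minimize c[0]*x + c[1]*y subject to a*x + b*y >= rhs for each (a, b, rhs).

    Enumerates intersections of pairs of constraint lines; adequate only for tiny,
    bounded two-variable LPs such as this one.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines do not define a vertex
        x = Fraction(r1 * b2 - r2 * b1, det)  # Cramer's rule
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y >= r for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


B, STORAGE = 4, 2
CONSTRAINTS = [
    (4, 0, B),            # first cut:  B <= 4*beta1
    (1, 1, B - STORAGE),  # second cut: B <= 2 + beta1 + beta2
    (1, 0, 0),            # beta1 >= 0
    (0, 1, 0),            # beta2 >= 0
]
print(solve_2var_lp((2, 1), CONSTRAINTS))  # minimum 3 at beta1 = beta2 = 1
```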

Non-homogeneous download traffic

[Figure: the first cut, with unequal download amounts a, b, c, d on the four edges from the surviving nodes to the newcomers.]

$B \le a + b + c + d$



Non-homogeneous traffic

[Figure: further cuts through the information flow graph, with edges labelled e, f, g, h, i, j.]

$B \le 2 + f + j$, $B \le 2 + h + i$, $B \le 2 + e + j$, $B \le 2 + g + i$

The same LP problem

• Minimize the repair bandwidth

• Subject to the cut-set constraints

⇒ At least 3 packets

TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH

Storage vs Repair-bandwidth (S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011.)

[Plot: storage per node (100 to 140) versus repair bandwidth per failed node (120 to 180) for file size = 420, d = 8, k = 4; curves for one-by-one repair and for repairing 3 newcomers jointly.]

Fair comparison? Repair degree = 8

[Figure: in one-by-one repair each newcomer connects to 8 surviving nodes; in cooperative repair each newcomer connects to 8 surviving nodes plus the 2 other newcomers.]

Number of connections per newcomer: 8 (one-by-one repair) versus 8 + 2 (cooperative repair).

MBCR and MSCR

[Plot: storage per node (100 to 140) versus repair bandwidth per failed node (120 to 180) for one-by-one repair and cooperative repair. The ends of the cooperative curve are labelled minimum-bandwidth cooperative repair (MBCR) and minimum-storage cooperative repair (MSCR).]
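The two end points have closed forms in the cooperative-regenerating-code papers listed in the references; we state them here without proof, as we understand them. At the MSCR point each node stores B/k and a newcomer downloads B(d+r-1)/(k(d+r-k)); at the MBCR point both quantities equal B(2d+r-1)/(k(2d+r-k)); r = 1 gives one-by-one repair. They agree with numbers elsewhere in these slides (3 packets per newcomer in the 4-node example, 5 in the MBCR construction, 4 in the MSCR construction). Function names are ours.

```python
from fractions import Fraction


def mscr_point(B, k, d, r):
    """(storage per node, repair bandwidth per newcomer) at the minimum-storage point."""
    return Fraction(B, k), Fraction(B * (d + r - 1), k * (d + r - k))


def mbcr_point(B, k, d, r):
    """At the minimum-bandwidth point, storage per node equals repair bandwidth per newcomer."""
    gamma = Fraction(B * (2 * d + r - 1), k * (2 * d + r - k))
    return gamma, gamma


# Settings of the tradeoff plots; r = 1 reduces to one-by-one repair (MSR and MBR points).
for B, k, d, r in [(420, 4, 8, 3), (2275, 5, 30, 10), (616, 4, 8, 10)]:
    print(f"B={B}, d={d}: one-by-one MSR {float(mscr_point(B, k, d, 1)[1]):.1f}, "
          f"MBR {float(mbcr_point(B, k, d, 1)[1]):.1f}; "
          f"cooperative MSCR {float(mscr_point(B, k, d, r)[1]):.1f}, "
          f"MBCR {float(mbcr_point(B, k, d, r)[1]):.1f}")
```

For d = 30 the gap between one-by-one and joint repair is small relative to the bandwidth itself, while for d = 8 it is proportionally larger, in line with the two "How much can we improve?" plots.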

How much can we improve?

[Plot: storage per node (450 to 500) versus repair bandwidth per failed node (480 to 550) for file size = 2275, d = 30, k = 5; one-by-one repair versus repairing 10 newcomers jointly.]

When d is large, joint repair does not have a significant advantage over one-by-one repair.

How much can we improve?

[Plot: storage per node (150 to 200) versus repair bandwidth per failed node (180 to 260) for file size = 616, d = 8, k = 4; one-by-one repair versus repairing 10 newcomers jointly.]

The repair-bandwidth reduction is more prominent when d is not so large.

AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR

An explicit construction for MBCR

(S., Hu, ISIT 2011.)

Requires d = k, r = n – d

• B = 8 information packets

• n = 4 nodes

• Each node stores 5 packets.

• Repair r = 2 failures simultaneously

• No. of connections for each DC = k = 2

• No. of helpers for each failed node = d = 2

• Minimum repair-bandwidth: the repair-bandwidth per failed node equals the storage per node (5 packets)

Min-Bandwidth point

[Plot: storage per node (3.5 to 6) versus repair bandwidth per failed node (5 to 9), repairing 2 new nodes cooperatively.]

Data Distribution

8 data packets: A, B, C, D, E, F, G, H

A, B, C, D, F+G
C, D, E, F, H+A
E, F, G, H, B+C
G, H, A, B, D+E

Each node stores 5 packets: 4 systematic and 1 parity-check, where + denotes XOR.

Data collection

[Figure: a data collector connects to the nodes storing A, B, C, D, F+G and E, F, G, H, B+C and reads A, B, C, D, E, F, G, H directly.]

Data collection

[Figure: a data collector connects to the nodes storing A, B, C, D, F+G and C, D, E, F, H+A; it reads A, B, C, D, E, F and recovers G from F+G and H from H+A.]

Exact Repair: how to repair?

[Figure: the nodes storing A, B, C, D, F+G and E, F, G, H, B+C fail; the two newcomers download from the surviving nodes and exchange B+C and F+G.]

Total repair-bandwidth = 10

Exact Repair: how to repair?

[Figure: a second failure pattern, repaired in the same way.]

Total repair-bandwidth = 10
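Both properties of this construction (any two nodes decode, two failed nodes are rebuilt with 10 packets) can be checked mechanically. A minimal sketch, assuming packets are 32-bit integers and "+" is bitwise XOR. Numbering the nodes 1 to 4 in the order listed on the Data Distribution slide is our convention, and the repair schedule for the failure of the nodes storing A, B, C, D, F+G and E, F, G, H, B+C (2 packets from each helper, then B+C and F+G exchanged) is our reading of the figure. Function names are hypothetical.

```python
import random
from functools import reduce
from itertools import combinations

# Node contents from the Data Distribution slide; "X+Y" is the XOR of packets X and Y.
LAYOUT = {
    1: ["A", "B", "C", "D", "F+G"],
    2: ["C", "D", "E", "F", "H+A"],
    3: ["E", "F", "G", "H", "B+C"],
    4: ["G", "H", "A", "B", "D+E"],
}


def evaluate(label, data):
    """Value of a stored packet: a single packet or the XOR of the named packets."""
    return reduce(lambda x, y: x ^ y, (data[p] for p in label.split("+")))


def decode_pair(i, j, stored):
    """Recover all 8 packets from nodes i and j by peeling the XOR parities."""
    known, parities = {}, []
    for node in (i, j):
        for label, val in zip(LAYOUT[node], stored[node]):
            if "+" in label:
                parities.append((label.split("+"), val))
            else:
                known[label] = val
    progress = True
    while progress:  # a parity with exactly one unknown term reveals that term
        progress = False
        for (x, y), val in parities:
            if x in known and y not in known:
                known[y] = val ^ known[x]
                progress = True
            elif y in known and x not in known:
                known[x] = val ^ known[y]
                progress = True
    return known


def repair_nodes_1_and_3(stored):
    """Cooperative repair of nodes 1 and 3 from helpers 2 and 4 (d = 2, r = 2)."""
    h2 = dict(zip(LAYOUT[2], stored[2]))
    h4 = dict(zip(LAYOUT[4], stored[4]))
    # Phase 1: each newcomer downloads 2 packets from each helper.
    new1 = {"C": h2["C"], "D": h2["D"], "A": h4["A"], "B": h4["B"]}
    new3 = {"E": h2["E"], "F": h2["F"], "G": h4["G"], "H": h4["H"]}
    # Phase 2: each newcomer computes one XOR and sends it to the other.
    new1["F+G"] = new3["F"] ^ new3["G"]
    new3["B+C"] = new1["B"] ^ new1["C"]
    traffic = 4 + 4 + 1 + 1
    return [new1[lab] for lab in LAYOUT[1]], [new3[lab] for lab in LAYOUT[3]], traffic


rng = random.Random(0)
data = {p: rng.getrandbits(32) for p in "ABCDEFGH"}
stored = {n: [evaluate(lab, data) for lab in labels] for n, labels in LAYOUT.items()}
assert all(decode_pair(i, j, stored) == data for i, j in combinations(LAYOUT, 2))
print(repair_nodes_1_and_3(stored)[2])  # 10 packets in total
```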


AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR

An explicit construction for MSCR

Requires d = k (S., ICC 2011.)

• B = 6 information packets

• n nodes

• Each node stores 2 packets.

• Repair r = 2 failures simultaneously

• No. of connections for each DC = k = 3

• No. of helpers for each failed node = d = 3

• Minimum storage per node, with the minimum repair-bandwidth for that storage

The min-storage point

[Plot: storage per node versus repair bandwidth per failed node (axes 1 to 7) for k = 3, d = 3, r = 2, B = 6, with non-cooperative and cooperative tradeoff curves. At the cooperative point, storage cost per node = 2 and repair bandwidth per node = 4.]

Data retrieval

MDS code with dimension k = 3

[Figure: the source data is encoded into two codewords whose symbols are spread over the storage nodes, 2 symbols per node; a data collector connects to k = 3 nodes and decodes.]

Repair: phase 1

[Figure: two storage nodes are lost. Each newcomer downloads symbols of one codeword from the surviving helpers and decodes that codeword.]

Repair: phase 2

[Figure: each newcomer re-encodes its decoded codeword and sends the symbol belonging to the other lost node to the other newcomer (exchange).]

Repair bandwidth per node = 8/2 = 4
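The two phases can be sketched as follows. The MDS code is assumed to be a systematic Reed-Solomon code over GF(257), with n = 5 nodes and node i assigned evaluation point i; none of these choices is fixed by the slides, and the function names are ours. The B = 6 packets form two stripes of k = 3, each stripe is one codeword, and each node keeps one symbol of each codeword. Newcomer t downloads symbol t from each of the d = 3 helpers (phase 1), decodes codeword t, and sends the other newcomer its re-encoded symbol (phase 2).

```python
from itertools import combinations

P = 257   # assumed prime field GF(257)
K = 3     # MDS dimension k = d = 3
R = 2     # r = 2 simultaneous failures, so B = K * R = 6 packets
N = 5     # hypothetical number of storage nodes
POINTS = list(range(1, N + 1))  # evaluation point of each node (assumption)


def interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def store(data):
    """Encode each stripe of K packets systematically; node i keeps symbol i of each codeword."""
    stripes = [data[s * K:(s + 1) * K] for s in range(R)]
    codewords = [[interpolate(POINTS[:K], st, x) for x in POINTS] for st in stripes]
    return [[cw[i] for cw in codewords] for i in range(N)]


def retrieve(nodes, chosen):
    """Data collector: decode every stripe from any K nodes."""
    xs = [POINTS[i] for i in chosen]
    return [interpolate(xs, [nodes[i][s] for i in chosen], x)
            for s in range(R) for x in POINTS[:K]]


def cooperative_repair(nodes, failed, helpers):
    """Two-phase repair of R failed nodes from K helpers; returns (contents, traffic)."""
    assert len(failed) == R and len(helpers) == K
    xs = [POINTS[h] for h in helpers]
    new = {f: [None] * R for f in failed}
    traffic = 0
    for t, owner in enumerate(failed):  # newcomer `owner` is in charge of codeword t
        ys = [nodes[h][t] for h in helpers]  # phase 1: one packet from each helper
        traffic += K
        for f in failed:  # decode codeword t, re-encode the symbol of each lost node
            new[f][t] = interpolate(xs, ys, POINTS[f])
            if f != owner:
                traffic += 1  # phase 2: send it to the other newcomer
    return new, traffic


data = [10, 20, 30, 40, 50, 60]
nodes = store(data)
assert all(retrieve(nodes, c) == data for c in combinations(range(N), K))
new, traffic = cooperative_repair(nodes, failed=[0, 3], helpers=[1, 2, 4])
print(traffic, traffic // R)  # 8 packets in total, 4 per newcomer
```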

The construction is optimal

[Plot: storage per node versus repair bandwidth per failed node for k = 3, d = 3, r = 2, B = 6, with non-cooperative and cooperative tradeoff curves; the construction attains the cooperative point with storage cost per node = 2 and repair bandwidth per node = 4.]

EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR

Existence of optimal linear regenerating codes in general

(S., Hu, Netcod 2011.)

• Sustainable storage system

– Will it work after arbitrarily many repairs?

• Technical difficulty: the information flow graph is unbounded.

• Can we work over a fixed finite field for an unlimited number of regenerations?

– Yes, if we can construct an exact regenerating code.

– The answer is also "yes" for cooperative functional repair in general.

Trellis structure

[Figure: trellis with Stage 0, Stage 1 and Stage 2.]

Message vector $m$ (a row vector); after stages 0, 1 and 2 the coded data are $mT_0$, $mT_0T_1$ and $mT_0T_1T_2$.

$T_0$, $T_1$ and $T_2$ are the "transfer matrices" in stages 0, 1 and 2.

Flow in information flow graph

[Figure: a flow from the source S to a data collector through several repair stages of the information flow graph, with flow values on the edges.]

The cut-set bound says that the cut capacity is at least 8. Can we construct a flow with value 8?

Cross-sectional flow pattern

[Figure: the flow values crossing the information flow graph between consecutive stages, from S to the data collector.]

A recursive construction of flow

[Figure: one segment of the information flow graph between stage s and stage s+1, with flow values g1, g2, g3, g4 entering and h1, h2, h3, h4 leaving.]

1. Identify a set of cross-section flow patterns, say H.

2. For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.

3. Each pattern corresponds to a submatrix of the transfer matrix.

4. By the Schwartz-Zippel lemma, we can find the local encoding vectors so that all such determinants are non-zero, if the finite field is sufficiently large.
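Step 4 relies on local encoding coefficients that avoid the zeros of finitely many determinant polynomials, which random coefficients over a large field do with high probability. A Monte Carlo sketch of that idea, not the paper's proof: the 4-node example (B = 4, 2 packets per node, any k = 2 nodes must decode) is repaired over and over by cooperative functional repair with β1 = β2 = 1, using uniformly random coefficients over an assumed prime field, and after every stage we check that each pair of nodes still spans the whole message space. Field size, seed, number of stages and function names are our choices.

```python
import random
from itertools import combinations

P = 2**31 - 1   # assumed large prime field
B = 4           # message length (packets)
ALPHA = 2       # packets stored per node
N = 4           # number of nodes; any k = 2 of them should decode


def rank_mod_p(rows):
    """Rank over GF(P) of a list of length-B vectors, by Gaussian elimination."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(B):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, P)
        m[rank] = [v * inv % P for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(v - f * w) % P for v, w in zip(m[i], m[rank])]
        rank += 1
    return rank


def combine(vectors, rng):
    """Random GF(P) linear combination of coding vectors (a local encoding step)."""
    coeffs = [rng.randrange(P) for _ in vectors]
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) % P for j in range(B)]


def cooperative_stage(nodes, failed, rng):
    """Functional repair of the failed nodes from the d = 2 survivors (beta1 = beta2 = 1)."""
    helpers = [h for h in range(N) if h not in failed]
    phase1 = {f: [combine(nodes[h], rng) for h in helpers] for f in failed}
    sent = {f: combine(phase1[f], rng) for f in failed}  # one packet to the other newcomer
    for f in failed:
        pool = phase1[f] + [sent[g] for g in failed if g != f]
        nodes[f] = [combine(pool, rng) for _ in range(ALPHA)]


def simulate(stages, seed=1):
    """True if every pair of nodes still decodes after each repair stage."""
    rng = random.Random(seed)
    nodes = [[[rng.randrange(P) for _ in range(B)] for _ in range(ALPHA)] for _ in range(N)]
    for _ in range(stages):
        cooperative_stage(nodes, rng.sample(range(N), 2), rng)
        if any(rank_mod_p(nodes[i] + nodes[j]) < B for i, j in combinations(range(N), 2)):
            return False
    return True


print(simulate(stages=50))
```

Random coefficients only succeed with high probability at each stage; the point of the recursive construction is a guarantee over a fixed finite field for an unlimited number of stages, which a simulation cannot show.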

Summary

• Multiple node failures in medium-scale to large-scale storage systems

• Formulation as a linear program

• Functional repair: there exists a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth.

• Exact repair: two families of explicit code constructions

– Minimum-bandwidth point: d = k, r = n – d

– Minimum-storage point: d = k, r arbitrary

References

• Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul. 2009.

• Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 268-275, Feb. 2010.

• K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun. 2011.

• A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul. 2011.

• K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul. 2011.

• K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug. 2011.


