Cooperative Backup on Social Networks
    Nguyen Tran and Jinyang Li
                      Motivation
• Backup is important.
• State-of-the-art solutions:
   – Buy a second hard disk
   – Back up manually to a portable disk / CDs
   – Sign up for online backup ($10 for 1 GB per month)
• Manual backup is inconvenient (it needs an additional hard disk,
  and you have to remember to do it).
• Important data needs geographic separation between the original
  and the backup copy, e.g., the data in a Wall Street data center.
• Idea: back up onto a p2p network (uses idle disk space, runs as a
  backup daemon, keeps copies remote).
           Solution overview
• How do we make sure nodes holding our data stay in
  the system?
  – A malicious node can take the data and leave.
• Idea: back up onto your real friends' node(s).
• Consequence: we lose some global space utilization
  but gain incentives to stay.
• For a backup service:
  – Data safety >> global space utilization.
            Model
[Figure: system model diagram labeled "Meta data" and "Data"]
  Q#1: efficient space allocation
• If I join with 100 GB to back up and 100 GB to
  contribute, can I back up all of my data?
     • Orkut: 2363 nodes, 78% space utilization
     • Venus: 39783 nodes, 81% space utilization
     • Over half of the nodes can back up all their data
• Which buddy should I pick to further improve
  global space efficiency?
     • The buddy with minimum degree?
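The min-degree question can be explored with a small greedy allocator. This is a minimal sketch under simplifying assumptions that are not in the slides: each node places its whole backup on a single friend, nodes are processed in a fixed order, and `allocate`, `friends`, `demand`, and `capacity` are hypothetical names.

```python
def allocate(friends, demand, capacity):
    """Greedy buddy selection (hypothetical sketch).

    friends:  node -> list of friend nodes (the social graph)
    demand:   node -> GB the node wants to back up
    capacity: node -> GB the node contributes to others
    Returns node -> chosen buddy; nodes that find no buddy are left out.
    """
    free = dict(capacity)  # remaining contributed space per node
    placement = {}
    for node in demand:  # fixed (insertion) order, an assumption of this sketch
        # Only friends with room for the whole backup qualify.
        candidates = [f for f in friends[node] if free[f] >= demand[node]]
        if not candidates:
            continue  # this node cannot back up all of its data
        # Prefer the friend with the fewest friends; break ties by name.
        buddy = min(candidates, key=lambda f: (len(friends[f]), f))
        free[buddy] -= demand[node]
        placement[node] = buddy
    return placement


# Hypothetical 4-node graph: everyone backs up 100 GB and contributes 100 GB.
friends = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
demand = {n: 100 for n in friends}
capacity = {n: 100 for n in friends}
print(allocate(friends, demand, capacity))
# {'A': 'C', 'B': 'D', 'C': 'A', 'D': 'B'}
```

Swapping the `min` key for a different rule (e.g., maximum degree or most free space) is an easy way to compare heuristics on a real graph such as the Orkut or Venus data.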
 Q#2: space optimization with coding
• Q: If you only have 1 GB of idle space, can you store 5 GB
  worth of your friends' backups? A: Yes!
[Figure: friends hold backups a1, a2, …, an; you store only
 A = a1 ⊕ a2 ⊕ … ⊕ an]
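The single-crash case can be sketched in a few lines. Assumptions of this sketch (not from the slides): each backup is a byte string, shorter backups are zero-padded to the longest one, and each friend's true length is kept as metadata so the padding can be stripped; `pad` and `xor_bytes` are hypothetical helpers. The stored A is only as large as the largest backup, and recovering one friend's data needs the other n-1 backups.

```python
from functools import reduce


def pad(blocks):
    """Zero-pad byte strings to a common length (true lengths kept separately)."""
    size = max(len(b) for b in blocks)
    return [b.ljust(size, b"\0") for b in blocks]


def xor_bytes(*blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical friends' backups a1, a2, a3.
backups = [b"alice's photos", b"bob's thesis", b"carol's mail"]
lengths = [len(b) for b in backups]
padded = pad(backups)

A = xor_bytes(*padded)  # the only thing you store: a1 XOR a2 XOR a3

# Friend 2 (index 1) crashes: XOR A with everyone else's still-available data.
lost = 1
others = [p for i, p in enumerate(padded) if i != lost]
recovered = xor_bytes(A, *others)[:lengths[lost]]
print(recovered)  # prints b"bob's thesis"
```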

    What if 2 friends crash at the same time?
How much disk space do you need to
      store a1, a2, …, an?
   Disk space      | # concurrent crashes tolerated | Bandwidth redundancy
   n               | n                              | 0
   F(n) = log(n)   | 2                              | O(n/2)
   1               | 1                              | n-1
               Definition
• Let S = {a1, a2, …, an}.
• For T ⊆ S, let ∂(T) denote the XOR of all
  elements in T.
• A solution X = {S1, S2, …, Sk}, where each Si ⊆ S,
  means you store ∂(S1), ∂(S2), …, ∂(Sk) on
  your machine, i.e., F(n) = k for the best such X.
• Of course, ∪ Si = S.
• Lemma: X is a solution that tolerates 2
  concurrent crashes iff ∀ p ≠ q ∈ [1..n], ∃ i ∈
  [1..k]: Si contains either ap or aq but not
  both.
[Figure: bad: store a1⊕a2⊕a3, i.e., X = {{a1, a2, a3}}.
 Good: store a1⊕a2 and a1⊕a3, i.e., X = {{a1, a2}, {a1, a3}}.]
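• Proof:
• ⇒ (by contrapositive): suppose every Si contains both ap and aq
  or neither of them. Then ap and aq enter every stored value only
  as ap ⊕ aq, so XORing them cannot isolate ap or aq individually.
• ⇐: Suppose Si contains ap but not aq. For each element ai in
  Si \ {ap}, get ai from its owner (who has not crashed) and XOR it
  with ∂(Si). What remains is ap. Then getting aq is easy: take any
  Sj containing aq (one exists since ∪ Si = S) and XOR out its other
  members, using the recovered ap if needed. Hence X is a solution.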
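The ⇐ direction of the proof is constructive, so it translates directly into a recovery routine. In this sketch, `store` and `recover_two` are hypothetical names, items are indexed 1..n, each backup is modeled as a Python int so `^` is XOR, and `alive` maps each surviving friend to its data. The `good` and `bad` sets are the two solutions from the figure.

```python
from functools import reduce
from operator import xor


def store(sets, data):
    """What you keep on disk: ∂(S_i), the XOR of each S_i's members."""
    return [reduce(xor, (data[j] for j in s), 0) for s in sets]


def recover_two(sets, parity, alive, p, q):
    """Rebuild a_p and a_q after friends p and q crash (the lemma's <= direction)."""
    # Step 1: find an S_i holding exactly one of a_p, a_q; XOR out its live members.
    for s, value in zip(sets, parity):
        if (p in s) != (q in s):
            first = p if p in s else q
            got = {first: reduce(xor, (alive[j] for j in s if j != first), value)}
            break
    else:
        raise ValueError("X does not tolerate the crash of %d and %d" % (p, q))
    # Step 2: any S_j containing the other item works, now that `first` is known.
    second = q if first == p else p
    known = {**alive, **got}
    for s, value in zip(sets, parity):
        if second in s:
            got[second] = reduce(xor, (known[j] for j in s if j != second), value)
            return got
    raise ValueError("a_%d is not in any S_i" % second)


good = [{1, 2}, {1, 3}]
bad = [{1, 2, 3}]
data = {1: 11, 2: 22, 3: 33}  # hypothetical backups a1, a2, a3
print(recover_two(good, store(good, data), {3: 33}, 1, 2))  # {1: 11, 2: 22}
```

With `bad`, the same call raises `ValueError`, because the only stored value contains both a1 and a2.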
How small is k? Our answer: about log(n); the construction below
uses log₂(n) + 1 sets when n is a power of two.
• Solution construction: F(2n) = F(n) + 1
• Suppose there are 2n data items a1, a2, …, an, an+1, …,
  a2n to back up:
  – Put {a1, a2, …, an} in X.
  – For every set in the solution for the n items {a1, a2,
    …, an}, take its union with its isomorphic copy in the set
    {an+1, …, a2n} (ai ↦ an+i) and put the result in X.
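The recursive construction, together with a brute-force check of the lemma, can be sketched as follows. The slides do not state a base case; this sketch assumes F(1) = 1 (store a1 itself) and n a power of two, and `construct` and `tolerates_two` are hypothetical names.

```python
from itertools import combinations


def construct(n):
    """Sets S_1..S_k over item indices 1..n (n a power of two), via F(2n) = F(n) + 1."""
    if n == 1:
        return [frozenset({1})]  # assumed base case: store a1 itself
    half = n // 2
    X = [frozenset(range(1, half + 1))]  # {a1, ..., a_half}
    for s in construct(half):
        # Union each smaller-solution set with its copy shifted into the upper half.
        X.append(s | {j + half for j in s})
    return X


def tolerates_two(X, n):
    """Lemma check: the sets cover S and every pair p != q is split by some S_i."""
    covered = set().union(*X) == set(range(1, n + 1))
    split = all(any((p in s) != (q in s) for s in X)
                for p, q in combinations(range(1, n + 1), 2))
    return covered and split


for n in (2, 4, 8):
    X = construct(n)
    print(n, len(X), tolerates_two(X, n))
# prints 2 2 True, then 4 3 True, then 8 4 True
```

For example, `construct(4)` yields {a1, a2}, {a1, a3}, {a1, a2, a3, a4}: three stored XORs instead of four copies.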
                        Example
[Figures: the construction for n = 2 (F(n) = k = 2),
 n = 4 (F(n) = k = 3), and n = 8 (F(n) = k = 4)]
How much disk space do you need to
      store a1, a2, …, an?
   Disk space      | # concurrent crashes tolerated | Bandwidth redundancy
   n               | n                              | 0
   ?               | …                              | ?
   2n/3            | 2                              | 1
   F(n) = log(n)   | 2                              | n/2-1
   1               | 1                              | n-1
             My questions
• Is this result already known?
• Is log(n) a lower bound for tolerating 2
  concurrent crashes?
• F(n) = ? for tolerating 3, 4, 5, …
  concurrent crashes.
      Implementation Options
• #1: At what granularity should we back up?
  – Consolidate backup data into one log file:
     • Pros: hides file sizes, can recover older versions, supports incremental backup
     • Cons: poor space and bandwidth efficiency

  – Back up data at file granularity:
     • Pros: space and bandwidth efficiency
     • Cons: reveals file sizes; subtle details about cutting big files, efficient updates, …
 #2: Efficient transfer when updating a file
• Problem: if two versions of a file differ only slightly,
  transferring the whole file again is expensive.
• Idea (rsync): transfer only the necessary bytes.
• Let A’ be the updated file on node N, and A the old version
  of the file kept by M.
• M:
   – Cut A into fixed-size chunks and compute the hash of each.
   – Send all hashes h1, h2, …, hn to N.
• N:
   – Compute the hashes of chunks of A’ in a sliding-window fashion.
   – Compare them with h1, h2, …, hn to find the overlap.
   – Send only the necessary bytes to M.
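The M/N exchange can be sketched in Python. This is a simplified illustration of the rsync idea, not rsync itself: the tiny `BLOCK` size, the Adler-32-style weak checksum, SHA-256 as the strong hash, and the names `signatures`, `delta`, and `patch` are all assumptions of this sketch. The weak checksum rolls forward one byte in O(1), so N computes a strong hash only when the weak checksum of the current window matches one of M's chunks.

```python
import hashlib

BLOCK = 4        # hypothetical chunk size; real systems use much larger chunks
MOD = 1 << 16


def weak(block):
    """Adler-32-style checksum (a, b) of one window."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte):
    """Slide the window one byte right in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - BLOCK * out_byte + a) % MOD
    return a, b


def signatures(old):
    """M: hash every full fixed-size chunk of the old file A."""
    table = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        chunk = old[start:start + BLOCK]
        table.setdefault(weak(chunk), {})[hashlib.sha256(chunk).digest()] = start // BLOCK
    return table


def delta(new, table):
    """N: slide a window over A' and emit ('copy', chunk_index) or ('data', bytes)."""
    ops, literal, i = [], bytearray(), 0
    ab = weak(new[:BLOCK])
    while i + BLOCK <= len(new):
        candidates = table.get(ab)  # cheap weak check first
        window = new[i:i + BLOCK]
        match = candidates.get(hashlib.sha256(window).digest()) if candidates else None
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            i += BLOCK
            ab = weak(new[i:i + BLOCK])  # restart the checksum after a jump
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                ab = roll(*ab, new[i], new[i + BLOCK])
            i += 1
    literal.extend(new[i:])  # tail shorter than one chunk
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """M: rebuild A' from its own chunks plus the literal bytes N sent."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick brown cat jumps over the lazy dog!"
ops = delta(new, signatures(old))
print(sum(len(v) for k, v in ops if k == "data"))  # 8
```

Only 8 of the 44 bytes of A’ travel as literals; the rest are chunk references M already holds.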
 #3: Cutting a big file into small parts
• Problem: one friend may not have enough space for your
  big file, so you need to cut it into smaller parts. But how
  should you cut it so that later updates are easy?
• Fixed part size? No: if one byte is inserted into or deleted
  from the file, all subsequent parts shift, so you need to
  update all of those old parts.
• Idea (LBFS): use the file's bit pattern to set part boundaries
  rather than a fixed size. As a result, if one byte is
  inserted or deleted, typically only the part containing that
  byte changes.
[Figure: a sliding window moves over the file; a boundary is set
 where the window's content matches the chosen pattern]
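The content-defined boundary idea can be sketched with a rolling hash over a small sliding window: a boundary is declared wherever the low bits of the window's hash are all ones, so each boundary depends only on the nearby bytes. LBFS itself uses Rabin fingerprints; this sketch swaps in a simpler polynomial rolling hash with no minimum or maximum part size, and `WINDOW`, `MASK`, and `chunk` are hypothetical choices.

```python
import random

WINDOW = 16             # bytes that decide each boundary (hypothetical)
BASE = 257
MOD = (1 << 61) - 1
MASK = (1 << 6) - 1     # boundary when the low 6 bits are all ones (~64-byte parts on random data)


def chunk(data):
    """Split data at content-defined boundaries chosen by a rolling hash."""
    parts, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # append the newest byte
        if i >= WINDOW - 1 and (h & MASK) == MASK:
            parts.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        parts.append(data[start:])
    return parts


rng = random.Random(7)
old = bytes(rng.randrange(256) for _ in range(4000))
new = old[:2000] + b"X" + old[2000:]  # insert one byte in the middle

old_parts, new_parts = chunk(old), chunk(new)
changed = set(new_parts) - set(old_parts)
print(len(old_parts), len(changed))
```

Because only the WINDOW boundary decisions that see the inserted byte can differ, at most WINDOW + 1 parts change, and in practice usually one or two; with fixed-size parts, every part after offset 2000 would shift.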
                  Other issues
• #4: Trust but verify your friends.
   – Check that the backup is still there.
   – How do we check whether friends contribute their fair share?
• #5: How do we check that a backup copy still exists if
  you and your friend are not online at the same
  time?
   – Idea: ask other friends to help.
• #6: Sharing files among friends.
   – Viewers automatically cache/back up the file.
   – Backed-up data increases the availability of shared files.
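For #4, one simple way to check that a friend still holds a backup is a nonce-based challenge: the owner, who still has the original, sends a fresh random nonce, and the friend must return a hash of the nonce together with the stored data, which it cannot compute without that data. This is an illustration, not a mechanism from the slides; `respond` and `verify` are hypothetical names, both parties must be online (so it does not solve #5), and hashing the whole backup per check is costly, so a real design might challenge random blocks instead.

```python
import hashlib
import hmac
import os


def respond(nonce, stored):
    """Friend: prove possession by hashing the fresh nonce with the stored backup."""
    return hashlib.sha256(nonce + stored).digest()


def verify(original, ask_friend):
    """Owner: send a fresh nonce so an old answer cannot be replayed."""
    nonce = os.urandom(16)
    expected = hashlib.sha256(nonce + original).digest()
    return hmac.compare_digest(expected, ask_friend(nonce))


backup = b"thesis draft v7"  # hypothetical backup contents


def honest_friend(nonce):
    return respond(nonce, backup)  # still holds the data


def lazy_friend(nonce):
    return respond(nonce, b"")  # deleted the data


print(verify(backup, honest_friend), verify(backup, lazy_friend))  # True False
```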
The End

				