Porcupine: A Highly Available Cluster-based Mail Service

Yasushi Saito
Brian Bershad
Hank Levy

http://porcupine.cs.washington.edu/
Department of Computer Science and Engineering,
University of Washington, Seattle, WA
Why Email?

Mail is important
  Real demand
Mail is hard
  Write intensive
  Low locality
Mail is easy
  Well-defined API
  Large parallelism
  Weak consistency
Goals

Use commodity hardware to build a large, scalable mail service
Three facets of scalability ...
• Performance: Linear increase with cluster size
• Manageability: React to changes automatically
• Availability: Survive failures gracefully
Conventional Mail Solution

Static partitioning: SMTP/IMAP/POP front ends serve statically assigned mailboxes (Ann's, Bob's, Joe's, Suzy's mbox) stored on NFS servers.

Performance problems:
  No dynamic load balancing
Manageability problems:
  Manual data partition decision
Availability problems:
  Limited fault tolerance
Presentation Outline

Overview
Porcupine Architecture
  Key concepts and techniques
  Basic operations and data structures
  Advantages
Challenges and solutions
Conclusion
Key Techniques and Relationships

Framework: functional homogeneity ("any node can perform any task")
Techniques: replication, automatic reconfiguration, load balancing
Goals: availability, manageability, performance
Porcupine Architecture

Every node (A, B, ..., Z) runs an identical stack of components:
  Protocol handlers: SMTP server, POP server, IMAP server
  Load balancer
  User map
  Membership manager (nodes communicate via RPC)
  Replication manager
  Mail map
  Mailbox storage and user profiles
Porcupine Operations

A message passes through protocol handling, user lookup, load balancing, and message store; a code sketch follows the steps below.
1. DNS-RR selection picks a front-end node; the sender's "send mail to bob" reaches node A.
2. A consults its user map: who manages bob? (Node B in the figure.)
3. A asks B to verify bob.
4. B answers "OK, bob has msgs on C and D" (bob's mail map entry).
5. A picks the best node to store the new message, here C.
6. A tells C: "store msg".
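A minimal, self-contained C++ walk through steps 2 to 6, with illustrative names and stub bodies throughout; the slides do not show Porcupine's real interfaces, so everything here is an assumption made for the example.

#include <functional>
#include <set>
#include <string>

using NodeId = int;
constexpr int kNodes = 30;  // assumed cluster size

// Step 2: hash the recipient's name to find the node that manages them.
NodeId manager_for(const std::string& user) {
    return static_cast<NodeId>(std::hash<std::string>{}(user) % kNodes);
}

// Steps 3-4: the manager verifies the user and returns the mail map
// entry, i.e. the nodes already holding their messages (stubbed).
std::set<NodeId> fetch_mail_map(NodeId /*manager*/, const std::string& /*user*/) {
    return {2, 3};  // e.g. "bob has msgs on C and D"
}

// Step 5: pick a store node; the real policy (spread-based load
// balancing) comes later in the talk. Stub: take the first holder.
NodeId pick_store_node(const std::set<NodeId>& holders) {
    return *holders.begin();
}

// Steps 2-6 end to end; the final "store msg" RPC is elided.
void deliver(const std::string& user, const std::string& msg) {
    NodeId manager = manager_for(user);
    std::set<NodeId> holders = fetch_mail_map(manager, user);
    NodeId target = pick_store_node(holders);
    (void)target; (void)msg;
}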
Basic Data Structures

User map: applying a hash function to "bob" yields a bucket whose entry names bob's managing node; the same map is replicated on every node (the identical rows across nodes A, B, and C in the figure).
Mail map / user info: kept by each user's managing node, it lists the nodes holding that user's messages, e.g. bob: {A,C}, suzy: {A,C}, ann: {B}, joe: {B}.
Mailbox storage: the messages themselves, spread across nodes; A holds Bob's and Suzy's MSGs, B holds Ann's and Joe's, C holds Bob's and Suzy's.
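To make the two soft-state tables concrete, here is a minimal C++ sketch assuming a fixed 256-bucket user map; the bucket count, types, and names are illustrative, not Porcupine's code.

#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using NodeId = int;

// User map: hash(user) -> managing node. Replicated identically on
// every node, so any node can route a lookup for any user.
struct UserMap {
    static constexpr std::size_t kBuckets = 256;  // assumed bucket count
    std::vector<NodeId> bucket_owner = std::vector<NodeId>(kBuckets);

    NodeId manager_for(const std::string& user) const {
        return bucket_owner[std::hash<std::string>{}(user) % kBuckets];
    }
};

// Mail map: kept only by a user's managing node; names the nodes that
// currently hold fragments of that user's mailbox (e.g. bob -> {A, C}).
struct MailMap {
    std::map<std::string, std::set<NodeId>> fragments;
};

Because the user map is small and identical everywhere, only the per-user mail map entries move when nodes come and go.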
Porcupine Advantages

Advantages:
  Optimal resource utilization
  Automatic reconfiguration and task re-distribution upon node failure/recovery
  Fine-grain load balancing
Results:
  Better availability
  Better manageability
  Better performance
Presentation Outline

Overview
Porcupine Architecture
Challenges and solutions
  Scaling performance
  Handling failures and recoveries:
    Automatic soft-state reconstruction
    Hard-state replication
  Load balancing
Conclusion
Performance

Goals:
  Scale performance linearly with cluster size
Strategy: avoid creating hot spots
  Partition data uniformly among nodes
  Fine-grain data partition
Measurement Environment

30-node cluster of not-quite-all-identical PCs
  100Mb/s Ethernet + 1Gb/s hubs
  Linux 2.2.7
  42,000 lines of C++ code
Synthetic load
Compare to sendmail+popd
How does Performance Scale?

[Chart: messages/second vs. cluster size (0 to 30 nodes). Porcupine scales roughly linearly, reaching about 800 messages/second (68 million messages/day) at 30 nodes; sendmail+popd reaches only about 25 million messages/day.]
Availability

Goals:
  Maintain function after failures
  React quickly to changes regardless of cluster size
  Graceful performance degradation / improvement
Strategy: two complementary mechanisms
  Hard state (email messages, user profiles): optimistic fine-grain replication
  Soft state (user map, mail map): reconstruction after membership change
Soft-state Reconstruction

Reconstruction runs in two phases along a timeline (a code sketch follows):
1. Membership protocol and user map recomputation: surviving nodes agree on the new membership and reassign the user-map buckets of failed nodes; mail map entries for reassigned users start out empty (suzy:, ann:).
2. Distributed disk scan: every node scans its local mailbox storage and reports what it holds to the new managers, refilling the entries (suzy: {A,B}, ann: {B}); untouched entries such as bob: {A,C} and joe: {C} persist throughout.
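A compact sketch of the two phases, reusing the bucket-to-owner user map from the earlier sketch; the names and the round-robin reassignment policy are assumptions.

#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

using NodeId = int;

// Phase 1: once the membership protocol settles, hand the user-map
// buckets owned by departed nodes to surviving nodes, round-robin.
void recompute_user_map(std::vector<NodeId>& bucket_owner,
                        const std::set<NodeId>& live) {
    std::vector<NodeId> survivors(live.begin(), live.end());
    if (survivors.empty()) return;  // no live nodes, nothing to do
    std::size_t next = 0;
    for (NodeId& owner : bucket_owner) {
        if (!live.count(owner))
            owner = survivors[next++ % survivors.size()];
    }
}

// Phase 2: each node scans its local mailbox storage and tells every
// stored user's (possibly new) manager "I hold mail for this user";
// the manager rebuilds its mail map entry (e.g. suzy: {A,B}).
void disk_scan_and_report(NodeId self,
                          const std::vector<std::string>& local_users,
                          const std::vector<NodeId>& bucket_owner) {
    for (const std::string& user : local_users) {
        std::size_t b = std::hash<std::string>{}(user) % bucket_owner.size();
        NodeId manager = bucket_owner[b];
        (void)manager; (void)self;  // RPC "add self to user's mail map" elided
    }
}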
How does Porcupine React to Configuration Changes?

[Chart: messages/second over an 800-second run, comparing no failure against one, three, and six simultaneous node failures. Throughput drops when nodes fail, stabilizes once the new membership is determined, dips again when the nodes recover, and returns to normal after the final membership is determined.]
Hard-state Replication

Goals:
  Keep serving hard state after failures
  Handle unusual failure modes
Strategy: exploit Internet semantics
  Optimistic, eventually consistent replication
  Per-message, per-user-profile replication
  Efficient during normal operation
  Small window of inconsistency
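One way to picture "optimistic, eventually consistent" in code: the coordinator commits locally, acknowledges the sender at once, and retries peers in the background. The update log, timestamp, and stubs below are assumptions for the sketch, not Porcupine's replication manager.

#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

using NodeId = int;

struct Update {
    std::string user;
    std::string message_id;
    std::uint64_t timestamp;   // orders concurrent updates at a replica
    std::set<NodeId> unacked;  // peers that have not yet confirmed
};

void apply_locally(const Update&) { /* durable local write (stub) */ }
void ack_sender() { /* e.g. SMTP "250 OK" to the sending MTA (stub) */ }

// The coordinator applies the update, acknowledges immediately (the
// optimism: the window of inconsistency stays small), and leaves the
// update in a log that a background thread drains, resending to the
// nodes in `unacked` until every peer has confirmed.
void replicate(Update u, std::vector<Update>& pending_log) {
    apply_locally(u);
    ack_sender();
    pending_log.push_back(std::move(u));
}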
How Efficient is Replication?

[Chart: messages/second vs. cluster size. Porcupine without replication reaches about 68 million messages/day at 30 nodes; with replication=2 it reaches about 24 million messages/day.]
How Efficient is Replication?

[Chart: the same comparison with an NVRAM variant added. Replication=2 with NVRAM recovers much of the lost throughput, about 33 million messages/day, versus 24 million for replication=2 on ordinary disks and 68 million without replication.]
Load balancing: Deciding where to store messages

Goals:
  Handle skewed workload well
  Support hardware heterogeneity
  No voodoo parameter tuning
Strategy: spread-based load balancing (see the sketch after this list)
  Spread: soft limit on # of nodes per mailbox
    Large spread → better load balance
    Small spread → better affinity
  Load balanced within spread
  Use # of pending I/O requests as the load measure
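A minimal sketch of the selection step, taking the pending-I/O count above as the load measure; the spread value shown, the probe, and all names are illustrative.

#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

using NodeId = int;
constexpr std::size_t kSpread = 4;  // soft limit on nodes per mailbox

// Assumed load probe: # of pending disk I/O requests at a node (stub).
int pending_io(NodeId n) { return static_cast<int>(n) % 3; }

// Choose where to store a new message for a user whose mail currently
// lives on `holders`; `all_nodes` is assumed non-empty.
NodeId pick_store_node(const std::set<NodeId>& holders,
                       const std::vector<NodeId>& all_nodes) {
    // Candidates: nodes already holding this user's mail (affinity);
    // if that set is smaller than the spread, admit new nodes.
    std::vector<NodeId> candidates(holders.begin(), holders.end());
    for (NodeId n : all_nodes) {
        if (candidates.size() >= kSpread) break;
        if (!holders.count(n)) candidates.push_back(n);
    }
    // Within the spread, take the least-loaded node, so busy or slow
    // disks naturally receive fewer new messages.
    return *std::min_element(candidates.begin(), candidates.end(),
                             [](NodeId a, NodeId b) {
                                 return pending_io(a) < pending_io(b);
                             });
}

Because new nodes are admitted only while the holder set is smaller than the spread, a user's mail stays on a few nodes (affinity) while hot or slow disks shed new work.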
How Well does Porcupine Support Heterogeneous Clusters?

[Chart: throughput increase (%) as fast nodes grow from 0% to 10% of the cluster. With spread=4, throughput rises by up to 25% (+16.8 million messages/day); with static partitioning it rises by only 0.8% (+0.5 million messages/day).]
Conclusions

Fast, available, and manageable clusters can be built for write-intensive services
Key ideas can be extended beyond mail:
  Functional homogeneity
  Automatic reconfiguration
  Replication
  Load balancing
Ongoing Work

More efficient membership protocol
Extending Porcupine beyond mail: Usenet, BBS, calendar, etc.
More generic replication mechanism

				