Google File System Simulator


         Pratima Kolan
      Vinod Ramachandran
                     Google File System




•   The master manages metadata
•   Data transfer happens directly between the client and the chunk server
•   Files are broken into 64 MB chunks
•   Chunks are replicated across three machines for safety
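The chunking and replication rules above can be sketched in a few lines. This is a minimal illustration, not GFS's actual placement policy: the round-robin `place_replicas` helper and the server names are assumptions for the example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # files are broken into 64 MB chunks
REPLICAS = 3                   # each chunk is stored on three machines

def chunk_index(offset):
    """Map a byte offset within a file to the chunk that holds it."""
    return offset // CHUNK_SIZE

def place_replicas(chunk_id, servers):
    """Pick REPLICAS distinct chunk servers (hypothetical round-robin policy)."""
    return [servers[(chunk_id + i) % len(servers)] for i in range(REPLICAS)]

servers = ["cs0", "cs1", "cs2", "cs3", "cs4"]
print(chunk_index(200 * 1024 * 1024))   # offset 200 MB falls in chunk 3
print(place_replicas(3, servers))       # three distinct servers for chunk 3
```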
                Event Based Simulation

[Diagram: Components 1-3 place events (Event 1, Event 2, Event 3) into a
priority queue; the simulator repeatedly takes the next highest-priority
event from the queue and produces the output of the simulated event.]
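The event loop described above can be sketched with Python's `heapq` as the priority queue. This is a minimal, assumed structure (class and method names are illustrative, not from the simulator's actual code):

```python
import heapq
import itertools

class Simulator:
    """Minimal event-driven loop: components push timestamped events
    into a priority queue; the loop always pops the earliest one."""
    def __init__(self):
        self.queue = []
        self.counter = itertools.count()  # tie-breaker for equal timestamps
        self.now = 0.0

    def schedule(self, time, action):
        heapq.heappush(self.queue, (time, next(self.counter), action))

    def run(self):
        while self.queue:
            self.now, _, action = heapq.heappop(self.queue)
            action(self)  # "output of simulated event"

log = []
sim = Simulator()
sim.schedule(2.0, lambda s: log.append(("B", s.now)))
sim.schedule(1.0, lambda s: log.append(("A", s.now)))
sim.run()
print(log)  # events fire in timestamp order regardless of insertion order
```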
         Simplified GFS Architecture

[Diagram: a client and a master server connected through a switch with
infinite bandwidth; a second switch connects Network Disks 1-5, with
network queues on each link; arrows show the data flow.]

1. The client queries the master server for the chunk ID it wants to read.

2. The master server returns the set of disk IDs that hold replicas of the chunk.

3. The client requests the chunk from one of those disks.

4. The disk transfers the data directly to the client.
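The four-step read protocol above can be sketched with in-memory stand-ins for the master's metadata and the disks. The dictionaries, chunk ID, and function names here are hypothetical placeholders:

```python
# Hypothetical in-memory stand-ins for the master and the disks.
chunk_locations = {42: ["disk1", "disk3", "disk4"]}   # master metadata
disk_store = {("disk1", 42): b"chunk-42-bytes"}       # data on the disks

def master_lookup(chunk_id):
    """Steps 1-2: the client asks the master, which returns the replica disk IDs."""
    return chunk_locations[chunk_id]

def read_chunk(chunk_id):
    """Steps 3-4: the client picks a disk and fetches the chunk directly from it."""
    disks = master_lookup(chunk_id)
    disk = disks[0]                 # simplest policy: first listed replica
    return disk_store[(disk, chunk_id)]

print(read_chunk(42))
```

Note that the data itself never passes through the master; it only serves the metadata lookup.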
             Experiment Setup
• We have a client whose bandwidth can be varied
  from 0 to 1000 Mbps

• We have 5 disks, each with a per-disk
  bandwidth of 40 Mbps

• We have 3 chunk replicas per chunk of data as a
  baseline

• Each client request is for 1 chunk of data from a
  disk
             Simplified GFS Architecture

[Diagram: the same client/switch/master/disk architecture as above, with
the client bandwidth varied from 0 to 1000 Mbps and a per-disk bandwidth
of 40 Mbps. Chunk IDs are placed as follows:

    Disk 1: 0-1000      Disk 2: 0-1000      Disk 3: 0-2000
    Disk 4: 1001-2000   Disk 5: 1001-2000]
                Experiment 1
• Disk requests served without load balancing
  – In this case we pick the first chunk server from the
    list of available chunk servers that hold the
    disk block.

• Disk requests served with load balancing
  – In this case we apply a greedy algorithm and
    balance the load of incoming requests across the
    5 disks.
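The two policies can be contrasted in a short sketch. The load dictionary and function names are assumptions for illustration; the greedy rule here simply sends each request to the replica disk with the fewest queued requests:

```python
def pick_first(replica_disks, load):
    """No load balancing: always use the first disk holding the chunk."""
    return replica_disks[0]

def pick_least_loaded(replica_disks, load):
    """Greedy load balancing: use the replica disk with the fewest queued requests."""
    return min(replica_disks, key=lambda d: load[d])

load = {0: 5, 1: 2, 2: 7, 3: 0, 4: 1}   # queued requests per disk (example)
replicas = [0, 1, 2]                     # disks holding the requested chunk

print(pick_first(replicas, load))        # always disk 0
print(pick_least_loaded(replicas, load)) # disk 1, the lightest replica holder
```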
                Expectation
• In the non-load-balancing case we expect the
  effective request/data rate to reach a peak
  value of 2 disks' bandwidth (80 Mbps)

• In the load-balancing case we expect the
  effective request/data rate to reach a peak
  value of 5 disks' bandwidth (200 Mbps)
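The expected peaks follow from a simple bottleneck argument: the effective rate is capped by whichever is smaller, the client link or the aggregate bandwidth of the disks actually serving requests. A sketch of that arithmetic (function name assumed):

```python
DISK_BW = 40  # Mbps per disk, from the experiment setup

def peak_rate(client_bw_mbps, usable_disks):
    """Effective data rate = min(client link, aggregate disk bandwidth)."""
    return min(client_bw_mbps, usable_disks * DISK_BW)

print(peak_rate(1000, 2))  # no load balancing: ~2 disks do the work -> 80 Mbps
print(peak_rate(1000, 5))  # load balancing: all 5 disks share it   -> 200 Mbps
print(peak_rate(50, 5))    # below saturation the client link is the cap -> 50
```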
         Load Balancing Graph

[Figure: this graph plots the data rate at the client vs. the client bandwidth.]
                         Experiment 2
• Disk requests served with no dynamic replication
   – In this case we have a fixed number of replicas (3 in our case) and the server
     does not create more replicas based on read-request statistics.

• Disk requests served with dynamic replication
   – In this case the server replicates certain chunks based on the frequency of the
     chunk requests.

   – We define a replication factor, which is a fraction < 1.

   – No. of replicas for a chunk = (replication factor) * No. of requests for the
     chunk

   – We cap the maximum number of replicas at the number of disks.
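The replica-count rule above translates directly into code. The treatment of the 3-replica baseline as a floor is an assumption (the slide only states the fixed baseline and the disk-count cap explicitly):

```python
NUM_DISKS = 5

def num_replicas(request_count, replication_factor, base=3):
    """Replicas for a chunk = replication_factor * request count,
    capped above by the number of disks and (assumed) floored at
    the 3-replica baseline."""
    n = int(replication_factor * request_count)
    return min(max(n, base), NUM_DISKS)

print(num_replicas(10, 0.1))    # 0.1 * 10 = 1   -> stays at the baseline of 3
print(num_replicas(100, 0.1))   # 0.1 * 100 = 10 -> capped at 5 disks
```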
                 Expectation
• Our requests are all aimed at the chunks placed
  on disk 0, disk 1, and disk 2.

• In the non-replication case we expect the
  effective data rate at the client to be limited by
  the bandwidth provided by 3 disks (120 Mbps)

• In the replication case we expect the effective
  data rate at the client to be limited by the
  bandwidth provided by 5 disks (200 Mbps)
        Replication Graph

[Figure: this graph plots the data rate at the client vs. the client bandwidth.]
                Experiment 3
• Disk requests served with no rebalancing
  – In this case we do not rebalance read requests
    based on the frequency of chunk requests

• Disk requests served with rebalancing
  – In this case we rebalance read requests by
    picking the request with the highest frequency
    and transferring it to a disk with a lower load
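The rebalancing step above can be sketched as follows. The dictionaries and the restriction to disks that already hold a replica are assumptions for illustration:

```python
def rebalance(request_freq, disk_load, replicas):
    """Pick the most-requested chunk and redirect its requests to the
    least-loaded disk that holds a replica of it (assumed constraint)."""
    hot_chunk = max(request_freq, key=request_freq.get)
    target = min(replicas[hot_chunk], key=lambda d: disk_load[d])
    return hot_chunk, target

freq = {"c0": 40, "c1": 5}           # request frequency per chunk
load = {0: 30, 1: 12, 2: 3}          # current load per disk
replicas = {"c0": [0, 1, 2], "c1": [0, 1]}

print(rebalance(freq, load, replicas))  # hottest chunk c0 moves to disk 2
```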
                Request Distribution Graph

[Graph 3: number of requests served by each disk (Disk 0 through Disk 4),
on a scale of 0 to 5000 requests, for three configurations:
No Re-Balancing/No Replication, No Re-Balancing/Replication,
and Re-Balancing/No Replication.]
     Conclusion and Future Work
• GFS is a simple file system for large, data-
  intensive applications

• We studied the behavior of certain read
  workloads on this file system

• In the future we would like to come up with
  optimizations that could fine-tune GFS

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:12
posted:9/1/2011
language:English
pages:17