BigTable & HBase: A Distributed Storage System for Structured Data

Shared by: Alon Shwartz
posted:
3/25/2009
A Distributed Storage System for Structured Data
BigTable & HBase
Edward J. Yoon
edwardyoon@apache.org

Three Major Components
• Master Server
  - Responsible for assigning tablets to tablet servers, detecting the addition and expiration of tablet servers, balancing tablet-server load, and garbage collection of files in HDFS
  - Handles schema changes such as table and column family (CF) creation
• Tablet Server
  - Manages a set of tablets (10 to 1,000 per tablet server)
  - Handles read and write requests to the tablets
  - Splits tablets that have grown too large (100-200 MB)
• Client Library
  - Communicates directly with tablet servers for reads and writes

Architecture
• Client
• Master Server (GFS master server, CMS client)
  - Assigns tablets
  - Detects the addition and expiration of tablet servers
  - Balances tablet-server load
  - Handles schema changes
• Chubby
  - Handles master election
  - Stores the bootstrap location of HBase data
  - Discovers region servers
  - Stores access control lists
• CMS server
  - Schedules jobs
  - Manages resources on the cluster
  - Deals with machine failures
• Tablet Servers (each: GFS chunk server, CMS client)

Data Model
• Doesn't support a full relational data model
• Multi-dimensional sorted map
• Indexed by a row, column, and timestamp
  (row: string, column: string, time: int64) → string
• Column-oriented storage
  - Most queries involve only a few columns out of many, so this greatly reduces I/O.

Tablet Location
• Uses a three-level hierarchy analogous to that of a B+ tree
  - A location is the ip:port of the relevant server
  - 1st level: Bootstrapped from the lock server; points to the location of the root tablet
  - 2nd level: Uses META0 data to find the owner of the appropriate META1 tablet
  - 3rd level: The META1 table holds the locations of the tablets of all other tables

Tablet Assignment
The master keeps track of the set of live tablet servers and the current assignment of tablets to tablet servers (region servers in HBase), including which tablets are unassigned.
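The master's bookkeeping described under Tablet Assignment can be sketched roughly as follows. This is a minimal illustration only, not Bigtable or HBase code: the class `MasterState`, its method names, and the "assign to the least-loaded live server" policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MasterState:
    """Hypothetical master bookkeeping (names are illustrative only)."""
    live_servers: set = field(default_factory=set)   # live tablet servers
    assignment: dict = field(default_factory=dict)   # tablet -> tablet server
    unassigned: set = field(default_factory=set)     # tablets with no owner

    def add_tablet(self, tablet):
        # A new tablet starts out unassigned.
        self.unassigned.add(tablet)

    def server_joined(self, server):
        # The master detected the addition of a tablet server.
        self.live_servers.add(server)

    def server_expired(self, server):
        # The master detected the expiration of a tablet server:
        # every tablet it owned goes back to the unassigned set.
        self.live_servers.discard(server)
        for tablet, owner in list(self.assignment.items()):
            if owner == server:
                del self.assignment[tablet]
                self.unassigned.add(tablet)

    def load(self, server):
        # Number of tablets currently assigned to this server.
        return sum(1 for owner in self.assignment.values() if owner == server)

    def assign_unassigned(self):
        # Assumed policy: give each unassigned tablet to the live server
        # that currently holds the fewest tablets.
        for tablet in sorted(self.unassigned):
            if not self.live_servers:
                break
            target = min(sorted(self.live_servers), key=self.load)
            self.assignment[tablet] = target
        self.unassigned -= set(self.assignment)
```

With two live servers and four tablets, `assign_unassigned()` spreads the tablets evenly; after `server_expired("ts1")` the tablets that server owned return to `unassigned` until the next assignment pass.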
Tablet assignment steps
[Diagram participants: Cluster manager, Chubby, Master Server, Tablet servers (Region Server)]
1) Start a server
2) Create a lock
3) Acquire the lock
4) Monitor
5) Assign tablets
6) Check lock status
8) Acquire and delete the lock
9) Reassign unassigned tablets

Tablet Serving
• To recover a tablet, a tablet server
  - reads its metadata from the METADATA table; the metadata contains
    - the list of SSTables that comprise the tablet
    - a set of redo points, which are pointers into any commit logs that may contain data for the tablet
  - reads the indices of the SSTables into memory
  - reconstructs the memtable by applying all of the updates that have been committed since the redo points

Compaction
[Figure: memtable and frozen memtable in memory; tablet log and SSTable files in the DFS; read and write ops; versions labeled V1.0 to V6.0]
• Minor compaction
  - Memtable -> a new SSTable (the memtable is frozen and a new memtable is created)
• Merging compaction
  - Memtable + a few SSTables -> a new SSTable
  - Done periodically
  - Deleted data are still alive
• Major compaction
  - Memtable + all SSTables -> one SSTable
  - Deleted data are removed
  - Storage can be reused

Compression
• Clients can control whether or not the SSTables for a locality group are compressed
• Two-pass custom compression scheme
  - First pass: looks for long common strings across a large window (BMDiff)
  - Second pass: looks for repetitions in a small 16 KB window (zippy)
  - Both compression passes are very fast
• Space reduction
  - Makes it possible to identify large amounts of shared boilerplate in pages from the same host
  - Choose row names so that similar data ends up clustered, which achieves very good compression

Caching for read performance
• Uses two levels of caching to improve read performance
• Scan cache
  - Higher-level cache
  - Most useful for applications that tend to read the same data repeatedly
• Block cache
  - Lower-level cache
  - Useful for applications that read data close to what they recently read, e.g. random reads of different columns in the same locality group within a hot row

HBase: BigTable clone project
• http://hadoop.apache.org/hbase/
• Written in Java
• No Chubby or CMS server; we have JobTracker, and ZooKeeper is coming soon.
• Because Hadoop (the GFS counterpart) doesn't provide a file-append function, current HBase can lose data when it crashes.
  - Hadoop 0.19.x provides a file-append function
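The three compaction types from the Compaction slide can be illustrated with a small in-memory sketch. It is a toy model under stated assumptions, not the Bigtable or HBase implementation: the `Tablet` class, the `TOMBSTONE` deletion marker, and the use of plain dicts as stand-ins for SSTables are all hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class Tablet:
    """Toy tablet: one memtable plus a list of SSTables (oldest first)."""

    def __init__(self):
        self.memtable = {}
        self.sstables = []  # each SSTable is a dict treated as immutable

    def write(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # Deletes are recorded as tombstones, not removed in place.
        self.memtable[key] = TOMBSTONE

    def read(self, key):
        # Check the newest data first: memtable, then SSTables newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def minor_compaction(self):
        # Memtable -> a new SSTable; a fresh memtable takes its place.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def merging_compaction(self, count):
        # Memtable + the `count` newest SSTables -> a new SSTable.
        # Tombstones are kept because older SSTables may still hold the key.
        count = min(count, len(self.sstables))
        keep = len(self.sstables) - count
        merged = {}
        for table in self.sstables[keep:] + [self.memtable]:
            merged.update(table)  # newer entries overwrite older ones
        self.sstables = self.sstables[:keep]
        self.sstables.append(dict(sorted(merged.items())))
        self.memtable = {}

    def major_compaction(self):
        # Memtable + all SSTables -> one SSTable; deleted data are dropped.
        self.merging_compaction(len(self.sstables))
        live = {k: v for k, v in self.sstables[-1].items() if v is not TOMBSTONE}
        self.sstables = [live]
```

After a merging compaction a deleted key still occupies space in older SSTables and is only hidden by its tombstone; only `major_compaction()` removes both, which is why the slide notes that storage can be reused at that point.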
