Introduction to HBase


Presenter: Hu Yi
2009-3-11
Overview

 HBase is an Apache open source project whose goal is to provide storage for the Hadoop distributed computing environment.
 Data is logically organized into tables, rows and columns.
Outline
 Data Model
 Architecture and Implementation
 Examples & Tests
Conceptual View

 A data row has a sortable row key and an arbitrary number of columns.
 A time stamp is assigned automatically if it is not set explicitly.
 Columns are named <family>:<label>.

Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” = “CNN”
                   t13                               “anchor:my.look.ca” = “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
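The conceptual view can be read as a nested sorted map: row key, then column, then time stamp (newest first), then value. The following is a minimal sketch in plain Java, not the HBase API; the class ConceptualTable and its methods are hypothetical names, and plain long values stand in for the time stamps t3 to t15.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the conceptual view; not the HBase API.
class ConceptualTable {
    // row key -> "family:label" -> time stamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows =
        new TreeMap<String, TreeMap<String, TreeMap<Long, String>>>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Latest version of a cell, or null if the cell was never written.
    String get(String row, String column) {
        TreeMap<Long, String> versions = versions(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Newest version written at or before the given time stamp, or null.
    String get(String row, String column, long timestamp) {
        TreeMap<Long, String> versions = versions(row, column);
        if (versions == null) return null;
        // The map is in descending order, so ceilingEntry finds the closest
        // time stamp that is equal to or older than the requested one.
        Map.Entry<Long, String> e = versions.ceilingEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    private TreeMap<Long, String> versions(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }
}
```

Empty cells cost nothing in this model: a column that was never written for a row simply has no entry.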
Physical Storage View

 Physically, tables are stored on a per-column family basis.
 Because the storage format is column-oriented, empty cells are not stored.
 Each column family is managed by an HStore.
   Data MapFile: key/value
   Index MapFile: index key
   Memcache

HStore for column “contents:”
Row key            TS    Column “contents:”
“com.apache.www”   t12   “<html>…”
                   t11   “<html>…”
“com.cnn.www”      t6    “<html>…”
                   t5    “<html>…”
                   t3    “<html>…”

HStore for column “anchor:”
Row key            TS    Column “anchor:”
“com.apache.www”   t10   “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9    “anchor:cnnsi.com” = “CNN”
                   t8    “anchor:my.look.ca” = “CNN.com”
Row Ranges: Regions

 Cells are sorted by row key and column ascending, time stamp descending.
 Physically, tables are broken into row ranges (regions); each region contains the rows from a start key to an end key.

Row key   Time stamp   Column “contents:”   Column “anchor:”
aaaa      t15                               anchor:cc = value
          t13          ba
          t12          bb
          t11                               anchor:cd = value
          t10          bc
aaab      t14
aaac                                        anchor:be = value
aaad                                        anchor:ad = value
aaae      t5           ae
          t3           af
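The sort order and the row-range rule can be sketched in plain Java. CellKey, RegionMap and the region names are hypothetical; the sketch assumes that each region is identified by its start key and holds all rows up to the start key of the next region.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cell key: row and column ascending, time stamp descending.
class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // newer versions sort first
    };
}

// Hypothetical region directory keyed by region start key.
class RegionMap {
    private final TreeMap<String, String> regionByStartKey = new TreeMap<String, String>();

    void addRegion(String startKey, String regionName) {
        regionByStartKey.put(startKey, regionName);
    }

    // The region holding a row is the one with the greatest start key <= row.
    String regionFor(String row) {
        Map.Entry<String, String> e = regionByStartKey.floorEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

With regions starting at "" and "aaac", rows aaaa and aaab fall in the first region and rows aaac to aaae in the second.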
Architecture and Implementation
Three major components
 The HBaseMaster

 The HRegionServer

 The HBase client
HBaseMaster

 Assign regions to HRegionServers.
   1. ROOT region locates all the META regions.
   2. META region maps a number of user regions.
   3. Assign user regions to the HRegionServers.
 Enable/Disable table and change table schema
 Monitor the health of each Server

[Figure: the Master above five Servers. One ROOT region (1) points to several META regions (2); the Servers host the ROOT region, META regions and USER regions.]
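ROOT/META Table
 Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, one region therefore holds about 2^18 rows.

 1 ROOT table maps 2^18 META regions
 2^18 META regions map 2^18 x 2^18 = 2^36 USER regions
 2^36 USER regions x 256MB = 2^54 KB = 2^64 bytes = 2^24 TB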
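The capacity estimate can be checked with a few lines of arithmetic. This sketch only restates the slide's figures (1KB rows, 256MB regions); the class name MetaCapacity is illustrative, and BigInteger is used because 2^64 does not fit in a long.

```java
import java.math.BigInteger;

// Worked version of the ROOT/META capacity estimate; names are illustrative.
class MetaCapacity {
    // Approximate size of one ROOT/META row: 1KB = 2^10 bytes.
    static final BigInteger ROW_SIZE = BigInteger.valueOf(1024);
    // Default region size: 256MB = 2^28 bytes.
    static final BigInteger REGION_SIZE = BigInteger.valueOf(256L * 1024 * 1024);

    // Rows that fit in one region: 2^28 / 2^10 = 2^18.
    static BigInteger rowsPerRegion() {
        return REGION_SIZE.divide(ROW_SIZE);
    }

    // ROOT maps 2^18 META regions, each mapping 2^18 user regions: 2^36.
    static BigInteger userRegions() {
        return rowsPerRegion().multiply(rowsPerRegion());
    }

    // 2^36 user regions of 2^28 bytes each: 2^64 bytes.
    static BigInteger userBytes() {
        return userRegions().multiply(REGION_SIZE);
    }
}
```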
HRegionServer: Write Requests

 Write Requests
 Read Requests
 Cache Flushes
 Compactions
 Region Splits

[Figure: the example table (rows “com.apache.www” and “com.cnn.www”, columns “contents:” and “anchor:”), annotated with the HLog, Hstore1 (Mapfile1.1, Mapfile1.2, Memcache1) and Hstore2 (Memcache2).]
HRegionServer: Read Requests

[Figure: the example table, annotated with Hstore1 (Mapfile1.1, Mapfile1.2, Memcache1).]
HRegionServer: Cache Flushes

[Figure: the example table, annotated with the HLog and Hstore1 (Mapfile1.1, Mapfile1.2, Memcache1); a new Mapfile1.3 appears next to the existing MapFiles.]
HRegionServer: Compactions

[Figure: the example table, annotated with Hstore1 (Memcache1); Mapfile1.1 and Mapfile1.2 are shown together with a single Mapfile1.]
HRegionServer: Region Splits

[Figure: the example table, annotated with Hstore1 (Mapfile1, Memcache1).]
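A rough sketch of how write requests, read requests, cache flushes and compactions fit together for one HStore. It assumes the usual log-structured behaviour: a write is logged and then placed in the memcache, a flush turns the memcache into a new immutable MapFile, a read checks the memcache and then the MapFiles from newest to oldest, and a compaction merges the MapFiles into one. StoreSketch and flushThreshold are hypothetical names, MapFiles are modelled as in-memory sorted maps, and region splits are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Hypothetical single-store model; a real HStore keeps its MapFiles in HDFS.
class StoreSketch {
    final List<String> log = new ArrayList<String>();                // stands in for the HLog
    TreeMap<String, String> memcache = new TreeMap<String, String>(); // recent writes, sorted by key
    final Deque<TreeMap<String, String>> mapFiles =
        new ArrayDeque<TreeMap<String, String>>();                    // newest first
    private final int flushThreshold;

    StoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Write request: log first, then memcache; flush when the memcache is full.
    void put(String key, String value) {
        log.add(key + "=" + value);
        memcache.put(key, value);
        if (memcache.size() >= flushThreshold) flush();
    }

    // Cache flush: the memcache becomes a new immutable "MapFile".
    void flush() {
        if (memcache.isEmpty()) return;
        mapFiles.addFirst(memcache);
        memcache = new TreeMap<String, String>();
    }

    // Read request: memcache first, then MapFiles from newest to oldest.
    String get(String key) {
        if (memcache.containsKey(key)) return memcache.get(key);
        for (TreeMap<String, String> file : mapFiles) {
            if (file.containsKey(key)) return file.get(key);
        }
        return null;
    }

    // Compaction: merge all MapFiles into one, newer values winning.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<String, String>();
        // Walk from oldest to newest so that newer values overwrite older ones.
        mapFiles.descendingIterator().forEachRemaining(merged::putAll);
        mapFiles.clear();
        if (!merged.isEmpty()) mapFiles.addFirst(merged);
    }
}
```

Reads get slower as MapFiles accumulate, because each one may have to be checked; merging them back into a single file keeps the read path short.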
HBase Client

[Figure: the HBase client contacts the ROOT region, then a META region, then the user region; the location information is cached.]
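The lookup chain can be sketched as two nested sorted maps plus a cache. Everything here is hypothetical and simplified: ClientLookupSketch is not an HBase class, server names are plain strings, and the cache is keyed by exact row key, whereas a real client would cache region locations by key range.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side lookup: ROOT -> META -> user region, with a cache.
class ClientLookupSketch {
    // ROOT: start key of each META region -> that META region's table of
    // (user region start key -> server name). All names are illustrative.
    private final TreeMap<String, TreeMap<String, String>> root =
        new TreeMap<String, TreeMap<String, String>>();
    private final Map<String, String> cache = new HashMap<String, String>();
    int catalogLookups = 0; // counts trips through ROOT and META

    void addUserRegion(String metaStartKey, String userStartKey, String server) {
        root.computeIfAbsent(metaStartKey, k -> new TreeMap<String, String>())
            .put(userStartKey, server);
    }

    // Returns the server holding the row, or null if no region covers it.
    String locate(String row) {
        String cached = cache.get(row);
        if (cached != null) return cached;          // cached: no catalog access
        catalogLookups++;
        Map.Entry<String, TreeMap<String, String>> meta = root.floorEntry(row); // ask ROOT
        if (meta == null) return null;
        Map.Entry<String, String> user = meta.getValue().floorEntry(row);       // ask META
        if (user == null) return null;
        cache.put(row, user.getValue());            // remember the location
        return user.getValue();
    }
}
```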
Examples & Tests
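Create MyTable
   Row Key   Timestamp   columnFamily1:   columnFamily2:
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Assumed setup, not shown on the slide: the default HBase configuration.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family; family names end with ':' as on the slide.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
// Describe the table, attach both families, then create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);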
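Insert Values
// Assumed setup, not shown on the slide: an open table and a time stamp.
HTable table = new HTable(config, "MyTable");
long timestamp = System.currentTimeMillis();
// A BatchUpdate groups the changes made to one row.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);

Row Key   Timestamp   columnFamily1:
myRow     ts1         labela = labela value
          ts2         labelb = labelb value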
Insert
[Chart: HBase insert results; vertical axis from 0 to 160000; horizontal axis pairs 100000/1, 10000/10, 1000/100, 100/1000, 10/10000, 1/100000.]
Insert
[Chart: insert time(ms) on a logarithmic axis from 1 to 1000000, HBase compared with MySQL; horizontal axis labelled “Row*10 Column=1”.]
Search
 Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t5
                   t3
Search Scanner
 Select value from table where anchor='cnnsi.com'

[Figure: the “anchor:” column of the example table for rows “com.apache.www” and “com.cnn.www”, with entries “anchor:apache.com” = “APACHE” (t10), “anchor:cnnsi.com” = “CNN” (t9) and “anchor:my.look.ca” = “CNN.com” (t8).]
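The two searches differ in how much of the key is known. The first names both the row key and the full column, so it is a direct lookup; the second names only the column label, so every row has to be visited. A minimal sketch in plain Java, with AnchorSearch as a hypothetical stand-in for the “anchor:” column family and time stamps left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the "anchor:" column family: row key -> column -> value.
class AnchorSearch {
    final TreeMap<String, TreeMap<String, String>> rows =
        new TreeMap<String, TreeMap<String, String>>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, String>()).put(column, value);
    }

    // Search: row key and column are both known, so this is a direct lookup.
    String get(String row, String column) {
        TreeMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Search Scanner: only the column is known, so every row is visited in key order.
    List<String> scan(String column) {
        List<String> values = new ArrayList<String>();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) values.add(value);
        }
        return values;
    }
}
```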
Summary

Column-oriented storage makes modification more flexible.

Higher performance on clustered row keys.
Future work

More testing

Search optimization
Thank you

				