Introduction to HBase
Reporter: Hu Yi
2009-3-11
Overview
HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.
Data is logically organized into tables, rows and columns.
Outline
Data Model
Architecture and Implementation
Examples & Tests
Conceptual View
A data row has a sortable row key and an arbitrary number of columns.
A time stamp is assigned automatically if it is not set explicitly.

Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “…”
                   t11          “…”
                   t10                               “anchor:apache.com”   “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”    “CNN”
                   t13                               “anchor:my.look.ca”   “CNN.com”
                   t6           “…”
                   t5           “…”
                   t3           “…”
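The conceptual view can be pictured as a sorted, multi-level map. The following is a minimal in-memory sketch in plain Java, not HBase code: the class ConceptualTable and its methods are hypothetical names chosen for illustration, and plain integers stand in for the time stamps t3 to t15.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory model of the conceptual view (not the HBase API):
// row key -> column name -> time stamp (newest first) -> value.
final class ConceptualTable {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Stores one cell version; empty cells simply have no entry.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest value of a cell, or null if the cell is empty.
    String getLatest(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) return null;
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // First row key in sorted order, or null for an empty table.
    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualTable t = new ConceptualTable();
        t.put("com.cnn.www", "anchor:cnnsi.com", 15, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 13, "CNN.com");
        t.put("com.apache.www", "anchor:apache.com", 10, "APACHE");
        System.out.println(t.firstRowKey());
        System.out.println(t.getLatest("com.cnn.www", "anchor:cnnsi.com"));
    }
}
```

Sorted maps are used at every level so that rows come back in row key order and the newest version of a cell is always the first entry.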
Physical Storage View
Physically, tables are stored on a per-column-family basis.
In a column-oriented storage format, empty cells are not stored.
Each column family is managed by an HStore.

Row key            TS    Column “contents:”
“com.apache.www”   t12   “…”
                   t11   “…”
“com.cnn.www”      t6    “…”
                   t5    “…”
                   t3    “…”

Row key            TS    Column “anchor:”
“com.apache.www”   t10   “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9    “anchor:cnnsi.com”    “CNN”
                   t8    “anchor:my.look.ca”   “CNN.com”

HStore: Data MapFile (Key/Value), Index MapFile (Index key), Memcache
Row Ranges: Regions
Rows are sorted by row key and column in ascending order, and by timestamp in descending order.
Physically, tables are broken into row ranges, each containing the rows from a start key to an end key.
[Figure: example table with row keys aaaa to aaae, time stamps t15 to t3 and the columns “contents:” and “anchor:”, divided into row ranges]
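Finding the row range that holds a given row key can be sketched with a sorted map keyed by start key. This is an illustration in plain Java under stated assumptions, not the HBase implementation: the class RegionIndex, the region names and the split point "aaac" are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not the HBase API): regions indexed by their start key.
final class RegionIndex {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    // Registers a region that begins at startKey; "" means "from the first row".
    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The region holding rowKey is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionIndex index = new RegionIndex();
        index.addRegion("", "region-1");     // rows before "aaac"
        index.addRegion("aaac", "region-2"); // rows from "aaac" onward
        System.out.println(index.regionFor("aaab"));
        System.out.println(index.regionFor("aaad"));
    }
}
```

Because each region's end key is the next region's start key, storing only the start keys is enough for the lookup in this sketch.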
Three major components
The HBaseMaster
The HRegionServer
The HBase client
HBaseMaster
Assign regions to HRegionServers.
1. The ROOT region locates all the META regions.
2. Each META region maps a number of user regions.
3. User regions are assigned to the HRegionServers.
Enable/disable tables and change table schemas.
Monitor the health of each server.
[Figure: the Master and five Servers holding the ROOT, META and USER regions]
ROOT/META Table
Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, one region holds 2^18 rows.
1 ROOT table -> 2^18 META regions -> 2^18 × 2^18 = 2^36 USER regions
2^36 USER regions × 256MB = 2^64 bytes = 2^54 KB = 2^24 TB
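The capacity figures can be checked with a few lines of arithmetic. The sketch below assumes exactly 1KB per catalog row and that the 256MB default is the region size; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Reproduces the ROOT/META capacity arithmetic under the stated assumptions.
final class CatalogCapacity {
    static final long ROW_SIZE_BYTES = 1L << 10;    // about 1KB per catalog row
    static final long REGION_SIZE_BYTES = 1L << 28; // 256MB default region size

    // Rows that fit in one catalog region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_SIZE_BYTES / ROW_SIZE_BYTES;
    }

    // The ROOT region maps 2^18 META regions, each mapping 2^18 user regions.
    static long userRegions() {
        return rowsPerRegion() * rowsPerRegion();
    }

    // Total user data in bytes; 2^64 overflows a long, so BigInteger is used.
    static BigInteger userBytes() {
        return BigInteger.valueOf(userRegions()).multiply(BigInteger.valueOf(REGION_SIZE_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("META regions: " + rowsPerRegion());
        System.out.println("USER regions: " + userRegions());
        System.out.println("User data (TB): " + userBytes().shiftRight(40));
    }
}
```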
HRegionServer
Write Requests
Read Requests
Cache Flushes
Compactions
Region Splits

HRegionServer: Write Requests
[Figure: a write to the example table, with the HLog, Hstore1 (Memcache1, Mapfile1.1, Mapfile1.2) and Hstore2 (Memcache2)]
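A write request can be sketched as two steps. The sketch assumes, as the HLog and Memcache labels in the figure suggest, that an update is first recorded in a log and then applied to an in-memory sorted cache. It is plain Java with hypothetical names, not HBase code, and the log here is only an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the write path (not HBase code).
final class WritePathSketch {
    final List<String> hlog = new ArrayList<>();              // stand-in for the HLog
    final TreeMap<String, String> memcache = new TreeMap<>(); // stand-in for a Memcache

    void write(String rowKey, String column, long timestamp, String value) {
        String cellKey = rowKey + "/" + column + "/" + timestamp;
        hlog.add(cellKey + "=" + value); // 1. record the update in the log
        memcache.put(cellKey, value);    // 2. apply it to the sorted in-memory cache
    }

    public static void main(String[] args) {
        WritePathSketch server = new WritePathSketch();
        server.write("com.cnn.www", "anchor:cnnsi.com", 9, "CNN");
        System.out.println(server.hlog);
        System.out.println(server.memcache);
    }
}
```

Logging before caching means that an update still held only in memory could be replayed from the log after a failure.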
HRegionServer: Read Requests
[Figure: a read of the example table served by Hstore1 (Memcache1, Mapfile1.1, Mapfile1.2)]
HRegionServer: Cache Flushes
[Figure: Hstore1 with the HLog, Memcache1 and Mapfile1.1, Mapfile1.2 and a new Mapfile1.3]
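A cache flush, and the read path that follows from it, can be sketched as below. The sketch assumes that a flush writes the whole in-memory cache out as one new sorted file and that a read consults the cache first and then the files from newest to oldest. The class FlushSketch is hypothetical, and the "files" are in-memory maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a store with one cache and several flushed files.
final class FlushSketch {
    private TreeMap<String, String> memcache = new TreeMap<>();
    private final List<TreeMap<String, String>> mapFiles = new ArrayList<>();

    void put(String key, String value) {
        memcache.put(key, value);
    }

    // Turns the current cache into a new sorted "file" and starts an empty cache.
    void flush() {
        if (memcache.isEmpty()) return;
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    // Looks in the cache first, then in the files from newest to oldest.
    String get(String key) {
        if (memcache.containsKey(key)) return memcache.get(key);
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            String value = mapFiles.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    int fileCount() { return mapFiles.size(); }

    int cachedCells() { return memcache.size(); }
}
```

Each flush adds one file, so reads may have to check more files as they accumulate.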
HRegionServer: Compactions
[Figure: Hstore1 with Memcache1, where Mapfile1.1 and Mapfile1.2 become a single Mapfile1]
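A compaction can be sketched as a merge of several sorted files into one. This is a simplified illustration with hypothetical names: the files are in-memory maps, and when the same key occurs in more than one file the value from the newer file is kept.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a compaction (not HBase code).
final class CompactionSketch {
    // Merges files given oldest first; on duplicate keys the newer file wins.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> filesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : filesOldestFirst) {
            merged.putAll(file);
        }
        return merged;
    }
}
```

After the merge a read has a single file to check instead of several.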
HRegionServer: Region Splits
[Figure: the example table and Hstore1 (Memcache1, Mapfile1) before a region split]
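A region split can be sketched as cutting a sorted row range in two at a middle key. The sketch below assumes the split point is the middle row key of the region; the class RegionSplitSketch is hypothetical and works on an in-memory map rather than on stored files.

```java
import java.util.TreeMap;

// Hypothetical sketch of a region split (not HBase code).
final class RegionSplitSketch {
    // Returns the key in the middle of the region's sorted rows, or null if empty.
    static String midKey(TreeMap<String, String> region) {
        int target = region.size() / 2;
        int i = 0;
        for (String key : region.keySet()) {
            if (i++ == target) return key;
        }
        return null;
    }

    // Rows strictly before the split key form the first daughter region.
    static TreeMap<String, String> lowerHalf(TreeMap<String, String> region, String splitKey) {
        return new TreeMap<>(region.headMap(splitKey, false));
    }

    // Rows from the split key onward form the second daughter region.
    static TreeMap<String, String> upperHalf(TreeMap<String, String> region, String splitKey) {
        return new TreeMap<>(region.tailMap(splitKey, true));
    }
}
```

The split key becomes the end key of the first daughter region and the start key of the second.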
HBase Client
HBase Client -> ROOT Region -> META Region -> User Region
The location information is cached.
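The client lookup chain with caching can be sketched as follows. All names here (ClientLookupSketch, the region and server names) are hypothetical. For brevity the cache is keyed by row key, which is a simplification: caching per region would serve every row in a region from one entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of ROOT -> META -> user region lookup with a client cache.
final class ClientLookupSketch {
    // ROOT: start key of a META region's range -> META region name.
    final TreeMap<String, String> root = new TreeMap<>();
    // META: META region name -> (user region start key -> server holding it).
    final Map<String, TreeMap<String, String>> meta = new HashMap<>();
    // Client-side cache: row key -> server.
    final Map<String, String> cache = new HashMap<>();
    int catalogLookups = 0;

    String locate(String rowKey) {
        String cached = cache.get(rowKey);
        if (cached != null) return cached; // answered without touching ROOT or META
        catalogLookups++;
        Map.Entry<String, String> metaEntry = root.floorEntry(rowKey);
        if (metaEntry == null) return null;
        TreeMap<String, String> metaRegion = meta.get(metaEntry.getValue());
        if (metaRegion == null) return null;
        Map.Entry<String, String> userEntry = metaRegion.floorEntry(rowKey);
        if (userEntry == null) return null;
        cache.put(rowKey, userEntry.getValue());
        return userEntry.getValue();
    }
}
```

Only the first request for a row walks the ROOT and META levels; repeated requests are answered from the cache.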
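Create MyTable
Row Key   Timestamp   columnFamily1:   columnFamily2:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Load the HBase settings found on the classpath.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);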
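Insert Values
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

// Open the table created earlier; config is the HBaseConfiguration in use.
HTable table = new HTable(config, "MyTable");
long timestamp = System.currentTimeMillis();
// Collect the updates for one row, then commit them together.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);

Row Key   Timestamp   columnFamily1:
myRow     ts1         labela   labela value
          ts2         labelb   labelb value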
Insert
[Figure: Insert. HBase; vertical axis from 0 to 160000; horizontal axis labelled 100000, 10000, 1000, 100, 10, 1 above 1, 10, 100, 1000, 10000, 100000]
[Figure: Insert. time(ms) on a logarithmic axis from 1 to 1000000; HBase and MySQL; Row*10, Column=1]
Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3
Search Scanner
Select value from table where anchor='cnnsi.com'

Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3
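The difference between the two searches can be sketched in plain Java. The class SearchSketch is hypothetical and ignores time stamps: a search with a known row key goes straight to that row, while a search on a column label alone has to visit every row, which is what a scanner does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the two searches (not the HBase API).
final class SearchSketch {
    // row key -> (column label -> value), both kept in sorted order.
    final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<String, String>()).put(column, value);
    }

    // "Search": the row key is known, so the row is found directly in the sorted map.
    String get(String rowKey, String column) {
        TreeMap<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // "Search Scanner": no row key is given, so every row is visited in key order.
    List<String> scan(String column) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        SearchSketch s = new SearchSketch();
        s.put("com.apache.www", "anchor:apache.com", "APACHE");
        s.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        s.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        System.out.println(s.get("com.apache.www", "anchor:apache.com"));
        System.out.println(s.scan("anchor:cnnsi.com"));
    }
}
```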
Summary
Column-oriented modification is more flexible.
Higher performance on row key clusters.
Future work
More testing
Search optimization
Thank you