Introduction to HBase

Shared by: Evan He (posted 2/9/2012; 29 pages)



Reporter: Hu Yi

2009-3-11

Overview



HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.

Data is logically organized into tables, rows and columns.

Outline

• Data Model
• Architecture and Implementation
• Examples & Tests

Conceptual View

• A data row has a sortable row key and an arbitrary number of columns.
• A timestamp is assigned automatically if one is not set explicitly.

Row key            Timestamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12         “…”
                   t11         “…”
                   t10                              “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                              “anchor:cnnsi.com” = “CNN”
                   t13                              “anchor:my.look.ca” = “CNN.com”
                   t6          “…”
                   t5          “…”
                   t3          “…”

Physical Storage View

• Physically, tables are stored on a per-column family basis.
• Empty cells are not stored in a column-oriented storage format.
• Each column family is managed by an HStore.

Each HStore holds:
• Data MapFile (key/value)
• Index MapFile (index key)
• Memcache

HStore for column “contents:”

Row key            TS    Column “contents:”
“com.apache.www”   t12   “…”
                   t11   “…”
“com.cnn.www”      t6    “…”
                   t5    “…”
                   t3    “…”

HStore for column “anchor:”

Row key            TS    Column “anchor:”
“com.apache.www”   t10   “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9    “anchor:cnnsi.com” = “CNN”
                   t8    “anchor:my.look.ca” = “CNN.com”

Row Ranges: Regions

• Cells are sorted by row key and column ascending, then by timestamp descending.
• Physically, tables are broken into row ranges (regions); each region contains the rows from a start key to an end key.

Row key   Timestamp   Column “contents:”   Column “anchor:”
aaaa      t15                              anchor:cc = value
          t13         ba
          t12         bb
          t11                              anchor:cd = value
          t10         bc
aaab      t14
aaac                                       anchor:be = value
aaad                                       anchor:ad = value
aaae      t5          ae
          t3          af
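
The sort order named on this slide (row key and column ascending, timestamp descending) can be sketched with a plain Java comparator. This is only an illustration under stated assumptions: `CellKey` is a hypothetical class invented for the example, not part of the HBase API, and timestamps are modeled as plain numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of the HBase API.
final class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    // Row key ascending, then column ascending, then timestamp descending.
    static final Comparator<CellKey> ORDER = new Comparator<CellKey>() {
        @Override
        public int compare(CellKey a, CellKey b) {
            int c = a.row.compareTo(b.row);
            if (c != 0) {
                return c;
            }
            c = a.column.compareTo(b.column);
            if (c != 0) {
                return c;
            }
            // Arguments are swapped so that the newest version sorts first.
            return Long.compare(b.timestamp, a.timestamp);
        }
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<CellKey> sorted(List<CellKey> keys) {
        List<CellKey> copy = new ArrayList<>(keys);
        Collections.sort(copy, ORDER);
        return copy;
    }

    @Override
    public String toString() {
        return row + "/" + column + "/t" + timestamp;
    }
}
```

With this ordering, all versions of one cell sit next to each other with the newest first, and a region is simply a contiguous slice of the sorted keys between a start key and an end key.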

Three major components

• The HBaseMaster
• The HRegionServer
• The HBase client

Master

• Assign regions to HRegionServers.
  1. The ROOT region locates all the META regions.
  2. Each META region maps a number of user regions.
  3. User regions are assigned to the HRegionServers.
• Enable/disable tables and change table schemas.
• Monitor the health of each server.

[Figure: the HBaseMaster above a row of servers; the ROOT region (step 1), the META regions (step 2) and the USER regions are distributed across the servers.]

ROOT/META Table

• Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, one region holds about 256MB / 1KB = 2^18 rows, so:

1 ROOT table -> 2^18 META regions
             -> 2^18 × 2^18 USER regions
             -> 2^54 KB = 2^64 bytes = 2^24 TB
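
The arithmetic on this slide can be checked with a few lines of Java. The sketch below assumes exactly the two figures the slide gives (a 256MB region and roughly 1KB per catalog row); `CatalogCapacity` and its method names are hypothetical and exist only for this example.

```java
import java.math.BigInteger;

// Hypothetical helper for the arithmetic on this slide; not part of HBase.
final class CatalogCapacity {
    static final long REGION_BYTES = 256L * 1024 * 1024; // default region size: 256MB
    static final long CATALOG_ROW_BYTES = 1024L;          // approximate ROOT/META row size: 1KB

    // Number of rows (one per mapped region) that fit in a single catalog region.
    static long rowsPerCatalogRegion() {
        return REGION_BYTES / CATALOG_ROW_BYTES;
    }

    // User regions reachable through one ROOT region and one level of META regions.
    static BigInteger userRegions() {
        BigInteger perRegion = BigInteger.valueOf(rowsPerCatalogRegion());
        return perRegion.multiply(perRegion);
    }

    // Total addressable user data in bytes (exceeds the range of a signed long).
    static BigInteger addressableBytes() {
        return userRegions().multiply(BigInteger.valueOf(REGION_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("META regions: " + rowsPerCatalogRegion());
        System.out.println("USER regions: " + userRegions());
        System.out.println("Bytes:        " + addressableBytes());
        System.out.println("TB:           " + addressableBytes().shiftRight(40));
    }
}
```

`BigInteger` is used because 2^64 does not fit in a signed 64-bit `long`.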

HRegionServer: Write Requests

• Write Requests
• Read Requests
• Cache Flushes
• Compactions
• Region Splits

[Figure: a write to the example table (rows “com.apache.www” and “com.cnn.www”, columns “contents:” and “anchor:”). Labels: HLog, Memcache1, Memcache2, Mapfile1.1, Mapfile1.2, Hstore1, Hstore2.]

HRegionServer: Read Requests

[Figure: a read from the example table (rows “com.apache.www” and “com.cnn.www”). Labels: Memcache1, Mapfile1.1, Mapfile1.2, Hstore1.]

HRegionServer: Cache Flushes

[Figure: a cache flush for the example table (rows “com.apache.www” and “com.cnn.www”). Labels: HLog, Memcache1, Mapfile1.1, Mapfile1.2, Mapfile1.3, Hstore1.]

HRegionServer: Compactions

[Figure: a compaction for the example table (rows “com.apache.www” and “com.cnn.www”). Labels: Memcache1, Mapfile1.1, Mapfile1.2, Mapfile1, Hstore1.]

HRegionServer: Region Splits

[Figure: a region split for the example table (rows “com.apache.www” and “com.cnn.www”). Labels: Memcache1, Mapfile1, Hstore1.]
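
The write, read, cache flush and compaction steps named on the HRegionServer slides can be sketched as a toy in-memory store. This is a deliberately simplified model, not HBase code: `ToyStore` and its methods are hypothetical, it keeps a single version per key (no timestamps), it is single-threaded, and a list of strings stands in for the HLog. Region splits are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy, single-threaded model of one HStore; all names here are hypothetical.
final class ToyStore {
    private final List<String> log = new ArrayList<>();                       // stands in for the HLog
    private TreeMap<String, String> memcache = new TreeMap<>();               // recent writes, in memory
    private final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // flushed files, oldest first

    // Write request: append to the log first, then update the memcache.
    void put(String key, String value) {
        log.add(key + "=" + value);
        memcache.put(key, value);
    }

    // Read request: check the memcache, then the map files from newest to oldest.
    String get(String key) {
        if (memcache.containsKey(key)) {
            return memcache.get(key);
        }
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            String value = mapFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Cache flush: the current memcache becomes a new map file and a fresh memcache starts.
    void flush() {
        if (!memcache.isEmpty()) {
            mapFiles.add(memcache);
            memcache = new TreeMap<>();
        }
    }

    // Compaction: merge all map files into one; entries from newer files overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file);
        }
        mapFiles.clear();
        if (!merged.isEmpty()) {
            mapFiles.add(merged);
        }
    }

    int mapFileCount() {
        return mapFiles.size();
    }

    int logSize() {
        return log.size();
    }
}
```

Two flushes followed by one compaction leave a single map file, which mirrors the Mapfile1.1 and Mapfile1.2 labels collapsing into Mapfile1 on the compaction slide.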

HBase Client

[Figure: the HBase client contacts the ROOT region, then a META region, then the user region. Information cached.]
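
The lookup chain in this figure (ROOT region, then META region, then user region, with the result cached) can be sketched with nested sorted maps. Everything in the block is an assumption made for illustration: `RegionLocator` is a hypothetical class, regions are identified only by their start key, and the cache is keyed by row key for brevity, whereas a real client would more plausibly cache per region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a three-level region lookup; not the HBase client API.
final class RegionLocator {
    // ROOT: META-region start key -> (user-region start key -> server name).
    private final TreeMap<String, TreeMap<String, String>> root = new TreeMap<>();
    // Client-side cache: row key -> server name.
    private final Map<String, String> cache = new HashMap<>();
    // Counts how often the ROOT/META chain was actually walked.
    int catalogLookups = 0;

    void addUserRegion(String metaStartKey, String regionStartKey, String server) {
        root.computeIfAbsent(metaStartKey, k -> new TreeMap<>()).put(regionStartKey, server);
    }

    // Returns the server hosting the region that contains rowKey, or null if none is known.
    String locate(String rowKey) {
        String cached = cache.get(rowKey);
        if (cached != null) {
            return cached;
        }
        catalogLookups++;
        // Step 1: ROOT tells us which META region covers the row key.
        Map.Entry<String, TreeMap<String, String>> meta = root.floorEntry(rowKey);
        if (meta == null) {
            return null;
        }
        // Step 2: that META region tells us which user region (and server) covers it.
        Map.Entry<String, String> region = meta.getValue().floorEntry(rowKey);
        if (region == null) {
            return null;
        }
        cache.put(rowKey, region.getValue());
        return region.getValue();
    }
}
```

`floorEntry` returns the entry with the greatest start key that is less than or equal to the row key, which is how a sorted list of region start keys is searched for the owning region.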

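Create MyTable

Row Key    Timestamp    columnFamily1:    columnFamily2:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Assumption: "config" was not defined on the slide; a default configuration is used here.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);

// One descriptor per column family.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");

// Describe the table, attach both families, and create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);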

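Insert Values

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

// Assumption: "table" and "timestamp" were not defined on the slide.
HTable table = new HTable(config, "MyTable");
long timestamp = System.currentTimeMillis();

// Collect both puts for row "myRow" and commit them in one batch.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);

Row Key   Timestamp   columnFamily1:
myRow     ts1         labela = labela value
          ts2         labelb = labelb value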

Insert

[Chart: HBase insert test. Vertical axis from 0 to 160000; horizontal axis labeled with two rows of counts, 100000 down to 1 and 1 up to 100000.]

Insert

[Chart: insert time (ms), logarithmic axis from 1 to 1000000, for HBase and MySQL. Horizontal axis: Row*10, Column=1.]

Search

Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key            Timestamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10         “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9          “anchor:cnnsi.com” = “CNN”
                   t8          “anchor:my.look.ca” = “CNN.com”
                   t6
                   t5
                   t3

Search Scanner

Select value from table where anchor='cnnsi.com'

Row key            Timestamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10         “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9          “anchor:cnnsi.com” = “CNN”
                   t8          “anchor:my.look.ca” = “CNN.com”
                   t6
                   t5
                   t3
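
The difference between the two searches above (a lookup by row key and column versus a scan that filters on a column) can be sketched against an in-memory table. This is only a model for comparison: `ToyTable` is a hypothetical class, it keeps one version per cell, and it does not use the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory table: row key -> (column -> value), both kept sorted.
final class ToyTable {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Keyed search: go straight to the row, then to the column.
    String get(String row, String column) {
        TreeMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Scanner-style search: visit every row in key order and collect the matching column.
    List<String> scan(String column) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }
}
```

The keyed search touches a single row, while the scan has to walk all rows because the table is sorted by row key, not by anchor.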

Summary

• Column-oriented storage makes modification more flexible.
• Higher performance on lookups clustered by row key.

Future work

• More testing
• Search optimization

Thank you


