“My Life with HBase”
Lars George, CTO of WorldLingo
Apache Hadoop HBase Committer
www.worldlingo.com www.larsgeorge.com
WorldLingo
Co-founded 1999
Machine Translation Services
Professional Human Translations
Offices in US and UK
Microsoft Office Provider since 2001
Web based services
Customer Projects
Multilingual Archive
SOAP API
Simple calls
◦ putDocument()
◦ getDocument()
◦ search()
◦ command()
◦ putTransformation()
◦ getTransformation()
Multilingual Archive (cont.)
Planned already, implemented as customer
project
Scale:
◦ 500 million documents
◦ Random Access
◦ “100%” Uptime
Technologies?
◦ Database
◦ Zip-Archives on file system, or Hadoop
RDBMS Woes
Scaling MySQL hard, Oracle expensive (and
hard)
Machine cost goes up faster than speed
Turn off all relational features to scale
Turn off secondary indexes too
Tables can be a problem at sizes as low as
500GB
Hard to read data quickly at these sizes
Write speed degrades with table size
Future growth uncertain
MySQL Limitations
Master becomes a problem
What if your write speed is greater than a
single machine can handle?
All slaves must have the same write capacity
as the master (can't cheap out on slaves)
Single point of failure, no easy failover
Can (sort of) solve this with sharding
Sharding
Sharding Problems
Requires either a hashing function or
mapping table to determine shard
Data access code becomes complex
What if shard sizes become too large?
Resharding
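The hashing-function approach, and why resharding hurts, can be sketched as follows. This is a minimal illustration, not code from the talk: the class `ShardRouter`, its method names, and the use of `String.hashCode()` are assumptions made for the example.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of MySQL or any library.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Map a key to a shard index in [0, shardCount).
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Count the keys that land on a different shard under another router.
    static int movedKeys(List<String> keys, ShardRouter before, ShardRouter after) {
        int moved = 0;
        for (String key : keys) {
            if (before.shardFor(key) != after.shardFor(key)) {
                moved++;
            }
        }
        return moved;
    }
}
```

Comparing `new ShardRouter(4)` with `new ShardRouter(5)` through `movedKeys` shows that changing the shard count can send many keys to a different shard, which is the data movement that resharding has to pay for.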
Schema Changes
What about schema changes or
migrations?
MySQL not your friend here
Only gets harder with more data
HBase to the Rescue
Clustered, commodity(-ish) hardware
Mostly schema-less
Dynamic distribution
Spreads writes out over the cluster
HBase
Distributed database modeled on Bigtable
◦ Bigtable: A Distributed Storage System for
Structured Data by Chang et al.
Runs on top of Hadoop Core
Layers on HDFS for storage
Native connections to MapReduce
Distributed, High Availability, High
Performance, Strong Consistency
HBase
Column-oriented store
◦ Wide table costs only the data stored
◦ NULLs in row are 'free'
◦ Good compression: columns of similar type
◦ Column name is arbitrary
Rows stored in sorted order
Can random read and write
Goal of billions of rows X millions of cells
◦ Petabytes of data across thousands of servers
Tables
Table is split into roughly equal sized
"regions"
Each region is a contiguous range of keys,
[start, end)
Regions split as they grow, thus
dynamically adjusting to your data set
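Finding the region for a row key can be sketched with a sorted map keyed by each region's start key. This is an illustration only: `RegionDirectory` and the region names are hypothetical, and strings stand in for the byte[] keys HBase actually compares.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of regions as [startKey, nextStartKey) ranges.
final class RegionDirectory {
    // start key -> region name; the empty string starts the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    // In this model a split just introduces a new start key inside a range.
    void split(String splitKey, String newRegionName) {
        regionsByStartKey.put(splitKey, newRegionName);
    }
}
```

Because the start key is inclusive and the end key exclusive, a row key equal to a split point belongs to the new, upper region.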
Tables (cont.)
Tables are sorted by Row
Table schema defines column families
◦ Families consist of any number of columns
◦ Columns consist of any number of versions
◦ Everything except table name is byte[]
(Table, Row, Family:Column, Timestamp) → Value
Tables (cont.)
As a data structure
SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)
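The same shape can be written with standard Java collections. This is a rough in-memory sketch, not HBase code: `TableModel` and its methods are hypothetical, strings stand in for byte[], and the version list is modelled as a map sorted newest-first so the latest value is cheap to read.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one table; strings stand in for byte[].
final class TableModel {
    // row key -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final SortedMap<String, SortedMap<String, SortedMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        SortedMap<String, SortedMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        SortedMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.get(versions.firstKey());
    }

    // Row keys come back in sorted order, like the rows of the table.
    Iterable<String> rowKeys() {
        return rows.keySet();
    }
}
```

Writing the same cell twice with different timestamps keeps both versions, and `getLatest` returns the one with the higher timestamp.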
Server Architecture
Similar to HDFS
◦ Master ≈ Namenode
◦ Regionserver ≈ Datanode
Often run these alongside each other!
Difference: HBase stores state in HDFS
HDFS provides robust data storage
across machines, insulating against failure
Master and Regionserver fairly stateless
and machine independent
Region Assignment
Each region from every table is assigned
to a Regionserver
Master Duties:
◦ Responsible for assignment and handling
regionserver problems (if any!)
◦ When machines fail, move regions
◦ When regions split, move regions to balance
◦ Could move regions to respond to load
◦ Can run multiple backup masters
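The bookkeeping behind these duties can be sketched as a toy model. Everything here is an assumption for illustration: `AssignmentModel` is hypothetical, and the "fewest regions wins" policy is a stand-in, not the balancing algorithm HBase uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the master's bookkeeping; not HBase code.
final class AssignmentModel {
    // regionserver name -> regions it currently serves
    private final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server) {
        assignments.putIfAbsent(server, new ArrayList<>());
    }

    // Assign a region to whichever live server holds the fewest regions.
    String assign(String region) {
        String target = null;
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (target == null || e.getValue().size() < assignments.get(target).size()) {
                target = e.getKey();
            }
        }
        if (target == null) {
            throw new IllegalStateException("no live regionservers");
        }
        assignments.get(target).add(region);
        return target;
    }

    // When a machine fails, drop it and hand its regions to the survivors.
    void serverFailed(String server) {
        List<String> orphaned = assignments.remove(server);
        if (orphaned == null) {
            return;
        }
        for (String region : orphaned) {
            assign(region);
        }
    }

    int regionCount(String server) {
        List<String> regions = assignments.get(server);
        return regions == null ? 0 : regions.size();
    }
}
```

Only the assignment table changes on failure in this model. That mirrors the point of the slides that region data lives in HDFS, so regionservers stay fairly stateless.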
Master
The master does NOT
◦ Handle any write requests (not a DB master!)
◦ Handle location finding requests
◦ Take part in the read/write path
Generally does very little most of the time
Distributed Coordination
Zookeeper is used to manage master
election and server availability
Set up as a cluster, provides distributed
coordination primitives
An excellent tool for building cluster
management systems
HBase Storage Architecture
HBase Public Timeline
November 2006
◦ Google releases paper on Bigtable
February 2007
◦ Initial HBase prototype created as Hadoop contrib
October 2007
◦ First "usable" HBase (Hadoop 0.15.0)
December 2007
◦ First HBase User Group
January 2008
◦ Hadoop becomes TLP, HBase becomes subproject
October 2008
◦ HBase 0.18.1 released
January 2009
◦ HBase 0.19.0 released
September 2009
◦ HBase 0.20.0 released
HBase WorldLingo Timeline
HBase - Example
Store web crawl data
◦ Table crawl with family content
◦ Row is URL with columns
content:data stores raw crawled data
content:language stores http language header
content:type stores http content-type header
◦ If processing raw data for hyperlinks and
images, add families links and images
links: column for each hyperlink
images: column for each image
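One way to picture a single row of the crawl table is as a map from "family:qualifier" to value. The sketch below is illustrative only: `CrawlRow` is a hypothetical helper, and using the target URL as the qualifier with an empty value is an assumption, since the slide does not say what the link and image columns contain.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that lays out one crawled page as "family:qualifier" -> value.
final class CrawlRow {
    static Map<String, String> toColumns(String rawData, String language, String contentType,
                                         List<String> hyperlinks, List<String> images) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("content:data", rawData);
        columns.put("content:language", language);
        columns.put("content:type", contentType);
        // One column per link: the qualifier is the target URL, the value is left empty.
        for (String link : hyperlinks) {
            columns.put("links:" + link, "");
        }
        for (String image : images) {
            columns.put("images:" + image, "");
        }
        return columns;
    }
}
```

The families are fixed by the table schema, while the qualifiers differ from row to row. A page with two hyperlinks gets two columns in the links family, and a page with none gets none.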
HBase - Clients
Native Java client/API
◦ get(Get get)
◦ put(Put put)
Non-Java clients
◦ Thrift server (Ruby, C++, Erlang, etc.)
◦ REST server (Stargate)
TableInput/TableOutputFormat for
MapReduce
HBase shell (jruby)
Scaling HBase
Add more machines to scale
◦ Automatic rebalancing
Base model (Bigtable) scales past 1000TB
No inherent reason why HBase couldn't
What to store in HBase
Maybe not your raw log data...
... but the results of processing it with
Hadoop!
By storing the refined version in HBase,
can keep up with huge data demands and
serve to your website
!HBase
“NoSQL” Database!
◦ No joins
◦ No sophisticated query engine
◦ No transactions (sort of)
◦ No column typing
◦ No SQL, no ODBC/JDBC, etc. (but there is
HBql now!)
Not a replacement for your RDBMS...
Matching Impedance!
Why HBase?
Datasets are reaching Petabytes
Traditional databases are expensive to
scale and difficult to distribute
Commodity hardware is cheap and
powerful (but HBase can make use of
powerful machines too!)
Need for random access as well as batch
processing (Hadoop alone does not offer
random access)
Numbers
Single reads are 1-10ms depending on
disk seeks and caching
Scans can return hundreds of rows in
dozens of ms
Serial read speeds
Multilingual Archive (cont.)
44 Dell PE SC1435, 12GB RAM, 2 x 1TB
SATA drives
Java 6
Tomcat 5.5
88 Xen domUs
◦ Apache
◦ Hadoop/HBase
◦ Tomcat application servers
Currently split into two clusters
Lucene Search Server
43 fields indexed
166GB size
Automated merging/warm-up/swap
Looking into scalable solution
◦ Katta
◦ Hyper Estraier
◦ DLucene
◦ …
Sorting?
Multilingual Archive (cont.)
5 Tables
Up to 5 column families
XML Schemas
Automated table schema updates
Standard options tweaked over time
◦ Garbage Collection!
MemCached(b) layer
Layers
Network: Firewall
LWS: Director 1 … Director n
Web: Apache 1 … Apache n
App: Tomcat 1 … Tomcat n, Tomcat 1 … Tomcat n
Cache: MemCached 1 … MemCached n
Data: HBase
Map/Reduce
Backup/Restore
Index building
Cache filling
Mapping
Updates
Translation
HBase - Problems
Early versions (before HBase 0.19.0!)
◦ Data loss
◦ Migration nightmares
◦ Slow performance
Current version
◦ Read HBase Wiki!!!
Single point of failure (name node only!)
HBase - Notes
RTF M (ine)
HBase Wiki, IRC Channel
Personal Experience:
◦ Max. file handles (32k+)
◦ Hadoop xceiver limits (NIO?)
◦ Redundant meta data (on name node)
◦ RAM (4GB+)
◦ Deployment strategy
◦ Garbage collection (use CMS, G1?)
◦ Maybe not mix batch and interactive?
Graphing
Use supplied Ganglia context or JMX
bridge to enable Nagios and Cacti
JMXToolkit: swiss army knife for JMX
enabled servers:
http://github.com/larsgeorge/jmxtoolkit
HBase - Roadmap
HBase 0.20.x “Performance”
◦ New Key Format – KeyValue
◦ New File Format – HFile
◦ New Block Cache – Concurrent LRU
◦ New Query and Result API
◦ New Scanners
◦ Zookeeper Integration – No SPOF in HBase
◦ New REST Interface
◦ Contrib
Transactional Tables
Secondary Indexes
Stargate
HBase - Roadmap (cont.)
HBase 0.21.x “Advanced Concepts”
◦ Master Rewrite – More Zookeeper
◦ New RPC Protocol (Avro)
◦ Multi-DC Replication
◦ Intra Row Scanning
◦ Further optimizations on algorithms and data
structures
◦ Discretionary Access Control
◦ Coprocessors
Questions?
Email: lars@worldlingo.com
larsgeorge@apache.org
lars@larsgeorge.com
Blog: www.larsgeorge.com
Twitter: larsgeorge