My Life with HBase - FOSDEM 2010 NoSQL

Shared by: Lars George
Posted: 2/8/2010
“My Life with HBase”

Lars George, CTO of WorldLingo

Apache Hadoop HBase Committer

www.worldlingo.com www.larsgeorge.com

WorldLingo

 Co-founded 1999

 Machine Translation Services

 Professional Human Translations

 Offices in US and UK

 Microsoft Office Provider since 2001

 Web based services

 Customer Projects

 Multilingual Archive

Multilingual Archive

 SOAP API

 Simple calls

◦ putDocument()

◦ getDocument()

◦ search()

◦ command()

◦ putTransformation()

◦ getTransformation()

Multilingual Archive (cont.)

 Planned already, implemented as customer project

 Scale:

◦ 500 million documents

◦ Random Access

◦ “100%” Uptime

 Technologies?

◦ Database

◦ Zip-Archives on file system, or Hadoop

RDBMS Woes

 Scaling MySQL is hard, Oracle is expensive (and hard)

 Machine cost goes up faster than speed

 Turn off all relational features to scale

 Turn off secondary indexes too

 Tables can be a problem at sizes as low as 500GB

 Hard to read data quickly at these sizes

 Write speed degrades with table size

 Future growth uncertain

MySQL Limitations

 Master becomes a problem

 What if your write rate is greater than a single machine can handle?

 All slaves must have the same write capacity as the master (can't cheap out on slaves)

 Single point of failure, no easy failover

 Can (sort of) solve this with sharding

Sharding

Sharding Problems

 Requires either a hashing function or mapping table to determine shard

 Data access code becomes complex

 What if shard sizes become too large?
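
A minimal sketch of the hashing-function approach and why resharding hurts. The names `shard_for` and `moved_fraction`, the MD5-modulo scheme, and the `doc-N` keys are assumptions for illustration, not how any particular MySQL setup does it.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after the count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)

# Hypothetical document keys.
keys = [f"doc-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With plain modulo hashing, adding a single shard relocates most keys, which is why resharding usually means a large data migration plus changes to the data access code.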

Resharding

Schema Changes

 What about schema changes or migrations?

 MySQL not your friend here

 Only gets harder with more data

HBase to the Rescue

 Clustered, commodity(-ish) hardware

 Mostly schema-less

 Dynamic distribution

 Spreads writes out over the cluster

HBase

 Distributed database modeled on Bigtable

◦ Bigtable: A Distributed Storage System for Structured Data by Chang et al.

 Runs on top of Hadoop Core

 Layers on HDFS for storage

 Native connections to MapReduce

 Distributed, High Availability, High Performance, Strong Consistency

HBase

 Column-oriented store

◦ Wide table costs only the data stored

◦ NULLs in row are 'free'

◦ Good compression: columns of similar type

◦ Column name is arbitrary

 Rows stored in sorted order

 Can random read and write

 Goal of billions of rows X millions of cells

◦ Petabytes of data across thousands of servers

Tables

 Table is split into roughly equal sized "regions"

 Each region is a contiguous range of keys, [start, end)

 Regions split as they grow, thus dynamically adjusting to your data set
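
A toy sketch of the region idea: each region owns a contiguous key range [start, end), and a region that grows too large splits in two. The class name `Table` and the row-count threshold are assumptions for illustration; this is not HBase's actual split logic.

```python
import bisect

class Table:
    """Toy table whose regions are contiguous key ranges [start, end)."""

    def __init__(self, max_rows_per_region: int = 4):
        self.max_rows = max_rows_per_region
        self.starts = [""]   # sorted region start keys; "" sorts before any key
        self.regions = [[]]  # sorted row keys held by each region

    def region_index(self, key: str) -> int:
        # The owning region is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key: str) -> None:
        i = self.region_index(key)
        rows = self.regions[i]
        pos = bisect.bisect_left(rows, key)
        if pos == len(rows) or rows[pos] != key:
            rows.insert(pos, key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i: int) -> None:
        rows = self.regions[i]
        mid = len(rows) // 2
        # The middle key becomes the start key of the new upper region.
        self.starts.insert(i + 1, rows[mid])
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]

table = Table()
for n in range(10):
    table.put(f"row-{n:02d}")
print(table.starts)
```

Finding the region for a key is a binary search over the sorted start keys, so no hashing function or mapping table has to be maintained by hand.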

Tables (cont.)

 Tables are sorted by Row

 Table schema defines column families

◦ Families consist of any number of columns

◦ Columns consist of any number of versions

◦ Everything except table name is byte[]





(Table, Row, Family:Column, Timestamp) → Value

Tables (cont.)

 As a data structure

SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)
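
The nested map can be modeled in a few lines. This is an in-memory sketch only; `ToyTable` and its method names are made up for illustration and are not the HBase client API.

```python
from collections import defaultdict

class ToyTable:
    """In-memory model: row -> b"family:column" -> [(timestamp, value), ...]."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row: bytes, column: bytes, timestamp: int, value: bytes) -> None:
        versions = self.rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row: bytes, column: bytes, max_timestamp=None):
        """Return the newest value at or before max_timestamp, or None."""
        for timestamp, value in self.rows.get(row, {}).get(column, []):
            if max_timestamp is None or timestamp <= max_timestamp:
                return value
        return None

    def scan(self, start: bytes, stop: bytes):
        """Yield row keys in sorted order within [start, stop)."""
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row

t = ToyTable()
t.put(b"row2", b"content:type", 1, b"text/html")
t.put(b"row1", b"content:type", 1, b"text/html")
t.put(b"row1", b"content:type", 2, b"text/plain")
print(t.get(b"row1", b"content:type"))
print(list(t.scan(b"row1", b"row3")))
```

Keys and values are bytes throughout, matching the "everything except table name is byte[]" point.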

Server Architecture

 Similar to HDFS

◦ Master ≈ Namenode

◦ Regionserver ≈ Datanode

 Often run these alongside each other!

 Difference: HBase stores state in HDFS

 HDFS provides robust data storage across machines, insulating against failure

 Master and Regionserver fairly stateless and machine independent

Region Assignment

 Each region from every table is assigned to a Regionserver

 Master Duties:

◦ Responsible for assignment and handling regionserver problems (if any!)

◦ When machines fail, move regions

◦ When regions split, move regions to balance

◦ Could move regions to respond to load

◦ Can run multiple backup masters
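
A rough sketch of the "when machines fail, move regions" duty. The round-robin assignment and least-loaded reassignment are assumptions chosen for brevity, not HBase's real balancer; `assign`, `handle_failure`, and the server names are hypothetical.

```python
def assign(regions, servers):
    """Spread regions over servers round-robin; returns {server: [regions]}."""
    assignment = {s: [] for s in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment

def handle_failure(assignment, failed):
    """Move the failed server's regions to the least-loaded survivors."""
    orphaned = assignment.pop(failed)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

regions = [f"region-{i}" for i in range(9)]
assignment = assign(regions, ["rs1", "rs2", "rs3"])
assignment = handle_failure(assignment, "rs2")
print({server: len(held) for server, held in assignment.items()})
```

Only the mapping changes here. Because the region data lives in HDFS, moving a region means reopening it elsewhere rather than copying its data between machines.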

Master

 The master does NOT

◦ Handle any write requests (not a DB master!)

◦ Handle location finding requests

◦ Take part in the read/write path

 It generally does very little most of the time

Distributed Coordination

 Zookeeper is used to manage master election and server availability

 Set up as a cluster, provides distributed coordination primitives

 An excellent tool for building cluster management systems

HBase Storage Architecture

HBase Public Timeline

 November 2006

◦ Google releases paper on Bigtable

 February 2007

◦ Initial HBase prototype created as Hadoop contrib

 October 2007

◦ First "usable" HBase (Hadoop 0.15.0)

 December 2007

◦ First HBase User Group

 January 2008

◦ Hadoop becomes TLP, HBase becomes subproject

 October 2008

◦ HBase 0.18.1 released

 January 2009

◦ HBase 0.19.0 released

 September 2009

◦ HBase 0.20.0 released

HBase WorldLingo Timeline

HBase - Example

 Store web crawl data

◦ Table crawl with family content

◦ Row is URL with columns

 content:data stores raw crawled data

 content:language stores http language header

 content:type stores http content-type header

◦ If processing raw data for hyperlinks and images, add families links and images

 links: column for each hyperlink

 images: column for each image
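
A sketch of how one row of the crawl table could be assembled, with one column per hyperlink and per image. The function `build_crawl_row`, the empty cell values, and the sample page are assumptions for illustration; the row is a plain dict, not a real HBase Put.

```python
from html.parser import HTMLParser

class LinkAndImageParser(HTMLParser):
    """Collects href targets of <a> tags and src targets of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links, self.images = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

def build_crawl_row(html: str, language: str, content_type: str) -> dict:
    """Return the columns for one URL row as {"family:qualifier": value}."""
    parser = LinkAndImageParser()
    parser.feed(html)
    row = {
        "content:data": html,
        "content:language": language,
        "content:type": content_type,
    }
    # Column names are data: one qualifier per link or image found.
    for href in parser.links:
        row[f"links:{href}"] = ""   # cell value left empty (assumption)
    for src in parser.images:
        row[f"images:{src}"] = ""
    return row

page = '<a href="http://hbase.apache.org/">HBase</a><img src="/logo.png">'
row = build_crawl_row(page, "en", "text/html")
for column in sorted(row):
    print(column)
```

Since column names are arbitrary and absent columns cost nothing, a page with three links and a page with three thousand fit the same table schema.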

HBase - Clients

 Native Java client/API

◦ get(Get get)

◦ put(Put put)

 Non-Java clients

◦ Thrift server (Ruby, C++, Erlang, etc.)

◦ REST server (Stargate)

 TableInput/TableOutputFormat for MapReduce

 HBase shell (JRuby)

Scaling HBase

 Add more machines to scale

◦ Automatic rebalancing

 Base model (BigTable) scales past 1000TB

 No inherent reason why HBase couldn't

What to store in HBase

 Maybe not your raw log data...

 ... but the results of processing it with Hadoop!

 By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website

!HBase

 “NoSQL” Database!

◦ No joins

◦ No sophisticated query engine

◦ No transactions (sort of)

◦ No column typing

◦ No SQL, no ODBC/JDBC, etc. (but there is HBql now!)

 Not a replacement for your RDBMS...

 Matching Impedance!

Why HBase?

 Datasets are reaching Petabytes

 Traditional databases are expensive to scale and difficult to distribute

 Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!)

 Need for random access (which Hadoop does not offer) alongside batch processing

Numbers

 Single reads are 1-10ms depending on disk seeks and caching

 Scans can return hundreds of rows in dozens of ms

 Serial read speeds

Multilingual Archive (cont.)

 44 Dell PE SC1435, 12GB RAM, 2 x 1TB SATA drives

 Java 6

 Tomcat 5.5

 88 Xen domUs

◦ Apache

◦ Hadoop/HBase

◦ Tomcat application servers

 Currently split into two clusters

Lucene Search Server

 43 fields indexed

 166GB size

 Automated merging/warm-up/swap

 Looking into scalable solution

◦ Katta

◦ Hyper Estraier

◦ DLucene

◦ …

 Sorting?

Multilingual Archive (cont.)

 5 Tables

 Up to 5 column families

 XML Schemas

 Automated table schema updates

 Standard options tweaked over time

◦ Garbage Collection!

 MemCached(b) layer

Layers

Network   Firewall
LWS       Director 1 … Director n
Web       Apache 1 … Apache n, …
App       Tomcat 1 … Tomcat n, Tomcat 1 … Tomcat n
Cache     MemCached 1 … MemCached n
Data      HBase

Map/Reduce

 Backup/Restore

 Index building

 Cache filling

 Mapping

 Updates

 Translation

HBase - Problems

 Early versions (before HBase 0.19.0!)

◦ Data loss

◦ Migration nightmares

◦ Slow performance



 Current version

◦ Read HBase Wiki!!!

 Single point of failure (name node only!)

HBase - Notes

 RTF M (ine)





 HBase Wiki, IRC Channel

 Personal Experience:

◦ Max. file handles (32k+)

◦ Hadoop xceiver limits (NIO?)

◦ Redundant meta data (on name node)

◦ RAM (4GB+)

◦ Deployment strategy

◦ Garbage collection (use CMS, G1?)

◦ Maybe not mix batch and interactive?
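
The file handle note is easy to check from a script. A small sketch, assuming a Unix-like host: it reads the open-file limit of the current process and compares it with 32768, which is one reading of "32k+" and therefore an assumption, as is the helper name `nofile_limit_ok`.

```python
import resource

def nofile_limit_ok(soft_limit: int, required: int = 32768) -> bool:
    """True if the soft open-file limit is unlimited or at least `required`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required

# Limits of the current process; run this as the user that starts the daemons.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}, ok={nofile_limit_ok(soft)}")
```

Many systems default to a much lower limit, so this is worth checking before the cluster is under load.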

Graphing

 Use supplied Ganglia context or JMX bridge to enable Nagios and Cacti

 JMXToolkit: Swiss army knife for JMX-enabled servers: http://github.com/larsgeorge/jmxtoolkit

HBase - Roadmap

 HBase 0.20.x “Performance”

◦ New Key Format – KeyValue

◦ New File Format – HFile

◦ New Block Cache – Concurrent LRU

◦ New Query and Result API

◦ New Scanners

◦ Zookeeper Integration – No SPOF in HBase

◦ New REST Interface

◦ Contrib

 Transactional Tables

 Secondary Indexes

 Stargate

HBase - Roadmap (cont.)

 HBase 0.21.x “Advanced Concepts”

◦ Master Rewrite – More Zookeeper

◦ New RPC Protocol (Avro)

◦ Multi-DC Replication

◦ Intra Row Scanning

◦ Further optimizations on algorithms and data structures

◦ Discretionary Access Control

◦ Coprocessors

Questions?

 Email: lars@worldlingo.com

larsgeorge@apache.org

lars@larsgeorge.com

 Blog: www.larsgeorge.com

 Twitter: larsgeorge

