HBase Goes Realtime

Document Sample
HBase Goes Realtime Powered By Docstoc
					HBase Goes Realtime

Wednesday, June 10, 2009 Santa Clara Marriott

Quick Overview
» » » » » » » Who are we? What is HBase? HBase 0.20 Primary Goal HBase 0.20 Architecture and Specifics HBase 0.20 By The Numbers Zookeeper Integration What’s Next?

Who are we?
» Jonathan Gray
› › › › › Co-Founder and CTO, Streamy.com Background in CE, databases, distributed systems User of technology as a competitive advantage Contributing to HBase for ~1 year HBase in production for ~9 months

» Jean-Daniel Cryans
› › › › HBase Committer Graduate student at the Université du Québec HBase consultant currently working for OpenPlaces.com Contributing to HBase for ~18 months

What is HBase?
» HBase is a…
› › › › › › › Sorted, Distributed, Column-Oriented, Multi-Dimensional, Highly-Available, High-Performance, Persisted Storage System

HBase adds random access reads and writes atop HDFS
Billions of Rows * Millions of Columns * Thousands of Versions

HBase 0.20 Primary Goal
» First ever Performance Release
1. Random Access Time 2. Scan Time 3. Insert Time

» As a random-access store, we are well suited for the storing and serving of Web applications
› But high latency and variability (100s of ms to seconds) has reduced the usefulness of HBase and required the use of external caching in the past

HBase 0.20 Architecture
» The Guiding Philosophy – Unjavafy Everything!
› › › › › › Zero-copy reads Block-based storage, reading, and indexing Drastically reduce Object instantiation Eliminate widespread usage of Trees Sorted merges using Heap structures Fast and intelligent caching with memory-awareness

» Effort Lead By…
› Jonathan Gray and Erik Holstad, Streamy.com › Michael Stack, Powerset/Microsoft › Ryan Rawson, StumbleUpon

HBase 0.20 Architecture – Storage
» New Key Format – KeyValue
› Contains only (byte [] buf, int offset, int length) › Compact binary format with binary comparators › Our “pointer” to keys inside blocks

» New File Format – HFile
› › › › Originally based on TFile (HADOOP-3315) and BigTable Block based binary format with a block index Contains any number of Meta blocks Persisted storage of List<KeyValue>

HBase 0.20 Architecture – API
» New Query API
› › › › Put, Get, Scan, Delete operations Extended support for versioning Drastically reduces API size and complexity An API that more closely mirrors implementation

» New Result API and optimized Serialization
› › › › Result is just a wrapper for KeyValue[] User-friendly Trees are built on-demand, client-side Deserialization allocates a single byte[] for all KVs Zero-copy building, single allocation receiving

HBase 0.20 Architecture – Algorithms
» New Scanners – KeyValueScanner / KeyValueHeap
› Replace linear sort logic with an encapsulated Heap › Abstract the handling of versions, deletes, query params › Now capable of processing individual rows with millions of columns and versions › Linear (or worse) to Logarithmic, Logarithmic to Constant

» New Block Cache - Concurrent LRU
› › › › Backed by ConcurrentHashMap LRU eviction with scan-resistance and block priorities Memory-bound using HeapSize interface Non-blocking and unsynchronized LRU map

HBase 0.20 By The Numbers (Uncached)
» Tall Table: 1 Million Rows with a single Column
› Sequential insert – 24 seconds (.024 ms/row) › Random reads – 1.42 ms/row (average) › Full scan – 11 seconds (117 ms/10,000 rows, .011ms/row)

» Wide Table: 1000 Rows with 20,000 Columns each
› Sequential insert – 312 seconds (312 ms/row) › Random reads – 121 ms/row (average) › Full scan – 146 seconds (14.6 seconds/100 rows, 146ms/row)

» Fat Table: 1000 Rows with 10 Columns,1MB values
› Sequential insert – 68 seconds (68 ms/row) › Random reads – 56.92 ms/row (average) › Full scan – 35 seconds (3.53 seconds/100 rows, 35ms/row)
Each test yielded >1 region, additional rows have no impact on performance

HBase 0.20 Performance Conclusion
» We surprised even ourselves
› Random read times similar to that of an RDBMS
• 20-100 times faster with far less variability

› Scan times reduced
• 30 times faster than previous versions

› Insert times reduced
• 2-10 times faster with less than half the memory usage

» We improved our performance by more than an order of magnitude in most cases
› While drastically improving our memory usage and code readability

Zookeeper Integration

Why?
» » » » Takes 2 mins to figure a RegionServer’s death Clients have to ask Master for -ROOT- address Managing shared state in HBase is a zoo ;) And...

»Master is a SPOF!

Zookeeper? • Project under Hadoop started by Y! • Centralized service for maintaining configuration information, naming, providing distributed synchronization, and group services. • Highly available when used on an ensemble of machines, typically 5 or more. • ZK’s data model is a simple namespace with permanent and ephemeral nodes.

Tough Decisions
» Should we impose the usage of a ZK quorum on every setup? » How much should we rely on ZK for HA, are there better alternate solutions for some of our problems? » Should we have an HBase implementation of ZK ? » Should we package our own version of ZK?

Major Integration Points
» » » » » » Master address is stored in ZK Master election is a race for that lock -ROOT- address is also stored in ZK Region Servers are all registered in ZK The RSs watch the Master’s node Backup Masters are watching both Master’s node and a “cluster state” node

What it Changes for You
» Standalone and pseudo-distributed setups:
› a ZK server that listens on localhost is started for you. It starts/stops with the rest of the cluster.

» Fully-distributed setup:
› poss. to keep the managed ZK server but have to make it point on a non-local IP/hostname. › better is to get a quorum, can also use it for other purposes, for higher availability.

Fully-distributed setup
» What you have to do with ZK:
› hbase-site.xml: set hbase.cluster.distributed to true, also notice that hbase.master is deprecated. › hbase-env.sh: export HBASE_MANAGES_ZK=false › zoo.cfg: server.0=… server.1=… etc. You also have to configure those servers per ZK doc.

» You want backup masters?
› ${HBASE_HOME}/bin/hbase-daemon.sh start master › It’s also a good idea to set hbase.master.dns.nameserver and hbase.master.dns.interface to have them binding at the right place.

New Features from ZK integration in 0.20
» No more SPOF
› Automatic Master failover

» Rolling upgrades of point releases » Modify some cluster configuration without full cluster restart

What’s next?
» More performance and reliability
› 0.20 was mostly a RegionServer rewrite
• But there are still more known bottlenecks left to tackle for 0.21

› 0.21 will rewrite Master with better ZK integration

» HBase 0.21 Roadmap
› Decentralized Master responsibilities + More ZK
• • • • Further capability to modify configurations at run time State sharing via ZK nodes Ephemeral nodes for region ownership Distributed queue for region assignment

› Language-agnostic, binary RPC › Native C/C++ client library › Multi-DC Replication

More Information about HBase
» HBase Website and Wiki
› http://www.hbase.org › http://wiki.apache.org/hadoop/Hbase

» Mailing List
› http://hadoop.apache.org/hbase/mailing_lists.html

» IRC Channel
› #hbase on Freenode › All committers and core contributors are here

» Follow us on Twitter!
› @hbase


				
DOCUMENT INFO
Shared By:
Stats:
views:14447
posted:6/19/2009
language:English
pages:21
Description: Talk given at the 2009 Hadoop Summit by Jonathan Gray and Jean-Daniel Cryans about the 0.20 release of HBase.