HUG7: HBase 0.20 Intro (Jonathan Gray)

HBase 0.20 Primary Goal
» First ever Performance Release
› 1. Random Access Time
› 2. Scan Time
› 3. Insert Time

» As a random-access store, HBase is well suited to storing and serving data for Web applications
› But high latency and variability (100s of ms to seconds) have limited HBase’s usefulness and forced the use of external caching in the past

HBase 0.20 Architecture
» The Guiding Philosophy – Unjavafy Everything!
› Zero-copy reads
› Block-based storage, reading, and indexing
› Drastically reduce Object instantiation
› Eliminate widespread usage of Trees
› Sorted merges using Heap structures
› Fast and intelligent caching with memory-awareness

» Effort Led By…
› Jonathan Gray and Erik Holstad
› Michael Stack, Powerset/Microsoft
› Ryan Rawson, StumbleUpon

HBase 0.20 Architecture – Storage
» New Key Format – KeyValue
› Contains only (byte[] buf, int offset, int length)
› Compact binary format with binary comparators
› Our “pointer” to keys inside blocks
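The (buf, offset, length) "pointer" idea can be sketched in plain Java. This is an illustrative toy, not HBase's actual KeyValue class: a thin handle into a shared block buffer, with a comparator that reads bytes in place rather than materializing key objects.

```java
// Illustrative sketch (not HBase's actual KeyValue class): a "pointer"
// into a shared byte[] block, compared without copying the bytes out.
public class ByteSlice implements Comparable<ByteSlice> {
    final byte[] buf;   // the shared block buffer
    final int offset;   // where this key starts
    final int length;   // how many bytes it spans

    ByteSlice(byte[] buf, int offset, int length) {
        this.buf = buf; this.offset = offset; this.length = length;
    }

    // Lexicographic, unsigned byte comparison over the raw ranges:
    // no object materialization, no copies.
    public int compareTo(ByteSlice other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int a = buf[offset + i] & 0xFF;
            int b = other.buf[other.offset + i] & 0xFF;
            if (a != b) return a - b;
        }
        return length - other.length;
    }

    public static void main(String[] args) {
        // Two keys living side by side in one block
        byte[] block = "applebanana".getBytes();
        ByteSlice apple  = new ByteSlice(block, 0, 5);
        ByteSlice banana = new ByteSlice(block, 5, 6);
        System.out.println(apple.compareTo(banana) < 0); // true
    }
}
```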

» New File Format – HFile
› Originally based on TFile (HADOOP-3315) and BigTable
› Block based binary format with a block index
› Contains any number of Meta blocks
› Persisted storage of List&lt;KeyValue&gt;

HBase 0.20 Architecture – API
» New Query API
› Put, Get, Scan, Delete operations
› Extended support for versioning
› Drastically reduces API size and complexity
› An API that more closely mirrors implementation
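The four operations map onto the 0.20 client like this. A minimal sketch, assuming a running cluster and an existing table "mytable" with family "info" (both placeholder names); it requires the HBase jars and a cluster, so it is not runnable standalone:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

HTable table = new HTable(new HBaseConfiguration(), "mytable");
byte[] row  = Bytes.toBytes("row1");
byte[] fam  = Bytes.toBytes("info");
byte[] qual = Bytes.toBytes("name");

// Put: write one cell
Put put = new Put(row);
put.add(fam, qual, Bytes.toBytes("value"));
table.put(put);

// Get: read the row back
Result result = table.get(new Get(row));
byte[] value = result.getValue(fam, qual);

// Scan: iterate over the table
ResultScanner scanner = table.getScanner(new Scan());
for (Result r : scanner) {
    // each Result wraps the row's KeyValue[]
}
scanner.close();

// Delete: remove the row
table.delete(new Delete(row));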

» New Result API and optimized Serialization
› Result is just a wrapper for KeyValue[]
› User-friendly Trees are built on-demand, client-side
› Deserialization allocates a single byte[] for all KVs
› Zero-copy building, single allocation receiving

HBase 0.20 Architecture – Algorithms
» New Scanners – KeyValueScanner / KeyValueHeap
› Replace linear sort logic with an encapsulated Heap
› Abstract the handling of versions, deletes, query params
› Now capable of processing individual rows with millions of columns and versions
› Linear (or worse) to Logarithmic, Logarithmic to Constant
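The heap-based merge can be sketched with a PriorityQueue over plain string iterators. This is a toy standing in for KeyValueHeap, not HBase's actual class: several already-sorted "scanners" are merged by always polling the scanner whose next key is smallest, so each step is O(log k) for k scanners instead of a linear re-sort.

```java
import java.util.*;

// Sketch of the KeyValueHeap idea: merge several already-sorted
// iterators by keeping each one's next element in a min-heap.
public class HeapMerge {
    @SuppressWarnings("unchecked")
    public static List<String> merge(List<Iterator<String>> scanners) {
        // Heap entries: [next value, scanner that produced it]
        PriorityQueue<Object[]> heap = new PriorityQueue<>(
            Comparator.comparing((Object[] e) -> (String) e[0]));
        for (Iterator<String> s : scanners)
            if (s.hasNext()) heap.add(new Object[] { s.next(), s });
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Object[] top = heap.poll();           // smallest pending key
            out.add((String) top[0]);
            Iterator<String> s = (Iterator<String>) top[1];
            if (s.hasNext()) heap.add(new Object[] { s.next(), s });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = merge(Arrays.asList(
            Arrays.asList("a", "d").iterator(),        // e.g. in-memory data
            Arrays.asList("b", "c", "e").iterator())); // e.g. an on-disk file
        System.out.println(merged); // [a, b, c, d, e]
    }
}
```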

» New Block Cache - Concurrent LRU
› Backed by ConcurrentHashMap
› LRU eviction with scan-resistance and block priorities
› Memory-bound using HeapSize interface
› Non-blocking and unsynchronized LRU map
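A minimal sketch of the two core ideas, eviction in least-recently-used order bounded by total bytes cached. The real 0.20 cache is backed by a ConcurrentHashMap with scan-resistant priorities and HeapSize accounting; this single-threaded LinkedHashMap version is only an illustration of the LRU + memory-bound behavior.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy memory-bound LRU block cache (illustration only, not HBase's
// ConcurrentHashMap-backed implementation).
public class LruBlockCacheSketch {
    private long usedBytes = 0;
    private final long maxBytes;
    private final LinkedHashMap<String, byte[]> map;

    public LruBlockCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
        // accessOrder=true: iteration goes least- to most-recently used
        this.map = new LinkedHashMap<>(16, 0.75f, true);
    }

    public void cacheBlock(String name, byte[] block) {
        map.put(name, block);
        usedBytes += block.length;
        // Evict LRU blocks until we are back under the memory bound
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    // A read touches the block, moving it to the most-recent position
    public byte[] getBlock(String name) { return map.get(name); }

    public static void main(String[] args) {
        LruBlockCacheSketch cache = new LruBlockCacheSketch(200);
        cache.cacheBlock("b1", new byte[100]);
        cache.cacheBlock("b2", new byte[100]);
        cache.getBlock("b1");                  // b1 is now most recent
        cache.cacheBlock("b3", new byte[100]); // evicts b2, the LRU block
        System.out.println(cache.getBlock("b2") == null); // true
        System.out.println(cache.getBlock("b1") != null); // true
    }
}
```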

HBase 0.20 By The Numbers (Uncached)
» Tall Table: 1 Million Rows with a single Column
› Sequential insert – 24 seconds (.024 ms/row)
› Random reads – 1.42 ms/row (average)
› Full scan – 11 seconds (117 ms/10,000 rows, .011 ms/row)

» Wide Table: 1000 Rows with 20,000 Columns each
› Sequential insert – 312 seconds (312 ms/row)
› Random reads – 121 ms/row (average)
› Full scan – 146 seconds (14.6 seconds/100 rows, 146 ms/row)

» Fat Table: 1000 Rows with 10 Columns, 1 MB values
› Sequential insert – 68 seconds (68 ms/row)
› Random reads – 56.92 ms/row (average)
› Full scan – 35 seconds (3.53 seconds/100 rows, 35 ms/row)
Each test yielded >1 region; additional rows have no impact on performance

HBase 0.20 Performance Conclusion
» We surprised even ourselves
› Random read times similar to that of an RDBMS
• 20-100 times faster with far less variability

› Scan times reduced
• 30 times faster than previous versions

› Insert times reduced
• 2-10 times faster with less than half the memory usage

» We improved our performance by more than an order of magnitude in most cases
› While drastically improving our memory usage and code readability

Zookeeper Integration

» Takes 2 minutes to detect a RegionServer’s death
» Clients have to ask the Master for the -ROOT- address
» Managing shared state in HBase is a zoo ;)
» And...

» Master is a SPOF!

Zookeeper?
• Project under Hadoop started by Y!
• Centralized service for maintaining configuration information, naming, providing distributed synchronization, and group services.
• Highly available when used on an ensemble of machines, typically 5 or more.
• ZK’s data model is a simple namespace with persistent and ephemeral nodes.

Major Integration Points
» Master address is stored in ZK
» Master election is a race for that lock
» -ROOT- address is also stored in ZK
» Region Servers are all registered in ZK
» The RSs watch the Master’s node
» Backup Masters are watching both the Master’s node and a “cluster state” node
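The watch mechanism behind these points can be sketched with the plain ZooKeeper client API. A hedged illustration only: the quorum address is a placeholder, /hbase/master is HBase's default znode path, and a real client would also handle session events and reconnects. It needs a running ensemble, so it is not runnable standalone:

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Watch the Master's znode the way backup Masters and RSs do
ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000, null);
Stat stat = zk.exists("/hbase/master", event -> {
    // Fires when the Master's ephemeral node changes or disappears,
    // i.e. the signal that it is time to race for the Master lock
    System.out.println("Master znode event: " + event.getType());
});
```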

What it Changes for You
» Standalone and pseudo-distributed setups:
› A ZK server listening on localhost is started for you; it starts and stops with the rest of the cluster.

» Fully-distributed setup:
› Possible to keep the managed ZK server, but you have to point it at a non-local IP/hostname.
› Better is to run a real quorum, which can also be used for other purposes and gives higher availability.

Fully-distributed setup
» What you have to do with ZK:
› export HBASE_MANAGES_ZK=true/false (in hbase-env.sh)
› hbase-site.xml: set hbase.cluster.distributed to true; also note that hbase.master is deprecated.
› hbase-site.xml or zoo.cfg: set the ZooKeeper configuration
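The hbase-site.xml side of a fully-distributed setup might look like this; a minimal sketch, with placeholder quorum hostnames:

```xml
<!-- hbase-site.xml: minimal fully-distributed example -->
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
```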

» You want backup masters?
› ${HBASE_HOME}/bin/hbase-daemon.sh start master
› It’s also a good idea to set hbase.master.dns.nameserver and hbase.master.dns.interface so they bind to the right place.

New Features from ZK integration in 0.20
» No more SPOF
› Automatic Master failover

» Rolling upgrades of point releases
» Modify some cluster configuration without a full cluster restart

Other 0.20 Goodies
» Binary pretty-print in shell/logs/web
» Increment Column Value
› Fast, atomic increments
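In client code this is a single call; a sketch assuming a running cluster and a table "counters" with family "d" (placeholder names):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

HTable table = new HTable(new HBaseConfiguration(), "counters");
// The cell is treated as a long and incremented atomically
// server-side; the new value is returned.
long hits = table.incrementColumnValue(
    Bytes.toBytes("page1"), Bytes.toBytes("d"), Bytes.toBytes("hits"), 1);
```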

» New REST Server, Stargate
» New MapReduce API
› Much cleaner and easier to use
› Uses new Hadoop 0.20 API
› Accepts a Scan object
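Wiring a Scan into a job looks roughly like this; a sketch in which MyMapper (a TableMapper subclass), the table name, and the family are placeholders, and which needs the Hadoop and HBase jars plus a cluster to run:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

Job job = new Job(new HBaseConfiguration(), "my-hbase-job");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("info"));  // only ship the columns the job needs
TableMapReduceUtil.initTableMapperJob(
    "mytable", scan, MyMapper.class,
    ImmutableBytesWritable.class, Result.class, job);
job.waitForCompletion(true);
```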

What’s next?
» More performance and reliability
› 0.20 was mostly a RegionServer rewrite
• But there are still more known bottlenecks left to tackle for 0.21

› 0.21 will rewrite Master with better ZK integration

» HBase 0.21 Roadmap
› Decentralized Master responsibilities + More ZK
• Further capability to modify configurations at run time
• State sharing via ZK nodes
• Ephemeral nodes for region ownership
• Distributed queue for region assignment

› Language-agnostic, binary RPC
› Native C/C++ client library
› Multi-DC Replication
