HBase, Hadoop World NYC
Ryan Rawson, Stumbleupon.com, su.pr
Jonathan Gray, Streamy.com
Friday, October 2, 2009
A presentation in 2 parts
Part 1
About Me
• Ryan Rawson
• Senior Software Developer @ Stumbleupon
• HBase committer, core contributor
Stumbleupon
• Uses HBase in production
• Behind features of our su.pr service
• More later
Adventures with MySQL
• Scaling MySQL hard, Oracle expensive (and hard)
• Machine cost goes up faster than speed
• Turn off all relational features to scale
• Turn off secondary (!) indexes too! (!!)
MySQL problems cont.
• Tables can be a problem at sizes as low as 500GB
• Hard to read data quickly at these sizes
• Future doesn’t look so bright as we contemplate 10x sizes
• MySQL master becomes a problem...
Limitations of masters
• What if your write speed is greater than a single machine can handle?
• All slaves must have same write capacity as master (can’t cheap out on slaves)
• Single point of failure, no easy failover
• Can (sort of) solve this with sharding...
Sharding
Sharding problems
• Requires either a hashing function or mapping table to determine shard
• Data access code becomes complex
• What if shard sizes become too large...
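A minimal sketch of the hashing approach, written in plain Python rather than against any MySQL client library. The `shard_for` and `fraction_moved` helpers, the key format, and the shard counts are illustrative assumptions, not code from the talk. With simple modulo hashing, going from 4 to 5 shards moves about four fifths of the keys, which is why resharding hurts.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)

if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print(f"moved going from 4 to 5 shards: {fraction_moved(keys, 4, 5):.0%}")
```

A mapping table avoids the mass move but adds a lookup (and a table to keep consistent) on every access.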
Resharding!
What about schema changes?
• What about schema changes or migrations?
• MySQL not your friend here
• Only gets harder with more data
HBase to the rescue
• Clustered, commodity(ish) hardware
• Mostly schema-less
• Dynamic distribution
• Spreads writes out over the cluster
What is HBase?
• HBase is an open-source distributed database, inspired by Google’s Bigtable
• Part of the Hadoop ecosystem
• Layers on HDFS for storage
• Native connections to map reduce
HBase storage model
• Column-oriented database
• Column name is arbitrary data, can have large, variable, number of columns per row
• Rows stored in sorted order
• Can random read and write
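A toy in-memory model of the storage model described here, as a sketch only. `TinyTable` and its methods are hypothetical names, not the HBase client API, and the model ignores column families, versions, and persistence. It shows rows kept in sorted key order, arbitrary per-row column names, random reads and writes, and an ordered range scan.

```python
import bisect

class TinyTable:
    """Toy model: rows kept in sorted key order, each row a dict of columns."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {column name: value}

    def put(self, row: str, column: str, value: str) -> None:
        """Random write: any column name is allowed, per row."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str) -> dict:
        """Random read: return a copy of the row's columns (empty if absent)."""
        return dict(self._rows.get(row, {}))

    def scan(self, start: str, stop: str):
        """Yield (row, columns) for start <= row < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])

if __name__ == "__main__":
    table = TinyTable()
    table.put("url:b", "clicks:2009-10-02", "17")
    table.put("url:a", "title", "Example")
    table.put("url:a", "clicks:2009-10-01", "3")
    for row, columns in table.scan("url:", "url;"):
        print(row, columns)
```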
Tables
• Table is split into roughly equal sized “regions”
• Each region is a contiguous range of keys, [start, end)
• Regions split as they grow, thus dynamically adjusting to your data set
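A small sketch of the region idea, assuming string row keys. `RegionMap` is a hypothetical class for illustration, not part of HBase, and it does not model when or where HBase chooses to split. Each region covers a half-open range, so a key equal to a region's start belongs to that region, and a split only adds one boundary.

```python
import bisect

class RegionMap:
    """Toy region directory: region i covers [starts[i], starts[i + 1])."""

    def __init__(self):
        # "" sorts before every other string, so one region covers all keys.
        self._starts = [""]

    def region_for(self, row_key: str) -> int:
        """Index of the region whose [start, end) range contains row_key."""
        return bisect.bisect_right(self._starts, row_key) - 1

    def bounds(self, index: int):
        """(start, end) of a region; end is None for the last, open-ended one."""
        if index + 1 < len(self._starts):
            return self._starts[index], self._starts[index + 1]
        return self._starts[index], None

    def split(self, split_key: str) -> None:
        """Split the region containing split_key into two at split_key."""
        if split_key not in self._starts:
            bisect.insort(self._starts, split_key)

if __name__ == "__main__":
    regions = RegionMap()
    regions.split("m")
    for key in ("apple", "m", "zebra"):
        index = regions.region_for(key)
        print(key, "-> region", index, regions.bounds(index))
```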
Server architecture
• Similar to HDFS:
• Master = Namenode (ish)
• Regionserver = Datanode (ish)
• Often run these alongside each other!
Server Architecture 2
• But not quite the same, HBase stores state in HDFS
• HDFS provides robust data storage across machines, insulating against failure
• Master and Regionserver fairly stateless and machine independent
Region assignment
• Each region from every table is assigned to a Regionserver
• The master is responsible for assignment and noticing if (when!) regionservers go down
Master Duties
• When machines fail, move regions from affected machines to others
• When regions split, move regions to balance cluster
• Could move regions to respond to load
• Can run multiple backup masters
What Master does NOT do
• Does not handle any write requests (not a DB master!)
• Does not handle location finding requests
• Not involved in the read/write path!
• Generally does very little most of the time
Distributed coordination
• To manage master election and server availability we use ZooKeeper
• Set up as a cluster, provides distributed coordination primitives
• An excellent tool for building cluster management systems
Scaling HBase
• Add more machines to scale
• Base model (Bigtable) scales past 1000TB
• No inherent reason why HBase couldn’t
What to store in HBase?
• Maybe not your raw log data...
• ... but the results of processing it with Hadoop!
• By storing the refined version in HBase, can keep up with huge data demands and serve to your website
HBase & Hadoop
• Provides a real time, structured storage layer that integrates on your existing Hadoop clusters
• Provides “out of the box” hookups to map-reduce.
• Uses the same loved (or hated) management model as Hadoop
HBase @ Stumbleupon
Stumbleupon & HBase
• Started investigating the field in Jan ’09
• Looked at 3 top (at the time) choices:
• Cassandra
• Hypertable
• HBase
Cassandra didn't work and we didn't like the data model; Hypertable was fast but raised concerns about community and project viability (no major users beyond Zvents); HBase was local and had a good community.
Stumbleupon & HBase
• Picked HBase:
• Community
• Features
• Map-reduce, cascading, etc
• Now highly involved and invested
su.pr marketing
• “Su.pr is the only URL shortener that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!”
su.pr tech features
• Real time stats
• Done directly in HBase
• In depth stats
• Use cascading, map reduce and put results in HBase
su.pr web access
• Using Thrift gateway, PHP code accesses HBase
• No additional caching other than what HBase provides
Large data storage
• Over 9 billion rows and 1300 GB in HBase
• Can map reduce a 700GB table in ~ 20 min
• That is about 6 million rows/sec
• Scales to 2x that speed on 2x the hardware
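A quick check of the arithmetic on this slide, as a sketch. It assumes decimal gigabytes and that the 6 million rows/sec figure describes the same 20 minute job over the 700GB table, which the slide implies but does not state. Under those assumptions the job reads roughly 580 MB/s in aggregate and covers about 7 billion rows, or a little under 100 bytes per row.

```python
def scan_throughput(table_gb: float, minutes: float, rows_per_sec: float) -> dict:
    """Derive aggregate scan figures from the numbers quoted on the slide."""
    seconds = minutes * 60
    table_bytes = table_gb * 1e9          # assumption: decimal gigabytes
    rows_scanned = rows_per_sec * seconds
    return {
        "mb_per_sec": table_bytes / seconds / 1e6,
        "rows_scanned": rows_scanned,
        "bytes_per_row": table_bytes / rows_scanned,
    }

if __name__ == "__main__":
    stats = scan_throughput(table_gb=700, minutes=20, rows_per_sec=6e6)
    for name, value in stats.items():
        print(f"{name}: {value:,.1f}")
```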
Micro read benches
• Single reads are 1-10ms depending on disk seeks and caching
• Scans can return hundreds of rows in dozens of ms
Serial read speeds
• A small table
• A bigger table
• (removed printlns from the code)
Deployment considerations
• Zookeeper requires IO to complete ops
• Consider hosting on dedicated machines
• Namenode and HBase master can co-exist
What to put on your nodes
• Regionserver requires 2-4 cores and 3GB+ RAM
• Can’t run HDFS, HBase, maps, reduces on a 2 core system
• On my 8 core systems I run datanode, regionserver, 2 maps, 2 reduces
Garbage collection
• GC tuning becomes important.
• Quick tip: use CMS, use -Xmx4000m
• Interested in G1 (if it ever stops crashing)
Batch and interactive
• These may not be compatible
• Latency goes up with heavy batch load
• May need to use 2 clusters to ensure responsive website
Part 2
HBase @ Streamy
• History of Data
• RDBMS Issues
• HBase to the Rescue
• Streamy Today and Tomorrow
• Future of HBase
About Me
• Co-Founder and CTO of Streamy.com
• HBase Committer
• Migrated Streamy from RDBMS to HBase and Hadoop in June 2008
History of Data
The Prototype
• Streamy 1.0 built on PostgreSQL
‣ All of the bells and whistles
• Powered by single low-spec node
‣ 8 core / 8 GB / 2TB / $4k
Functionally powerful, Woefully slow
History of Data
The Alpha
• Streamy 1.5 built on optimized PostgreSQL
‣ Remove bells and whistles, add partitioning
• Powered by high-powered master node
‣ 16 core / 64 GB / 15x146GB 15k RPM / $40k
Less powerful, still slow... Insanely expensive
History of Data
The Beta
• Streamy 2.0 built entirely on HBase
‣ Custom caches, query engines, and API
• Powered by 10 low-spec nodes
‣ 4 core / 4GB / 1TB / $10k for entire cluster
Less functional but fast, scalable, and cheap
RDBMS Issues
• Poor disk usage patterns
• Black box query engine
• Write speed degrades with table size
• Transactions/MVCC unnecessary overhead
• Expensive
The Read Problem
• View 30 newest unread stories from blogs
‣ Not RDBMS friendly, no early-out
‣ PL/Python heap-merge hack helped
‣ We knew what to do but DB didn’t listen
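The slides do not show the PL/Python hack itself, so the following is only a generic sketch of a heap merge with early-out. It assumes each blog's stories are already available newest first as `(timestamp, story_id)` pairs; `newest_unread` and its parameters are hypothetical names. The merge is lazy, so it stops pulling from the per-blog lists once 30 unread stories are found.

```python
import heapq
from itertools import islice

def newest_unread(feeds, read_ids, limit=30):
    """Merge per-blog story lists (each sorted newest first) and stop early.

    feeds: iterable of sequences of (timestamp, story_id), each newest first.
    read_ids: set of story ids the user has already read.
    """
    # heapq.merge is lazy: it only advances an input when its head is consumed.
    merged = heapq.merge(*feeds, key=lambda story: story[0], reverse=True)
    unread = (story for story in merged if story[1] not in read_ids)
    return list(islice(unread, limit))

if __name__ == "__main__":
    feeds = [
        [(9, "a3"), (5, "a2"), (1, "a1")],
        [(8, "b2"), (2, "b1")],
        [(7, "c1")],
    ]
    print(newest_unread(feeds, read_ids={"b2"}, limit=3))
```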
The Write Problem
• Rapidly growing items table
‣ Crawl index from 1k to 100k feeds
‣ Indexes, static content, dynamic statistics
‣ Solutions are imperfect
RDBMS Conclusions
• Enormous functionality and flexibility
‣ But you throw it out the door at scale
• Stripped down RDBMS still not attractive
• Turned entire team into DBAs
• Gets in the way of domain-specific optimizations
What We Wanted
• Transparent partitioning
• Transparent distribution
• Fast random writes
• Good data locality
• Fast random reads
What We Got
• Transparent partitioning: Regions
• Transparent distribution: RegionServers
• Fast random writes: MemStore
• Good data locality: Column Families
• Fast random reads: HBase 0.20
What Else We Got
• Transparent replication: HDFS
• High availability: No SPOF
• MapReduce: Input/OutputFormats
• Versioning: Column Versions
• Fast Sequential Reads: Scanners
HBase @ Streamy
Today
• All data stored in HBase
• Additional caching of hot data
• Query and indexing engines
• MapReduce crawling and analytics
• Zookeeper/Katta/Lucene
HBase @ Streamy
Tomorrow
• Thumbnail media server
• Slave replication for Backup/DR
• More Cascading
• Better Katta integration
• Realtime MapReduce
HBase on a Budget
• HBase works on cheap nodes
‣ But you need a cluster (5+ nodes)
‣ $10k cluster has 10X capacity of $40k node
• Multiple instances on a single cluster
• 24/7 clusters + bandwidth != EC2
Lessons Learned
• Layer of abstraction helps tremendously
‣ Internal Streamy Data API
‣ Storage of serialized types
• Schema design is about reads not writes
• What’s good for HBase is good for Streamy
What’s Next for HBase
• Inter-cluster / Inter-DC replication
‣ Slave and Multi-Master
• Master rewrite, more Zookeeper
• Batch operations, HDFS uploader
• No more data loss
‣ Need HDFS appends
HBase Information
• Home Page http://hbase.org
• Wiki http://wiki.apache.org/hadoop/Hbase
• Twitter http://twitter.com/hbase
• Freenode IRC #hbase
• Mailing List hbase-user@hadoop.apache.org