My Life with HBase - FOSDEM 2010 NoSQL

					“My Life with HBase”
      Lars George, CTO of WorldLingo
     Apache Hadoop HBase Committer
WorldLingo
 Co-founded 1999
 Machine Translation Services
 Professional Human Translations
 Offices in US and UK
 Microsoft Office Provider since 2001
 Web based services
 Customer Projects
 Multilingual Archive
Multilingual Archive
 Simple calls (sketched in code below)
    ◦   putDocument()
    ◦   getDocument()
    ◦   search()
    ◦   command()
    ◦   putTransformation()
    ◦   getTransformation()
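A hypothetical Java sketch of this interface; only the method names come from the slide, every type and parameter is an assumption:

    import java.util.List;
    import java.util.Map;

    // Hypothetical reconstruction of the archive calls named above;
    // signatures are illustrative, not WorldLingo's actual code.
    public interface MultilingualArchive {
        void putDocument(String id, byte[] document);           // store a document
        byte[] getDocument(String id);                          // random access by id
        List<String> search(String query);                      // ids of matching documents
        String command(String name, Map<String, String> args);  // maintenance operations
        void putTransformation(String id, byte[] data);         // store a derived version
        byte[] getTransformation(String id);                    // fetch a derived version
    }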
Multilingual Archive (cont.)
 Planned already, implemented as a customer project
 Scale:
    ◦ 500 million documents
    ◦ Random Access
    ◦ “100%” Uptime
   Technologies?
    ◦ Database
    ◦ Zip-Archives on file system, or Hadoop
MySQL Limitations
 Scaling MySQL is hard, Oracle is expensive (and …)
 Machine cost goes up faster than speed
 Turn off all relational features to scale
 Turn off secondary indexes too
 Tables can be a problem at sizes as low as …
 Hard to read data quickly at these sizes
 Write speed degrades with table size
 Future growth uncertain
MySQL Limitations (cont.)
 Master becomes a problem
 What if your write speed is greater than a
  single machine can handle?
 All slaves must have the same write capacity
  as the master (can't cheap out on slaves)
 Single point of failure, no easy failover
 Can (sort of) solve this with sharding
Sharding Problems
 Requires either a hashing function or a
  mapping table to determine the shard
  (see the sketch below)
 Data access code becomes complex
 What if shard sizes become too large?
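For illustration, a minimal sketch of the hashing-function approach; the class and method names are made up:

    import java.util.List;

    // Illustrative only: route a key to one of N MySQL shards by hashing.
    public class ShardRouter {
        private final List<String> shardHosts; // e.g. ["db1", "db2", "db3"]

        public ShardRouter(List<String> shardHosts) {
            this.shardHosts = shardHosts;
        }

        // Every data-access call must compute its shard first -- this is
        // the complexity that leaks into application code.
        public String shardFor(String key) {
            return shardHosts.get(Math.abs(key.hashCode() % shardHosts.size()));
        }
    }

Note how adding a shard changes the bucket for nearly every key, which is exactly why oversized shards are painful to fix.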
Schema Changes
 What about schema changes or migrations?
 MySQL is not your friend here
 Only gets harder with more data
HBase to the Rescue
 Clustered, commodity(-ish) hardware
 Mostly schema-less
 Dynamic distribution
 Spreads writes out over the cluster
   Distributed database modeled on Bigtable
    ◦ Bigtable: A Distributed Storage System for
      Structured Data by Chang et al.
 Runs on top of Hadoop Core
 Layers on HDFS for storage
 Native connections to MapReduce
 Distributed, High Availability, High
  Performance, Strong Consistency
   Column-oriented store
    ◦   Wide table costs only the data stored
    ◦   NULLs in row are 'free'
    ◦   Good compression: columns of similar type
    ◦   Column name is arbitrary
 Rows stored in sorted order
 Can random read and write
 Goal of billions of rows × millions of cells
    ◦ Petabytes of data across thousands of servers
 Table is split into roughly equal-sized regions
 Each region is a contiguous range of keys,
  from [start, end)
 Regions split as they grow, thus
  dynamically adjusting to your data set
Tables (cont.)
 Tables are sorted by Row
 Table schema defines column families
    ◦ Families consist of any number of columns
    ◦ Columns consist of any number of versions
    ◦ Everything except table name is byte[]

(Table, Row, Family:Column, Timestamp) → Value
Tables (cont.)
   As a data structure (sketched in Java below)
        RowKey, List(
              Column, List(
                 Value, Timestamp
              )
        )
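In Java terms, a rough pseudo-declaration of that logical model, using plain java.util maps rather than actual HBase classes:

    import java.util.SortedMap;
    import java.util.TreeMap;

    public class LogicalModel {
        // row key -> (column -> (timestamp -> value)), all keys kept sorted
        static SortedMap<String, SortedMap<String, SortedMap<Long, byte[]>>> table =
            new TreeMap<String, SortedMap<String, SortedMap<Long, byte[]>>>();
    }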
Server Architecture
   Similar to HDFS
    ◦ Master ≈ Namenode
    ◦ Regionserver ≈ Datanode
 Often run these alongside each other!
 Difference: HBase stores state in HDFS
 HDFS provides robust data storage
  across machines, insulating against failure
 Master and Regionserver fairly stateless
  and machine independent
Region Assignment
 Each region from every table is assigned
  to a Regionserver
 Master Duties:
    ◦ Responsible for assignment and handling
      regionserver problems (if any!)
    ◦ When machines fail, move regions
    ◦ When regions split, move regions to balance
    ◦ Could move regions to respond to load
    ◦ Can run multiple backup masters
   The master does NOT
    ◦   Handle any write requests (not a DB master!)
    ◦   Handle location finding requests
    ◦   Sit in the read/write path
 Generally, it does very little most of the time
Distributed Coordination
 ZooKeeper is used to manage master
  election and server availability (sketch below)
 Set up as a cluster, provides distributed
  coordination primitives
 An excellent tool for building cluster
  management systems
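As an illustration of such a primitive, a minimal master-election sketch against the standard ZooKeeper Java API (the znode path and quorum address are invented; HBase's real integration is more involved):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MasterElection {
        public static void main(String[] args) throws Exception {
            // Connect to the ZooKeeper ensemble (hypothetical quorum address).
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                new Watcher() {
                    public void process(WatchedEvent event) { /* ignore events */ }
                });
            try {
                // The first process to create the ephemeral znode is the active
                // master; the znode vanishes automatically if that process dies.
                zk.create("/demo/master", "host1:60000".getBytes(),
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                System.out.println("Acting as master");
            } catch (KeeperException.NodeExistsException e) {
                System.out.println("Standing by; another master is active");
            }
        }
    }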
HBase Storage Architecture
HBase Public Timeline
   November 2006
    ◦ Google releases paper on Bigtable
   February 2007
    ◦ Initial HBase prototype created as Hadoop contrib
   October 2007
    ◦ First "useable" HBase (with Hadoop 0.15.0)
   December 2007
    ◦ First HBase User Group
   January 2008
    ◦ Hadoop becomes TLP, HBase becomes subproject
   October 2008
    ◦ HBase 0.18.1 released
   January 2009
    ◦ HBase 0.19.0 released
   September 2009
    ◦ HBase 0.20.0 released
HBase WorldLingo Timeline
HBase - Example
   Store web crawl data
    ◦ Table crawl with family content
    ◦ Row is URL with columns
      content:data stores raw crawled data
      content:language stores http language header
      content:type stores http content-type header
    ◦ If processing raw data for hyperlinks and
      images, add families links and images
       links:<url> column for each hyperlink
       images:<url> column for each image
      (see the Put sketch below)
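A sketch of filling that schema with the 0.20-era Java client (exact signatures vary between versions; the URL and payload are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class StoreCrawl {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "crawl");
            Put put = new Put(Bytes.toBytes("http://example.com/")); // row key is the URL
            put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
                    Bytes.toBytes("<html>...</html>"));             // raw crawled data
            put.add(Bytes.toBytes("content"), Bytes.toBytes("language"),
                    Bytes.toBytes("en"));                           // http language header
            put.add(Bytes.toBytes("content"), Bytes.toBytes("type"),
                    Bytes.toBytes("text/html"));                    // http content-type header
            table.put(put);
        }
    }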
HBase - Clients
   Native Java client/API (example below)
    ◦ get(Get get)
    ◦ put(Put put)
   Non-Java clients
    ◦ Thrift server (Ruby, C++, Erlang, etc.)
    ◦ REST server (Stargate)
 TableInputFormat/TableOutputFormat for
  MapReduce
 HBase shell (jruby)
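Reading the crawl row back through the same native API, again a 0.20-era sketch:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReadCrawl {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "crawl");
            Get get = new Get(Bytes.toBytes("http://example.com/"));
            Result result = table.get(get);
            byte[] language = result.getValue(Bytes.toBytes("content"),
                                              Bytes.toBytes("language"));
            System.out.println(Bytes.toString(language)); // e.g. "en"
        }
    }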
Scaling HBase
   Add more machines to scale
    ◦ Automatic rebalancing
 Base model (Bigtable) scales past 1000TB
 No inherent reason why HBase couldn't
What to store in HBase
 Maybe not your raw log data...
 ... but the results of processing it with
  Hadoop MapReduce
 By storing the refined version in HBase, you
  can keep up with huge data demands and
  serve it to your website
   “NoSQL” Database!
    ◦   No joins
    ◦   No sophisticated query engine
    ◦   No transactions (sort of)
    ◦   No column typing
    ◦   No SQL, no ODBC/JDBC, etc. (but there is
        HBql now!)
 Not a replacement for your RDBMS...
 Impedance matching!
Why HBase?
 Datasets are reaching Petabytes
 Traditional databases are expensive to
  scale and difficult to distribute
 Commodity hardware is cheap and
  powerful (but HBase can make use of
  powerful machines too!)
 Need for random access and batch
  processing (which Hadoop alone does not
  provide)
 Single reads are 1-10ms depending on
  disk seeks and caching
 Scans can return hundreds of rows in
  dozens of ms
 Serial read speeds
Multilingual Archive (cont.)
 44 Dell PESC1435, 12GB RAM, 2 x 1TB
  SATA drives
 Java 6
 Tomcat 5.5
 88 Xen domUs
    ◦ Apache
    ◦ Hadoop/HBase
    ◦ Tomcat application servers
   Currently split into two clusters
Lucene Search Server
 43 fields indexed
 166GB size
 Automated merging/warm-up/swap
 Looking into scalable solution
    ◦   Katta
    ◦   Hyper Estraier
    ◦   DLucene
    ◦   …
   Sorting?
Multilingual Archive (cont.)
 5 Tables
 Up to 5 column families
 XML Schemas
 Automated table schema updates (sketch below)
 Standard options tweaked over time
    ◦ Garbage Collection!
   MemCached(b) layer
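The slides do not say how the automated schema updates are implemented; one plausible sketch using the 0.20-era admin API (table and family names are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class AddFamily {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
            admin.disableTable("documents");  // schema changes require the table offline
            admin.addColumn("documents", new HColumnDescriptor("newfamily"));
            admin.enableTable("documents");
        }
    }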
Deployment stack (from diagram):
 Network: Firewall
 LWS: Director 1 … Director n
 Web: Apache 1 … Apache n
 App: Tomcat 1 … Tomcat n (behind each Apache)
 Cache: MemCached
 Data: HBase
 Backup/Restore
 Index building
 Cache filling
 Mapping
 Updates
 Translation
HBase - Problems
   Early versions (before HBase 0.19.0!)
    ◦ Data loss
    ◦ Migration nightmares
    ◦ Slow performance

   Current version
    ◦ Read HBase Wiki!!!
   Single point of failure (name node only!)
HBase - Notes
 RTFM(ine)

 HBase Wiki, IRC Channel
 Personal Experience:
    ◦   Max. file handles (32k+)
    ◦   Hadoop xceiver limits (NIO?)
    ◦   Redundant meta data (on name node)
    ◦   RAM (4GB+)
    ◦   Deployment strategy
    ◦   Garbage collection (use CMS, G1?)
    ◦   Maybe not mix batch and interactive?
 Use supplied Ganglia context or JMX
  bridge to enable Nagios and Cacti
 JMXToolkit: Swiss Army knife for JMX-
  enabled servers
HBase - Roadmap
   HBase 0.20.x “Performance”
    ◦   New Key Format – KeyValue
    ◦   New File Format – HFile
    ◦   New Block Cache – Concurrent LRU
    ◦   New Query and Result API
    ◦   New Scanners
    ◦   ZooKeeper Integration – No SPOF in HBase
    ◦   New REST Interface
    ◦   Contrib
         Transactional Tables
         Secondary Indexes
         Stargate
HBase - Roadmap (cont.)
   HBase 0.21.x “Advanced Concepts”
    ◦ Master Rewrite – More ZooKeeper
    ◦ New RPC Protocol (Avro)
    ◦ Multi-DC Replication
    ◦ Intra Row Scanning
    ◦ Further optimizations on algorithms and data
      structures
    ◦ Discretionary Access Control
    ◦ Coprocessors
   Email:
 Blog:
 Twitter: larsgeorge
