
HUG11: HBase 0.90 Preview

					    HUG11
HBase 0.90 Preview
    June 30th 2010
     @ Facebook
HBase 0.90 Intro
   Jonathan Gray
      Facebook



                HBase 0.90
•   Why 0.90
•   What’s new in 0.90
•   Which Hadoop versions
•   Development releases
•   Production release




                  Why 0.90?
• Decision to break with Hadoop versioning
  – Last major Hadoop release was April 2009
  – Moving forward, releases not correlated
• HBase nearing 1.0
  – Not there yet, but we are getting close
• HBase 0.90 is a major upgrade from 0.20
  – 90 is a big number, so you know it’s good
  – (But most likely, no migration)

           What’s New in 0.90
                     (the schedule)


• Durability and Stability release
  – HDFS appends + WAL improvements, Testing
  – Todd Lipcon, Cloudera
• Master Rewrite
  – Cleanup of master, move region transitions to ZK
  – Karthik Ranganathan, Facebook
• Inter-cluster/Inter-DC Replication
  – Jean-Daniel Cryans, StumbleUpon

          What’s New in 0.90
                     (the schedule)


• Bloom Filters
  – Nicolas Spiegelberg, Facebook
• Bulk loading improvements
  – Todd Lipcon, Cloudera
• Maven and other good stuff
  – Michael Stack, StumbleUpon
• Performance improvements
  – Jonathan Gray, Facebook

           What’s New in 0.90
                    (not on the schedule)


• Peripheral improvements
  – REST / Stargate, Shell, Avro server, EC2 scripts
• HBaseFSCK
• Contribs moved to github
• Lots and lots of other stuff
  – 506 fixes committed
  – Still have 123 unresolved and 9 blockers


             Hadoop Versions
• 0.90 is durable but requires HDFS appends
  – No released Hadoop version has them; they exist only in trunk
• Dhruba from Facebook built it for 0.20
  – Lots of help from Nicolas, Todd, and others
  – Created a Hadoop 0.20-append branch in Apache
  – Release of branch in conjunction with HBase




              Hadoop Versions
• Durable HBase will require a special HDFS
  – Apache Hadoop 0.20-append
  – Cloudera Distribution of Hadoop version 3
     • Includes lots of other fixes and perf improvements
  – Facebook Distribution of Hadoop (0.20-append)
     • Yahoo! distro + FB data warehouse fixes + appends
     • Also includes AvatarNode (HA NN) and RAID




        Development Releases
• Early cuts off TRUNK into the hands of users
  – Developers need help testing in the real world
• Versioned as 0.89 plus a date stamp
  – 0.89.20100621 is the first dev release
  – Available at hbase.org or as part of CDHv3 beta


• Plan to release every couple of weeks until 0.90


           Production Release
• Expected release in Q3
  – Hopefully early Q3, possibly late Q3
• Lots of factors at play
  – How many new things to let in
     • Some not done but are really good
  – How many and how early bugs get found
     • Try development releases and help us track them down!




         HBase Durability and
              Stability
    Todd Lipcon
   todd@cloudera.com
      Jun 30, 2010



           Goals - Durability
– Many use cases cannot tolerate lost edits
– If a write is acknowledged to a client, it should not
  disappear - even when there are failures
– In HBase 0.90, data loss of any kind should be
  considered a highest-priority bug.




           Goals - Reliability
– Fault tolerance is a key feature of HBase
– In previous versions, certain failures were not
  handled without manual recovery
– All normal failure scenarios should recover
  automatically




           Durability in 0.89+
– HDFS support for hflush()/sync API
   • This API allows the region server to flush edits through
     the whole pipeline and wait for acknowledgement
     before returning to client.
   • Available in CDH3b2 as of this week
   • In progress on Apache branch-0.20-append
– Small performance hit - HDFS-895 allows pipelined
  flushes



         Reliability in 0.89+
– Extensive manual testing of failure scenarios
– Improvements to test framework to automate
  failure tests
– Continued progress fixing bugs revealed by the
  testing above.
– Master overhaul in progress at Facebook




          Operability in 0.89+
• Operability improves uptime

• HBase fsck
• More performance metrics (in progress)




Master Rewrite
 Karthik Ranganathan
      Facebook




            Why do we need it?
•   Master failover does not always work
•   ZooKeeper integration is patched on
•   Master to RS communication inefficient
•   Code gets difficult to work with
    – Logic around ROOT and META scattered
    – HMaster passed to many master components



                          Region Move
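(Diagram: the Master, holding region state in memory, moves Region1 from RS1 to RS2)
1. Region1 (Master memory: Region1: RS1) is to be moved
2. Master sends Close(Region1) to RS1
3. RS1 replies Closed(Region1)
4. Master updates memory and META (Region1: Closed)
5. Region1 is closed, ask RS2 to open it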


      Master Failover during move
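(Diagram: the Master fails partway through the move)
1. Region1 (Master memory: Region1: RS1) is to be moved
2. Master sends Close(Region1) to RS1
3. RS1 replies Closed(Region1)
   Master fails over; the new Master's memory still shows Region1: RS1
   RS2 is never asked to open it: Region1 is NEVER OPENED!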


             Master Rewrite – Use ZK
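(Diagram: region transitions are recorded in ZooKeeper)
1. Region1 (Master memory: Region1: RS1) is to be moved
2. Master sends Close(Region1) to RS1
3. RS1 records Closed(Region1) in ZooKeeper (Region1: Closed)
   Master fails over
4. New Master reads META and ZK
   New Master memory shows Region1: RS1 and Region1: Closed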
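
The failover case above can be sketched in a few lines. This is a minimal illustration, assuming plain dictionaries stand in for META and for the ZooKeeper transition state; the function name and state strings are hypothetical and not HBase code.

```python
def regions_to_reassign(meta, zk_transitions):
    """Regions a newly started master must still get opened somewhere.

    meta: region -> last server recorded in META (hypothetical layout).
    zk_transitions: region -> transition state recorded in ZooKeeper.
    """
    pending = []
    for region, last_server in sorted(meta.items()):
        # "Closed" in ZK means the close finished but nobody opened the
        # region yet, which is exactly the state the old master lost.
        if zk_transitions.get(region) == "Closed":
            pending.append((region, last_server))
    return pending


# The old master's memory is gone; the new one rebuilds from META and ZK.
meta = {"Region1": "RS1", "Region2": "RS2"}
zk_transitions = {"Region1": "Closed"}
pending = regions_to_reassign(meta, zk_transitions)
```

Because the transition lives outside the master's memory, `pending` still names Region1 after a failover, so the move can be completed.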


          What else can we get?
•   Master need not do META edits anymore
•   Limit concurrent major compactions in cluster
•   If RS restarts, it can pick up its own regions
•   Reporting on shutdown HBase




HBase Replication
   Jean-Daniel Cryans




                  Who?
• jdcryans
• Database Engineer at StumbleUpon since
  Oct09
• HBase committer since July08




                      What?
• HBase replication isn’t:
  – HDFS replication
  – Intra-cluster replication
• Getting edits from one HBase cluster to one or
  many others, eventually.
• Geographically distributed (or not).
• Master-slave, master-master, circular
  replication.
                      Why?
• Firstly: Disaster recovery
  – Earthquake ate my HBase!
• Then: High Availability
  – Serving my data from all over the world!
• Also: Cluster “Synchronization”
  – Production cluster replicated to “science” cluster,
    and back
• <insert other clever usages>

                      How?
• Umbrella jira: HBASE-1295
• Main jira of interest: HBASE-2223
• Main architectural features:
  – Master-push
  – Write-ahead-log shipping
  – Metadata stored in ZooKeeper for high availability
    and watchers


How… does HBase work again?




       http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html



  Overview




   How to recover from RS failure
• Trust the magic.




           How to really recover
• The Quest for the Holy Lock:
  – Every region server gets notified that a RS “x” died via
    a ZooKeeper Watcher.
  – Everyone tries to create a znode called “lock” in the
    dead RS’s znode that contains all the HLog queues.
  – Only one will be able to do it, the rest will fail and let
    it go.
  – The winner transfers all the old queues into its own
    znode, appending the name of the dead RS.
• This works pretty well, except if the winner dies
  while recovering. Working on an improvement in
  HBASE-2611.
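
The lock race described above can be sketched as follows. This is a toy model, assuming an in-memory class in place of ZooKeeper; the znode paths, queue names, and function names are hypothetical and do not reflect the actual replication code.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper; not the real client API."""

    def __init__(self):
        self.znodes = {}

    def create(self, path, data=None):
        # Like a ZooKeeper create: fails if the znode already exists.
        if path in self.znodes:
            raise KeyError("node exists: " + path)
        self.znodes[path] = data


def try_recover(zk, me, dead_rs, queues):
    """Try to claim the dead server's HLog queues; return True if we won."""
    try:
        zk.create("/replication/rs/%s/lock" % dead_rs, me)  # hypothetical path
    except KeyError:
        return False  # someone else holds the lock; let it go
    # Winner: move the queues under its own name, tagged with the dead RS.
    for queue_id, hlogs in queues.pop(dead_rs).items():
        queues[me]["%s-%s" % (queue_id, dead_rs)] = hlogs
    return True


zk = FakeZooKeeper()
queues = {
    "rs1": {"peer1": ["hlog.1", "hlog.2"]},  # rs1 is the server that died
    "rs2": {"peer1": ["hlog.7"]},
    "rs3": {"peer1": ["hlog.9"]},
}
winners = [rs for rs in ("rs2", "rs3") if try_recover(zk, rs, "rs1", queues)]
```

Only the first caller creates the lock, so exactly one survivor inherits the queues. The sketch does not cover the case the slide flags, where the winner itself dies while recovering.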
                    When?
• Slated for 0.90

• Available in trunk

• And in the next 0.89 release!




             Questions?

 By fdbryant3, at http://www.flickr.com/photos/fdbryant3/2320570080/

    Bloom Filters
Nicolas Spiegelberg (Facebook)




       What are Bloom Filters?
• Determine if a “key” is a member in a set
  – Probabilistic (default: 1% error rate)
  – False Negatives NOT Allowed
  – Space Efficient (avg: 8-16 bits/entry)
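
A toy Bloom filter makes these properties concrete. This is a minimal sketch, not the HBase implementation: the class name, the SHA-256 based probing, and the sizes are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash probes per key."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k probe positions from two slices of one digest
        # (double hashing); the hash choice is illustrative only.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter(num_bits=10 * 1000, num_hashes=7)  # about 10 bits per entry
rows = ["row-%d" % i for i in range(1000)]
for row in rows:
    bloom.add(row)

# Every inserted key is reported as possibly present: no false negatives.
no_false_negatives = all(bloom.might_contain(row) for row in rows)
```

A key that was never added may still come back as "possibly present"; that is the false positive rate the error setting controls.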




   How HBase Uses Bloom Filters
• Per Column Family (Store)
   • Multiple HFiles
   • Your key may be in 1 HFile
• Bloom placed in LRU cache
   • Fraction of HFile size
   • Skip Block Index Search if
     you don’t have to
• When might you need this?
   • Exact queries
   • Large objects (10kb+)

 Bloom Filters: Previous Problems
• Problem: Must estimate key size ahead of
  time
  – So we naively used Entry Count (BAD)
     • 100 entries/row + row-level blooms = 100x inflation!!!


• Solution: Fold/Compress Bloom
  – Bloom entries are inserted into array: hash % array.size
     • If array.size % 2 == 0, can bitwise OR each half
     • If divisible by 1<<N, array can compress up to 1/(1<<N)
  – Size using Entry Count, compress at the end
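
The folding idea can be sketched directly. This is a simplified model, assuming one probe per key and a plain list of booleans; the function names and the `min_bits` stopping rule are hypothetical.

```python
def fold(bits):
    """Halve a bit array by OR-ing its upper half onto its lower half."""
    half = len(bits) // 2
    return [bits[i] or bits[i + half] for i in range(half)]


def fold_to(bits, max_folds, min_bits):
    """Fold while the size stays even, at most max_folds times."""
    folds = 0
    while folds < max_folds and len(bits) % 2 == 0 and len(bits) // 2 >= min_bits:
        bits = fold(bits)
        folds += 1
    return bits


size = 1 << 10                          # sized from a pessimistic entry count
hashes = [7, 519, 1030, 4099, 65537]    # hypothetical key hashes
bits = [False] * size
for h in hashes:
    bits[h % size] = True

folded = fold_to(bits, max_folds=7, min_bits=8)
```

Lookups against the folded array use `hash % len(folded)`. Since the new size divides the old one, every bit that was set lands on the position the lookup probes, so no false negatives are introduced; the cost is a higher false positive rate.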
 Bloom Filters: Previous Problems
• Problem: Tricky bugs during compactions
  – Minor Compactions would also compact blooms


• Solution: Faster Hash + Keep it Simple
  – Jenkins => Murmur + Combinatorial Generation
  – Speedy now, so just recompute bloom
     • Protip: Can turn on blooms after the fact. They will be added
       during compaction


Bloom Filters: User Configuration
• Enable Blooms
  – Column Family granularity
  – HColumnDescriptor.setBloomFilterType()
  – Options: NONE | ROW | ROWCOL




 Bloom Filters: User Configuration
• Global Config Settings
  – “io.hfile.bloom.error.rate”
     • Average false positive rate. default = 1%
     • Decrease by 1⁄2 (.5%) == +1 bit per bloom entry
  – “io.hfile.bloom.max.fold”
     • Guaranteed fold rate.
     • Default = 7, so compress up to 1/(1<<7) = 1/128
  – “io.hfile.bloom.enabled”
     • Emergency kill switch
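
The arithmetic behind these settings can be checked with the textbook Bloom filter sizing formula. Whether HBase sizes its blooms exactly this way is an assumption here. Under that formula a 1% error rate costs roughly 9.6 bits per entry and each halving of the error rate adds roughly 1.44 bits, so the "+1 bit" above reads as a rule of thumb.

```python
import math


def bits_per_entry(error_rate):
    """Textbook Bloom filter sizing: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(error_rate) / (math.log(2) ** 2)


at_1_percent = bits_per_entry(0.01)
at_half_percent = bits_per_entry(0.005)
extra_bits = at_half_percent - at_1_percent   # cost of halving the error rate

max_fold = 7
smallest_fraction = 1 / (1 << max_fold)       # a fold of 7 allows down to 1/128
```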

     Further Bloom Filter Reading
• Wikipedia: http://en.wikipedia.org/wiki/Bloom_filters
• Technical: https://issues.apache.org/jira/browse/HBASE-1200
   – Official Patch
   – Thought flow & Future Feature Ideas: Bloom_Filters_in_Hbase.pdf




HBase Bulk Loads

    Todd Lipcon
   todd@cloudera.com
      Jun 30, 2010



                Overview
• Efficiently load MapReduce output into an
  HBase table
• Skips normal RPC paths, etc.
• Typically 10x or more improvement over API
  usage




                 Use Cases
• Import new datasets from other formats
• Do MR analysis on a dataset and bulk load to
  HBase for real-time read access
• Import incremental updates from other
  systems




                    Bulk Load Process
1. Run MR job with total order partitioning and HFileOutputFormat
2. bin/hbase completebulkload /output-path/ tablename
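
Why step 1 needs a total order can be seen in a small sketch: each output file must hold a sorted run of rows that falls inside a single region. The region boundaries and row keys below are hypothetical, and the code models only the partitioning idea, not the MapReduce job.

```python
import bisect


def partition_for(row_key, region_start_keys):
    """Index of the region whose key range contains row_key.

    region_start_keys is sorted; the first region starts at "".
    """
    return bisect.bisect_right(region_start_keys, row_key) - 1


region_start_keys = ["", "g", "p"]          # hypothetical table with 3 regions
rows = ["zebra", "apple", "mango", "grape", "pear"]

outputs = [[] for _ in region_start_keys]   # one output file per region
for row in rows:
    outputs[partition_for(row, region_start_keys)].append(row)
outputs = [sorted(part) for part in outputs]  # rows sorted within each file
```

Reading the output files in partition order yields all rows in sorted order, which is what lets the load step hand each file to exactly one region.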




     http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

           New importtsv tool
• bin/hbase importtsv
   -Dimporttsv.columns=foo,bar,HBASE_ROW_KEY
   -Dimporttsv.bulk.output=/output-path
   <tablename> <inputdir>
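
How a column spec like the one above maps onto input lines can be sketched as below. This is an illustration of the idea only, assuming tab-separated fields in the same order as the spec; the function is hypothetical and is not the tool's code.

```python
def parse_tsv_line(line, columns_spec):
    """Split one TSV line into (row_key, {column: value})."""
    names = columns_spec.split(",")
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(names):
        raise ValueError("expected %d fields, got %d" % (len(names), len(fields)))
    # The field sitting under HBASE_ROW_KEY becomes the row key.
    row_key = fields[names.index("HBASE_ROW_KEY")]
    cells = {n: v for n, v in zip(names, fields) if n != "HBASE_ROW_KEY"}
    return row_key, cells


row_key, cells = parse_tsv_line("1\t2\tr1\n", "foo,bar,HBASE_ROW_KEY")
```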




Miscellaneous
   Michael Stack




                  Maven
• It's working?
• What needs fixing?




                    Logo
• Shall we change our logo?




      Site




    GSOC: HBASE-50, Snapshots
• Li Chongxin
• Sponsored by FB
• Current status
  – Design
  – Plan
  – Now implementing




HBase Performance
    Jonathan Gray
       Facebook



       Reduce I/O around splits
• In 0.20, splits only triggered after compaction
  – This meant rewriting data on each side of split
• 0.90 changes splits to look at all StoreFiles
  – Checked after flush not after compaction
  – HBASE-2375


 Seeing 30-40% improvement on import speed


              Reduce I/O around splits
(Diagram: store file lifecycle around a split)
HBase 0.20
  StoreFile1-4 (64MB each) -> Compaction -> StoreFile5 (256MB)
  StoreFile5 -> Split -> StoreFile5A (128MB), StoreFile5B (128MB)
  StoreFile5A, StoreFile5B -> Compactions -> StoreFile6 (128MB), StoreFile7 (128MB)
HBase 0.90
  StoreFile1-4 (64MB each) -> Split -> StoreFile 1A 1B, 2A 2B, 3A 3B, 4A 4B
  A halves and B halves -> Compactions -> StoreFile6 (128MB), StoreFile7 (128MB)


      Reduce I/O around splits
• In above example, we cut the amount of data
  written to reach StoreFile 6 and 7 by 50%
  – And we removed the longest-running compaction


• Next steps
  – Perform the post-split compaction lazily
  – Need to be able to reference ranges of HFiles
     • References of references
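
The 50% figure can be reproduced with a little arithmetic on the sizes from the example. The sketch assumes the split itself writes no data (only references), which the example implies but does not state.

```python
store_files = [64, 64, 64, 64]        # StoreFile1..4, sizes in MB
total = sum(store_files)

# HBase 0.20: compact everything, split, then compact each daughter.
pre_split_compaction = total          # writes StoreFile5 (256MB)
post_split_compactions = 2 * (total // 2)   # StoreFile6 + StoreFile7
written_020 = pre_split_compaction + post_split_compactions

# HBase 0.90: split first, then compact each daughter.
written_090 = post_split_compactions

savings = 1 - written_090 / written_020
```

Dropping the pre-split compaction removes the single largest rewrite, which is also the longest-running one.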


     Reduce time regions offline
• Regions go offline for short periods of time
  – Splits, load balancing, regionserver failover
• Make splits faster
  – Run only on RS, immediately re-open children
• Double-flush MemStore on region close
  – No long-running flush while region offline
• Use ZooKeeper for region movement
  – No waiting for heartbeats, event-triggered arch.

     Concurrency and Priorities
• Clusters being built with up to 12 disks/node
  – Added multi-threading to flushes and
    compactions
• Long-running operations starve processing
  – Added multi-threading to handling of Master msgs
• Not all flushes and compactions created equal
  – Added priorities to flushes and compactions
     • High priority when flushing for heap pressure,
       compacting for storefile count pressure
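
Prioritized flush and compaction requests can be modeled with a heap. This is a sketch of the scheduling idea only; the priority levels, task strings, and function name are hypothetical.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1            # lower number is served first
_order = itertools.count()     # tie-breaker keeps FIFO order within a priority
queue = []


def request(priority, task):
    heapq.heappush(queue, (priority, next(_order), task))


request(NORMAL, "compact region A (periodic)")
request(NORMAL, "flush region B (memstore size)")
request(HIGH, "flush region C (heap pressure)")
request(HIGH, "compact region D (too many storefiles)")

served = [heapq.heappop(queue)[2] for _ in range(len(queue))]
```

The pressure-driven requests jump ahead of routine work even though they were queued later.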
             HFile seek/reseek
• Use input from query to intelligently skip
  unnecessary HFile blocks
  – Seek to the columns you want, not start of row
  – Seek to the versions you want, not start of column
     • In many cases, these allow you to skip over blocks
       or early-out from blocks


• Very much a work in progress
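
Seeking by column rather than scanning from the start of the row can be illustrated with a sorted block index. The index layout and key shape below are assumptions for illustration, not the HFile format.

```python
import bisect

# Hypothetical block index: the first (row, column) key in each block, sorted.
block_first_keys = [
    ("row1", "colA"), ("row1", "colF"), ("row1", "colM"),
    ("row1", "colS"), ("row1", "colZ"), ("row2", "colA"),
]


def block_for(key):
    """Index of the last block that starts at or before key."""
    return max(bisect.bisect_right(block_first_keys, key) - 1, 0)


start_of_row = block_for(("row1", ""))        # scanning from the start of the row
wanted_column = block_for(("row1", "colT"))   # seeking straight to the column
blocks_skipped = wanted_column - start_of_row
```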

HFile seek/reseek




            Configurable WAL
• WAL is required for data durability under
  failure but can be tweaked for performance
  – Deferred log flush will execute appends constantly
    but does not block user requests
  – Periodic log flush will execute appends on a fixed-
    time basis
  – Disabling WAL will bypass it; some users do not
    need guarantees, or perform a flush after import
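
The trade-off between these modes can be shown with a toy log. The mode names and class are hypothetical; the deferred and periodic variants are collapsed into one "deferred" mode where a background flush is simulated by an explicit call.

```python
class Wal:
    """Toy write-ahead log: 'synced' edits survive a crash, 'pending' do not."""

    def __init__(self, mode):
        self.mode = mode          # "sync", "deferred", or "disabled"
        self.pending = []
        self.synced = []

    def append(self, edit):
        if self.mode == "disabled":
            return                # edit lives only in the MemStore
        self.pending.append(edit)
        if self.mode == "sync":
            self.flush()          # client waits until the edit is durable

    def flush(self):
        # In "deferred" mode a background thread would call this.
        self.synced.extend(self.pending)
        self.pending = []

    def recover_after_crash(self):
        return list(self.synced)


def run(mode):
    wal = Wal(mode)
    wal.append("put-1")
    wal.append("put-2")
    if mode == "deferred":
        wal.flush()               # one background flush happens...
    wal.append("put-3")           # ...then one more edit before the crash
    return wal.recover_after_crash()
```

In this model `run("sync")` recovers all three edits, `run("deferred")` loses the edit written after the last flush, and `run("disabled")` recovers nothing from the log.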


      Other stuff in the works…
• Storing max/min timestamp of each file to
  allow skipping those that do not overlap with
  specified TimeRange
• Faster enable/disable/drop table operations

• More focus on performance after 0.90 release
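
The timestamp-based file skipping mentioned above amounts to an interval overlap test. The sketch assumes the requested range is inclusive of its minimum and exclusive of its maximum; the file names and timestamps are made up.

```python
def overlaps(file_min_ts, file_max_ts, range_min, range_max):
    """True if [file_min_ts, file_max_ts] intersects [range_min, range_max)."""
    return file_min_ts < range_max and file_max_ts >= range_min


# Hypothetical store files with their (min, max) cell timestamps.
files = {"sf1": (100, 199), "sf2": (200, 299), "sf3": (300, 399)}
to_read = sorted(name for name, (lo, hi) in files.items()
                 if overlaps(lo, hi, 250, 300))
```

Only files whose timestamp span touches the requested range need to be opened; the rest are skipped without any I/O.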



That is all for today.
Thanks for coming!



    Questions?
       (for anyone)



Announcements





				
Tags: hbase, hadoop
Posted: 7/1/2010
Description: HBase 0.90 Preview. Slides from HUG11, the 11th HBase User Group hosted at Facebook on June 30, 2010.