Advanced HBase

 

Navteq Architect Summit, December 2010
Lars George
lars@cloudera.com
 

About Me

•  Software Engineer
•  Cloudera Solution Architect
•  Formerly CTO of WorldLingo
•  Scalable system aficionado
•  Working with HBase since end of 2007
•  Apache HBase Committer (larsgeorge@apache.org)
•  European HBase Ambassador (self-proclaimed)
 

Outline

•  Why HBase?
•  MapReduce with HBase
•  Integration with Indexing
•  Advanced Techniques
 

Why Hadoop/HBase?

•  Datasets are constantly growing and intake soars
  –  Yahoo! has >82PB and >25k machines
  –  Facebook adds 15TB per day, >36PB raw data, >2200 machines
  –  Are you “throwing” data away today?
•  Traditional databases are expensive to scale and inherently difficult to distribute
•  Commodity hardware is cheap and powerful
  –  $1000 buys you 4-8 cores/4GB/1TB
  –  500GB 15k RPM SAS nearly $500
•  Need for random access and batch processing
  –  Hadoop only supports batch/streaming
 

History of Hadoop/HBase

•  Google solved its scalability problems
  –  “The Google File System” published October 2003
      •  Hadoop DFS
  –  “MapReduce: Simplified Data Processing on Large Clusters” published December 2004
      •  Hadoop MapReduce
  –  “BigTable: A Distributed Storage System for Structured Data” published November 2006
      •  HBase
 

Hadoop Introduction

•  Two main components
  –  Hadoop Distributed File System (HDFS)
      •  A scalable, fault-tolerant, high performance distributed file system capable of running on commodity hardware
  –  Hadoop MapReduce
      •  Software framework for distributed computation
•  Significant adoption
  –  Used in production in hundreds of organizations
  –  Primary contributors: Yahoo!, Facebook, Cloudera
 

HDFS: Hadoop Distributed File System

•  Reliably store petabytes of replicated data across thousands of nodes
  –  Data divided into 64MB blocks, each block replicated three times
•  Master/Slave architecture
  –  Master NameNode contains block locations
  –  Slave DataNode manages blocks on local file system
•  Built on commodity hardware
  –  No 15k RPM disks or RAID required (nor wanted!)
 

HDFS Example

•  Store 1TB flat text file on 10 node cluster
  –  Can use Java API or command line
      ./hadoop dfs -put ./srcFile /destFile
  –  File split into 64MB blocks (16,384 total)
  –  Each block sent to three nodes (49,152 total, 3TB)
  –  Has notion of racks to ensure replication across distinct clusters/geographic locations
  –  Built-in checksumming (CRC)
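
The block counts on this slide follow from simple division. The sketch below reproduces the arithmetic; the class and method names are made up for illustration and are not part of any Hadoop API, and binary units (1TB = 2^40 bytes) are assumed.

```java
// Illustrative arithmetic only; names are hypothetical, not Hadoop API.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;
    static final long TB = MB * 1024L * 1024L;

    /** Number of blocks needed for a file; a partial last block counts as one block. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Number of block replicas stored across the cluster. */
    static long replicaCount(long blocks, int replication) {
        return blocks * replication;
    }

    public static void main(String[] args) {
        long blocks = blockCount(1 * TB, 64 * MB);
        long replicas = replicaCount(blocks, 3);
        System.out.println(blocks + " blocks, " + replicas + " replicas");
    }
}
```

With a 1TB file, 64MB blocks and a replication factor of three this gives the 16,384 blocks and 49,152 replicas quoted above.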
 

MapReduce

•  Distributed programming model to reliably process petabytes of data using its locality
  –  Built-in bindings for Java and C
  –  Can be used with any language via Hadoop Streaming
•  Inspired by map and reduce functions in functional programming

    Input -> Map() -> Copy/Sort -> Reduce() -> Output
 

MapReduce Example

•  Perform “word count” on 1TB file in HDFS
  –  Map task launched for each block of file
  –  Within each task, Map function called for each line: Map(LineNumber, LineString)
      •  For each word in LineString -> Output(Word, 1)
  –  Map output is sorted, grouped and copied to reducer
  –  Reduce(Word, List) called for each word
      •  Output(Word, Length(List))
  –  Final output contains total count for each word
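
The word count flow can be simulated in a single JVM to make the Map, Copy/Sort and Reduce steps concrete. This is a plain-Java sketch without any Hadoop classes; the class name and the whitespace tokenization are assumptions of the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Single-process simulation of the word count job; not Hadoop API.
class WordCountSketch {
    /** Map(LineNumber, LineString): emit (word, 1) for each word in the line. */
    static List<Map.Entry<String, Integer>> map(long lineNumber, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Reduce(Word, List): the count is the length of the list of ones. */
    static int reduce(String word, List<Integer> values) {
        return values.size();
    }

    static SortedMap<String, Integer> run(List<String> lines) {
        // Copy/Sort: group all emitted values by word, in sorted key order.
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        long lineNumber = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(lineNumber++, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<Integer>()).add(kv.getValue());
            }
        }
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("to be or not to be", "that is the question")));
    }
}
```

In a real job the grouping step is the distributed shuffle, and each reducer only sees the keys routed to it.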
 

Hadoop…

•  … is designed to store and stream extremely large datasets in batch
•  … is not intended for realtime querying
•  … does not support random access
•  … does not handle billions of small files well
  –  Less than default block size of 64MB and smaller
  –  Keeps “inodes” in memory on master
•  … does not support structured data any better than unstructured or complex data

That is why we have HBase!
 

Why HBase?

•  Question: Why HBase and not ?
•  What else is there?
  –  Key/value stores
  –  Document-oriented stores
  –  Column-oriented stores
  –  Graph-oriented stores
•  Features to ask for
  –  In memory or persistent?
  –  Strict or eventual consistency?
  –  Distributed or single machine (or afterthought)?
  –  Designed for read and/or write speeds?
  –  How does it scale? (if that is what you need)
 

Key/Value Stores

•  Choices (a small selection)
  –  MemCached
  –  Tokyo Cabinet, MemCacheDB, Membase, Redis
  –  Voldemort, Dynomite, Scalaris
  –  Dynamo, Dynomite
  –  Berkeley DB
•  Pros
  –  Used as caches
  –  Simple APIs
  –  Fast
•  Cons
  –  Keys must be known (or recomputed)
  –  Scale only with manual intervention (consistent hashing etc.)
  –  Cannot represent structured data
 

Document Stores

•  More choices
  –  MongoDB
  –  CouchDB
•  Pros
  –  Structured data supported
  –  Schema free
  –  Supports changes to documents without reconfiguration
  –  May support secondary indexes and/or search
•  Cons
  –  Everything is stored in the same place, does not work well with heterogeneous payloads
  –  Scalability is either not proven or similar to RDBMS models
  –  Not well integrated with MapReduce (no block loads or locality advantages)
 

Column-Oriented Stores

•  Hybrid architectures
  –  HBase, BigTable
  –  Cassandra
  –  Vertica, C-Store
•  Pros
  –  Allow access to only relevant data
•  Cons
  –  Limit functionality to fit model
 

Which One To Choose?

•  Key/value stores
  –  Caches
  –  Simple data
  –  Need for speed
•  Document stores
  –  Evolving schemas
  –  Higher level document related features
•  Column-oriented stores
  –  Scalability
  –  Mixture of payloads
 

Which One To Choose? (cont.)

•  In Memory or On-Disk
  –  Cache or Database
•  Strict consistency
  –  Easy to handle on application level
  –  Content-management systems, banking etc.
•  Eventual consistency
  –  Higher availability but may read stale data
  –  Deal with conflict resolution and repairs in your code
  –  Shopping carts, gaming
 

What is HBase?

•  Distributed
•  Column-Oriented
•  Multi-Dimensional
•  High-Availability (CAP anyone?)
•  High-Performance
•  Storage System
 

Project Goals

Billions of Rows * Millions of Columns * Thousands of Versions

Petabytes across thousands of commodity servers
 

HBase is not…

•  A SQL Database
  –  No joins, no query engine, no types, no SQL
  –  Transactions and secondary indexes only as add-ons but immature
•  A drop-in replacement for your RDBMS
•  You must be OK with RDBMS anti-schema
  –  Denormalized data
  –  Wide and sparsely populated tables
  –  Just say “no” to your inner DBA

Keyword: Impedance Match
 
 

HBase Architecture

•  Table is made up of any number of regions
•  Region is specified by its startKey and endKey
  –  Empty table: (Table, NULL, NULL)
  –  Two-region table: (Table, NULL, “com.cloudera.www”) and (Table, “com.cloudera.www”, NULL)
•  Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop
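
Because regions cover contiguous key ranges, finding the region for a row is a "greatest start key not larger than the row" lookup. The sketch below models that with a TreeMap. It is not how the HBase client is implemented; the class name, the region names and the use of an empty string for the NULL start key are assumptions, and String ordering only matches HBase's byte ordering for plain ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of locating a region by start key; not HBase client code.
class RegionLookupSketch {
    // Regions keyed by start key; "" stands in for the NULL (open) start key.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** The region holding a row is the one with the greatest start key <= row key. */
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");                 // (Table, NULL, "com.cloudera.www")
        table.addRegion("com.cloudera.www", "region-2"); // (Table, "com.cloudera.www", NULL)
        System.out.println(table.regionFor("com.apache.www"));
        System.out.println(table.regionFor("org.apache.www"));
    }
}
```

The start key is inclusive and the end key exclusive, so the row "com.cloudera.www" itself belongs to the second region.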
 

HBase Architecture (cont.)

•  Two types of HBase nodes: Master and RegionServer
•  Special tables -ROOT- and .META. store schema information and region locations
•  Master server responsible for RegionServer monitoring as well as assignment and load balancing of regions
•  Uses ZooKeeper as its distributed coordination service
  –  Manages Master election and server availability
 

HBase Tables

•  Tables are sorted by Row in lexicographical order
•  Table schema only defines its column families
  –  Each family consists of any number of columns
  –  Each column consists of any number of versions
  –  Columns only exist when inserted, NULLs are free
  –  Columns within a family are sorted and stored together
  –  Everything except table names is byte[]

(Table, Row, Family:Column, Timestamp) -> Value
 

HBase Table as Data Structures

SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)

SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
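
The nested structure above maps naturally onto nested sorted maps. The following sketch simplifies the slide's model slightly: it folds the list of families into a single map keyed by "family:qualifier" and uses Strings where HBase uses byte[]. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory model of (Row, Family:Column, Timestamp) -> Value; not HBase code.
class TableModelSketch {
    // row -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final SortedMap<String, SortedMap<String, SortedMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, SortedMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String get(String row, String column) {
        SortedMap<String, SortedMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        SortedMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.get(versions.firstKey());
    }

    /** Row keys in sorted order, as a scan would see them. */
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        TableModelSketch table = new TableModelSketch();
        table.put("row1", "f1:c1", 1L, "old");
        table.put("row1", "f1:c1", 2L, "new");
        System.out.println(table.get("row1", "f1:c1"));
    }
}
```

Columns that were never written simply have no entry, which is the sense in which NULLs are free.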
 

Web Crawl Example

•  Canonical use-case for BigTable
•  Store web crawl data
  –  Table webtable with family content and meta
  –  Row is reversed URL with Columns
      •  content:data stores the raw crawled data
      •  meta:language stores http language header
      •  meta:type stores http content-type header
  –  While processing raw data for hyperlinks and images, add families links and images
      •  links: column for each hyperlink
      •  images: column for each image
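
Reversing the host name makes all pages of one domain sort next to each other, which keeps them in the same or neighboring regions. A minimal sketch, assuming only the host part of the URL is reversed (path handling is left out) and using invented class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Illustrates reversed-host row keys; not taken from any crawler's code.
class ReversedUrlKey {
    /** Reverses the dot-separated host name, e.g. "www.cloudera.com" becomes "com.cloudera.www". */
    static String reverseHost(String host) {
        List<String> parts = Arrays.asList(host.split("\\."));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    /** Row keys for the given hosts in the order a sorted table would store them. */
    static List<String> sortedRowKeys(List<String> hosts) {
        TreeSet<String> keys = new TreeSet<>();
        for (String host : hosts) {
            keys.add(reverseHost(host));
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        System.out.println(sortedRowKeys(
            Arrays.asList("www.cloudera.com", "www.apache.org", "archive.cloudera.com")));
    }
}
```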
 

HBase Clients

•  Native Java Client/API
  –  get(Get get), put(Put put), delete(Delete delete)
  –  getScanner(Scan scan)
•  Non-Java Clients
  –  REST server
  –  Avro server
  –  Thrift server
  –  Jython, Scala, Groovy DSL
•  TableInputFormat/TableOutputFormat for MapReduce
  –  HBase as MapReduce source and/or target
•  HBase Shell
  –  JRuby shell adding get, put, scan and admin calls
 

HBase Extensions

•  Hive, Pig, Cascading
  –  Hadoop-targeted MapReduce tools with HBase integration
•  Sqoop
  –  Read and write to HBase for further processing in Hadoop
•  HBase Explorer, Nutch, Heritrix
•  SpringData? (volunteers?)
•  Karmasphere?
 

History of HBase

•  November 2006
  –  Google releases paper on BigTable
•  February 2007
  –  Initial HBase prototype created as Hadoop contrib
•  October 2007
  –  First “useable” HBase (Hadoop 0.15.0)
•  January 2008
  –  Hadoop becomes TLP, HBase becomes subproject
•  October 2008
  –  HBase 0.18.1 released
•  January 2009
  –  HBase 0.19.0
•  September 2009
  –  HBase 0.20.0 released (Performance Release)
•  May 2010
  –  HBase becomes TLP
•  June 2010
  –  HBase 0.89.20100621, first developer release
•  Imminent…
  –  HBase 0.90 release (any day now)
 

Current Project Status

•  HBase 0.90.x “Advanced Concepts”
  –  Master Rewrite – More ZooKeeper
  –  Multi-DC Replication
  –  Intra Row Scanning
  –  Further optimizations on algorithms and data structures
  –  Discretionary Access Control
  –  Coprocessors
 

HBase Users

•  Adobe
•  Facebook
•  Mozilla (Socorro)
•  StumbleUpon
•  Trend Micro (Advanced Threat Research)
•  Twitter
•  Groups at Yahoo!
•  Many startups with amazing services…

Question?
 

Comparison with RDBMS

•  Very simple example use-case
  –  Please note: not necessarily an example of how to implement this with HBase
•  System to store a shopping cart
  –  Customers, Products, Orders
 

Simple SQL Schema

CREATE TABLE customers (
  customerid UUID PRIMARY KEY,
  name TEXT, email TEXT)

CREATE TABLE products (
  productid UUID PRIMARY KEY,
  name TEXT, price DOUBLE)

CREATE TABLE orders (
  orderid UUID PRIMARY KEY,
  customerid UUID INDEXED REFERENCES(customers.customerid),
  date TIMESTAMP, total DOUBLE)

CREATE TABLE orderproducts (
  orderid UUID INDEXED REFERENCES(orders.orderid),
  productid UUID REFERENCES(products.productid))
 

Simple HBase Schema

CREATE TABLE customers (content, orders)
CREATE TABLE products (content)
CREATE TABLE orders (content, products)
 

Efficient Queries with Both

•  Get name, email, orders for customers
•  Get name, price for product
•  Get customer, stamp, total for order
•  Get list of products in order
 

Where SQL Makes Life Easy

•  Joining
  –  In a single query, get all products in an order with their product information
•  Secondary Indexing
  –  Get customerid by email
•  Referential Integrity
  –  Deleting an order would delete links out of ‘orderproducts’
  –  ID updates propagate
•  Realtime Analysis
  –  GROUP BY and ORDER BY allow for simple statistical analysis
 

Where HBase Makes Life Easy

•  Dataset Scale
  –  We have 1M customers and 100M products
  –  Product information includes large text datasheet or PDF files
  –  Want to track every time a customer looks at a product page
•  Read/Write Scale
  –  Tables distributed across nodes means reads/writes are fully distributed
  –  Writes are extremely fast and require no index updates
•  Replication
  –  Comes for free
•  Batch Analysis
  –  Massive and convoluted SQL queries executed serially become efficient MapReduce jobs distributed and executed in parallel
 

Conclusion

•  For small instances of simple/straightforward systems, relational databases offer a much more convenient way to model and access data
  –  Can outsource most work to transaction and query engine
  –  HBase will force you to pull complexity into application layer
•  Once you need to scale, the properties and flexibility of HBase can relieve you from the headaches associated with scaling an RDBMS

Question?
 

HBase Architecture (cont.)

•  Based on Log-Structured Merge-Trees (LSM-Trees)
•  Inserts are done in write-ahead log first
•  Data is stored in memory and flushed to disk on regular intervals or based on size
•  Small flushes are merged in the background to keep number of files small
•  Reads read memory stores first and then disk based files second
•  Deletes are handled with “tombstone” markers
•  Atomicity on row level no matter how many columns
  –  keeps locking model easy
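
The read and write path described here can be condensed into a toy LSM store: writes go to a sorted in-memory map, full maps are frozen as "files", reads check memory first and then files from newest to oldest, and deletes write a marker that a compaction later drops. Everything in this sketch (class name, the tombstone marker string, the count-based flush threshold) is an assumption for illustration; the write-ahead log is only indicated by a comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeMap;

// Toy log-structured merge store; illustrative only, not HBase internals.
class LsmSketch {
    private static final String TOMBSTONE = "\u0000tombstone"; // marker value, assumed never used as real data
    private final int flushThreshold;
    private TreeMap<String, String> memStore = new TreeMap<>();
    private final Deque<TreeMap<String, String>> storeFiles = new ArrayDeque<>(); // newest first

    LsmSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        // A real system appends the edit to the write-ahead log before this point.
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Deletes are just writes of a tombstone marker. */
    void delete(String key) {
        put(key, TOMBSTONE);
    }

    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        storeFiles.addFirst(memStore); // the frozen map plays the role of an immutable store file
        memStore = new TreeMap<>();
    }

    /** Reads check the memory store first, then files from newest to oldest. */
    String get(String key) {
        String value = memStore.get(key);
        if (value == null) {
            for (TreeMap<String, String> file : storeFiles) {
                value = file.get(key);
                if (value != null) {
                    break;
                }
            }
        }
        return (value == null || value.equals(TOMBSTONE)) ? null : value;
    }

    /** Merges all files into one and drops deleted entries. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        Iterator<TreeMap<String, String>> oldestFirst = storeFiles.descendingIterator();
        while (oldestFirst.hasNext()) {
            merged.putAll(oldestFirst.next()); // newer files overwrite older values
        }
        merged.values().removeIf(TOMBSTONE::equals);
        storeFiles.clear();
        if (!merged.isEmpty()) {
            storeFiles.addFirst(merged);
        }
    }

    int fileCount() {
        return storeFiles.size();
    }

    public static void main(String[] args) {
        LsmSketch store = new LsmSketch(2);
        store.put("row1", "value1");
        store.put("row2", "value2");
        store.delete("row2");
        System.out.println(store.get("row1") + " " + store.get("row2"));
    }
}
```

The sketch flushes by entry count for brevity, whereas the slides describe flushing by size or interval.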
 

HBase Architecture (cont.)

Write-Ahead-Log (WAL) Flow

Write-Ahead-Log (cont.)

HFile and KeyValue
 

Raw Data View

$ ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f file:///tmp/hbase-larsgeorge/hbase/testtable/272a63b23bdb5fae759be5192cabc0ce/f1/4992515006010131591 -p

K: row1/f1:/1290345071149/Put/vlen=6 V: value1
K: row2/f1:/1290345078351/Put/vlen=6 V: value2
K: row3/f1:/1290345089750/Put/vlen=6 V: value3
K: row4/f1:/1290345095724/Put/vlen=6 V: value4
K: row5/f1:c1/1290347447541/Put/vlen=6 V: value5
K: row6/f1:c2/1290347461068/Put/vlen=6 V: value6
K: row7/f1:c1/1290347581879/Put/vlen=7 V: value10
K: row7/f1:c1/1290347469553/Put/vlen=6 V: value7
K: row7/f1:c10/1290348157074/DeleteColumn/vlen=0 V:
K: row7/f1:c10/1290347625771/Put/vlen=7 V: value11
K: row7/f1:c11/1290347971849/Put/vlen=7 V: value14
K: row7/f1:c12/1290347979559/Put/vlen=7 V: value15
K: row7/f1:c13/1290347986384/Put/vlen=7 V: value16
K: row7/f1:c2/1290347569785/Put/vlen=6 V: value8
K: row7/f1:c3/1290347575521/Put/vlen=6 V: value9
K: row7/f1:c8/1290347638008/Put/vlen=7 V: value13
K: row7/f1:c9/1290347632777/Put/vlen=7 V: value12
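
The dump shows the sort order inside a store file: row ascending, then column ascending (lexicographically, so c10 comes before c2), then timestamp descending so the newest version is read first. The comparator below is a simplified sketch of that ordering using Strings; the class names are invented, and the ordering of key types such as Put versus DeleteColumn at equal timestamps is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified model of KeyValue ordering; not the HBase comparator.
class KeyValueOrderSketch {
    static final class Key {
        final String row;
        final String column;
        final long timestamp;

        Key(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + column + "/" + timestamp;
        }
    }

    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.column.compareTo(b.column);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp); // newest version first
    };

    /** Returns the keys rendered as strings in store-file order. */
    static List<String> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        List<String> out = new ArrayList<>();
        for (Key k : copy) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList(
            new Key("row7", "f1:c2", 1290347569785L),
            new Key("row7", "f1:c10", 1290347625771L),
            new Key("row7", "f1:c1", 1290347469553L),
            new Key("row7", "f1:c1", 1290347581879L))));
    }
}
```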

MemStores

•  After data is written to the WAL the RegionServer saves KeyValues in memory store
•  Flush to disk based on size, see hbase.hregion.memstore.flush.size
•  Default size is 64MB
•  Uses snapshot mechanism to write flush to disk while still serving from it and accepting new data at the same time
•  Snapshots are released when flush has succeeded
 
 

Block Cache

•  Acts as very large, in-memory distributed cache
•  Assigned a large part of the JVM heap in the RegionServer process, see hfile.block.cache.size
•  Optimizes reads on subsequent columns and rows
•  Has priority to keep “in-memory” column families in cache

    if (inMemory) {
      this.priority = BlockPriority.MEMORY;
    } else {
      this.priority = BlockPriority.SINGLE;
    }

•  Cache needs to be used properly to get best read performance
  –  Turn off block cache on operations that cause large churn
  –  Store related data “close” to each other
•  Uses LRU cache with threaded (asynchronous) evictions based on priorities
 

Compactions

•  General Concepts
  –  Two types: Minor and Major Compactions
  –  Asynchronous and transparent to client
  –  Manage file bloat from MemStore flushes
•  Minor Compactions
  –  Combine last “few” flushes
  –  Triggered by number of storage files
•  Major Compactions
  –  Rewrite all storage files
  –  Drop deleted data and those values exceeding TTL and/or number of versions
  –  Triggered by time threshold
  –  Cannot be scheduled automatically starting at a specific time (bummer!)
  –  May (most definitely) tax overall HDFS IO performance

Tip: Disable major compactions and schedule to run manually (e.g. cron) at off-peak times
 

Region Splits

•  Triggered by configured maximum file size of any store file
  –  This is checked directly after the compaction call to ensure store files are actually approaching the threshold
•  Runs as asynchronous thread on RegionServer
•  Splits are fast and nearly instant
  –  Reference files point to original region files and represent each half of the split
•  Compactions take care of splitting original files into new region directories

Replication

Question?
 

MapReduce with HBase

•  Framework to use HBase as source and/or sink for MapReduce jobs
•  Thin layer over native Java API
•  Provides helper class to set up jobs easier

    TableMapReduceUtil.initTableMapperJob(
        "test", scan, MyMapper.class,
        ImmutableBytesWritable.class,
        RowResult.class, job);

    TableMapReduceUtil.initTableReducerJob(
        "table", MyReducer.class, job);

MapReduce with HBase (cont.)

•  Special use-case in regards to Hadoop
•  Tables are sorted and have unique keys
  –  Often we do not need a Reducer phase
  –  Combiner not needed
•  Need to make sure load is distributed properly by randomizing keys (or use bulk import)
•  Partial or full table scans possible
•  Scans are very efficient as they make use of block caches
  –  But then make sure you do not create too much churn, or better switch caching off when doing full table scans
•  Can use filters to limit rows being processed
 

TableInputFormat

•  Transforms an HBase table into a source for MapReduce jobs
•  Internally uses a TableRecordReader which wraps a Scan instance
  –  Supports restarts to handle temporary issues
•  Splits table by region boundaries and stores current region locality
 
 

TableOutputFormat

•  Allows using an HBase table as output target
•  Put and Delete support from mapper or reducer class
•  Uses TableOutputCommitter to write data
•  Disables auto-commit on table to make use of client side write buffer
•  Handles final flush in close()
 

HFileOutputFormat

•  Used to bulk load data into HBase
•  Bypasses normal API and generates low-level store files
•  Prepares files for final bulk insert
•  Needs special handling of sort order and partitioning
•  Only supports one column family (for now)
•  Can load bulk updates into existing tables
 

MapReduce Helper

•  TableMapReduceUtil
•  IdentityTableMapper
  –  Passes on key and value, where value is a Result instance and key is set to value.getRow()
•  IdentityTableReducer
  –  Stores values into HBase, must be Put or Delete instances
•  HRegionPartitioner
  –  Not set by default, use it to control partitioning on Hadoop level
 

Custom MapReduce over Tables

•  No requirement to use provided framework
•  Can read from or write to one or many tables in mapper and reducer
•  Can split not on regions but arbitrary boundaries
•  Make sure to use write buffer in OutputFormat to get best performance (do not forget to call flushCommits() at the end!)
 

Question?

Advanced Techniques

•  Key/Table Design
•  DDI
•  Salting
•  Hashing vs. Sequential Keys
•  ColumnFamily vs. Column
•  Using BloomFilter
•  Data Locality
•  checkAndPut() and checkAndDelete()
•  Coprocessors
 

Key/Table Design

•  Crucial to gain best performance
  –  Why do I need to know? Well, you also need to know that an RDBMS only works well when columns are indexed and the query plan is OK
•  Absence of secondary indexes forces use of row key or column name sorting
•  Transfer multiple indexes into one
  –  Generate large table -> Good since fits architecture and spreads across cluster
 

DDI

•  Stands for Denormalization, Duplication and Intelligent Keys
•  Needed to overcome shortcomings of architecture
•  Denormalization -> Replacement for JOINs
•  Duplication -> Design for reads
•  Intelligent Keys -> Implement indexing and sorting, optimize reads
 

Pre-materialize Everything

•  Achieve one read per customer request if possible
•  Otherwise keep at lowest number
•  Reads between 10ms (cache miss) and 1ms (cache hit)
•  Use MapReduce to compute exacts in batch
•  Store and merge updates live
•  Use incrementColumnValue

Motto: “Design for Reads”
 

Salting

•  Prefix row keys to gain spread
•  Use well known or numbered prefixes
•  Use modulo to spread across servers
•  Ensure related data stays close to each other for subsequent scanning or MapReduce processing

    0_rowkey1, 1_rowkey2, 2_rowkey3
    0_rowkey4, 1_rowkey5, 2_rowkey6

•  Sorted by prefix first

    0_rowkey1
    0_rowkey4
    1_rowkey2
    1_rowkey5
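
One way to derive a numbered prefix is to take a hash of the row key modulo the number of buckets, so that the prefix can be recomputed later for point reads. The slide does not say how its prefixes were chosen, so the hash-based choice, the class name and the method names below are assumptions of this sketch.

```java
// Hash-modulo salting sketch; the prefix scheme is an assumption, not from the slides.
class SaltedKeySketch {
    /** Prefixes the row key with a bucket number derived from the key's hash. */
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        // With more than 10 buckets the prefix should be zero-padded to keep lexicographic order sane.
        return bucket + "_" + rowKey;
    }

    /** Strips the prefix again to recover the original row key. */
    static String unsalt(String saltedKey) {
        return saltedKey.substring(saltedKey.indexOf('_') + 1);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 6; i++) {
            System.out.println(salt("rowkey" + i, 3));
        }
    }
}
```

A range scan over salted keys has to be issued once per bucket and the results merged, which is the price paid for the even write spread.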



Hashing vs. Sequential Keys

•  Use hashes for best spread
  –  Use for example MD5 to be able to recreate key
      •  Key = MD5(customerID)
  –  Counterproductive for range scans
•  Use sequential keys for locality
  –  Makes use of block caches
  –  May tax one server overly, may be avoided by salting or splitting regions while keeping them small
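
Key = MD5(customerID) can be computed with the JDK alone. The sketch below returns the digest as a hex string for readability; an actual table would more likely store the raw 16 bytes, and the class and method names are made up for this example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Derives a row key from a customer ID via MD5; illustrative helper, not HBase API.
class HashedKeySketch {
    /** Returns the MD5 digest of the ID as 32 lowercase hex characters. */
    static String md5Hex(String customerId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(customerId.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("customer-42"));
    }
}
```

Because the hash is deterministic, the key can be recreated from the customer ID for a Get, but neighboring IDs end up far apart, which is why range scans suffer.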
 

ColumnFamily vs. Column

•  Use only a few column families
  –  Causes many files that need to stay open per region plus class overhead per family
•  Best used when logical separation between data and meta columns
•  Sorting per family can be used to convey application logic or access pattern
•  Define compression or in-memory attributes to optimize access and performance
 

Using Bloomfilters

•  Defines a filter that determines whether a store file does not contain a row or column
•  Error rate can control overhead but is usually very low, 1% or less
•  Stored with each storage file on flush and compactions
•  Good for large regions with many distinct row keys and many expected misses
•  Trick: “Optimize” compaction to gain advantage while scanning files
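
The key property of a Bloom filter is the one-sided answer: "no" is definite, "yes" only means "maybe". The sketch below shows the idea with a BitSet and a simple double-hashing scheme. The hash functions, sizes and names are chosen for brevity and are assumptions of this example, not what HBase uses.

```java
import java.util.BitSet;

// Minimal Bloom filter to illustrate the concept; not the HBase implementation.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** Derives the i-th bit position from two cheap hash values (double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means the key was definitely never added; true means it possibly was. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row1");
        System.out.println(filter.mightContain("row1"));
    }
}
```

A read for a row can skip every store file whose filter answers false, which is where the benefit for many expected misses comes from.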
 
 

Data Locality

•  Provided by DFSClient
•  Transparent for HBase
•  After restart, data may not be local
  –  Work is done to improve on this
•  Over time and caused by compactions data is stored where it is needed, i.e. local to RegionServer
•  Could enforce major compaction before starting MapReduce jobs
 

checkAndPut() and checkAndDelete()

•  Helps with atomic operations on single row
•  Absence of value is treated as check for non-existence

    public boolean checkAndPut(final byte[] row,
        final byte[] family, final byte[] qualifier,
        final byte[] value, final Put put)

    public boolean checkAndDelete(final byte[] row,
        final byte[] family, final byte[] qualifier,
        final byte[] value, final Delete delete)
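
The semantics of both calls are those of a compare-and-set: the mutation is applied only if the current cell value matches the expected one, and a null expectation means "the cell must not exist". The in-memory sketch below mimics that behavior for a single map of cells; it does not use the HBase client and its class and method signatures are simplified assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// In-memory illustration of check-and-mutate semantics; not the HTable API.
class CheckAndPutSketch {
    private final Map<String, String> cells = new HashMap<>();

    /** Writes newValue only if the current value equals expected; null expected means the cell must be absent. */
    synchronized boolean checkAndPut(String cell, String expected, String newValue) {
        if (!Objects.equals(cells.get(cell), expected)) {
            return false;
        }
        cells.put(cell, newValue);
        return true;
    }

    /** Deletes the cell only if the current value equals expected. */
    synchronized boolean checkAndDelete(String cell, String expected) {
        if (!Objects.equals(cells.get(cell), expected)) {
            return false;
        }
        cells.remove(cell);
        return true;
    }

    synchronized String get(String cell) {
        return cells.get(cell);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        System.out.println(table.checkAndPut("row1/f1:c1", null, "v1")); // cell absent, put succeeds
        System.out.println(table.checkAndPut("row1/f1:c1", null, "v2")); // cell exists, put is rejected
    }
}
```

The first-writer-wins pattern in main is a common use: two clients racing to create the same cell cannot both succeed.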

Locks

•  Locks can be set explicitly for client operations
•  Lock a row from modifications by other clients
  –  Clients block on locked rows -> keep locking reasonably short!
•  Use HTable’s lockRow to acquire and unlockRow to release
•  Locks are guarded by leases on RegionServer and configured with hbase.regionserver.lease.period
  –  By default set to 60 seconds
  –  Leases are refreshed by any mutation call, e.g. get(), put() or delete().
 

Coprocessors

•  New addition to feature set
•  Based on talk by Jeff Dean at LADIS 2009
  –  Run arbitrary code on each region in RegionServer
  –  High level call interface for clients
      •  Calls are addressed to rows or ranges of rows while Coprocessors client library resolves locations
      •  Calls to multiple rows are atomically split
  –  Provides model for distributed services
      •  Automatic scaling, load balancing, request routing
 

Coprocessors in HBase

•  Use for efficient computational parallelism
•  Secondary indexing (HBASE-2038)
•  Column Aggregates (HBASE-1512)
  –  SQL-like sum(), avg(), max(), min(), etc.
•  Access control (HBASE-3025, HBASE-3045)
  –  Provide basic access control
•  Table Metacolumns
•  New filtering
  –  predicate pushdown
•  Table/Region access statistics
•  HLog extensions (HBASE-3257)
 

Coprocessors in HBase (cont.)

•  Java classes implementing interfaces
•  Load through configuration or table attribute

    'COPROCESSOR$1' => 'hdfs://localhost:8020/hbase/coprocessors/test.jar:Test:1000'
    'COPROCESSOR$2' => '/hbase/coprocessors/test2.jar:AnotherTest:1001'

•  Can be chained like servlet filters
•  Dynamic RPC allows functional extensibility
 

Coprocessor and RegionObserver

•  The Coprocessor interface defines these hooks
  –  preOpen, postOpen: Called before and after the region is reported as online to the master
  –  preFlush, postFlush: Called before and after the memstore is flushed into a new store file
  –  preCompact, postCompact: Called before and after compaction
  –  preSplit, postSplit: Called after the region is split
  –  preClose, postClose: Called before and after the region is reported as closed to the master
 

Coprocessor and RegionObserver (cont.)

•  The RegionObserver interface defines these hooks:
  –  preGet, postGet: Called before and after a client makes a Get request
  –  preExists, postExists: Called before and after the client tests for existence using a Get
  –  prePut, postPut: Called before and after the client stores a value
  –  preDelete, postDelete: Called before and after the client deletes a value
  –  preScannerOpen, postScannerOpen: Called before and after the client opens a new scanner
  –  preScannerNext, postScannerNext: Called before and after the client asks for the next row on a scanner
  –  preScannerClose, postScannerClose: Called before and after the client closes a scanner
  –  preCheckAndPut, postCheckAndPut: Called before and after the client calls checkAndPut()
  –  preCheckAndDelete, postCheckAndDelete: Called before and after the client calls checkAndDelete()
 

RegionObserver Call Sequence

Example

    public class RBACCoprocessor extends BaseRegionObserver {
      @Override
      public List preGet(CoprocessorEnvironment e, Get get,
          List results) throws CoprocessorException {

        // check permissions...
        if (access_not_allowed) {
          throw new AccessDeniedException(
              "User is not allowed to access.");
        }
        return results;
      }

      // override prePut(), preDelete(), etc.
    }

Endpoint and Dynamic RPC

HBase and Indexing

•  Secondary indexing or search?
•  HBasene
  –  Port of Lucandra
•  Nutch, Solr, Lucene
•  ITHBase and IHBase
  –  Moved out from contrib into GitHub
•  HSearch
 

Secondary Index or Search?

•  Can keep “lookup” tables
  –  But could also be in the same table
  –  Could even be in the same row
      •  Use ColumnFamily per index (but keep number low)
      •  Make use of column sorting
•  Does it fit your access pattern?
•  How to guarantee updates?
  –  Use some sort of “transaction”
•  Offer sorting in one direction
 

Example: HBasene

•  Based on Lucandra
•  Implements Lucene API over HBase
•  Stores term vector as rows in a table
  –  Each row is one term and the columns are the index with value being the position in the text
•  Document fields are stored as columns using “field/term” combinations
•  Perform boolean operations in code
 

ITHBase and IHBase

•  Provided by contributors
•  May not support the latest HBase release
•  Indexed-Transactional HBase
  –  Extends RegionServer code
  –  Intrusive
  –  Provides notion of Transactions over rows
  –  Maintains lookup tables
•  Indexed HBase
  –  Implemented by Powerset/Microsoft
  –  Support?
  –  Intrusive
  –  Keeps state in memory
  –  Hooks into region operations to maintain state
  –  Replace with Coprocessors (HBASE-2038)
 

Custom Search Index

•  Facebook is using Cassandra to power inbox search
  –  150TB of data stored
  –  Row is user inbox ID
  –  Uses super columns to index terms
  –  Each column is document that contains the term
•  Make use of partial scans
  –  Can be done on row and column level
      “Find email with albert*”
•  Sorting of columns allows for performance optimizations during term retrieval
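
A query such as “Find email with albert*” becomes a partial scan over sorted keys: start at the prefix and stop at the first key that no longer shares it. The sketch below demonstrates this on a Java SortedMap; the class name, the sample data and the stop-key computation (incrementing the last character, which assumes it is not the maximum char value) are assumptions of this example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Prefix scan over sorted keys, standing in for a partial row or column scan.
class PrefixScanSketch {
    /** Returns a view of all entries whose key starts with the given prefix. */
    static <V> SortedMap<String, V> prefixScan(SortedMap<String, V> sorted, String prefix) {
        if (prefix.isEmpty()) {
            return sorted;
        }
        // Stop key: the prefix with its last character incremented, exclusive.
        char last = prefix.charAt(prefix.length() - 1);
        String stop = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return sorted.subMap(prefix, stop);
    }

    public static void main(String[] args) {
        SortedMap<String, String> terms = new TreeMap<>();
        terms.put("albert", "mail-1");
        terms.put("alberta", "mail-2");
        terms.put("alice", "mail-3");
        System.out.println(prefixScan(terms, "albert").keySet());
    }
}
```

Only the matching key range is touched, which is why keeping terms as sorted column names or row keys pays off for this kind of lookup.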
 

Questions?
 

