Advanced HBase

Navteq Architect Summit, December 2010

Lars George
lars@cloudera.com
  
About Me

•  Software Engineer
•  Cloudera Solution Architect
•  Formerly CTO of WorldLingo
•  Scalable systems aficionado
•  Working with HBase since end of 2007
•  Apache HBase Committer (larsgeorge@apache.org)
•  European HBase Ambassador (self-proclaimed)
  
Outline

•  Why HBase?
•  MapReduce with HBase
•  Integration with Indexing
•  Advanced Techniques
Why Hadoop/HBase?

•  Datasets are constantly growing and intake soars
   – Yahoo! has >82PB and >25k machines
   – Facebook adds 15TB per day, >36PB raw data, >2200 machines
   – Are you "throwing" data away today?
•  Traditional databases are expensive to scale and inherently difficult to distribute
•  Commodity hardware is cheap and powerful
   – $1000 buys you 4-8 cores/4GB/1TB
   – 500GB 15k RPM SAS nearly $500
•  Need for random access and batch processing
   – Hadoop only supports batch/streaming
History of Hadoop/HBase

•  Google solved its scalability problems
   – "The Google File System" published October 2003
     • Hadoop DFS
   – "MapReduce: Simplified Data Processing on Large Clusters" published December 2004
     • Hadoop MapReduce
   – "BigTable: A Distributed Storage System for Structured Data" published November 2006
     • HBase
  
Hadoop Introduction

•  Two main components
   – Hadoop Distributed File System (HDFS)
     • A scalable, fault-tolerant, high-performance distributed file system capable of running on commodity hardware
   – Hadoop MapReduce
     • Software framework for distributed computation
•  Significant adoption
   – Used in production in hundreds of organizations
   – Primary contributors: Yahoo!, Facebook, Cloudera
  
HDFS: Hadoop Distributed File System

•  Reliably store petabytes of replicated data across thousands of nodes
   – Data divided into 64MB blocks, each block replicated three times
•  Master/Slave architecture
   – Master NameNode contains block locations
   – Slave DataNodes manage blocks on the local file system
•  Built on commodity hardware
   – No 15k RPM disks or RAID required (nor wanted!)
  
HDFS Example

•  Store a 1TB flat text file on a 10-node cluster
   – Can use the Java API (sketch below) or the command line:
     ./hadoop dfs -put ./srcFile /destFile
   – File split into 64MB blocks (16,384 total)
   – Each block sent to three nodes (49,152 blocks total, 3TB)
   – Has a notion of racks to ensure replication across distinct clusters/geographic locations
   – Built-in check-summing (CRC)
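The same upload can be done programmatically. A minimal sketch of the Java API route, assuming a default Configuration that points at the cluster; paths and the class name are illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml/hdfs-site.xml from the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Equivalent of "./hadoop dfs -put ./srcFile /destFile":
    // the file is split into blocks and each block is replicated
    fs.copyFromLocalFile(new Path("./srcFile"), new Path("/destFile"));
    fs.close();
  }
}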
  
MapReduce

•  Distributed programming model to reliably process petabytes of data using its locality
   – Built-in bindings for Java and C++
   – Can be used with any language via Hadoop Streaming
•  Inspired by the map and reduce functions in functional programming

   Input -> Map() -> Copy/Sort -> Reduce() -> Output
MapReduce Example

•  Perform "word count" on a 1TB file in HDFS
   – Map task launched for each block of the file
   – Within each task, the Map function is called for each line: Map(LineNumber, LineString)
     • For each word in LineString -> Output(Word, 1)
   – Map output is sorted, grouped and copied to the reducers
   – Reduce(Word, List) called for each word
     • Output(Word, Length(List))
   – Final output contains the total count for each word (see the sketch below)
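A minimal sketch of that word count in the Java MapReduce API; class names are illustrative, and the reducer sums the ones, which is equivalent to Length(List):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Called once per line: emit (word, 1) for every token
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Called once per word with all its 1's: emit (word, total count)
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text word, Iterable<IntWritable> ones, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable one : ones) {
        sum += one.get();
      }
      context.write(word, new IntWritable(sum));
    }
  }
}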
  
Hadoop…

•  … is designed to store and stream extremely large datasets in batch
•  … is not intended for realtime querying
•  … does not support random access
•  … does not handle billions of small files well
   – Files smaller than the default block size of 64MB
   – Keeps "inodes" in memory on the master
•  … does not support structured data any better than unstructured or complex data

That is why we have HBase!
  
Why HBase?

•  Question: Why HBase and not <put-your-favorite-nosql-solution-here>?
•  What else is there?
   – Key/value stores
   – Document-oriented stores
   – Column-oriented stores
   – Graph-oriented stores
•  Features to ask for
   – In memory or persistent?
   – Strict or eventual consistency?
   – Distributed or single machine (or an afterthought)?
   – Designed for read and/or write speeds?
   – How does it scale? (if that is what you need)
  
Key/Value Stores

•  Choices (a small selection)
   – MemCached
   – Tokyo Cabinet, MemCacheDB, Membase, Redis
   – Voldemort, Dynomite, Scalaris
   – Dynamo, Dynomite
   – Berkeley DB
•  Pros
   – Used as caches
   – Simple APIs
   – Fast
•  Cons
   – Keys must be known (or recomputed)
   – Scale only with manual intervention (consistent hashing etc.)
   – Cannot represent structured data
Document Stores

•  More choices
   – MongoDB
   – CouchDB
•  Pros
   – Structured data supported
   – Schema free
   – Supports changes to documents without reconfiguration
   – May support secondary indexes and/or search
•  Cons
   – Everything is stored in the same place; does not work well with heterogeneous payloads
   – Scalability is either not proven or similar to RDBMS models
   – Not well integrated with MapReduce (no block loads or locality advantages)
  
Column-Oriented Stores

•  Hybrid architectures
   – HBase, BigTable
   – Cassandra
   – Vertica, C-Store
•  Pros
   – Allow access to only the relevant data
•  Cons
   – Limit functionality to fit the model
Which One To Choose?

•  Key/value stores
   – Caches
   – Simple data
   – Need for speed
•  Document stores
   – Evolving schemas
   – Higher-level, document-related features
•  Column-oriented stores
   – Scalability
   – Mixture of payloads
Which One To Choose?

•  In memory or on disk
   – Cache or database
•  Strict consistency
   – Easy to handle at the application level
   – Content-management systems, banking, etc.
•  Eventual consistency
   – Higher availability, but you may read stale data
   – Deal with conflict resolution and repairs in your code
   – Shopping carts, gaming
  
What is HBase?

•  Distributed
•  Column-Oriented
•  Multi-Dimensional
•  High-Availability (CAP anyone?)
•  High-Performance
•  Storage System
  
Project Goals

Billions of Rows * Millions of Columns * Thousands of Versions

Petabytes across thousands of commodity servers
  
HBase is not…

•  A SQL database
   – No joins, no query engine, no types, no SQL
   – Transactions and secondary indexes only as add-ons, and immature
•  A drop-in replacement for your RDBMS
•  You must be OK with the RDBMS anti-schema
   – Denormalized data
   – Wide and sparsely populated tables
   – Just say "no" to your inner DBA

Keyword: Impedance Match
  
HBase Architecture

•  A table is made up of any number of regions
•  A region is specified by its startKey and endKey
   – Empty table: (Table, NULL, NULL)
   – Two-region table: (Table, NULL, "com.cloudera.www") and (Table, "com.cloudera.www", NULL)
•  Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop
HBase Architecture (cont.)

•  Two types of HBase nodes: Master and RegionServer
•  Special tables -ROOT- and .META. store schema information and region locations
•  The Master server is responsible for RegionServer monitoring as well as assignment and load balancing of regions
•  Uses ZooKeeper as its distributed coordination service
   – Manages Master election and server availability
HBase Tables

•  Tables are sorted by Row in lexicographical order
•  The table schema only defines its column families
   – Each family consists of any number of columns
   – Each column consists of any number of versions
   – Columns only exist when inserted, NULLs are free
   – Columns within a family are sorted and stored together
   – Everything except table names are byte[]

   (Table, Row, Family:Column, Timestamp) -> Value
  
HBase Table as Data Structures

  SortedMap(
    RowKey, List(
      SortedMap(
        Column, List(
          Value, Timestamp
        )
      )
    )
  )

SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
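In Java terms this is roughly a set of nested sorted maps. The sketch below is only an illustration of the logical layout, not how HBase stores data internally; Strings and Longs stand in for the byte[] keys and timestamps to keep it readable:

import java.util.NavigableMap;
import java.util.TreeMap;

public class LogicalLayout {
  // row key -> (column -> (timestamp -> value))
  NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
      new TreeMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>();
}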
  
Web Crawl Example

•  Canonical use-case for BigTable
•  Store web crawl data
   – Table webtable with families content and meta
   – Row is the reversed URL, with columns
     • content:data stores the raw crawled data
     • meta:language stores the HTTP language header
     • meta:type stores the HTTP content-type header
   – While processing the raw data for hyperlinks and images, add families links and images
     • links:<rurl> column for each hyperlink
     • images:<rurl> column for each image
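A minimal sketch of how such a row could be written with the Java client; table and family names follow the slide, the URL and values are made up:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WebtablePut {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "webtable");
    // Reversed URL as row key keeps pages of the same domain adjacent
    Put put = new Put(Bytes.toBytes("com.cloudera.www/index.html"));
    put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
        Bytes.toBytes("<html>...</html>"));
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("language"), Bytes.toBytes("en"));
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"), Bytes.toBytes("text/html"));
    table.put(put);
    table.close();
  }
}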
  
HBase Clients

•  Native Java Client/API (example below)
   – get(Get get), put(Put put), delete(Delete delete)
   – getScanner(Scan scan)
•  Non-Java clients
   – REST server
   – Avro server
   – Thrift server
   – Jython, Scala, Groovy DSL
•  TableInputFormat/TableOutputFormat for MapReduce
   – HBase as MapReduce source and/or target
•  HBase Shell
   – JRuby shell adding get, put, scan and admin calls
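A small sketch of that native API in use; table, family and row names are illustrative:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientApiExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "testtable");

    // put(Put put)
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes("value1"));
    table.put(put);

    // get(Get get)
    Result row = table.get(new Get(Bytes.toBytes("row1")));
    System.out.println(row);

    // getScanner(Scan scan)
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result result : scanner) {
      System.out.println(result);
    }
    scanner.close();
    table.close();
  }
}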
  
HBase Extensions

•  Hive, Pig, Cascading
   – Hadoop-targeted MapReduce tools with HBase integration
•  Sqoop
   – Read and write to HBase for further processing in Hadoop
•  HBase Explorer, Nutch, Heritrix
•  SpringData? (volunteers?)
•  Karmasphere?
History of HBase

•  November 2006
   – Google releases paper on BigTable
•  February 2007
   – Initial HBase prototype created as Hadoop contrib
•  October 2007
   – First "useable" HBase (Hadoop 0.15.0)
•  January 2008
   – Hadoop becomes TLP, HBase becomes subproject
•  October 2008
   – HBase 0.18.1 released
•  January 2009
   – HBase 0.19.0
•  September 2009
   – HBase 0.20.0 released (Performance Release)
•  May 2010
   – HBase becomes TLP
•  June 2010
   – HBase 0.89.20100621, first developer release
•  Imminent…
   – HBase 0.90 release (any day now)
  
Current Project Status

•  HBase 0.90.x "Advanced Concepts"
   – Master Rewrite – More ZooKeeper
   – Multi-DC Replication
   – Intra-Row Scanning
   – Further optimizations on algorithms and data structures
   – Discretionary Access Control
   – Coprocessors
HBase Users

•  Adobe
•  Facebook
•  Mozilla (Socorro)
•  StumbleUpon
•  Trend Micro (Advanced Threat Research)
•  Twitter
•  Groups at Yahoo!
•  Many startups with amazing services…
Questions?
  
Comparison with RDBMS

•  Very simple example use-case
   – Please note: not necessarily an example of how to implement this with HBase
•  System to store a shopping cart
   – Customers, Products, Orders
Simple SQL Schema

CREATE TABLE customers (
  customerid UUID PRIMARY KEY,
  name TEXT, email TEXT)

CREATE TABLE products (
  productid UUID PRIMARY KEY,
  name TEXT, price DOUBLE)

CREATE TABLE orders (
  orderid UUID PRIMARY KEY,
  customerid UUID INDEXED REFERENCES(customers.customerid),
  date TIMESTAMP, total DOUBLE)

CREATE TABLE orderproducts (
  orderid UUID INDEXED REFERENCES(orders.orderid),
  productid UUID REFERENCES(products.productid))
Simple HBase Schema

CREATE TABLE customers (content, orders)

CREATE TABLE products (content)

CREATE TABLE orders (content, products)
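The pseudo-SQL above is just shorthand for declaring column families. One way to create the equivalent tables from Java, as a sketch using the administrative API (only families are declared, never individual columns):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateShopTables {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    HTableDescriptor customers = new HTableDescriptor("customers");
    customers.addFamily(new HColumnDescriptor("content"));
    customers.addFamily(new HColumnDescriptor("orders"));
    admin.createTable(customers);

    HTableDescriptor products = new HTableDescriptor("products");
    products.addFamily(new HColumnDescriptor("content"));
    admin.createTable(products);

    HTableDescriptor orders = new HTableDescriptor("orders");
    orders.addFamily(new HColumnDescriptor("content"));
    orders.addFamily(new HColumnDescriptor("products"));
    admin.createTable(orders);
  }
}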
  
Efficient Queries with Both

•  Get name, email, orders for a customer
•  Get name, price for a product
•  Get customer, stamp, total for an order
•  Get list of products in an order
  
Where SQL Makes Life Easy

•  Joining
   – In a single query, get all products in an order with their product information
•  Secondary indexing
   – Get customerid by email
•  Referential integrity
   – Deleting an order would delete links out of 'orderproducts'
   – ID updates propagate
•  Realtime analysis
   – GROUP BY and ORDER BY allow for simple statistical analysis
  
Where HBase Makes Life Easy

•  Dataset scale
   – We have 1M customers and 100M products
   – Product information includes large text datasheets or PDF files
   – Want to track every time a customer looks at a product page
•  Read/write scale
   – Tables distributed across nodes means reads/writes are fully distributed
   – Writes are extremely fast and require no index updates
•  Replication
   – Comes for free
•  Batch analysis
   – Massive and convoluted SQL queries executed serially become efficient MapReduce jobs distributed and executed in parallel
  
Conclusion

•  For small instances of simple/straightforward systems, relational databases offer a much more convenient way to model and access data
   – Can outsource most work to the transaction and query engine
   – HBase will force you to pull complexity into the application layer
•  Once you need to scale, the properties and flexibility of HBase can relieve you from the headaches associated with scaling an RDBMS
Questions?
  
HBase Architecture (cont.)

•  Based on Log-Structured Merge-Trees (LSM-Trees)
•  Inserts are done in the write-ahead log first
•  Data is stored in memory and flushed to disk at regular intervals or based on size
•  Small flushes are merged in the background to keep the number of files small
•  Reads consult the memory stores first and the disk-based files second
•  Deletes are handled with "tombstone" markers
•  Atomicity on the row level no matter how many columns
   – keeps the locking model easy
HBase Architecture (cont.)

Write-Ahead-Log (WAL) Flow

Write-Ahead-Log (cont.)

HFile and KeyValue
  
Raw Data View
  
$ ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile \
    -f file:///tmp/hbase-larsgeorge/hbase/testtable/272a63b23bdb5fae759be5192cabc0ce/f1/4992515006010131591 \
    -p

K:   row1/f1:/1290345071149/Put/vlen=6 V: value1
K:   row2/f1:/1290345078351/Put/vlen=6 V: value2
K:   row3/f1:/1290345089750/Put/vlen=6 V: value3
K:   row4/f1:/1290345095724/Put/vlen=6 V: value4
K:   row5/f1:c1/1290347447541/Put/vlen=6 V: value5
K:   row6/f1:c2/1290347461068/Put/vlen=6 V: value6
K:   row7/f1:c1/1290347581879/Put/vlen=7 V: value10
K:   row7/f1:c1/1290347469553/Put/vlen=6 V: value7
K:   row7/f1:c10/1290348157074/DeleteColumn/vlen=0 V:
K:   row7/f1:c10/1290347625771/Put/vlen=7 V: value11
K:   row7/f1:c11/1290347971849/Put/vlen=7 V: value14
K:   row7/f1:c12/1290347979559/Put/vlen=7 V: value15
K:   row7/f1:c13/1290347986384/Put/vlen=7 V: value16
K:   row7/f1:c2/1290347569785/Put/vlen=6 V: value8
K:   row7/f1:c3/1290347575521/Put/vlen=6 V: value9
K:   row7/f1:c8/1290347638008/Put/vlen=7 V: value13
K:   row7/f1:c9/1290347632777/Put/vlen=7 V: value12
MemStores

•  After data is written to the WAL, the RegionServer saves KeyValues in a memory store
•  Flush to disk based on size, see hbase.hregion.memstore.flush.size
•  Default size is 64MB
•  Uses a snapshot mechanism to write the flush to disk while still serving from it and accepting new data at the same time
•  Snapshots are released when the flush has succeeded
  
Block Cache

•  Acts as a very large, in-memory distributed cache
•  Assigned a large part of the JVM heap in the RegionServer process, see hfile.block.cache.size
•  Optimizes reads on subsequent columns and rows
•  Has priority to keep "in-memory" column families in cache
      if(inMemory) {
         this.priority = BlockPriority.MEMORY;
      } else {
         this.priority = BlockPriority.SINGLE;
      }

•  The cache needs to be used properly to get the best read performance
   – Turn off the block cache on operations that cause large churn (see the Scan example below)
   – Store related data "close" to each other
•  Uses an LRU cache with threaded (asynchronous) evictions based on priorities
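For example, a full-scan MapReduce job would typically disable the block cache on its Scan so it does not evict data that interactive reads still need. A minimal sketch; the caching value is only an illustrative choice:

import org.apache.hadoop.hbase.client.Scan;

public class FullScanSetup {
  public static Scan newFullScan() {
    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per RPC (batching, unrelated to the block cache)
    scan.setCacheBlocks(false);  // avoid evicting hot data from the RegionServer block cache
    return scan;
  }
}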
  
Compactions

•  General concepts
   – Two types: Minor and Major Compactions
   – Asynchronous and transparent to the client
   – Manage file bloat from MemStore flushes
•  Minor Compactions
   – Combine the last "few" flushes
   – Triggered by the number of storage files
•  Major Compactions
   – Rewrite all storage files
   – Drop deleted data and values exceeding the TTL and/or number of versions
   – Triggered by a time threshold
   – Cannot be scheduled to start automatically at a specific time (bummer!)
   – May (most definitely) tax overall HDFS IO performance

Tip: Disable major compactions and schedule them to run manually (e.g. via cron) at off-peak times, as sketched below.
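One possible way to follow that tip from a scheduled job, as a sketch; the table name is an example, and it assumes hbase.hregion.majorcompaction has been set to 0 in hbase-site.xml to disable the time-based trigger:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NightlyMajorCompaction {
  public static void main(String[] args) throws Exception {
    // Run this e.g. from cron at off-peak hours; the request is asynchronous
    // and compacts all regions of the table.
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    admin.majorCompact("webtable");
  }
}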
  
Region Splits

•  Triggered by the configured maximum file size of any store file
   – This is checked directly after the compaction call to ensure store files are actually approaching the threshold
•  Runs as an asynchronous thread on the RegionServer
•  Splits are fast and nearly instant
   – Reference files point to the original region files and represent each half of the split
•  Compactions take care of splitting the original files into new region directories
Replication

Questions?
  
MapReduce with HBase

•  Framework to use HBase as source and/or sink for MapReduce jobs
•  Thin layer over the native Java API
•  Provides helper classes to set up jobs more easily (see the mapper sketch after the snippet below)
    TableMapReduceUtil.initTableMapperJob(
        "test", scan, MyMapper.class,
        ImmutableBytesWritable.class,
        RowResult.class, job);

    TableMapReduceUtil.initTableReducerJob(
        "table", MyReducer.class, job);
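A minimal sketch of what MyMapper could look like using the provided TableMapper base class; the class name is illustrative, and note that the newer mapreduce API hands the mapper a Result rather than the older RowResult:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class MyMapper
    extends TableMapper<ImmutableBytesWritable, Result> {
  // Called once per row of the scanned table
  protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    // Pass the row straight through; real jobs would extract and transform values here
    context.write(rowKey, columns);
  }
}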
MapReduce with HBase (cont.)

•  Special use-case in regards to Hadoop
•  Tables are sorted and have unique keys
   – Often we do not need a Reducer phase
   – Combiner not needed
•  Need to make sure load is distributed properly by randomizing keys (or use bulk import)
•  Partial or full table scans possible
•  Scans are very efficient as they make use of block caches
   – But then make sure you do not create too much churn, or better switch caching off when doing full table scans
•  Can use filters to limit rows being processed
  
TableInputFormat

•  Transforms an HBase table into a source for MapReduce jobs
•  Internally uses a TableRecordReader which wraps a Scan instance
   – Supports restarts to handle temporary issues
•  Splits the table by region boundaries and stores current region locality
  
TableOutputFormat

•  Allows an HBase table to be used as the output target
•  Put and Delete support from the mapper or reducer class
•  Uses TableOutputCommitter to write data
•  Disables auto-commit on the table to make use of the client-side write buffer
•  Handles the final flush in close()
HFileOutputFormat

•  Used to bulk load data into HBase
•  Bypasses the normal API and generates low-level store files
•  Prepares files for the final bulk insert
•  Needs special handling of sort order and partitioning
•  Only supports one column family (for now)
•  Can load bulk updates into existing tables
MapReduce Helper

•  TableMapReduceUtil
•  IdentityTableMapper
   – Passes on key and value, where the value is a Result instance and the key is set to value.getRow()
•  IdentityTableReducer
   – Stores values into HBase; values must be Put or Delete instances
•  HRegionPartitioner
   – Not set by default; use it to control partitioning on the Hadoop level
  
Custom MapReduce over Tables

•  No requirement to use the provided framework
•  Can read from or write to one or many tables in the mapper and reducer
•  Can split not on regions but on arbitrary boundaries
•  Make sure to use the write buffer in the OutputFormat to get the best performance (do not forget to call flushCommits() at the end!) – see the sketch below
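A sketch of that write-buffer pattern; the table name and row data are made up, while setAutoFlush(), the buffered puts and the final flushCommits() are the point of the example:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriter {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "output-table");
    table.setAutoFlush(false);                 // buffer puts on the client side
    table.setWriteBufferSize(4 * 1024 * 1024); // flush roughly every 4MB of puts
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes("value-" + i));
      table.put(put);                          // goes into the buffer, not to the server
    }
    table.flushCommits();                      // do not forget the final flush!
    table.close();
  }
}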
  
Questions?
  
Advanced Techniques

•  Key/Table Design
•  DDI
•  Salting
•  Hashing vs. Sequential Keys
•  ColumnFamily vs. Column
•  Using BloomFilter
•  Data Locality
•  checkAndPut() and checkAndDelete()
•  Coprocessors
  
Key/Table Design

•  Crucial to gain the best performance
   – Why do I need to know? Well, you also need to know that an RDBMS only works well when columns are indexed and the query plan is OK
•  Absence of secondary indexes forces use of row key or column name sorting
•  Transfer multiple indexes into one
   – Generates a large table -> good, since it fits the architecture and spreads across the cluster
DDI

•  Stands for Denormalization, Duplication and Intelligent Keys
•  Needed to overcome shortcomings of the architecture
•  Denormalization -> replacement for JOINs
•  Duplication -> design for reads
•  Intelligent Keys -> implement indexing and sorting, optimize reads
  
Pre-materialize Everything

•  Achieve one read per customer request if possible
•  Otherwise keep the number of reads as low as possible
•  Reads between 10ms (cache miss) and 1ms (cache hit)
•  Use MapReduce to compute exact values in batch
•  Store and merge updates live
•  Use incrementColumnValue (see the counter sketch below)

Motto: "Design for Reads"
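A small sketch of live counter updates with incrementColumnValue; table, family and qualifier names are made up, the call itself is the real HTable API:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class PageViewCounter {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "products");
    // Atomically add 1 to the stored 8-byte counter; no read-modify-write on the client
    long views = table.incrementColumnValue(
        Bytes.toBytes("product-4711"),   // row
        Bytes.toBytes("content"),        // family
        Bytes.toBytes("pageviews"),      // qualifier
        1L);                             // amount
    System.out.println("current count: " + views);
    table.close();
  }
}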
  
Salting

•  Prefix row keys to gain spread
•  Use well-known or numbered prefixes
•  Use modulo to spread across servers
•  Enforce that common data stays close together for subsequent scanning or MapReduce processing (see the sketch after the example keys)
      0_rowkey1, 1_rowkey2, 2_rowkey3
      0_rowkey4, 1_rowkey5, 2_rowkey6	
  
•  Sorted by prefix first
  
      0_rowkey1
      0_rowkey4
      1_rowkey2
      1_rowkey5
      …
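A sketch of how such a salt prefix might be computed; the bucket count and key format are illustrative choices, not prescribed by HBase:

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKey {
  private static final int BUCKETS = 3;  // e.g. the number of region servers

  // "rowkey1" -> "1_rowkey1": the same original key always lands in the same bucket
  public static byte[] salt(String rowKey) {
    int bucket = Math.abs(rowKey.hashCode()) % BUCKETS;
    return Bytes.toBytes(bucket + "_" + rowKey);
  }
}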
Hashing vs. Sequential Keys

•  Use hashes for the best spread
   – Use, for example, MD5 to be able to recreate the key (sketch below)
     • Key = MD5(customerID)
   – Counterproductive for range scans
•  Use sequential keys for locality
   – Makes use of block caches
   – May tax one server overly; can be avoided by salting or by splitting regions while keeping them small
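A sketch of the MD5-based key from the slide; MessageDigest is standard Java, and customerID is a made-up value:

import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class HashedKey {
  // Key = MD5(customerID): spreads writes evenly, but destroys meaningful range scans
  public static byte[] rowKey(String customerId) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    return md5.digest(Bytes.toBytes(customerId));
  }
}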
  
ColumnFamily vs. Column

•  Use only a few column families
   – Many families cause many files that need to stay open per region, plus class overhead per family
•  Best used when there is a logical separation between data and meta columns
•  Sorting per family can be used to convey application logic or access pattern
•  Define compression or in-memory attributes to optimize access and performance
Using Bloomfilters

•  Defines a filter that allows determining whether a store file does not contain a row or column
•  The error rate controls the overhead but is usually very low, 1% or less
•  Stored with each storage file on flushes and compactions
•  Good for large regions with many distinct row keys and many expected misses
•  Trick: "optimize" compactions to gain the advantage while scanning files
  
Data Locality

•  Provided by the DFSClient
•  Transparent to HBase
•  After a restart, data may not be local
   – Work is being done to improve on this
•  Over time, and caused by compactions, data is stored where it is needed, i.e. local to the RegionServer
•  Could enforce a major compaction before starting MapReduce jobs
  
checkAndPut() and checkAndDelete()

•  Help with atomic operations on a single row (usage example below)
•  Absence of a value is treated as a check for non-existence

  public boolean checkAndPut(final byte[] row,
    final byte[] family, final byte[] qualifier,
    final byte[] value, final Put put)

  public boolean checkAndDelete(final byte[] row,
    final byte[] family, final byte[] qualifier,
    final byte[] value, final Delete delete)
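A usage sketch; row, family, qualifier and values are made up. The Put is only applied if the current cell value still matches the expected one, which makes this a per-row compare-and-set:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "orders");
    Put put = new Put(Bytes.toBytes("order-1"));
    put.add(Bytes.toBytes("content"), Bytes.toBytes("status"), Bytes.toBytes("shipped"));
    // Only mark the order as shipped if it is still in state "paid"
    boolean applied = table.checkAndPut(
        Bytes.toBytes("order-1"), Bytes.toBytes("content"),
        Bytes.toBytes("status"), Bytes.toBytes("paid"), put);
    System.out.println("update applied: " + applied);
    table.close();
  }
}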
Locks

•  Locks can be set explicitly for client operations
•  Lock a row against modifications by other clients
   – Clients block on locked rows –> keep locking reasonably short!
•  Use HTable's lockRow to acquire and unlockRow to release (see the sketch below)
•  Locks are guarded by leases on the RegionServer and configured with hbase.regionserver.lease.period
   – By default set to 60 seconds
   – Leases are refreshed by any mutation call, e.g. get(), put() or delete()
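A minimal sketch of explicit row locking with the client API of that era; table, row and values are made up, and the finally block makes sure the lease is always released:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowLock;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLockExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "customers");
    byte[] row = Bytes.toBytes("customer-1");
    RowLock lock = table.lockRow(row);       // blocks other writers of this row
    try {
      Put put = new Put(row, lock);          // operations carry the explicit lock
      put.add(Bytes.toBytes("content"), Bytes.toBytes("email"),
          Bytes.toBytes("new@example.com"));
      table.put(put);
    } finally {
      table.unlockRow(lock);                 // keep the critical section short!
    }
    table.close();
  }
}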
  
Coprocessors

•  New addition to the feature set
•  Based on a talk by Jeff Dean at LADIS 2009
   – Run arbitrary code on each region in the RegionServer
   – High-level call interface for clients
     • Calls are addressed to rows or ranges of rows while the coprocessor client library resolves locations
     • Calls to multiple rows are automatically split
   – Provides a model for distributed services
     • Automatic scaling, load balancing, request routing
  
Coprocessors in HBase

•  Use for efficient computational parallelism
•  Secondary indexing (HBASE-2038)
•  Column aggregates (HBASE-1512)
   – SQL-like sum(), avg(), max(), min(), etc.
•  Access control (HBASE-3025, HBASE-3045)
   – Provide basic access control
•  Table metacolumns
•  New filtering
   – predicate pushdown
•  Table/region access statistics
•  HLog extensions (HBASE-3257)
  
Coprocessors in HBase

•  Java classes implementing interfaces
•  Load through configuration or a table attribute
'COPROCESSOR$1' => 'hdfs://localhost:8020/hbase/coprocessors/test.jar:Test:1000'

'COPROCESSOR$2' => '/hbase/coprocessors/test2.jar:AnotherTest:1001'
  

•  Can be chained like servlet filters
•  Dynamic RPC allows functional extensibility
  
Coprocessor and RegionObserver

•  The Coprocessor interface defines these hooks
   – preOpen, postOpen: Called before and after the region is reported as online to the master
   – preFlush, postFlush: Called before and after the memstore is flushed into a new store file
   – preCompact, postCompact: Called before and after compaction
   – preSplit, postSplit: Called after the region is split
   – preClose, postClose: Called before and after the region is reported as closed to the master
  
Coprocessor and RegionObserver

•  The RegionObserver interface defines these hooks:
   – preGet, postGet: Called before and after a client makes a Get request
   – preExists, postExists: Called before and after the client tests for existence using a Get
   – prePut, postPut: Called before and after the client stores a value
   – preDelete, postDelete: Called before and after the client deletes a value
   – preScannerOpen, postScannerOpen: Called before and after the client opens a new scanner
   – preScannerNext, postScannerNext: Called before and after the client asks for the next row on a scanner
   – preScannerClose, postScannerClose: Called before and after the client closes a scanner
   – preCheckAndPut, postCheckAndPut: Called before and after the client calls checkAndPut()
   – preCheckAndDelete, postCheckAndDelete: Called before and after the client calls checkAndDelete()
RegionObserver Call Sequence

Example
public class RBACCoprocessor extends BaseRegionObserver {
  @Override
  public List preGet(CoprocessorEnvironment e, Get get,
      List results) throws CoprocessorException {
    // check permissions...
    if (access_not_allowed) {
      throw new AccessDeniedException(
          "User is not allowed to access.");
    }
    return results;
  }

  // override prePut(), preDelete(), etc.
}
Endpoint and Dynamic RPC
  
HBase and Indexing

•  Secondary indexing or search?
•  HBasene
   – Port of Lucandra
•  Nutch, Solr, Lucene
•  ITHBase and IHBase
   – Moved out from contrib into GitHub
•  HSearch
Secondary Index or Search?

•  Can keep "lookup" tables
   – But could also be in the same table
   – Could even be in the same row
     • Use a ColumnFamily per index (but keep the number low)
     • Make use of column sorting
     • Does it fit your access pattern?
•  How to guarantee updates?
   – Use some sort of "transaction"
•  Offers sorting in one direction
Example: HBasene

•  Based on Lucandra
•  Implements the Lucene API over HBase
•  Stores the term vector as rows in a table
   – Each row is one term and the columns are the index, with the value being the position in the text
•  Document fields are stored as columns using "field/term" combinations
•  Performs boolean operations in code
ITHBase and IHBase

•  Provided by contributors
•  May not support the latest HBase release
•  Indexed-Transactional HBase
   – Extends RegionServer code
   – Intrusive
   – Provides a notion of transactions over rows
   – Maintains lookup tables
•  Indexed HBase
   – Implemented by Powerset/Microsoft
   – Support?
   – Intrusive
   – Keeps state in memory
   – Hooks into region operations to maintain state
   – Replace with Coprocessors (HBASE-2038)
Custom Search Index

•  Facebook is using Cassandra to power inbox search
   – 150TB of data stored
   – Row is the user inbox ID
   – Uses super columns to index terms
   – Each column is a document that contains the term
•  Make use of partial scans
   – Can be done on row and column level
     "Find email with albert*"
•  Sorting of columns allows for performance optimizations during term retrieval
Questions?
  

				