Cloud data serving systems by fjzhangweiyun


Cloud Computing
Lecture 8

Cloud Data Serving Systems

        Keke Chen
 Frequent operations: append, sequential
  read. Great scalability
 Random Read/Write. Great scalability
     BigTable (Hbase)
     DynamoDB
     Cassandra
     PNUTS
 Structured data
   May generate from MapReduce processing
 Support random R/W
Data model
 Objects are arrays of bytes
 Objects are indexed by 3 dimensions
   Row
   Column and column family
   Timestamp (versions)


     “a big sparse table”
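The three-dimensional indexing above can be pictured as a nested map. The following is a minimal in-memory sketch, not BigTable's implementation; the class name `SparseTable`, its methods, and the sample row and column keys are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of the data model: (row, column, timestamp) -> bytes."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = SparseTable()
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("com.cnn.www", "anchor:cnnsi.com", 12, b"CNN Sports")

latest = table.get("com.cnn.www", "anchor:cnnsi.com")
older = table.get("com.cnn.www", "anchor:cnnsi.com", timestamp=10)
missing = table.get("com.cnn.www", "contents:")   # empty cells cost nothing
```

Only cells that were written occupy space, which is what makes the table "sparse".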
Data model
 Row
   Row key: strings
   Row range (groups of rows) – tablet (100-200 MB)
 Column
   Grouped into “column families”
     Column key “Family: qualifier”
   Data physically organized by column family – an
    embedded substructure of a record
   Access controls are on column-family
   Example:
    “anchor” family, qualifier is the anchor link, and
    object is the anchor text

 Timestamps
   Representing versions of the same object
   Schema
     Create, delete tables/column families
     Access control
   Data
     Write, delete, lookup values
     Iterate a subset
   Single-row transaction
   BigTable data can be input/output of MapReduce
Physical organization
 Built on top of GFS
 A tablet contains multiple SSTables
 SSTable: a block file with sparse index
   Data (key, value) are stored in blocks
   Block index
     sorted keys,
     Keys are partitioned to ranges
     First Key of the range ->block(s)
   Data Possibly organized in a nested way
 Use Chubby distributed lock service
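The SSTable bullets above (sorted keys, key ranges, first key of a range pointing to a block) can be sketched as follows. This is a simplified in-memory illustration, not the real file format; `SSTableSketch`, `block_size`, and the sample keys are assumptions made for the example.

```python
import bisect


class SSTableSketch:
    """Sorted (key, value) pairs split into blocks, plus a sparse block index."""

    def __init__(self, items, block_size=2):
        pairs = sorted(items)
        self.blocks = [pairs[i:i + block_size]
                       for i in range(0, len(pairs), block_size)]
        # Sparse index: only the first key of each block is kept.
        self.first_keys = [block[0][0] for block in self.blocks]

    def lookup(self, key):
        # Find the last block whose first key is <= key.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None                  # key sorts before every block
        for k, v in self.blocks[i]:      # scan one block only
            if k == key:
                return v
        return None


sst = SSTableSketch({"apple": 1, "banana": 2, "cherry": 3,
                     "date": 4, "fig": 5}.items())
found = sst.lookup("date")
absent = sst.lookup("coconut")
```

The index holds one entry per block rather than one per key, so it stays small enough to keep in memory while a lookup touches a single block.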
Basic technical points
 Use index to speed up random R/W
   Tablet servers maintain the indices
 Use locking service to solve the R/W
  conflicts -> guarantees consistency
How to get tablet location?
 Through Chubby service

                                               Chubby client
                                               Caches locations

   Metadata 128 MB, 1 KB per item -> 2^17 entries
   Two levels -> 2^34 entries (tablets)
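The entry counts above follow from simple division. A quick check, assuming binary units (1 MB = 2^20 bytes, 1 KB = 2^10 bytes); the constant names are chosen for this example only.

```python
METADATA_TABLET_BYTES = 128 * 2**20   # 128 MB per metadata tablet
ENTRY_BYTES = 2**10                   # about 1 KB per location entry

# One metadata tablet can describe this many tablets.
entries_per_tablet = METADATA_TABLET_BYTES // ENTRY_BYTES

# With two levels, each first-level entry points to a full second-level tablet.
addressable_tablets = entries_per_tablet ** 2
```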
How it works
 Chubby service handles tablet locking
 Master maintains consistency
   Start up by reading Metadata from Chubby
   Check lock status
 insert/delete/split/merge tablets
 Tablet operations
   Each tablet contains multiple SSTables (info
    stored in Metadata)
   Use log to maintain consistency
HBase
 Open source version of BigTable
 Built on top of HDFS
Dynamo (DynamoDB)
 Developed by Amazon
 Fast R/W speed, satisfying strict
  service level agreements
 Using a fully distributed architecture
 Store only key-value pairs
Service Level Agreements
 Application can deliver
  its functionality in
  a bounded time: every
  dependency in the platform
  needs to deliver its
  functionality with even tighter
  bounds.
 Example: a service
  guaranteeing that it will
  provide a response within
  300 ms for 99.9% of its
  requests for a peak client load
  of 500 requests per second.
[Figure: Service-oriented architecture of Amazon’s platform]
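The SLA example (a response within 300 ms for 99.9% of requests) is a statement about a percentile, not about the average. A minimal sketch of such a check over a list of measured latencies; the function name `meets_sla` and the sample data are hypothetical, and the peak-load condition of 500 requests per second is assumed to hold for the measurement window rather than verified here.

```python
def meets_sla(latencies_ms, limit_ms=300.0, fraction=0.999):
    """True if at least `fraction` of the requests finished within `limit_ms`."""
    if not latencies_ms:
        return True
    within = sum(1 for t in latencies_ms if t <= limit_ms)
    return within / len(latencies_ms) >= fraction


# 10,000 requests each; a handful of slow outliers decides the outcome.
good = [20.0] * 9995 + [450.0] * 5     # 99.95% within the limit
bad = [20.0] * 9980 + [450.0] * 20     # 99.80% within the limit

good_ok = meets_sla(good)
bad_ok = meets_sla(bad)
```

Both samples have nearly the same mean latency, which is why the requirement is phrased in terms of the 99.9th percentile.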
Design Consideration
 Sacrifice strong consistency for
  availability
 Conflict resolution is executed during
  read instead of write, i.e. “always
  writeable”
 Other principles:
     Incremental scalability.
     Symmetry.
     Decentralization.
     Heterogeneity.
  Partition Algorithm

 Consistent hashing: the
    output range of a hash function is
    treated as a fixed circular space or
    “ring”.
 “Virtual Nodes”: Each node
  can be responsible for more than one
  virtual node.
 Advantages of using virtual
  nodes:
 If a node becomes unavailable
  the load handled by this node
  is evenly dispersed across the
  remaining available nodes.
 When a node becomes
  available again, the newly
  available node accepts a
  roughly equivalent amount of
  load from each of the other
  available nodes.
 The number of virtual nodes
  that a node is responsible for can
  be decided based on its capacity,
  accounting for heterogeneity
  in the physical infrastructure.
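The behaviour described above can be sketched with a small ring. This is an illustrative toy, not Dynamo's code: the `Ring` class, the `node#i` token naming, the use of MD5 as the hash, and the virtual-node counts are all assumptions for this example.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(label):
    """Map a string to a position on the ring (the hash choice is illustrative)."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, capacities):
        # capacities: physical node -> number of virtual nodes it hosts
        self.tokens = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node, count in capacities.items()
            for i in range(count)
        )
        self.positions = [pos for pos, _ in self.tokens]

    def owner(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        i = bisect.bisect_right(self.positions, ring_position(key))
        return self.tokens[i % len(self.tokens)][1]


keys = [f"key-{i}" for i in range(3000)]
before = Ring({"A": 100, "B": 100, "C": 100})
after = Ring({"A": 100, "B": 100})        # node C becomes unavailable

moved = [k for k in keys if before.owner(k) != after.owner(k)]
new_owners = Counter(after.owner(k) for k in moved)
```

Only keys that C owned change hands, and `new_owners` shows how they are split between the remaining nodes. Giving a more capable machine a larger count in `capacities` assigns it proportionally more of the ring.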

 Each data item is
  replicated at N hosts.
 “preference list”: The
  list of nodes that is
  responsible for storing
  a particular key.
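A preference list can be built by walking the ring clockwise from the key's position and collecting distinct physical hosts. The sketch below is a simplified illustration under assumed names (`preference_list`, `ring_position`, the `node#i` tokens, and the key `cart:42` are hypothetical); it ignores failures and membership changes.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a ring position (the hash choice is illustrative)."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)


def preference_list(key, tokens, n):
    """Walk clockwise from the key; collect the first n distinct physical nodes."""
    positions = [pos for pos, _ in tokens]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(tokens)):
        node = tokens[(start + step) % len(tokens)][1]
        if node not in chosen:        # skip other virtual nodes of the same host
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


# Four physical nodes with eight virtual nodes each.
tokens = sorted(
    (ring_position(f"{node}#{i}"), node)
    for node in ["A", "B", "C", "D"]
    for i in range(8)
)
replicas = preference_list("cart:42", tokens, n=3)
```

Skipping repeated hosts matters with virtual nodes: otherwise two of the N replicas could land on the same machine.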
 Easy to scale up/down
   Membership join/leaving
   Redundant data stores
Cassandra
 Use BigTable’s data model
   “Rows” and “column family”
 Use Dynamo’s architecture
Comparison on cloud data
serving systems
 Paper “Benchmarking Cloud Serving
  Systems with YCSB”
Update heavy workload
Read heavy
Short scan
Cluster size
 C-store is Read-optimized, for OLAP
  type apps
 Traditional DBMS, write-optimized
  (optimized for online transactions)
   Based on records(rows)
 What are the cost-sensitive major
  factors in query processing?
   Size of database
   Index or not
   Join

 Current hardware configuration and
  what a DBMS can do…
   Cheap storage – allow distributed redundant
    data store
   Fast CPUs – compression/decompression
   Limited disk bandwidth – reduce I/O
 Supporting OLAP (online analytic
  processing) operations
   Optimized read operations
   Balanced write performance
   Address the conflict between writes and reads
     Fast write – append records
     Fast read – indexed, compressed

 Think
   if data is organized in columns, what are the
    unique challenges (different from the row-based
    store)?
C-store’s features
 Column based store saves space
   Compression is possible
   Index size is smaller
 Multiple projections
   Allow multiple indices
   Parallel processing on the same attributes
   Materialized join results
 Separation of writeable store and read-
  optimized store
   Both write/read are optimized
   Transactions are not blocked by write locks
Data model
 Same as relational data model
     Tables, rows, columns
     Primary keys and foreign keys
     Projections
       From single table
       Multiple joined tables
 Example
Normal relational model        Possible C-store model

EMP(name, age, dept, salary)   EMP1 (name, age)
DEPT(dname, floor)             EMP2 (dept, age, DEPT.floor)
                               EMP3 (name, salary)
                               DEPT1(dname, floor)
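The example above can be reproduced with a few lines of code. This sketch stores each projection column-wise after sorting it; the sample rows, the `project` helper, and the choice of sort key for each projection are assumptions made for illustration.

```python
EMP = [
    {"name": "Ann", "age": 31, "dept": "Sales", "salary": 5200},
    {"name": "Bob", "age": 25, "dept": "R&D", "salary": 6100},
    {"name": "Cid", "age": 42, "dept": "Sales", "salary": 7000},
]
DEPT = [{"dname": "Sales", "floor": 1}, {"dname": "R&D", "floor": 2}]


def project(rows, columns, sort_key):
    """Keep only `columns`, order rows by `sort_key`, and store column-wise."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {c: [r[c] for r in ordered] for c in columns}


# EMP2 spans two tables, so the join with DEPT is materialized first.
floor_of = {d["dname"]: d["floor"] for d in DEPT}
emp_with_floor = [dict(e, floor=floor_of[e["dept"]]) for e in EMP]

EMP1 = project(EMP, ["name", "age"], sort_key="age")
EMP2 = project(emp_with_floor, ["dept", "age", "floor"], sort_key="dept")
EMP3 = project(EMP, ["name", "salary"], sort_key="salary")
DEPT1 = project(DEPT, ["dname", "floor"], sort_key="floor")
```

Note that `age` appears in both EMP1 and EMP2, each time in a different row order; this redundancy is what gives the query optimizer a choice of orderings.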
Physical projection organization
 Sort key
   each projection has one
   Rows are ordered by sort key
   Partitioned by key range
 Linking columns in the same projection
   Storage key – (segment id, key, i.e., offset in
    the segment)

 Linking projections
   To reconstruct a table
   Join index
     Conceptual organization
[Figure: Projection 1 (column, sort key column, ordered by sort key) with a join index of (seg id, offset) entries pointing into Projection 2]
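A join index lets rows of one projection be matched with rows of another that is sorted differently. The following is a small sketch under assumed data: the two projections, the segment split, and the `reconstruct` helper are hypothetical, and offsets are 0-based here.

```python
# Projection 1: (name, age), sorted by age, stored as one segment.
proj1 = {"name": ["Bob", "Ann", "Cid"], "age": [25, 31, 42]}

# Projection 2: (name, salary), sorted by salary, split into two segments.
proj2_segments = [
    {"name": ["Ann"], "salary": [5200]},                 # segment 0
    {"name": ["Bob", "Cid"], "salary": [6100, 7000]},    # segment 1
]

# Join index: entry i is the (segment id, offset) in Projection 2
# of the row stored at position i in Projection 1.
join_index = [(1, 0), (0, 0), (1, 1)]


def reconstruct(proj1, proj2_segments, join_index):
    """Rebuild (name, age, salary) rows in Projection 1's order."""
    rows = []
    for i, (seg, off) in enumerate(join_index):
        salary = proj2_segments[seg]["salary"][off]
        rows.append((proj1["name"][i], proj1["age"][i], salary))
    return rows


rows = reconstruct(proj1, proj2_segments, join_index)
```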
Architectural consideration
between writes and reads
 Read often -> needs indices to speed up
 Write often -> index unfriendly: needs to
  update indices frequently
 Use “read store” and “write store”
Read store: Column encoding
 Use compression schemes and indices
   Self-order (key), few distinct values
     (value, position, # items)
     Indexed by clustered B-tree
   Foreign-order (non-key), few distinct values
     (value, bitmap index)
     B-tree index: position -> values
   Self-order, many distinct values
     Delta from the previous value
     B-tree index
   Foreign-order, many distinct values
     Unencoded
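The first three encodings above can be sketched in a few lines each. These are simplified illustrations, not C-store's on-disk formats: the function names are hypothetical, positions are 0-based, bitmaps are plain 0/1 lists, and the B-tree indices are omitted.

```python
def rle_encode(sorted_column):
    """Self-order, few distinct values: (value, first position, # items)."""
    runs = []
    for pos, v in enumerate(sorted_column):
        if runs and runs[-1][0] == v:
            value, start, count = runs[-1]
            runs[-1] = (value, start, count + 1)   # extend the current run
        else:
            runs.append((v, pos, 1))               # start a new run
    return runs


def bitmap_encode(column):
    """Foreign-order, few distinct values: one bitmap per distinct value."""
    return {v: [1 if x == v else 0 for x in column] for v in sorted(set(column))}


def delta_encode(sorted_column):
    """Self-order, many distinct values: first value, then deltas."""
    if not sorted_column:
        return []
    return [sorted_column[0]] + [b - a for a, b in zip(sorted_column, sorted_column[1:])]


def delta_decode(deltas):
    values = []
    for d in deltas:
        values.append(d if not values else values[-1] + d)
    return values


runs = rle_encode([1, 1, 1, 2, 2, 5])
bitmaps = bitmap_encode(["b", "a", "b"])
deltas = delta_encode([100, 103, 104, 110])
```

Run-length encoding pays off only when the column is sorted, which is why the choice of encoding depends on whether the column is in self-order or foreign-order.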
Write Store
 Same structure, but explicitly use
  (segment, key) to identify records
   Easier to maintain the mapping
   Only concerns the inserted records
 Tuple mover
   Copies batch of records to RS

 Delete record
   Mark it on RS
   Purged by tuple mover
Tuple mover
 Moves records in WS to RS
 Happens between read-only transactions
 Use merge-out process
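The merge-out step can be pictured as building a new read-store segment from the old one plus a batch from the write store, dropping records marked as deleted along the way. This is a rough sketch under simplifying assumptions (records are `(key, value)` tuples, one segment, no concurrency); `merge_out` and its arguments are hypothetical names.

```python
def merge_out(read_store, write_store, deleted_keys):
    """Return (new read store, emptied write store) after moving one batch."""
    # Purge records that were marked as deleted in the read store.
    survivors = [(k, v) for k, v in read_store if k not in deleted_keys]
    # Merge in the write-store batch, restoring sort-key order.
    new_rs = sorted(survivors + write_store)
    return new_rs, []


rs = [(1, "a"), (4, "d"), (7, "g")]     # sorted read store
ws = [(5, "e"), (2, "b")]               # unsorted recent inserts
new_rs, new_ws = merge_out(rs, ws, deleted_keys={4})
```

Because the result is written as a new segment instead of updating the old one in place, readers can keep using the old segment until the new one is ready.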
How to solve read/write conflict
 Situation: one transaction updates the
  record X, while another transaction
  reads X.
 Use snapshot isolation
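Under snapshot isolation a reader sees the data as of a fixed point in time, so a concurrent update to X does not block it or change what it reads. A minimal single-threaded sketch of the idea; `VersionedStore` and its logical clock are hypothetical and stand in for whatever timestamping a real system uses.

```python
class VersionedStore:
    """Keeps every committed version so readers can read as of a past time."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_time, value), append-only
        self.clock = 0       # logical commit counter

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """A reader's view is fixed at the current logical time."""
        return self.clock

    def read(self, key, as_of):
        visible = [v for t, v in self.versions.get(key, []) if t <= as_of]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("X", 10)
snap = store.snapshot()          # read-only transaction starts here
store.write("X", 20)             # concurrent update to X

seen_by_reader = store.read("X", as_of=snap)
seen_by_new_reader = store.read("X", as_of=store.snapshot())
```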
Benefits in query processing
 Selection – has more indices to use
 Projection – some “projections” already exist
 Join – some projections are materialized
 Aggregations – works on required
  columns only
 Use TPC-H – decision support queries
 Storage
Query performance
 Row store uses materialized views
Summary: the performance gain
 Column representation – avoids reads of
  unused attributes
 Storing overlapping projections –
  multiple orderings of a column, more
  choices for query optimization
 Compression of data – more orderings
  of a column in the same amount of
  space
 Query operators operate on compressed
  representation
