C-Store: An Introduction to Berkeley DB


    Jianlin Feng
    School of Software
    SUN YAT-SEN UNIVERSITY
    Mar. 13, 2009
Overview of Berkeley DB

   Means the Berkeley Database
       An open-source, embedded transactional data
        management system
       A key/value store
   Embedded?
       Runs as a library that is linked into the application
       Hides data management from the end user
   Scales from bytes to petabytes
   Runs on everything from cell phones to large
    servers.
Berkeley DB: Examples of Applications

   Google Accounts
       Store all user and service account information and
        preferences.


   Amazon’s user-customization

   Berkeley DB has high reliability and high
    performance.
Berkeley DB: A Brief History (1)

   Began life in 1991 as a dynamic linear
    hashing implementation.
       Intended to replace the historic UNIX database
        libraries: dbm, ndbm, and hsearch
   Released as a library in 4.4BSD in 1992.
       db-1.85 == Hash + B-Tree

   The package LIBTP
       Transactional Implementation of db-1.85
       A research prototype that was never released.
Berkeley DB: A Brief History (2)

   In 1996, Seltzer and Bostic started Sleepycat
    Software.
       for use in the Netscape browser
   Berkeley DB 2.0, Released in 1997
       Transactional implementation
       the first commercial release
   Berkeley DB 3.0, Released in 1999
       Transformed into an Object-Oriented Handle and
        Method style API.
Berkeley DB: A Brief History (3)

   Berkeley DB 4.0, Released in 2001
       Single-Master, Multiple-Reader Replication
       High Availability
           replicas can take over for a failed master
       High Scalability
           Read-only replicas can reduce master load
       Similar ideas are adopted in C-Store.


   In Feb. 2006, Oracle acquired Sleepycat.
Sleepycat Public License:
 a Dual License
   The code
       Is open source
       And may be downloaded and used freely
   However, redistribution requires
       Either the package using Berkeley DB be
        released as open source
       Or that the distributors obtain a commercial
        license from Sleepycat (now Oracle, which acquired
        Sleepycat in Feb. 2006).
Berkeley DB: Product Family Today

   The original Berkeley DB library
   Berkeley DB XML
       Atop the library
   Berkeley DB Java Edition
       100% pure Java implementation
Berkeley DB:
Product Family Architecture
Berkeley DB: The Design Philosophy

   Provide mechanisms without specifying
    policies

   For example, Berkeley DB is abstracted as a
    store of <key, value> pairs.
       Both keys and values are opaque byte-strings.
       That is, Berkeley DB itself has no schema.
       The application that embeds Berkeley DB is
        responsible for imposing its own schema on the
        data.
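
The idea that the application, not the store, owns the schema can be sketched as follows. This is a minimal illustration, not Berkeley DB's API: a plain Python dict stands in for the key/value store, and the record layout (`ACCOUNT`) and the helper names `encode_account` and `decode_account` are hypothetical.

```python
import struct

# Hypothetical record layout chosen by the application, not by the store:
# 4-byte unsigned id, 8-byte float balance, 16-byte fixed-width name.
ACCOUNT = struct.Struct("<Id16s")

def encode_account(acct_id, balance, name):
    """Serialize one account into an opaque byte-string value."""
    return ACCOUNT.pack(acct_id, balance, name.encode("utf-8"))

def decode_account(raw):
    """Re-impose the application's schema on the stored bytes."""
    acct_id, balance, name = ACCOUNT.unpack(raw)
    return acct_id, balance, name.rstrip(b"\x00").decode("utf-8")

# A plain dict stands in for the key/value store; it only ever sees bytes.
store = {}
key = struct.pack("<I", 42)
store[key] = encode_account(42, 99.5, "alice")

record = decode_account(store[key])
```

The store never inspects the value; only `decode_account` knows that the 28 bytes hold an id, a balance, and a name.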
Advantages of <key, value> pairs

   An application is free to store data in
    whatever form is most natural to it.
       Objects (like structures in C language)
       Rows in Oracle, SQL Server
       Columns in C-Store

   Different data formats can be stored in the
    same database.
       As long as the application understands how to
        interpret the data items.
Indexing Key Values

   Indexing methods
       B-Tree
       Hash
       Queue
       Recno: a record-number-based index implemented
        atop B-Tree
   Data manipulation
       Put: store a key/value pair
       Get: retrieve a key/value pair
       Delete: remove a key/value pair
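
The three operations can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not Berkeley DB's API: the class `ToyOrderedStore` and its method names are hypothetical, and a sorted key list only imitates the key ordering that a B-Tree index would provide.

```python
import bisect

class ToyOrderedStore:
    """In-memory stand-in for a B-Tree-indexed database (not Berkeley DB's API)."""

    def __init__(self):
        self._keys = []     # sorted list of keys, mimicking B-Tree key order
        self._values = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        """Store a pair, overwriting any existing value for the key."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        """Return the value for the key, or None if it is absent."""
        return self._values.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove the pair; return False if the key was not present."""
        if key not in self._values:
            return False
        del self._values[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))
        return True

    def keys(self):
        """Return the keys in sorted order."""
        return list(self._keys)

db = ToyOrderedStore()
db.put(b"pear", b"3")
db.put(b"apple", b"1")
db.put(b"apple", b"2")      # a second put on the same key overwrites
found = db.get(b"apple")
removed = db.delete(b"pear")
missing = db.get(b"pear")
```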
How Do Applications Access Key/Value Pairs?

   Through handles on databases
       Similar to relational tables
   Or through cursor handles
       Representing a specific place within a database
       Used for iteration, fetching one key/value pair at
        a time.
   Databases are implemented atop the OS file
    system.
       A file may contain one or more databases.
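
A cursor can be pictured as a remembered position in the key order. The sketch below is a toy model with hypothetical names (`ToyCursor`, `next`, `set_range`), not the Berkeley DB cursor interface; it assumes the pairs are kept sorted by key, as they would be under a B-Tree index.

```python
import bisect

class ToyCursor:
    """Tracks a position within a sorted sequence of (key, value) pairs."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)
        self._pos = -1  # positioned before the first pair

    def current(self):
        """Return the pair at the cursor, or None if out of range."""
        if 0 <= self._pos < len(self._pairs):
            return self._pairs[self._pos]
        return None

    def next(self):
        """Advance one pair; return it, or None when past the end."""
        if self._pos < len(self._pairs):
            self._pos += 1
        return self.current()

    def set_range(self, key):
        """Move to the first pair whose key is >= key; return it or None."""
        keys = [k for k, _ in self._pairs]
        self._pos = bisect.bisect_left(keys, key)
        return self.current()

cur = ToyCursor([(b"b", b"2"), (b"a", b"1"), (b"d", b"4")])
seen = []
pair = cur.next()
while pair is not None:     # iterate: one key/value pair per call
    seen.append(pair)
    pair = cur.next()
```

Because the cursor holds a position rather than a key, it can also be repositioned, for example with `cur.set_range(b"c")`, which lands on the first pair at or after that key.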
Berkeley DB Replication:
A Log-Shipping System
   A Replication Group
       A single Master
       One or more Read-Only Replicas.
   All write operations must be processed
    transactionally by the Master
   The Master sends log records to each of the
    Replicas.
   The Replicas apply log records only when
    they receive a transaction commit record.
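
The log-shipping rule above, where replicas hold back log records until the commit record arrives, can be sketched with a toy model. All names here (`ToyMaster`, `ToyReplica`, the tuple-shaped log records) are hypothetical, and the model ignores networking, failures, and master election.

```python
class ToyReplica:
    """Buffers shipped log records and applies them only at commit."""

    def __init__(self):
        self.data = {}
        self._pending = {}  # txn_id -> buffered (key, value) writes

    def receive(self, record):
        kind, txn_id = record[0], record[1]
        if kind == "put":
            _, _, key, value = record
            self._pending.setdefault(txn_id, []).append((key, value))
        elif kind == "commit":
            # Only now do the buffered writes become visible on the replica.
            for key, value in self._pending.pop(txn_id, []):
                self.data[key] = value

class ToyMaster:
    """Processes all writes transactionally and ships log records."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self._pending = {}
        self._next_txn = 1

    def _ship(self, record):
        for replica in self.replicas:
            replica.receive(record)

    def begin(self):
        txn_id = self._next_txn
        self._next_txn += 1
        return txn_id

    def put(self, txn_id, key, value):
        self._pending.setdefault(txn_id, []).append((key, value))
        self._ship(("put", txn_id, key, value))

    def commit(self, txn_id):
        for key, value in self._pending.pop(txn_id, []):
            self.data[key] = value
        self._ship(("commit", txn_id))

replica = ToyReplica()
master = ToyMaster([replica])
txn = master.begin()
master.put(txn, b"k", b"v")
before_commit = dict(replica.data)   # replica has the record but has not applied it
master.commit(txn)
after_commit = dict(replica.data)    # applied once the commit record arrives
```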
Berkeley DB: Configuration Flexibility

   Configuration flexibility is critical
       Due to a wide range of applications


   Three ways
       Compile Time Configuration
       Feature Set Selection
       Runtime Configuration
Compile Time Configuration
   Option 1:         small footprint build
       --enable-smallbuild
       For use in a cell phone
       The compiled library contains only the B-Tree index
       It omits replication, cryptography, statistics
        collection, etc. The library is about 0.5 MB.

   Option 2:         higher concurrency locking
       --enable-fine-grained-lock-manager
       For use in a Data Center
       Lock-Based Concurrency Control
Feature Set Selection

1.       The Data Store (DS) feature set
         Most similar to the original db-1.85 library
         Good for temporary data storage
2.       The Concurrent Data Store (CDS) feature set
         Acquires a single lock per API invocation
         Good for read-mostly applications
3.       The Transactional Data Store (TDS) feature set
         Currently the most widely used feature set
         Acquires a single lock per page
4.       The High Availability (HA) feature set
         Can continue running even after a site fails.
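
Why a single lock per API invocation suits read-mostly workloads can be seen in a small model. The sketch assumes the lock follows the usual multiple-reader/single-writer discipline; `ToyRWLock` and its methods are hypothetical and use non-blocking "try" calls so the behavior is easy to trace without threads.

```python
class ToyRWLock:
    """Non-blocking multiple-reader/single-writer lock (single-threaded model)."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        """Readers are admitted unless a writer holds the lock."""
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self):
        """A writer is admitted only when nobody else holds the lock."""
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def release_read(self):
        self.readers -= 1

    def release_write(self):
        self.writer = False

lock = ToyRWLock()
r1 = lock.try_read()
r2 = lock.try_read()          # a second reader shares the lock
w_blocked = lock.try_write()  # a writer must wait for both readers
lock.release_read()
lock.release_read()
w_ok = lock.try_write()
r_blocked = lock.try_read()   # readers wait while the writer is active
```

Readers never block each other, so a workload dominated by reads sees little contention; a write-heavy workload would serialize on the single lock, which is where page-level locking in the transactional feature set becomes attractive.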
Runtime Configuration

   Index Selection and Tuning
       Applications can select the page size in an index
   Trading off Durability and Performance
       No-force log write
       Extreme case: applications can run completely in
        memory
   Trading off Two-Phase Locking and
    Multiversion Concurrency Control.
   Note: C-Store adopts similar ideas for high
    performance.
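
The durability/performance trade-off can be made concrete with a toy log. This is a sketch, not Berkeley DB's logging subsystem: `ToyLog` and its `sync_on_commit` switch are hypothetical, and "disk" is just a Python list that survives the simulated crash.

```python
class ToyLog:
    """Toy write-ahead log with a configurable sync-at-commit policy."""

    def __init__(self, sync_on_commit=True):
        self.sync_on_commit = sync_on_commit
        self.buffer = []   # in-memory log records, lost on a crash
        self.disk = []     # records that reached stable storage
        self.syncs = 0     # number of (expensive) flushes performed

    def commit(self, record):
        self.buffer.append(record)
        if self.sync_on_commit:
            self.sync()

    def sync(self):
        """Flush buffered records to stable storage."""
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.syncs += 1

    def crash(self):
        """Simulate a crash: buffered records vanish; return what survives."""
        self.buffer.clear()
        return list(self.disk)

durable = ToyLog(sync_on_commit=True)
fast = ToyLog(sync_on_commit=False)
for i in range(3):
    durable.commit(i)
    fast.commit(i)
fast_syncs_before_crash = fast.syncs
survived_durable = durable.crash()
survived_fast = fast.crash()
```

The durable configuration pays one flush per commit and loses nothing; the fast configuration performs no flushes and loses every commit made since the last sync.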
Challenges of Berkeley DB’s Flexibility

   Demands flexibility from the Berkeley DB designers

   Demands flexibility from application developers
Any Dream? Any Idea?

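   iGoogle China University Student Innovation Design Competition

   The 4th Software Innovation Design Competition, School of Software, Sun Yat-sen University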

   Some Research with Me?
References

   M. Seltzer. Berkeley DB: A Retrospective.
    IEEE Data Engineering Bulletin, Volume 30,
    Number 3, pp. 21-28, September 2007.
   M. A. Olson, K. Bostic, M. Seltzer. Berkeley DB.
    USENIX Annual Technical Conference, pp.
    183-192, June 6-11, 1999, Monterey,
    California, USA.
   Oracle Berkeley DB Site.
    http://www.oracle.com/technology/products/berkeley-db

						