					B-tree File Stack – Berkeley DB                                                        21 May 2007

Benoit Maréchal


                                          Berkeley DB

Introduction

The B-Tree file stack uses the Berkeley DB software to handle each operation.
Berkeley DB is a high-performance, embedded database library with bindings in C, C++, Java
and many other programming languages.
It can be used to build a database engine, in this case a B-Tree engine, which is then
incorporated into the B-Tree file stack. An application stores and retrieves data through simple
function calls rather than by sending messages to a remote server. This is efficient because the
Berkeley DB library is compiled and link-edited into the same executable as the application.


History

Berkeley DB was developed at the University of California, Berkeley, as part of the transition
from Berkeley Software Distribution 4.3 to 4.4 and the effort to remove AT&T-encumbered code.
The authors of Berkeley DB were later asked by Netscape to improve and extend the library, then
at version 1.85, to suit their requirements for an LDAP server and for use in the Netscape
browser. That request led to the creation of Sleepycat Software, which was acquired by Oracle
Corporation in February 2006.
Berkeley DB is redistributed under the Sleepycat Public License, which conforms to the Open
Source Definition. This license guarantees that Berkeley DB is freely available for use and
redistribution in other Open Source applications.


Key and payload

Storage and retrieval for the Berkeley DB access methods are based on key and data pairs. The
key is a unique byte stream which identifies a data item.

In Berkeley DB, both key and data items are represented by the DBT data structure (a plain C
structure, not an object). These structures are what Berkeley DB actually stores, retrieves,
updates and deletes.
The structure contains 6 fields. The main fields are data, which holds the byte string of the key
or of the data item, and size, which holds the length of this byte string.

All the elements of the DBT structure are defined as follows:

void *data: A pointer to a byte string.

u_int32_t size: The length of data, in bytes.

u_int32_t ulen: The size of the user's buffer (to which data refers), in bytes. This location is not
written by the Berkeley DB functions.


u_int32_t dlen: The length of the partial record being read or written by the application, in bytes.
See the DB_DBT_PARTIAL flag for more information.

u_int32_t doff: The offset of the partial record being read or written by the application, in bytes.

u_int32_t flags: The flags parameter must be set to 0, or to a bitwise OR of option flags such as
DB_DBT_PARTIAL.
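
As a rough illustration of how an application fills in these fields, the sketch below zeroes a
structure and then sets data and size. So that it compiles without Berkeley DB installed, it uses
a stand-in type, sketch_dbt, and a helper, sketch_dbt_set; both names are hypothetical and are
not part of the library. Real code would include db.h and use DBT itself, which may carry
additional fields depending on the version.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for Berkeley DB's DBT, limited to the six fields listed above.
 * Hypothetical: real code would include <db.h> and use DBT directly.
 * uint32_t is used here in place of the library's u_int32_t. */
typedef struct {
    void    *data;   /* pointer to the byte string */
    uint32_t size;   /* length of data, in bytes */
    uint32_t ulen;   /* size of the user's buffer, in bytes */
    uint32_t dlen;   /* length of the partial record, in bytes */
    uint32_t doff;   /* offset of the partial record, in bytes */
    uint32_t flags;  /* 0 unless an option is requested */
} sketch_dbt;

/* Zero every field first, so that ulen, dlen, doff and flags are 0,
 * then point the structure at an existing byte string. The bytes are
 * not copied: the caller must keep them alive while the structure is used. */
void sketch_dbt_set(sketch_dbt *t, const void *bytes, uint32_t nbytes)
{
    memset(t, 0, sizeof(*t));
    t->data = (void *)bytes;
    t->size = nbytes;
}
```

For example, a record keyed by an integer id could be prepared with
sketch_dbt_set(&key, &id, sizeof id), and a string payload with
sketch_dbt_set(&data, name, strlen(name) + 1), so that the terminating NUL is stored too.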


Berkeley DB Handle

To perform operations on a database, the Berkeley DB library provides a structure named DB.
Note that "database" is the name Berkeley DB gives to a relation.
The DB structure refers to a single Berkeley DB database; it is the handle for that database.
It provides all the methods used to manage the database, such as open to open the database, put
to insert a record and get to retrieve a record. The handle is also how each operation knows
which particular database is involved (i.e. into which one a record is inserted, from which one
it is retrieved, etc.).
Thus, the B-Tree File Stack uses this handle to process all operations.
The DB handle is created by the db_create method of the Berkeley DB library, and several DB
handles can be created by calling db_create several times. Since all the B-Tree File Stack
functions require the DB handle of the database on which the operation is to be performed, it is
possible to manage more than one database with the B-Tree File Stack.
Note that a database should not be closed while it is being used.


Practical points

       Choice of the library used

Berkeley DB provides several libraries. The B-Tree File Stack uses the C library, because it is
the easiest way to integrate the file stack into the Raquel DBMS Server.


       Choice of the version used

Berkeley DB offers many advanced features such as replication, statistics, cryptography and
more. But for the B-Tree File Stack we only need the B-Tree feature. This is why I chose to
compile a small-build version of Berkeley DB, which disables all the features that are not needed
to manage B-trees.
See the appendices “How to build a small version of Berkeley DB” and “How much smaller is the
small build than the entire version” for more details.


Why it’s good to use

The Berkeley DB library is one of the most widely deployed database engines in the world. This
open source library gives developers simple function calls to manage a database. Well-known
database management systems have used it as a database engine; MySQL, for example, has offered
Berkeley DB as one of its storage engines.


Berkeley DB was designed to provide industrial-strength database services to application
developers. It is a classic C-library toolkit, providing a broad base of functionality to application
writers.
The Berkeley DB library is very portable. It runs under almost all UNIX and Linux variants,
Windows, and a number of embedded real-time operating systems. It runs on both 32-bit and
64-bit systems. It has been deployed on Internet servers, desktop machines and elsewhere.
Berkeley DB is also a good choice when we need fairly simple key/value lookups. For such
lookups it can run faster than an SQL engine, whose queries must be parsed before they are
executed.
Finally, Berkeley DB is scalable in a number of respects. The database library itself is compact,
but it can manage relations up to 256 terabytes in size. It also supports concurrency, with
thousands of users operating on the same database at the same time. Berkeley DB is small
enough to run in tightly constrained embedded systems, but can take advantage of gigabytes of
memory and terabytes of disk on server machines.


Reference

See the official website of Berkeley DB for more details:
       http://www.oracle.com/database/berkeley-db/index.html
The Wikipedia definition:
       http://en.wikipedia.org/wiki/Berkeley_DB



