INLS 760- Web Databases NOSQL

					 INLS 760- Web Databases

                        O. Alaba
                        Doctoral Student
                        School of Information and Library Science
                        University of North Carolina at Chapel Hill
                        13 April 2010

Special Thanks to Matt Thomas for his contribution to this presentation.
What is NOSQL?

DEFINITION: Next-generation databases mostly address some of the
following points: being non-relational, distributed, open-source,
and horizontally scalable. The original intention was modern
web-scale databases. The movement began in early 2009 and is
growing rapidly. Often more characteristics apply, such as: schema-free,
replication support, easy API, eventual consistency, and more. So
the misleading term "NOSQL" (which the community now mostly
translates as "not only SQL") should be seen as an alias for something
like the definition above.

What is NOSQL?
• “Not only SQL”

• Non-relational: data is not kept in relational tables (a flat
       file database is one simple example).

• Horizontally scalable: new nodes or modules can be
       easily added for more users.

• Vertically scalable: a single node can be given more resources
       (CPU, memory, storage) to hold more information.

• Structured storage: usually a collection of tables of
       structured data (like a hash table or a dictionary) .
What is NOSQL?
• NOSQL was first developed in the late 1990’s by Carlo
• No ‘join’ operator.
• No ACID (atomicity, consistency, isolation, durability)
• No predefined schema
• No need to map object-oriented designs into a
      relational model.
• Examples: Google’s BigTable, Amazon’s Dynamo.
      Cassandra (used in Facebook’s inbox search) and
      HBase (Apache) are open source
ACID (for relational databases)
Requires a lot of resources, and does not scale very well.

• ATOMICITY: All or nothing

• CONSISTENCY: Any transaction should result in valid tables.

• ISOLATION: Concurrent transactions do not interfere with each other.

• DURABILITY: Database will survive a system failure.
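
As a minimal sketch of the "all or nothing" idea, assuming a hypothetical in-memory table of account balances (the class and method names are illustrative and not taken from any database library), a transfer either applies both updates or leaves the data untouched:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "table" of balances, used only to illustrate atomicity.
class AtomicTransferSketch {
    private final Map<String, Integer> balances = new HashMap<>();

    void open(String account, int amount) {
        balances.put(account, amount);
    }

    int balance(String account) {
        return balances.get(account);
    }

    // Moves money between two accounts. On any failure the snapshot is
    // restored, so either both updates happen or neither does.
    boolean transfer(String from, String to, int amount) {
        Map<String, Integer> snapshot = new HashMap<>(balances);
        try {
            balances.put(from, balances.get(from) - amount);
            if (balances.get(from) < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            balances.put(to, balances.get(to) + amount);
            return true;
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot); // roll back to the state before the transfer
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicTransferSketch db = new AtomicTransferSketch();
        db.open("alice", 100);
        db.open("bob", 20);
        System.out.println(db.transfer("alice", "bob", 150)); // rejected and rolled back
        System.out.println(db.balance("alice") + " " + db.balance("bob"));
    }
}
```

A real relational database achieves this with logging and locking rather than a full copy of the data; the snapshot here only keeps the sketch short.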
Problems with Relational DBs
• Poor Scalability!
       Digg: 3 TB for "Green Badges"
       Facebook: 50 TB for Inbox Search
       Ebay: 2 PB overall

• Poor server-to-server performance

• Rigid schema design
CAP (for NOSQL databases)
For easy scalability
• CONSISTENCY : All database clients see the same data,
        even with concurrent updates.

• AVAILABILITY : All database clients are always able to read
        and write data; every request gets a response.

• PARTITION TOLERANCE : The database can be split over
        multiple servers and keeps working when the network
        between them fails.

Different Levels of Consistency
• Strict Consistency (one copy serializability)
• Read your write consistency
• Session consistency
• Monotonic Read Consistency
• Eventual Consistency

There are a number of consistency models.
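
To make the two ends of that list concrete, here is a small sketch assuming a hypothetical store with two replicas, where `replicate()` stands in for asynchronous replication. All names are illustrative. A read from the secondary can be stale until replication catches up, which is the essence of eventual consistency:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-replica store: writes land on the primary and reach the
// secondary only when replicate() runs.
class EventualConsistencySketch {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> secondary = new HashMap<>();

    void write(String key, String value) {
        primary.put(key, value);
    }

    // Served by the replica that accepted the write, so it always sees it.
    String readFromPrimary(String key) {
        return primary.get(key);
    }

    // May return stale data (or null) until replication has run.
    String readFromSecondary(String key) {
        return secondary.get(key);
    }

    // Stand-in for background replication between servers.
    void replicate() {
        secondary.putAll(primary);
    }

    public static void main(String[] args) {
        EventualConsistencySketch store = new EventualConsistencySketch();
        store.write("status", "online");
        System.out.println(store.readFromSecondary("status")); // stale read
        store.replicate();
        System.out.println(store.readFromSecondary("status")); // now up to date
    }
}
```

Session and read-your-writes consistency sit between these extremes: they would route a client's reads to a replica known to hold that client's own writes.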
Types of NOSQL Structures

Document Store: focuses on documents. Documents can vary
in size; information can always be added to them.
         eg. FirstName="Jonathan", Address="15 Wanamassa
         Point Road", Children=("Michael,10", "Jennifer,8",
         "Samantha,5", "Elena,2")

• Large collections of items organized into domains.
• Items are little hash tables containing attributes as key/value
pairs.
• Attributes can be searched with various lexicographical
• eg. MongoDB, Apache’s CouchDB, and Amazon’s SimpleDB.
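
The "items are little hash tables" idea can be sketched with plain maps. This is only an illustration using the JDK, not the API of any of the products named above. The class name, the document ids, and the second document are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document store: each document is a map of attribute name to
// value, and documents do not have to share the same attributes.
class DocumentStoreSketch {
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    void put(String id, Map<String, Object> document) {
        documents.put(id, new LinkedHashMap<>(document)); // mutable copy
    }

    Map<String, Object> get(String id) {
        return documents.get(id);
    }

    // Returns the ids of all documents whose attribute equals the given value.
    List<String> findByAttribute(String name, Object value) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : documents.entrySet()) {
            if (value.equals(entry.getValue().get(name))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        DocumentStoreSketch store = new DocumentStoreSketch();
        store.put("doc1", Map.of(
                "FirstName", "Jonathan",
                "Address", "15 Wanamassa Point Road",
                "Children", List.of("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2")));
        // A made-up second document with a different set of attributes.
        store.put("doc2", Map.of("FirstName", "Ada", "Office", "Room 12"));
        // No schema: an attribute can be added to one document at any time.
        store.get("doc1").put("Email", "jonathan@example.com");
        System.out.println(store.findByAttribute("FirstName", "Jonathan"));
    }
}
```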
Types of NOSQL Structures

Graph: is a network database that uses edges and nodes to
represent and store data.

• Social networking
• Represent the real world
• eg. Neo4j
Types of NOSQL Structures
eg. Neo4j
Written in Java; stores nodes and relationships, and each can
store properties in key-value form.

Types of NOSQL Structures
eg. Neo4j
To create a small graph:
Node firstNode = graphDb.createNode();
Node secondNode = graphDb.createNode();
Relationship relationship = firstNode.createRelationshipTo(
secondNode, MyRelationshipTypes.KNOWS );

firstNode.setProperty( "message", "Hello, " );
secondNode.setProperty( "message", "world!" );
relationship.setProperty( "message", "brave Neo4j " );

The graph will look like this:
(firstNode)---KNOWS--->(secondNode)
Types of NOSQL Structures
eg. Neo4j
 Printing information from the graph:
 System.out.print( firstNode.getProperty( "message" ) );
 System.out.print( relationship.getProperty( "message" ) );
 System.out.print( secondNode.getProperty( "message" ) );

 Printing will result in:
 Hello, brave Neo4j world!
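
The Neo4j calls above need the Neo4j library and an open `graphDb`. As a self-contained sketch of the same property-graph idea using only the JDK, the classes below are hypothetical stand-ins that borrow the method names from the slide. They are not Neo4j's implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory property graph: nodes and relationships both carry
// key-value properties, as described for Neo4j above.
class PropertyGraphSketch {
    static class Node {
        private final Map<String, Object> properties = new HashMap<>();
        private final List<Relationship> outgoing = new ArrayList<>();

        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }

        Relationship createRelationshipTo(Node end, String type) {
            Relationship relationship = new Relationship(end, type);
            outgoing.add(relationship);
            return relationship;
        }
    }

    static class Relationship {
        private final Node end;
        private final String type;
        private final Map<String, Object> properties = new HashMap<>();

        Relationship(Node end, String type) {
            this.end = end;
            this.type = type;
        }

        String getType() { return type; }
        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }
    }

    // Follows the first outgoing relationship of each node and concatenates
    // the "message" properties met along the way; stops at a revisited node.
    static String walkMessages(Node start) {
        StringBuilder text = new StringBuilder();
        Set<Node> seen = new HashSet<>();
        Node current = start;
        while (current != null && seen.add(current)) {
            text.append(current.getProperty("message"));
            if (current.outgoing.isEmpty()) {
                break;
            }
            Relationship next = current.outgoing.get(0);
            text.append(next.getProperty("message"));
            current = next.end;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Node firstNode = new Node();
        Node secondNode = new Node();
        Relationship relationship = firstNode.createRelationshipTo(secondNode, "KNOWS");
        firstNode.setProperty("message", "Hello, ");
        secondNode.setProperty("message", "world!");
        relationship.setProperty("message", "brave Neo4j ");
        System.out.println(walkMessages(firstNode)); // Hello, brave Neo4j world!
    }
}
```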
Types of NOSQL Structures
Key/Value: Uses hash tables, where keys can map to multiple

More complex key/value example: Cassandra
• “Column Families.”
• Keys have different numbers of columns, so the database
        can scale in an irregular way.
• Simple and Super: super columns are columns within
• RDBMS/NOSQL hybrid
• eg. Cassandra, developed at Facebook for inbox search.
   Types of NOSQL Structures
   eg. Cassandra

 •Keyspace: Usually the name of the
 application; eg, 'Twitter', 'Wordpress'.

 •Column family: data based on key

 •Key: name of record

 •You can choose either (defined at
      A column: a tuple with name and
      A super column: Super columns
      are a great way to store one-to-
      many indexes to other records

 Types of NOSQL Structures
 Cassandra Comparison
 In Cassandra's Ruby API, parameters are expressed in storage order, for clarity:

 Relational                      SELECT `column` FROM `database`.`table` WHERE `id` = key;
 BigTable                        table.get(key, "column_family:column")
 Cassandra: standard model       keyspace.get("column_family", key, "column")
 Cassandra: super column model   keyspace.get("column_family", key, "super_column", "column")
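
One way to picture the standard model is as nested sorted maps: column family, then row key, then column name. The sketch below is an assumption-laden illustration in plain Java, not Cassandra's client API. The class name and sample data are made up, and only the argument order of `get` follows the comparison above:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the standard layout:
// column family -> row key -> column name -> value.
class KeyspaceSketch {
    private final Map<String, Map<String, Map<String, String>>> columnFamilies = new TreeMap<>();

    void insert(String columnFamily, String key, String column, String value) {
        columnFamilies
                .computeIfAbsent(columnFamily, cf -> new TreeMap<>())
                .computeIfAbsent(key, k -> new TreeMap<>())
                .put(column, value);
    }

    // Arguments in storage order: column family, then key, then column.
    // Returns null when any level is missing.
    String get(String columnFamily, String key, String column) {
        Map<String, Map<String, String>> rows = columnFamilies.get(columnFamily);
        if (rows == null) {
            return null;
        }
        Map<String, String> columns = rows.get(key);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        KeyspaceSketch twitter = new KeyspaceSketch();
        // Rows in the same column family may hold different columns.
        twitter.insert("Users", "alice", "email", "alice@example.com");
        twitter.insert("Users", "alice", "city", "Chapel Hill");
        twitter.insert("Users", "bob", "email", "bob@example.com");
        System.out.println(twitter.get("Users", "alice", "city"));
        System.out.println(twitter.get("Users", "bob", "city")); // bob has no such column
    }
}
```

The super column model would add one more map level between the row key and the column name.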
Other Types of NOSQL Structures

  Tabular: similar to traditional databases.
  • eg. Google’s BigTable and Apache HBase.

  Object-Oriented: database of objects or documents.
Shared Characteristics
•Allows for scalability
•Key/value store
• Run on large number of machines
• Data are partitioned and replicated among these machines
• Relaxes the data consistency requirement (because the CAP
        Theorem states that you cannot guarantee Consistency,
        Availability and Partition tolerance all at the same time).
• Fast. Averages with 50 GB of data:
                       Writes       Reads
         MySQL         ~300 ms      ~350 ms
         Cassandra     0.12 ms      15 ms

• from:,
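
As a rough sketch of "partitioned and replicated", assuming a fixed number of nodes numbered from 0, a key can be hashed to a home node and copied to the nodes that follow it. The function name and scheme are illustrative. Real systems such as Dynamo and Cassandra use more elaborate consistent hashing:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement rule: hash the key to pick a home node, then
// replicate to the next nodes in ring order.
class PartitionSketch {
    static List<Integer> nodesFor(String key, int nodeCount, int replicas) {
        int first = Math.floorMod(key.hashCode(), nodeCount);
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < Math.min(replicas, nodeCount); i++) {
            nodes.add((first + i) % nodeCount);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // The same key always maps to the same three of the five nodes.
        System.out.println(nodesFor("user:42", 5, 3));
        System.out.println(nodesFor("user:43", 5, 3));
    }
}
```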
Shared Characteristics-API
The underlying data model can be considered as a large
Hashtable (key/value store).

Basic API access:
• get(key) -- Extract the value given a key
• put(key, value) -- Create or Update the value given its key
• delete(key) -- Remove the key and its associated value
• execute(key, operation, parameters) -- Invoke an operation
        to the value (given its key) which is a special data
        structure (e.g. List, Set, Map .... etc).
• mapreduce(keyList, mapFunc, reduceFunc) -- Invoke a
        map/reduce function across a key range.
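
The five operations above can be sketched over an ordinary hash map. This is a single-machine illustration with hypothetical signatures. No particular NOSQL product exposes exactly this interface:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical key/value store backed by one in-memory hash table.
class KeyValueStoreSketch {
    private final Map<String, Object> data = new HashMap<>();

    Object get(String key) { return data.get(key); }

    void put(String key, Object value) { data.put(key, value); }

    void delete(String key) { data.remove(key); }

    // execute: applies an operation to the stored value, if the key exists.
    void execute(String key, UnaryOperator<Object> operation) {
        data.computeIfPresent(key, (k, value) -> operation.apply(value));
    }

    // mapreduce: maps the value of each listed key, then folds the results.
    <R> R mapReduce(List<String> keys, Function<Object, R> mapFunc,
                    R identity, BinaryOperator<R> reduceFunc) {
        R result = identity;
        for (String key : keys) {
            Object value = data.get(key);
            if (value != null) {
                result = reduceFunc.apply(result, mapFunc.apply(value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        KeyValueStoreSketch store = new KeyValueStoreSketch();
        store.put("a", 1);
        store.put("b", 2);
        store.execute("a", value -> (Integer) value + 10);
        Integer total = store.mapReduce(List.of("a", "b"), value -> (Integer) value, 0, Integer::sum);
        System.out.println(total); // 13
    }
}
```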
 • CouchDB views are stored as rows which are kept sorted
 by key.
 • Can adapt to variations in document structure.
 • MapReduce (based on simple range requests against the
 • Best to build an index that stores related data under
 nearby keys.
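
Why nearby keys help can be shown with a sorted map: when rows are kept in key order, everything sharing a key prefix is one contiguous range. The sketch uses `java.util.TreeMap` as a stand-in for a sorted view index. It is not how CouchDB stores views, and the key layout and the banana row are assumptions made for the example:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical view index: rows sorted by key, queried by key range.
class SortedViewSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void emit(String key, String value) {
        rows.put(key, value);
    }

    // Returns all rows with startKey <= key < endKey, in key order.
    SortedMap<String, String> range(String startKey, String endKey) {
        return rows.subMap(startKey, endKey);
    }

    public static void main(String[] args) {
        SortedViewSketch view = new SortedViewSketch();
        // Related rows share the "item:" prefix, so they sort next to each other.
        view.emit("apple:Fresh Mart", "1.59");
        view.emit("apple:Price Max", "5.99");
        view.emit("banana:Fresh Mart", "0.25");
        // ';' is the character right after ':', so this range covers "apple:*".
        System.out.println(view.range("apple:", "apple;"));
    }
}
```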

 • curl issues a GET request by default

 Get a list of databases:
 curl -X GET

 Create a database:
 curl -X PUT

 Delete the second database:
 curl -X DELETE
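
The same three requests can be built from Java with the JDK's `java.net.http` classes. The URLs in the curl commands above are not shown, so this sketch assumes a CouchDB server at its default local address `http://localhost:5984`, the `/_all_dbs` endpoint for listing databases, and a made-up database name `fruit`. It only constructs the requests and does not send them:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) the three database-level requests.
class CouchRequestsSketch {
    // Assumption: local CouchDB on its default port.
    static final String BASE = "http://localhost:5984";

    // GET /_all_dbs lists the databases on the server.
    static HttpRequest listDatabases() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_all_dbs")).GET().build();
    }

    // PUT /{name} creates a database; no request body is needed.
    static HttpRequest createDatabase(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // DELETE /{name} removes a database.
    static HttpRequest deleteDatabase(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).DELETE().build();
    }

    public static void main(String[] args) {
        for (HttpRequest request : new HttpRequest[] {
                listDatabases(), createDatabase("fruit"), deleteDatabase("fruit")}) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually send one of these, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` with a CouchDB server running.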

     "_id" :
     "_rev" : "2612672603",
     "item" : "apple",
     "prices" : {
        "Fresh Mart" : 1.59,
        "Price Max" : 5.99,
        "Apples Express" : 0.79
Advantages of NOSQL

• Cheap, easy to implement
• Data are replicated and can be partitioned
• Easy to distribute
• Don't require a schema
• Can scale up and down
• Quickly process large amounts of data
• Relax the data consistency requirement (CAP)
Disadvantages of NOSQL
 • New and sometimes buggy
• Data is generally duplicated, potential for inconsistency
• No standardized schema
• No standard format for queries
• No standard language
• Most NOSQL systems avoid in-memory storage
• Difficult to impose complicated structures
• Depend on the application layer to enforce data integrity
• No guarantee of support
The Big Deal
• RDBMSs don't work well for the web
        the end?
• Big names supporting NOSQL
• Not the end of the relational database, but big changes
• Questions?
