INLS 760: Web Databases
NOSQL


                        O. Alaba
                        Doctoral Student
                        School of Information and Library Science
                        University of North Carolina at Chapel Hill
                        13 April 2010



Special Thanks to Matt Thomas for his contribution to this presentation.
What is NOSQL?

DEFINITION: Next Generation Databases address some of the
following points: being non-relational, distributed, open-source
and horizontally scalable. The original intention has been modern
web-scale databases. The movement began in early 2009 and is
growing rapidly. Often more characteristics apply, such as: schema-free,
replication support, easy API, eventual consistency, and more. So
the misleading term "NOSQL" (the community now translates it
mostly with "not only sql") should be seen as an alias to something
like the definition above.

from: NOSQL-databases.org
What is NOSQL?
• “Not only SQL”

• Non-relational: data is not organized into related tables
       that are joined at query time.

• Horizontally scalable: new nodes or modules can be
       easily added to serve more users and more data.

• Vertically scalable: a single node can be given more
       resources (CPU, memory, disk).

• Structured storage: usually a collection of tables of
       structured data (like a hash table or a dictionary).
What is NOSQL?
• The name "NoSQL" was first used in 1998 by Carlo
        Strozzi, for a lightweight relational database without
        an SQL interface. The current meaning dates from 2009.
• No 'join' operator.
• Usually no full ACID (atomicity, consistency, isolation,
        durability) guarantees.
• No predefined schema.
• No need to map object-oriented designs into a
      relational model.
• Examples: Google's BigTable, Amazon's Dynamo.
      Cassandra (used in Facebook's inbox search) and
      HBase (Apache) are open source.
ACID (for relational databases)
Requires a lot of resources, and does not scale very well.

• ATOMICITY: All or nothing; a transaction either completes fully or has no effect.

• CONSISTENCY: Any transaction should result in valid tables.

• ISOLATION: Concurrent transactions are kept separate and do not interfere with each other.

• DURABILITY: Committed data will survive a system failure.
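As a small illustration of the atomicity bullet above, the sketch below uses Python's standard-library `sqlite3` module. The `accounts` table, the account names, and the `transfer` helper are all made up for this example; it is one way to show "all or nothing", not a reference implementation.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the second transfer fails, the rollback undoes the debit as well as the credit, so no partial update is ever visible.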
Problems with Relational DBs
• Poor Scalability!
       Digg: 3 TB for "Green Badges"
       Facebook: 50 TB for Inbox Search
       Ebay: 2 PB overall

• Poor server-to-server performance

• Rigid schema design
CAP (for NOSQL databases)
For easy scalability
• CONSISTENCY : All database clients see the same data,
        even with concurrent updates.

• AVAILABILITY : All database clients are able to access
        some version of the data.

• PARTITION TOLERANCE : The database can be split over
        multiple servers.

from: http://books.couchdb.org/relax/intro/eventual-consistency
Different Levels of Consistency
• Strict Consistency (one copy serializability)
• Read your write consistency
• Session consistency
• Monotonic Read Consistency
• Eventual Consistency




from: http://horicky.blogspot.com/2009/11/NOSQL-patterns.html.
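To make the difference between eventual and read-your-writes consistency more concrete, here is a toy simulation. The `Replica` and `Cluster` classes, the version counter, and the `sync` step are all assumptions made for this sketch; real systems propagate updates in far more sophisticated ways.

```python
class Replica:
    """One copy of the data: key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        # Keep only the newest version of each key.
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))


class Cluster:
    """A group of replicas where writes reach replica 0 first."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # updates not yet copied to every replica
        self.clock = 0

    def write(self, key, value):
        """Accept the write on replica 0 and queue it for the others."""
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)
        self.pending.append((key, self.clock, value))
        return self.clock

    def sync(self):
        """Deliver all queued updates to every replica."""
        for key, version, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, version, value)
        self.pending.clear()

    def read(self, key, replica_index, min_version=0):
        """Read from one replica; if it is older than min_version,
        fall back to the freshest replica (read-your-writes)."""
        version, value = self.replicas[replica_index].read(key)
        if version < min_version:
            version, value = max((r.read(key) for r in self.replicas),
                                 key=lambda pair: pair[0])
        return version, value
```

A plain read from a lagging replica may return stale data until `sync` runs (eventual consistency), while a client that passes the version of its own last write as `min_version` always sees that write.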
Types of NOSQL Structures
There are a number of models.

Document Store: focuses on documents. Documents can vary
in size, and fields can be added to a document at any time.
         e.g. FirstName="Jonathan", Address="15 Wanamassa
         Point Road", Children=("Michael,10", "Jennifer,8",
         "Samantha,5", "Elena,2")

• Large collections of items organized into domains.
• Items are little hash tables containing attributes as
key/value pairs.
• Attributes can be searched with various lexicographical
queries.
• e.g. MongoDB, Apache's CouchDB, and Amazon's SimpleDB.
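The document-store idea above can be sketched with plain dictionaries. The `DocumentStore` class and its `insert`, `update`, and `find` methods are hypothetical names for this illustration and do not correspond to the API of MongoDB, CouchDB, or SimpleDB.

```python
class DocumentStore:
    """Toy schema-free store: each document is a dict with its own fields."""

    def __init__(self):
        self.docs = {}
        self.next_id = 1

    def insert(self, doc):
        """Store a copy of the document and return its generated id."""
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = dict(doc)
        return doc_id

    def update(self, doc_id, **fields):
        """Add or overwrite fields; no schema change is needed."""
        self.docs[doc_id].update(fields)

    def find(self, **criteria):
        """Return ids of documents whose fields equal every criterion."""
        return [doc_id for doc_id, doc in self.docs.items()
                if all(doc.get(k) == v for k, v in criteria.items())]
```

Two documents in the same store can have entirely different fields, and a field can be added to one document later without touching the others.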
Types of NOSQL Structures

Graph: is a network database that uses edges and nodes to
represent and store data.

• Social networking
• Represent the real world
• eg. Neo4j
Types of NOSQL Structures
e.g. Neo4j
Written in Java; stores nodes and relationships, and each can
store properties in key-value form.




 Video: http://blog.neo4j.org/2010/02/top-10-ways-to-get-to-know-neo4j.html
Types of NOSQL Structures
eg. Neo4j
To create a small graph:
// Create two nodes and connect them with a typed relationship.
Node firstNode = graphDb.createNode();
Node secondNode = graphDb.createNode();
Relationship relationship = firstNode.createRelationshipTo(
    secondNode, MyRelationshipTypes.KNOWS );

// Attach key-value properties to both nodes and to the relationship.
firstNode.setProperty( "message", "Hello, " );
secondNode.setProperty( "message", "world!" );
relationship.setProperty( "message", "brave Neo4j " );


The graph will look like this:
(firstNode )---KNOWS--->(secondNode)
Types of NOSQL Structures
eg. Neo4j
 Printing information from the graph:
 System.out.print( firstNode.getProperty( "message" ) );
 System.out.print( relationship.getProperty( "message" ) );
 System.out.print( secondNode.getProperty( "message" ) );

 Printing will result in:
 Hello, brave Neo4j world!
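For readers without a Neo4j installation, the same property-graph idea can be mimicked in a few lines of Python. The `Node`, `Relationship`, and `walk` names below are invented for this sketch and are not part of any Neo4j API; `walk` also assumes a simple chain without cycles.

```python
class Relationship:
    """A directed, typed edge that can carry its own properties."""

    def __init__(self, start, end, rel_type):
        self.start = start
        self.end = end
        self.type = rel_type
        self.properties = {}


class Node:
    """A vertex with key-value properties and outgoing relationships."""

    def __init__(self):
        self.properties = {}
        self.relationships = []

    def relate_to(self, other, rel_type):
        rel = Relationship(self, other, rel_type)
        self.relationships.append(rel)
        return rel


def walk(node):
    """Follow the first outgoing relationship from each node and
    concatenate the "message" properties found along the way."""
    parts = [node.properties["message"]]
    while node.relationships:
        rel = node.relationships[0]
        parts.append(rel.properties["message"])
        node = rel.end
        parts.append(node.properties["message"])
    return "".join(parts)


first, second = Node(), Node()
knows = first.relate_to(second, "KNOWS")
first.properties["message"] = "Hello, "
second.properties["message"] = "world!"
knows.properties["message"] = "brave Neo4j "
```

Calling `walk(first)` reads the node, then the relationship, then the second node, in the same order as the three print statements above.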
Types of NOSQL Structures
Key/Value: Uses hash tables, where keys can map to multiple
values.

More complex key/value example: Cassandra
• “Column Families.”
• Keys have different numbers of columns, so the database
        can scale in an irregular way.
• Simple and Super: super columns are columns within
        columns.
• RDBMS/NOSQL hybrid
• e.g. Cassandra, developed at Facebook for inbox search.
Types of NOSQL Structures
e.g. Cassandra

• Keyspace: usually the name of the application, e.g. 'Twitter', 'Wordpress'.

• Column family: a named collection of records that are looked up by key.

• Key: name of record

• For each column family you can choose either (defined at startup):
      A column: a tuple with name and value
      A super column: a named group of columns. Super columns
      are a great way to store one-to-many indexes to other records.

from: http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
Types of NOSQL Structures
Cassandra Comparison
In Cassandra's Ruby API, parameters are expressed in storage order, for clarity:

Model                           Query
Relational                      SELECT `column` FROM `database`.`table` WHERE `id` = key;
BigTable                        table.get(key, "column_family:column")
Cassandra: standard model       keyspace.get("column_family", key, "column")
Cassandra: super column model   keyspace.get("column_family", key, "super_column", "column")
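The "storage order" of the Cassandra calls above can be pictured as nested dictionaries: column family, then key, then (optionally) super column, then column. The `Keyspace` class below is a toy model written for this illustration, and the column family and column names in the usage lines are hypothetical; it is not the Ruby client or any real Cassandra API.

```python
class Keyspace:
    """Toy model: column family -> row key -> column,
    or column family -> row key -> super column -> column."""

    def __init__(self):
        self.column_families = {}

    def insert(self, column_family, key, columns):
        """Merge the given columns into the row stored under `key`."""
        rows = self.column_families.setdefault(column_family, {})
        rows.setdefault(key, {}).update(columns)

    def get(self, column_family, key, *path):
        """Walk down by super column and/or column name, in storage order."""
        value = self.column_families[column_family][key]
        for name in path:
            value = value[name]
        return value


keyspace = Keyspace()
# Standard model: each column is a name/value pair.
keyspace.insert("Users", "5", {"screen_name": "example_user"})
# Super column model: a super column groups several columns.
keyspace.insert("UserRelationships", "5",
                {"user_timeline": {"t1": "post-1", "t2": "post-2"}})
```

With this layout, `keyspace.get("Users", "5", "screen_name")` mirrors the standard-model call in the table, and `keyspace.get("UserRelationships", "5", "user_timeline", "t2")` mirrors the super-column call.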
Other Types of NOSQL Structures

  Tabular: similar to traditional databases.
  • e.g. Google's BigTable and Apache HBase.

  Object-Oriented: database of objects or documents.
Shared Characteristics
• Allows for scalability
• Key/value store
• Run on a large number of machines
• Data are partitioned and replicated among these machines
• Relaxes the data consistency requirement (because the CAP
        Theorem shows that you cannot guarantee Consistency,
        Availability and Partition tolerance at the same time).
• Fast averages, with 50 GB of data:
         Database      Writes       Reads
         MySQL         ~300 ms      ~350 ms
         Cassandra     0.12 ms      15 ms

• from: http://horicky.blogspot.com/2009/11/NOSQL-patterns.html,
http://www.slideshare.net/Eweaver/cassandra-presentation-at-NOSQL
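The "partitioned and replicated" bullet can be sketched with a simple hash-based placement function. The `owners` helper and the node names are assumptions for this example; it uses plain modulo hashing for brevity, whereas systems such as Dynamo and Cassandra use consistent hashing so that adding a node moves fewer keys.

```python
import hashlib


def owners(key, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a key: hash the key to a
    starting index, then take that node and its neighbours in order."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machine names
placement = {key: owners(key, nodes) for key in ["user:1", "user:2", "user:3"]}
```

Every key is stored on two machines, so a single machine failure does not lose data, and different keys land on different machines, which spreads the load.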
Shared Characteristics-API
The underlying data model can be considered as a large
Hashtable (key/value store).

Basic API access:
• get(key) -- Extract the value given a key
• put(key, value) -- Create or Update the value given its key
• delete(key) -- Remove the key and its associated value
• execute(key, operation, parameters) -- Invoke an operation
        on the value (given its key), which is a special data
        structure (e.g. List, Set, Map, etc.).
• mapreduce(keyList, mapFunc, reduceFunc) -- Invoke a
        map/reduce function across a key range.
from:
http://horicky.blogspot.com/2009/11/NOSQL-patterns.html
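The five operations listed above map naturally onto a small in-memory class. This is only a single-machine sketch of the interface; the `KeyValueStore` name and the choice to dispatch `execute` through Python method names are assumptions made here, and nothing is distributed or persisted.

```python
import functools


class KeyValueStore:
    """In-memory stand-in for the basic key/value API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value stored under key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its value; missing keys are ignored."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Call a method by name on the stored value,
        for example "append" on a list or "add" on a set."""
        return getattr(self._data[key], operation)(*parameters)

    def mapreduce(self, key_list, map_func, reduce_func):
        """Apply map_func to each (key, value) pair that exists,
        then fold the mapped results with reduce_func.
        Raises TypeError if none of the keys exist."""
        mapped = [map_func(k, self._data[k]) for k in key_list if k in self._data]
        return functools.reduce(reduce_func, mapped)
```

For example, after `put("a", [1, 2])`, calling `execute("a", "append", 5)` modifies the stored list in place without fetching and rewriting the whole value.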
CouchDB
 • CouchDB views are stored as rows which are kept sorted
 by key.
 • Can adapt to variations in document structure.
 • MapReduce (based on simple range requests against the
 indexes)
 • Best to build an index that stores related data under
 nearby keys.
 from: http://books.couchdb.org/relax/intro/getting-started
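Why sorted keys make range requests cheap can be shown with Python's `bisect` module. The `SortedView` class, its `emit` and `range` methods, and the sample prices are assumptions for this sketch; CouchDB itself stores views in B-trees and exposes them over HTTP.

```python
import bisect


class SortedView:
    """Rows kept sorted by key, as a stand-in for a view index."""

    def __init__(self):
        self.keys = []
        self.values = []

    def emit(self, key, value):
        """Insert a row at the position that keeps keys sorted."""
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def range(self, startkey, endkey):
        """Return rows with startkey <= key <= endkey, located with
        two binary searches instead of a full scan."""
        lo = bisect.bisect_left(self.keys, startkey)
        hi = bisect.bisect_right(self.keys, endkey)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


view = SortedView()
# Keys are (item, store) tuples, so all rows for one item sit together.
view.emit(("banana", "Fresh Mart"), 0.25)
view.emit(("apple", "Price Max"), 5.99)
view.emit(("apple", "Fresh Mart"), 1.59)
view.emit(("cherry", "Price Max"), 3.00)
apple_rows = view.range(("apple",), ("apple", "\uffff"))
```

Because the item name comes first in each key, every "apple" row is adjacent in the index, and a single range request retrieves them all.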
CouchDB
 curl http://127.0.0.1:5984/
 {"couchdb":"Welcome","version":"0.9.0"}

 • With no -X option, curl sends a GET request by default.

 Get a list of databases:
 curl -X GET http://127.0.0.1:5984/_all_dbs
 []

 Create a database:
 curl -X PUT http://127.0.0.1:5984/books
 {"ok":true}

 Delete the database:
 curl -X DELETE http://127.0.0.1:5984/books
 {"ok":true}
CouchDB

 {
     "_id" : "bc2a41170621c326ec68382f846d5764",
     "_rev" : "2612672603",
     "item" : "apple",
     "prices" : {
        "Fresh Mart" : 1.59,
        "Price Max" : 5.99,
        "Apples Express" : 0.79
     }
 }
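Since a CouchDB document is ordinary JSON, any JSON library can read it. The sketch below parses the document above with Python's standard `json` module; the `cheapest` helper is a made-up function for this example.

```python
import json

doc_text = """
{
    "_id": "bc2a41170621c326ec68382f846d5764",
    "_rev": "2612672603",
    "item": "apple",
    "prices": {
        "Fresh Mart": 1.59,
        "Price Max": 5.99,
        "Apples Express": 0.79
    }
}
"""

doc = json.loads(doc_text)


def cheapest(document):
    """Return (store, price) for the lowest price in the document."""
    prices = document["prices"]
    store = min(prices, key=prices.get)
    return store, prices[store]
```

The nested "prices" object becomes a Python dictionary, so a store can be added or removed in one document without affecting any other document.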
Advantages of NOSQL

• Cheap, easy to implement
• Data are replicated and can be partitioned
• Easy to distribute
• Don't require a schema
• Can scale up and down
• Quickly process large amounts of data
• Relax the data consistency requirement (CAP)
Disadvantages of NOSQL
• New and sometimes buggy
• Data is generally duplicated, potential for inconsistency
• No standardized schema
• No standard format for queries
• No standard language
• Most NOSQL systems avoid in-memory storage
• Difficult to impose complicated structures
• Depend on the application layer to enforce data integrity
• No guarantee of support
The Big Deal
• RDBMSs don't work well for the web: the end?
• Big names supporting NOSQL
• Not the end of the relational database, but big changes
        ahead?
• Questions?

				