NOSQL Databases: Topics
• Introduction
• Rationale
• Key-value stores
• MapReduce
• Implementations




                           Introduction
• NOSQL := Not Only SQL
• Acronym introduced in 2009
  ◦ as the name of a meetup about open-source distributed non-relational
    databases
• Message misunderstood, giving birth to “NoSQL”




                 Rationale (1)
• Performance
• Scalability
• Flexibility
• Kind of Data




                           Rationale (2)
• Brewer’s CAP Theorem
• Cannot guarantee more than two of:
  ◦ Consistency
  ◦ Availability
  ◦ Partition tolerance




                        Implementations

• NOSQL taxonomy (figure):
  ◦ KV Durable: Voldemort, Riak, Dynamo
  ◦ KV Volatile: Memcached, Redis
  ◦ Document Store: MongoDB, CouchDB, eXist
  ◦ Column Store: Infobright, MonetDB
  ◦ Graph: Neo4j, HyperGraphDB
                          Key-Value Stores
• Global collection of Key/Value pairs
• Multiple types
  ◦ In memory (Redis, Memcached)
  ◦ On disk (BerkeleyDB)
  ◦ Eventually consistent (Cassandra, Dynamo, Voldemort)




                       Document Databases
• Similar to a Key/Value database, with whole documents as values.
• Flexible schema
• Documents are serialized
• Examples: CouchDB, MongoDB




                     Column Family Database
• Similar to a Key/Value database, with multiple attributes (columns) as values.
• Not to be confused with column-oriented DBMS




                         Graph Databases
• Inspired by graph theory
• Gained popularity as RDF stores
• Examples: Neo4j, InfiniteGraph




                                Other
• Many others exist:
  ◦ Any database outside the relational model
• Object databases
• File System




                        Key-Value Stores
• Basic Idea
• Mapping Tables to KV pairs
• Consistent Hashing




                              Basic Idea
• Very simple data model
• {key,value} pairs with unique keys
  ◦ {student_id: student_name}
  ◦ {part_id: part_manufacturer}
  ◦ {child_id: parent_id}
• Values have no type constraint




                                API
• put(key, value)
• get(key)
  ◦ value = get(key)
• value is usually composite
  ◦ Opaque blob (e.g. TokyoCabinet)
  ◦ Directly supported (e.g. MongoDB)
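The put/get API above can be sketched with a plain in-memory dictionary. This is an illustrative sketch, not any particular product's API; the class and key names are ours.

```python
class KVStore:
    """Minimal in-memory key-value store: a global collection of pairs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values have no type constraint: blobs, dicts, lists, ...
        self._data[key] = value

    def get(self, key):
        # Returns None for missing keys, as many KV stores do.
        return self._data.get(key)

store = KVStore()
store.put("student_42", {"name": "Ada"})   # composite value, directly supported
value = store.get("student_42")
```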




                            Implementation
• Usually B-trees or extensible hash tables
• Well-known structures in the RDBMS world




Mapping Tables to KV pairs




                     Mapping Tables to KV pairs
CREATE TABLE user (
     id INTEGER PRIMARY KEY,
     username VARCHAR(64),
     password VARCHAR(64)
);
CREATE TABLE follows (
     follower INTEGER REFERENCES user(id),
     followed INTEGER REFERENCES user(id)
);
CREATE TABLE tweets (
     id INTEGER,
     user INTEGER REFERENCES user(id),
     message VARCHAR(140),
     timestamp TIMESTAMP
);


               Mapping Tables to KV pairs — Redis
• Creating a user

               INCR global:nextUserId => 1000
               SET uid:1000:username johnsmith
               SET uid:1000:password sunnyEvening

• Enabling login

               SET username:johnsmith:uid 1000

• Following:

               uid:1000:followers => set of uids
               uid:1000:following => set of uids




            Mapping Tables to KV pairs — Redis
• Messages by user:

            uid:1000:posts => a list of post ids

• Adding a new message:

            SET post:10343 "$ownerid|$time|I'm having fun"




                           Consistent Hashing
• Huge amounts of data
  ◦ Naive approach:

                  server_id = hash(key) % number_of_servers

  ◦ Hash function: anything → int
• Distribution?
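The distribution problem with the naive formula can be demonstrated in a few lines. This sketch uses md5 purely for illustration as a deterministic "anything → int" hash; the key names are made up. Going from 4 to 5 servers relocates most keys, because a key stays put only when its hash has the same remainder mod 4 and mod 5 (about 1 in 5).

```python
import hashlib

def server_id(key, number_of_servers):
    # Deterministic hash: anything -> int (md5 chosen only for illustration).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % number_of_servers

keys = [f"user:{i}" for i in range(10_000)]

# Count the keys that change server when a fifth server is added.
moved = sum(1 for k in keys if server_id(k, 4) != server_id(k, 5))
# Typically around 80% of keys move, i.e. a near-total reshuffle.
```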




                   Consistent Hashing — Circle
• Assume int to be an 8-bit unsigned integer
• We have hash(key) ∈ [0, 255]
• We can represent these values on a circle and:
  ◦ Assign a position to each server
  ◦ Compute the position of each key
  ◦ Assume a key k belongs to the next server on the circle (clockwise)




                   Consistent Hashing — Circle




• Each node (server) is assigned a random value
• The hash of this value gives the position of the server on the circle
• A server is responsible for the arc before its position
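The circle can be sketched in a few lines (no virtual nodes, one byte of md5 as the 0–255 position; class and server names are ours, not any real system's API). It also shows the key property: adding a server only moves the keys on that server's new arc.

```python
import bisect
import hashlib

def h(value):
    # Hash anything (via its string form) onto the 0..255 circle.
    return hashlib.md5(value.encode()).digest()[0]

class Ring:
    """Minimal consistent-hashing ring (no virtual nodes)."""

    def __init__(self, servers):
        # Each server is placed on the circle at the hash of its name.
        self._points = sorted((h(s), s) for s in servers)

    def server_for(self, key):
        # A key belongs to the next server clockwise (first position >= hash),
        # wrapping around the circle.
        positions = [p for p, _ in self._points]
        i = bisect.bisect_left(positions, h(key)) % len(positions)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}

# Adding a server only moves the keys landing on its new arc.
bigger = Ring(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if bigger.server_for(k) != before[k]]
```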




• Adding a node




Virtual nodes




Moving nodes




                                Replication
• Coordinator as defined previously
• In charge of replication to other nodes (e.g. N next ones)
• Parameters:
  ◦ Number of replicas (N )
  ◦ Minimal number of successful writes (W )
  ◦ Minimal number of coherent reads (R)
  ◦ Must respect R + W > N (Why?)
• Repair-on-read
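Why must R + W > N hold? Because then any set of W replicas that acknowledged a write and any set of R replicas consulted by a read must share at least one node, so every read sees at least one up-to-date copy. A brute-force sketch over all quorum combinations (values of N, W, R are an example):

```python
from itertools import combinations

# With N replicas, a write succeeds on any W of them and a read consults
# any R of them. R + W > N forces every write set and read set to overlap.
N, W, R = 3, 2, 2
assert R + W > N

overlap_always = all(
    set(write_set) & set(read_set)
    for write_set in combinations(range(N), W)
    for read_set in combinations(range(N), R)
)
# overlap_always is True: no read can miss the latest write entirely.
```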




NOSQL Databases: Topics




      Introduction
      Rationale
      Key-value stores
      MapReduce
      Implementations
MapReduce



     Parallel processing model
     Introduced to tackle computations over very large datasets
     Based on the well-known divide and conquer approach
         Large problem divided in many small problems
         Each tackled by one “processor” (Map)
         Results are then combined (Reduce)




     Reference: Data-Intensive Text Processing with MapReduce (Lin and Dyer, 2010)
Parallelism?




       Not a new problem
           E.g. threads, MPI, sockets, remote shell, . . .
           Generally tackles computation distribution, not data
           distribution.
           The developer is in charge of the implementation details.
       MapReduce offers an abstraction of many mechanisms by
       imposing a structure on the program.
MapReduce Concepts

                         Data




    Mapper    Mapper    Mapper    Mapper    Mapper




    Reducer   Reducer   Reducer   Reducer   Reducer




                                            Output
Origins of Map




      Map originally comes from the functional programming world
      Basic idea:
      for (int i = 0; i < arr.length; i++) {
          result[i] = function(arr[i]);
      }

      where function is a function in the mathematical sense
Origins of Map


      Idea: isolate the loop, so we can write:
      result = map(function, arr);

      What if you could pass functions around as values?
      map could be a function that takes as arguments
          a sequence
          a function
      and that returns a new sequence where every element is the
      result of applying the function on the corresponding element
      in the original sequence
      map can abstract many for loops
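Functional languages provide this directly. In Python, for instance (the function and array here are our own toy example):

```python
def double(x):
    return 2 * x

arr = [1, 2, 3, 4]

# The explicit loop...
result = []
for x in arr:
    result.append(double(x))

# ...is abstracted by map, which takes a function and a sequence
# and applies the function to every element.
same = list(map(double, arr))
# result and same are both [2, 4, 6, 8]
```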
Origins of Reduce

      map does not cover all for loops
      For example, when you gradually aggregate the results:
      int total = 0;
      for (int i = 0; i < arr.length; i++) {
          total = total + arr[i];
      }

      More generally:
      for (int i = 0; i < arr.length; i++) {
          total = function(total, arr[i]);
      }

      reduce covers these ones:
      total = reduce(function, arr);
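Python ships this as functools.reduce; the summing example above becomes:

```python
from functools import reduce

def add(total, x):
    return total + x

arr = [1, 2, 3, 4]

# total = reduce(function, arr): folds the aggregation loop into one call,
# starting from the initial accumulator 0.
total = reduce(add, arr, 0)
# total is 10
```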
map and reduce in MapReduce


      In the context of MapReduce, the mapped function must
      return key-value couples:

              map(function, [data, . . . ]) → [(key, value), . . . ]

      Before the reduction, the data has to be aggregated by key:

      [(key1, value1), (key1, value2), . . . ] → (key1, value1, value2, . . . )

      Reduce step acts on values for each key

             reduce(key1, value1, value2, . . . ) → (key1, value)
Example

     Counting the words in a text
     map: word → (word, 1)
     Pair make_pair(String word) {
         return new Pair(word, 1);
     }

     Aggregation: (word, 1, 1, 1, . . . )
     reduce:
     Pair compute_sum(String word, List<Integer> values) {
         int sum = 0;
         for (int i : values) {
             sum += i;
         }
         return new Pair(word, sum);
     }
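The three phases of the word-count job (map, aggregation by key, reduce) can be strung together in a single-process sketch; the function names and input text are ours, and the "shuffle" is just an in-memory group-by.

```python
from collections import defaultdict

def map_word(word):
    # map: word -> (word, 1)
    return (word, 1)

def reduce_counts(word, values):
    # reduce: (word, [1, 1, ...]) -> (word, sum)
    return (word, sum(values))

text = "the quick fox jumps over the lazy fox".split()

# Map phase
pairs = [map_word(w) for w in text]

# Aggregation by key (the shuffle): (word, 1), (word, 1) -> word: [1, 1]
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Reduce phase, one call per key
counts = dict(reduce_counts(k, vs) for k, vs in grouped.items())
# e.g. counts["the"] == 2 and counts["fox"] == 2
```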
Parallelization




       map:
              Trivial: absolutely no side effect
              (or not: what about transfer times?)
       reduce:
              Not fully parallelizable (each step needs the result of the
              previous step)
Parallelizing reduce


       Reduce needs to be idempotent
           Mathematically: f (f (x)) = f (x)
       Computation can be tree-shaped:
                  1 ─┐
                     ├─ 2 ─┐
                  1 ─┘     │
                           ├─ 4
                  1 ─┐     │
                     ├─ 2 ─┘
                  1 ─┘
       log N instead of N
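The tree above can be sketched as pairwise rounds: each round halves the number of values, and within a round every combination is independent, so the rounds could run in parallel. This assumes the combining function can be regrouped this way (as + can); the helper name is ours.

```python
def tree_reduce(f, values):
    """Combine values pairwise, round by round: about log2(N) rounds
    instead of N - 1 strictly sequential steps."""
    while len(values) > 1:
        nxt = []
        for i in range(0, len(values) - 1, 2):
            # Each pair in this round is independent -> parallelizable.
            nxt.append(f(values[i], values[i + 1]))
        if len(values) % 2:
            nxt.append(values[-1])   # odd element carried to the next round
        values = nxt
    return values[0]

total = tree_reduce(lambda a, b: a + b, [1, 1, 1, 1])
# total is 4, matching the tree in the slide
```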
We lied!

       There is still one step to discuss: how do we aggregate values
       by key?
      Naive idea: put a barrier between map and reduce
           Wait for all maps to complete
           Get all results in one place, sort them
           Redistribute them for reduce
                                             Data




                        Mapper    Mapper    Mapper    Mapper    Mapper




                                            Barrier




                        Reducer   Reducer   Reducer   Reducer   Reducer




                                            Output
Parallelizing aggregation


       The naive approach:
           is simple
           does not require an idempotent reduce
           is not as parallel as it could be
       Other idea: consistent hashing and idempotence
           Can compute results incrementally (idempotence)
           No barrier: better parallelism (hashing)
           Can display current results (idempotence)
       Note: usually, the implementation sorts the intermediate
       key-value pairs generated by map and the final results by key.
       This can be exploited by choosing a meaningful key.
Example: Sorting people by name




      map: person → (person.name, person)
      reduce: (person.name, person1, person2, . . . ) →
      (person.name, person1, person2, . . . )
      The result is sorted by virtue of the MapReduce machinery
      itself.
Example: Finding all (author,book) pairs



      There can be multiple authors per book!
      map
          We need a polymorphic map function, say f , such that:
               f (author) → (author.name, author)
               f (book) → [(book.author.name, book), . . . ]
      Aggregation: (author.name; book∗ , author, book∗ )
      In the following code, Value is a superclass of Author, Book
      and List.
Example: Finding all (author,book) pairs
      reduce
       Pair reduce(String authorName, List<Value> values) {
           Author a = null;
           Book prevbook = null;
           List<Pair> list = new List<Pair>();
           for (Value value : values) {
               if (value instanceof Author) {
                   a = (Author) value;
                   if (prevbook != null) {
                       list.append(new Pair(a, prevbook));
                       prevbook = null;
                   }
               } else if (value instanceof Book && a == null) {
                   if (prevbook != null) emit(prevbook);
                   prevbook = (Book) value;
               } else if (value instanceof Book && a != null) {
                   list.append(new Pair(a, (Book) value));
               } else if (value instanceof List<Pair>) {
                   list.append_all(value);
                   a = list.first().author;
               }
           }
           if (prevbook != null) emit(prevbook);
           if (!list.empty()) emit(list);
       }
Implementations




      LightCloud
      MongoDB
      Cassandra
LightCloud


      LightCloud is a distributed key-value store
          Implements distributed storage.
          “On-site” storage is provided by Tokyo Tyrant/Redis
      Tokyo Tyrant is a local key-value store
           Implements database management functions
                Network interface and concurrency control
               Database replication
          Actual storage is provided by Tokyo Cabinet
      Tokyo Cabinet
          Implements storage of key/value pairs
          Over a single file, for a single client.
LightCloud


                             Tokyo Tyrant
                             Tokyo Cabinet




             Tokyo Tyrant                    Tokyo Tyrant
             Tokyo Cabinet                   Tokyo Cabinet
Tokyo Cabinet/Tyrant



      Tokyo Cabinet/Tyrant provide a very raw interface for storing
      key/value pairs in a given single file
      The desired on-disk layout must be chosen
          Extensible Hash Map, B-Tree, Fixed-size records, . . .
          Parameters of these structures can be tweaked for better
          performance
          Very demanding on the user
      The API consists of get and put and a few variants
          The data are opaque, unstructured blobs!
LightCloud




      Adds (horizontal) scalability to Tokyo Tyrant nodes by means
      of consistent hashing
          Mitigates the distribution problem
          However, no replication is performed; consistency is preferred
          over availability.
      The API is still get and put, over strings.
MongoDB


    MongoDB is a document oriented database
     JSON documents
    {
          " name ": " John Smith " ,
          " address ": {
               " city ": " Owatonna " ,
               " street ": " Lily Road " ,
               " number ": 32 ,
               " zip ": 55060
          },
          " hobbies ": [ " yodeling " , " ice skating " ]
    }
Database Organisation




      Databases contain collections
      Collections contain documents and indexes
Physical layout




       Documents are stored as binary blobs (BSON)
           Documents are opaque for the database
           As a result of a query they are retrieved in their entirety
       Indexes are B-Trees referencing these documents.
            Allows finding documents based on the values they contain
            without explicitly opening the whole document.
Advanced querying




      Simple queries can be performed efficiently when an index is
      available
          E.g. db.employee.find({"address.city": "Owatonna"})
          with an index on ”address.city”
      Larger jobs can be done by means of map-reduce
          map maps a document to the needed key-value pair.
Advanced querying




      However, there is no facility for:
           Joining documents
           Quantifying over other documents (i.e. EXISTS in SQL)
      Such operations are left to the user of the database!
           Processing outside the database is costly!
           It is therefore important to design the data model in such a
           way that it returns the appropriate data directly.
Sharding




      MongoDB can shard documents over multiple servers
           Data are split into chunks
           A chunk has a starting and ending value.
            A server is responsible for multiple chunks.
       Individual collections, not whole databases, are sharded
Example: Sharding Persons over the Age field on 3 servers
                 Server 1 Server 2 Server 3
                  1–10      11–20     22–29
                  21–22     30–41     42–50
                  51–72                73+
       To be efficient, each server must keep roughly the same
       amount of data.
           MongoDB provides automated balancing (auto-sharding) as
           much as possible
       Shards are created explicitly by the database administrator
           shard = (collection, key)
           Well chosen, can improve query performance
           Otherwise, the load of each server can be very unbalanced
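Routing a query to the right shard is then a range lookup over the chunk table. A sketch, with a hypothetical chunk table loosely adapted from the Age example above (the exact boundaries and server names are illustrative, not MongoDB's API):

```python
import bisect

# Each (start, end, server) triple says which server owns the
# half-open shard-key range [start, end).
chunks = [
    (0, 11, "server-1"), (11, 21, "server-2"), (21, 23, "server-1"),
    (23, 30, "server-3"), (30, 42, "server-2"), (42, 51, "server-3"),
    (51, 73, "server-1"), (73, 200, "server-3"),
]
starts = [c[0] for c in chunks]

def server_for(age):
    # Find the last chunk whose start is <= age.
    i = bisect.bisect_right(starts, age) - 1
    return chunks[i][2]

owner = server_for(15)
# owner is "server-2": age 15 falls in the [11, 21) chunk
```

Rebalancing then amounts to moving whole chunks between servers and updating this table, which is why a well-chosen shard key keeps the chunks evenly loaded.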
Cassandra




      Introduction and history
      Data model and layout
      Distribution
            Replication
            Adding nodes
            Handling problems
            Timestamping
Cassandra — Introduction




      Created by Facebook
          Based on Dynamo
          Lead Dynamo engineer hired by Facebook
      Released as Apache project
          Source code released in July 2008
          Adopted by Apache in March 2009
          Became high priority in February 2010
Cassandra — Data model

      Databases are conceptually two-dimensional
      Disks are one-dimensional
      Table [[1, 2], [3, 4]] can be stored either row-oriented
      (1, 2, 3, 4) or column-oriented (1, 3, 2, 4); Cassandra is
      column-oriented
      No cost for NULL entries
      Easy column creation
      Structure:
          Column family ∼ table
          Super column ∼ columns
          Column ∼ column
      May be seen as a hash table with 4 or 5 dimensions:
      get(keyspace, key, column_family[, super_column], column)
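The 4- or 5-dimensional hash table view can be sketched with nested dictionaries; the keyspace, keys, and column names below are invented for illustration, and the argument order of our helper differs slightly from the slide's signature (super_column is a keyword argument).

```python
# keyspace -> key -> column_family -> (super_column ->) column -> value
store = {
    "app": {
        "user42": {
            "profile": {                      # column family
                "name": "Ada",                # columns
                "city": "London",
            },
            "friends": {                      # column family with super columns
                "close": {"u7": "1999-01-01"},
                "work":  {"u9": "2004-06-12"},
            },
        }
    }
}

def get(keyspace, key, column_family, column, super_column=None):
    # 4 dimensions without a super column, 5 with one.
    cf = store[keyspace][key][column_family]
    return cf[super_column][column] if super_column else cf[column]

name = get("app", "user42", "profile", "name")
# name is "Ada"
```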
Cassandra — Distribution


      CAP Theorem:
          (Consistency)
          Availability
          Partition tolerance
      Design goals
          Scalability
          Simplicity
          Speed
          Uniformity between nodes
      Consistent Hashing on a ring
          No virtual nodes
          Random placement
Cassandra — Replication and consistency




      Availability ⇒ more than one node needs a copy of each pair
      Responsible node chooses N other nodes to hold copies
          Way in which those are chosen can be changed
          Next ones on the ring, different geographic location, etc.
      Attribution table copied to each node
      Possibility of choosing R and W values
Cassandra — Timestamping


      Every value has an associated timestamp
      Every key actually has an associated vector of
      (timestamp, value) pairs (truncated)
      Used to reach consistency with repair-on-read
      Query sequence:
          Identify the nodes that own the data for the key
          Route the request to the node and wait for the response
          If the reply does not arrive within the configured timeout, fail
          Figure out the latest response based on timestamps
          Schedule a repair if needed
      Repair algorithm can be customized
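The "figure out the latest response, then schedule a repair" steps can be sketched as follows; the node names, timestamps, and helper are ours, not Cassandra's API.

```python
def read_repair(responses):
    """responses: {node: (timestamp, value)}. Return the freshest value
    and the stale nodes that a repair should be scheduled for."""
    latest_node = max(responses, key=lambda n: responses[n][0])
    latest_ts, latest_value = responses[latest_node]
    stale = [n for n, (ts, _) in responses.items() if ts < latest_ts]
    return latest_value, stale

responses = {
    "node-a": (17, "old address"),
    "node-b": (42, "new address"),   # most recent write wins
    "node-c": (17, "old address"),
}
value, to_repair = read_repair(responses)
# value is "new address"; node-a and node-c get repaired
```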
Cassandra — Adding a node


      Gossip
          Each node must know the position of every other node (and all
          replicas)
          Whenever a node moves or changes its replicas, it tells a
          number of other nodes, sending its whole replication table
          Routing information thus propagates
          Some nodes are preferred (seeds)
      When a new node is inserted, we must give it a keyspace
      and the address of a seed
          It chooses its position at random
          It contacts the seed to get a view of the current state
          It begins to move its data
Cassandra — Problem solving
      Overloaded node
          Causes
               The keys are not uniformly distributed
               Some keys are accessed more than others
               The node runs on inferior hardware
          Solution
               Overloaded nodes may move on the ring
      Unresponsive node
          Causes
               The machine has crashed
               There is too much latency on the network
          Solution
                Each node assigns a score to its neighbours
               Inverse logarithmic scale: 1 means 10% chance to wake up, 2
               means 1%, etc.
               Define a threshold after which the node is removed
      Can be mostly automated
