DISTRIBUTED DATABASES IN THE CLOUD USING NoSQL

                Sidney SHEK
        sshek2@csc.com
      September 8, 2011




      CSC Leading Edge Forum
        Technology Grant FY11

       9/12/2011 9:12 AM   PPT 2007_MASTER_FMT 1
Agenda


• Introduction – What is NoSQL and why is it relevant to us?


• Three key principles of NoSQL


• Case studies – Applying NoSQL to Enterprises


• Future trends


• Conclusion




Challenges faced by Internet giants


1. Supporting massively scalable, high-performance web services
 •   Massive data volumes... and growing!
 •   Customers distributed around the world
 •   Highly available
 •   Low-latency




Challenges faced by Internet giants


2. Dealing with flexible and complex data structures
 •     Document storage
 •     Social networks
 •     Multimedia




Source: IDC White Paper, sponsored by EMC: “As the Economy Contracts, the Digital Universe Expands”, May 2009


What is NoSQL?


• Movement away from traditional relational databases


• Address challenges posed by Cloud and Big Data


• One size no longer fits all




Who’s using NoSQL?




  References: http://www.mongodb.org/display/DOCS/Production+Deployments, http://wiki.apache.org/cassandra/ArticlesAndPresentations,
                    http://en.wikipedia.org/wiki/NoSQL, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
                Disclaimer: All logos, trade marks and brand names used in this presentation belong to the respective owners

Who’s paying?

“NoSQL: 10gen stores in $6.5m for MongoDB database”
Source: WIREDvc, http://www.wiredvc.com/nosql-10gen-528stores-in-6-5m-for-mongodb-database/

“VMware hires key developer for Redis”
Source: VMware, http://blogs.vmware.com/console/2010/03/vmware-hires-key-developer-for-redis.html

“Cassandra NoSQL Database Gets Commercial Support”
Source: Database Journal, http://www.databasejournal.com/sqletc/article.php/3878651/Cassandra-NoSQL-Database-Gets-Commercial-Support.htm

Principle #1 – Build Infinitely Scalable Systems


• Add more servers to boost capacity and throughput
• Parallel processing for linear scalability (e.g. MapReduce)
• Process where the data is to reduce network hops

[Diagram: Users send “Run program” to a processor; the processor tells the data nodes to “Run program on your data” and returns “Here are your results” to the users.]
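The “run the program where the data is” idea can be sketched as a toy, single-process MapReduce in Python. This is a simulation under stated assumptions, not Hadoop’s API: the three in-memory lists stand in for data nodes, and `map_phase`, `reduce_phase` and `run_job` are hypothetical names. Each map task sees only its own partition, so only small per-node summaries travel back for the reduce step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data nodes: each one holds its own partition of the data.
DATA_NODES = [
    ["the cloud scales", "nosql in the cloud"],
    ["scale out not up"],
    ["the data stays put"],
]

def map_phase(partition):
    """Runs 'on' a data node: counts words in the local partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts  # only this small summary crosses the network

def reduce_phase(partials):
    """Merges the per-node summaries into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run_job(nodes):
    # Ship the same program to every node in parallel, then combine results.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(map_phase, nodes))
    return reduce_phase(partials)

result = run_job(DATA_NODES)
print(result["the"], result["cloud"])
```

Adding a fourth partition adds a fourth map task that runs alongside the others, which is the sense in which the deck describes parallel processing as scaling linearly with servers.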
Principle #2 – Avoid Distributed Transactions


• Distributed transactions don’t scale
 – Design for co-located transactional data
 – Accept eventual consistency
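One way to “design for co-located transactional data” is to route every record by an entity key, so that everything a single transaction touches lives on one node and needs only a local lock. The sketch below assumes a customer is the unit of consistency; the `Node` class, `node_for` and `place_order` are hypothetical, and CRC32 routing is just one simple hashing choice.

```python
import threading
import zlib

class Node:
    """Hypothetical storage node holding one partition, with a local lock."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.lock = threading.Lock()

NODES = [Node(f"node-{i}") for i in range(4)]

def node_for(entity_key):
    # Route by the entity key only, so all rows of one customer co-locate.
    # crc32 is used because Python's built-in hash() of str is salted per process.
    return NODES[zlib.crc32(entity_key.encode()) % len(NODES)]

def place_order(customer_id, order_id, amount):
    """Single-node transaction: the balance and the order row share a partition."""
    node = node_for(customer_id)
    with node.lock:  # local lock only; no cross-node coordination
        balance = node.data.get((customer_id, "balance"), 0)
        node.data[(customer_id, "balance")] = balance + amount
        node.data[(customer_id, "order", order_id)] = amount
    return node.name

n1 = place_order("cust-42", "o-1", 30)
n2 = place_order("cust-42", "o-2", 12)
print(n1 == n2, node_for("cust-42").data[("cust-42", "balance")])
```

The trade-off is that an operation spanning two customers would no longer be local; under this design such work is usually split into separate per-entity steps rather than one distributed transaction.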




                    What happens when nodes are far apart?





               No more overhead in acquiring locks across nodes!
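Accepting eventual consistency means each replica applies writes locally and reconciles with its peers later. The deck does not prescribe a reconciliation rule; the sketch below assumes last-writer-wins on comparable timestamps (Cassandra resolves conflicts with write timestamps, while Amazon’s Dynamo used vector clocks). `Replica`, `write` and `merge_from` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """Hypothetical replica: accepts writes locally, reconciles later."""
    name: str
    store: dict = field(default_factory=dict)  # key -> (timestamp, writer, value)

    def write(self, key, value, timestamp):
        # No cross-node locking: the write is applied to this replica only.
        self._apply(key, (timestamp, self.name, value))

    def read(self, key):
        entry = self.store.get(key)
        return entry[2] if entry else None

    def _apply(self, key, entry):
        current = self.store.get(key)
        # Last-writer-wins; ties broken by replica name so every replica agrees.
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def merge_from(self, other):
        """Anti-entropy: pull the other replica's state and keep the newest."""
        for key, entry in other.store.items():
            self._apply(key, entry)

sydney, london = Replica("sydney"), Replica("london")
sydney.write("theme", "blue", timestamp=1)
london.write("theme", "green", timestamp=2)  # concurrent, conflicting write
print(sydney.read("theme"), london.read("theme"))  # replicas disagree for now
sydney.merge_from(london)
london.merge_from(sydney)
print(sydney.read("theme"), london.read("theme"))  # both now return "green"
```

Between the two merges a reader in Sydney could still see the old value; that window is the price paid for never taking a lock across nodes.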

Principle #3 – Choose the right tools for the job

             Indicative comparison of NoSQL databases




Case Study 1 – Cloud-based Application Config Management




[Diagram: Central administrators distribute config to App 1 and App 2 in Sydney, Melbourne, New York and London.]
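The deck does not show the data model behind this case study. A minimal sketch, assuming central administrators publish versioned config documents and each city keeps a local replica that apps read from, might look like the following. `RegionStore`, `CentralAdmin` and the version-number scheme are hypothetical.

```python
class RegionStore:
    """Hypothetical regional replica of the configuration store."""
    def __init__(self, region):
        self.region = region
        self.configs = {}  # app name -> (version, settings)

    def apply(self, app, version, settings):
        # Accept only newer versions, so late or duplicated messages are harmless.
        current_version = self.configs.get(app, (0, None))[0]
        if version > current_version:
            self.configs[app] = (version, dict(settings))

    def get(self, app):
        # Apps read from their local region: no round trip to head office.
        entry = self.configs.get(app)
        return entry[1] if entry else None

class CentralAdmin:
    """Hypothetical central publisher that fans changes out to every region."""
    def __init__(self, regions):
        self.regions = regions
        self.versions = {}

    def publish(self, app, settings):
        version = self.versions.get(app, 0) + 1
        self.versions[app] = version
        for store in self.regions:  # in practice this would be asynchronous replication
            store.apply(app, version, settings)
        return version

regions = [RegionStore(r) for r in ("Sydney", "Melbourne", "New York", "London")]
admin = CentralAdmin(regions)
admin.publish("App 1", {"timeout_s": 30})
admin.publish("App 1", {"timeout_s": 45})
regions[3].apply("App 1", 1, {"timeout_s": 30})  # a stale replay is ignored
print(regions[3].get("App 1"))
```

Because `apply` only moves forward, replication messages can arrive late or twice without rolling a region back, which suits the eventual-consistency approach the deck advocates.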



Case Study 2 – Data Capture and Dashboard in the Cloud




[Diagram: Perth region power and Sydney region power roll up into Australia-wide power.]
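The slides do not describe how the dashboard figures are computed. One common NoSQL pattern, assumed here rather than taken from the case study, is to maintain pre-aggregated totals as each reading is captured, so the dashboard reads a handful of counters instead of scanning raw data. `PowerDashboard` and its methods are hypothetical.

```python
from collections import defaultdict

class PowerDashboard:
    """Hypothetical store that keeps running totals as readings arrive."""
    def __init__(self):
        self.latest = {}                        # (region, meter) -> latest kW
        self.region_totals = defaultdict(float)

    def capture(self, region, meter, kw):
        # Adjust the region total by the change in this meter's reading,
        # so the dashboard never needs to rescan raw readings.
        previous = self.latest.get((region, meter), 0.0)
        self.latest[(region, meter)] = kw
        self.region_totals[region] += kw - previous

    def region_power(self, region):
        return self.region_totals.get(region, 0.0)

    def australia_wide_power(self):
        return sum(self.region_totals.values())

dash = PowerDashboard()
dash.capture("Perth", "m1", 120.0)
dash.capture("Sydney", "m7", 300.0)
dash.capture("Sydney", "m8", 80.0)
dash.capture("Perth", "m1", 150.0)  # a newer reading replaces the old one
print(dash.region_power("Perth"), dash.region_power("Sydney"), dash.australia_wide_power())
```

Keeping one total per region also fits the co-location principle: each region’s counter can live on the nodes that capture that region’s readings.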




Case Study 3 – Migrating Location-based Service to NoSQL




Possible Applications To Many Enterprise Scenarios


• Complex bioinformatics data analysis


• Forensic analysis


• Master data management


• Real-time web application analytics




Future Trends


• Consolidation, standardisation and new features for NoSQL databases
 – Spring Data
 – UnQL and coSQL


• Database-as-a-Service


• ‘RAM Cloud’ – High-performance data grids


• The new enterprise stack




The Old Stack – Relational Database for Everything




Layers, top to bottom:
• Queries (SQL)
• Relational database
• Monolithic hardware (few CPUs and network computers)
• “Shared disk/memory” architecture (centralised processing)




The New Stack – No More One Size Fits All




Layers, top to bottom:
• Access: direct record access or queries; MapReduce programs
• Data platforms: high-performance traditional relational database (e.g. Oracle Exadata); NoSQL database and data grids (e.g. CouchDB, GemFire); parallel relational database (e.g. Greenplum); MapReduce engines (Hadoop)
• Hardware: monolithic hardware (few CPUs and network computers) for the traditional relational database; distributed hardware (multi-core CPUs, multiple computers connected via high-performance network) for the others
• Architecture: “shared disk/memory” (centralised processing) for the traditional relational database; shared nothing (distributed parallel processing) for the others




Summary – What is NoSQL?


• Move away from traditional relational databases


• Address challenges posed by Cloud and Big Data by:

 – Building infinitely scalable systems


 – Avoiding distributed transactions


 – Choosing the right database(s) for the job




Summary – Join the crowd who are using NoSQL




For more information


• Contact me at sshek2@csc.com


• ‘Distributed Databases in the Cloud Using NoSQL’ LEF grant report


• “Life beyond Distributed Transactions: an Apostate's Opinion.”
  by Pat Helland




DISTRIBUTED DATABASES IN THE CLOUD USING NoSQL

Thank you



