
Which databases solve my problem?
A survey of open source databases

Selena Deckelmann (@selenamarie)
End Point Corporation
PostgreSQL Global Development Group
LCA 2010

2005:
  BerkeleyDB
  MySQL
  PostgreSQL
  SQLite

2010:
  3store, 4store, Apache Derby, BerkeleyDB, Cassandra, Chordless,
  CouchDB, CUBRID, Dynamo, Firebird, GT.M, H2, HBase, HSQLDB,
  Hypersonic, HyperTable, InfiniDB, Ingres, LightCloud, LucidDB,
  MariaDB, MckoiDDB, Memcached, Mnesia, MonetDB, MongoDB, MySQL,
  Neo4j, Ozone, Parliament, PostgreSQL, Redis, Riak, Scalaris, SciDB,
  Sesame, SmallDB, SmallSQL, SQLite, Terrastore, Tokyo Tyrant,
  Voldemort, XtraDB, ydb

2am on Monday morning.

Which open source database should I use?

MySQL vs PostgreSQL

What problem are you trying to solve?

Some problems:

  I need to store and manipulate GIS data.

  I need a database for my blog.

  I have ONE BILLION users to store and analyze data from.

Define your problem.

Which problems are important?

performance
  your use case.
  test with real data.

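One way to act on "test with real data" is a small timing harness. The sketch below uses Python's built-in sqlite3 module only because SQLite appears in this survey and needs no server. The table name `posts`, the generated rows, and the helper `time_query` are hypothetical stand-ins for your own schema, data, and queries.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repeats=5):
    """Run one query several times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so the work is really done
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical stand-in data: replace this with a load of your real data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (author, body) VALUES (?, ?)",
    [(f"user{i % 100}", "lorem ipsum") for i in range(10_000)],
)
conn.commit()

query = "SELECT count(*) FROM posts WHERE author = ?"
before = time_query(conn, query, ("user42",))

# Try one change (here, an index) and measure the same query again.
conn.execute("CREATE INDEX posts_author_idx ON posts (author)")
after = time_query(conn, query, ("user42",))

print(f"median without index: {before:.6f}s, with index: {after:.6f}s")
```

The numbers only mean something when the rows and queries resemble your actual workload, which is the point of the slide.
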
interoperability
  can I get my data in/out?
  how painful is it?

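A quick way to find out how painful getting data in and out is: try a round trip before committing to a database. The sketch below assumes CSV as the interchange format and uses Python's built-in sqlite3 and csv modules. The `users` table and the helpers `export_csv` and `import_csv` are hypothetical; a real evaluation would use the database's own dump and load tools.

```python
import csv
import io
import sqlite3


def export_csv(conn, table):
    """Dump every row of `table` to CSV text, header row first."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted in this sketch
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


def import_csv(conn, table, text):
    """Load CSV text (with a header row) into an existing table."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", reader
    )
    conn.commit()


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob, jr.")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
import_csv(target, "users", export_csv(source, "users"))

# Compare what came out with what went in.
round_trip = target.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(round_trip)
```

Values with commas, quotes, NULLs, or binary data are where such round trips usually hurt, so real data makes a better test than tidy samples.
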
sustainability
  how is the software made?

Which databases solve my problem?

[Chart: Free and Open Source Databases*, number of projects (axis 10 to 50) by year, 1986 to 2008]

* That I can find information about

The Survey

• Wasn’t perfect.
• Contacted 25 projects, received 12 responses.
• Will try again with different questions and a cooler website.

The questions:

• What is the name of your project?
• How would you describe your software and what it does in a sentence or two?
• Who is the target user or audience for your database? Do you have any case studies to share?
• Is there a proprietary work-alike or equivalent to your open source database?
• What's the best mailing list for users of your database to subscribe to?
• What's the best mailing list for developers of your database to subscribe to?
• What's the best document for new developers to read if they want to get involved?
• What revision control system does your project primarily use?
• What motivated you to create a new project, rather than join an existing project?
• Do you have a roadmap for the next year? If so, what is it?
• Does anyone provide commercial support for your software?
• What languages are drivers available in, and/or what protocols does your database support? Are they up to date?
• Do you need help with any particular drivers?
• Is there some question I should have asked?
• What feature(s) set your project apart from your peers?

And I did my own research...

Means of comparison

  Database model
  Infrastructure features
  Development style

Models: defining what operations you’ll likely perform on the data

Relational Database models

  OLTP: Transaction-oriented
  Embedded: Bundling, simplicity, testing
  Column: Data warehouses
  MPP: Massively Parallel
  Streaming: Query streams, not storage

Relational Database Models

OLTP              Embedded    Column-store
CUBRID            H2          MonetDB
MySQL (InnoDB)    HSQLDB      LucidDB
PostgreSQL        SQLite      C-store/Vertica
                              (Cassandra, HBase)

non-Relational Database models

  Flatfile: See Tin ( http://tr.im/KNFp )
  Key-value: map-reduce, fault-tolerance, caching
  Multi-value: Multi-dimensional - GT.M
  Graph/Triple-store: Relationship queries
  Document-oriented: Semi-structured data

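To make the difference between these models concrete, here is a toy sketch of similar data shaped three ways. It uses plain Python structures only and is not the API of any database in this survey. All names (`kv_store`, `documents`, `triples`, and the two query helpers) are hypothetical.

```python
# Key-value: opaque values looked up by key; the store does not inspect them.
kv_store = {
    "user:1:name": "ann",
    "user:2:name": "bob",
}

# Document-oriented: one semi-structured record per entity; fields may vary.
documents = [
    {"_id": 1, "name": "ann", "follows": [2]},
    {"_id": 2, "name": "bob", "blog": "http://example.org/bob"},
]

# Graph / triple-store: (subject, predicate, object) facts,
# which suit relationship queries.
triples = {
    ("ann", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "ann"),
}


def followers_of(person):
    """Relationship query: who follows `person`?"""
    return sorted(s for (s, p, o) in triples if p == "follows" and o == person)


def friends_of_friends(person):
    """Two-hop query: who do the people that `person` follows follow?"""
    first_hop = {o for (s, p, o) in triples if s == person and p == "follows"}
    return sorted(o for (s, p, o) in triples if s in first_hop and p == "follows")


print(kv_store["user:1:name"])
print(followers_of("bob"))
print(friends_of_friends("ann"))
```

The operations you expect to run most often (lookups by key, whole-record reads, or multi-hop relationship queries) are what should drive the choice of model.
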
non-Relational Database Models

Key-value        Graph/Triple-store    Document
BerkeleyDB       Neo4j                 CouchDB
Cassandra        4store                BerkeleyDB-XML
HBase            Parliament            MongoDB
Memcached
Riak
Redis
Tokyo Cabinet
ydb

infrastructure features:
  “distributed”
  memory
  HA

“Distributed”

Partitioning/Sharding    Replication
Cassandra                BerkeleyDB    Scalaris
HBase                    CouchDB       Voldemort
Voldemort                Cassandra     HyperTable
Riak                     MySQL         HBase
MySQL                    PostgreSQL    Memcached
                         Riak          Mnesia

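The two columns describe different things: partitioning decides which single node owns a key, while replication keeps copies of the same key on several nodes. The sketch below illustrates both with simple hash-modulo placement. It is not how any of the listed systems actually place data (several use consistent hashing or range partitioning). The node names and the helpers `shard_for` and `replicas_for` are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def shard_for(key, nodes=NODES):
    """Partitioning/sharding: each key lives on exactly one node, chosen by hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def replicas_for(key, nodes=NODES, copies=2):
    """Replication: each key is written to `copies` distinct nodes,
    starting at the node that owns its shard."""
    start = nodes.index(shard_for(key, nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


for key in ("user:1", "user:2", "user:3"):
    print(key, "owner:", shard_for(key), "copies on:", replicas_for(key))
```

With plain modulo placement, adding or removing a node moves most keys to a different owner, which is one reason production systems tend to use more elaborate schemes.
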
Memory vs Disk

In-memory*    Configurable    Disk
Memcached     Cassandra       Everyone else
Scalaris      HBase
Redis         HyperTable
              Mnesia

* Databases that exist solely in memory and cannot, or never, persist to disk.

High Availability

Node failover:
  Cassandra
  HBase
  Riak

Otherwise, use one or more of: heartbeat, DRBD, filesystem replication, etc.

Sustainable open source development is code + community.

Code Development Model

Core + modules    Monolithic    Infrastructure
Drizzle           GT.M          Memcached
LucidDB           Ingres        Redis
PostgreSQL        CUBRID        Scalaris

Community Development Model

Benevolent Dictator    Feature driven    Small Group    A mix
Redis                  Apache Derby      CouchDB        LucidDB
XtraDB                 InfiniDB          MonetDB        Drizzle
MckoiDDB               SmallSQL          Riak           H2
                                                        PostgreSQL

Plans for the data

• Attempt to update Wikipedia
• Talk to people who write real surveys
• Contact more projects
• http://ossdbsurvey.org

The Future!

Protocols
How client/server communication happens

  LucidDB, H2 -> PostgreSQL protocol
  Sphinx -> MySQL protocol
  Tokyo Cabinet / Tyrant -> memcached protocol

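Sharing a wire protocol means existing client libraries can talk to a different server unchanged. As an illustration, the sketch below encodes and decodes two commands of the memcached text protocol (`set` and single-key `get`) without opening a network connection. The helper names are hypothetical, and how much of the protocol any particular server implements is not verified here.

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a memcached text-protocol `set` request.

    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key):
    """Encode a single-key `get` request: get <key>\r\n"""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data):
    """Decode the reply to a single-key `get`.

    A hit looks like: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    A miss is just:   END\r\n
    Returns the value bytes, or None on a miss.
    """
    if data == b"END\r\n":
        return None
    header, rest = data.split(b"\r\n", 1)
    _value, _key, _flags, length = header.split(b" ")
    return rest[: int(length)]


print(build_set("greeting", b"hello"))
print(build_get("greeting"))
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))
```

Any server that answers these byte sequences the same way can sit behind an unmodified memcached client, which is what makes protocol reuse attractive.
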
Verification

• ‘memcapable’ certifies memcached implementations
• Need automated, repeatable tests for complex systems (Cucumber?)
• More person-to-person connections between projects

Databases. Talking to each other.

  Thrift -> ThruDB
  http://code.google.com/p/thrudb/

Thanks go to:

• Sheeri Cabral
• Josh Berkus
• Brian Aker
• Monty Taylor
• Stewart Smith
• Mark Atwood
• J Chris Anderson
• Jan Lehnardt
• Rick Hillegas
• Salvatore Sanfilippo
• Martin Kersten
• Robin Schumacher
• Vadim Tkachenko
• Justin Sheehy
• Nicholas Goodman, John Sichi, Joseph A. di Paolantonio
• Jay Pipes
• Tobias Downer
• Thomas Mueller
• Scott Deckelmann

Questions?

This work by Selena Deckelmann is licensed under a Creative Commons
Attribution-Noncommercial-Share Alike 3.0 United States License.