3 Case Studies of NoSQL and Java Apps in the Real World

					                                          Eugene Ciurana
 geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net

  3 Case Studies of NoSQL
                       and
Java Apps in the Real World


          This presentation is available from:
            http://ciurana.eu/GeeCON-2011

                       Letʼs move the Java world!
                  About Eugene...
•   15+ years building mission-critical, high-availability
    systems
•   15+ years of Java work
•   Open source evangelist
•   MapReduce + Hadoop early adopter
•   VP of R&D at badoo.com - largest social
    network in Europe (120M subscribers
    worldwide!)

•   State-of-the-art main line-of-business systems at
    the largest companies in the world - not
    a web guy!
    Very Important!



Please Ask Questions!
        (don’t be shy)




               What Is NoSQL?
•   Database...
•   Horizontally scalable
•   Non-relational
•   Built-in application support
•   Custom file system designed for supporting NoSQL
    operations
•   Best for non-OLTP applications
    • Unstructured data
•   Lower cost than RDBMS
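"Horizontally scalable" usually means that keys are spread across many nodes, so capacity grows by adding nodes. The sketch below illustrates the idea with simple hash partitioning. `ShardedStore` and its methods are hypothetical names invented for this example, not the API of any product mentioned in this talk, and real systems typically use more elaborate schemes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of hash partitioning; not a real NoSQL API.
class ShardedStore {
    // Each map stands in for one storage node.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    ShardedStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Pick a node from the key's hash; floorMod keeps the index non-negative.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    // Number of entries held by one node, to inspect the distribution.
    int sizeOfNode(int index) {
        return nodes.get(index).size();
    }
}
```

Note that with plain modulo partitioning, changing the node count remaps most keys. That is one reason production systems tend to prefer other partitioning strategies.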



              NoSQL Topology
Layers, top to bottom:
•   Consumer
•   Node, Node, Node, Node
•   Virtual File System (HDFS, GridFS, Hypertable)
    • Logical table management, load balancing, garbage collection
•   Tablet Server 0, Tablet Server 1, ... Tablet Server n
•   Distributed File System
•   FS 0, FS 1, FS 2, ... FS n




             Areas of Application
•   Document storage and management
•   Object databases
•   Graph databases
•   Key/value stores
•   Eventually consistent key/value stores
•   Financial modeling
•   Click stream analytics
•   Simulations
•   Protein folding
•   Distributed sorting or grepping

            Brewer’s CAP Theorem
•   Consistency (C), Availability (A), Partition tolerance (P) - pick any two!
•   C + A: RDBMs (Oracle, MySQL), Aster Data, Green Plum, Vertica
•   C + P: mongoDB, Terrastore, Datastore, Hypertable, HBase, Redis,
    BerkeleyDB, MemcacheDB, Scalaris
•   A + P: Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB,
    CouchDB, Riak
•   Data model legend: Relational, Key-Value, Column-Oriented,
    Document-Oriented



           Three NoSQL Systems
•   mongoDB
    • Horizontally scalable
    • Document-oriented database
    • No JOIN operations, no row level locking
•   GigaSpaces XAP
    • Data grid for replacing application servers
    • Event processing model
    • Front-end to various data stores (SQL and NoSQL)
•   Hadoop/Hive/HBase
    • MapReduce framework foundation
    • Optimized for fast search and retrieval
    • Batch model for indexing and processing

                      mongoDB
•   Document-oriented storage
•   Querying via JavaScript or custom APIs for all major
    programming languages
•   In-place updates for atomicity
•   Any attribute in a document can be indexed
•   Built-in MapReduce
•   Built-in caching
•   BSON (“binary JSON”) document format
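To make "document-oriented" and "any attribute in a document can be indexed" concrete, here is a toy in-memory model. It is a sketch only: `DocumentCollection`, `ensureIndex`, and `find` are names made up for this example and are not the mongoDB Java driver API. Documents are plain maps, so each one may carry different attributes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a document collection; NOT the mongoDB driver.
class DocumentCollection {
    private final List<Map<String, Object>> documents = new ArrayList<>();
    // field name -> (field value -> documents holding that value)
    private final Map<String, Map<Object, List<Map<String, Object>>>> indexes =
            new HashMap<>();

    void insert(Map<String, Object> doc) {
        documents.add(doc);
        // Keep every existing index up to date with the new document.
        for (Map.Entry<String, Map<Object, List<Map<String, Object>>>> e
                : indexes.entrySet()) {
            addToIndex(e.getValue(), e.getKey(), doc);
        }
    }

    // Any attribute can be indexed, even one that only some documents have.
    void ensureIndex(String field) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> doc : documents) {
            addToIndex(index, field, doc);
        }
        indexes.put(field, index);
    }

    // Uses the index when one exists, otherwise scans every document.
    List<Map<String, Object>> find(String field, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        Map<Object, List<Map<String, Object>>> index = indexes.get(field);
        if (index != null) {
            List<Map<String, Object>> indexed = index.get(value);
            if (indexed != null) {
                hits.addAll(indexed);
            }
            return hits;
        }
        for (Map<String, Object> doc : documents) {
            if (value.equals(doc.get(field))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    private static void addToIndex(Map<Object, List<Map<String, Object>>> index,
                                   String field, Map<String, Object> doc) {
        if (doc.containsKey(field)) {
            index.computeIfAbsent(doc.get(field), k -> new ArrayList<>()).add(doc);
        }
    }
}
```

The sketch leaves out updates, BSON encoding, and persistence. It only shows why schema-free documents and per-attribute indexes fit together naturally.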




                              mongoDB
Deployment diagram:
•   Consumer (fail-over between master and slave)
•   mongoDB Server (master)
    • mongod - database daemon
    • mongos - sharding daemon
    • Data storage
•   mongoDB Server (slave)
    • mongod - database daemon
    • mongos - sharding daemon
    • Data storage




               GigaSpaces XAP
•   Data persistence
•   Distributed processing
•   Caching
•   Multi-language support
•   NoSQL operations:
    • SQLQuery - SQL-like syntax
    • Persistency - RDBMS through wrapper
    • memcached
•   Task execution and marshalling



  GigaSpaces XAP
Stack, top to bottom:
•   Application frameworks: Java, C++, .Net, Groovy, Mule, Spring, JEE, Jetty
•   XAP Deployment Virtualization
•   XAP Middleware Virtualization (Virtualized Clustering Layer)
•   Data stores: RDBMS, Memcache DB, mongoDB
•   Alongside the virtualization layers: XAP Management and Monitoring




              Hadoop and HBase
•   HDFS - distributed high performance file system
    •  Runs on top of ext3, HFS+, whatever
    •  Alternatives: AWS S3, CloudStore, others
•   MapReduce - framework for running jobs
    •  Java or anything that works with stdin, stdout
•   Chukwa - large log analysis framework (not very popular)
•   Hive - Data warehousing, ETL, and SQL-like language
•   HBase - Column-oriented NoSQL database
•   Pig - flat file data analysis
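The MapReduce model itself is small enough to sketch in a single process. The example below is an illustration of the map, shuffle, and reduce phases using word counting. It deliberately does not use the Hadoop API, and `MiniMapReduce` is a name made up for this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of map -> shuffle -> reduce; not Hadoop code.
class MiniMapReduce {
    // Map phase: one input line in, zero or more (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: all values for one key in, one total out.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group the mapped values by key (sorted for stable output).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data between them. The structure of the two functions stays the same.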



                          Hadoop and HBase
Stack, top to bottom:
•   Hive, Chukwa, Pig
•   MapReduce, HBase
•   HDFS
•   Disk, Disk, Disk, Disk
•   Alongside the stack: ZooKeeper, Sqoop




Case Study 1




Case Study 1: Large FI Stock Trades
•   Stock trading system is based on a large commercial
    database
•   It can store only up to 4 weeks of trades
    •  Otherwise it’s too expensive
•   Inability to run long-term forecasting or trend analysis
•   Robust, Java-based
•   Mule-based - all messaging going through ESB
•   Message playback log




Case Study 1: Large FI Stock Trades
•   Siphon trades as they fly by through the ESB
    •  Copy every trade to HDFS
•   Use MapReduce to break the data down for analysis
•   Commit initial analysis to HBase
•   Run queries and further mine data through HBase and
    MapReduce
•   Data mining and presentation using WEKA
•   Forecasting accuracy increased by 11.3% in the first 180
    days of operation for commodity markets
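The "copy every trade as it flies by, analyze later" pattern can be sketched in plain Java. Everything here is an assumption made for illustration: the `Trade` fields, the `TradeSiphon` class, the in-memory list standing in for HDFS, and the choice of volume-weighted average price as the sample analysis. The talk does not show the real message format or the actual jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical trade message; the real format is not shown in the talk.
class Trade {
    final String symbol;
    final double price;
    final long quantity;

    Trade(String symbol, double price, long quantity) {
        this.symbol = symbol;
        this.price = price;
        this.quantity = quantity;
    }
}

class TradeSiphon {
    // Stand-in for the HDFS copy: an append-only record of every trade seen.
    private final List<Trade> archive = new ArrayList<>();

    // Wire tap: keep a copy, then hand the trade on unchanged so the
    // live trading flow is not affected.
    Trade tap(Trade trade) {
        archive.add(trade);
        return trade;
    }

    int archivedCount() {
        return archive.size();
    }

    // Batch analysis over the archive: volume-weighted average price
    // per symbol (an example metric chosen for this sketch).
    Map<String, Double> vwapBySymbol() {
        // symbol -> {sum of price * quantity, sum of quantity}
        Map<String, double[]> sums = new TreeMap<>();
        for (Trade t : archive) {
            double[] s = sums.computeIfAbsent(t.symbol, k -> new double[2]);
            s[0] += t.price * t.quantity;
            s[1] += t.quantity;
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }
}
```

Because the archive only grows, history is no longer bounded by what the transactional database can afford to keep, which is the point of the case study.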



Case Study 2




                          Large SaaS
Architecture diagram; components shown:
•   External parties
    • Service Consumers
    • End Users: Browser, RSS, Outlook, CWS, EWS
    • Service Providers: various services providers throughout the
      Internet. Some are public, some are partners
•   Firewall
•   Heavy web services: some XML, some custom
•   Internal Service Providers and Internal End Users
•   Main App
•   Client Relationships App, CRM
•   Search (query/reply): Netezza, Lucene
•   Queue (update), Dispatcher, Custom Queuing System
•   Rich Docs (GridFS), Static Files (S3)
•   Reporting
•   Legend (connection types): HTTP, SOAP, Custom RPC, ODBC/JDBC, Direct/API




                          Large SaaS
Architecture diagram; components shown:
•   External parties
    • Service Consumers
    • Service Providers: various services providers throughout the
      Internet. Some are public, some are partners
    • End Users: Browser, RSS, Outlook, CWS, EWS
•   Cloud Firewall
•   New System Acquisition (.Net, PHP, etc.), Internal Service Providers
•   Tomcat App Container
    • Main App (zone instance)
    • Client Relations (Zone Manager)
    • Dispatcher
    • New Apps
•   Static Files (S3)
•   memcached
•   Mule ESB Container: Services, Message Routing, and Transformations
    • Other New Services, Client Relations Services, Dispatcher Services,
      Main App Services, OpenMQ, cron Services
•   Local DBs and Other Resource, Rich Documents (GridFS), Reporting, Search
•   Corporate Firewall
•   Enterprise Services, Databases, End Users
•   Legend: HTTP | Web services (SOAP, REST, JMS, other) | JDBC | Direct/API/Any



                          Large SaaS
Architecture diagram; components shown:
•   External Service or Consumer, Internal Services
•   Tomcat App Container
    • Main App (zone instance)
    • Client Relations (Zone Manager)
    • Dispatcher
    • New Apps
•   Static Files (S3)
•   memcached
•   Mule ESB Container: Services, Message Routing, and Transformations
    • Other New Services, Client Relations Services, Dispatcher Services,
      Main App Services, OpenMQ, cron Services
•   Reporting (Pig), Search (Hive), Rich Documents (GridFS), Databases
•   HDFS, GridFS, Data Warehouse
•   Hadoop, DB cluster, computational network
•   Cloud-based MapReduce/NoSQL Infrastructure - expand and contract
    capacity as-needed




Case Study 3




                          SOBA Labs
Architecture diagram; components shown:
•   Ubuntu Landscape
•   REST SOBA interface - implementation is transparent to caller!
    • http://soba.myserver.com/manage/resource
•   sobaDB (192.168.0.42)
•   sobaEngine (localhost)
•   Other Consumer (192.168.0.42), via the REST SOBA interface
•   EC2 web services API
    • Amazon EC2: End-user App (ami-322ec65b), End-user App (ami-322ec65b)
•   Xen XML-RPC API
    • Xen Host: SOBA Agent, Xen Python, SOBA Python
    • Oracle (vm_uuid: b220c8db)
•   Firewall




                          SOBA Labs
Architecture diagram; components shown:
•   CANONICAL Landscape
•   Other Application - easy integration!
    • Web services, REST, JSON
•   Native Application - easy integration!
    • SOBA Engine Python API, dict
•   Config Data (Puppet?)
•   SOBA Data: mongoDB
•   Mule-based SOBA Engine
    • Abstracts provisioning, configuration, and monitoring through
      web services
    • Java and Python Web Services Interface (REST, JSON)
•   amazon EC2 API (EC2 web services API, EC2 Query, XML), EC2 Data
    • Ubuntu Server: puppet, facter, SOBA Agent
•   Xen Server API (Xen XML-RPC API, XML)
    • SOBA Agent (REST, JSON, Python, dict)
    • Ubuntu Server: puppet, facter, Ensemble Agent
•   Rackspace Cloud Servers API (REST, JSON)
    • Ubuntu Server: puppet, facter, SOBA Agent
•   DRY Interface - Don't Repeat Yourself!
    • Provisioning, configuration or monitoring via SOBA is the same
      regardless of target: same API call, same data payload, same data
      format, etc.
    • Implementation is abstracted from the caller!
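The DRY interface idea (same call, same payload, same format, whatever the target) maps naturally onto a single Java interface with one adapter per back end. The sketch below is hypothetical throughout: `ProvisioningTarget`, `SobaFacade`, the adapter classes, and the payload keys are invented for illustration and are not SOBA's actual code. The adapters only echo the request instead of calling the EC2, Xen, or Rackspace APIs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; this is not SOBA's actual code.
interface ProvisioningTarget {
    // Same call and same payload shape regardless of the back end.
    Map<String, String> provision(Map<String, String> request);
}

class Ec2Target implements ProvisioningTarget {
    @Override
    public Map<String, String> provision(Map<String, String> request) {
        // A real adapter would call the EC2 web services API here.
        return SobaFacade.reply("ec2", request);
    }
}

class XenTarget implements ProvisioningTarget {
    @Override
    public Map<String, String> provision(Map<String, String> request) {
        // A real adapter would call the Xen XML-RPC API here.
        return SobaFacade.reply("xen", request);
    }
}

class RackspaceTarget implements ProvisioningTarget {
    @Override
    public Map<String, String> provision(Map<String, String> request) {
        // A real adapter would call the Rackspace Cloud Servers API here.
        return SobaFacade.reply("rackspace", request);
    }
}

class SobaFacade {
    private final Map<String, ProvisioningTarget> targets = new HashMap<>();

    void register(String name, ProvisioningTarget target) {
        targets.put(name, target);
    }

    // The caller names a target but never sees how it is implemented.
    Map<String, String> provision(String targetName, Map<String, String> request) {
        ProvisioningTarget target = targets.get(targetName);
        if (target == null) {
            throw new IllegalArgumentException("Unknown target: " + targetName);
        }
        return target.provision(request);
    }

    // Every adapter answers in the same format.
    static Map<String, String> reply(String backend, Map<String, String> request) {
        Map<String, String> reply = new LinkedHashMap<>();
        reply.put("backend", backend);
        reply.put("resource", request.get("resource"));
        reply.put("status", "requested");
        return reply;
    }
}
```

Adding a new cloud then means writing one more adapter and registering it. Callers keep sending the same request.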




Plug - Know Any High Caliber Coders?
•   badoo.com is hiring!
    •   Top talent - we’re very demanding
•   PHP, MySQL developers and sr. developers
•   Java with a Business Intelligence twist for Pentaho and Hadoop
•   Mobile: Android, iOS, Blackberry, WAP, JME
•   QA sr. lead - highly technical, web, web services, and mobile
•   €2,000 referral bonus for you if we hire your friend!
    •   Paid 90 days after hiring (trial period ends)
•   If your friend can legally work in Russia or the UK, but doesn’t live in
    Moscow or London, we’ll work out relocation
•   Contact: geecon@ciurana.eu
•   Contact: jobs@corp.badoo.com

                                        Eugene Ciurana
geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net
                        http://ciurana.eu/scalablesystems




                               Q&A
                        Comments?
                       Anything else?
         This presentation is available from:
          http://ciurana.eu/GeeCON-2011
                           Twitter: ciurana

				