

NoSQL refers to non-relational databases. With the rise of Web 2.0 sites, traditional relational databases have proven inadequate, especially for very large, highly concurrent, purely dynamic social networking (SNS) sites, and have exposed many problems that are hard to overcome. Non-relational databases, by virtue of their own characteristics, have developed very rapidly.

What Should I Do?
Choosing SQL, NoSQL or Both for
Scalable Web Applications

Todd Hoff
http://highscalability.com
Welcome
• What Should I Do? Choosing SQL, NoSQL or Both
  for Scalable Web Applications
  – Speaker: Todd Hoff, HighScalability.com
  – Sponsor: VoltDB, Inc.
• Housekeeping
  – Type your questions into the box in the webinar console
  – The webinar will be recorded for playback
  – Survey report will be available in about a week
  – Follow voltdb on Twitter for post-webinar updates
It's All About Solving Problems
• Let's set the stage. Imagine a programming
  team sitting in a conference room, a gray glow
  of laptop light illuminating their faces. On the white board is
  a problem to be solved. That's where it all starts. Figuring
  out how to solve a problem.
• We all are just trying to figure things out and make things work.
• Then why is there so much SQL on NoSQL hate?
• Everything is wonderful and nobody is happy.
• Systems evolve by solving problems. Some of those
  solutions require discontinuous jumps into unknown
  territory and some require small continuous steps on a well
  known path.
• Organizing the world's information, being the heartbeat of
  the Internet, or being the world's social network all drive you
  to different places. Your application has its unique place too.
Problem Solving is Changing
• It's not just about scaling (distributed systems).
  Though that's part of it. We are talking architecture.
  It's about how we build things now. It's not SQL vs.
  NoSQL. This is the main point.
• In the same way ChromeOS is rethinking the personal
  computer post browser, we have been rethinking
  architectures post cloud + NoSQL + social, etc.
• Choose simplicity, rapid development, consistency,
  availability, ACID, latency, scale-out, distribution, cost,
  operations, elasticity, queryability, manageability,
  navigability, data model fit, low cost, and so on. Options
  we never had or even knew we had before.
• Forget SQL vs. NoSQL, there's a spectrum of different
  options and as developers we need to figure out which ones
  to use in which combinations to solve our problems.
James Burke: Connections
        Connections explores an “Alternative View of
        Change” (the subtitle of the series) that
        rejects the conventional linear and
        teleological view of historical progress. Burke
        contends that one cannot consider the
        development of any particular piece of the
        modern world in isolation. Rather, the entire
        gestalt of the modern world is the result of a
        web of interconnected events, each one
        consisting of a person or group acting for
        reasons of their own (e.g., profit, curiosity,
        religious) motivations with no concept of the
        final, modern result of what either their or
        their contemporaries’ actions finally led to.
Architectures of the Past

• Any system is like an archeological dig.
   o Mainframe, Minicomputers, Workstations, PCs
   o Integrated Data Store, relational, embedded, client-server
   o Canonical architecture: CDN, load balancers, web tier,
     application tier, database tier, storage tier.
   o Mostly fixed, static, monolithic.
• When the web was largely read-only we could scale-up or
  replicate, pool identical databases, keep caches.
• When the web went real-time and interactive this broke
  down. Too large for one machine & distributed transactions.
• We made the relational database tier do everything.
• When it couldn't, we extended it Borg-like. Adding
  memcached, sharding, key-value, custom consistency logic,
  moving all logic into apps, avoiding joins, etc.
The Idea of the Big Idea
                       Copernicus's discovery of the heliocentric model of the
                       solar system was published in 1543, and it started a fire
                       storm of scientific invention (distinction between electricity
                       and magnetism; law of free fall; Galilean inertia; theory of
                       lenses; laws of planetary motion, etc.), not because of the
                       discovery itself, but because it spread the idea that we
                       puny humans could think and make big discoveries about
                       the universe using nothing but our tiny brains.

                       This was a brave new thought. A big idea. Copernicus
                       gave people permission to tackle big challenges and the
                       confidence that they could expect to meet them.

From It Started with Copernicus by Howard Margolis
Architectures of the Near Now
• Big Idea: Some brave souls started writing radical,
  specialized solutions to solve specialized problems, like
  dealing with massive scale, massive distribution, massive
  concurrency, massive users. Google, Inktomi, Amazon, etc.
• Thoughts from the deep bubbled up. CAP. Sharding. Scale-
  out. Commodity hardware. Partitioning. ACID is bad
  for availability. Building highly available systems is different.
  We can play with consistency, availability, and partition
  tolerances.
• Developers took these ideas and built scalable architectures
  on top of existing software. Brutal. Facebook, FriendFeed,
  Flickr, Salesforce, eBay, etc. all did this.
• These were simply too hard. It's still easier to scale-up and
  stick with a relational database core.
Architectures of the Now
        • Architectures were still either/or. You built for
          scale or you didn't.
        • That's changing now with a pleasing number
          of new products that want to help bridge the
          chasm.
        • It's a time of transition and we still have
          everything all jumbled together at once. That's
          why it's so dang hard to make a decision.
        • It used to be scaling took so much more work
          it wasn't worth it. Now with the new tool
          chains it's becoming equal. Not quite, but
          vendors are speeding in that direction, but it's
          still not a no-brainer, so the confusion
          remains.
Architectures of the Future
• William Gibson: The future is here -- it's just not
  evenly distributed yet. True of today. We're not there yet.
• The Next Big Idea: all these options are on a sliding scale
  and you can choose what you want based on the qualities
  and features you need in your system. SQL vs. NoSQL is
  an illusion. Think of the space as a spectrum, where you
  make choices based on requirements.
• Some of your world may be as Eric Brewer
  says: partitioned + asynchronous which implies an
  architecture that's weakly consistent + delayed exceptions +
  compensation. Get charged twice and your account is
  credited. Overbook an airplane and compensate the
  passengers. Duplicate detection is the delayed exception.
• Some of your world may be ACID where transactions
  matter. All options are valid based on your application.
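
The "delayed exceptions + compensation" idea can be sketched in a few lines of Python, using the charged-twice example. This is only an illustration under assumptions: the `Ledger` class, its method names, and the use of an order id to spot duplicates are all hypothetical and not taken from any product.

```python
class Ledger:
    """Hypothetical in-memory ledger: accept charges now, reconcile later."""

    def __init__(self):
        self.charges = []          # (order_id, account, amount), no checks on write
        self.credits = []          # compensating entries added by reconcile()
        self._compensated = set()  # indexes of charges that were already credited

    def charge(self, order_id, account, amount):
        # Weakly consistent write path: always succeeds, duplicates allowed.
        self.charges.append((order_id, account, amount))

    def reconcile(self):
        """Delayed exception: detect duplicate charges and credit them back."""
        seen = set()
        new_credits = []
        for index, (order_id, account, amount) in enumerate(self.charges):
            if order_id not in seen:
                seen.add(order_id)
            elif index not in self._compensated:
                # Duplicate found after the fact, so compensate the account.
                self._compensated.add(index)
                new_credits.append((order_id, account, amount))
        self.credits.extend(new_credits)
        return new_credits

    def balance(self, account):
        charged = sum(a for _, acct, a in self.charges if acct == account)
        credited = sum(a for _, acct, a in self.credits if acct == account)
        return charged - credited
```

The write path never blocks or rejects; correctness is restored later by `reconcile()`, which is safe to run repeatedly.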
The Future is...Polyglot
           When asked if Facebook intended to
           standardize on a single database
           platform Facebook's Director of Engineering
           Andrew Bosworth responded...

           For the time being, the company intends to
           use separate platforms for separate tasks.
           With Facebook's technology stack in general,
           we've really tried to use the right technology
           for the problem we're solving. You can get
           into trouble over-standardizing the
           technology.
The Future is...Cloud
• Let's assume for a moment you can't build and run
  your own datacenter...
• The place you can plug your polyglot systems into and
  expect low operations cost, elastic resource use, advanced
  multi-datacenter and security infrastructure, and advanced
  scalable services, no-brainer ease of use is...the cloud.
• Quality & power/bandwidth/cage costs better in colo.
• Look for products that can dynamically scale up and down
  automatically. Traditional databases do not work this way
  and this is a major leverage point for small teams developing
  big systems.
• Allows deferring capacity planning by using a scalable
  architecture and scalable software from the start.
• Clouds can scale-up and scale-out.
• Should work across multiple availability zones.
The Future is...Service Oriented

• It's not all wonderful. Integrating systems across
  transaction boundaries is a problem. Manually queue,
  retry, eventually consistent, but not great.
• For protection many sites use a service oriented
  architecture: Amazon, Playfish, Twitter, The Case Against
  ORM, Google App Engine
• After HTTP terminates, all applications tend to look alike.
  A big change from the two and three tier days.
• Loose coupling: technology dependencies don't leak
  through and services can be developed, managed,
  scaled, deployed, and tuned independently. Creates
  separate failure domains.
• Organize your internal systems to be automated, service-
  driven and API-driven.
Services Have Taken Over
the World
• Google services: OAuth, User Service, Calendar,
  Map, Contacts, Document Handling, Videos, Photo,
  Spreadsheets, Mail, Data Mining, etc.
• Google App Engine services: Memcache, URL Fetch, Mail,
  XMPP, Images, Google Accounts, Task Queues, Blobstore,
  Channel
• Amazon Web Services: EC2, EMR, Auto Scaling, Cloud
  Front, SimpleDB, RDS, FWS, SWS, SNS, CloudWatch,
  DNS, VPM, ELB, DevPay, S3, EBS, Mechanical Turk
• ProgrammableWeb has a big list of APIs. Twitter, Facebook,
  Queuing, Lucene, Solr, SimpleGeo, Twilio, Flickr, Foursquare
• Use EC2 for videos instead of wedging it into GAE.
• Building scalability by composing your application from other
  scalable services. This is how it works now.
But I Don't Need to Scale

 • Most common reason for standing pat is saying that "I'll
   never need to scale so why bother? We aren't Twitter or
   Facebook or Google after all."
 • From Tumblr:
Frankly, keeping up with growth has presented more work than our small
team was prepared for — with traffic now climbing more than 500M
pageviews each month. But we are determined and focused on bringing our
infrastructure well ahead of capacity as quickly as possible. We’ve nearly
quadrupled our engineering team this month alone, and continue to distribute
and enhance our architecture to be more resilient to failures like today’s.
 • What happens when you need to cross the scalability
   chasm? Do you want to completely change your
   architecture or evolve from something that was meant to
   scale?
Leveraging Other People's Scale
(LOPS)
• Social changes everything. Common business strategy
  is to leverage off of other people's scale in the form of
  social networks or data feeds.
• App stores. Distribution has totally changed. Your app
  can be placed immediately in front of millions of people. I
  remember schlepping shrink wrapped software around to
  conferences.
• Fame. One of my most read posts had absolutely nothing
  to do with me. I used "Kevin Rose" in the title. They came.
• Population. There are always more people and things
  being added to the potential user base.
• Zynga and Playfish needed to scale to hundreds of millions
  of users quickly because they were LOPSing.
Leveraging Other People's Data
(LOPD)
         • Crowdsourcing as a source of
           scale. Letting users "help you" by adding
           their own data to your system can quickly
           turn into a shockingly massive load.
            o Flickr with 3,000 photos uploaded every
               minute
            o Facebook adds 12 terabytes per day
            o Twitter adds 7 terabytes a day
         • Freemium business model. When you give
           away the milk a lot of people will want
           milkshakes.
The 3 Big Bucket Model of Systems
• Previously the expensive relational database was
  tasked with doing everything. We're now seeing
  people move away from the relational database
  as the central datastore of record.
• Dwight Merriman, 10gen CEO of MongoDB fame,
  thinks there will be 3 big buckets of systems:
   1. Analytics Processing - complex offline ad-
      hoc reporting
   2. OLTP - complex transactional semantics
   3. NoSQL - mostly online processing, agile, high
      performance, horizontally scalable.
• No one product is best at all three, so systems
  will tend to divide up this way. Makes sense.
Main Data Models
Adapted from Emil Eifrem, NoSQL databases.

Document Databases
Lineage: Inspired by Lotus Notes.
Data model: Collections of documents, which contain key-value collections.
Example: CouchDB, MongoDB

Key-Value Stores
Lineage: Amazon's Dynamo paper and Distributed HashTables.
Data model: A global collection of KV pairs.
Example: Membase, Riak

Graph Databases
Lineage: Euler and graph theory.
Data model: Nodes & relationships, both of which can hold key-value pairs.
Example: AllegroGraph, InfoGrid, Neo4j

BigTable Clones
Lineage: Google's BigTable paper.
Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
Example: HBase, Hypertable, Cassandra

Relational Databases
Lineage: E. F. Codd in A Relational Model of Data for Large Shared Data Banks
Data model: A set of relations
Example: VoltDB, Clustrix, MySQL

Data Structure Servers
Lineage: ?
Data model: Operations over dictionaries, lists, sets and string values.
Example: Redis

Object Oriented Databases
Lineage: Graph Database Research
Data model: Objects
Example: Objectivity, Gemstone

Grid Databases
Lineage: Data Grid and Tuple Space research.
Data model: Space Based Architecture
Example: GigaSpaces, Coherence
Data Models are Good at?
• Document Databases: Natural data modeling.
  Programmer friendly. Web friendly. CRUD.
• Key-value Stores: Handles size well. Processing a constant
  stream of small reads and writes. Fast. Programmer
  friendly.
• BigTable Clones: Handles size well. Stream massive write
  loads. High availability. Multiple-data centers. MapReduce.
• Relational Databases: High performing, scalable OLTP.
  SQL access. Materialized views. Transactions matter.
  Programmer friendly transactions.
• Data Structure Servers: Quirky stuff you never thought of
  using a database for before.
• Graph Databases: Rock complicated graph problems. Fast.
• Grid Databases: High performance and scalable transaction
  processing.
Use Cases Drive Decisions
 If your application needs...
• complex transactions because you can't afford to lose
    data or if you would like a simple transaction programming model then look
    at a Relational or Grid database.
     o Example: an inventory system that might want full ACID. I was very
        unhappy when I bought a product and they said later they were out of
        stock. I did not want a compensated transaction. I wanted my item!
• to scale then NoSQL or SQL can work. Look for systems that support
    scale-out, partitioning, live addition and removal of machines, load
    balancing, automatic sharding and rebalancing, and fault tolerance.
• to always be able to write to a database because you need high availability
    then look at Bigtable Clones which feature eventual consistency.
• to handle lots of small continuous reads and writes, that may be volatile,
    then look at Document or Key-value databases, or databases offering fast
    in-memory access.
• to implement social network operations then you first may want a Graph
    database or second, a database like Riak that supports relationships. An
    in-memory relational database with simple SQL joins might suffice for
    small data sets. Redis's set and list operations could work too.
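
To make the last point concrete, here is a small sketch of a social-network operation expressed as a set intersection. It uses plain Python sets as a stand-in for server-side set operations; it does not call Redis or any client library, and the names `followers`, `follow`, and `mutual_followers` are hypothetical.

```python
from collections import defaultdict

# user -> set of users who follow them (stand-in for one set per key)
followers = defaultdict(set)


def follow(follower, followee):
    """Record that `follower` follows `followee`."""
    followers[followee].add(follower)


def mutual_followers(user_a, user_b):
    """Users who follow both user_a and user_b: a set intersection."""
    return followers[user_a] & followers[user_b]


follow("carol", "alice")
follow("dave", "alice")
follow("carol", "bob")
```

With this data, `mutual_followers("alice", "bob")` yields the single shared follower. For small data sets this kind of set arithmetic may be all the "graph" support an application needs.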
Use Cases...2
 If your application needs...
• to operate over a wide variety of access patterns and
    data types then look at a Document database, they generally are flexible
    and perform well.
• powerful offline reporting with large datasets then look at Hadoop first
    and second, products that support MapReduce. Supporting MapReduce
    isn't the same as being good at it.
• to span multiple data-centers then look at Bigtable Clones and other
    products that offer a distributed option that can handle the
    long latencies and are partition tolerant.
• to build CRUD apps then look at a Document database, they make it
    easy to access complex data without joins.
• built-in search then look at Riak.
• to operate on data structures like lists, sets, queues, publish-
    subscribe then look at Redis. Useful for distributed locking, capped logs,
    and a lot more.
• programmer friendliness in the form of programmer friendly data types
    like JSON, HTTP, REST, Javascript then first look at Document
    databases and then Key-value Databases.
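
As an illustration of the capped log mentioned above, here is a minimal in-process sketch. It assumes a capped log means "keep only the most recent N entries"; the `CappedLog` class is hypothetical and uses Python's `collections.deque`, not a data structure server.

```python
from collections import deque


class CappedLog:
    """Keeps only the most recent `capacity` entries, discarding the oldest."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)

    def append(self, entry):
        # A deque with maxlen silently drops the oldest item when full.
        self._entries.append(entry)

    def recent(self):
        """Return the retained entries, oldest first."""
        return list(self._entries)
```

A data structure server offers the same behavior as a shared service, so many application processes can append to one log.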
Use Cases...3
 If your application needs...
• transactions combined with materialized views for real-time
    data feeds then look at VoltDB. Great for data-rollups and time windowing.
• enterprise level support and SLAs then look for a product that makes a point
    of catering to that market. Membase is an example.
• to log continuous streams of data that may have no
    consistency guarantees necessary at all then look at Bigtable Clones because
    they generally work on distributed file systems that can handle a lot of writes.
• to be as simple as possible to operate then look for a hosted or PaaS solution
    because they will do all the work for you.
• to be sold to enterprise customers then consider a Relational Database
    because they are used to relational technology.
• to dynamically build relationships between objects that have dynamic
    properties then consider a Graph Database because often they will not
    require a schema and models can be built incrementally through
    programming.
• to support large media then look at storage services like S3. NoSQL systems
    tend not to handle large BLOBS, though MongoDB has a file service.
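
The data-rollup and time-windowing idea from the first bullet can be sketched without any database. The following is plain Python under assumptions, not VoltDB's API: the `rollup` function and the `(timestamp, key, value)` event shape are invented for illustration.

```python
from collections import defaultdict


def rollup(events, window_seconds):
    """Sum event values per (window_start, key) bucket.

    `events` is an iterable of (timestamp_seconds, key, value) tuples.
    """
    totals = defaultdict(int)
    for timestamp, key, value in events:
        # Align each event to the start of its fixed-size time window.
        window_start = timestamp - (timestamp % window_seconds)
        totals[(window_start, key)] += value
    return dict(totals)
```

A materialized view keeps this kind of aggregate up to date inside the database as each transaction commits, instead of recomputing it from the raw events.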
Use Cases...4
 If your application needs...
• to bulk upload lots of data quickly and efficiently then
    look for a product that supports that scenario. Most will not because they
    don't support bulk operations.
• an easier upgrade path then use a fluid schema system like a
    Document Database or a Key-value Database because it supports
    optional fields, adding fields, and field deletions without the need to
    build an entire schema migration framework.
• to implement integrity constraints then pick a database that supports
    SQL DDL, implement them in stored procedures, or implement them in
    application code.
• a very deep join depth then use a Graph Database because they support
    blisteringly fast navigation between entities.
• to move behavior close to the data so the data doesn't have to be moved
    over the network then look at stored procedures of one kind or another.
    These can be found in Relational, Grid, Document, and even Key-value
    databases.
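
The fluid-schema point can be shown with documents modeled as Python dicts. This is a sketch under assumptions: the field names (`name`, `nickname`, `fax`) and both helper functions are hypothetical, and a real document database would store the dicts as JSON-like documents.

```python
def display_name(user_doc):
    """Read a document written under any version of the schema."""
    # Newer documents may carry an optional "nickname"; older ones do not.
    return user_doc.get("nickname") or user_doc["name"]


def drop_field(user_doc, field):
    """Field deletion without a migration: return a copy lacking `field`."""
    return {k: v for k, v in user_doc.items() if k != field}
```

Old and new documents coexist in the same collection, and the reading code tolerates both, which is what removes the need for a migration framework.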
Use Cases...5
  If your application needs...
 • to cache or store BLOB data then look at a Key-value
     store. Caching can be for bits of web pages, or to save complex objects
     that were expensive to join in a relational database, to reduce latency,
     and so on.
 • a proven track record like not corrupting data and just generally
     working then pick an established product and when you hit scaling (or
     other issues) use one of the common workarounds (scale-up, tuning,
     memcached, sharding, denormalization, etc.).
 • fluid data types because your data isn't tabular in nature, or requires a
     flexible number of columns, or has a complex structure, or varies by
     user (or whatever), then look at Document, Key-value, and Bigtable
     Clone databases. Each has a lot of flexibility in their data types.
 • other business units to run quick relational queries so you don't have
     to reimplement everything then use a database that supports SQL.
 • to operate in the cloud and automatically take full advantage of cloud
     features then we may not be there yet.
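
Caching objects that were expensive to join is usually done with a cache-aside pattern, sketched below. The sketch makes assumptions: `CacheAside` is a hypothetical class, a Python dict stands in for the key-value store, and `loader` stands in for whatever slow query builds the object.

```python
class CacheAside:
    """Cache-aside: check the cache first, fall back to the slow loader."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical expensive function, e.g. a multi-table join
        self._cache = {}       # stand-in for a key-value store
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            # Cache miss: build the object once and remember it.
            self.misses += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def invalidate(self, key):
        # Call after the underlying data changes so stale objects are not served.
        self._cache.pop(key, None)
```

The hard part in practice is invalidation: every write path that changes the joined data has to remember to call `invalidate`.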
Use Cases...6
 If your application needs...
• support for secondary indexes so you can look up data by different
    keys then look at relational databases and Cassandra's new
    secondary index support.
• to create an ever-growing set of data that rarely gets accessed then
    look at a Bigtable Clone which will spread the data over a distributed file
    system.
• to integrate with other services then check if the database provides some
    sort of write-behind syncing feature so you can capture database changes
    and feed them into other systems to ensure consistency.
• fault tolerance then check how durable writes are in the face of power failures,
    partitions, and other failure scenarios.
• to push the technological envelope in a direction nobody seems to be
    going then build it yourself because that's what it takes to be great
    sometimes.
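
To show what a secondary index buys you, here is a sketch of maintaining one by hand on top of a key-value store. It is an assumption-laden illustration: `IndexedStore` is hypothetical, dicts stand in for the store, and databases with built-in secondary indexes do this bookkeeping for you.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value store with one secondary index maintained on every write."""

    def __init__(self, indexed_field):
        self._indexed_field = indexed_field
        self._rows = {}                 # primary key -> record
        self._index = defaultdict(set)  # field value -> set of primary keys

    def put(self, key, record):
        old = self._rows.get(key)
        if old is not None:
            # Remove the stale index entry before re-indexing the new record.
            self._index[old[self._indexed_field]].discard(key)
        self._rows[key] = record
        self._index[record[self._indexed_field]].add(key)

    def find_by_field(self, value):
        """Return the primary keys whose indexed field equals `value`."""
        return sorted(self._index.get(value, set()))
```

Every write now touches two structures, which is why hand-rolled indexes on a distributed store raise consistency questions that built-in support is meant to answer.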
                  pplication use?
What should your ap
                                   our
• Key point is to rethink how yo application could work
  differently in terms of the diffeerent data models and the
  different products. Right data model for the right problem.
                                   elp
• To see what models might he your application take a look
    t What Th H k A You Actually Using NoSQL For? In
  at Wh t The Heck Are Y Act ll U i N SQL F ? I
                                    her
  this article I tried to pull togeth a lot of use cases of the
  different qualities and features developers have used in
  building systems.
• Match what you need to do with these use cases. From
  there you can backtrack to the products you may want to
  include in your architecture. NoSQL, SQL, it doesn't matter.
• Look at Data Model + Product Features + Your
  Situation. Products have such different feature sets it's
  almost impossible to recommend by pure data model alone.
• Which option is best is determined by your priorities.
Experiment Like MythBusters
• Every feature and product is like a myth.
• It must be proven through experiment, thought, and data.
• Don't scramble to implement something in a production
  environment. Figure out what you need to do. Decide by
  doing some prototypes. Test. Evaluate. See which
  solutions fit your architecture.
• MythBusters either plans elaborately or just does it. They'll
  build scale models, mockups, elaborate props, talk to
  experts, research, run through a progression.
   o In the end every myth is: busted, plausible, confirmed
   o It's one thing to do it small-scale, it's another to do it
     large-scale.
   o It wouldn't be MythBusters if it worked the first time.
   o When we experiment and things fail we start to ask why
     and that's when we learn.
Some Experiments to Try
       • Slice off part of a service that may need better
         performance or scalability onto its own system. For
         example, the user login subsystem may need to be
         high performance and this feature could use a
         dedicated service to meet those goals.
       • Turn one of your features into a service. Proceed
         one-by-one until done.
       • Take a feature on your schedule and implement it
         with a different stack.
       • Think about how your code might be refactored if it
         was rewritten using features from various products.
       • Project out your next bottleneck or pain point and
         think how it might be solved differently.
       • Will a two tier approach work? Low latency data is
         served through a fast interface, but the data itself
         can be calculated and updated by high
         latency apps.
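
The two tier idea in the last bullet can be prototyped in a few lines. This is only a sketch under stated assumptions: the class TwoTierCounter and its methods are hypothetical names, the "fast interface" is a dict lookup, and the "high latency app" is a method you call by hand where a real system would run a batch job.

```python
class TwoTierCounter:
    """Fast reads from a precomputed view; slow recomputation happens separately."""

    def __init__(self):
        self.events = []   # raw data, appended cheaply on the write path
        self.view = {}     # precomputed totals served to low-latency readers

    def record(self, user, amount):
        self.events.append((user, amount))

    def read_total(self, user):
        # Low-latency tier: a single lookup, possibly slightly stale.
        return self.view.get(user, 0)

    def rebuild_view(self):
        # High-latency tier: recompute everything, then swap in the new view.
        totals = {}
        for user, amount in self.events:
            totals[user] = totals.get(user, 0) + amount
        self.view = totals
```

Readers see stale totals until the next rebuild, so the experiment is really about how much staleness your application can tolerate.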
Make the Choice Faster



If the previous slides send you into analysis paralysis, I
really like this as an antidote, from Andrei, a commenter
on one of my posts:

If you keep going back and forth between upsides and
downsides of each choice you'll waste a lot of time for
nothing. Start working with one solution (SQL, NoSQL, or
both) faster rather than later and you'll move towards the
best alternative in time as you solve your problems. It's a
process!
Scale-Up as Long as You Can
        • Best examples are StackOverflow and
          PlentyOfFish.
        • Though there are a lot of reasons to go
          NoSQL other than scale, it's still easier to
          use one machine. A relational database
          can scale amazingly on the mega-hardware
          we have today.
        • When your data needs won't fit on a single
          machine anymore then you have choices
          to make about how to span machines.
        • At some point the pain of staying with
          what you have is greater than the pain of
          learning something new.
You Don't Have to Make Excuses for
Choosing Something Different
• RDBMSs are the default way to solve database problems.
  If you do anything different you must pass a quiz of the 99
  things you must have tried to get your RDBMS to
  scale. And you can never pass the quiz.
• Hitting a limit is seen as your failure. You don't have skillz.
  Is your schema correct? Denormalized, but not too much?
  Queries optimized? Indexes optimized? Did you hire a
  DBA? Did you size your hardware properly? Use a better
  database?
• Most scaling problems can be solved with money that you
  may not have.
• Do your homework, run tests, pick what you want, and
  have a plan B.
Free Yourself from Feeling Guilt Over
Using What You Know
       • Just because Facebook, Google, Twitter, etc.
         do something doesn't mean you need to, too.
       • The people in these organizations are just
         people, trying to solve a problem, with certain
         resources, certain requirements, and certain
         quirks.
       • They may know something you don't, but then
         again maybe they don't. Your experience,
         research, and knowledge is just as valid.
       • Forget what's cool. Focus on the end product
         and how to deliver it and keep on delivering it.
Go With the Strengths of Your Team
Michael Westen, Burn Notice:

Special forces squads are built around the skills of the
individual members. But no matter how good each member of
the squad is, every mission comes down to one thing: how
well they work together. Because in the end you don't need a
hero to succeed in the field, you need a team.

• You have to go with the strengths of your team unless you
  are prepared for a transition period. If they know a
  technology it may be safer to go with that. Manage risk.
  This is a big reason some don't go with NoSQL.
• One team started with Erlang and moved to Java so
  they could find programmers. Think about those
  scenarios.
Scaling Requires Making Tradeoffs
• With scalable systems you may notice that you can't open
  up a transaction, update 10 different tables, hundreds of
  records, and expect it all to just work. Not that simple.
• Nope, like a good poem, these systems require some
  constraints be followed so they can operate at scale.
• In GAE, you have task queues for long running jobs,
  numerous quotas, and queries can only be so expressive.
• In KV stores you can only update one key in a transaction.
• Most products don't support secondary indexes.
• Availability may require you to implement consistency with
  read-repair and compensating transactions.
• In relational systems you'll need to partition correctly.
• These are all part of it. There are tradeoffs.
• You are right, this is still too hard.
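
As one illustration of the read-repair tradeoff mentioned above, here is a minimal sketch, assuming each replica is a plain dict mapping a key to a (version, value) pair and that versions are simple integers. The function name read_with_repair is hypothetical; real systems typically use timestamps or vector clocks instead of a bare counter.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and copy it to replicas that are behind."""
    newest = None
    for replica in replicas:
        entry = replica.get(key)   # entry is (version, value) or None
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None:
        return None
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]
```

The point of the sketch is that consistency work moves into the read path, which is code the application or the database has to own.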
              If You Love New York
              Take I-30 East
• From Why We Make Mistakes. This was a bumper sticker
  seen in Texas.
• The meaning is when people undergo major changes, like
  moving, one of their biggest mistakes is not changing how they
  use their time.
• In other words, if you move to Texas learn to enjoy the
  things Texas has to offer. Don't move there expecting to find
  a great bagel as you would in NY, or great beaches as you
  would in L.A.
• Learn to love the rodeo or the Dallas Cowboys or the vast
  open spaces of Texas--or else you will be miserable.
• Same applies when switching databases. Really learn how
  these things work and change to make the best of them.
Related Articles

      Please see the reference list at the end of
  What The Heck Are You Actually Using NoSQL For?
Any Questions?
Support Slides
We won't talk about these, but you may find them useful.
Where are you starting from?

• Greenfield application?
• In the middle of a project and worried about
  hitting bottlenecks?
• Worried about hitting the scaling wall once you deploy?
• Adding a separate loosely coupled service to an existing
  system?
• What are your resources? Expertise? Budget?
• What are your pain points? What's so important that if it
  fails you will fail? What forces are pushing you?
• What are your priorities? Prioritize them. What is really
  important to you, what must get done?
• What are your risks? Prioritize them. Is the risk of being
  unavailable more important than being inconsistent?
What are you trying to accomplish?

• What are you trying to accomplish?
• What's the delivery schedule?
• Do the research to be specific, like Facebook did with
  their messaging system:
Facebook chose HBase because they monitored their usage and figured out
what was needed: a system that could handle two types of data patterns.
1. A short set of temporal data that tends to be volatile
2. An ever-growing set of data that rarely gets accessed
Things to Consider...Your Problem
• Do you need to build a custom system?
• Access patterns: 1) A short set of temporal data that tends
  to be volatile 2) An ever-growing set of data that rarely gets
  accessed 3) High write loads 4) High throughput, 5)
  Sequential, 6) Random
• Requires scalability?
• Is availability more important than consistency, or is it
  latency, transactions, durability, performance, or ease of
  use?
• Cloud or colo? Hosted services? Resources like disk
  space?
• Can you find people who know the stack?
• Tired of the data transformation (ORM) treadmill?
• Store data that can be accessed quickly and is used often?
• Would like a high level interface like PaaS?
Things to Consider...Money

• Cost? With money you have different options than if you
  don't. You can probably make the technologies you know
  best scale.
• Inexpensive scaling?
• Lower operations cost?
• No sysadmins?
• Type of license?
• Support costs?
Things to Consider...Programming

• Flexible datatypes and schemas?
• Support for which language bindings?
• Web support: JSON, REST, HTTP, JSON-RPC
• Built-in stored procedure support? Javascript?
• Platform support: mobile, workstation, cloud
• Transaction support: key-value, distributed, ACID, BASE,
  eventual consistency, multi-object ACID transactions.
• Datatype support: graph, key-value, row, column, JSON,
  document, references, relationships, advanced data
  structures, large BLOBs.
• Prefer the simplicity of a transaction model where you can
  just update and be done with it? In-memory makes it fast
  enough and big systems can fit on just a few nodes.
Things to Consider...Performance

• Performance metrics: IOPS/sec, reads, writes,
  streaming?
• Support for your access pattern: random read/write;
  sequential read/write; large or small or whatever chunk
  size you use.
• Are you storing frequently updated bits of data?
• High Concurrency vs. High Performance?
• Problems that limit the type of work load you care about?
• Peak QPS on highly-concurrent workloads?
• Test your specific scenarios?
Things to Consider...Features
• Spooky scalability at a distance: support across multiple
  data-centers?
• Ease of installation, configuration, operations,
  development, deployment, support, manage, upgrade,
  etc.
• Data Integrity: In DDL, Stored Procedure, or App
• Persistence design: Memtable/SSTable; Append-only B-tree;
  B-tree; On-disk linked lists; In-memory replicated;
  In-memory snapshots; In-memory only; Hash; Pluggable.
• Schema support: none, rigid, optional, mixed
• Storage model: embedded, client/server, distributed,
  in-memory
• Support for search, secondary indexes, range queries,
  ad-hoc queries, MapReduce?
• Hitless upgrades?
Things to Consider...More Features
• Tunability of consistency models?
• Tools availability and product maturity?
• Expand rapidly? Develop rapidly? Change rapidly?
• Durability? On power failure?
• Bulk import? Export?
• Hitless upgrades?
• Materialized views for rollups of attributes?
• Built-in web server support?
• Authentication, authorization, validation?
• Continuous write-behind for system sync?
• What is the story for availability, data-loss prevention,
  backup and restore?
• Automatic load balancing, partitioning, and repartitioning?
• Live addition and removal of machines?
Things to Consider...The Vendor
 •   Viability of the company?
 •   Future direction?
 •   Community and support list quality?
 •   Support responsiveness?
 •   How do they handle disasters?
 •   Quality and quantity of partnerships developed?
 •   Customer support: enterprise-level SLA, paid support,
     none
Size Matters Not -- Yoda
• Scalability, handling large data
   volumes may have been the
   original motivation for NoSQL
   systems. Like 7TB a day for
   Twitter. Not the only motivation
   anymore.
• NoSQL or SQL isn't just about scaling. It's about distributed
   architectures, reduced complexity via rich data models that
   more easily represent a domain. Or "it does what you need
   doing."
• More than one machine means splitting data and worrying
   about consistency. Leads to 2PC or quorums, or just writing
   a complex value, which loses support for references and
   data integrity, causes things like last update wins, vector
   clocks for read repair, and gossip protocols.
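
For readers who have not met vector clocks or quorums, here is a minimal sketch of both ideas. It assumes a vector clock is a dict of node name to counter and uses the common N/R/W overlap rule for quorums; the function names compare_clocks and quorum_overlaps are hypothetical, not taken from any product.

```python
def compare_clocks(a, b):
    """Compare two vector clocks given as dicts of node name -> counter.

    Returns "before", "after", "equal", or "concurrent".
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"   # neither update saw the other: a conflict to resolve


def quorum_overlaps(n, r, w):
    # With n replicas, reading r and writing w copies overlap when r + w > n.
    return r + w > n
```

A "concurrent" result is exactly the case where last update wins silently drops one of the writes, which is why some systems hand both versions back to the application.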
There's Truth in Humor

Something a little fun, yet educational...

• Hilarious Relational Database Vs. NoSQL Fanbois by Garrett
    Smith (NSFW)
     o Oh so funny...classic worse is better argument. Richard P. Gabriel: It will take much
       less time and effort to implement initially and it will be easier to adapt to new
       situations. Porting becomes far easier. Thus its use will spread rapidly. Once
       spread, there's pressure to improve its functionality, but users have already been
       conditioned to accept "worse" rather than the "right thing". Therefore, the
       worse-is-better software first will gain acceptance, second will condition its users to
       expect less, and third will be improved to a point that is almost the right thing.
• Hilarious Fault-Tolerance Cartoon by John Muellerleile (NSFW)
     o John is from Riak and this cartoon was based on their actual experience. When standard,
         well-worn ways change a lot it can be disorienting.
• Flow Chart For Project Decision Making by Anonymous (NSFW)
     o If it's not broke don't fix it. Rewriting rarely goes well. This was Twitter's choice with Cassandra for
         Tweet storage.
• Everything's Amazing and Nobody's Happy by Louis CK
• Love this interview. We live in an amazing, amazing world.
  We are using high speed internet from a plane! While it's
  flying through the air! You are sitting in a chair in the sky!
• We tend to see all this confusion in the market place and
  get anxious.
• Things were simpler when there was one way to do things.
• But we can do so much more with less today than ever
  before. A few people can do now in half a year and $150K
  what it took a team of 20 people a year and $1.5 million.
• There's so much energy and excitement and learning.
• Thinking back to some of my old projects I can see how
  they would be totally different today, in a good way.
Why So Much SQL on NoSQL Hate?
     • Everything is amazing, then why are people so
       mean?
     • Look at the flame wars of SQL vs. NoSQL
     • A guy writes about his experiences with GAE and
       he gets hammered in the comments. Really?
     • Comments like:
        o "But your reasoning is just lame. Get better
          coders."
        o "Is english your 2nd language?"
        o "You obviously are not a very good
          programmer or craftsman for that matter
          because both craftsman and programmer know
          how to use their tools before starting a project."
     • Nothing really serious is at stake. Chill.

								