
NoSQL with MySQL

Yekesa Kosuru, Distinguished Architect, Nokia
Peter Zaitsev, Percona
Agenda
•   Part 1 : Problem statement
•   Part 2 : Background (What is, Why, ACID, CAP)
•   Part 3 : Building a Key-Value Store
•   Part 4 : NoSQL with MySQL
PART 1: PROBLEM STATEMENT
NoSQL : No Formal Definition
• No Formal Definition
   – Not Only SQL ?
   – Don’t use SQL ?
   – Limit expressive power ?
• Underlying issues ?
   – ACID (Strong Consistency ?, Isolation ?)
   – SQL
   – Something else ?
• Facts
   – Hardware will fail
   – Software will have bugs
   – Growing traffic, data volumes, unpredictable
   – No downtime or expose errors to users
Problem to solve
• Key Value Store Problem Statement
   – A consumer-facing system with the following characteristics:
         •   Host large volumes of data & queries
         •   Responsive with predictable performance
         •   Always available (survives failures, software/hardware upgrades)
         •   Low cost of ownership
         •   Minimal administration
•   Availability : How to service customers (24 X 7) despite failures, upgrades
•   Scalability : How to scale (capacity, transactions, costs…)
•   Predictability : Predictable responsiveness
•   Not suited for all applications
PART 2: BACKGROUND
ACID
• Transactions simplify job of developers
• Client/developer point of view
• Atomicity: Bundled operations
   – All of the operations (multiple updates, deletes, inserts) in the
      transaction will complete, or none will
• Consistency: The database is in a consistent state when the transaction
  begins and ends
   – Referential integrity, and other constraints
   – All reads must see latest data
• Isolation: The transaction will behave as if it is the only operation being
  performed upon the database. Serialized updates, i.e. locks
• Durability: Upon completion of the transaction, the operation will not be
  reversed.
• ACID in single database is efficient
Issues Scaling DBMS
• Databases extend ACID to span multiple nodes using 2PC
• Shared disk & shared nothing architectures
• Cost of 2PC (N nodes involved in transaction)
   – Consistency & Isolation over 2PC makes it expensive
   – 3N+1 message complexity
   – N+1 stable writes
   – Locks – lock resources
   [Figure: 2PC message flow – RequestCommit to the coordinator, Prepare to each participant, Prepared votes back, Commit]

• Availability is a pain point
    – 2PC is a blocking protocol: it uses a single coordinator and is not fault tolerant
    – Coordinator failure can block transaction indefinitely
    – Blocked transaction causes increased contention
• Rebalancing
• Upgrades
• Total Cost of Ownership
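The message and write counts above are easy to verify with a little arithmetic; a minimal sketch (the accounting of which messages make up 3N+1 is my own reading of the classic protocol, not taken from the slides):

```python
def two_pc_cost(n: int) -> dict:
    """Classic 2PC cost for a transaction touching n participant nodes.

    Messages: 1 RequestCommit to the coordinator, n Prepare requests,
    n Prepared/No votes, and n Commit/Abort decisions -> 3n + 1 total.
    Stable (forced) log writes: one per participant plus the
    coordinator's decision record -> n + 1.
    """
    return {"messages": 3 * n + 1, "stable_writes": n + 1}

print(two_pc_cost(4))  # {'messages': 13, 'stable_writes': 5}
```

Note how both costs grow linearly with N, while the locks are held for the whole round trip; that is what makes Consistency and Isolation over 2PC expensive.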
CAP
• Introduced by Prof. Brewer in 2000
• Consistency, Availability, (Network) Partition Tolerance
   – Industry experience has shown we can guarantee only two of the three
• System point of view
• Consistency : Defines rules for the apparent order and visibility of updates
   – Variations on consistency
   – Eventual Consistency – replicas will be updated eventually; the order of
      updates may vary, but all reads will eventually see the latest value
• Availability : All clients can read & write data to some replica, even in the
  presence of failures
• Partition-tolerance : operations will complete even if the network is
  unavailable, i.e. a disconnected network
PART 3: BUILDING KEY-VALUE STORE
Important Considerations
• Functional
   – Simple API ( Get, Put, Delete - like Hashtable)
   – Primary key lookups only (limit expressive power)
        • No joins, No non-key lookups, No grouping functions, No constraints …
   – Data Model (Key-Value, Value = JSON)
   – Read-Your-Write Consistency
• SLA
   – Highly Available & Horizontally Scalable
   – Performance (low latencies & predictable)
        • Uniform cost of queries
• Operational
   – Symmetric, Rolling Upgrades, Backups, Alerts, Monitoring
• Total Cost of Ownership (TCO)
   – Hardware, Software, Licenses …
Making it Available
•   Make N replicas of data
•   Use Quorum (R + W > N), where N = number of replicas, R = replicas read,
    W = replicas written; e.g. 2 + 2 > 3
•   Overlap R & W to make it strongly consistent
•   Handle faults (partition tolerance and self heal)
     – Real faults
         • Machine down, NIC down …
         • Eventual Consistency (data)
         • Prefer A & P in CAP
         • If a node is down during a write, let the write proceed and read-repair later
     – Intermittent faults
         • X and Y are clients of Z; X can talk to Z but Y can't
         • Server does not respond within timeout, stress scenarios …
         • Inconsistent cluster view can cause inconsistencies in data
         • If two concurrently writing clients have different views of the cluster,
           let the write proceed and resolve the inconsistency later
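The quorum rule is a one-line check; a minimal sketch of the R + W > N condition, using the slide's 2 + 2 > 3 example:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum overlaps every write quorum,
    so at least one replica in any read has seen the latest write."""
    return r + w > n

# The slide's configuration: N=3 replicas, read 2, write 2 -> quorums overlap.
print(is_strongly_consistent(3, 2, 2))  # True
# A weaker setting: read 1, write 1 -> a read may land on a stale replica.
print(is_strongly_consistent(3, 1, 1))  # False
```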
Making it Scalable
• Sharding (or partitioning data) works well for capacity
• Large number of logical shards distributed over a few physical machines
• How to shard
    - Consistent Hashing (figure: ring with nodes A, B, C and keys 1–4)
        - Hash(key), say "2", then move clockwise to find the nearest node, "B"
    - Hash to create a uniform distribution
    - Allows for adding of machines
         - Without redesign of hashing
         - Without massive movement of data
    - Load balancing
         - If a machine can't handle the load, move shards to another machine
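The clockwise lookup above can be sketched in a few lines; a toy ring (the `Ring` class and the use of md5 are illustrative choices, not the presenters' implementation):

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Any uniformly distributed hash works; md5 keeps the sketch deterministic.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: place the nodes on a circle by hash, then route
    each key clockwise to the nearest node position."""
    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        h = _hash(key)
        i = bisect.bisect(self._points, (h, ""))
        return self._points[i % len(self._points)][1]  # wrap past the top

ring = Ring(["A", "B", "C"])
before = {k: ring.node_for(k) for k in map(str, range(100))}

ring2 = Ring(["A", "B", "C", "D"])  # add a machine without rehashing everything
after = {k: ring2.node_for(k) for k in before}
# Only the keys that now fall in D's arc move; every other key keeps its node.
moved = sum(before[k] != after[k] for k in before)
```

Adding node D never reshuffles keys between A, B, and C; a key's owner either stays the same or becomes D, which is exactly the "no massive movement of data" property.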
Vector Clocks
•   Determine history of a key-value, think data versioning
•   Need to interpret causality of events, to identify missed updates & detect
    inconsistencies
•   Timestamps are one solution, but rely on synchronized clocks and don't
    capture causality
•   Vector clocks are an alternative method of capturing order in a distributed
    system
•   Node (i) coordinating the write generates a clock entry at the time of write
•   Construct: ( nodeid : logical sequence )
     – Node A coordinates the first insert : (A:1)
     – Node A coordinates an update on top of it : (A:2)
     – Node B coordinates a concurrent update on top of (A:1) : (A:1, B:1)
•   Clock (A:1) is a predecessor of (A:1, B:1)
•   Clocks (A:2) and (A:1, B:1) are concurrent
•   Reads use vector clocks to detect missed updates & inconsistencies
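The predecessor/concurrent relationships above follow from a pairwise comparison of clock entries; a minimal sketch (the `compare` function is illustrative, not the store's actual code):

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks, each a {node_id: counter} dict.
    'before' if a is a predecessor of b, 'after' if b precedes a,
    'equal', or 'concurrent' (neither dominates -> conflict to resolve)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

print(compare({"A": 1}, {"A": 1, "B": 1}))  # before: a missed update
print(compare({"A": 2}, {"A": 1, "B": 1}))  # concurrent: an inconsistency
```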
Eventual Consistency Example (N/R/W = 3/2/2, 3 replicas)
[Figure: sequence diagram – Application, Client, and Server Nodes A, B, C]
• PUT: the write reaches all three replicas as V1 with clock (A:1)
• Node A goes down (outage or upgrade); the next PUT reaches only B and C,
  which store V2 with clock (A:1, B:1)
• Node A comes back up; on GET, the read from A returns V1 (A:1) while B and C
  return V2 (A:1, B:1) – the clocks reveal the missed update
• Read repair writes V2 (A:1, B:1) back to Node A and returns V2 to the client
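The timeline above can be replayed as a toy model (the data structures and function names here are mine, not the production store's):

```python
# Each replica holds (value, vector_clock); all start with V1 from the first PUT.
replicas = {"A": ("V1", {"A": 1}), "B": ("V1", {"A": 1}), "C": ("V1", {"A": 1})}

# Node A is down; the second PUT, coordinated by B, lands on B and C only.
replicas["B"] = ("V2", {"A": 1, "B": 1})
replicas["C"] = ("V2", {"A": 1, "B": 1})

def dominates(x: dict, y: dict) -> bool:
    """True if clock x has seen every update clock y has."""
    return all(x.get(n, 0) >= c for n, c in y.items())

def quorum_get(nodes):
    """Read R=2 replicas, keep the version whose clock dominates,
    and repair any replica that returned a stale version."""
    (v1, c1), (v2, c2) = replicas[nodes[0]], replicas[nodes[1]]
    winner = (v1, c1) if dominates(c1, c2) else (v2, c2)
    for n in nodes:
        if replicas[n][1] != winner[1]:
            replicas[n] = winner  # read repair: add the missed update
    return winner[0]

print(quorum_get(["A", "B"]))  # V2 -- and node A is repaired as a side effect
print(replicas["A"][0])        # V2
```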
PART 4: NOSQL WITH MYSQL
NoSQL with MySQL
• Key-Value store implemented on top of MySQL/InnoDB
• MySQL serves as persistence layer, Java code handles the rest
• Atomicity limited to one operation (auto commit), Consistency & Isolation
  limited to one machine (recall issues with 2PC)
• Why MySQL ?
    – Easy to install, tune and restore
    – Fast, Reliable, Proven
    – Performance :
        • MVCC, row level locking
        • Buffer pool, insert buffering, direct IO
    – Data Protection & Integrity :
        • Automatic crash recovery
        • Backup, Recovery (full and incremental)
    – Leverage expertise across ACID and BASE systems
Implementation
• InnoDB Engine (innodb_file_per_table)
• Simple schema
   – Auto increment PK
   – App Key, Value
   – Metadata
   – Hash of app key (index)
• Clustered Index and Secondary Index
• KV uses simple queries – no joins or grouping functions
• Each database is a shard (pick the right number of shards)
• Backup & Recovery
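The schema bullets can be made concrete; a sketch using SQLite so it stays self-contained (the column names are my guesses; in MySQL the PK would be an InnoDB `AUTO_INCREMENT` column and `value` would hold the JSON document):

```python
import hashlib
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE kv (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- clustered index
        app_key  TEXT NOT NULL,
        value    TEXT NOT NULL,                      -- JSON document
        metadata TEXT,
        key_hash TEXT NOT NULL                       -- hash of the app key
    )""")
con.execute("CREATE INDEX idx_key_hash ON kv(key_hash)")  # secondary index

def key_hash(k: str) -> str:
    return hashlib.md5(k.encode()).hexdigest()

def put(k: str, doc: dict) -> None:
    con.execute("INSERT INTO kv (app_key, value, key_hash) VALUES (?, ?, ?)",
                (k, json.dumps(doc), key_hash(k)))

def get(k: str):
    row = con.execute("SELECT value FROM kv WHERE key_hash = ? AND app_key = ?",
                      (key_hash(k), k)).fetchone()
    return json.loads(row[0]) if row else None

put("user:42", {"name": "Ada"})
print(get("user:42"))  # {'name': 'Ada'}
```

Indexing a fixed-width hash of the key rather than the key itself keeps the secondary index compact and the lookup cost uniform, which matches the "uniform cost of queries" SLA earlier in the deck.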
Tuning
• Queries are so simple that tuning is focused on IOPS, memory, TCP, sockets,
  timeouts, connection pool, JDBC, JVM, queuing, etc.
   – Memory to disk ratio
    – Connector/J
      useServerPrepStmts=true
      useLocalSessionState=true
      cachePrepStmts=true
      cacheResultSetMetadata=true
      tcpKeepAlive=true
      tcpTrafficClass=24
      autoReconnect=true
      maxReconnects=...
      allowMultiQueries=true
      prepStmtCacheSize=…
      blobSendChunkSize=…
      largeRowSizeThreshold=…
      locatorFetchBufferSize=…
• CentOS 5.5, XFS File System
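Assembled into a Connector/J JDBC URL, the properties above might look like the following (host, port, and database name are placeholders, and the elided values are left out):

```
jdbc:mysql://dbhost:3306/kvstore?useServerPrepStmts=true
    &useLocalSessionState=true&cachePrepStmts=true
    &cacheResultSetMetadata=true&tcpKeepAlive=true&tcpTrafficClass=24
    &autoReconnect=true&allowMultiQueries=true
```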
Scalability Test (TPS/IOPS), EC2

 Client     TPS      50%   90%   99%   99.5%  99.9%  Single-Server
 Threads             LAT   LAT   LAT   LAT    LAT    IOPS
    1       114.4     8     16    24     28     52      26
    2       226.88    8     16    26     30     63      48
    4       437.37    9     17    29     34     63      96
    5       600.1     8     14    27     32     64     107
   10      1061.45    9     18    35     44     77     199
   20      1730.69   10     23    52     66    118     307
   50      2775.85   12     40    98    123    202     482
  100      3207.18   15     78   224    274    407     571

Characteristics: m1.xlarge image; user dataset = 256GB (240M 1K values);
machines = 5; spindles = 4 per box; XFS file system; buffer pool = 12GB/machine;
Memory:Disk = 1:4; workload R/W = 100/0; random access pattern; N/R/W = 1/1/1;
incremental load to 100 threads.

[Figure: TPS, latency percentiles (50/90/99%), and server IOPS vs. client threads]
Know Your IOPS

   Disk             IOPS (random)
   7.2k rpm SATA           90
   10k rpm SAS            140
   15k rpm SAS            180
   Value SSD              400
   Intel X25-M           1500
   Intel X25-E           5000
   DDR Drive x1        300000
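The table feeds straight into capacity planning; a back-of-envelope sketch (the 5,000 IOPS target is a made-up workload, and real sizing must also account for RAID penalties and buffer-pool hits):

```python
import math

def spindles_needed(target_iops: int, per_disk_iops: int) -> int:
    """How many disks are needed to sustain a random-I/O target,
    ignoring RAID write penalties and cache hits (both matter in practice)."""
    return math.ceil(target_iops / per_disk_iops)

# Hypothetical: 5,000 random reads/sec that miss the buffer pool.
print(spindles_needed(5000, 180))   # 28 disks of 15k rpm SAS
print(spindles_needed(5000, 5000))  # 1 Intel X25-E
```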
Performance Test

[Figure: 50/90/99% latency (sec, 0 to 0.6) vs. TPS, roughly 1,458 to 9,863 TPS]

Characteristics: user dataset = 900GB; machines = 9; spindles = 8 (15K)/machine;
XFS file system (RAID0); buffer pool = 16GB/machine; Memory:Disk = 1:6;
cache = 16%; workload R/W = 100/0; recency skew, i.e. recent access = 80/20;
N/R/W = 3/2/2; incremental load to 10K TPS.
Tips by Peter Zaitsev, Percona
Why XFS

   File System   Threads(1)   Threads(16)
   EXT3
     RNDRD           243          582
     RNDWR           219          218
     Total           462          800
   XFS
     RNDRD           239          552
     RNDWR           205          408
     Total           444          960
MySQL for NoSQL
   Tune for Simple Queries
   Memory is the most important
       innodb_buffer_pool_size
   Are we looking at intensive writes ?
       innodb_log_file_size
       RAID with BBU
       XFS + O_DIRECT (avoid per inode locking)
   Restrict Concurrency on database
       Connection pool size or innodb_thread_concurrency
More on Scalability
   Is contention a problem?
       Use MySQL 5.5 or Percona Server
          Tune innodb_buffer_pool_instances
   Index update contention
       Use Multiple Tables
          Or MySQL Partitions if you can't
   Eliminate Query Parsing Overhead
       Consider HandlerSocket
          NoSQL Interface to InnoDB Storage Engine
Work on Result Stability
   Ensure you're getting stable results
      InnoDB flushing can cause non-uniform performance
THANK YOU



YEKESA.KOSURU@NOKIA.COM

								