Document Sample
10000-Tables-Cant-Be-Wrong-Designing-a-Highly-Scalable-MySQL-Architecture-for-Write-intensive-Applications Powered By Docstoc
					                                   Ten Thousand Tables
                                   Can’t Be Wrong…
         Presented by,
MySQL AB® & O’Reilly Media, Inc.

                                   Richard Chart – Co-Founder,
                                   ScienceLogic, LLC


This Presentation is about…
 An unusual approach to application scaling
  using MySQL
 The design choices we made early in product
  design, and how they look a few years later
 A few hard earned lessons in what to look for as
  an application grows beyond its roots
 The decision path that led to using MySQL in this
 The growth path that got us to current scale
 Along the way I will talk about:
  What we discovered about MySQL write performance
  While real world monitoring is important… understanding
  how to interpret the results is even more so
In the beginning…
 …or at least the beginning for ScienceLogic:
 Product concept: IT Management Appliance
 A series of choices - fundamental technology for
  each layer:
  Operating System
  Application Architecture
  Database technology & architecture
                                        Sweet Layers – Shirley Shelton
Technology Choices
    Hardware/OS/architecture – for another day
    Database Technology: MySQL
    1.   Performance
    2.   Stability
    3.   Experience
    4.   Support (increasingly important)
 Starting point: single appliance supporting 200
 Goal: extensible architecture with -
   very deep host and application monitoring
  1,000+ devices, each with 100+ management points
 Current reality –
  several times the scale of the original goal
 Future: The fundamentals are sound: next
  generation of the product moves up another
  order of magnitude
Database Architecture
 The challenge:
  Wide range of monitored nodes
  Wide range of data points (ping…hundreds
  of HW, OS and application metrics)
  Distributed collection (WAN: latency,
  security concerns)
  Commercial product: MySQL dual licensing
  comes into play
Database Architecture
 Data Characteristics:
  90+% Writes
  ACID not important
  Resilient to loss
   (Remaining data is not invalidated by gaps)
  Data elements valuable by themselves
  Data elements much more valuable when
    relationships added
The MyISAM Fit
  Very fast writes when no lock contention
  Simple data handling
  Lower license cost that InnoDB (important only
   for those of us shipping commercial products)
  Not transactional – and we don’t care!
Not InnoDB (all the opposite reasons)

  Slower to stuff with data
  More expensive (for us commercial folks)
What about memory tables?
 Limited applicable areas in EM7: most of the data has to
  live on disk for reporting weeks or months in future
 No measurable benefit over MyISAM in transient data
  areas where we could use them in EM7
 Because we take advantage of MyISAM cached indexes
  and the required data is in the index
 …but the application continues to evolve, we will use
  them in future if the right situation occurs
Scaling without lock contention
 The ace in the hole
  Dynamically created tables
  No more than one thread writing to a
  table at once
  Separate thread consolidates data for
  reporting across devices

  This approaches the sharding
  architecture used in highly scalable web
  sites, but with core data stored centrally
Multiple Threads / Multiple Collectors
Dynamic Databases and Tables
                                                                     Stats Databases

Dynamic_app_18              Dynamic_app_20              Dynamic_app_21              Dynamic_app_22             Dynamic_app_27              Dynamic_app_28

                  Stat_1                                                 Stat_56                     Stat_77                    Stat_500                    State_600

                 Stat_12                                                 Stat_59                     Stat_79                    Stat_550                    Stats_601

                 Stat_13                                                 Stat_16                     Stat_80                    Stat_551                    Stats_602

                 Stat_14                                                 Stat_18                     Stat_81                    Stat_552                    Stats_650

                 Stat_15                                                 Stat_180                    Stat_82                    Stat_553                    Stats_651

                 Stat_120                                                Stat_181                    Stat_89                    Stat_554                    Stats_640

                 Stat_121                                                                                                       Stat_555                    Stats_642

                 Stat_250                                                                                                                                   Stats_649
    Dynamic Table Creation
   NO_TABLE = 1146 # MySQL error code

   try:
     db.execute(“INSERT INTO dynamic_app_%s.stat_%s
             VALUES (10, ’sample data’)”, (app, device))

   except MySQLdb.Error, e:
     if e.args[0] == NO_TABLE:
          db.execute(“CREATE DATABASE IF NOT EXISTS dynamic_app_%s”, (app))
           db.execute(“CREATE TABLE IF NOT EXISTS dynamic_app_%s.stat_%s
                         LIKE dynamic_app_0.stat_0”,(app, device))
Growth Curve


   150                             Collectors
                                   Devices / 10
   100                             Tables / 100


     Mth Mth Mth Mth Mth Mth
         2   4   6   8   10   12
How far does
this go?

 So Far:
 20,000+ tables
 2,200+ queries per
 5 billion rows
 93% writes

 Next limit is how
  quickly data can be
  stuffed onto disk
The Database Platform
 4 x Intel Xeon Dual Core 7140M,
  16MB Cache, 3.4GHz, 800MHz
 16GB (16 x 1GB) 400MHz Single
  Ranked DIMMs
 Hardware RAID Controller
 10 x 146GB 15k RPM SAS Drives
 Linux 2.6 kernel
 MySQL 5.0.x
Managing Performance With Growth
 As usage rates escalate, things that once were
  fine become an issue…eg:
  Query Cache entries purged due to too many entries,
  or too many changes to underlying tables
  Lock contention
  Sort data set size causing increased
  created_tmp_disk_tables rather than
Open Files & Tables
 Critical when scaling this way
 show global status like 'open%'
 Open Files (Linux)
     mysql        hard     nofile   20480
     mysql        soft     nofile   20480
 /etc/my.cnf
  open_files_limit = 20480
Misc MyISAM Tuning Helpers
 concurrent_insert = 2
  1 (Default) Enables concurrent insert for MyISAM tables
  that don't have holes
  2 Enables concurrent inserts for all MyISAM tables,
  even those that have holes. For a table with a hole, new
  rows are inserted at the end of the table if it is in use by
  another thread. Otherwise, MySQL acquires a normal
  write lock and inserts the row into the hole.
 myisam_recover = QUICK, BACKUP
 Very important to measure DB stats over time
 show global status like
  Point in time counter useless in its own right
  …very valuable with 5 minute poll and graphed deltas

 Some stats must be combined to be useful
  Percentage of requests waiting for locks (deltas)
  (table_locks_waited/(table_locks_waited +
  table_locks_immediate) * 100
 MySQL Enterprise Monitor
  Very worthwhile tool – take advantage of it if you
  subscribe to MySQL Enterprise support

 Other Tools
  EM7 monitors databases (as well as servers, routers,
  firewalls, etc. etc.), so of course we use that…
EM7 Example
Monitoring Caveat!
 Some monitored changes are obviously bad…
  eg increase in created_tmp_disk_tables

 Some monitors are

 …what’s going on
 What really happened:
 Misconfigured clients
  caused the CPU
 created_tmp_tables
  unconnected with
  the CPU load
 EXPLAIN showed
  small row sets being
  sorted in memory
  tmp tables
Summarizing & Pruning Data
 Keeping the source statistics tables small is key
  for ongoing performance –
 Summarized data for reporting
  Infrequent writes, regular reads - MyISAM fine here also in most
  In EM7 we summarize hourly, daily, monthly etc.
  Retention periods configurable
 DELETEs suck MySQL performance…
  can use a deleted row marker
  If you can do a purge in your app, you’re golden
  In EM7 we schedule DELETEs for a nightly quiet time
Summarizing Data With Dynamic
Stored Procedure
   USE dynamic_app_data_43;
   DROP PROCEDURE IF EXISTS dynamic_app_43.dynamictest;
   CREATE PROCEDURE dynamic_app_43.dynamictest(n INTEGER)
   SET @s = "";
   SET @s = CONCAT(@s," INSERT INTO app_crunched
   (did,object,ind,year,month, date,average, total, poll_count)");
   SET @s = CONCAT(@s," SELECT
   SET @s = CONCAT(@s," FROM stat_", n);
   SET @s = CONCAT(@s," WHERE crunched = 0");
   #SELECT @s;
   PREPARE exe FROM @s;
   EXECUTE exe;
   END;
   //
   CALL dynamictest(5);
What’s The Downside?
 Reporting tools not good with dynamic
  databases and tables (eg Crystal Reports)
 80 : 20 rule:
  Above a certain size of implementation, some tables
  just have to use row locking
  (with EM7, 1,500 devices, 10,000 tables, and 0.1%
  need to be InnoDB… so should be the 99.9 : 0.1 rule)
 Backup and data maintenance complexity of
  multiple engines
So… MyISAM all the way?
 Well, no.
 In our application, at larger sites, around a
  dozen tables need the characteristics of
  InnoDB, or around 0.1%
 Be selective in the storage engine choice,
  consider relative merits for each part of the

 Richard Chart
A hand on us…
                Contact for a pack:

                Richard Chart

Shared By: