									    Scaleable Computing

            Jim Gray
      Microsoft Corporation
          Gray@Microsoft.com




Thesis: Scaleable Servers
   Scaleable Servers
       Commodity hardware allows new applications
       New applications need huge servers
       Clients and servers are built of the same “stuff”
            Commodity software and
            Commodity hardware
   Servers should be able to
       Scale up (grow node by adding CPUs, disks, networks)
       Scale out (grow by adding nodes)
       Scale down (can start small)
   Key software technologies
       Objects, Transactions, Clusters, Parallelism
       1987: 256 tps Benchmark
               14 M$ computer (Tandem)
               A dozen people
               False floor, 2 rooms of machines
               Simulate 25,600 clients
               A 32 node processor array
               A 40 GB disk array (80 drives)
               Staff: admin expert, hardware experts, auditor, network expert,
                performance expert, manager, DB expert, OS expert
  1988: DB2 + CICS Mainframe
          65 tps
          IBM 4391
          Simulated network of 800 clients
          2 M$ computer
          Staff of 6 to do benchmark
          2 x 3725 network controllers
          Refrigerator-sized CPU
          16 GB disk farm (4 x 8 x .5 GB)
     1997: 10 years later
1 Person and 1 box = 1250 tps
    1 Breadbox ~ 5x 1987 machine room
    23 GB is hand-held
    One person does all the work
    Cost/tps is 1,000x less
     25 micro dollars per transaction
    The box: 4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk, 3 x 7 x 4 GB disk arrays
    The person: hardware expert, OS expert, net expert, DB expert, app expert
            What Happened?
   Moore’s law:
    Things get 4x better every 3 years
      (applies to computers, storage, and networks)
   New Economics: Commodity
    class            price ($/mips)   software (k$/year)
    mainframe            10,000             100
    minicomputer            100              10
    microcomputer            10               1

   GUI: Human - computer tradeoff
    optimize for people, not computers
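
A rough sketch of the compounding behind "4x better every 3 years". It assumes smooth exponential growth between the 3-year steps, which is an idealization and not something the slide states. Under that assumption Moore's law alone gives roughly 100x per decade, so a 1,000x drop in cost/tps would need a further factor from somewhere else, for example the commodity pricing in the table above.

```python
# Compound improvement under "4x better every 3 years".
# Assumption: growth is a smooth exponential between the 3-year steps.

def improvement(years: float, factor: float = 4.0, period: float = 3.0) -> float:
    """Return the cumulative improvement factor after `years`."""
    return factor ** (years / period)

per_year = improvement(1)      # roughly 1.59x per year
per_decade = improvement(10)   # roughly 100x per decade

print(f"per year:   {per_year:.2f}x")
print(f"per decade: {per_decade:.0f}x")
```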
         What Happens Next
   Last 10 years:
     1000x improvement
   Next 10 years:
      ????
   Today:
    text and image servers are free
      25 m$/hit => advertising pays for them
   Future:
    video, audio, … servers are free
    “You ain’t seen nothing yet!”
        Kinds Of
 Information Processing
                 Point-to-point         Broadcast
 Immediate       Conversation, Money    Lecture, Concert      Network
 Time-shifted    Mail                   Book, Newspaper       Database


 It’s ALL going electronic
 Immediate is being stored for analysis (so ALL database)
 Analysis and automatic processing are being added
       Why Put Everything
        In Cyberspace?
   Low rent - min $/byte
   Shrinks time - now or later
   Shrinks space - here or there
   Automate processing - knowbots
   Network: point-to-point OR broadcast, immediate OR time-delayed
   Database: locate, process, analyze, summarize
         Magnetic Storage
        Cheaper Than Paper
   File cabinet:   cabinet (four drawer) 250$
                    paper (24,000 sheets) 250$
                    space (2x3 @ 10$/ft2) 180$
                    total                 700$
                    3¢/sheet
   Disk:           disk (4 GB =)        800$
                    ASCII: 2 mil pages
                    0.04¢/sheet          (80x cheaper)

   Image:          200,000 pages
                    0.4¢/sheet           (8x cheaper)

   Store everything on disk
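
The per-sheet figures above can be re-derived from the dollar totals on the slide. This is only an arithmetic check: it uses the slide's rounded 700$ cabinet total and its page counts as given. The unrounded ratios come out near 73x and 7x, which the slide rounds to 80x and 8x.

```python
# Cost per stored page, using only the dollar figures quoted on the slide.
# Results are in cents so they compare directly with the slide.

def cents_per_sheet(total_dollars: float, sheets: int) -> float:
    return total_dollars * 100 / sheets

paper = cents_per_sheet(700, 24_000)            # file cabinet, slide's rounded total
disk_ascii = cents_per_sheet(800, 2_000_000)    # 4 GB disk holding ASCII pages
disk_image = cents_per_sheet(800, 200_000)      # same disk holding page images

print(f"paper:       {paper:.2f} cents/sheet")
print(f"disk, ASCII: {disk_ascii:.2f} cents/sheet ({paper / disk_ascii:.0f}x cheaper)")
print(f"disk, image: {disk_image:.2f} cents/sheet ({paper / disk_image:.0f}x cheaper)")
```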
              Databases
Information at Your Fingertips™
      Information Network™
     Knowledge Navigator™
    All information will be in an
     online database (somewhere)
    You might record everything you
        Read: 10MB/day, 400 GB/lifetime
         (eight tapes today)
        Hear: 400MB/day, 16 TB/lifetime
         (three tapes/year today)
        See: 1MB/s, 40GB/day, 1.6 PB/lifetime
         (maybe someday)
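
A quick consistency check of the lifetime totals. The slide does not say how long a "lifetime" is. The sketch assumes 40,000 days (about 110 years), because that is the value that makes all three daily rates match their lifetime figures. Separately, 40 GB/day at 1 MB/s corresponds to about 11 hours of viewing per day.

```python
# Lifetime totals implied by the per-day rates on the slide.
# Assumption: a "lifetime" of 40,000 days, chosen because it reproduces
# the slide's lifetime figures from its daily figures.
LIFETIME_DAYS = 40_000

MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

daily_bytes = {"read": 10 * MB, "hear": 400 * MB, "see": 40 * GB}
lifetime_bytes = {kind: rate * LIFETIME_DAYS for kind, rate in daily_bytes.items()}

print(f"read: {lifetime_bytes['read'] / GB:.0f} GB")
print(f"hear: {lifetime_bytes['hear'] / TB:.0f} TB")
print(f"see:  {lifetime_bytes['see'] / PB:.1f} PB")
```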
                            Database Store
                            ALL Data Types
   The old world:
      Millions of objects
      100-byte objects
      People
          Name     Address
          David    NY
          Mike     Berk
          Won      Austin
   The new world:
      Billions of objects
      Big objects (1 MB)
      Objects have behavior (methods)
      People
          Name     Address  Papers  Picture  Voice
          David    NY
          Mike     Berk
          Won      Austin
      Paperless office
      Library of Congress online
      All information online
          Entertainment
          Publishing
          Business
      WWW and Internet
     Billions Of Clients
   Every device will be “intelligent”
   Doors, rooms, cars…
   Computing will be ubiquitous
       Billions Of Clients
    Need Millions Of Servers
   All clients networked
    to servers
       May be nomadic
        or on-demand
   Fast clients want
    faster servers
   Servers provide
       Shared Data
       Control
       Coordination
       Communication
   (Figure: mobile clients, fixed clients, servers, super server)
                                Thesis
                Many little beat few big
   Price: Mainframe $1 million, Mini $100 K, Micro $10 K, then Nano, then Pico Processor
   Disk form factors: 14", 9", 5.25", 3.5", 2.5", 1.8"
   Pico Processor (1 MM^3)
       1 MB      10 pico-second ram
       100 MB    10 nano-second ram
       10 GB     10 microsecond ram
       1 TB      10 millisecond disc
       100 TB    10 second tape archive
       1 M SPECmarks, 1 TFLOP
       10^6 clocks to bulk ram
       Event-horizon on chip
       VM reincarnated
       Multiprogram cache, On-Chip SMP
   Smoking, hairy golf ball
   How to connect the many little parts?
   How to program the many little parts?
   Fault tolerance?
             Future Super Server:
                 4T Machine
   Array of 1,000 4B machines
     1 Bips processors
     1 BB DRAM
     10 BB disks
     1 Bbps comm lines
     1 TB tape robot
   A few megabucks
   Challenge:
     Manageability
     Programmability
     Security
     Availability
     Scaleability
     Affordability
   As easy as a single system
   Future servers are CLUSTERS of processors, discs
   Distributed database techniques make clusters work
   (Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc)
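
One way to read the "4T" name is as the aggregate of 1,000 "4B" nodes. The sketch below multiplies the per-node figures out. Treating the first "B" as a billion instructions per second is my reading of the slide, not something it spells out.

```python
# Aggregate capacity of 1,000 "4B" nodes, using the per-node list on the slide.
# Assumption: the processor figure means one billion instructions per second.
NODES = 1_000
BILLION = 10**9
TERA = 10**12

node = {
    "instructions/s": 1 * BILLION,
    "DRAM bytes": 1 * BILLION,
    "disk bytes": 10 * BILLION,
    "comm bits/s": 1 * BILLION,
}
cluster = {name: value * NODES for name, value in node.items()}

for name, total in cluster.items():
    print(f"{name}: {total / TERA:g} T")
```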
The Hardware Is In Place…
   And then a miracle occurs

   ?
                  SNAP: scaleable network
                   and platforms
                  Commodity-distributed
                   OS built on:
                     Commodity platforms
                     Commodity network
                      interconnect
                  Enables parallel applications
Thesis: Scaleable Servers
   Scaleable Servers
       Commodity hardware allows new applications
       New applications need huge servers
       Clients and servers are built of the same “stuff”
            Commodity software and
            Commodity hardware
   Servers should be able to
       Scale up (grow node by adding CPUs, disks, networks)
       Scale out (grow by adding nodes)
       Scale down (can start small)
   Key software technologies
       Objects, Transactions, Clusters, Parallelism
       Scaleable Servers
     BOTH SMP And Cluster
   Grow up with SMP; 4xP6 is now standard
   Grow out with cluster
   Cluster has inexpensive parts
   (Figure: personal system, departmental server, SMP super server, cluster of PCs)
        SMPs Have Advantages
   Single system image
    easier to manage, easier
    to program threads in
    shared memory, disk, Net
   4x SMP is commodity
   Software capable of 16x
   Problems:
    >4 not commodity
    Scale-down problem
     (starter systems expensive)
   There is a BIGGEST one
   (Figure: personal system, departmental server, SMP super server)
        Building the Largest Node
   There is a biggest node (size grows over time)
   Today, with NT, it is probably 1TB
   We are building it (with help from DEC and SPIN2)
       1 TB GeoSpatial SQL Server database
       (1.4 TB of disks = 320 drives).
       30K BTU, 8 KVA, 1.5 metric tons.
   Will put it on the Web as a demo app.
   1-TB home page: www.SQL.1TB.com
   10 meter image of the ENTIRE PLANET.
   2 meter image of interesting parts (2% of land)
   One pixel per meter = 500 TB uncompressed.
   Better resolution in US (courtesy of USGS).
   (Figure: 1-TB SQL Server DB of satellite and aerial photos, plus support files)
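
The "one pixel per meter = 500 TB" figure can be sanity-checked with a back-of-the-envelope calculation. Two inputs are assumptions, not slide content: an Earth surface area of about 5.1e14 square meters, and one uncompressed byte per pixel.

```python
# Rough size of a whole-planet image at a given ground resolution.
# Assumptions (not from the slide): Earth's surface is about 5.1e14 m^2
# and each pixel takes one uncompressed byte.
EARTH_SURFACE_M2 = 5.1e14
BYTES_PER_PIXEL = 1
TB = 1e12

def image_terabytes(meters_per_pixel: float, area_m2: float = EARTH_SURFACE_M2) -> float:
    pixels = area_m2 / meters_per_pixel ** 2
    return pixels * BYTES_PER_PIXEL / TB

print(f"1 m/pixel:  {image_terabytes(1):.0f} TB")
print(f"10 m/pixel: {image_terabytes(10):.1f} TB")
```

Under these assumptions the 1 meter case lands close to the slide's 500 TB, and a 10 meter image of the whole surface is about 5 TB before any compression.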
          What’s a TeraByte?
   1 Terabyte:
      1,000,000,000 business letters     150 miles of book shelf
        100,000,000 book pages            15 miles of book shelf
         50,000,000 FAX images             7 miles of book shelf
         10,000,000 TV pictures (mpeg)    10 days of video
              4,000 LandSat images        16 earth images (100m)
        100,000,000 web pages             10 copies of the web HTML

   Library of Congress (in ASCII) is 25 TB
     1980: $200 million of disc              10,000 discs
          $5 million of tape silo            10,000 tapes

     1997: 200 k$ of magnetic disc               48 discs
            30 k$ nearline tape                  20 tapes


                 Terror Byte !
TB DB User Interface



TPC-C Web-Based Benchmarks
   Client is a Web browser
             (7,500 of them!)
   Submits
       Order
       Invoice
       Query to server via Web
        page interface
   Web server translates to DB
   SQL does DB work
   Net:
     easy to implement
     performance is GREAT!
   (Figure labels: HTTP, IIS = Web, ODBC, SQL)
               Grow UP and OUT
   Cluster:
       a collection of nodes
       as easy to program and manage as a single node
   1 Terabyte DB
   1 billion transactions per day
   (Figure: personal system, departmental server, SMP super server)
Clusters Have Advantages
   Clients and servers made from the same stuff
   Inexpensive:
       Built with commodity components
   Fault tolerance:
       Spare modules mask failures
   Modular growth
       Grow by adding small modules
   Unlimited growth:
            no biggest one
            Windows NT Clusters
   Microsoft & 60 vendors defining NT clusters
       Almost all big hardware and software vendors involved
   No special hardware needed - but it may help
   Fault-tolerant first, scaleable second
       Microsoft, Oracle, SAP giving demos today
   Enables
       Commodity fault-tolerance
       Commodity parallelism (data mining, virtual reality…)
       Also great for workgroups!
Billion Transactions per Day
           Project
   Building a 20-node Windows NT
    Cluster (with help from Intel)
    > 800 disks
   All commodity parts
   Using SQL Server &
    DTC distributed transactions
   Each node has 1/20th of the DB
   Each node does 1/20th of the
    work
   15% of the transactions are
    “distributed”
          How Much Is 1 Billion
          Transactions Per Day?
   1 Btpd = 11,574 tps
    (transactions per second)
    ~ 700,000 tpm
    (transactions/minute)
   AT&T
       185 million calls
        (peak day worldwide)
   Visa ~20 M tpd
       400 M customers
       250,000 ATMs worldwide
       7 billion transactions / year
        (card+cheque) in 1994
   (Chart: millions of transactions per day (Mtpd), scale 0.1 to 1,000; bars for 1 Btpd, AT&T, Visa, BofA, NYSE)
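
The unit conversions above are easy to reproduce. The sketch converts the daily counts quoted on this slide into average per-second and per-minute rates, and divides the 1 Btpd target over the 20 nodes of the project cluster. It assumes a perfectly even load across the day, so real peak rates would be higher.

```python
# Convert a daily transaction count into average per-second and per-minute rates.
# Assumption: load is spread evenly over the day.
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(per_day: float) -> float:
    return per_day / SECONDS_PER_DAY

daily = {
    "1 Btpd target": 1_000_000_000,
    "AT&T peak day": 185_000_000,
    "Visa": 20_000_000,
}
for name, tpd in daily.items():
    tps = per_second(tpd)
    print(f"{name}: {tps:,.0f} tps, {tps * 60:,.0f} tpm")

# Spread evenly over the 20 nodes of the project cluster.
per_node_tps = per_second(1_000_000_000) / 20
print(f"per node: {per_node_tps:,.0f} tps")
```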
                   Parallelism
        The OTHER aspect of clusters

   Clusters of machines
    allow two kinds
    of parallelism
       Many little jobs: online
        transaction processing
          TPC-A, B, C…
       A few big jobs: data
        search and analysis
          TPC-D, DSS, OLAP

   Both give
    automatic parallelism
       Kinds of Parallel Execution
    Pipeline
       (Figure: one "any sequential program" feeding another)
    Partition
       outputs split N ways
       inputs merge M ways
       (Figure: copies of "any sequential program" running side by side)
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
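
A small sketch of the two styles, using unchanged sequential functions as the building blocks. The stage functions `parse` and `square` are hypothetical stand-ins. The example shows the structure only: Python generators interleave pipeline stages on one thread, and CPython threads will not speed up CPU-bound work.

```python
# Two ways to compose unchanged sequential code.
# Pipeline: each stage consumes the previous stage's output record by record.
# Partition: the same stages run on disjoint slices of the input.
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List

def parse(lines: Iterable[str]) -> Iterator[int]:
    for line in lines:
        yield int(line)

def square(numbers: Iterable[int]) -> Iterator[int]:
    for n in numbers:
        yield n * n

def pipeline(lines: Iterable[str]) -> List[int]:
    # Generators hand records downstream one at a time.
    return list(square(parse(lines)))

def partitioned(lines: List[str], n: int) -> List[int]:
    slices = [lines[i::n] for i in range(n)]            # split the input N ways
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(lambda s: list(square(parse(s))), slices)
        return sorted(x for part in parts for x in part)  # merge the outputs

data = [str(i) for i in range(10)]
print(pipeline(data))
print(partitioned(data, 3))
```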
                     Partitioned Execution
                 Spreads computation and IO among processors

       (Figure: a table split into partitions A...E, F...J, K...N, O...S, T...Z;
        each partition feeds its own Count, and the partial counts feed a final Count)


       Partitioned data gives
                        NATURAL parallelism
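
A minimal sketch of a partitioned count. The range boundaries are the ones in the figure; the sample names and the helper `partition_of` are made up for illustration. Each partition is counted independently and the partial counts are then summed.

```python
# Partitioned count: count each partition independently, then sum the
# partial counts. Sample data is illustrative only.
from concurrent.futures import ThreadPoolExecutor

RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]

def partition_of(name: str) -> int:
    """Return the index of the range that holds this name's first letter."""
    first = name[0].upper()
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no partition for {name!r}")

names = ["David", "Mike", "Won", "Alice", "Zoe", "Gordon", "Jim"]
partitions = [[] for _ in RANGES]
for name in names:
    partitions[partition_of(name)].append(name)

# One worker per partition computes a partial count.
with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    partial_counts = list(pool.map(len, partitions))

total = sum(partial_counts)
print(partial_counts, total)
```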

                    N x M way Parallelism
       (Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Join and then a Sort;
        the sorted outputs feed three Merge operators)


              N inputs, M outputs, no bottlenecks.

              Partitioned Data
              Partitioned and Pipelined Data Flows
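
A sketch of an N-input, M-output flow, reduced to the sort and merge steps (the join stage in the figure is left out). Each of the N partitions sorts its own rows and routes them into M key ranges; each of the M mergers combines the N sorted runs for its range. The split points and data are illustrative assumptions.

```python
# N x M data flow: N partitions sort and split, M mergers combine.
import heapq
from bisect import bisect_right
from typing import List

def sort_and_split(rows: List[int], split_points: List[int]) -> List[List[int]]:
    """Sort one partition, then route each row to its output range."""
    outputs: List[List[int]] = [[] for _ in range(len(split_points) + 1)]
    for row in sorted(rows):
        outputs[bisect_right(split_points, row)].append(row)
    return outputs

def merge_range(runs: List[List[int]]) -> List[int]:
    """Merge already-sorted runs that belong to one output range."""
    return list(heapq.merge(*runs))

inputs = [[9, 1, 14], [3, 12, 7], [5, 10, 2]]   # N = 3 partitions
split_points = [5, 10]                          # M = 3 ranges: <5, 5..9, >=10

split = [sort_and_split(part, split_points) for part in inputs]
merged = [merge_range([split[n][m] for n in range(len(inputs))])
          for m in range(len(split_points) + 1)]
print(merged)
```

No single operator sees all the data: every sorter handles one partition and every merger handles one key range.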
                The Parallel Law
                 Of Computing
Grosch's Law:
       2x $ is 4x performance
       1 MIPS for 1 $
       1,000 MIPS for 32 $ (.03 $/MIPS)
Parallel Law:
       2x $ is 2x performance
       1 MIPS for 1 $
       1,000 MIPS for 1,000 $
Needs:
  Linear speedup and linear scale-up
  Not always possible
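
The two laws can be compared numerically. In the sketch below both are normalized so that 1 MIPS costs 1 $, as on the slide. Grosch's law is read as "performance grows with the square of price", which is what "2x $ is 4x performance" implies.

```python
# Price of a given amount of performance under the two laws.
# Both are normalized so that 1 MIPS costs 1 $.
import math

def grosch_cost(mips: float) -> float:
    # Performance grows as the square of price, so price = sqrt(performance).
    return math.sqrt(mips)

def parallel_cost(mips: float) -> float:
    # Performance grows linearly with price.
    return mips

for mips in (1, 1_000):
    g, p = grosch_cost(mips), parallel_cost(mips)
    print(f"{mips:>5} MIPS: Grosch {g:.0f} $ ({g / mips:.2f} $/MIPS), "
          f"parallel {p:.0f} $ ({p / mips:.2f} $/MIPS)")
```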
Thesis: Scaleable Servers
   Scaleable Servers
       Commodity hardware allows new applications
       New applications need huge servers
       Clients and servers are built of the same “stuff”
            Commodity software and
            Commodity hardware
   Servers should be able to
       Scale up (grow node by adding CPUs, disks, networks)
       Scale out (grow by adding nodes)
       Scale down (can start small)
   Key software technologies
       Objects, Transactions, Clusters, Parallelism
           The BIG Picture
     Components and transactions
   Software modules are objects
   Object Request Broker (a.k.a., Transaction
    Processing Monitor) connects objects
    (clients to servers)
   Standard interfaces allow software plug-ins
   Transaction ties execution of a “job” into an
    atomic unit: all-or-nothing, durable, isolated




          Object Request Broker
        Linking And Embedding
         Objects are data modules;
    transactions are execution modules
   Link: pointer to object
    somewhere else
       Think URL in Internet
   Embed: bytes
    are here
   Objects may be active;
    can callback to subscribers
             Objects Meet Databases
            The basis for universal
      data servers, access, & integration

   object-oriented (COM oriented)
    programming interface to data
   Breaks DBMS into components
   Anything can be a data source
   Optimization/navigation “on top
    of” other data sources
   A way to componentize a DBMS
   Makes an RDBMS an O-R
    DBMS (assumes optimizer
    understands objects)
   (Figure: DBMS engine over data sources: Database, Spreadsheet, Photos, Mail, Map, Document)
       The Three Tiers
   Web Client
       HTML
       VBScript, JavaScript
       VB Java plug-ins
       VB or Java Script Engine
       VB or Java Virt Machine
   Internet
       HTTP+ DCOM
   Middleware
       ORB, TP Monitor, Web Server...
       Object server Pool
   Object & Data server
       DCOM (oleDB, ODBC,...)
   Legacy Gateways
       IBM
         Server Side Objects
          Easy Server-Side Execution
   Give simple execution
    environment
   Object gets
       start
       invoke
       shutdown
   Everything else is
    automatic
   Drag & Drop Business
    Objects
   (Figure: A Server, containing Network, Receiver, Queue, Connections, Management,
    Configuration, Context, Security, Thread Pool, Service logic, Synchronization, Shared Data)
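
A toy illustration of the "object gets start, invoke, shutdown; everything else is automatic" idea. The class names `ServerObject`, `Echo`, and `Host` are hypothetical and do not correspond to any real middleware API. The host owns the thread pool and the lifecycle, so the hosted object contains only service logic.

```python
# A toy host that owns threading and lifecycle; the hosted object only
# implements start / invoke / shutdown. All names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List

class ServerObject:
    def start(self) -> None: ...
    def invoke(self, request: Any) -> Any: raise NotImplementedError
    def shutdown(self) -> None: ...

class Echo(ServerObject):
    def start(self) -> None:
        self.prefix = "echo: "          # acquire resources here
    def invoke(self, request: Any) -> Any:
        return self.prefix + str(request)
    def shutdown(self) -> None:
        self.prefix = ""                # release resources here

class Host:
    def __init__(self, obj: ServerObject, threads: int = 4) -> None:
        self.obj, self.threads = obj, threads

    def serve(self, requests: List[Any]) -> List[Any]:
        self.obj.start()
        try:
            # The host, not the object, decides how requests are scheduled.
            with ThreadPoolExecutor(max_workers=self.threads) as pool:
                return list(pool.map(self.obj.invoke, requests))
        finally:
            self.obj.shutdown()

replies = Host(Echo()).serve(["a", "b"])
print(replies)
```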
A new programming paradigm
   Develop objects on the desktop
   Better yet: download them from the Net
   Script work flows as method invocations
   All on desktop
   Then, move work flows and objects to server(s)
   Gives
    desktop development
    three-tier deployment
    Software Cyberbricks
    Transactions Coordinate
      Components (ACID)
   Transaction properties
       Atomic: all or nothing
       Consistent: old and new values
       Isolated: automatic locking or versioning
       Durable: once committed, effects survive
   Transactions are built into modern OSs
       MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
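
A small demonstration of the "atomic: all or nothing" property, using SQLite from the Python standard library as a stand-in for the transaction managers named above. The `accounts` table and the `transfer` helper are made up for this sketch.

```python
# Atomicity demo: either both updates of a transfer commit, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(src: str, dst: str, amount: int) -> bool:
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

ok = transfer("alice", "bob", 60)        # both updates commit
failed = transfer("alice", "bob", 60)    # debit would go negative: credit is rolled back too
print(ok, failed, balances())
```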
Transactions & Objects
   Application requests transaction
    identifier (XID)
   XID flows with method invocations
   Object Managers join (enlist)
    in transaction
   Distributed Transaction Manager
    coordinates commit/abort
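
A minimal sketch of the flow described above: the application gets an XID, object managers enlist under it, and a coordinator decides commit or abort. The slide does not name the protocol; two-phase commit is assumed here as the usual choice. All class and method names are hypothetical, and a real transaction manager would also log decisions durably and handle timeouts and recovery.

```python
# Two-phase commit sketch: commit only if every enlisted participant votes yes.
import uuid
from typing import Dict, List

class ResourceManager:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name, self.can_commit = name, can_commit
        self.state: Dict[str, str] = {}

    def prepare(self, xid: str) -> bool:
        self.state[xid] = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, xid: str) -> None:
        self.state[xid] = "committed"

    def abort(self, xid: str) -> None:
        self.state[xid] = "aborted"

class TransactionManager:
    def __init__(self) -> None:
        self.enlisted: Dict[str, List[ResourceManager]] = {}

    def begin(self) -> str:
        xid = uuid.uuid4().hex
        self.enlisted[xid] = []
        return xid

    def enlist(self, xid: str, rm: ResourceManager) -> None:
        self.enlisted[xid].append(rm)

    def commit(self, xid: str) -> bool:
        participants = self.enlisted.pop(xid)
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [rm.prepare(xid) for rm in participants]
        decision = all(votes)
        # Phase 2: broadcast the decision to every participant.
        for rm in participants:
            if decision:
                rm.commit(xid)
            else:
                rm.abort(xid)
        return decision

tm = TransactionManager()
xid = tm.begin()
orders = ResourceManager("orders")
stock = ResourceManager("stock", can_commit=False)
tm.enlist(xid, orders)
tm.enlist(xid, stock)
outcome = tm.commit(xid)
print(outcome, orders.state[xid], stock.state[xid])
```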
     Distributed Transactions
     Enable Huge Throughput
   Each node capable of 7 KtpmC (7,000 active users!)
   Can add nodes to cluster (to support 100,000 users)
   Transactions coordinate nodes
   ORB / TP monitor spreads work among nodes
Distributed Transactions
    Enable Huge DBs
   Distributed database technology
    spreads data among nodes
   Transaction processing technology
    manages nodes
Thesis: Scaleable Servers
   Scaleable Servers Built from Cyberbricks
       Allow new applications
   Servers should be able to
       Scale up, out, down
   Key software technologies
       Clusters (tie the hardware together)
       Parallelism (uses the independent cpus, stores, wires)
       Objects (software CyberBricks)
       Transactions (mask errors)

								