Scaleable Computing
Jim Gray, Microsoft Corporation
Gray@Microsoft.com

Thesis: Scaleable Servers
- Commodity hardware allows new applications; new applications need huge servers
- Clients and servers are built of the same "stuff": commodity software and commodity hardware
- Servers should be able to:
  - Scale up (grow a node by adding CPUs, disks, networks)
  - Scale out (grow by adding nodes)
  - Scale down (start small)
- Key software technologies: objects, transactions, clusters, parallelism

1987: 256 tps Benchmark
- A 14 M$ computer (Tandem)
- A dozen people: admin expert, hardware experts, auditor, network expert, performance expert, DB expert, OS expert, manager
- False floor, 2 rooms of machines
- A 32-node processor array
- A 40 GB disk array (80 drives)
- Simulated 25,600 clients

1988: DB2 + CICS Mainframe, 65 tps
- IBM 4391; a 2 M$ computer
- Staff of 6 to do the benchmark
- 2 x 3725 refrigerator-sized network controllers
- Simulated network of 800 clients
- 16 GB disk farm (4 x 8 x 0.5 GB)

1997: 10 Years Later
- 1 person and 1 box = 1,250 tps
- 1 breadbox is ~5x the 1987 machine room; 23 GB is hand-held
- One person does all the work: hardware, OS, net, DB, and app expert
- Cost/tps is 1,000x less: 25 micro-dollars per transaction
- 4 x 200 MHz CPUs, 1/2 GB DRAM, 12 x 4 GB disks (3 x 7 x 4 GB disk arrays)

What Happened?
- Moore's law: things get 4x better every 3 years (applies to computers, storage, and networks)
- New economics: commodity

  class          price/MIPS ($)   software (k$/year)
  mainframe      10,000           100
  minicomputer   100              10
  microcomputer  10               1

- GUI: the human/computer tradeoff has shifted; optimize for people, not computers

What Happens Next?
- Last 10 years: 1,000x improvement
- Next 10 years: ????
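The numbers above can be sanity-checked with a little arithmetic. This sketch is my own illustration, not from the talk: it compounds "4x every 3 years" over a decade (roughly 100x raw hardware improvement) and computes the 1987 cost per tps, whose claimed 1,000x drop also reflects the move to commodity parts.

```python
def moores_law_factor(years, factor=4.0, period=3.0):
    """Improvement after `years` if things get `factor`x better every `period` years."""
    return factor ** (years / period)

# Raw improvement over a decade at 4x/3yr: roughly 100x.
hw_gain = moores_law_factor(10)

# 1987: a 14 M$ Tandem system delivered 256 tps.
cost_per_tps_1987 = 14e6 / 256            # ~54,700 $/tps

# The claimed 1,000x cost/tps drop implies ~55 $/tps by 1997.
cost_per_tps_1997 = cost_per_tps_1987 / 1000

print(round(hw_gain), round(cost_per_tps_1987), round(cost_per_tps_1997))
```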
1985 ... 1995 ... 2005
- Today: text and image servers are free; at ~25 micro-dollars per hit, advertising pays for them
- Future: video, audio, ... servers are free
- "You ain't seen nothing yet!"

Kinds of Information Processing

                  Point-to-point         Broadcast
  Immediate       conversation, money    lecture, concert     (the network)
  Time-shifted    mail                   book, newspaper      (the database)

- It's ALL going electronic
- Immediate traffic is being stored for analysis (so it ALL becomes database)
- Analysis and automatic processing are being added

Why Put Everything in Cyberspace?
- Point-to-point OR broadcast; immediate OR time-delayed
- Low rent: minimum $/byte
- Shrinks time: process now or later
- Shrinks space: locate here or there
- Automates processing: knowbots that analyze and summarize

Magnetic Storage Cheaper Than Paper
- File cabinet: cabinet (four-drawer) $250 + paper (24,000 sheets) $250 + space (2 ft x 3 ft @ $10/ft^2) $180 = $700 total, about 3 cents/sheet
- Disk: 4 GB = $800
  - ASCII: 2 million pages, 0.04 cents/sheet (80x cheaper)
  - Image: 200,000 pages, 0.4 cents/sheet (8x cheaper)
- Conclusion: store everything on disk

Databases: Information at Your Fingertips™, Information Network™, Knowledge Navigator™
- All information will be in an online database (somewhere)
- You might record everything you:
  - Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
  - Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
  - See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)

Databases Store ALL Data Types
- The old world: millions of 100-byte objects, e.g. a People table of names and addresses (David, NY; Mike, Berk; Won, Austin)
- The new world: billions of objects, big objects (1 MB), objects with behavior (methods); records also carry pictures and voice
- Paperless office, Library of Congress online, all information online: entertainment, papers, publishing, business, the WWW and Internet

Billions of Clients
- Every device will be "intelligent": doors, rooms, cars...
- Computing will be ubiquitous

Billions of Clients Need Millions of Servers
- All clients are networked to servers
- Clients may be mobile (nomadic) or fixed, connected always or on demand
- Fast clients want faster servers
Servers provide shared data, control, coordination, and communication: the "super server."

Thesis: Many Little Beat Few Big
- Price points run from the $1 million mainframe through the $100 K mini and the $10 K micro down to nano and pico processors
- The storage hierarchy: 1 MB of 10-picosecond RAM; 100 MB of 10-nanosecond RAM; 10 GB of 10-microsecond RAM; 1 TB of 10-millisecond disc; 100 TB of 10-second tape archive
- Disk form factors keep shrinking: 14", 9", 5.25", 3.5", 2.5", 1.8"
- The chip of the future: 1 M SPECmarks, 1 TFLOP; 10^6 clocks to bulk RAM; the event horizon is on chip (a "smoking, hairy golf ball")
- Open questions: How to connect the many little parts? How to program the many little parts? Fault tolerance? (VM reincarnated; multiprogrammed cache; on-chip SMP)

Future Super Server: the 4T Machine
- An array of 1,000 "4B machines," each with a 1 Bips processor, 1 BB of DRAM, 10 BB of disk, and 1 Bbps comm lines
- The CyberBrick is one such 4B machine (CPU, 5 GB RAM, 50 GB disc)
- Plus a 1 TB tape robot; a few megabucks total
- Challenges: manageability, programmability, security, availability, scaleability, affordability
- Future servers are CLUSTERS of processors and discs, as easy to use as a single system; distributed database techniques make clusters work

The Hardware Is in Place...
- ... "and then a miracle occurs"?
SNAP: Scaleable Networks and Platforms
- A commodity distributed OS built on commodity platforms and a commodity network interconnect
- Enables parallel applications

Thesis: Scaleable Servers (recap)
- Commodity hardware allows new applications; new applications need huge servers
- Clients and servers are built of the same "stuff": commodity software and commodity hardware
- Servers should scale up (add CPUs, disks, networks to a node), scale out (add nodes), and scale down (start small)
- Key software technologies: objects, transactions, clusters, parallelism

Scaleable Servers: BOTH SMP and Cluster
- Grow UP with SMP: a 4-processor P6 is now standard
- Grow OUT with a cluster: clusters are built from inexpensive parts
- The same spectrum runs from personal system to departmental server to SMP super server to a cluster of PCs

SMPs Have Advantages
- Single system image: easier to manage, easier to program
- Threads share memory, disk, and net
- 4x SMP is commodity; the software is capable of 16x
- Problems: more than 4 processors is not commodity; the scale-down problem (starter systems are expensive); there is a BIGGEST one

Building the Largest Node
- There is a biggest node (its size grows over time); today, with NT, it is probably 1 TB
- We are building it (with help from DEC and SPIN2): a 1 TB GeoSpatial SQL Server database (1.4 TB of disks = 320 drives)
- 30K BTU, 8 KVA, 1.5 metric tons
- It will go on the Web as a demo app, www.SQL.1TB.com, the 1-TB home page:
  - A 10-meter image of the ENTIRE PLANET
  - A 2-meter image of the interesting parts (2% of the land)
  - Better resolution in the US (courtesy of the USGS)
  - One pixel per meter would be 500 TB uncompressed
- SQL Server stores and serves the satellite and aerial photo files

What's a Terabyte?
1 Terabyte is:
- 1,000,000,000 business letters: 150 miles of bookshelf
- 100,000,000 book pages: 15 miles of bookshelf
- 50,000,000 FAX images: 7 miles of bookshelf
- 10,000,000 TV pictures (MPEG): 10 days of video
- 4,000 LandSat images: 16 earth images (at 100 m)
- 100,000,000 web pages: 10 copies of the Web (HTML)
- The Library of Congress (in ASCII) is about 25 TB
- 1980: $200 million of disc (10,000 discs), or a $5 million tape silo (10,000 tapes)
- 1997: $200 K of magnetic disc (48 discs), or $30 K of nearline tape (20 tapes)
- A "Terror Byte" of DB and user-interface work!

Next TPC-C: Web-Based Benchmarks
- The client is a Web browser (7,500 of them!)
- It submits order, invoice, and query requests to the server through an HTTP page interface
- The Web server (IIS) translates pages into SQL over ODBC; SQL Server does the DB work
- Easy to implement, and performance is GREAT!

Grow UP and OUT
- The same software spans personal system, departmental server, SMP super server, and a cluster running a 1 terabyte DB at 1 billion transactions per day
- A cluster is a collection of nodes that is as easy to program and manage as a single node

Clusters Have Advantages
- Clients and servers are made from the same stuff
- Inexpensive: built with commodity components
- Fault tolerance: spare modules mask failures
- Modular growth: grow by adding small modules
- Unlimited growth: there is no biggest one

Windows NT Clusters
- Microsoft and 60 vendors are defining NT clusters; almost all big hardware and software vendors are involved
- No special hardware is needed, but it may help
- Fault-tolerant first, scaleable second
- Microsoft, Oracle, and SAP are giving demos today
- Enables commodity fault tolerance and commodity parallelism (data mining, virtual reality...)
- Also great for workgroups!

Billion Transactions per Day Project
- Building a 20-node Windows NT cluster (with help from Intel): more than 800 disks, all commodity parts
- Uses SQL Server and DTC distributed transactions
- Each node has 1/20th of the DB and does 1/20th of the work; 15% of the transactions are "distributed"

How Much Is 1 Billion Transactions per Day?
- 1 Btpd = 11,574 tps (transactions per second), roughly 700,000 tpm (transactions per minute)
- For scale (millions of transactions per day): AT&T's peak day worldwide is 185 million calls; Visa runs about 20 M tpd; BofA and NYSE run less; 1 Btpd dwarfs them all
- Visa: 400 M customers, 250,000 ATMs worldwide, 7 billion transactions per year (card + cheque) in 1994

Parallelism: The OTHER Aspect of Clusters
- Clusters of machines allow two kinds of parallelism:
  - Many little jobs: online transaction processing (TPC-A, B, C...)
  - A few big jobs: data search and analysis (TPC-D, DSS, OLAP)
- Both give automatic parallelism

Kinds of Parallel Execution
- Pipeline: one sequential program feeds its output to the next
- Partition: inputs merge M ways, outputs split N ways, and many copies of the same sequential program run side by side
(Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey)

Partitioned Execution
- Spreads computation and IO among processors: a Count operator runs on each partition of a table (A...E, F...J, K...N, O...S, T...Z) and the partial counts are merged
- Partitioned data gives NATURAL parallelism

N x M Way Parallelism
- For example, a Sort/Join pair per partition (A...E through T...Z) feeding several Merge operators: N inputs, M outputs, no bottlenecks
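The partitioned-execution idea above can be sketched in a few lines. This is my own illustration, not the talk's code: range-partition a table on its key, run the same sequential operator (here a simple count) on each partition in parallel, then merge the partial results. A real system would spread the partitions across cluster nodes rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from bisect import bisect_right

def count_partition(rows):
    """The 'any sequential program' box from the slide; here, just a count."""
    return len(rows)

def partitioned_count(table, boundaries):
    """Split rows into key ranges (A...E, F...J, ...), count each in parallel, merge."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in table:
        parts[bisect_right(boundaries, row[:1])].append(row)
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_partition, parts))  # the merge step

names = ["Adams", "Gray", "Knuth", "Smith", "Zuse"]
# Boundaries "F","K","O","T" give the slide's five ranges A...E through T...Z.
print(partitioned_count(names, ["F", "K", "O", "T"]))  # -> 5
```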
Partitioned and Pipelined Data Flows
(Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey)

The Parallel Law of Computing
- Grosch's Law: 2x the money buys 4x the performance; by that law, 1,000 MIPS costs only 32x what 1 MIPS costs (0.03 $/MIPS)
- The Parallel Law: 2x the money buys 2x the performance; 1,000 MIPS costs 1,000x what 1 MIPS costs (1 $/MIPS)
- The parallel law needs linear speedup and linear scale-up, which is not always possible

Thesis: Scaleable Servers (recap)
- Commodity hardware allows new applications; new applications need huge servers
- Clients and servers are built of the same "stuff": commodity software and commodity hardware
- Servers should scale up (add CPUs, disks, networks to a node), scale out (add nodes), and scale down (start small)
- Key software technologies: objects, transactions, clusters, parallelism

The BIG Picture: Components and Transactions
- Software modules are objects
- An Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects, clients to servers
- Standard interfaces allow software plug-ins
- A transaction ties the execution of a "job" into an atomic unit: all-or-nothing, durable, isolated

Linking and Embedding
- Objects are data modules; transactions are execution modules
- Link: a pointer to an object somewhere else (think of a URL on the Internet)
- Embed: the bytes are here
- Objects may be active and can call back to subscribers

Objects Meet Databases
- The basis for universal data servers, access, and integration: an object-oriented (COM-oriented) programming interface to data
- Breaks the DBMS into components; anything can be a data source: spreadsheets, photos, mail, documents, the Web
- Optimization and navigation sit "on top of" other data sources
- A way to componentize a DBMS; makes an RDBMS an object-relational DBMS (assuming the optimizer understands objects)

The Three Tiers
- Web client: HTML with VBScript and JavaScript, plus VB and Java plug-ins
- Middleware: an ORB or TP monitor hosting the Web server, a VB or Java script engine / virtual machine, and a connection pool; reached via HTTP and DCOM
- Object and data server: the database, reached via DCOM (OLE DB, ODBC, ...)
- The third tier also reaches legacy systems through IBM gateways

Server-Side Objects: Easy Server-Side Execution
- The server gives objects a simple execution environment: the object receives start, invoke, and shutdown calls and supplies the service logic
- Everything else is automatic: network receiver, management, queues, connections, configuration, context, security, thread pool, synchronization, shared data

Drag & Drop Business Objects
- A new programming paradigm: develop objects on the desktop, or better yet, download them from the Net
- Script work flows as method invocations, all on the desktop
- Then move the work flows and objects to server(s)
- Gives desktop development with three-tier deployment: software CyberBricks

Transactions Coordinate Components (ACID)
- Transaction properties:
  - Atomic: all or nothing
  - Consistent: old and new values
  - Isolated: automatic locking or versioning
  - Durable: once committed, effects survive
- Transactions are built into modern OSs: MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC

Transactions & Objects
- The application requests a transaction identifier (XID)
- The XID flows with method invocations
- Object managers join (enlist) in the transaction
- A Distributed Transaction Manager coordinates commit/abort

Distributed Transactions Enable Huge Throughput
- Each node is capable of 7 KtpmC (7,000 active users!)
- Nodes can be added to the cluster (to support 100,000 users)
- Transactions coordinate the nodes; the ORB / TP monitor spreads work among them

Distributed Transactions Enable Huge DBs
- Distributed database technology spreads data among the nodes
- Transaction processing technology manages the nodes

Thesis: Scaleable Servers
- Scaleable servers are built from CyberBricks and allow new applications
- Servers should be able to scale up, out, and down
- Key software technologies:
  - Clusters tie the hardware together
  - Parallelism uses the independent CPUs, stores, and wires
  - Objects are software CyberBricks
  - Transactions mask errors
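The ACID properties described above can be seen in miniature with Python's built-in sqlite3 module. This is my own illustration, not from the talk (whose examples are MVS/TM, Tandem TMF, DEC-DTM, and NT-DTC): a funds transfer either commits both updates or rolls back both, leaving no partial effects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one atomic unit: all or nothing."""
    try:
        with conn:  # begins a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (bal,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")  # abort -> automatic rollback
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the aborted transfer left no partial effects

transfer(conn, "alice", "bob", 60)   # commits: alice 40, bob 60
transfer(conn, "alice", "bob", 60)   # aborts: alice would go negative
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# -> [('alice', 40), ('bob', 60)]
```

A distributed transaction manager generalizes the same commit/abort decision across many nodes, which is what lets the 20-node cluster above run distributed transactions safely.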