					            Scaleable Computing
                        Jim Gray
                    Microsoft Research
                        Gray@Microsoft.com
             http://research.Microsoft.com/~Gray/talks/

• Outline
  – The bandwidth revolution
  – ScaleUp, ScaleOut
  – TerraServer (Barclay, Slutz, Gray)


                      Gray @ Nortel 20 April 1999
                 Gilder's Law:
   3x bandwidth/year for 25 more years
• Today:
  – 10 Gbps per channel
  – 4 channels per fiber: 40 Gbps
  – 32 fibers/bundle = 1.2 Tbps/bundle
• In lab 3 Tbps/fiber (400 x WDM)
• In theory 25 Tbps per fiber
• 1 Tbps = USA 1996 WAN bisection bandwidth
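
A quick check of the arithmetic on this slide, as a minimal Python sketch. The variable names are illustrative; every figure is the slide's. The bundle product comes out slightly above the 1.2 Tbps quoted.

```python
# Capacity arithmetic from the slide (all figures are the slide's).
GBPS_PER_CHANNEL = 10
CHANNELS_PER_FIBER = 4
FIBERS_PER_BUNDLE = 32

fiber_gbps = GBPS_PER_CHANNEL * CHANNELS_PER_FIBER
bundle_gbps = fiber_gbps * FIBERS_PER_BUNDLE

# Gilder's Law as stated: bandwidth triples each year for 25 years.
growth_factor = 3 ** 25

print(f"per fiber:  {fiber_gbps} Gbps")
print(f"per bundle: {bundle_gbps / 1000:.2f} Tbps")
print(f"25-year growth factor: {growth_factor:.2e}")
```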



                  1 fiber = 25 Tbps
              Networking
          BIG!! Changes coming!
• Technology
   – 1 GBps bus “now”
   – 1 Gbps links “now”
   – 1 Tbps links in 10 years
   – Fast & cheap switches
• Standard wires for interconnect
   – processor-processor
   – processor-device (=processor)
• Deregulation WILL work someday
• Software improving
   – User-level Net-IO
• Software Challenge
   – reduce software tax on messages
   – Today 30 K ins + 10 ins/byte
   – Goal: 1 K ins + .01 ins/byte
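
The software tax above is a fixed cost per message plus a cost per byte. A minimal Python sketch of that cost model follows. The function name and the 8 KB message size are assumptions for illustration; the instruction counts are the slide's.

```python
def message_instructions(nbytes, fixed_ins, ins_per_byte):
    """Instructions spent on one message: fixed cost plus per-byte cost."""
    return fixed_ins + ins_per_byte * nbytes

MESSAGE_BYTES = 8192  # hypothetical message size, not from the slide

today = message_instructions(MESSAGE_BYTES, 30_000, 10)   # "Today" figures
goal = message_instructions(MESSAGE_BYTES, 1_000, 0.01)   # "Goal" figures

print(f"today: {today:,.0f} instructions")
print(f"goal:  {goal:,.0f} instructions")
print(f"reduction: {today / goal:.0f}x")
```

At this message size the per-byte term dominates today's cost, which is why the goal cuts it by three orders of magnitude.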
              Technology (hardware)
NOW
• CPU: nearing 1 BIPS
   – but CPI rising fast (2-10)
     so less than 100 mips
   – 1$/mips to 10$/mips
• DRAM: 3 $/MB
• DISK: 20 $/GB
• TAPE:
   – 20 GB/tape, 6 MBps
   – Lags disk
   – 2$/GB offline, 15$/GB nearline
• BUS/SAN: 10/1 GBps
• WAN:     0.1 Mbps

2003 Forecast (10x better)
• CPU: 1bips real (smp)
   – 0.1$ - 1$/mips
• DRAM: 1 Gb chip
   – 0.1 $/MB
• Disk:
   – 10 GB smart cards
     500GB RAID5 packs (NTinside)
   – 3 $/GB
• BUS/SAN: 100/10 GBps
• WAN:     1 Gbps
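
The "10x better" label is a rough average; the individual rows move by quite different factors. A small Python sketch of the ratios, using the slide's figures. It assumes the forecast disk price reads as 3 $/GB; the dictionary layout is illustrative.

```python
# (now, 2003 forecast) pairs copied from the slide; lower is better for prices.
prices = {
    "DRAM $/MB": (3.0, 0.1),
    "Disk $/GB": (20.0, 3.0),  # assumes the forecast reads 3 $/GB
}
wan_mbps = (0.1, 1000.0)  # 0.1 Mbps now, 1 Gbps forecast

improvement = {name: now / forecast for name, (now, forecast) in prices.items()}
wan_gain = wan_mbps[1] / wan_mbps[0]

for name, factor in improvement.items():
    print(f"{name}: {factor:.1f}x cheaper")
print(f"WAN: {wan_gain:,.0f}x faster")
```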
     Microsoft SAN Infrastructure
        WinSock Direct Path
• 110 MBps (that’s B not b)
• 10% cpu (not 200%)
• Network faster than most IO attachments
[Diagram: two hosts, each with App over Winsock over Switch. One path runs MsAfd (user mode, U), then AFD, TCP, IP, NDIS, MiniPort (kernel mode, K) to HW. The other path runs HwSPI and VIA to HW.]
                 SAN:
          Standard Interconnect
• Bandwidth comparison
   – Gbps SAN: 110 MBps
   – PCI: 70 MBps
   – UW Scsi: 40 MBps
   – FW scsi: 20 MBps
   – scsi: 5 MBps
• LAN faster than memory bus?
• 1 GBps links in lab.
• 100$ port cost soon
• Port is computer
• Winsock: 110 MBps
  (10% cpu utilization at each end)
 Latency: How Far Away is the Data?

 Clocks   Storage level         Analogy        Time
 10^9     Tape/Optical Robot    Andromeda      2,000 Years
 10^6     Disk                  Pluto          2 Years
 100      Memory                Sacramento     1.5 hr
 10       On Board Cache        This Campus    10 min
 2        On Chip Cache         This Room
 1        Registers             My Head        1 min
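
The analogy appears to scale one clock to one minute. A minimal Python sketch under that assumption reproduces the table to within the slide's rounding (1.7 hr against the quoted 1.5 hr, about 1,900 years against 2,000). The function name and unit thresholds are illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25

# Clock counts from the slide; each clock is scaled to one minute.
levels = {
    "Registers": 1,
    "On Chip Cache": 2,
    "On Board Cache": 10,
    "Memory": 100,
    "Disk": 10**6,
    "Tape/Optical Robot": 10**9,
}

def human_scale(clocks):
    """Express `clocks` minutes in the largest convenient unit."""
    if clocks < 60:
        return f"{clocks} min"
    if clocks < MINUTES_PER_YEAR:
        return f"{clocks / 60:.1f} hr"
    return f"{clocks / MINUTES_PER_YEAR:,.0f} years"

for name, clocks in levels.items():
    print(f"{name:20s} {human_scale(clocks)}")
```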
                System On A Chip
• Integrate Processing with memory on one chip
   –   chip is 75% memory now
   –   1MB cache >> 1960 supercomputers
   –   256 Mb memory chip is 32 MB!
   –   IRAM, CRAM, PIM,… projects abound
• Integrate Networking with processing on one chip
   – system bus is a kind of network
   – ATM, FiberChannel, Ethernet, … logic on chip.
   – Direct IO (no intermediate bus)
• Functionally specialized cards shrink to a chip.
                 Scaleability
           Scale Up and Scale Out
• Grow Up with SMP
   – 4xP6 is now standard
   – Personal System, then Departmental Server, then SMP Super Server
• Grow Out with Cluster
   – Cluster has inexpensive parts
   – Cluster of PCs
     There'll be Billions (Trillions) Of Clients
     • Every device will be “intelligent”
     • Doors, rooms, cars…
     • Computing will be ubiquitous
       Billions (Trillions) Of Clients
     Need Millions (Billions) Of Servers
• All clients networked to servers
   – May be nomadic or on-demand
• Fast clients want faster servers
• Servers provide
   – Shared Data
   – Control
   – Coordination
   – Communication
[Diagram: Clients (mobile clients, fixed clients) connect to Servers (server, super server).]
Thin Client Support (FAT SERVERS)
• TSO comes to NT
• lower per-client costs
• Windows NT Server, Terminal Server supports
   – Net PC
   – Existing Desktop PC
   – Dedicated Windows terminal
   – MS-DOS, UNIX, Mac clients
           Windows 2000
              IntelliMirror™
• Extends CMU Coda File System ideas
• Files and settings mirrored on
  client and server
• Great for disconnected users
• Facilitates roaming
• Easy to replace PCs
• Optimizes network performance
        FAT STORAGE SERVERS
 SMP -> nUMA: BIG FAT SERVERS
• Directory based caching
  lets you build large SMPs
• Every vendor building a
  HUGE SMP
   – 256 way
   – 3x slower remote memory
   – 8-level memory hierarchy
      •   L1, L2 cache
      •   DRAM
      •   remote DRAM (3, 6, 9,…)
      •   Disk cache
      •   Disk
      •   Tape cache
      •   Tape
• Needs
   – 64 bit addressing
   – nUMA sensitive OS
      • (not clear who will do it)
• Or Hypervisor
   – like IBM LSF,
   – Stanford Disco
     www-flash.stanford.edu/Hive/papers.html
• Not certain
  what happens next
                                    Thesis
                      Many little beat few big
• Price: $1 million (Mainframe), $100 K (Mini), $10 K (Micro), then Nano, then Pico Processor
• Disk size: 14", 9", 5.25", 3.5", 2.5", 1.8"
• Pico Processor: 1 mm^3
   – 1 MB      10 pico-second ram
   – 100 MB    10 nano-second ram
   – 10 GB     10 microsecond ram
   – 1 TB      10 millisecond disc
   – 100 TB    10 second tape archive
   – 1 M SPECmarks, 1 TFLOP
   – 10^6 clocks to bulk ram
   – Event-horizon on chip
   – VM reincarnated
   – Multi-program cache, On-Chip SMP
• Smoking, hairy golf ball
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance & Management?
The Bricks of Cyberspace
4 B PCs (1 Bips, .1GB dram, 10 GB disk 1 Gbps Net, B=G)
• Cost 1,000 $
• Come with
   – NT
   – DBMS
   – High speed Net
   – System management
   – GUI / OOUI
   – Tools
• Compatible with everyone else
• CyberBricks
           Super Server: 4T Machine
• Array of 1,000 4B machines
   – 1 Bips processors
   – 1 BB DRAM
   – 10 BB disks
   – 1 Bbps comm lines
   – 1 TB tape robot
• A few megabucks
• Challenge:
   – Manageability
   – Programmability
   – Security
   – Availability
   – Scaleability
   – Affordability
• As easy as a single system
• Future servers are CLUSTERS of processors, discs
• Distributed database techniques make clusters work
[Diagram: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
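
Why "4T": multiplying each per-node billion by 1,000 nodes gives four trillions. A minimal Python sketch, reading "BB" as a billion bytes (an assumption based on the "B=G" note on the previous slide). The dictionary keys are illustrative labels.

```python
NODES = 1_000
BILLION = 10**9

# Per-node figures from the slide: a "4B" machine.
node = {
    "instructions/sec": 1 * BILLION,
    "DRAM bytes": 1 * BILLION,
    "disk bytes": 10 * BILLION,
    "network bits/sec": 1 * BILLION,
}

# Aggregate capacity of the array, ignoring any coordination overhead.
cluster = {resource: amount * NODES for resource, amount in node.items()}

for resource, total in cluster.items():
    print(f"{resource}: {total / 10**12:g} trillion")
```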
                  Scale OUT
           Clusters Have Advantages
• Fault tolerance:
  – Spare modules mask failures

• Modular growth without limits
  – Grow by adding small modules

• Parallel data search
  – Use multiple processors and disks
• Clients and servers made from the same stuff
  – Inexpensive: built with commodity CyberBricks
     1988: IBM DB2 + CICS Mainframe
                 65 tps
•   IBM 4391
•   Simulated network of 800 clients
•   2m$ computer
•   Staff of 6 to do benchmark
•   Refrigerator-sized CPU
•   2 x 3725 network controllers
•   16 GB disk farm (4 x 8 x .5GB)
        1987: Tandem Mini @ 256 tps
      • 14 M$ computer (Tandem)
      • A dozen people (1.8M$/y)
         – Admin expert, Performance expert, Hardware experts, Network expert,
           DB expert, OS expert, Auditor, Manager
      • False floor, 2 rooms of machines
      • 32 node processor array
      • Simulate 25,600 clients
      • 40 GB disk array (80 drives)
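
For comparison with the next slide, the two benchmark systems can be put on a dollars-per-tps basis. A minimal Python sketch using only the purchase prices and throughputs quoted on the slides; staff cost is left out, and the names are illustrative.

```python
# (purchase price in dollars, transactions per second), from the slides.
systems = {
    "1987 Tandem Mini": (14_000_000, 256),
    "1988 IBM mainframe": (2_000_000, 65),
}

dollars_per_tps = {name: price / tps for name, (price, tps) in systems.items()}

for name, cost in dollars_per_tps.items():
    print(f"{name}: ${cost:,.0f} per tps")
```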
               1997: 9 years later
         1 Person and 1 box = 1250 tps
•    1 Breadbox ~ 5x 1987 machine room
•    23 GB is hand-held
•    One person does all the work
     (Hardware expert, OS expert, Net expert, DB expert, App expert)
•    Cost/tps is 100,000x less
     5 micro dollars per transaction
•    4x200 MHz cpu, 1/2 GB DRAM, 12 x 4GB disk
•    3 x 7 x 4GB disk arrays
        What Happened?
Where did the 100,000x come from?
•   Moore’s law:           100X (at most)
•   Software improvements:  10X (at most)
•   Commodity Pricing:     100X (at least)
•   Total              100,000X

•   100x from commodity
     – DBMS was 100K$ to start: now 1k$ to start
     – IBM 390 MIPS is 7.5K$ today
     – Intel MIPS is 10$ today
     – Commodity disk is 50$/GB vs 1,500$/GB
     – ...
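
The three factors multiply to the 100,000x total, and the commodity examples can be checked the same way. A minimal Python sketch with the slide's numbers; the variable names are illustrative.

```python
import math

# Improvement factors as listed on the slide.
factors = {
    "Moore's law": 100,
    "Software improvements": 10,
    "Commodity pricing": 100,
}
total = math.prod(factors.values())

# Price ratios behind the commodity factor, from the slide's examples.
dbms_ratio = 100_000 / 1_000   # DBMS entry price
mips_ratio = 7_500 / 10        # IBM 390 MIPS vs Intel MIPS
disk_ratio = 1_500 / 50        # dollars per GB

print(f"total: {total:,}x")
print(f"DBMS {dbms_ratio:.0f}x, MIPS {mips_ratio:.0f}x, disk {disk_ratio:.0f}x")
```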
        Computers shrink to a point
• Disks 100x in 10 years
   – 2 TB 3.5” drive
   – Shrink to 1” is 200GB
• Disk is super computer!
• This is already true of
  printers and “terminals”
[Chart: capacity scale (Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta) against time]
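
"100x in 10 years" implies a compound annual growth rate of a little under 60%. A minimal Python sketch of that conversion; it assumes steady compounding, which the slide does not state.

```python
TOTAL_GROWTH = 100  # from the slide: disks grow 100x
YEARS = 10          # over 10 years

# Constant yearly multiplier that compounds to TOTAL_GROWTH after YEARS.
annual_multiplier = TOTAL_GROWTH ** (1 / YEARS)

print(f"yearly multiplier: {annual_multiplier:.3f}")
print(f"yearly growth: {(annual_multiplier - 1) * 100:.1f}%")
```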
       All Device Controllers will be Cray 1's
• TODAY
   – Disk controller is 10 mips risc engine
     with 2MB DRAM
   – NIC is similar power
• SOON
   – Will become 100 mips systems
     with 100 MB DRAM.
• They are nodes in a federation
        (can run Oracle on NT in disk controller).
• Advantages
   –   Uniform programming model
   –   Great tools
   –   Security
   –   economics (cyberbricks)
   –   Move computation to data (minimize traffic)
[Diagram: Central Processor & Memory attached to a Tera Byte Backplane]
   It's Already True of Printers
      Peripheral = CyberBrick
• You buy a printer
• You get
  – several network interfaces
  – A Postscript engine
     •   cpu,
     •   memory,
     •   software,
     •   a spooler (soon)
  – and… a print engine.
      Functionally Specialized Cards
• Storage, Network, and Display cards each carry
   – an ASIC
   – a P mips processor
   – M MB DRAM
• Today:
   – P = 50 mips
   – M = 2 MB
• In a few years
   – P = 200 mips
   – M = 64 MB
                       Implications
 Conventional
• Offload device handling
  to NIC/HBA
• higher level protocols:
  I2O, NASD, VIA…
• SMP and Cluster
  parallelism is important.

 Radical
• Move app to
  NIC/device controller
• higher-higher level
  protocols: DCOM.
• Cluster parallelism is
  VERY important.
[Diagram: Central Processor & Memory]
      How Do They Talk to Each Other?
•   Each node has an OS
•   Each node has local resources: A federation.
•   Each node does not completely trust the others.
•   Nodes use RPC to talk to each other
    – DCOM? IIOP? RMI?
    – One or all of the above.
•   Huge leverage in high-level interfaces.
•   Same old distributed system story.
[Diagram: two nodes joined by Wire(s). On each node, Applications sit on datagrams, streams, RPC, and "?" interfaces, layered over VIAL/VIPL.]
            Disk = Node
•   has magnetic storage (100 GB?)
•   has processor & DRAM
•   has SAN attachment
•   has execution environment
[Diagram: software stack on the disk node: Applications; Services, DBMS; RPC, ..., File System; SAN driver, Disk driver; OS Kernel]
       HotMail: ~300 Computers
• FreeBSD and Solaris
       Microsoft.com: ~150 nodes
[Diagram: The Microsoft.Com Site. The figure did not survive extraction; the labels below are what can be recovered.]
• Sites: Building 11, MOSWest, European Data Center, Japan Data Center
• Web server groups: www.microsoft.com, premium.microsoft.com, register.microsoft.com,
  home.microsoft.com, search.microsoft.com, activex.microsoft.com, cdm.microsoft.com, msid.msn.com
• Other server groups: Internal WWW, Log Processing, Staging Servers, DMZ Staging Servers,
  IDC Staging Servers, FTP Servers, FTP Download Server, HTTP Download Servers, Download Replication,
  SQL Reporting, Live SQL Servers, SQL Consolidators, SQL SERVERS
• Each group lists a count, Ave CFG, Ave Cost, and FY98 Fcst
   – Ave CFG: 4xP6 (some 4xP5), 256 RAM to 1 GB RAM, 12 GB to 180 GB HD
   – Ave Cost: $24K to $128K
• Networks: FDDI Ring (MIS1), FDDI Ring (MIS2), SQLNet Feeder LAN, MOSWest Admin LAN,
  Switched Ethernet, Routers, Primary Gigaswitch, OC3 (100Mb/Sec Each), Ethernet, Internet
• All servers in Building 11 are accessible from corpnet.
                                                                                               Router                                                                                   Internet                                                                                          (100 Mb/Sec Each)
         www.microsof t.com         (1)
                                                                                                           Router
                (3)                                                                                                                                                      Secondary
                                                                                                                                                                         Gigaswitch
                                                                                                                                  Router                                                                             13
                                                                                                           FTP.microsof t.com                Router                                                                 DS3
                                                                                                                                                                                                               (45 Mb/Sec Each)
                                                  FDDI Ring                                                       (3)
                                                                                                           Ave CFG: 4xP5,
       home.microsof t.com                         (MIS3)
                                                                        msid.msn.com                       512 RAM,                                         www.microsof t.com
              (2)                                                            (1)                           30 GB HD                                                (5)
                                                                                                           Ave Cost: $28K
                                                                                                           FY98 Fcst: 0


          Ave CFG: 4xP5,
          256 RAM,
                                                                                                         register.microsof t.com
                                                                                                                   (2)                         FDDI Ring
                                                                                                                                                (MIS4)
                                                                                                                                                                                                                                                  Internet
          20 GB HD
          Ave Cost: $29K
          FY98 Fcst: 2                                          register.microsof t.com                                                                         home.microsof t.com
          register.msn.com                                                (1)                            support.microsof t.com                                        (5)
                  (2)                                                                                             (2)
                                                                                                            Ave CFG: 4xP6,
                                        support.microsof t.com                                              512 RAM,
                    search.microsof t.com
                            (3)
                                                 (1)                                                        30 GB HD
                                                                                                            Ave Cost: $35K
                                                                                                                            Gray @ Nortel 20 April 1999
                                                                                                            FY98 Fcst: 9

\\Tweeks\Statistics\LAN and Server Name Info\Cluster Process Flow\MidYear98a.vsd
12/15/97
                              Other Clusters
• 16-node Cluster
  – 64 cpus
  – 2 TB of disk
  – Decision support
• 45-node Compaq Cluster
  –   140 cpus
  –   14 GB DRAM
  –   4 TB RAID disk
  –   OLTP (Debit Credit)
       • 1 B tpd (14 k tps)

Berkeley NOW (network of workstations) Project
               http://now.cs.berkeley.edu/

                               • 105 nodes
                                     – Sun UltraSparc 170,
                                       128 MB,
                                       2x2GB disk
                                     – Myrinet interconnect (2x160MBps
                                       per node)
                                     – SBus (30MBps) limited
                               •    GLUNIX layer above Solaris
                               •    Inktomi (HotBot search)
                               •    NAS Parallel Benchmarks
                               •    Crypto cracker
                               •    Sort 9 GB per second
    NCSA Super Cluster
http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html

    • National Center for Supercomputing Applications
      University of Illinois @ Urbana
    • 512 Pentium II cpus, 2,096 disks, SAN
    • Compaq + HP + Myricom + WindowsNT
    • A Super Computer for 3M$
    • Classic Fortran/MPI programming
    • DCOM programming model
             Outline

–The bandwidth revolution
–ScaleUp, ScaleOut
–TerraServer (Barclay, Slutz, Gray)
 A scaleup example



        Some Tera-Byte Databases
(Scale ladder shown beside the list: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta)
• The Web: 1 TB of HTML
• TerraServer: 1 TB of images
• Several other 1 TB (file) servers
• Hotmail: 7 TB of email
• Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked
• EOS/DIS (picture of planet each week)
  – 15 PB by 2007
• Federal Clearing house: images of checks
  – 15 PB by 2006 (7 year history)
• Nuclear Stockpile Stewardship Program
  – 10 Exabytes (???!!)
        Info Capture
• You can record everything you see or hear or read.
• What would you do with it?
• How would you organize & analyze it?

Volume per lifetime:
  Video:          8 PB (10 GBph)
  Audio:          30 TB (10 KBps)
  Read or write:  8 GB (words)
See: http://www.lesk.com/mlesk/ksg97/ksg.html

(Scale ladder shown beside the list: Kilo: a letter; Mega: a novel; Giga: a movie; Tera: Library of Congress (text); Peta: LoC (image); Exa: all disks; Zetta: all tapes; Yotta)
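The per-lifetime volumes follow from the capture rates on the slide. A minimal sketch of that arithmetic, assuming continuous recording over a 90-year lifetime (the lifetime length is an assumption, the slide does not state it) and decimal units:

```python
# Back-of-envelope check of the "record everything" volumes.
# LIFETIME_YEARS is an assumption; the slide gives only the rates and totals.
HOURS_PER_YEAR = 365.25 * 24
LIFETIME_YEARS = 90


def lifetime_bytes(bytes_per_second: float, years: float = LIFETIME_YEARS) -> float:
    """Bytes accumulated when recording continuously for the given number of years."""
    seconds = years * HOURS_PER_YEAR * 3600
    return bytes_per_second * seconds


video = lifetime_bytes(10e9 / 3600)  # 10 GB per hour
audio = lifetime_bytes(10e3)         # 10 KB per second

print(f"video: {video / 1e15:.1f} PB")
print(f"audio: {audio / 1e12:.1f} TB")
```

Under these assumptions the results land close to the slide's 8 PB and 30 TB.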
    Michael Lesk’s Points
    www.lesk.com/mlesk/ksg97/ksg.html

• Soon everything can be recorded and kept

• Most data will never be seen by humans

• Precious Resource: Human attention
            Auto-Summarization
            Auto-Search
  will be a key enabling technology.
        The TerraServer
http://www.terraserver.microsoft.com/




               Database & application UI
• Concept: User navigates an “almost seamless” image of earth
• Coverage: Range from 70ºN to 70ºS
  today: 35% U.S., 1% outside U.S.
• Source Imagery:
    – 4 TB 1 sq meter/pixel Aerial
      (USGS - 60,000 46Mb B&W - 151Mb Color IR files)
    – 1 TB 1.56 meter/pixel Satellite
      (Spin-2 - 2400 300 Mb B&W)
• Display Imagery: 200x200 pixel images,
    subsample to build image pyramid
• Nav Tools:
    – 1.5 M place names
    – “Click-on” Coverage map
    – Expedia & Virtual Globe map
    – Pick of the week
[Figure labels: 200x200 m tile; .4 x .4 km browse; .8 x .8 km 8m thumbnail; 1.6 x 1.6 km “city view”]
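The "subsample to build image pyramid" step can be sketched as repeated 2:1 reduction of a base image. This is only an illustration: averaging each 2x2 block is an assumed subsampling rule (the slide does not say which filter is used), and the function names are hypothetical.

```python
def subsample(image):
    """Halve width and height by averaging each 2x2 block of pixels (assumed filter)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(base, levels):
    """Return [base, base/2, base/4, ...] with `levels` reduced images."""
    pyramid = [base]
    for _ in range(levels):
        pyramid.append(subsample(pyramid[-1]))
    return pyramid


# Synthetic 800x800 grayscale image standing in for a source scene.
base = [[(x + y) % 256 for x in range(800)] for y in range(800)]
pyramid = build_pyramid(base, levels=3)
sizes = [(len(img), len(img[0])) for img in pyramid]
print(sizes)
```

Each level would then be cut into 200x200 pixel display tiles, so a tile at a coarser level covers a larger ground area.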
                      Image Data
• USGS “DOQ”: 4 TB (6 TB coming)
• Spin-2: 1 TB worldwide (new data coming)
• DRG: 50,000 topo maps (adding now)
              Software Architecture
[Diagram: three groups connected through The Internet]
• Web Client
    – HTML, Java Viewer
    – IE 3…5, Netscape 3…4
• Terra-Server Web Site
    – Internet Information Server 4.0
    – Terra-Server Active Server Pages (24)
    – Active Data Object, ODBC
    – Terra-Server Stored Procedures (19)
    – SQL Server 7.0, Terra-Server DB (39: 14 Img, 8 Place)
• Image Commerce Site(s)
    – Microsoft Site Server EE 3.0
    – SPIN-2/USGS Store (13 Active Server Pages)
    – Image Delivery Application, SQL Server
                How Images are Found
[Pie chart]
• Name Search: 40%
• Expedia Map: 22%
• Coverage Map: 19%
• Famous Places: 18%
• Geo Coordinate: 1%
               TerraServer: Lots of Web Hits
[Chart: TerraServer Jun 22 1998 to Feb 28 1999; x-axis Date, y-axis Count (0 to 35,000,000); series: Sessions, Hit, Page View, DB Query, Image]

Summary (as of Feb 28, 1999)
                  Total     Average   Max
Unique Users      17 M      69 k      150 k
Sessions          24 M      94 k      172 k
Hits              1.7 B     6.8 M     29 M
Page Views        274 M     1.1 M     6.6 M
DB Queries        1.5 B     5.8 M     18 M
Image Xfers       1.3 B     5.0 M     15 M

• Today:
    – 1.7 billion web hits
    – 1 TB, largest SQL DB on the Web
    – 100 qps average, 1,000 qps peak
    – 1.5 B SQL queries so far
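The summary figures can be cross-checked with a little arithmetic. A rough sketch, assuming the Average and Max columns are per-day values over the Jun 22, 1998 to Feb 28, 1999 window (that reading is an inference from the totals, not stated on the slide):

```python
from datetime import date

# Reporting window from the chart title, counted inclusively.
days = (date(1999, 2, 28) - date(1998, 6, 22)).days + 1

total_queries = 1.5e9          # "DB Queries" total from the summary table
per_day = total_queries / days
avg_qps = per_day / 86_400     # sustained average over the whole window
peak_day_qps = 18e6 / 86_400   # average rate during the busiest day (Max column)

print(f"{days} days, {per_day / 1e6:.1f} M queries/day")
print(f"average {avg_qps:.0f} qps, busiest day {peak_day_qps:.0f} qps")
```

The sustained average comes out at roughly 70 queries per second, the same order of magnitude as the slide's rounded "100 qps average"; the 1,000 qps peak is a short-term burst figure that daily totals cannot reproduce.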
                      Logical Schema
[Diagram of tables]
• Gazetteer: Country Name, State Name, Place Name, PlaceType, Feature Type, Where Am I
• Image Data & Meta Data: Theme Meta Information, Spin Frame Meta, Img Meta, Tile Meta,
  Jump Img, Browse Img, Thumb Img, Tile Img

Gazetteer: Index on
• image, place, type
• image, state, type
• image, state, country, type
• image, place, state, type
• image, place, country, type
all lookups are fast

Image data: Lookup by UGrid or ZGrid ID plus resolution
Lookups are fast.
Indices are in DRAM (auto-magically by SQL)
SQL manages all the tiles and indices
Images are brought in on demand
Image Load and Update
[Diagram: load pipeline]
• DLT Tape, read with “tar”
• Metadata: Load DB
• Image Cutter, Merge
• TerraLoader (ODBC Tx) into TerraServer SQL DBMS
• Dither, Image Pyramid from base (ODBC Tx) into TerraServer SQL DBMS
• Active Server Pages: Cut & Load Scheduling System
TerraServer Administrator Web Site
• Accessible by Microsoft, SPIN-2, and USGS
• Web browser forms to:
  – Edit Famous Places list
  – Modify Image Status fields
  – Define new TerraServer Administrators




  Load & Backup & Recovery
• Backup and Recovery
  – Using Legato Networker
    integrated with SQL
    Backup/Restore Utility
  – Fast, incremental, differential,
    online
• Restore
  – Fast, incremental (file oriented),
    not online.
• SQL Server Enterprise
  Manager
  – DBA Maintenance
  – SQL Performance Monitor

Performance
Data Bytes Backed Up             1.2    TB
Total Time                       7.25   Hours
Number of Tapes Consumed         27
Total Tape Drives                10
Data Throughput                  168    GB/Hour
Average Throughput Per Device    16.8   GB/Hour
Average Throughput Per Device    4.97   MB/Sec
NTFS Logical Volumes             2
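The rows of the Performance table are related by simple division. A small sketch of that relationship, assuming decimal units (1 TB = 1000 GB); the slide does not say which convention the backup tool reports:

```python
# Inputs taken from the Performance table.
data_gb = 1.2 * 1000   # 1.2 TB backed up, decimal units assumed
hours = 7.25
drives = 10

total_gb_per_hour = data_gb / hours
per_drive_gb_per_hour = total_gb_per_hour / drives
per_drive_mb_per_sec = per_drive_gb_per_hour * 1000 / 3600

print(f"{total_gb_per_hour:.1f} GB/hour total")
print(f"{per_drive_gb_per_hour:.2f} GB/hour per drive")
print(f"{per_drive_mb_per_sec:.2f} MB/sec per drive")
```

The results are within a few percent of the table's 168 GB/hour, 16.8 GB/hour and 4.97 MB/sec; the gap is probably due to the rounded 1.2 TB input and the unit convention.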
                 Site Configuration
[Diagram]
• 9710 TimberWolf
• Enterprise Storage Array
    – 9 HSZ70 Ultra-SCSI dual redundant controllers
    – 324 9 GB Seagate disks
• Alpha 8400 (8x440), 10 GB RAM
• 6 Compaq 5500 (4x200 MHz) web servers, to the Web
      The Microsoft TerraServer Hardware
• Compaq AlphaServer 8400
• 8x400Mhz Alpha cpus
• 10 GB DRAM
• 324 9.2 GB StorageWorks Disks
    – 3 TB raw, 2.4 TB of RAID5
• STK 9710 tape robot (4 TB)
• WindowsNT 4 EE, SQL Server 7.0
                 File System Config
• Use StorageWorks to form 28 RAID5 sets
    Each raid set has 11 disks (16 spare drives)
• Use NTFS to form 4 595GB NT volumes
    Each striped over 7 Raid sets on 7 controllers
• Create 26 20,000MB files on F:, 27 on G:
• DB is File Group of 53 files (1.011TB)




     F:                      G:                       H:                      I:
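The disk and file counts on this slide are mutually consistent, which a few lines of arithmetic confirm. The sketch below uses only numbers from the slide; treating 1 TB as 1024 x 1024 MB is an assumption that happens to reproduce the quoted 1.011 TB.

```python
# RAID layout: 28 RAID5 sets of 11 disks, plus 16 spares.
raid_sets = 28
disks_per_set = 11
spares = 16
total_disks = raid_sets * disks_per_set + spares

# Each of the 4 NT volumes is striped over an equal share of the RAID sets.
volumes = 4
sets_per_volume = raid_sets // volumes

# Database file group: 26 files on F: and 27 on G:, 20,000 MB each.
files = 26 + 27
file_mb = 20_000
db_tb = files * file_mb / 1024**2   # binary TB (assumed convention)

print(total_disks, sets_per_volume, files, round(db_tb, 3))
```

The totals match the 324 disks on the hardware slide, the 7 RAID sets per volume, and the 53-file, 1.011 TB file group.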
   SQL 7 TerraServer Availability
• Operating for 9 months: 6400 hrs
• Unscheduled outage: 36.5 minutes:
  99.9905% scheduled up
• Scheduled outage: 60 minutes
• Availability:
  99.96% overall up
• No NT failures (ever)
• One SQL7 Beta2 bug
• No failures in
  July, Aug, Oct, Dec, Jan, Feb, Mar
[Chart: Total Time (Hours), 0 to 6480, and Down Time (Hours:minutes), 0:00 to 4:00; series: Up, Scheduled, Unscheduled]
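The "scheduled up" percentage is the fraction of operating time not lost to unscheduled outages. A minimal sketch of that calculation; the function name is hypothetical and only the 6400 hours and 36.5 minutes come from the slide:

```python
def availability_pct(total_hours: float, outage_minutes: float) -> float:
    """Percentage of the operating period during which the service was up."""
    total_minutes = total_hours * 60
    return 100.0 * (1 - outage_minutes / total_minutes)


unscheduled = availability_pct(6400, 36.5)
print(f"{unscheduled:.4f}% up when only unscheduled outages count")
```

The same function applies to any outage total; only the unscheduled figure is checked here because the slide gives both of its inputs explicitly.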
              Things we did right...
• Use a database to store images:
   – Simplify management
   – Can dynamically load data into tables while viewing application
     is active
• Simple X, Y Z-Grid navigation system
• Used ImgStatus to control logical “presence” of the image
  in the app
• “Stitching tiles together” from multiple input images to
  form seamless mosaic
• Offering two forms of seamless -- time based (SPIN-2)
  and theme based (DOQ)
        TS 3: Things are changing...
• Square Tiles, power of 2 size (200x200)
• Power of 2 zoom levels (2:1, 4:1, 8:1, etc.)
   so uniform tile size on each zoom (variable ground size per tile)
• Indexing system independent of tile size
•   Digital Raster Graphics (Topo maps)
•   Layered Maps (Topo merge with DOQ)
•   Integrate with other applications and services
•   Later:
    – Digital Elevation Models (DEMs)
    – Other foreign data sources (EU, etc.)

        What TerraServer Shows
• Can serve huge databases on Internet
  for about a penny a page view
      mostly phone bill (!).
      Advertising pays more than a penny a page.
• Commodity tools do scale fairly far.
• A few people (3 developers, 1 operator)
  using power tools
  can build an impressive web site

                               Thank You!




                                                      SPIN-2

Tom Barclay did most of this app,
Slutz and Gray helped.
             Outline

–The bandwidth revolution
–ScaleUp, ScaleOut
–TerraServer (Barclay, Slutz, Gray)



    end


                 Windows NT Versus UNIX
         Best Results on an SMP: SemiLog plot shows 3x (2 year) lead by UNIX
                                   see www.tpc.org
[Charts: tpmC vs Time, Jan-95 to Jan-99, Unix and NT series. Left panel: linear scale, 0 to 120,000 tpmC. Right panel: log scale, 1,000 to 100,000 tpmC.]
TPC C Improvements (MS SQL)
250%/year on Price,
100%/year performance
40% hardware, 100% software, 100% PC Technology
[Charts, Jan-94 to Dec-98: $/tpmC vs time (log scale, $10 to $1,000) and tpmC vs time (log scale, 100 to 100,000)]
     Price Breakdown (6 months old)

[Bar chart: TPC Price/tpmC by component, values read from the bar labels]
System                                    processor   disk   software   net   total/10
Sequent/Oracle 89 k tpmC @ 170$/tpmC         47        53       61       9     17.0
Sun Oracle 52 k tpmC @ 134$/tpmC             45        35       30       7     12
HP+NT4+MS SQL 16.2 ktpmC @ 33$/tpmC           8        17        4       5      3
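The legend figures imply a total system price and a "bang for the buck" value for each configuration. A rough sketch of that arithmetic using only the tpmC and $/tpmC values from the legend; because the throughput figures are rounded, the prices are approximate, and the function names are hypothetical:

```python
# (tpmC, $/tpmC) pairs from the chart legend.
systems = {
    "Sequent/Oracle": (89_000, 170),
    "Sun Oracle": (52_000, 134),
    "HP+NT4+MS SQL": (16_200, 33),
}


def system_price(tpmc: float, dollars_per_tpmc: float) -> float:
    """Approximate total system price implied by throughput and price/performance."""
    return tpmc * dollars_per_tpmc


def tpmc_per_kdollar(dollars_per_tpmc: float) -> float:
    """Throughput bought per thousand dollars (the tpmC/k$ measure)."""
    return 1000 / dollars_per_tpmc


for name, (tpmc, price_perf) in systems.items():
    print(f"{name}: ~${system_price(tpmc, price_perf) / 1e6:.1f} M, "
          f"{tpmc_per_kdollar(price_perf):.1f} tpmC/k$")
```

On these numbers the largest system costs tens of times more than the smallest while delivering about five times the throughput, which is the diseconomy of scale the talk goes on to plot.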
                   (dis) Economy Of Scale
[Chart: Bang for the Buck, tpmC/k$ (0 to 60) vs tpmC (0 to 60,000); points labeled MS SQL Server, Sybase, Informix, Oracle]
