
The Personal Petabyte
The Enterprise Exabyte

Jim Gray
Microsoft Research
Presented at IIST, Asilomar, 10 December 2003
http://research.microsoft.com/~gray/talks
                      Outline

  • History
  • Changing Ratios
  • Who Needs a Petabyte?


Thesis:
   In 20 years, a Personal Petabyte will be affordable.
   Most personal bytes will be video.
   Enterprise Exabytes will be sensor data.
             An Early Disk
• Phaistos Disk:
  – 1700 BC
  – Minoan
    (Cretan, Greek)

• No one can read it
       Early Magnetic Disk 1956
• IBM 305 RAMAC
• 4 MB
• 50 disks, 24″ each
• 1,200 rpm
• 100 ms access time
• $35k/year rent
• Included computer &
  accounting software
  (tubes, not transistors)
10 years later (1966 Illiac)
• 1.6 meters tall
• 30 MB
Or the 1970 IBM 2314, at 29 MB
History: 1980 Winchester
• Seagate 5¼″: 5 MB
• Fujitsu Eagle 10″: 450 MB
The MAD Future:
Terror Bytes
• In the beginning there was the
  Paramagnetic Limit: 10 Gbpsi
• The limit keeps growing
  (now ~200 Gbpsi)
• Mark H. Kryder (Seagate),
  "Future Magnetic Recording Technologies,"
  FAST 2001 @ Monterey, apologizes:
  "Only 100x density improvement,
  then we are out of ideas"

• That's a 20 TB desktop,
  a 4 TB laptop!

[Chart: bit density (b/µm² and Gb/in²) vs time, 1990–2008, rising from
CD and DVD optical densities past the wavelength limit toward the
superparamagnetic limit; beyond that: NEMS, fluorescent, holographic,
DNA storage?]
Outline
• History
• Changing Ratios
  – Disk to RAM
  – DASD is Dead
  – Disk space is free
  – Disk Archive-Interchange
  – Network faster than disk
  – Capacity, Access
  – TCO == people cost
  – Smart disks happened
  – The entry cost barrier
• Who Needs a Petabyte?
Storage Ratios Changed
• 10x better access time
• 10x more bandwidth
• 100x more capacity
• Data 25x cooler (1 Kaps/20 MB vs 1 Kaps/500 MB)
• 4,000x lower media price
• 20x to 100x lower disk price
• Scan takes 10x longer (3 min vs 45 min)
• RAM/disk media price ratio changed
  – 1970–1990: 100:1
  – 1990–1995: 10:1
  – 1995–1997: 50:1
  – today: ~$1/GB disk vs $200/GB DRAM, so 200:1

[Charts: Disk Performance vs Time (seeks/s, bandwidth MB/s, capacity GB);
Disk accesses/second vs Time; Storage Price vs Time (MB per kilo-dollar),
each over 1980–2000.]
Price_RAM_TB(t+10) = Price_Disk_TB(t)
Disk Data Can Move to RAM in 10 Years

• Disk is ~100x cheaper than RAM per byte
• Both get 100x bigger in 10 years.
• So: move data to main memory a decade later
• Seems: RAM/disk bandwidth ratio ~100:1

[Chart: Storage Price vs Time, MB per kilo-dollar, 1980–2000, showing
the 100:1 price ratio and the 10-year lag between the disk and RAM
price curves.]
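The rule above follows directly from two assumptions; a minimal sketch (the 100x-per-decade decline and the 100:1 price ratio are the slide's own round numbers):

```python
def price_per_tb(price_today, years, decline_per_decade=100.0):
    """Price after `years`, assuming a steady 100x-per-decade decline."""
    return price_today / decline_per_decade ** (years / 10.0)

disk_today = 1_000.0            # ~$1k/TB of disk (2003 figure from the talk)
ram_today = 100 * disk_today    # RAM ~100x more expensive per byte

# Price_RAM_TB(t + 10) == Price_Disk_TB(t):
print(price_per_tb(ram_today, 10))  # 1000.0 -- today's disk price
```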
DASD (direct access storage device) is Dead
• Accesses got cheaper
  – Better disks
  – Cheaper disks!
• Disk access/bandwidth is the scarce resource
• 2003: 100-minute scan; 1990: 5-minute scan
• Sequential bandwidth is 50x faster than random;
  a random scan takes 3 days
• The ratio will get 10x worse in 10 years:
  100x more capacity, 10x more bandwidth.
• Invent ways to trade capacity for bandwidth:
  use the capacity without using bandwidth.

[Chart: Kaps (kilo-accesses per second) per dollar and per disk,
1970–2000, both rising; a 2003 disk is ~300 GB at ~50 MB/s.]
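The scan times quoted above follow from the drive parameters in the figure (300 GB, 50 MB/s); the 8 KB random-access unit and ~150 seeks/s are assumed illustrative values for a 2003 drive:

```python
disk_gb = 300          # 2003 drive from the slide
seq_mb_s = 50
page_kb = 8            # assumed random-read unit
seeks_per_s = 150      # assumed ~7 ms per random access

seq_scan_min = disk_gb * 1000 / seq_mb_s / 60
random_scan_days = (disk_gb * 1e6 / page_kb) / seeks_per_s / 86400

print(round(seq_scan_min))         # 100 -- the "100 minute scan"
print(round(random_scan_days, 1))  # 2.9 -- the "random scan 3 days"
```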
Disk Space is "free";
Bandwidth & Accesses/sec are not
• $1k/TB going to $100/TB
• 20 TB disks on the (distant) horizon: 100x density
• Waste capacity intelligently
  – Version everything
  – Never delete anything
  – Keep many copies
    • Snapshots
    • Mirrors (triple and geoplex)
    • Cooperative caching (Farsite and OceanStore)
    • Disk archive
Disk as Archive-Interchange
• Tape is archive / interchange / low cost
• Disk is now competitive in all 3 categories
• What format? FAT? CDFS? …
• What tools?
• Need the software to do disk-based backup/restore
• Commonly snapshots (multi-version FS)
• Radical: peer-to-peer file archiving
  – Many researchers are looking at this:
    OceanStore, Farsite, others…
Disk vs Network:
Now the Network is Faster (!)
• Old days:
  – 10 MBps disk, low CPU cost (0.1 ins/b)
  – 1 MBps net, huge CPU cost (10 ins/b)
• New days:
  – 50 MBps disk, low CPU cost
  – 100 MBps net, low CPU cost (TOE, RDMA)
• Consequence:
  – You can remote disks.
  – Allows consolidation.
  – Aggregate (bisection) bandwidth is still a problem.

[Chart: disk vs network bandwidth (MB/s), 1970–2010; the network curve
crosses the disk curve around 2000.]
Storage TCO == People Time
1980 rules of thumb:
  1 systems programmer per MIPS
  1 data admin per 10 GB
  → 800 systems programmers + 4 data admins for your laptop
Sometimes it must seem like that, but…
Today: one data admin per 1 TB … 300 TB,
depending on process and data value.
• Automate everything
• Use redundancy to mask (and repair) problems.
• Save people, spend hardware
Smart Disks Happened
Disk appliances are here:
  Cameras
  Games
  PVRs
  File servers
Challenge:
  entry price
The Entry Cost Barrier:
Connect the Dots
• Consumer electronics want low entry cost
  – 1970: $20,000
  – 1980: $2,000
  – 2000: $200
  – 2010: $20
• If magnetics can't do this,
  another technology will.
• Think: copiers, hydraulic shovels, …

[Chart: ln(price) vs time, extrapolating the trend down to the price
wanted today.]
Outline
• History
• Changing Ratios
• Who Needs a Petabyte?
  – Petabyte for $1k in 15–20 years
  – Affordable but useless
  – How much information is there?
  – The Memex vision
  – MyLifeBits
  – The other 20% (enterprise storage)

[Scale ladder: kilo, mega, giga, tera ("we are here"), peta, exa,
zetta, yotta.]
A Bleak Future:
The ½-Platter Society?
• Conclusion from the Information Storage Industry Consortium
  HDD Applications Roadmap Workshop:
  – "Most users need only 20GB"
  – We are heading to a ½-platter industry.
• 80% of units and capacity
  are personal disks
  (not enterprise servers).
• The end of disk-capacity demand.
• A zero-billion-dollar industry?
Try to fill a terabyte in a year

  Item                          Items/TB   Items/day
  300 KB JPEG                       3 M       9,800
  1 MB Doc                          1 M       2,900
  1 hour 256 kb/s MP3 audio         9 K          26
  1 hour 1.5 Mb/s MPEG video        290         0.8

A petabyte volume has to be some form of video.
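The table's arithmetic, recomputed (decimal units and 365 days assumed; note the video row of the table implies a larger per-hour size than a raw 1.5 Mb/s stream, so only the first three rows reproduce closely):

```python
TB = 1e12
item_bytes = {
    "300 KB JPEG": 300e3,
    "1 MB Doc": 1e6,
    "1 hour 256 kb/s MP3": 256e3 / 8 * 3600,   # ~115 MB/hour
    "1 hour 1.5 Mb/s MPEG": 1.5e6 / 8 * 3600,  # ~675 MB/hour
}
for name, size in item_bytes.items():
    per_tb = TB / size
    print(f"{name}: {per_tb:,.0f}/TB, {per_tb / 365:,.0f}/day")
```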
Growth Comes From NEW Apps
• The $10M computer of 1980 costs $1k today
• If we were still doing the same things,
  IT would be a zero-billion-dollar-a-year industry
• NEW things absorb the new capacity

• 2010 Portable?
  – 100 Gips processor
  – 1 GB RAM
  – 1 TB disk
  – 1 Gbps network
  – Many form factors
The Terror Bytes are Here

• 1 TB costs $1k to buy
• 1 TB costs $300k/y to own
  – Management & curation are expensive
  – (I manage about 15 TB in my spare time;
    no, I am not paid $4.5M/y to manage it)
  – Searching 1 TB takes minutes, or hours, or days, or…
• I am petrified by petabytes
• But… people can "afford" them, so
  we have lots to do: Automate!

[Scale ladder: kilo through yotta; tera is "we are here".]
How much information is there?
• Soon everything can be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, and
  anomaly detection are key technologies
See Mike Lesk, "How much information is there?":
  http://www.lesk.com/mlesk/ksg97/ksg.html
See Lyman & Varian, "How much information?":
  http://www.sims.berkeley.edu/research/projects/how-much-info/

[Scale ladder: a photo (mega), a book, a movie (giga), all books as
words (tera), all books as multimedia (peta), everything recorded (exa),
everything (yotta).]
Going down: 10^-3 milli, 10^-6 micro, 10^-9 nano, 10^-12 pico,
10^-15 femto, 10^-18 atto, 10^-21 zepto, 10^-24 yocto.
Memex
"As We May Think," Vannevar Bush, 1945

"A memex is a device in which an individual
stores all his books, records, and
communications, and which is mechanized
so that it may be consulted with exceeding
speed and flexibility"

"yet if the user inserted 5000 pages of
material a day it would take him hundreds
of years to fill the repository, so that he can
be profligate and enter material freely"
Why Put Everything in Cyberspace?

• Low rent: minimum $/byte
• Shrinks time: now or later
• Shrinks space: here or there
• Automate processing: knowbots

[Diagram: immediate or time-delayed delivery; point-to-point or
broadcast; locate / process / analyze / summarize.]
How Will We Find Anything?
• Need queries, indexing, pivoting,
  scalability, backup, replication,
  online update, set-oriented access
• If you don't use a DBMS,
  you will implement one!
• Simple logical structure:
  – Blob and link is all that is inherent
  – Additional properties (facets == extra tables)
    and methods on those tables (encapsulation)
• More than a file system
• Unifies data and metadata

[Diagram: an SQL++ DBMS.]
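The "blob and link, plus facets as extra tables" structure can be sketched in a few lines of SQL (table and column names here are invented for illustration; this is not the MyLifeBits schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- every item is just a blob...
    CREATE TABLE item (id INTEGER PRIMARY KEY, blob BLOB);
    -- ...plus typed links between items...
    CREATE TABLE link (src INTEGER, dst INTEGER, kind TEXT);
    -- ...and facets: extra property tables keyed by item id
    CREATE TABLE photo_facet (id INTEGER PRIMARY KEY, taken TEXT, camera TEXT);
""")
con.execute("INSERT INTO item VALUES (1, x'FFD8')")      # a JPEG blob
con.execute("INSERT INTO photo_facet VALUES (1, '2003-12-10', 'DSC-F717')")
con.execute("INSERT INTO item VALUES (2, x'25504446')")  # a PDF blob
con.execute("INSERT INTO link VALUES (2, 1, 'contains')")

# metadata query: photos contained in document 2
rows = con.execute("""
    SELECT p.id, p.taken FROM photo_facet p
    JOIN link l ON l.dst = p.id
    WHERE l.src = 2 AND l.kind = 'contains'
""").fetchall()
print(rows)  # [(1, '2003-12-10')]
```

Queries run over the facet and link tables (the metadata), while the blobs themselves are only fetched when needed.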
MyLifeBits: The Guinea Pig
• Gordon Bell is digitizing his life
• Has now scanned virtually all:
  – Books written (and read when possible)
  – Personal documents (correspondence, memos, email, bills, legal, …)
  – Photos
  – Posters, paintings, photos of things (artifacts, … medals, plaques)
  – Home movies and videos
  – CD collection
  – And, of course, all PC files
• Now recording: phone, radio, TV (movies), web pages…
  conversations
• Paperless throughout 2002: 12″ scanned, 12′ discarded.
• Only 30 GB!!! Excluding digital videos
• Video is 2+ TB and growing fast
Capture and encoding
[image slide]

I mean everything
[image slide]
gbell WAG: 67-yr, 25K-day life
→ a Personal Petabyte

[Chart: lifetime storage (TB, log scale 0.001–1,000) by media type:
messages, web pages, TIFFs, books, JPEGs, 1 KBps sound, music, videos;
video dominates, totaling ~1 PB.]
80% of data is personal / individual.
But what about the other 20%?
• Business
  – Wal-Mart online: 1 PB and growing…
  – Paradox: most "transaction" systems < 1 PB.
  – Have to go to image/data monitoring for big data
• Government
  – Government is the biggest business.
• Science
  – LOTS of data.
Information Avalanche
• Both
  – better observational instruments and
  – better simulations
  are producing a data avalanche
• Examples:
  – Turbulence: 100 TB simulation,
    then mine the information
  – BaBar: grows 1 TB/day
    (2/3 simulation, 1/3 observational information)
  – CERN: LHC will generate 1 GB/s, 10 PB/y
  – VLBA (NRAO) generates 1 GB/s today
  – NCBI: "only ½ TB" but doubling each year; very rich dataset
  – Pixar: 100 TB/movie
(Image courtesy of C. Meneveau & A. Szalay @ JHU)
Q: Where will the Data Come From?
A: Sensor Applications
• Earth observation
  – 15 PB by 2007
• Medical images & information + health monitoring
  – Potential 1 GB/patient/y → 1 EB/y
• Video monitoring
  – ~1E8 video cameras @ 1E5 Bps
    → 10 TB/s → 100 EB/y
    filtered???
• Airplane engines
  – 1 GB sensor data/flight
  – 100,000 engine-hours/day
  – → 30 PB/y
• Smart Dust: ?? EB/y
http://robotics.eecs.berkeley.edu/~pister/SmartDust/
http://www-bsac.eecs.berkeley.edu/~shollar/macro_motes/macromotes.html
DataGrid Computing

• Store exabytes twice
  (for redundancy)
• Access them from anywhere
• Implies huge archive/data centers
• Supercomputer centers
  become super data centers
• Examples:
  Google, Yahoo!, Hotmail,
  BaBar, CERN, Fermilab, SDSC, …
                      Outline

  • History
  • Changing Ratios
  • Who Needs a Petabyte?


Thesis:
   In 20 years, a Personal Petabyte will be affordable.
   Most personal bytes will be video.
   Enterprise Exabytes will be sensor data.
Bonus Slides
TerraServer V4
• 8 web front ends
• 4 × 8-CPU + 4 GB DB servers
• 18 TB, triplicated disks,
  classic SAN (tape not shown)
• ~$2M
• Works GREAT!
• 2000…2004
• Now replaced by…

[Diagram: WEB ×8 front ends and SQL ×4 servers on a SAN.]
TerraServer V5
• Storage bricks
  – "White-box commodity servers"
  – 4 TB raw / 2 TB RAID1 SATA storage
  – Dual hyper-threaded Xeon 2.4 GHz, 4 GB RAM
• Partitioned databases (PACS – partitioned array)
  – 3 storage bricks = 1 copy of the TerraServer data
  – Data partitioned across 20 databases
  – More data & partitions coming
• Low-cost availability
  – 4 copies of the data
    • RAID1 SATA mirroring
    • 2 redundant "bunches"
  – Spare brick to repair a failed brick:
    2N+1 design
  – Web application is "bunch aware"
    • Load-balances between redundant databases
    • Fails over to the surviving database on failure
• ~$100K capital expense

[Diagram: bricks managed via KVM/IP.]
How Do You Move A Terabyte?

  Context     Speed (Mbps)  Rent ($/month)  $/Mbps  $/TB sent  Time/TB
  Home phone       0.04             40       1,000     3,086    6 years
  Home DSL         0.6              70         117       360    5 months
  T1               1.5           1,200         800     2,469    2 months
  T3              43            28,000         651     2,010    2 days
  OC3            155            49,000         316       976    14 hours
  OC-192       9,600         1,920,000         200       617    14 minutes
  100 Mbps       100                                             1 day
  Gbps         1,000                                             2.2 hours

Source: "TeraScale Sneakernet," Jim Gray et al., Microsoft Research
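The Time/TB column is straightforward to recompute (decimal terabyte, protocol overhead ignored):

```python
def days_per_tb(mbps):
    """Days to push 1 TB (1e12 bytes) over a link of `mbps` megabits/s."""
    bits = 1e12 * 8
    return bits / (mbps * 1e6) / 86400

print(round(days_per_tb(0.04) / 365, 1))  # home phone: ~6 years
print(round(days_per_tb(43), 1))          # T3: ~2 days
print(round(days_per_tb(1000) * 24, 1))   # Gbps: ~2.2 hours
```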
Key Observations
for Personal Stores
and for Larger Stores
• Schematized storage can help
  organization and search.
• Schematized XML data sets are
  a universal way to exchange
  answers and new data.
• If data are objects, then we
  need a standard representation
  for classes & methods.
Longhorn - For Knowledge Workers
• Simple (Self-*): auto install/manage/tune/repair
• Schema: data carries semantics
• Search: find things fast (driven by schema)
• Sync: "desktop state" anywhere
• Security: (Palladium) -- trustworthy
    - privacy
    - trustworthy (virus, spam, …)
    - DRM (protect IP)
• Shell: task-based UI (aka activity-based UI)
• Office-Longhorn
  – Intelligent documents
  – XML and schemas
How Do We Represent It
To The Outside World?
Schematized Storage

• File metaphor too primitive: just a blob
• Table metaphor too primitive: just records
• Need metadata describing data context
  – Format
  – Provenance (author/publisher/citations/…)
  – Rights
  – History
  – Related documents
• In a standard format
• XML and XML Schema
• DataSet is a great example of this
• The world is now defining standard schemas

Example (an ra/dec query result as an ADO.NET DataSet: an inline
XML schema, then the data as a diffgram):

  <?xml version="1.0" encoding="utf-8" ?>
  <DataSet xmlns="http://WWT.sdss.org/">
    <xs:schema id="radec" xmlns=""
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xs:element name="radec" msdata:IsDataSet="true">
      <xs:element name="Table">
        <xs:element name="ra" type="xs:double" minOccurs="0" />
        <xs:element name="dec" type="xs:double" minOccurs="0" />
      …
    <diffgr:diffgram
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
        xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
      <radec xmlns="">
        <Table diffgr:id="Table1" msdata:rowOrder="0">
          <ra>184.028935351008</ra>
          <dec>-1.12590950121524</dec>
        </Table>
        …
        <Table diffgr:id="Table10" msdata:rowOrder="9">
          <ra>184.025719033547</ra>
          <dec>-1.21795827920186</dec>
        </Table>
      </radec>
    </diffgr:diffgram>
  </DataSet>
There Is A Problem
Niklaus Wirth:
  Algorithms + Data Structures = Programs
• GREAT!!!!
  – XML documents are portable objects
  – XML documents are complex objects
  – WSDL defines the methods on objects
    (the class)
• But will all the implementations match?
  – Think of UNIX or SQL or C or…
• This is a work in progress.
Disk Storage Cheaper Than Paper
• File cabinet:
    Cabinet (4 drawer)         $250
    Paper (24,000 sheets)      $250
    Space (2×3 ft @ $10/ft²)   $180
    Total                      $700
    ≈ $0.03/sheet: 3 pennies per page
• Disk:
    Disk (250 GB)              $250
    ASCII: 100 M pages
    2e-6 $/sheet (10,000x cheaper): a micro-dollar per page
    Image: 1 M photos
    3e-4 $/photo (100x cheaper): a milli-dollar per photo

• Store everything on disk
Note: disk is 100x to 1000x cheaper than RAM
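The per-page costs above can be recomputed directly (the ~2.5 KB per ASCII page is an assumed figure implied by "100 M pages on 250 GB"):

```python
paper_total = 250 + 250 + 180           # cabinet + paper + floor space, $
sheets = 24_000
disk_dollars, disk_bytes = 250, 250e9   # one 250 GB drive

per_sheet_paper = paper_total / sheets       # ~$0.03/sheet
pages_ascii = disk_bytes / 2_500             # ~2.5 KB per page of text (assumed)
per_page_disk = disk_dollars / pages_ascii   # ~2.5e-6 $/page

print(round(per_sheet_paper, 3))               # 0.028
print(per_page_disk)                           # 2.5e-06
print(round(per_sheet_paper / per_page_disk))  # ~11,000x cheaper on disk
```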
Data Analysis
• Looking for
  – Needles in haystacks – the Higgs particle
  – Haystacks: dark matter, dark energy
• Needles are easier than haystacks
• Global statistics have poor scaling
  – Correlation functions are N², likelihood techniques N³
• As data and computers grow at the same rate,
  we can only keep up with N log N
• A way out?
  – Discard the notion of optimal
    (data is fuzzy, answers are approximate)
  – Don't assume infinite computational resources or memory
• Requires a combination of statistics & computer science
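The "we can only keep up with N log N" claim is just a ratio of growth rates; a sketch, using the talk's round 100x-per-decade figure for both data and compute:

```python
import math

def runtime_growth(work, growth=100.0):
    """Runtime growth per decade when data grows `growth`x and compute
    also grows `growth`x: (new work / old work) / compute growth."""
    N0 = 1e6                  # illustrative starting dataset size
    N1 = N0 * growth
    return work(N1) / work(N0) / growth

n2_growth = runtime_growth(lambda n: n ** 2)                  # N^2 statistics
nlogn_growth = runtime_growth(lambda n: n * math.log(n))      # N log N

print(n2_growth)               # 100.0 -- falls a factor of 100 behind per decade
print(round(nlogn_growth, 2))  # 1.33  -- nearly keeps up
```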
Analysis and Databases
• Much statistical analysis deals with
  – Creating uniform samples
  – Data filtering
  – Assembling relevant subsets
  – Estimating completeness
  – Censoring bad data
  – Counting and building histograms
  – Generating Monte Carlo subsets
  – Likelihood calculations
  – Hypothesis testing
• Traditionally these are performed on files
• Most of these tasks are much better done inside a DB
• Bring Mohamed to the mountain, not the mountain to him
Data Access is Hitting a Wall:
FTP and GREP are not adequate
• You can GREP 1 MB in a second;   you can FTP 1 MB in 1 second
• You can GREP 1 GB in a minute;   you can FTP 1 GB/min (≈ $1/GB)
• You can GREP 1 TB in 2 days;     … 2 days and $1K
• You can GREP 1 PB in 3 years;    … 3 years and $1M

• Oh, and 1 PB needs ~5,000 disks

• At some point you need
  indices to limit search, and
  parallel data search and analysis
• This is where databases can help
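The petabyte figures above check out with 2003-era round numbers (the 200 GB drive size and 10 MB/s sustained scan rate are assumptions consistent with the talk), and the same arithmetic shows why parallel search is the way out:

```python
PB = 1e15
disk_bytes = 200e9   # a 2003-era 200 GB drive (assumed)
scan_rate = 10e6     # ~10 MB/s sustained single-stream scan (assumed)

disks = PB / disk_bytes
years_serial = PB / scan_rate / 86400 / 365
hours_parallel = PB / (disks * scan_rate) / 3600

print(disks)                     # 5000.0 -- the "~5,000 disks"
print(round(years_serial, 1))    # 3.2    -- the "3 years" single-stream GREP
print(round(hours_parallel, 1))  # 5.6    -- if every disk scans in parallel
```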
         Smart Data (active databases)
• If there is too much data to move around,
       take the analysis to the data!
• Do all data manipulations at database
  – Build custom procedures and functions in the database
• Automatic parallelism
• Easy to build-in custom functionality
  – Databases & Procedures being unified
  – Example temporal and spatial indexing
    pixel processing, …
• Easy to reorganize the data
  – Multiple views, each optimal for certain types of analyses
  – Building hierarchical summaries are trivial
                                                             54
• Scalable to Petabyte datasets
