Data Intensive Super Computing

Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant


Data Intensive Super Scalable Computing

Randal E. Bryant
Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
Examples of Big Data Sources
Wal-Mart
          267 million items/day, sold at 6,000 stores
          HP building them a 4 PB data warehouse
          Mine data to manage supply chain, understand market trends, formulate pricing strategies




Sloan Digital Sky Survey
          New Mexico telescope captures 200 GB image data / day
          Latest dataset release: 10 TB, 287 million celestial objects
          SkyServer provides SQL access

–3 –
Our Data-Driven World
Science
          Databases from astronomy, genomics, natural languages, seismic modeling, …

Humanities
          Scanned books, historic documents, …

Commerce
          Corporate sales, stock market transactions, census, airline traffic, …
Entertainment
          Internet images, Hollywood movies, MP3 files, …

Medicine
          MRI & CT scans, patient records, …
–4 –
 Why So Much Data?
We Can Get It
          Automation + Internet

We Can Keep It
          Seagate Barracuda
          1 TB @ $159 (16¢ / GB)

We Can Use It
          Scientific breakthroughs
          Business process efficiencies
          Realistic special effects
          Better health care

Could We Do More?
          Apply more computing power to this data
–5 –
Google’s Computing Infrastructure




Handling a single search query (approximate figures):
            200+ processors
            200+ terabyte database
            10^10 total clock cycles
            0.1 second response time
            5¢ average advertising revenue
–6 –
Google’s Computing Infrastructure
System
        ~ 3 million processors in clusters of ~2000 processors each
        Commodity parts
           x86 processors, IDE disks, Ethernet communications
           Gain reliability through redundancy & software management
        Partitioned workload
           Data: Web pages, indices distributed across processors
           Function: crawling, index generation, index search, document retrieval, ad placement
                                            Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003


A Data-Intensive Scalable Computer (DISC)
        Large-scale computer centered around data
           Collecting, maintaining, indexing, computing

        Similar systems at Microsoft & Yahoo
–7 –
Google’s Economics
Making Money from Search
          $5B search advertising revenue in 2006
          Est. 100 B search queries
           5¢ / query average revenue

That’s a Lot of Money!
          Only get revenue when someone clicks a sponsored link
          Some clicks go for tens of dollars

That’s Really Cheap!
          Google + Yahoo + Microsoft: $5B infrastructure investments in 2007


–8 –
Google’s Programming Model
MapReduce
[Figure: map tasks M applied to inputs x1 … xn emit key-value pairs, which are grouped by key k1 … kr and reduced]
          Map computation across many objects
             E.g., 10^10 Internet web pages
          Aggregate results in many different ways
          System deals with issues of resource allocation & reliability
                                                Dean & Ghemawat: “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004
– 9 –
 MapReduce Example
 [Figure: word count over five short documents: "Come, Dick", "Come and see.", "Come, come.", "Come and see.", "Come and see Spot."
  Extract (map): each document emits (word, 1) pairs, e.g. (dick, 1), (come, 1), (and, 1), (see, 1), (spot, 1).
  Sum (reduce): counts are totaled per word: and 3, come 6, dick 1, see 3, spot 1]
              Create a word index for a set of documents
              Map: generate (word, count) pairs for all words in a document
              Reduce: sum word counts across documents
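
The same computation can be written down concretely. Below is a minimal single-process Python sketch of the word-count example, using plain functions rather than any particular MapReduce framework; in a real DISC system the map and reduce calls would be distributed across thousands of machines.

    from collections import defaultdict
    import re

    def map_doc(doc):
        # Map: emit a (word, 1) pair for every word in the document
        for word in re.findall(r"[a-z]+", doc.lower()):
            yield (word, 1)

    def reduce_counts(pairs):
        # Reduce: group pairs by key (the word) and sum the counts
        totals = defaultdict(int)
        for word, count in pairs:
            totals[word] += count
        return dict(totals)

    docs = ["Come, Dick", "Come and see.", "Come, come.", "Come and see.",
            "Come and see Spot."]
    pairs = (pair for doc in docs for pair in map_doc(doc))
    print(reduce_counts(pairs))   # {'come': 6, 'dick': 1, 'and': 3, 'see': 3, 'spot': 1}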
– 10 –
 DISC: Beyond Web Search
 Data-Intensive Application Domains
            Rely on large, ever-changing data sets
               Collecting & maintaining data is major effort
            Many possibilities

 Computational Requirements
            From simple queries to large-scale analyses
            Require parallel processing
            Want to program at abstract level
 Hypothesis
            Can apply DISC to many other application domains



– 11 –
 The Power of Data + Computation
 2005 NIST Machine Translation Competition
            Translate 100 news articles from Arabic to English

 Google’s Entry
            First-time entry
               Highly qualified researchers
               No one on research team knew Arabic
            Purely statistical approach
               Create most likely translations of words and phrases
               Combine into most likely sentences
            Trained using United Nations documents
               200 million words of high quality translated text
               1 trillion words of monolingual text in target language
            During competition, ran on 1000-processor cluster
               One hour per sentence (gotten faster now)
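
The statistical approach can be illustrated with a toy "noisy channel" scorer: among candidate English outputs, pick the one maximizing P(english) × P(arabic | english), where the language model comes from monolingual target-language text and the translation model from parallel text. A minimal sketch with invented candidates and probabilities (not Google's actual models or numbers):

    import math

    # Toy noisy-channel decoding: e* = argmax_e  P(e) * P(f | e), done in log space.
    # Candidates and probabilities are invented for illustration only.
    lm_logprob = {  # language model: fluency, estimated from monolingual target text
        "the peace talks resumed": math.log(1e-6),
        "the talks of peace restarted": math.log(1e-8),
    }
    tm_logprob = {  # translation model: adequacy, estimated from parallel text
        "the peace talks resumed": math.log(1e-4),
        "the talks of peace restarted": math.log(5e-4),
    }

    def score(e):
        return lm_logprob[e] + tm_logprob[e]

    best = max(lm_logprob, key=score)
    print(best)   # "the peace talks resumed": the more fluent candidate wins overall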
– 12 –
 2005 NIST Arabic-English Competition Results

 [Chart: BLEU scores of the entrants. BLEU score is a statistical comparison to expert human
  translators, on a scale from 0.0 to 1.0. Quality bands marked on the axis: usable translation
  (~0.6–0.7), human-editable translation (~0.5–0.6), topic identification (~0.4–0.5),
  useless (below ~0.3). Entrants, highest score first: Google, ISI, IBM+CMU, UMD, JHU+CU,
  Edinburgh, then well below, Systran, Mitre, FSC. An expert human translator scores above
  all entries.]

 Outcome
            Google’s entry qualitatively better
            Not the most sophisticated approach
            But lots more training data and computer power
– 13 –
 Oceans of Data, Skinny Pipes

                                       1 Terabyte
                                              Easy to store
                                              Hard to move



               Disks                   MB / s       Time
          Seagate Barracuda             115         2.3 hours
          Seagate Cheetah               125         2.2 hours

               Networks                MB / s       Time
          Home Internet               < 0.625     > 18.5 days
          Gigabit Ethernet            < 125       > 2.2 hours
          PSC Teragrid connection     < 3,750     > 4.4 minutes
– 14 –
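
The table entries are just data volume divided by bandwidth. A quick check in Python, using decimal units; the results land close to the figures above (exact values depend on whether decimal or binary units are assumed):

    # Time to move 1 TB at each of the bandwidths listed above (decimal units).
    ONE_TB_MB = 10**6   # 1 TB expressed in MB

    links = [("Seagate Barracuda", 115), ("Seagate Cheetah", 125),
             ("Home Internet", 0.625), ("Gigabit Ethernet", 125),
             ("PSC Teragrid connection", 3750)]

    for name, mb_per_s in links:
        hours = ONE_TB_MB / mb_per_s / 3600
        if hours >= 24:
            print(f"{name:24s} {hours / 24:6.1f} days")
        elif hours >= 1:
            print(f"{name:24s} {hours:6.1f} hours")
        else:
            print(f"{name:24s} {hours * 60:6.1f} minutes")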
 Data-Intensive System Challenge
 For Computation That Accesses 1 TB in 5 minutes
            Data distributed over 100+ disks
               Assuming uniform data partitioning
            Compute using 100+ processors
            Connected by gigabit Ethernet (or equivalent)



 System Requirements
            Lots of disks
            Lots of processors
            Located in close proximity
                Within reach of a fast local-area network
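
A back-of-envelope version of this sizing, assuming an effective per-disk rate of roughly 30 MB/s (an assumed figure for non-sequential access patterns, lower than the sequential rates quoted on the previous slide):

    # Rough sizing for "access 1 TB in 5 minutes".
    data_mb  = 10**6                 # 1 TB in MB
    window_s = 5 * 60                # 5 minutes
    per_disk = 30                    # assumed effective MB/s per disk (not from the slides)

    aggregate = data_mb / window_s   # required aggregate bandwidth, ~3,300 MB/s
    disks = aggregate / per_disk     # ~110 disks -> "100+ disks" (and comparably many processors)
    print(f"aggregate: {aggregate:.0f} MB/s, disks: {disks:.0f}")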



– 15 –
 Desiderata for DISC Systems
 Focus on Data
            Terabytes, not tera-FLOPS

 Problem-Centric Programming
            Platform-independent expression of data parallelism

 Interactive Access
            From simple queries to massive computations

 Robust Fault Tolerance
            Component failures are handled as routine events


 Contrast to existing supercomputer / HPC systems

– 16 –
 System Comparison: Data
Conventional Supercomputers
            Data stored in separate repository
               No support for collection or management
            Brought into system for computation
               Time consuming
               Limits interactivity

DISC
            System collects and maintains data
               Shared, active data set
            Computation colocated with storage
               Faster access
– 17 –
 System Comparison: Programming Models

 Conventional Supercomputers
            Programs described at very low level
               Specify detailed control of processing & communications
            Rely on small number of software packages
               Written by specialists
               Limits classes of problems & solution methods
            [Figure: application programs built on software packages, which sit on a machine-dependent programming model over the hardware]

 DISC
            Application programs written in terms of high-level operations on data
            Runtime system controls scheduling, load balancing, …
            [Figure: application programs written to a machine-independent programming model, implemented by a runtime system over the hardware]
– 18 –
 System Comparison: Interaction
Conventional Supercomputers
 Main Machine: Batch Access
            Priority is to conserve machine resources
            User submits job with specific resource requirements
            Run in batch mode when resources available
 Offline Visualization
            Move results to separate facility for interactive use

DISC
 Interactive Access
            Priority is to conserve human resources
            User action can range from simple query to complex computation
            System supports many simultaneous users
               Requires flexible programming and runtime environment
– 19 –
 System Comparison: Reliability
   Runtime errors commonplace in large-scale systems
                Hardware failures
                Transient errors
                Software bugs

Conventional Supercomputers
 “Brittle” Systems
            Main recovery mechanism is to recompute from most recent checkpoint
            Must bring down system for diagnosis, repair, or upgrades

DISC
 Flexible Error Detection and Recovery
            Runtime system detects and diagnoses errors
            Selective use of redundancy and dynamic recomputation
            Replace or upgrade components while system running
            Requires flexible programming model & runtime environment
– 20 –
 What About Grid Computing?
            “Grid” means different things to different people

 Computing Grid
            Distribute problem across many machines
               Geographically & organizationally distributed
            Hard to provide sufficient bandwidth for data exchange

 Data Grid
            Shared data repositories
            Should colocate DISC systems with repositories
               It’s easier to move programs than data




– 21 –
 Compare to Transaction Processing
 Main Commercial Use of Large-Scale Computing
            Banking, finance, retail transactions, airline reservations, …

 Stringent Functional Requirements
            Only one person gets last $1 from shared bank account
               Beware of replicated data
            Must not lose money when transferring between accounts
               Beware of distributed data
            Favors systems with small number of high-performance,
             high-reliability servers

 Our Needs are Different
            More relaxed consistency requirements
               Web search is extreme example
            Fewer sources of updates
            Individual computations access more data
– 22 –
 Traditional Data Warehousing
 [Figure: raw data passes through a bulk loader, guided by a schema design, into a database that serves user queries]

 Information Stored in Digested Form
            Based on anticipated query types
            Reduces storage requirement
            Limited forms of analysis & aggregation


– 23 –
 Next-Generation Data Warehousing
 [Figure: raw data is stored in a large-scale file system; map/reduce programs run over it to answer user queries]

 Information Stored in Raw Form
            Storage is cheap
            Enables forms of analysis not anticipated originally

 Express Query as Program
            More sophisticated forms of analysis
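
As a concrete illustration of "express query as program", a query such as "total sales per store" can be run as a map/reduce pass directly over raw log records instead of against a pre-designed schema. The record format and field names below are hypothetical:

    from collections import defaultdict

    # Hypothetical raw log lines: "store_id<TAB>item_id<TAB>price_in_dollars"
    raw = ["s001\tA17\t20", "s002\tB03\t5", "s001\tC88\t7"]

    def map_record(line):
        store, _item, price = line.split("\t")
        yield (store, int(price))            # emit (store, sale amount)

    def reduce_sales(pairs):
        totals = defaultdict(int)
        for store, amount in pairs:
            totals[store] += amount          # sum sales per store
        return dict(totals)

    print(reduce_sales(p for line in raw for p in map_record(line)))
    # {'s001': 27, 's002': 5}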




– 24 –
 Why University-Based Project(s)?
Open
            Forum for free exchange of ideas
            Apply to societally important, possibly noncommercial
             problems

Systematic
            Careful study of design ideas and tradeoffs

Creative
            Get smart people working together

Fulfill Our Educational Mission
            Expose faculty & students to newest technology
            Ensure faculty & PhD researchers addressing real problems


– 25 –
 Designing a DISC System
 Inspired by Google’s Infrastructure
            System with high performance & reliability
            Carefully optimized capital & operating costs
            Take advantage of their learning curve



 But, Must Adapt
            More than web search
               Wider range of data types & computing requirements
               Less advantage to precomputing and caching information
               Higher correctness requirements
            102–104 users, not 106–108
               Don’t require massive infrastructure


– 26 –
 Constructing General-Purpose DISC
 Hardware
             Similar to that used in data centers and high-performance systems
            Available off-the-shelf


 Hypothetical “Node”
            1–2 dual or quad core processors
            1 TB disk (2-3 drives)
            ~$10K (including portion of routing network)




– 27 –
 Possible System Sizes
 100 Nodes                        $1M
            100 TB storage
            Deal with failures by stop & repair
            Useful for prototyping

 1,000 Nodes                     $10M
            1 PB storage
            Reliability becomes important issue
            Enough for WWW caching & indexing

 10,000 Nodes                  $100M
            10 PB storage
            National resource
            Continuously dealing with failures
            Utility?
– 28 –
 Implementing System Software
 Programming Support
            Abstractions for computation & data representation
               E.g., Google: MapReduce & BigTable
            Usage models

 Runtime Support
            Allocating processing and storage
            Scheduling multiple users
            Implementing programming model
 Error Handling
            Detecting errors
            Dynamic recovery
            Identifying failed components
– 29 –
 Getting Started
 Goal
            Get faculty & students active in DISC

 Hardware: Rent from Amazon
            Elastic Compute Cloud (EC2)
               Generic Linux cycles for $0.10 / hour ($877 / yr)
            Simple Storage Service (S3)
               Network-accessible storage for $0.15 / GB / month ($1800/TB/yr)
             Example: maintain crawled copy of web (50 TB, 100 processors, 0.5 TB/day refresh) ~$250K / year
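
A rough reconstruction of that estimate from the rates quoted above; the gap between this subtotal and ~$250K would come from data transfer for the 0.5 TB/day refresh and other charges, whose rates are not listed here:

    # Yearly cost of the crawled-web example at the quoted EC2/S3 rates.
    processors        = 100
    storage_tb        = 50
    ec2_per_proc_year = 877     # $0.10 / hour  ~= $877 / year
    s3_per_tb_year    = 1800    # $0.15 / GB / month = $1,800 / TB / year

    compute = processors * ec2_per_proc_year   # $87,700
    storage = storage_tb * s3_per_tb_year      # $90,000
    print(f"compute ${compute:,} + storage ${storage:,} = ${compute + storage:,} per year")
    # Data transfer for the 0.5 TB/day refresh is extra (rate not given on the slide).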

 Software
            Hadoop Project
               Open source project providing file system and MapReduce
               Supported and used by Yahoo
               Prototype on single machine, map onto cluster
– 30 –
 Rely on Kindness of Others




            Google setting up dedicated cluster for university use
            Loaded with open-source software
               Including Hadoop
            IBM providing additional software support
            NSF will determine how the facility should be used
– 31 –
 More Sources of Kindness
                               Yahoo: major supporter of Hadoop
                               Yahoo plans to work with other universities




– 32 –
 Beyond the U.S.




– 33 –
 CS Research Issues
 Applications
            Language translation, image processing, …

 Application Support
            Machine learning over very large data sets
            Web crawling

 Programming
             Abstract programming models to support large-scale computation
            Distributed databases

 System Design
            Error detection & recovery mechanisms
            Resource scheduling and load balancing
            Distribution and sharing of data across system
– 34 –
 Exploring Parallel Computation Models
     [Figure: spectrum of parallel computation models, from low-communication / coarse-grained
      (SETI@home, MapReduce) to high-communication / fine-grained (threads, MPI, PRAM)]

 DISC + MapReduce Provides Coarse-Grained Parallelism
            Computation done by independent processes
            File-based communication
 Observations
            Relatively “natural” programming model
            Research issue to explore full potential and limits
               Dryad project at MSR
               Pig project at Yahoo!
– 35 –
 Existing HPC Machines

 [Figure: message passing among processors P1 … P5, and shared memory with P1 … P5 attached to a common memory]

           Characteristics
                 Long-lived processes
                 Make use of spatial locality
                 Hold all program data in memory
                 High bandwidth communication

           Strengths
                 High utilization of resources
                 Effective for many scientific applications

           Weaknesses
                 Very brittle: relies on everything working correctly and in close synchrony
– 36 –
 HPC Fault Tolerance

 [Figure: processes P1 … P5 write periodic checkpoints; after a failure, all processes restore to the last checkpoint and the intervening computation is wasted]

           Checkpoint
                 Periodically store state of all processes
                 Significant I/O traffic

           Restore
                 When failure occurs
                 Reset state to that of last checkpoint
                 All intervening computation wasted

           Performance Scaling
                 Very sensitive to number of failing components
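
A minimal sketch of the checkpoint/restore pattern described above, compressed to a single process: state is saved periodically, and a failure rolls the computation back to the last checkpoint, discarding the intervening work. (Illustration only; real HPC checkpointing coordinates the state of every process and writes it to parallel storage.)

    import pickle, random

    def run_with_checkpoints(steps, every=100, path="state.pkl"):
        state = {"step": 0, "total": 0}
        with open(path, "wb") as f:
            pickle.dump(state, f)                    # initial checkpoint
        while state["step"] < steps:
            try:
                state["total"] += state["step"]      # the "computation"
                state["step"] += 1
                if state["step"] % every == 0:
                    with open(path, "wb") as f:
                        pickle.dump(state, f)        # periodic checkpoint (I/O cost)
                if random.random() < 0.001:
                    raise RuntimeError("simulated component failure")
            except RuntimeError:
                with open(path, "rb") as f:
                    state = pickle.load(f)           # restore: work since last checkpoint is redone
        return state

    print(run_with_checkpoints(1000))                # {'step': 1000, 'total': 499500}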
– 37 –
Map/Reduce Operation

 [Figure: a pipeline of alternating map and reduce phases]

           Characteristics
                 Computation broken into many, short-lived tasks
                    Mapping, reducing
                 Use disk storage to hold intermediate results

           Strengths
                 Great flexibility in placement, scheduling, and load balancing
                 Handle failures by recomputation
                 Can access large data sets

           Weaknesses
                 Higher overhead
                 Lower raw performance
– 38 –
 Choosing Execution Models
 Message Passing / Shared Memory
            Achieves very high performance when everything works well
            Requires careful tuning of programs
            Vulnerable to single points of failure

 Map/Reduce
            Allows for abstract programming model
            More flexible, adaptable, and robust
            Performance limited by disk I/O

 Alternatives?
            Is there some way to combine to get strengths of both?



– 39 –
 Concluding Thoughts
 The World is Ready for a New Approach to Large-Scale Computing
            Optimized for data-driven applications
            Technology favoring centralized facilities
               Storage capacity & computer power growing faster than network bandwidth

 University Researchers Eager to Get Involved
            System designers
            Applications in multiple disciplines
            Across multiple institutions




– 40 –
 More Information


     “Data-Intensive Supercomputing: The case for DISC”

           Tech Report CMU-CS-07-128
           Available from http://www.cs.cmu.edu/~bryant



– 41 –