              Berkeley RAD Lab:
               Research in
     Internet-scale Computing Systems

              Randy H. Katz
          randy@cs.berkeley.edu
              28 March 2007
                      Five Year Mission
• Observation: Internet systems complex, fragile, manually
  managed, evolving rapidly
   – To scale eBay, must build an eBay-sized company
   – To scale YouTube, get acquired by a Google-sized company
• Mission: Enable a single person to create, evolve, and
  operate the next-generation IT service
   – “The Fortune 1 Million” by enabling rapid innovation
• Approach: Create core technology spanning systems,
  networking, and machine learning
• Focus: Making the datacenter easier to manage to enable
  one person to Analyze, Deploy, Operate a scalable IT
  service


         Jan 07 Announcements by
           Microsoft and Google
• Microsoft and Google race to build next-gen DCs
  – Microsoft announces a $550 million DC in TX
  – Google confirms plans for a $600 million site in NC
  – Google plans two more DCs in SC; they may cost another $950
    million -- about 150,000 computers each
• Internet DCs are the next computing platform
• Power availability drives deployment decisions




                     Datacenter is the Computer
    • Google program == Web search, Gmail, …
    • Google computer == warehouse-sized facilities;
      such facilities and workloads are likely to become
      more common (Luiz Barroso’s talk at RAD Lab, 12/11/06)
    • Sun Project Blackbox (10/17/06): compose the datacenter
      from 20 ft. containers!
         – Power/cooling for 200 KW
         – External taps for electricity, network, cold water
         – 250 servers, 7 TB DRAM, or 1.5 PB disk in 2006
         – 20% energy savings
         – 1/10th? the cost of a building
                  Datacenter Programming
                          System
 • Ruby on Rails: open-source Web framework
   optimized for programmer happiness and
   sustainable productivity:
      – Convention over configuration (sketched below)
      – Scaffolding: automatic, Web-based UI to stored data
      – Program the client: write browser-side code in Ruby, compile to
        JavaScript
      – “Duck Typing/Mix-Ins”
 • Proven expressiveness
      – Lines of code, Java vs. RoR: 3:1
      – Lines of configuration, Java vs. RoR: 10:1
 • More than a fad
      – Java on Rails, Python on Rails, …

See http://www.theserverside.com/news/thread.tss?thread_id=33120
See web2.wsj2.com/ruby_on_rails_11_web_20_on_rocket_fuel.htm
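
To make “convention over configuration” and scaffolding concrete, here is a minimal sketch in 2007-era Rails style. The User model, its columns, and the sample data are hypothetical, and the block assumes it runs inside a generated Rails app with a configured database.

```ruby
# Hypothetical model for an assumed "users" table with "name" and
# "email" columns. No mapping is written down: ActiveRecord infers
# the table name and columns from the class name by convention.
class User < ActiveRecord::Base
  # The only code we write is policy; CRUD, finders, and the SQL
  # mapping all come from convention.
  validates_presence_of :name, :email
end

# Scaffolding generates a complete (if pedestrian) Web UI over the
# model with one command, e.g.:
#   script/generate scaffold user name:string email:string
user = User.new(:name => "Ada", :email => "ada@example.com")
user.save                              # INSERT derived by convention
User.find_by_email("ada@example.com")  # dynamic finder, no code written
```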
          Datacenter Synthesis + OS


[Diagram: a Synthesis stage feeding the datacenter OS]

• Synthesis: change the DC via a written specification
  – DC Spec Language compiled to a logical configuration
    (a sketch of such a spec follows)
• OS: allocate, monitor, adjust during operation
  – Director using machine learning; Drivers send commands
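
A minimal sketch of what a declarative DC spec might look like, with a toy “compiler” that expands it into allocation requests for the runtime Director. The DSL keys, tier names, and values below are invented for illustration; this is not the RAD Lab DC Spec Language.

```ruby
# Hypothetical spec: the operator states what the service needs.
spec = {
  :service     => "photo-share",
  :tiers       => {
    :web => { :min => 4, :max => 40, :image => "rails-app" },
    :db  => { :min => 2, :max => 8,  :image => "mysql" }
  },
  :sla         => { :latency_ms => 200, :percentile => 99 },
  :middleboxes => [:firewall, :load_balancer]
}

# "Compile" the spec into a logical configuration: concrete
# allocation requests the OS/Director can act on and later resize.
def compile(spec)
  spec[:tiers].map do |tier, cfg|
    { :tier => tier, :replicas => cfg[:min], :image => cfg[:image] }
  end
end

compile(spec)
# => [{:tier=>:web, :replicas=>4, :image=>"rails-app"},
#     {:tier=>:db,  :replicas=>2, :image=>"mysql"}]
```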
                “System” Statistical
                 Machine Learning
• S2ML Strengths
  – Handle SW churn: Train vs. write the logic
  – Beyond queuing models: Learns how to handle/make
    policy between steady states
  – Beyond control theory: Coping with complex cost
    functions
  – Discovery: Finding trends, needles in data haystack
  – Exploit cheap processing advances: fast enough to
    run online
• S2ML as an integral component of the DC OS (a control-loop sketch follows)
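
A minimal, hypothetical sketch of the S2ML idea: train a model of response time from live measurements instead of hand-writing the policy, then use it online to pick the smallest allocation predicted to meet the SLA. The linear model and its interfaces are stand-ins, not RAD Lab code.

```ruby
# Hypothetical linear model of latency, trained online by gradient
# descent on squared error, so it tracks software churn rather than
# relying on hand-written logic.
class CapacityModel
  def initialize(learning_rate = 0.01)
    @w  = [0.0, 0.0]          # [bias, weight on load-per-server]
    @lr = learning_rate
  end

  def predict(load, servers)  # predicted latency in ms
    @w[0] + @w[1] * (load / servers.to_f)
  end

  def observe(load, servers, latency_ms)  # one online training step
    err = predict(load, servers) - latency_ms
    x   = load / servers.to_f
    @w[0] -= @lr * err
    @w[1] -= @lr * err * x
  end

  # Smallest allocation the learned model predicts will meet the SLA.
  def servers_needed(load, sla_ms, max = 100)
    (1..max).find { |n| predict(load, n) <= sla_ms } || max
  end
end

model = CapacityModel.new
model.observe(1000, 10, 120.0)     # measured: load, servers, latency
model.servers_needed(1200, 150.0)  # => allocation to try next
```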

              Datacenter Monitoring
• S2ML needs data to analyze
• DC components come with sensors already
  – CPUs (performance counters)
  – Disks (SMART interface)
• Add sensors to software
  – Log files
  – DTrace for Solaris, Mac OS
• Trace 10K++ nodes within and between DCs
  – *Trace: App-oriented path recording framework
  – X-Trace: Cross-layer/-domain including network layer

            Middleboxes in Today’s DC
• Middleboxes inserted on the physical path
   – Policy via plumbing
   – Weakest link: single point of failure, bottleneck
   – Expensive to upgrade and to introduce new functionality
• Identity-based Routing Layer: policy, not plumbing, routes
  classified packets to the appropriate middlebox services

[Diagram: firewall, load balancer, and intrusion detector
inserted in-line on a high-speed network]
                First Milestone:
            DC Energy Conservation
• DCs limited by power
  – For each dollar spent on servers, add $0.48
    (2005)/$0.71 (2010) for power/cooling
  – $26B spent to power and cool servers in 2005 grows
    to $45B in 2010
• Attractive application of S2ML
  – Bringing processor resources on/off-line: Dynamic
    environment, complex cost function, measurement-
    driven decisions
     • Preserve 100% Service Level Agreements
     • Don’t hurt hardware reliability
     • Then conserve energy
• Conserve energy and improve reliability
  – MTTF: stress of on/off cycles vs. benefits of off-hours
         DC Networking and Power
• Within DC racks, network equipment often
  the “hottest” components in the hot spot
• Network opportunities for power reduction
  – Transition to higher-speed interconnects (10
    Gb/s) at DC scales and densities
  – High function/high power assists embedded in
    network element (e.g., TCAMs)



             Thermal Image of Typical
                  Cluster Rack

[Thermal image of a cluster rack: the rack switch is the hot spot]

M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: An End-to-End
Evaluation of Datacenter Efficiency,” Intel Corporation
             DC Networking and Power
• Selectively power down ports/portions of network elements
  (sketched after this list)
• Enhanced power-awareness in the network stack
   – Power-aware routing and support for system virtualization
      • Support for datacenter “slice” power down and restart
   – Application and power-aware media access/control
      • Dynamic selection of full/half duplex
      • Directional asymmetry to save power,
        e.g., 10Gb/s send, 100Mb/s receive
   – Power-awareness in applications and protocols
      • Hard state (proxying), soft state (caching),
        protocol/data “streamlining” for power as well as b/w reduction
• Power implications for topology design
   – Tradeoffs in redundancy/high-availability vs. power consumption
   – VLANs support for power-aware system virtualization
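
A minimal sketch of one such port power-down policy, under assumed inputs: measured per-port utilization and a fixed headroom threshold. A real policy would also respect redundancy and topology constraints.

```ruby
HEADROOM = 0.7   # keep consolidated links below 70% utilized

# utilization: hash of port name => measured utilization (0.0..1.0)
def ports_to_sleep(utilization)
  total = utilization.values.inject(0.0) { |sum, u| sum + u }
  ports = utilization.keys.sort_by { |p| -utilization[p] }
  keep  = []
  ports.each do |port|       # keep the busiest ports until the
    keep << port             # kept set can carry all the traffic
    break if total / keep.size <= HEADROOM
  end
  ports - keep               # candidates for power-down
end

ports_to_sleep("a" => 0.60, "b" => 0.30, "c" => 0.05, "d" => 0.02)
# => ["c", "d"] under these assumptions
```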

           Why University Research?
• Imperative that future technical leaders learn to
  deal with scale in modern computing systems
• Draw on talented but inexperienced people
   – Pick from worldwide talent pool for students & faculty
   – Don’t know what they can’t do
• Inexpensive -- allows focus on speculative ideas
   – Mostly grad student salaries
   – Faculty part time
• Tech Transfer engine
   – Success = Train students to go forth and replicate
   – Promiscuous publication, including source code
   – Ideal launching point for startups
       Why a New Funding Model?
• DARPA has exited long-term research in
  experimental computing systems
• NSF swamped with proposals, yielding
  even more conservative decisions
• Community emphasis on theoretical vs.
  experimental-oriented systems-building
  research
• Alternative: turn to Industry for funding
  – Opportunity to shape research agenda

              New Funding Model
• 30 grad students + 5 undergrads + 6 faculty + 4 staff
• Foundation Companies: $500K/yr for 5 years
   – Google, Microsoft, Sun Microsystems
   – Prefer founding partner technology in prototypes
   – Many from each company attend retreats, advise on directions, get a
     head start on research results
   – Putting IP in the public domain so partners can use it without being sued
• Large Affiliates $100K/yr: Fujitsu, HP, IBM, Siemens
• Small Affiliates $50K/yr: Nortel, Oracle
• State matching programs add $1M/year: MICRO, Discovery



                         Summary
• “DC is the Computer”
  – OS: ML+VM, Net: Identity-based Routing, FS: Web
    Storage
  – Prog Sys: RoR, Libraries: Web Services
  – Development Environment: RAMP (simulator), AWE
    (tester), Web 2.0 apps (benchmarks)
  – Debugging Environment: *Trace + X-Trace
• Milestones
  – DC Energy Conservation + Reliability Enhancement
  – Web 2.0 Apps in RoR


                             Conclusions
• Develop-Analyze-Deploy-Operate modern systems at
  Internet scale
   – Ruby-on-Rails for rapid applications development
   – Declarative datacenter for correct-by-construction system
     configuration and operation
   – Resource management by System Statistical Machine Learning
   – Virtual Machines and Network Storage for flexible resource
     allocation
   – Power reduction and reliability enhancement by fast power-
     down/restart for processing nodes
   – Pervasive monitoring, tracing, simulation, and workload generation for
     runtime analysis/operation




                  Discussion Points
• Jointly designed datacenter testbed
  – Mini-DC consisting of clusters, middleboxes, and
    network equipment
  – Representative network topology
• Power-aware networking
  – Evaluation of existing network elements
  – Platform for investigating power reduction schemes in
    network elements
• Mutual information exchange
  – Network storage architecture
  – System Statistical Machine Learning

         Ruby on Rails = DC PL
• Reasons to love Ruby on Rails
1. Convention over Configuration
  •   Rails framework feature enabled by Ruby language
      feature (Meta Object Programming)
2. Scaffolding: automatic, Web-based, (pedestrian)
   User Interface to stored data
3. Program the client: v1.1 writes browser-side code
   in Ruby, then compiles to JavaScript
4. “Duck Typing/Mix-Ins” (sketched below)
  •   Looks like a string, responds like a string: it’s a string!
  •   Mix-ins are an improvement over multiple inheritance
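
A minimal sketch of both ideas using only the Ruby standard library: Comparable is a mix-in that grants the ordering operators to any class defining <=>, and duck typing means puts happily prints anything that responds to to_s.

```ruby
# Comparable is a standard-library mix-in: define <=> and you get
# <, >, ==, between?, and friends without any inheritance.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major, @minor = major, minor
  end

  def <=>(other)          # the one method Comparable requires
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s                # quacks like a string when printed
    "#{major}.#{minor}"
  end
end

Version.new(1, 1) > Version.new(1, 0)  # => true, via the mix-in
puts Version.new(2, 0)                 # duck typing: puts only needs to_s
```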


                       DC Monitoring
• Imagine a world where path information is always
  passed along, so that user requests can always be
  tracked throughout the system
• Across apps, OS, network components and layers,
  different computers on a LAN, … recording:
  –   Unique request ID
  –   Components touched
  –   Time of day
  –   Parent of this request
  (one such trace record is sketched below)
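
A minimal sketch of the record such tracking implies, with hypothetical field names; joining all records on the request ID reconstructs the request’s path across components.

```ruby
require 'securerandom'  # Ruby standard library

# Hypothetical trace record; real field names would differ.
TraceRecord = Struct.new(:request_id, :component, :timestamp, :parent)

def record_hop(request_id, component, parent = nil)
  TraceRecord.new(request_id, component, Time.now, parent)
end

root_id = SecureRandom.hex(8)                # unique request ID
hop1 = record_hop(root_id, "load-balancer")  # component touched
hop2 = record_hop(root_id, "app-server", "load-balancer")  # with parent
# A join on request_id across all logs yields the end-to-end path.
```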



               *Trace: The 1% Solution
• *Trace goal: make path-based analysis low-overhead
  so it can be always on inside the datacenter
   – “Baseline” path info collection with ≤ 1% overhead
   – Selectively add more local detail for specific requests
• *Trace: an end-to-end path recording framework
   – Capture & timestamp a unique requestID across all system
     components
   – “Top level” log contains path traces
   – Local logs contain additional detail,
     correlated to path ID
   – Built on X-Trace (a sampling sketch follows)
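
A minimal sketch of the “always-on baseline, selective detail” split, with assumed rates and logger interfaces rather than the actual *Trace API: the cheap path record is always written, while expensive local detail is captured for only about 1% of requests.

```ruby
DETAIL_RATE = 0.01   # keep baseline overhead near 1%

def expensive_local_detail           # stand-in for costly local stats
  { :threads => Thread.list.size, :gc_runs => GC.count }
end

def trace_hop(request_id, component, top_log, local_log)
  top_log << [request_id, component, Time.now]   # cheap, always on
  # Deterministic per-request sampling: either every hop of a request
  # records extra detail, or none does.
  if request_id.hash % 100 < (DETAIL_RATE * 100)
    local_log << [request_id, component, expensive_local_detail]
  end
end

top_log, local_log = [], []
trace_hop("req-42", "app-server", top_log, local_log)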




                X-Trace: Comprehensive Tracing
                through Layers, Networks, Apps
• Trace connectivity of distributed components
    – Capture causal connections
      between requests/responses
• Cross-layer
    – Include network and middleware
      services such as IP and LDAP
• Cross-domain
    – Multiple datacenters, composed
      services, overlays, mash-ups
    – Control to individual
      administrative domains
• “Network path” sensor
    – Put individual requests/responses, at
      different network layers, in the context
      of an end-to-end request


                           Actuator:
                  Policy-based Routing Layer
• Assign an ID to incoming packets (hash + table lookup)
• Route based on IDs, not locations (i.e., not IP addresses)
    – Sets up logical paths without changing network topology
• A set of common middleboxes gets a single ID
    – No single weakest link: robust, scalable throughput
• So simple it can be done in an FPGA?
• More general than MPLS

[Diagram: packets labeled with ID lists such as (IDF, IDLB) or
(IDID, IDS) traverse an Identity-based Routing Layer that forwards
them to Firewall (IDF), Load Balancer (IDLB), Intrusion Detection
(IDID), and Service (IDS) middleboxes; a sketch follows]
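
A minimal sketch of the classification-and-forwarding step under assumed table contents: a hash/table lookup maps a packet to a list of service IDs, and forwarding picks any live replica of the next ID rather than a fixed IP destination.

```ruby
# Hypothetical policy and instance tables.
CLASSIFIER = {
  :http  => [:IDF, :IDLB],   # web traffic: firewall, then load balancer
  :other => [:IDID, :IDS]    # everything else: IDS, then the service
}
INSTANCES = {
  :IDF  => ["fw1", "fw2"],   # all replicas share one ID, so any
  :IDLB => ["lb1"],          # replica can serve the next hop:
  :IDID => ["ids1", "ids2"], # no single weakest link
  :IDS  => ["app1", "app2", "app3"]
}

def classify(packet)                     # hash + table lookup
  packet[:dst_port] == 80 ? CLASSIFIER[:http] : CLASSIFIER[:other]
end

def next_hop(id_list)                    # forward by ID, not IP
  replicas = INSTANCES[id_list.first]
  replicas[rand(replicas.size)]
end

pkt  = { :dst_port => 80, :payload => "GET /" }
path = classify(pkt)     # => [:IDF, :IDLB]
next_hop(path)           # e.g. "fw2"; pop the ID and repeat per hop
```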
         Other RAD Lab Projects
• Research Accelerator for Multiple Processors (RAMP)
  = DC simulator
• Automatic Workload Evaluator (AWE)
  = DC tester
• Web Storage (GFS, Bigtable, Amazon S3)
  = DC File System
• Web Services (MapReduce, Chubby)
  = DC Libraries

                   1st Milestone:
               DC Energy Conservation
•   Good match to Machine Learning
    – An optimization, so imperfection not catastrophic
    – Lots of data to measure, dynamically changing
      workload, complex cost function
       •   Not steady state, so not queuing theory
       •   PG&E trying to change behavior of datacenters
•   Properly stated, the problem is:
    1. Preserve 100% of Service Level Agreements
    2. Don’t hurt hardware reliability
    3. Then conserve energy
•   Radical idea: can conserving energy improve
    hardware reliability? (a policy sketch follows)
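
A minimal sketch of the stated priority ordering, with toy stand-ins for the SLA forecast and the reliability budget: a node may be powered down only if the SLA still holds without it and its start/stop cycle budget is respected; only then does energy enter the picture.

```ruby
# Toy stand-ins for real node state and SLA forecasting.
Node     = Struct.new(:name, :cycles_today)
Forecast = Struct.new(:spare_capacity) do
  def meets_sla_without?(node)
    spare_capacity >= 1      # placeholder for a real prediction
  end
end

def may_power_down?(node, forecast)
  return false unless forecast.meets_sla_without?(node)  # 1. SLA first
  return false if node.cycles_today >= 24                # 2. reliability
  true                                                   # 3. then energy
end

may_power_down?(Node.new("n7", 3), Forecast.new(2))  # => true
```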
              1st Milestone: Conserve
             Energy & Improve Reliability
• Improve component reliability?
• Disks: lifetimes measured in powered-on
  hours, but limited to 50,000 start/stop cycles
• Idea: if disks are turned off 50% of the time, annual
  failure rate drops ≈ 50%, as long as we don’t exceed
  50,000 start/stop cycles (≈ once per hour; see the
  arithmetic sketched below)
• Integrated circuits: lifetimes affected by thermal
  cycling (fast change bad), electromigration (turning
  off helps), dielectric breakdown (turning off helps)
• Idea: if we limit the number of thermal cycles, could
  we cut IC failure rate due to EM and DB by ≈ 30%?
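
A back-of-the-envelope check of the cycle budget, assuming at most one power-down per hour as stated:

```ruby
cycle_budget   = 50_000                  # rated start/stop cycles
cycles_per_day = 24                      # at most once per hour
cycle_budget / (cycles_per_day * 365.0)  # => ~5.7 years of headroom
```

So hourly cycling stays within the rated limit over a typical disk service life.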
See “A Case for Adaptive Datacenters to Conserve Energy and Improve
Reliability,” Peter Bodik, Michael Armbrust, Kevin Canini, Armando Fox,
Michael Jordan, and David Patterson, 2007.
        RAD Lab 2.0 2nd Milestone:
           Killer Web 2.0 Apps
• Demonstrate the RAD Lab vision of one person
  creating the next great service and scaling it up
• Where do we get great example apps, given that grad
  students are creating the technology?
• Use undergraduate computing clubs to create
  exciting apps in RoR using RAD Lab equipment and
  technology
  – Armando Fox is RoR club leader
  – Recruited Real World RoR programmer to develop
    code and advise RoR computing club
  – ≈30 students joined club Jan 2007
  – Hire best ugrads to build RoR apps in RAD Lab
                    Miracle of
               University Research
• Talented (inexperienced) people
   – Pick from worldwide talent pool for students & faculty
   – Don’t know what they can’t do
• Inexpensive
   – Mostly grad student salaries ($50k-$75k/yr overhead)
   – Faculty part time ($75k-$100k/yr including overhead)
• Berkeley & Stanford Swing for Fences (R, not r or D)
• Even if hit a single, train next generation of leaders
• Technology Transfer engine
   – Success = Train students to go forth & multiply
   – Publish everything, including source code
   – Ideal launching point for startups

                Chance to Partner with
                  a Great University
• Chance to Work on the “Next Great Thing”
• US News & World Report ranking of CS Systems universities: 1
  Berkeley, 2 CMU, 2 MIT, 4 Stanford
• Berkeley & Stanford are among the top suppliers of systems
  students to industry (and academia)
• National Academy study mentions Berkeley in 7 of 19 $1B+
  industries from IT research, Stanford 4 times
      • Timesharing (SDS 940), Client-Server Computing (BSD Unix),
        Graphics, Entertainment, Internet, LANs, Workstations (SUN), GUI,
        VLSI Design (Spice),
        RISC (ARM, MIPS, SPARC), Relational DB (Ingres/Postgres),
        Parallel DB, Data Mining, Parallel Computing, RAID, Portable
        Communication (BWRC), WWW, Speech Recognition, Broadband
        last mile (DSL)

[Chart: years from research start to a >$1B IT industry; source:
National Research Council, Computer Science & Telecommunications Board]
                    Physical RAD Lab:
                    Radical Collocation



• Innovation from spontaneous meetings of people
  with different areas of expertise
• Communication inversely proportional to distance
    – Almost never if > 100 feet or on different floor
•   Everyone (including faculty) in open offices
•   Great Meeting Rooms, Ubiquitous Whiteboards
•   Technology to concentrate: cell phone, iPod, laptop
•   Google “Physical RAD Lab” to learn more
                  Example of
                Next Great Thing
Berkeley Reliable Adaptive Distributed Systems
   Laboratory (“RAD Lab”)
  – Founded 12/2005 with Google,
     Microsoft, and Sun as founding partners
  –   Armando Fox, Randy Katz, Mike Jordan,
      Anthony Joseph, Dave Patterson, Scott
      Shenker, Ion Stoica
  – Google “RAD Lab” to learn more

                     RAD Lab Goal:
                  Enable the Next “eBay”
• Create technology to enable the next great Internet service to
  grow rapidly without growing the organization rapidly
   – Machine Learning + Systems is secret sauce
• Position: “The datacenter is the computer”
   – Leverage point is simplifying datacenter management
• What is the programming language of the datacenter?
• What is CAD for the datacenter?
• What is the OS for the datacenter?



