
Scaling Hadoop for
Multi-Core and Highly
Threaded Systems


Jangwoo Kim, Zoran Radovic
Performance Architects
Architecture Technology Group
Sun Microsystems Inc.
AGENDA
  Project Overview
  Hadoop Updates
  CMT Hadoop Systems
  Scaling Hadoop on CMT
  Virtualization Technologies
     Zones
     Logical Domains
  Case Study: E-mail Discovery
  Conclusions
Project Overview
• Chip Multi-Threading (CMT) processors and Hadoop are
  designed for maximum throughput
• Sun's JVM is optimized for CMT
   > Java has been widely deployed by many customers on CMT
   > Hadoop is written in Java – an ideal throughput candidate
• Seemed like a great fit for Hadoop, with the potential for a
  greatly reduced footprint

• Related work by Ning Sun and Lee Anne Simmons:
   > Blueprint: Using Logical Domains and CoolThreads
     Technology: Improving Scalability and System Utilization
Hadoop Summit '09: 700+ attendees...
Hadoop Expands Beyond the Web...
  Some examples from the Summit:
     Genetic sequence analysis
     Parallel data mining in telco
     Natural language learning
     Business fraud detection
     Clinical trials
     Retail business planning
     ...
Map/Reduce Organization

[Figure: input splits in HDFS (Split 0–3) feed the map tasks; map output
is copied and sort/merged, then each reduce task writes one output part
(Part 0, Part 1) back to HDFS. A streaming sketch of the same flow
follows below.]
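The same map -> sort/merge -> reduce flow can be sketched with Hadoop
streaming. This is a minimal illustration, not from the deck: the jar path
follows the Hadoop 0.20 contrib layout, and /bin/cat and /usr/bin/wc stand
in for real map and reduce logic; the HDFS paths are assumptions.

    # Sketch: identity mapper, wc as reducer. The framework sort/merges
    # the map output before the two reducers each write one part file
    # (Part 0, Part 1) back to HDFS. Paths are illustrative assumptions.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -input /data/input \
        -output /data/output \
        -mapper /bin/cat \
        -reducer /usr/bin/wc \
        -numReduceTasks 2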
Next-Gen Hadoop – Low Latency Focus
  Hadoop is traditionally optimized for throughput
  World Record Sort source code changes
     http://developer.yahoo.net/blogs/hadoop/Yahoo2009.pdf
     “Winning a 60 Second Dash with a Yellow Elephant”
  Reducer improvements (shuffle); memory-to-memory merge
  Fetch of multiple map outputs from the same node
     Reduces the number of server connections
  Improved timeout behavior
  Better data corruption detection (CRC32 improvements)
  Map output compression (45% of the original size) – see the sketch below
  Improved and multi-threaded data partitioning
  Lower latency with a faster “heartbeat”
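One item above, map output compression, can be switched on per job. A
minimal sketch: the -D properties are standard Hadoop 0.20 configuration
keys, while job.jar and MyJob are hypothetical placeholders, and the
options are only parsed if the job's driver uses ToolRunner.

    # Sketch: enable map output compression for one run. job.jar and
    # MyJob are hypothetical; the properties are Hadoop 0.20 keys and
    # take effect only if the driver uses ToolRunner/GenericOptionsParser.
    hadoop jar job.jar MyJob \
        -Dmapred.compress.map.output=true \
        -Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec \
        input output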
OpenSolaris 2009.06
  OpenSolaris Moves Into the Enterprise
     UltraSPARC T1 and T2 Support, Sun4u
  5-Year Enterprise Support
  Datacenter-Ready Installation

  New and Modern Networking Stack
     http://opensolaris.org/os/project/crossbow/
     Multi-Core Optimized
     Easy Network Virtualization and Resource Control (see the sketch below)

  Powerful, Built-in and Free Virtualization Technology
     http://opensolaris.org/os/community/ldoms
  http://www.opensolaris.com/learn/features/whats-new/200906/
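The Crossbow stack is driven through dladm and flowadm. A minimal sketch,
e.g. to give each virtual Hadoop node its own virtual NIC with a bandwidth
cap; the physical interface (e1000g0), the VNIC/flow names, and the 1G
limit are assumptions for illustration.

    # Sketch: Crossbow network virtualization and resource control.
    dladm create-vnic -l e1000g0 vnic0      # virtual NIC over e1000g0
    dladm show-vnic                         # list configured VNICs
    flowadm add-flow -l vnic0 -a transport=tcp -p maxbw=1G flow0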
UltraSPARC™ T2 Processor
• 8 SPARC V9 cores @ 1.4 GHz
   > 8 vertical threads per core
   > 2 execution pipelines per core
   > 1 instruction/cycle per pipeline
   > 1 FPU per core
   > 1 SPU (crypto) per core
   > 4MB 16-way 8-bank L2$
• 64 threads
• Memory bandwidth: 42 GB/s read, 21 GB/s write
• x8 PCI-Express interface @ 2.5 GHz
• 2 x 10Gb on-chip Ethernet
• Crypto processor per core
• Power: 84 watts (typical)
• http://www.opensparc.net
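These 64 hardware threads are what Hadoop task-slot counts get sized
against. As a hedged aside, the layout is visible with standard Solaris
tooling:

    # Show the processor topology as the OS sees it; on an UltraSPARC T2
    # this reports one physical processor with 64 virtual processors.
    psrinfo -pv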
CMT Hadoop Systems
  T5440 – 4U 2P US-T2 Plus platform & Sun Storage J4400
  Blade 6000 – 10U US-T2 blades
  T5240 – 2U 2P US-T2 Plus server
CMT Hadoop Node and Rack Specs

[Table: per-node and per-rack hardware specifications]

* http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
Ideal Performance Model

[Figure: job timeline – all maps start at job start and finish together;
shuffling starts as map output becomes available, all reduces start
together, and the job completes when all reduces finish]

All tasks start and finish simultaneously
Performance Model with Serialized Tasks

[Figure: job timeline – maps start one after another, so mapping
stretches from the first map start to the last map finish; shuffling
begins when the first map finishes, and the last reduce starts and
finishes late, pushing out job completion]

Launching many tasks can incur significant overhead
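A rough back-of-envelope model of why this hurts (notation ours, not from
the deck): if a node launches n tasks serially, one every t_launch
seconds, and each task runs for t_run seconds, then

    T_serialized ≈ n * t_launch + t_run     (tasks launched one by one)
    T_ideal      ≈ t_run                    (all tasks start together)

For example, 128 tasks at 1 s launch spacing add roughly 128 s before the
last task even starts, which can dominate a run whose tasks take only a
few minutes each.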
Distributed Performance Data Collection
  Created a set of scripts to facilitate distributed execution for
  performance data collection and analysis
     Based on traditional single-node system analysis tools:
        mpstat, nicstat, iostat, vmstat, ...
     Variable sampling frequency to monitor hardware utilization
     Pinpoint which resource is a bottleneck at any point:
        CPU utilization, network, disk I/O
     Periods where no resource is fully utilized may indicate a poorly-
     tuned Hadoop configuration or other system issues
  Hadoop log processing to monitor the Hadoop task timeline
     Examine startup rate, Hadoop phase overlap
  Scripts and details are available here:
     http://blogs.sun.com/jgebis/
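A minimal sketch of the collection idea (the actual scripts are at the
blog above): fan the single-node tools out over ssh for the duration of a
run, then gather the logs for offline analysis. The slaves.txt host list,
the 5-second interval, and the paths are assumptions.

    #!/bin/sh
    # Sketch: start collectors on every slave, run the job, then stop
    # the collectors and pull the logs back for offline analysis.
    INTERVAL=5
    for host in $(cat slaves.txt); do
        ssh $host "mpstat $INTERVAL > /var/tmp/mpstat.log 2>&1 &"
        ssh $host "iostat -xn $INTERVAL > /var/tmp/iostat.log 2>&1 &"
    done
    # ... run the Hadoop job here ...
    mkdir -p stats
    for host in $(cat slaves.txt); do
        ssh $host "pkill mpstat; pkill iostat"
        scp $host:/var/tmp/mpstat.log stats/mpstat.$host.log
        scp $host:/var/tmp/iostat.log stats/iostat.$host.log
    done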
Serialized Task Launching Overhead

[Chart: mapping, shuffling, and reducing times (minutes) for a 30GB sort
on a single T5240 node (128 threads, 128GB RAM, 16 disks), for varying
(#map, #reduce) pairs]

<60% CPU utilization
Significant launching overhead limits scalability
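For reference, runs like those charted above can be approximated with the
stock examples jar; the jar name and HDFS paths are assumptions, and the
-m/-r options set the (#map, #reduce) pair swept along the x-axis.

    # Sketch: generate random input, then sort it with chosen task
    # counts (Hadoop 0.20 examples jar; the input size is configured
    # separately and defaults to 10GB per node).
    hadoop jar hadoop-0.20.0-examples.jar randomwriter /data/unsorted
    hadoop jar hadoop-0.20.0-examples.jar sort -m 128 -r 128 \
        /data/unsorted /data/sorted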
10-Node 150G Sort – Task Timeline
Detailed Look: One T2 Blade (64 threads)

[Figure: per-task timeline on one 64-thread T2 blade during the 10-node
150GB sort]
10-Node 150G Sort – Utilization Stats

[Figure: hardware utilization statistics during the 10-node 150GB sort]
Intra-node Virtualization:
Logical Domains (LDOMs)
• Hardware-assisted virtualization
• Single hypervisor
   > OS-Level Isolation
   > Dedicated H/W threads and memory

[Figure: Logical Domain 0 runs the Job Tracker and Name Node; Logical
Domains 1 through N each run a Task Tracker and Data Node, all on one
hypervisor]
Example LDOMs Configuration
• Single control domain
   > Virtual disk server (vds)
   > Virtual network switch (vsw)
   > Virtual console concentrator (vcc)

• Multiple logical domains
   ldm add-vcpu 8 ldom0                       (cpu)
   ldm add-memory 16G ldom0                   (memory)
   ldm add-vdisk vdisk0 control-vds ldom0     (disk)
   ldm add-vnet vnet0 control-vsw ldom0       (network)

• Bind and start each domain
   ldm bind ldom0     (bind)
   ldm start ldom0    (boot)
   > Install the OS as usual
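Put into a loop, the commands above give something like the evenly-split
setup on the 4-LDOM slide below. A minimal sketch, following the slide's
command forms: the vCPU/memory sizes and the control-vds/control-vsw
service names are assumptions; the real administration scripts are at the
blog linked on that slide.

    #!/bin/sh
    # Sketch: create four evenly-sized guest domains, one virtual
    # Hadoop node each. Sizes leave resources for the control domain.
    for i in 0 1 2 3; do
        ldm add-domain ldom$i
        ldm add-vcpu 24 ldom$i
        ldm add-memory 24G ldom$i
        ldm add-vdisk vdisk$i control-vds ldom$i
        ldm add-vnet vnet$i control-vsw ldom$i
        ldm bind ldom$i
        ldm start ldom$i
    done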
Intra-node Virtualization: Zones
(Containers)
• Software (OS) virtualization
• Single operating system
   > Application-Level Isolation
   > No dedicated H/W threads or memory

[Figure: Zone 0 runs the Job Tracker and Name Node; Zones 1 through N
each run a Task Tracker and Data Node, all on a single OS instance]
Example Zones Configuration

• Create zones
   zonecfg -z zone0 -f zone0.config
   zone0.config:
      create
      add net; set physical=interface; set address=IP; ..
      add fs; set dir=mount_path; set raw=partition; ..
      ..

• Zone administration
   zoneadm -z zone0 boot    (boot)
   zoneadm list             (list)
   zoneadm -z zone0 halt    (halt)
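Scaled out to several zones per node, the configuration above can be
generated and applied in a loop. A minimal sketch with assumed zone
paths, NIC name (e1000g0), and addresses; the real scripts are linked on
the next slide.

    #!/bin/sh
    # Sketch: stamp out four zones, one virtual Hadoop node each.
    for i in 0 1 2 3; do
        CFG=/var/tmp/zone$i.config
        printf 'create\nset zonepath=/zones/zone%s\n' $i       >  $CFG
        printf 'add net\nset physical=e1000g0\n'               >> $CFG
        printf 'set address=192.168.1.10%s\nend\ncommit\n' $i  >> $CFG
        zonecfg -z zone$i -f $CFG
        zoneadm -z zone$i install
        zoneadm -z zone$i boot
    done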
Example 4-LDOM Setup
• Evenly distributing H/W resources

[Table: per-domain assignment of CPU threads, memory, disks, and network]

* LDOM/Zone administration scripts and details available here: http://blogs.sun.com/jangwook/
Scaling Hadoop with Intra-node Virtualization

[Chart: mapping, shuffling, and reducing times (minutes) for a 30GB sort
on a single T5240 node (128 threads, 128GB RAM, 16 disks), for varying
(#map, #reduce, #virtual nodes)]

~100% CPU utilization with 4 logical domains
Scaling Sorting Workload (Without Virtualization)

[Chart: large-data sorting time (minutes) vs. data size on a Sun Blade
6000 – 10 nodes, 640 threads, 64GB RAM/node, 4 disks/node]

CMT Hadoop systems scale nicely with larger datasets
E-mail Discovery Overview
• Preparing data for searching over a large email corpus
• Five phases with different MapReduce profiles:
   1. PipelineMapReduce – Reads and parses 27GB of raw emails
   2. DocumentSeqFileToMapFile – Prepares a MapFile to retrieve data
   3. PersonNormalization – Groups data into unique entities
   4. Consumer – Creates indices
   5. ThreadDetection – Detects conversation threads
• Output is a set of shards used in an e-mail discovery search
  application
E-mail Discovery (http://www.it-discovery.com/)

[Screenshot: the e-mail discovery search application]
E-Discovery Results

[Chart: email processing time (minutes) for four configurations – 1 node
/ 128 threads, 1 node / 256 threads, 10 nodes / 640 threads, and 15
nodes / 60 EC2 units]

CMT Hadoop systems scale for throughput applications
Performance / 40U Rack

[Chart: email processing performance normalized to a 40U rack – 1.0X for
40 nodes at 4 EC2 units/node, 2.0X for 5 nodes at 256 threads/node, 3.1X
for 40 nodes at 64 threads/node, and 4.6X for 20 nodes at 128
threads/node]

High performance with a smaller datacenter footprint
MySQL Enterprise Solution
Enterprise software and services delivered as an annual subscription

Database:
  Most up-to-date MySQL software
  Monthly rapid updates
  Quarterly service packs
  Hot-fix program
  Indemnification

Monitoring:
  Virtual database assistant
  Global monitoring of all servers
  Web-based central console
  Built-in advisors, expert advice
  Problem query detection/analysis

Support:
  Online self-help MySQL Knowledge Base
  24/7 problem resolution with priority escalation
  Consultative help
  High-Availability and Scale-Out

MySQL Enterprise offerings: Subscription; License (OEM): Embedded Server;
Support; MySQL Cluster Carrier-Grade; Training; Consulting; NRE
Conclusions
• Hadoop and Java scale well on CMT systems
• Startup cost dominates performance on highly threaded
  systems (256 threads per node)
• Virtualization techniques enable good scalability, high
  system utilization and better performance
    >  Parallelized startup
    >  Less external node-to-node Ethernet traffic
• Hadoop consolidation on CMT systems reduces datacenter
  footprint, power and cooling costs
• Next-gen Hadoop focuses on performance and latency


Software Stack, Pointers to Download
• Sun CMT servers
  > http://www.sun.com/servers/coolthreads/overview/index.jsp
• Hadoop 0.20.0
  > http://hadoop.apache.org
• JVM from Sun 1.6.0_13
  > http://www.java.sun.com
• OpenSolaris for SPARC 2009.06
  > http://www.opensolaris.org
• LDOMs 1.1
  > http://opensolaris.org/os/community/ldoms

Learn More Free
• Using LDoms and CoolThreads Technology: Improving Scalability and
  Utilization
• Improving Database Scalability on T5440 Blueprint
• Deploying Web 2.0 Applications on Sun Servers and the OpenSolaris
  Operating System
   > Tech Resources tab at sun.com/mysqlsystems

Try it Yourself
• Try free for 60 days: Sun Enterprise SPARC rack or blade systems and
  storage
• Test Hadoop on up to 128 threads
• 60 days to decide to buy
• Return and pay nothing – not even shipping – if you don't
   > sun.com/tryandbuy
Scaling Hadoop for
Multi-Core and Highly
Threaded Systems

Jangwoo Kim (jangwoo.kim@sun.com)
Zoran Radovic (zoran.radovic@sun.com)
Denis Sheahan (denis.sheahan@sun.com)
Joseph Gebis (joseph.gebis@sun.com)

This is an extended version of our Hadoop Summit '09
presentation, Santa Clara, CA, June 2009
http://developer.yahoo.com/events/hadoopsummit09