Composite Architectures
High Performance and Low Latency

High Performance on Wall Street
September, 2007

Tim Burke, Director of Emerging Technologies, Red Hat
The broader picture -
Enterprise computing platform
   Most use cases are not isolated systems
   Commonly networked pipeline of systems
   TCP/IP, middleware, messaging
   3rd party software
   Not one-size-fits-all

   [Diagram: Job Scheduler, Load Balancer, Computation, Archival - file/DB]


The broader picture -
Enterprise computing platform
   Broader solution stack perspective
    ●   General purpose systems
    ●   Compute, analytics servers – low latency - realtime
    ●   Storage tier
    ●   High speed messaging tying it all together



Agenda – How new open source technologies
can fill the needs of complex environments
   Linux is the established baseline for high performance (available now)
     ● Performance proof-points
     ● Feature highlights
     ● Importance of tuning
   Near-term new technologies combine to bring things to the next level
    (available January 2008; beta in October)
     ● Messaging
     ● Realtime kernel

Red Hat Enterprise Linux
performance highlights

 Recent Benchmark Results
    World Record TPC-H Performance with 3000GB database
      ●   HP : Oracle : Red Hat
      ●   5% faster and 30% cheaper than #2 : Sun Solaris 10 on E25K SPARC Server
    Red Hat Enterprise Linux also holds 6 of the top 10 results at 300GB database size

                                            HP BladeSystem ProLiant BL25p cluster 64P DC Spec.
                                            Performance: 110,576.5 QphH@3000GB
                                            Price/Performance: $37.80 USD/QphH@3000GB
                                            Database Total System Cost: $4,179,238 USD
                                            Database Software: Oracle Database 10g Release 2,
                                            Enterprise Edition with Oracle Real Application Clusters
                                            and Partitioning
                                            Operating System: Red Hat Enterprise Linux 4 ES
                                            Total # Nodes/Processors/Cores/Threads: 64/64/128/128
                                            Processors: Dual-Core AMD Opteron(tm) 285, 2.6GHz/1MB
                                            Availability: June 8, 2006 Submitted: June 8, 2006

Source: www.tpc.org 20-Mar-2007
SPECweb2005: world record – Sept, 2007
   Red Hat Enterprise Linux dominates Web Serving performance testing
   30% higher results than the previous record when serving static, dynamic
    and SSL-based web pages
   Intel quad core Xeon announcement
   HP DL580

      Source: Intel news release Sept-2007
   Red Hat Enterprise Linux performance is proven across multiple workloads
     ●   Many world record performance results
   Example: Apache Tomcat running on Red Hat Enterprise Linux & Windows 2003,
    tested by www.webperformanceinc.com (2006)


   [Chart: CPU Utilization]

   Red Hat Enterprise Linux can handle more web requests with lower
   CPU utilization and higher throughput than Windows

Performance – leadership RMDS
   RMDS (Reuters Market Data System) report, Sept 14, 2007
     ●   By STAC (Securities Technology Analysis Center)
   2,800,000 updates/sec = highest source distributor throughput to date on 4
    socket or 2 socket server
   2.200,000 updates/sec = hightest point-to-point server throughput

The road to high performance & low latency
   Terminology – throughput & low latency
   Why low latency matters
   Performance doesn't just happen
   There is no panacea
   Workload dependent tuning is essential

Throughput & variability
                              Average times vary
                              No prioritization
                              Favors throughput
                               (traffic volume / time)

Predictability, deterministic – High speed lane
                              Prioritization
                              Determinism =
                               consistent response
                               time = low latency
                              Downside – average
                               throughput may be lower
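The throughput/determinism contrast above can be made concrete by reporting the tail of a latency distribution next to its average. The following is a minimal Python sketch; the function name `summarize_latencies` and the two sample workloads are hypothetical illustrations, not measured data. Both workloads have the same mean, but only the tail statistics reveal which one is deterministic.

```python
import statistics


def summarize_latencies(samples_us):
    """Return mean, 99th percentile, max and jitter (max - min) of latency samples in microseconds."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    # Nearest-rank 99th percentile: the smallest value with at least 99% of samples at or below it.
    rank = max(1, -(-99 * len(ordered) // 100))  # integer ceil(0.99 * n)
    return {
        "mean": statistics.fmean(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
        "jitter": ordered[-1] - ordered[0],
    }


# Hypothetical workloads: identical average latency (100 us), very different predictability.
steady = [100] * 100
bursty = [90] * 98 + [590] * 2
```

A throughput-oriented view (the mean) cannot tell these apart; a determinism-oriented view (p99, max, jitter) can.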

Why determinism matters

   Dependably knowing a transaction will complete in finite time
    = determinism
   When “close enough” isn't “good enough”
   Timing constraints and consequences may vary
     ● Travel web site – a few seconds / missed booking
     ● Program trading – milliseconds / missed trades, irregularity
     ● Command & control – microseconds / life-threatening

    [Chart: RHEL5 vs tuned RHEL5]
    System tuning gets you most of the way to determinism

                       Red Hat Confidential
Tuning strategies – workload dependent
   Isolating application from kernel processing

   [Diagram: CPU-0 through CPU-3. App1 threads 0 and 1 and App2 threads 0 and 1
    run on CPU-0 and CPU-1; general OS kernel work and the disk and network
    device interrupts are kept on CPU-2 and CPU-3]

Tuning strategies – workload dependent
   Balancing application with kernel interrupt handlers for
    cache efficiency (e.g., network interrupt with application
    messaging thread)

   [Diagram: CPU-0 runs the App1 and App2 compute threads; CPU-1 pairs the
    App1 network packet thread with the network device interrupt; CPU-2 pairs
    the App2 storage thread with the disk device interrupt; CPU-3 runs general
    OS kernel work]
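Pairing an interrupt with a CPU is done by writing a hexadecimal CPU bitmask to `/proc/irq/<N>/smp_affinity`, where bit *i* allows CPU *i* to service that IRQ. The Python sketch below builds that mask; the helper names `irq_affinity_mask` and `bind_irq` are hypothetical. The 8-digit comma-separated grouping mirrors how the kernel prints cpumasks, and the IRQ number itself is system specific (look it up in `/proc/interrupts`).

```python
def irq_affinity_mask(cpus):
    """Build the hex CPU bitmask for /proc/irq/<N>/smp_affinity.

    Bit i set means CPU i may service the interrupt. Digits are grouped in
    32-bit (8 hex digit) chunks separated by commas, as the kernel prints cpumasks.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    digits = format(mask, "x")
    width = -(-len(digits) // 8) * 8  # round up to whole 32-bit groups
    digits = digits.zfill(width)
    return ",".join(digits[i:i + 8] for i in range(0, width, 8))


def bind_irq(irq, cpus, proc_root="/proc/irq"):
    """Write the mask for one IRQ (requires root). Returns the path written."""
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(irq_affinity_mask(cpus) + "\n")
    return path
```

For the layout above, the network device IRQ would be bound to CPU-1 (`irq_affinity_mask([1])` gives `"00000002"`). The `irqbalance` service may rewrite these masks, so it is usually stopped on systems tuned this way.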

Optimizing File System Performance
   Results with various database tuning options
     ● RAW vs EXT3/GFS/NFS w/ o_direct (ie directIO in iozone)
     ● ASYNC IO options
        ● RHEL3 – DIO+AIO not optimal (page cache still active)
        ● RHEL4
           ● EXT3 supports AIO+DIO out of the box
           ● GFS – U2 full support AIO+DIO / Oracle cert
           ● NFS – U3 full support of both DIO+AIO
     ● HUGEMEM kernels on x86
       HugeTLBfs – use larger page sizes for shared memory (ipcs)
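Sizing the huge page pool is simple arithmetic: the number of huge pages is the shared memory segment size divided by the huge page size, rounded up. That count is what the `vm.nr_hugepages` sysctl is set to before an application requests the segment with `shmget(..., SHM_HUGETLB)`. The Python sketch below is a minimal illustration; the function names are hypothetical, and the huge page size is read from the `Hugepagesize` line of `/proc/meminfo` rather than assumed, since it differs by architecture.

```python
def parse_hugepagesize_kb(meminfo_text):
    """Return the 'Hugepagesize' value (kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])  # e.g. "Hugepagesize:     2048 kB"
    raise ValueError("no Hugepagesize line; kernel may lack huge page support")


def hugepages_needed(segment_bytes, hugepage_kb):
    """Huge pages needed to back a shared memory segment (candidate vm.nr_hugepages value)."""
    page_bytes = hugepage_kb * 1024
    return -(-segment_bytes // page_bytes)  # integer ceiling division
```

For example, a hypothetical 1 GiB database shared memory segment on a system with 2048 kB huge pages needs 512 pages.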

Other typical tuning for determinism

   Disable unnecessary services/daemons (e.g., the full GNOME
    desktop on servers, misc monitoring agents, sendmail)
   Disk I/O elevator options
   Avoid filesystem journaling if unnecessary, or trim journal
   Loader options to resolve symbols at startup
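Two items from this list can be checked or applied from a script. The active disk I/O elevator is the bracketed entry in `/sys/block/<dev>/queue/scheduler`, and setting `LD_BIND_NOW` makes the dynamic loader resolve every symbol at program start instead of on first call (linking with `-Wl,-z,now` has the same effect at build time). This is a minimal Python sketch; the helper names are hypothetical.

```python
import os


def active_io_scheduler(scheduler_text):
    """Return the active elevator from /sys/block/<dev>/queue/scheduler text.

    The kernel marks the active choice with brackets, e.g. "noop deadline [cfq]".
    """
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def env_with_eager_binding(base_env=None):
    """Copy of an environment with LD_BIND_NOW set, so ld.so binds all symbols at startup."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_BIND_NOW"] = "1"
    return env
```

Passing `env=env_with_eager_binding()` to `subprocess.run` would start a (hypothetical) latency-sensitive binary with eager binding, moving symbol-lookup cost out of the first trip through each code path.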

What if tuning standard RHEL still doesn't meet
your determinism needs?
   New capabilities – enterprise compute platform
     ● Realtime kernel (RHEL-RT)
     ● High speed messaging middleware (AMQP)

    [Chart: Improved determinism of realtime kernel (Red Hat Performance 05/2007)]

    [Chart: Detail zoom-in of RHEL5 vs RHEL-RT]
Deterministic scheduling – the core of all
realtime offerings
   Realtime application thread runs, then typically blocks waiting for an event
   Event (e.g., interrupt) occurs or timeout expires <-- start the latency clock here
   Kernel acknowledges interrupt (or timer event)
   Kernel preempts the running task if a higher priority realtime thread is runnable
   Kernel saves context of prior running lower priority task
   Kernel loads context of high priority realtime application thread
   Kernel starts high priority application thread <-- stop the latency clock
   Deterministic scheduling = consistent, predictable duration between
    above start/stop time.
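For the kernel to preempt in favor of "the high priority realtime thread," that thread must actually be placed in a realtime scheduling class. This is a minimal Python sketch, assuming Linux and Python 3.3+ (`os.sched_setscheduler`); the helper name `make_realtime` and the default priority of 50 are hypothetical choices. The shell equivalent is `chrt -f 50 <command>`.

```python
import os


def make_realtime(priority=50):
    """Try to move the calling process into SCHED_FIFO at the given priority.

    Returns True on success, False if the caller lacks the privilege
    (CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO).
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lo <= priority <= hi:
        raise ValueError(f"SCHED_FIFO priority must be in [{lo}, {hi}]")
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True
```

A SCHED_FIFO task runs until it blocks or a higher priority task becomes runnable, so a busy loop at high priority can starve the rest of the system. Realtime threads should block on events, as in the sequence above.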

Timing quantified – demonstrated results

   Deterministic upper bound on latency.
    ●   Context switch latency under 25 µs, 99.9999% under 20 µs (from
        interrupt to commence running new process)
          ● Average of 38% improvement vs stock RHEL5

    ●   10 µs sleep time, vs 2 ms in stock RHEL5
    ●   Highly accurate gettimeofday() with nanosecond resolution
         ●   Can be tuned to µs accuracy vs ms based system call
             performance optimization
    ●   Note: lower bound on timing constraints is hardware dependent
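Figures like the 10 µs sleep time are typically measured by repeatedly sleeping for a short interval and recording how late each wakeup arrives. This is the approach of the `cyclictest` tool from the rt-tests suite. The Python sketch below is only a rough user-space analogue (the function name is hypothetical): interpreter overhead is included in every sample, so it gives an upper bound, not the kernel's own latency.

```python
import time


def measure_wakeup_latency(interval_us=10, loops=1000):
    """Sleep interval_us repeatedly; return how late each wakeup was, in nanoseconds.

    Samples include Python interpreter overhead, so they overstate kernel latency.
    """
    interval_ns = interval_us * 1000
    overshoots = []
    for _ in range(loops):
        start = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
        time.sleep(interval_us / 1_000_000)
        woke = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
        overshoots.append(woke - start - interval_ns)
    return overshoots
```

Feeding the result into a percentile/maximum summary, once on a stock kernel and once on a realtime kernel, shows the determinism difference as a change in the tail rather than the mean.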

    Realtime kernel work upstream
     Mainstream community initiative
     Work over the last 2+ years
     Maintained in -rt tree
          90% Red Hat contribution
          120,000 lines of change at peak
     Over the last year, almost 1/2 of the patches have moved from -rt to the
      mainline kernel:
      ● BKL preemptable (2.6.8)
      ● Mutex patch (2.6.16)
      ● Semaphore-to-Mutex conversion (ongoing, ~85% done)
      ● Hrtimers subsystem (2.6.16)
     Further milestones:
      ● Robust futexes (2.6.17)
      ● Lock validator (2.6.18)
      ● Priority inheritance futexes (PI-futex) (2.6.18)
      ● Generic IRQ layer (2.6.18)
      ● Core time re-write (2.6.18)
      ● Sleepable RCU (2.6.19)
      ● Latency Tracer (circa 2.6.18)
      ● High-res+dynticks (2.6.21)
      ● CFS – completely fair scheduler (2.6.23)
      ● Conversion of spin-locks to mutex (2.6.23+)
      ● All interrupt handling in threads
      ● Full rt-preempt (~2.6.24+)

Upstream success – Improved kernel locking
   Improve granularity – identify and correct contention points
   Mutex rather than semaphores
     ●   Mutexes are lighter weight
   Lock validator
     ●   Efficient runtime confirmation of lock ordering
     ●   Can detect race conditions without actually hitting them
   Priority Inheritance (PI)
     ●   Prevents low priority processes blocking higher priority. Problem scenario:
          ●   Low priority process takes lock
          ●   High priority process needs lock, but must wait
          ●   Long running medium priority process preempts low priority process
     ●   Solution: temporarily boost low pri process to allow completion
     ●   Required for realtime java – 1000's of threads
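In C, a program requests this behavior with `pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT)`. Python's standard library does not expose PI mutexes, so the hypothetical `PIMutex` class below only models the bookkeeping rule from the scenario above: the lock owner runs at the highest priority among itself and its waiters, and on release the lock passes to the highest-priority waiter. It does not block or schedule anything.

```python
class PIMutex:
    """Toy model of priority inheritance bookkeeping (single-threaded, no real blocking).

    Larger number = higher priority, as with SCHED_FIFO.
    """

    def __init__(self):
        self.owner = None    # (task, base_priority) or None
        self.waiters = {}    # task -> base_priority

    def lock(self, task, priority):
        """Return True if acquired, False if the task must wait."""
        if self.owner is None:
            self.owner = (task, priority)
            return True
        self.waiters[task] = priority
        return False

    def effective_priority(self):
        """Priority the owner runs at: boosted to the highest waiter's priority."""
        if self.owner is None:
            return None
        return max([self.owner[1], *self.waiters.values()])

    def unlock(self):
        """Hand the lock to the highest-priority waiter; return its name (or None)."""
        if not self.waiters:
            self.owner = None
            return None
        task = max(self.waiters, key=self.waiters.get)
        self.owner = (task, self.waiters.pop(task))
        return task
```

In the problem scenario, a low priority owner (10) blocking a high priority waiter (90) runs at 90 while it holds the lock, so a medium priority task (50) can no longer preempt it.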

Upstream success – timer precision & interrupt
   Timer enhancements
     ●   Infrastructure cleanup – factor common code, increase fields to represent
         nanosecond precision
     ●   Timer precision – utilize high resolution hardware timers at microsecond
         precision rather than approximate periodic time interrupt millisecond precision
     ●   Generic time-of-day – cleanly accommodate diverse clock sources
     ●   VDSO gettimeofday() - performance enhancement for millisecond accuracy
     ●   Dynamic ticks – power savings – no need to take the timer interrupt 1000 times per
         second on an idle system – transition to low power state (great for OLPC)
   Interrupt handling
     ●   Generic IRQ mechanism – infrastructure cleanup – factor common code
     ●   More fine-grained hardware interrupt control
   CFS – Completely fair scheduler
     ●   Provides fair interactive response times in almost all situations
     ●   Includes modular scheduler framework – realtime task scheduler first
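The effect of the timer work can be observed from user space. `clock_getres()` reports the resolution the kernel advertises for a clock (with high-resolution timers this is typically 1 ns; on a tick-based kernel it is typically the tick length), and timing a tight loop of clock reads gives a rough cost per call. This is a minimal Python sketch with a hypothetical function name; the per-call cost includes Python loop overhead, so it overstates the cost of the underlying call.

```python
import time


def clock_report(samples=100_000):
    """Advertised clock resolutions (ns) and rough mean cost of one clock read (ns)."""
    if samples < 1:
        raise ValueError("samples must be positive")
    report = {
        "monotonic_res_ns": round(time.clock_getres(time.CLOCK_MONOTONIC) * 1e9),
        "realtime_res_ns": round(time.clock_getres(time.CLOCK_REALTIME) * 1e9),
    }
    t0 = time.perf_counter_ns()
    for _ in range(samples):
        time.clock_gettime_ns(time.CLOCK_REALTIME)  # same clock gettimeofday() reads
    t1 = time.perf_counter_ns()
    report["mean_call_ns"] = (t1 - t0) / samples  # includes interpreter overhead
    return report
```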

Realtime performance monitoring tools

   Existing standard RHEL5 based performance monitoring tools remain
    ●   GDB, Frysk, OProfile – source-level debuggers & profiler
    ●   SystemTap, kprobe – kernel event tracing and dynamic data
    ●   kexec/kdump standard kernel dump / savecore capabilities
   Latency Tracer – new RHEL-RT feature
    ●   Runtime trace capture of longest latency codepaths – both kernel
        and application. Peak detector
    ●   Selectable triggers for threshold tracing
    ●   Detailed kernel profiles based on latency triggers
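The latency tracer itself lives in the kernel, but its two ideas (a peak detector and a threshold trigger) are easy to mirror in an application's own instrumentation. The hypothetical `LatencyTrigger` class below is a user-space analogue for illustration only, not an interface of the RHEL-RT tracer: it remembers the worst sample seen with a caller-supplied context label and logs every sample above a threshold.

```python
class LatencyTrigger:
    """User-space peak detector with a threshold trigger (illustrative analogue only)."""

    def __init__(self, threshold_us):
        self.threshold_us = threshold_us
        self.peak_us = 0
        self.peak_context = None
        self.triggered = []  # (latency_us, context) for each sample over the threshold

    def record(self, latency_us, context=None):
        # Peak detector: keep only the worst sample and where it happened.
        if latency_us > self.peak_us:
            self.peak_us = latency_us
            self.peak_context = context
        # Threshold trigger: log every sample that exceeds the limit.
        if latency_us > self.threshold_us:
            self.triggered.append((latency_us, context))
```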

    [Chart: Histogram of Tibco EMS response times (Red Hat Performance).
     Number of samples per 10k messages, Standard RHEL5 vs Realtime RHEL5 (tuned),
     response-time buckets from <1 ms to >200 ms plus peak; 10x variability improvement]
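A histogram like this is produced by sorting each response time into a fixed set of buckets. The Python sketch below shows the bucketing; the edges in `EDGES_MS` are hypothetical, modelled loosely on the chart's axis, because part of its labels cannot be recovered.

```python
import bisect

# Hypothetical bucket edges in milliseconds, loosely modelled on the chart's axis.
EDGES_MS = [1, 2, 5, 10, 20, 50, 100, 200]


def bucket_counts(latencies_ms, edges=EDGES_MS):
    """Count samples per bucket: <e0, [e0, e1), ..., >=e_last. Returns (labels, counts)."""
    counts = [0] * (len(edges) + 1)
    for value in latencies_ms:
        # bisect_right gives the number of edges <= value, i.e. the bucket index.
        counts[bisect.bisect_right(edges, value)] += 1
    labels = ([f"<{edges[0]}"]
              + [f"{a}-{b}" for a, b in zip(edges, edges[1:])]
              + [f">={edges[-1]}"])
    return labels, counts
```

Comparing the counts beyond the first one or two buckets, together with the peak value, is what shows the variability reduction; the average barely moves.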
Full application compatibility
- No application changes required
   All of the realtime enhancements are in the kernel – under the hood
    from an application perspective.
   No application changes are required to benefit from realtime
     ●   Applications which are latency bottlenecked due to kernel scheduling and 
         interrupt handling will see benefit.
     ●   Latencies introduced entirely in userspace (sub-optimal application code,
         unbounded java garbage collection, etc) are not eliminated.
   Recompilation is not required (same gcc/glibc as standard RHEL5)
     ●   Applications recompiled on RHEL5 benefit from pi mutex glibc
         implementation enhancements to avoid syscall overhead on uncontended locks

Realtime Java (RTSJ)
   Versions of Java which are more deterministic – primarily by removing
    garbage collection unpredictability and inter-JVM communication
   RHEL-RT is the only Linux kernel having the prerequisites (ie, Priority
    Inheritance, preemption)
   Working closely w/ IBM
     ●   IBM WebSphere Real Time
     ●   Realtime spec conformant – 200,000 rt thread capable
     ●   Exclusive realtime garbage collector
     ●   1ms max GC pause time
     ●   Uses at most 30% cpu in any 10ms window
   Deployed by US Navy
     ●   DDG Destroyer program
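The two garbage collector bounds quoted for IBM WebSphere Real Time can be checked against a trace of GC pauses: every pause must be at most 1 ms, and GC may consume at most 30% of any 10 ms window (3 ms), leaving the application at least 7 ms. The Python sketch below checks a trace against those limits. The function name and trace format are hypothetical, not an IBM tool. It relies on the fact that the busiest window either starts at some pause start or ends at some pause end, so only those candidate windows need checking.

```python
def gc_schedule_ok(pauses_us, max_pause_us=1_000, window_us=10_000, budget_us=3_000):
    """Check GC pauses against a max-pause bound and a per-window budget.

    pauses_us: list of (start, duration) in microseconds, assumed non-overlapping.
    Defaults encode 1 ms max pause and 30% of any 10 ms window (3 ms).
    """
    if any(d > max_pause_us for _, d in pauses_us):
        return False

    def gc_time_in(lo, hi):
        # Total overlap of each pause [s, s + d) with the window [lo, hi).
        return sum(max(0, min(s + d, hi) - max(s, lo)) for s, d in pauses_us)

    # The busiest window starts at a pause start or ends at a pause end.
    candidates = [s for s, _ in pauses_us] + [s + d - window_us for s, d in pauses_us]
    return all(gc_time_in(t, t + window_us) <= budget_us for t in candidates)
```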

Summary – Which RHEL kernel is best for you?
   Standard RHEL4 & 5
     ●   General purpose server, without strict performance SLAs
     ●   Throughput intensive – e.g., database (TPC-C/H), file serving,
         web / mail servers, HPC clusters
     ●   No identified high priority processes - treated equally
     ●   Tuning with cpu affinity and interrupt binding [in RHEL4+5]
         provides required latency
     ●   Cost sensitive
     ●   Virtualization capabilities
     ●   3rd party kernel ISVs
   RHEL-realtime
     ●   Specialized server for deterministic low-latency performance SLA
     ●   Specifically identified high priority processes requiring rapid
         scheduling in response to events – e.g., recalculation of risk /
         trade based on new info
     ●   Requirements for high precision in timing, e.g., gettimeofday()

The broader picture for high performance
   Realtime kernel
   Infiniband
   AMQP messaging
   Realtime java

Collaboratively developed realtime OS + messaging.
      Realtime java. The power of community.

     [Layered diagram: Financial Services, Telco / NEBS, Federal;
      Red Hat AMQP Messaging, IBM – realtime java;
      Red Hat – RHEL-RT, Linux Kernel Community;
      IBM, Intel, AMD, HP]

         Realtime & Messaging go hand-in-hand
    Most of the projects that drive realtime requirements include messaging.
        Requirements cover: predictability, high speed, reliability,
        multi-vendor interoperability, security and scalability.
    Realtime has advantages for any message system, but unique 
    advantages when combined with Red Hat's AMQP based messaging.

What is AMQP

An Open Standard for Middleware:
 Middleware: software that connects other software together. Middleware
   connects islands of automation, both within an enterprise and out to
   external systems.

Why it is different:
 A straight-forward and complete solution for business messaging

 Cost effective for pervasive deployment

 Totally open (developed in partnership)

 Created by users and technologists (Messaging, OS, and Network) working together
 Made to satisfy real needs (IVQ, LVQ, Replay, ...)

In development for 3+ years, went public on June 20th 2006
A Comprehensive Solution

Messaging Middleware should…
 Provide Event Notification, Messaging, File Transfer
   Deals with business transaction processing
   Technology agnostic (there is more than Java)

 Meet real-world requirements of mission-critical systems
 Be Trustworthy
   Robust, available, scalable, secure, resilient
   Aims to be stable over the long run

 Provide a common infrastructure for the enterprise

[Diagram: Publish/Subscribe, Messaging, File Transfer, transact, report]

AMQP meets these needs in one protocol
 Usually provided by 3 different proprietary products

 One solution reduces costs, increases efficiency and simplifies
    How does it fit together / what is it used for?

Now becoming available (performance, determinism, value) -- combine
 AMQP middleware and

 Linux Real Time Kernel and

 Gigabit Ethernet or Infiniband networking, Async durable IO

   ...offers a compelling solution for mission critical or large scale deployment

Development AMQP (www.amqp.org)
 AMQP v0-8 implementations are in production in several firms (TCP/Ethernet)

 AMQP v0-10 will be released soon (TCP/SCTP/Ethernet)

 AMQP v0-11 circa Q4 (TCP/UDP Multicast/SCTP/Ethernet/Infiniband)

Applicable to most lines of business
 Initiative started in Finance, but is of interest and applicable to Telecom, Government,
   Healthcare and e-commerce / Retail.
   What makes the RHM/RHEM distributions special

Red Hat Messaging implements AMQP and ....
  Provides a stable tested and supported distribution
  Will be supported in multiple languages and platforms (non-Linux also)

Provides the fastest reliable-messaging platform
  Adds optimizations and new IO layers for Linux
  Can take advantage of realtime Linux work for predictability

Adds QoS features
  direct write and persistence alternatives
  Integrates with OS Clustering facilities
  Support for initial value cache, replay, last value cache
  Transactions
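The last value cache idea is easiest to see as a queue that keeps only the newest pending message per key, so a slow consumer of market data sees the latest price per ticker rather than a backlog of stale ones. The hypothetical `LastValueQueue` class below sketches those semantics in Python. It is not the API of Red Hat Messaging or any AMQP client library.

```python
from collections import OrderedDict


class LastValueQueue:
    """Keep only the newest pending message per key (illustrative semantics only)."""

    def __init__(self):
        self._pending = OrderedDict()  # key -> latest value, oldest first

    def publish(self, key, value):
        # A newer value replaces any older pending one for the same key;
        # in this sketch the replacement moves to the back of the queue.
        self._pending.pop(key, None)
        self._pending[key] = value

    def consume(self):
        """Return the oldest pending (key, value), or None when empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)
```

Moving a replaced key to the back is one design choice; a broker could instead keep the original queue position and only swap the payload.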

Becomes standard facility
  Can utilize virtualization (Xen) and OS security
  Forms the basis for many other new distributed OS services

All work is open source !!
Infiniband drivers & libraries

   Include Infiniband OFED 1.2 kernel & libraries
   Kernel components
     ●   IB hardware drivers: mthca, ipath, ehca, amso1100, iw_cxgb3
     ●   Protocols: SDP, iSER, SRP, IPoIB, VNIC
   Infiniband libraries
     ●   libibverbs – direct verbs API programming
     ●   librdmacm – connection management
     ●   OpenSM – subnet fabric manager
     ●   DAPL libraries – DAT-1.2 compliant
     ●   OpenMPI libraries
   AMQP based middleware optimizations to utilize infiniband

When? - Target Date – realtime & messaging
compute platform
   Shipping early CY '08 (January)
   First Beta test milestone fall '07 (Oct / Nov)
     ● External customer and partner participation invited
     ● Contact your Red Hat rep for details on participating in
       early evaluations

