Power Analyzer Program Review


    Dirk Grunwald
University of Colorado




                  Department of Computer Science
                  University of Colorado
                              Overview

   Clustered Voltage Scaling
       Motivator to ensure our power analyzer can model new
        microarchitectural mechanisms

   Operating System Voltage Scaling

   Memory system control for low power

   [Diagram: Architecture / Operating Systems / Runtime Systems]




Lessons from Physics…




Dynamic Voltage Scaling in SA-2

   [Chart: MIPS (left axis, 0-800) and Power in mW (right axis, 0-1000) at each clock setting]

               150 MHz     400 MHz     600 MHz
       MIPS    185         500         750
       mW       40         180         450

   2 billion instructions @ 750 MIPS
       Takes 2.7 seconds
       Consumes 1200 mJ (450 mW x 2.7 s)
          Running “just fast enough”
             cuts energy 3-fold



   150 MHz: busy for most of the cycle, 432 mJ
   600 MHz: busy briefly, then idle, 1200 mJ
   12 second duty cycle
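
The arithmetic behind these figures can be checked with a short sketch. It assumes energy is simply power times busy time and that idle power is zero (the slides do not give an idle figure); the MIPS and mW values come from the SA-2 table, and the function name is illustrative.

```python
# Operating points from the SA-2 slide: clock (MHz) -> (MIPS, power in mW).
OPERATING_POINTS = {150: (185, 40), 400: (500, 180), 600: (750, 450)}


def energy_mj(instructions, mips, power_mw):
    """Return (energy in millijoules, busy time in seconds).

    Assumes the processor draws `power_mw` while busy and nothing while
    idle (an assumption; the slides do not give an idle power figure).
    """
    seconds = instructions / (mips * 1e6)
    return power_mw * seconds, seconds


if __name__ == "__main__":
    work = 2e9  # 2 billion instructions per 12-second duty cycle
    for mhz, (mips, mw) in OPERATING_POINTS.items():
        mj, secs = energy_mj(work, mips, mw)
        print(f"{mhz} MHz: {secs:.1f} s busy, {mj:.0f} mJ")
```

At 150 MHz the work takes about 10.8 s, which still fits the 12 second duty cycle, for roughly 432 mJ against 1200 mJ at 600 MHz.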
  You can exploit slack at many levels


 Circuits – Dual Vdd or Dual Vt designs
  “Design methodology of Ultra Low-Power MPEG4 Codec Core
   Exploiting Voltage Scaling Techniques” – Igarashi et al., DAC’98:
   ~50% power reduction, no performance loss

 Architecture
  Macro-level clustered voltage scaling, multi-voltage multithreading

 Operating Systems
 Applications / Runtime Systems

  Why are we looking at system-level
         savings & modeling

 Computer architecture folks are the only people who would watch an
  MPEG movie at 120 fps…
 We ignore interactive applications
 But “that’s the future of computing”

 And we ignore human response times.
 Do credit card transactions need to be faster than 1/10th of a second?




            Clustered Voltage Scaling


   Voltage Scaling normally refers to varying
    voltage over time.


   CVS is voltage scaling in space
     Run part of a processor at a different V & f
     Historically, done at circuit level
     We’re trying to exploit at component level




           Clustered Voltage Scaling




   CVS already applied at circuit level
   Mitsubishi designed MPACT media processor w/2
    voltage levels for 43% power savings, 10% area
    increase, no performance hit


                  Slack Scheduling


   Use inherent instruction dependencies and
    operational latencies to form alternate schedules
   Slack is the minimum time difference, in cycles,
    between when an instruction’s output is produced and
    when it is consumed


   We want to exploit slack within a cluster of functional
    units
   Schedule “slackfull” instructions to slower pipelines
    that run at ½ speed and reduced power


                    Exploiting Slack


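   add   r0, r1,  r2   ; (A)
   sub   r3, r4,  r5   ; (B)
   and   r9, 0x1, r9   ; (C)
   ornot r5, r9,  r10  ; (D)
   xor   r2, r10, r11  ; (E)

   Dependence graph: Sub (B) and And (C) feed ornot (D); Add (A) and ornot (D) feed xor (E)

   Original schedule (three pipelines):
       A       B       C
                       D
                       E

   Schedule with one slow pipeline:
       Slow    Fast    Fast
       A1      B       C
       A2              D
                       E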
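
A minimal sketch of how slack could be computed for the five-instruction example above. It assumes unit latency for every operation, an as-soon-as-possible schedule with no resource limits, and Alpha-style operand order (destination register last); the data layout and function name are illustrative and not taken from the simulator.

```python
# (name, source registers, destination register), Alpha-style: dest last.
PROGRAM = [
    ("A", ["r0", "r1"], "r2"),    # add   r0, r1,  r2
    ("B", ["r3", "r4"], "r5"),    # sub   r3, r4,  r5
    ("C", ["r9"], "r9"),          # and   r9, 0x1, r9
    ("D", ["r5", "r9"], "r10"),   # ornot r5, r9,  r10
    ("E", ["r2", "r10"], "r11"),  # xor   r2, r10, r11
]


def compute_slack(program, latency=1):
    """Return {name: slack in cycles} under an ASAP schedule.

    Slack is the minimum gap between the cycle a result becomes
    available and the cycle a consumer issues. Instructions whose
    result is never consumed in the window get None.
    """
    last_writer = {}   # register -> name of most recent producer
    issue = {}         # name -> issue cycle
    consumers = {}     # producer name -> list of consumer names
    for name, srcs, dst in program:
        producers = [last_writer[r] for r in srcs if r in last_writer]
        # Issue as soon as every producer's result is available.
        issue[name] = max((issue[p] + latency for p in producers), default=0)
        for p in producers:
            consumers.setdefault(p, []).append(name)
        consumers.setdefault(name, [])
        last_writer[dst] = name
    slack = {}
    for name, users in consumers.items():
        ready = issue[name] + latency
        slack[name] = min((issue[u] - ready for u in users), default=None)
    return slack


if __name__ == "__main__":
    print(compute_slack(PROGRAM))
```

Under these assumptions only A has a cycle of slack, which is consistent with the second schedule on the slide, where A occupies the slow pipeline for two cycles (A1, A2).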
                Simulation Methodology

   Simulation architecture
       SimpleScalar 3.0a w/Cai+Wattch mode
       4-wide 21264; 16-entry RUU
   SPECint95 benchmarks




                 Results and Potential


   Over 90% of issue cycles have at least one instruction that has
    1 cycle of slack
     This means that 90% of the time, we could run one instruction
      on a “slow pipe” without impacting performance
     Between 1-7% have 2 cycles

   68-87% of instructions with slack are integer

          Story Gets Better With More
             Aggressive Processor

   Slack is affected by deeper RUU
   More opportunities to find slack
   More slack values > 1 cycle available

   [Figure: instruction windows at RUU Size 3, 4 and 5, with instructions labeled U, V, W, X, Y]




         Operating System Scheduling


   Goals: Control power using clock / voltage scheduling
     Real systems
     Real apps

   To date: comparison study showing that previously proposed
    heuristics don’t really work well
     We know why
     How to fix it


        What are some challenges?


 How slow is fast enough?
 How do I tell the architecture?
 Enforce constraints?
 Not miss deadlines?

 Can I define benchmarks and evaluation methodologies for
  human-scale computing where voltage?




        Difficult to predict application
                    demands



   Goal: Don’t disturb application behavior.
   Inelastic performance.

   [Figure panels: Speech Rendering, Audio Rendering]



                          Prior work


   Weiser et al and Govil et al
     Used “Intervals”
     Selected average, weighted average
     Reported great success

   Pering et al
     Tried intervals
     Switched to RTOS, which has high demands on applications



   Is RTOS really needed?
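
To make the interval idea concrete, here is a generic sketch of an interval-based policy that predicts the next interval from a weighted average of past demand. It is not the exact algorithm from any of the cited papers: the weighting, the relative speed steps, and the function name are all assumptions, and it ignores work that spills over when the chosen speed is too low.

```python
def weighted_average_policy(utilizations, n=3, speeds=(0.25, 0.5, 0.75, 1.0)):
    """Pick a relative clock speed for each interval.

    `utilizations` holds the busy fraction the workload would show at
    full speed in each interval. The prediction for the next interval is
    a weighted average of past demand, with weight n / (n + 1) on
    history (an illustrative choice). `speeds` must be sorted ascending.
    """
    predicted = 0.0
    chosen = []
    for demand in utilizations:
        # Choose the slowest speed that covers the predicted demand.
        speed = next((s for s in speeds if s >= predicted), speeds[-1])
        chosen.append(speed)
        # Fold the observed demand into the running average.
        predicted = (n * predicted + demand) / (n + 1)
    return chosen


if __name__ == "__main__":
    # Steady 80% demand: the average climbs slowly toward it.
    print(weighted_average_policy([0.8] * 4))
    # n=0 means "next interval looks like the last one".
    print(weighted_average_policy([1.0, 0.0, 1.0], n=0))
```

With `n=0` the prediction is just the previous interval, so an alternating workload makes the chosen speed bounce between the extremes.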

                            Evaluation


   Implement clock scheduling module in Linux 2.0.35 kernel
     Extensible – can model all practical prior policies

   StrongARM SA-1100 provides 15 “clock steps” from 56 MHz to
    beyond 206 MHz
   Used modified motherboard
     Useful, but not critical in early study
     Drop from 1.5 V to 1.23 V only provides 10% power reduction

   Measured “reasonable” applications
     Text speaker, chess player, Web browser, MPEG video player



             Widely Varying Power Usage

   Better evaluation metric given resources is scheduling stability.
       E.g. MPEG-1 player runs at 80% utilization at 206 MHz
       Should be able to “settle” at ~176 MHz @ 93% utilization

   But the “best” policy has widely varying power settings
       “Best” policy is to assume the next scheduling interval is the
        same as the prior
       Only runs at the fastest or slowest setting
   This is awful, but we know why this happens
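
The settling point quoted for the MPEG-1 player follows from simple scaling. The sketch below assumes the work per second is fixed and that execution time scales inversely with clock frequency, which ignores memory stalls; the helper name is illustrative.

```python
def utilization_at(new_mhz, base_mhz=206.0, base_utilization=0.80):
    """Utilization after moving the same workload to `new_mhz`.

    Assumes cycles of work per second are fixed, so utilization scales
    with base_mhz / new_mhz (an idealization that ignores memory stalls).
    """
    return base_utilization * base_mhz / new_mhz


if __name__ == "__main__":
    print(f"demand: {0.80 * 206.0:.1f} MHz of work")
    print(f"utilization at 176 MHz: {utilization_at(176.0):.1%}")
```

80% of 206 MHz is about 165 MHz of work, so a 176 MHz setting would run at roughly 93 to 94% utilization, in line with the slide.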
   Leading to bursty energy demands..

   [Chart: instantaneous power (W, 0 to 1.8) versus time (seconds, 0 to 664).
    Annotated phases: rebooting the Itsy and having Java boot (8 times);
    calculator; scribble; clock speed 200 MHz -> 50 -> 200; gc?;
    scribble not being used]
                What we’re doing now


   All this work done in old O/S
     Upgrading to Linux 2.4.0, interoperation with iPAQ & Itsy

   Determine minimal O/S mechanism
     Simple: “Go fast” vs “I went too slow”
     More complex: soft real time system
     Application-specific behavior state
     Via queue length for events in e.g. Java applications



   Trying to implement control mechanism in a number
    of processor families
     AMD K6-III+ Mobile
     SpeedStep
     “Xscale”
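
One way to picture the simple hint mechanism (“Go fast” vs “I went too slow”) is a small feedback controller. Everything in this sketch is hypothetical: the class, its method names, and the clock steps are invented for illustration and do not describe the kernel module or any real processor's settings.

```python
class ClockGovernor:
    """Hypothetical feedback controller driven by application hints."""

    def __init__(self, steps_mhz):
        self.steps = sorted(steps_mhz)
        self.index = len(self.steps) - 1  # start at the fastest setting

    @property
    def mhz(self):
        return self.steps[self.index]

    def go_fast(self):
        """Application asks for full speed ahead of a demanding phase."""
        self.index = len(self.steps) - 1

    def went_too_slow(self):
        """Application reports a missed target: move up one step."""
        self.index = min(self.index + 1, len(self.steps) - 1)

    def interval_ok(self):
        """No complaint during the last interval: try one step slower."""
        self.index = max(self.index - 1, 0)


if __name__ == "__main__":
    gov = ClockGovernor([50, 100, 150, 200])  # illustrative steps only
    gov.interval_ok()
    gov.interval_ok()
    gov.went_too_slow()
    print(gov.mhz)
```

The controller drifts downward while the application is content and steps back up on a complaint, so the operating system needs no model of application deadlines.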



                    Memory Management
                     Energy Efficiency

   Dynamic memory management a large part of
    “complex” applications
   Implemented four memory management mechanisms
    on Itsy
     No allocation, explicit allocation, conservative allocation,
      incremental conservative allocation

   Measured processor, system with DAQ
   Energy not always correlated with performance
     Interaction of CPU and memory system fairly complicated
     SA-1 places CPU in sleep mode on memory traffic, slows clock.
     Thus, memory traffic can take much time but less energy

   Plan to exploit interaction with O/S for powering down
    memory pages
          Collateral & Related Projects


   NSF ITR proposal funded on integrated power management of
    wireless and system resources
     Management of 802.11b performance for location, trajectory,
      real time constraints
     Ad hoc routing for global energy minimization

   Broader effort leading toward an inter-departmental center
     “Colorado Center for low-power, ubiquitous, mobile and
      pervasive systems” (CCLUMPS)
     Circuits, architecture, telecom, computer services, applications

