VADA Lab's Biweekly Project Meeting

									SoC Low-Power Design Techniques


         Jun-Dong Cho
 SungKyunKwan University
       VADA Lab.



                           1
             Contents


 • Introduction
 • SOC Design Trends
 • System Level Low Power Design
 • Architecture Level Low Power Design
 • Conclusion




                                          2
              SOC Design Trends

 • SoCs are expected to integrate more and more complex functions
    – Web browsing, real-time video processing, speech
      recognition and synthesis

 • Average operating power at or below 100 mW and
   standby power at or below 2 mW

 • Performance must increase from 300 million
   operations per second (MOPS) today to 2500 MOPS in
   2016



                                                        3
                     Achieving functionality while
             maximizing battery life and minimizing size

 [Figure: example battery-powered devices]
 • Cellular phone
 • GPS
 • Cochlear implant
 • Noise cancellation headphones
 • Medical watch
 • Hearing aid
 • Portable audio
 • Digital still camera
 • Digital radio



                                                                                4
             QoS vs. Power

• How accurate should I make my FDCT?




                                        5
                 SOC Design Characteristics

 • The new version of ITRS predicts that Moore’s law
   will continue on a two to three year cycle throughout this
   period (2001-2016)

 • One of the key design challenges is to effectively use
   the dramatically increasing transistor counts, given
   certain power and productivity constraints

 • “Bottom-up” - based on system constraints
 • “Top-down” - based on design resource constraints



                                                               6
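                    Energy-Flexibility Gap

 [Figure: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) versus flexibility]

   Implementation style                          Energy efficiency
   Signal-processing ASIC                        200 MOPS/mW
   Reconfigurable architectures                  10-80 MOPS/mW
   Signal-processing processors (ASIPs, DSPs)    3 MOPS/mW
   Embedded processors (ARM)                     0.5 MOPS/mW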
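
As a rough worked example, the efficiency figures on this slide can be turned into a power budget for the 2500 MOPS target quoted on the SOC Design Trends slide. This is a minimal sketch that assumes power is simply throughput divided by efficiency and ignores standby power. The function name and the 40 MOPS/mW mid-range value chosen for reconfigurable architectures are illustrative assumptions, not values from the slides.

```python
# Power needed to sustain a given throughput at a given energy efficiency.
# Assumption: power_mW = throughput_MOPS / efficiency_MOPS_per_mW.

EFFICIENCY_MOPS_PER_MW = {
    "signal-processing ASIC": 200.0,
    "reconfigurable (assumed mid-range of 10-80)": 40.0,
    "ASIP/DSP": 3.0,
    "embedded processor (ARM)": 0.5,
}


def required_power_mw(throughput_mops, efficiency_mops_per_mw):
    """Return the average power in mW for the requested throughput."""
    return throughput_mops / efficiency_mops_per_mw


if __name__ == "__main__":
    for name, efficiency in EFFICIENCY_MOPS_PER_MW.items():
        power = required_power_mw(2500, efficiency)
        print(f"{name}: {power:.1f} mW")
```

Under these assumptions only the ASIC and the reconfigurable option stay below the 100 mW operating budget, which is one way to read the "gap" in the slide title.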


                                                          7
               Radio systems


• WiFi – 10-100Mbits/sec unlicensed band
  – OFDM, M-ary coding
• 3G – .1-2 Mbits/sec wide area cellular
  – CDMA, GMSK
• Bluetooth – .8 Mbit/sec cable replacement
  – Frequency hop
• ZigBee – .02-.2 Mbits/sec low power, low cost
  – QPSK
• UWB – Recently allowed by FCC
  – Short pulses (no carrier), bi-phase or PPM

                                                  8
                Data rate

 [Figure: data rate (log scale, 10 kbit/s to 100 Mbit/s) versus carrier frequency (0 to 6 GHz).
  Plotted technologies: UWB, 802.11a, 802.11g, 802.11b, 3G, Bluetooth, ZigBee]

                                                     9
             Cost (projections)

 [Figure: projected cost (log scale, $0.10 to $1000) versus carrier frequency (0 to 6 GHz).
  Plotted technologies: 3G, 802.11a, 802.11b/g, UWB, Bluetooth, ZigBee]


                                              10
                Power Dissipation

 [Figure: power dissipation (log scale, 1 mW to 10 W) versus carrier frequency (0 to 6 GHz).
  Plotted technologies: 802.11a, 802.11b/g, 3G, Bluetooth, UWB, ZigBee]


                                               11
            Why Low-Power Devices?
• Practical reasons
  (Reducing power requirements of high
  throughput portable applications)
• Financial reasons
  (Reducing packaging costs and achieving
  memory savings)
• Technological reasons
  (Excessive heat prevents the realization of
  high density chips and limits their
  functionalities)


                                                12
         Different Constraints for
         Different Application Fields

• Portable devices: Battery life-time
• Telecom and military: Reliability
  (reduced power decreases
  electromigration, hence increases
  reliability)
• High volume products: Unit cost
 (reduced power decreases packaging cost)


                                        13
          Driving Forces for Low-Power:
          Deep-Submicron Technology

  ADVANTAGES                    DISADVANTAGES
 • Smaller geometries          • Higher power consumption
 • Higher clock frequencies    • Lower reliability




                                             14
               Dynamic Power Consumption
• Average power consumption by a node cycling
  at each period T
  (each period has a 0→1 or a 1→0 transition):

$$ P_{\text{switching,battery}} = \frac{E_{\text{cycle}}}{T} = C_0 V_{DD}^2 f_{CLK} $$

• Average power consumed by a node with
  partial activity
  (only a fraction α of the periods has a transition):

$$ P_{\text{switching,battery}} = \alpha \, C_0 V_{DD}^2 f_{CLK} $$
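
To make the two expressions concrete, the following minimal sketch evaluates the switching power for a node. The function name and the example values (1 pF, 3.3 V, 100 MHz) are illustrative assumptions, not figures from the slides; `alpha=1.0` corresponds to a node that switches in every period.

```python
def switching_power(c0_farads, vdd_volts, f_clk_hz, alpha=1.0):
    """Average switching power in watts: alpha * C0 * VDD^2 * f_CLK."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a fraction of periods and must be in [0, 1]")
    return alpha * c0_farads * vdd_volts ** 2 * f_clk_hz


if __name__ == "__main__":
    full = switching_power(1e-12, 3.3, 100e6)            # node toggling every period
    partial = switching_power(1e-12, 3.3, 100e6, 0.25)   # 25% of periods toggle
    print(f"full activity:    {full * 1e3:.3f} mW")
    print(f"partial activity: {partial * 1e3:.3f} mW")
```

Because the supply voltage enters squared, lowering VDD is the most effective single lever in this model, which is why later slides focus on voltage scaling.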
                                                           15
             Power Model



• Power dissipation in
  logic blocks consists
  of both dynamic
  (switching) and static
  (standby) components




                             16
             Power Model


• Memory power is due
  primarily to row/column
  decoders and bit and
  word line switching
  activity

• Consider the power
  dissipated when the
  bitlines are switched by
  approximately VDD
  during write cycles


                             17
            Chip Composition (Future)
 • Low-power digital SOC designs of the future will
   be 90-95% memory and 5-10% logic, including
   overhead
 • Future chips may be dominated by memory due to
   power and resource constraints




                                                18
          Three Factors affecting Energy
– Reducing waste by hardware
  simplification: redundant h/w extraction,
  locality of reference, demand-driven /
  data-driven computation,
  application-specific processing, preservation of data
  correlations
– All-in-one approach (SOC): I/O pin and
  buffer reduction
– Voltage-reducible hardware
   – 2-D pipelining (systolic arrays)
   – Parallel processing
                                         19
            Low-Power Design Techniques
• Voltage and process scaling
• Design methodologies
  – Power-aware design flows and tools, trade area for
    lower power
• Architecture Design
• Power down techniques
  – Clock gating, dynamic power management
• Dynamic voltage scaling based on workload
• Power conscious RT/ logic synthesis
• Better cell library design and resizing
  methods
  – Cap. reduction, threshold control, transistor layout
                                                      20
SoC Design Flow




                  21
            Power Analysis
• Fast and accurate analysis in the design
  process
  – Power budgeting
  – Knowledge-based architectural and
    implementation decisions
  – Package selection
  – Power hungry module identification
• Detailed and comprehensive analysis at the
  later stages
  – Satisfaction of power budget and constraints
  – Hot spots



                                                   22
Power Savings




                23
Estimation Expectations




                          24
          System Level Power Optimization

• Algorithm selection / algorithm
  transformation
• Identification of hot spots
• Low Power data encoding
• Quality of Service vs. Power
• Low Power Memory mapping
• Resource Sharing / Allocation


                                        25
          Flow

• C/C++ Compilation
• Program Execution
• Building design representation
• Loading profiling data
• Setting constraints
• Power estimation
• Identification of Hot Spots


                                   26
              IBM’s PowerPC

• Optimum supply voltage through hardware parallelism,
  pipelining, and parallel instruction execution
   – Five instructions in parallel (IU, FPU, BPU, LSU,
     SRU), RISC
   – FPU is pipelined, so a multiply-add instruction can
     be issued every clock cycle
   – Low-power 3.3-volt design
   – 603e provides four software-controllable
     power-saving modes
• Copper processor with SOI
• IBM’s Blue Logic ASIC: new design reduces power
  by a factor of 10

                                                      27
            Silicon-on-Insulator


• How Does SOI Reduce Capacitance?

 • Junction capacitance is eliminated: an insulating
   layer (similar to glass) is placed between the
   impurities and the silicon substrate
 • High performance, low power, low soft-error rate

                                                28
          Why Copper Processor?
• Motivation: Aluminum resists the flow
 of electricity as wires are made thinner
 and narrower.
• Performance: 40% speed-up
• Cost: 30% less expensive
• Power: Less power from batteries
• Chip Size: 60% smaller than Aluminum
  chip

                                            29
          Factors Influencing Ceff


• Circuit function
• Circuit technology
• Input probabilities
• Circuit topology




                                     30
             Some Basic Definitions
• Signal probability of a signal g(t) is given by

$$ P(g) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g(t)\, dt $$

• Signal activity of a logic signal g(t) is given by

$$ A(g) = \lim_{T \to \infty} \frac{n_g(T)}{T} $$

  where n_g(T) is the number of transitions of g(t) in
  the time interval between –T/2 and T/2.
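
Both definitions are limits over an infinite observation window, so in practice they are estimated from a finite trace. The sketch below assumes the signal has been sampled into a list of 0/1 values, one per unit of time; the function names and the sample trace are hypothetical.

```python
def signal_probability(samples):
    """Estimate P(g): fraction of the observation window spent at logic 1."""
    return sum(samples) / len(samples)


def signal_activity(samples, duration):
    """Estimate A(g): number of 0->1 and 1->0 transitions per unit time."""
    transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return transitions / duration


if __name__ == "__main__":
    trace = [0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical sampled waveform
    print(signal_probability(trace))
    print(signal_activity(trace, duration=len(trace)))
```

A longer trace gives a better approximation of the limits; the estimate is only as representative as the input pattern used to produce the trace.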

                                                        31
             Factors Influencing Ceff:

         Circuit Function
• Assume that there are M mutually independent
  signals g1, g2, ..., gM, each having a signal
  probability Pi and a signal activity Ai, for 1 ≤ i ≤ M.
• For static CMOS, the signal probability at the
  output of a gate is determined according to
  the probability of 1s (or 0s) in the logic
  description of the gate:
   – Inverter, input P1: output 1 - P1
   – AND, inputs P1 and P2: output P1 P2
   – OR, inputs P1 and P2: output 1 - (1 - P1)(1 - P2)
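
The three gate formulas can be checked by brute force. This is a minimal sketch, assuming independent inputs as stated on the slide: it sums the probability of every input combination for which the gate outputs 1. The helper name `output_probability` is an illustrative choice.

```python
from itertools import product


def output_probability(gate, input_probs):
    """Probability that gate(...) == 1 for independent inputs with P(x_k = 1) = input_probs[k]."""
    total = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else (1.0 - p)
        if gate(*bits):
            total += weight
    return total


if __name__ == "__main__":
    p1, p2 = 0.3, 0.6
    print(output_probability(lambda a: not a, [p1]))             # 1 - P1
    print(output_probability(lambda a, b: a and b, [p1, p2]))    # P1 * P2
    print(output_probability(lambda a, b: a or b, [p1, p2]))     # 1 - (1 - P1)(1 - P2)
```

The enumeration grows as 2^n with the number of inputs, so it is only meant as a check for small gates.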
                                                    32
              Factors Influencing Ceff:

   Circuit Function (Static CMOS)
• Transistors connected to the
  same input are turning on
  and off simultaneously when
  the input changes
• CL of a static CMOS gate is
  charged to VDD any time a
  0→1 transition at the output
  node is required.
• CL of a static CMOS gate is
  discharged to ground any
  time a 1→0 transition at
  the output node is required.


                                 NOR Gate   33
         Factors Influencing Ceff:
         Circuit Function (Static CMOS)

• State transition diagram of the NOR gate

$$ \underbrace{(1 - p_Y)}_{p_{Y'}} \, p_Y + p_Y \, \underbrace{(1 - p_Y)}_{p_{Y'}} = \frac{3}{8} $$

  (for independent, equiprobable inputs, p_Y = 1/4)
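
The 3/8 figure can be reproduced by enumerating the NOR truth table. The sketch below assumes independent, equiprobable inputs and statistically independent consecutive output values, and uses exact fractions to avoid rounding; all function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def nor(a, b):
    return int(not (a or b))


def output_one_probability(gate, n_inputs):
    """P(Y = 1) when every input combination is equally likely."""
    ones = sum(gate(*bits) for bits in product((0, 1), repeat=n_inputs))
    return Fraction(ones, 2 ** n_inputs)


def transition_probability(p_y):
    """P(0->1) + P(1->0) for independent consecutive output values."""
    return (1 - p_y) * p_y + p_y * (1 - p_y)


if __name__ == "__main__":
    p_y = output_one_probability(nor, 2)
    print(p_y, transition_probability(p_y))
```

With the same helper, an output that is 1 half of the time gives a transition probability of 1/2, which is the largest value this expression can take.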



                                             34
         Factors Influencing Ceff:
         Circuit Function (Static CMOS)

• State transition diagram of the NOR gate

$$ p_{Y'} \, p_Y + p_Y \, p_{Y'} = \frac{1}{2} $$

  (with p_{Y'} = 1 - p_Y, this value corresponds to p_Y = 1/2)


                                             35
          Factors Influencing Ceff:
          Input Probabilities (Static CMOS)

• Signal activity calculation: Boolean
  Difference

$$ \frac{\partial f}{\partial x_i} = f\big|_{x_i = 1} \oplus f\big|_{x_i = 0} $$

  It signifies the condition under which output f
  is sensitized to input xi.
  If the primary inputs to function f are not
  spatially correlated, the signal activity at f is

$$ A_f = \sum_{1 \le i \le N} P\!\left(\frac{\partial f}{\partial x_i}\right) A_{x_i} $$
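
The activity formula can be evaluated directly for small functions. This is a minimal sketch under the slide's assumption of spatially uncorrelated inputs: for each input it enumerates the remaining inputs, checks whether flipping that input changes the output (the Boolean difference), and weights the input's activity by the probability of that condition. Function names and the example numbers are hypothetical.

```python
from itertools import product


def boolean_difference_probability(f, n_inputs, i, probs):
    """P(df/dx_i = 1): probability that the output is sensitized to input i."""
    others = [k for k in range(n_inputs) if k != i]
    total = 0.0
    for bits in product((0, 1), repeat=n_inputs - 1):
        x = [0] * n_inputs
        weight = 1.0
        for k, bit in zip(others, bits):
            x[k] = bit
            weight *= probs[k] if bit else 1.0 - probs[k]
        x[i] = 1
        f_high = bool(f(*x))
        x[i] = 0
        f_low = bool(f(*x))
        if f_high != f_low:          # f|x_i=1 XOR f|x_i=0
            total += weight
    return total


def output_activity(f, probs, activities):
    """A_f = sum over i of P(df/dx_i) * A_xi."""
    n = len(probs)
    return sum(
        boolean_difference_probability(f, n, i, probs) * activities[i]
        for i in range(n)
    )


if __name__ == "__main__":
    and_gate = lambda a, b: a and b
    print(output_activity(and_gate, probs=[0.5, 0.5], activities=[0.2, 0.4]))
```

For the AND gate the Boolean difference with respect to one input is simply the other input, so each input's activity is scaled by the other input's signal probability.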
                                                      36
              Power Reduction Methods:
              Architecture Driven Supply
              Voltage Scaling
• Strategy:
   1. Modify the architecture of the system so as to
     make it faster.
   2. Reduce VDD so as to restore the original speed.
     Power consumption has decreased.
• The most common architectural changes rely on the
  exploitation of parallelization and pipelining.
• Drawback:
   The additional circuitry required to compensate the
     speed degradation may dominate, and the power
     consumption may increase.
• Consequence:
  Parallelism and pipelining do not always pay-off.

                                                         37
Parallel Architectures




  Ppar=0.36Pref
                         38
Parallel-Pipelined Architectures




    Ppar=0.2Pref
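
The power ratios quoted on these two slides follow from scaling the switching-power expression P = C · VDD² · f. The sketch below shows one set of scaling factors that reproduces them. The capacitance, voltage, and frequency ratios are illustrative assumptions chosen to be consistent with the quoted results; they are not stated in the slides.

```python
def relative_power(cap_ratio, vdd_ratio, freq_ratio):
    """Power relative to the reference design, using P proportional to C * VDD^2 * f."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Assumed: duplicated datapath (about 2.15x capacitance), each copy
    # clocked at half rate, supply lowered to 0.58 of the reference.
    parallel = relative_power(cap_ratio=2.15, vdd_ratio=0.58, freq_ratio=0.5)

    # Assumed: parallel plus pipelined (about 2.5x capacitance), half rate,
    # supply lowered to 0.4 of the reference.
    parallel_pipelined = relative_power(cap_ratio=2.5, vdd_ratio=0.4, freq_ratio=0.5)

    print(f"parallel:           {parallel:.2f} x Pref")
    print(f"parallel-pipelined: {parallel_pipelined:.2f} x Pref")
```

The same function also shows the drawback noted earlier: if the added circuitry pushes the capacitance ratio high enough, the product can exceed 1 and the transformation no longer pays off.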
                             39
                    Loop unrolling
• The technique of loop unrolling replicates the body of a
  loop some number of times (unrolling factor u) and then
  iterates by step u instead of step 1. This transformation
  reduces the loop overhead, increases the instruction
  parallelism and improves register, data cache or TLB
  locality.
 Original loop:
   for i = 2 to N - 1
     A(i) = A(i) + A(i - 1) * A(i + 1)

 Unrolled loop (u = 2):
   for i = 2 to N - 2 step 2
     A(i) = A(i) + A(i - 1) * A(i + 1)
     A(i + 1) = A(i + 1) + A(i) * A(i + 2)

Loop overhead is cut in half because two iterations are performed in each iteration.
If array elements are assigned to registers, register locality is improved because A(i) and
A(i +1) are used twice in the loop body.
Instruction parallelism is increased because the second assignment can be performed
while the results of the first are being stored and the loop variables are being updated.
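
A runnable version of the two loops makes the equivalence easy to check. This is a minimal sketch that keeps the slide's 1-based indexing by leaving list element 0 unused. The cleanup step for an odd trip count is not shown on the slide and is added here as an assumption needed for arbitrary N.

```python
def rolled(a, n):
    """Original loop: i = 2 .. n-1, one element per iteration."""
    a = list(a)
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled(a, n):
    """Loop unrolled by a factor of 2, with a cleanup iteration if needed."""
    a = list(a)
    i = 2
    while i <= n - 2:                       # i = 2 .. n-2, step 2
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
        i += 2
    if i == n - 1:                          # leftover iteration (odd trip count)
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


if __name__ == "__main__":
    n = 7
    data = list(range(n + 1))               # indices 1..n are used, index 0 is padding
    print(rolled(data, n) == unrolled(data, n))
```

The unrolled body performs the same assignments in the same order, so the results match exactly; only the number of loop-control steps changes.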


                                                                                      40
                 Loop Unrolling (IIR filter example)

 • Two output samples are computed in parallel based
   on two input samples.

$$ Y_{n-1} = X_{n-1} + A \cdot Y_{n-2} $$
$$ Y_n = X_n + A \cdot Y_{n-1} = X_n + A \cdot (X_{n-1} + A \cdot Y_{n-2}) $$

 • Neither the capacitance switched nor the voltage is altered.
   However, loop unrolling enables several other transformations
   (distributivity, constant propagation, and pipelining). After
   distributivity and constant propagation:

$$ Y_{n-1} = X_{n-1} + A \cdot Y_{n-2} $$
$$ Y_n = X_n + A \cdot X_{n-1} + A^2 \cdot Y_{n-2} $$

The transformation yields a critical path of 3, so the voltage can be
dropped.
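
The unrolled recurrence can be checked against the original first-order filter. The sketch below is an assumption-laden illustration: it uses a plain list for the samples, an initial state `y_init` in place of the first Y value, and requires an even number of input samples. A² is computed once, mirroring the constant propagation step.

```python
def iir_direct(x, a, y_init=0):
    """Reference filter: Y[n] = X[n] + A * Y[n-1], one output per step."""
    y, prev = [], y_init
    for xn in x:
        prev = xn + a * prev
        y.append(prev)
    return y


def iir_unrolled(x, a, y_init=0):
    """Two outputs per step; both depend only on Y[n-2]. len(x) must be even."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    y, y_nm2 = [], y_init
    a2 = a * a                               # constant computed once
    for k in range(0, len(x), 2):
        x_nm1, x_n = x[k], x[k + 1]
        y_nm1 = x_nm1 + a * y_nm2
        y_n = x_n + a * x_nm1 + a2 * y_nm2   # no dependence on y_nm1
        y.extend((y_nm1, y_n))
        y_nm2 = y_n
    return y


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5, 6]
    print(iir_direct(samples, 2))
    print(iir_unrolled(samples, 2))
```

Because the second output no longer waits for the first, the two can be computed side by side, which is what shortens the critical path relative to two sequential iterations.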
                                                                                       41
Loop Unrolling for Low Power




                               42
Loop Unrolling for Low Power




                               43
Loop Unrolling for Low Power




                               44
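                              Encoding

• Bus-invert (BI) code
   – Appropriate for random data patterns
   – Redundant code (1 extra bus line)
   – Reduce avg. transitions up to 25%

   Data (X)    Bus (Z)    inv
   0000        0000       0
   1010        1010       0
   0100        1011       1
   1111        1111       0
   1010        1010       0
   0100        1011       1
   1101        0010       1
   0011        0011       0

 [Figure: bus-invert encoder and decoder. A majority voter compares the data X with the
  bus state and drives the inv line; the bus value Z and inv are registered (D) and
  the receiver recovers X from Z and inv]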
R. J. Fletcher, “Integrated circuit having outputs configured for reduced state changes,” May 1987, U.S. Patent 4667337.
M. R. Stan and W. P. Burleson, “Bus-invert coding for low-power I/O,” IEEE Tr. on VLSI Systems, Mar. 1995, pp. 49-58.
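
The table on this slide can be reproduced with a few lines of code. This is a minimal sketch, not the circuit from the cited references: it picks, for each word, whichever of the plain or inverted form toggles fewer lines, counting the inv line itself, and assumes the bus starts at all zeros. That tie-breaking and the initial state are assumptions that happen to match the slide's example.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, inv) pairs for the given data words."""
    mask = (1 << width) - 1
    bus, inv = 0, 0                                # assumed initial bus state
    encoded = []
    for data in words:
        d = bin((data ^ bus) & mask).count("1")    # data lines that would toggle
        cost_plain = d + inv                       # inv line must return to 0
        cost_inverted = (width - d) + (1 - inv)    # inv line must be 1
        if cost_plain > cost_inverted:
            bus, inv = (~data) & mask, 1
        else:
            bus, inv = data & mask, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, inv) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


if __name__ == "__main__":
    words = [0b0000, 0b1010, 0b0100, 0b1111, 0b1010, 0b0100, 0b1101, 0b0011]
    for bus, inv in bus_invert_encode(words, 4):
        print(f"{bus:04b} {inv}")
```

Decoding needs no history: the receiver only looks at the inv line, which is why the scheme costs just one extra wire.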


                                                                                                                      45
               Different Supply Voltages
               for Different Units
• Partition the chip into multiple sub-units each
  of which is designed to operate at a specific
  supply voltage

 [Figure: chip floorplan partitioned into sub-units, each marked FAST or SLOW
  and supplied at either 5V or 3V]

                                                46
 COFDM Modem Block Diagram for Eureka 147/KDMB
                                                                    47

 [Figure: block diagram, labels as they appear on the slide]
 • Transmit path: Serial Data → Reed Solomon Encoder → Convolutional Interleaver
   → Scrambler → Convolutional Encoder → Time Interleaver → COFDM Modulator (FFT)
 • Channel (Gaussian, Ricean, Rayleigh)
 • Receive path: COFDM Modulator (IFFT, Phase/Timing Lock, Frame Sync)
   → Time Deinterleaver → Viterbi Decoder → Scrambler → Convolutional Deinterleaver
   → Reed Solomon Decoder → Serial Data
 • BERT (Bit-Error-Ratio Tester) attached to the serial data at both ends
     DMB Modulator/Demodulator: Domestic and International Status

   Company                  Product and key features
   TI (USA)                 DRE200: uses a general-purpose DSP for
                            COFDM / Audio FEC / Decoder, 160mW
   ATMEL (Germany)          U2739M: uses an Oak DSP for COFDM demodulation,
                            HW Audio / FEC Decoding, 860mW
   Panasonic (Japan)        MN66720UC: SDSP for COFDM, MDSP
                            for Audio,
   Frontier Silicon (UK)    Chorus FS1010: Special DSP for
                            COFDM/Audio, 100mW

                                              48
               Status of Low-Power Technology Development

   Developer                      Application product                Features
   IBM, Austin                    DPM (PowerPC 405LP),               Linux power management
   Low power Computing Research   portable processor                 (90% power reduction)
   DoD DARPA                      Power Aware Communication          Power management, scheduling,
                                                                     OS system
   Philips,                       PCF50606: Single Chip power        Programmed power management
   STMicroelectronics,            management unit (for smart         (70% power reduction)
   Atmel                          phone and wireless PDA)
   Atrenta                        SpyGlass CAD tool                  Generates gated-clock structures
                                                                     from RTL-level HDL and SystemC

                                                          49
 VADA Lab’s Low-Power IPs

 • Low-Power Equalizer for xDSL: 21% power reduction, SNR=40dB
   [Figure: equalizer block diagram (Conjugator, Coefficient Update, Error Control,
    Learning Constant Control) and a plot comparing Conventional FEQ with Low-Power FEQ]
 • Maximizing Memory Data Reuse for …
   [Figure: array of PEs and comparators with a Control Generator and a transition PDF memory]
 • Next-generation … for smart cards
   [Figure: HOST CPU and Crypto Processor connected by an 8-bit address bus and 32-bit data buses
    (signals CLK, RESET, CS, RD, WR), with DIN_Reg, Key Generation, Key_add and Mux_1 blocks]


                                                                                                                                                                                                                                                                                                                                                                                                      Byte_Sub




                                                                                                                                                                                            Lower Power Motion Estimation                                                                                          저전력 보안 프로세서 칩 설계
                                                                                                                                                                                                                                                                                                                                                                                                      Shift_Low



                                                                                                                                                                                                                                                                                                                                                                     clk                   enb       Mix_Column
                                                                                                                                                                                                                                                                                                                                                                     rst     Control       sel_1
                                                                                                                                                                                                                                                                                                                                                                    start                  sel_2
                                                                                                                                                                                                                                                                                                                                                                                             sel_2     Mux_2




                                                                                                                                                                                            33% 전력 감소, 52Mhz 2.1배 면적증가                                                                                             ECC, Rijndael, DES, SHA                                                            Key_add

                                                                                                                                                                                                                                                                                                                                                                                DOUT_Reg




                     search data buf f er                                   ref erence data buf f er

                                                                                                                                                                                            (SCI 논문)
                                                modified                shift register                           modified
                                                   PE                                                               PE
      external         external
      memory           memory                   modified                                                         modified
       search          current                     PE                                                               PE
        data             data                   modified                                                         modified
                                                   PE                                                               PE



              address                           modified                                                         modified
             generator                             PE                                                               PE

                                                modified                                                         modified
                                                   PE                                                               PE

          clock generator
                                                modified                                                      modified
                                                   PE                                                            PE

                                                modified                                                         modified
                                                   PE                                                               PE

            contorl signal
              generator                         modified                                                         modified
                                                   PE                                                               PE
                                                 c1_sum




                                                               c2_sum




                                                                                                            c3_sum




                                                                                                                              c4_sum




                                                                         shift registors

                                            compa          compa                                      compa                 compa
                                             rator          rator                                      rator                 rator



                                            Motion Vector
                                                                                                                                                                                     IS-95 기반 CDMA의 Double Dwell                                                                                                  OFDM-based high-speed
Fast and Low Power Viterbi Search                                                                                                                                                    Searcher 저전력 및 co-design 설계                                                                                                  wireless LAN platform
Engine using Inverse Hidden Markov                                                                                                                                                   67% 전력 감소, 41% 면적감소                                                                                                          20.7Mhz, 237000 gates
Model
68% 전력 감소, 71%속도개선,
                                                                                                                                                                                                       NCO
                                                                                                                                                                                                       NCO          CR
                                                                                                                                                                                                                    CR
                                                                                                                                                                                                                             CPE                                       CSI

                                                                                                                                                                                                                                                           Channel

                                                                                                                                                                                                                                                                                        High-Flexible Design of OFDM
                                                                                                                                                                                                                    GI
                                                                                                                                                                                                                    GI                   Phase
                                                                                                                                                                                                                                         Phase             Channel       Viterbi
                                                                                                                                                                                       ADC
                                                                                                                                                                                       ADC            Demod
                                                                                                                                                                                                      Demod                    FFT                       Estimator
                                                                                                                                                                                                                                                          Estimator      Viterbi
                                                                                                                                                                                                                               FFT

1.9배면적증가                                                                                                                                                                               IF
                                                                                                                                                                                                                  Removal
                                                                                                                                                                                                                  Removal

                                                                                                                                                                                                                  Coarse
                                                                                                                                                                                                                  Coarse
                                                                                                                                                                                                                                        Rotator
                                                                                                                                                                                                                                        Rotator          /Equalizer
                                                                                                                                                                                                                                                          /Equalizer      FEC
                                                                                                                                                                                                                                                                          FEC




                                                                                                                                                                                                                                                                                        Tranceiver for DVB-T (개발 중)
                                                                                                                                                                                                                   STR
                                                                                                                                                                                                                   STR


삼성 휴먼 테크 우수논문상, ‘02
                                                                                                                                                                                             DP
                                                                                                                                                                                             DP                                                   Fine
                                                                                                                                                                                                                                                  Fine
                                                                                                                                                                                                      Timing                                      STR              SER
                                                                                                                                                                                                                                                                   SER
                                                                                                                                                                                            AGC
                                                                                                                                                                                            AGC       Timing                                      STR
                                                                                                                                                                                       RF            Processor
                                                                                                                                                                                                     Processor    GI/FFT
                                                                                                                                                                                                                  GI/FFT
                                                                                                                                                                                                                  Detector
                                                                                                                                                                                                                  Detector




                                                                                                                                                                                                                                                                             DSP
                                                                                                                                                                                                                                                                             DSP

                                                                                                                                                                                                                                                                         ASIC
                                                                                                                                                                                                                                                                         ASIC




                                                                                                                                                                                                                                                                                                                                                                  50
         Other low-power design techniques: case studies



• Use of alternative number systems
• Scheduling/ordering
• Algorithm substitution
• Signal and statistical analysis




                           51
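                 Low-power technique by number system conversion – I.1
•   Use of the Logarithmic Number System (LNS)
•   Log number system
     – Applied to the FFT, the largest of the arithmetic modules
     – The look-up table is the main variable in the size
     – A number is separated into a sign field and a magnitude field, and the base-2 logarithm of the magnitude is computed (ε: a small positive threshold):
         $A \leftrightarrow (S_A, L_A)$
         $S_A = \begin{cases} 0, & \text{if } A \ge 0 \\ 1, & \text{if } A < 0 \end{cases}$
         $L_A = \begin{cases} \log_2 |A|, & \text{if } |A| \ge \varepsilon \\ \log_2 \varepsilon, & \text{if } |A| < \varepsilon \end{cases}$
         $A = (1 - 2 S_A) \, 2^{L_A}$
     – The log value is represented as a binary number whose range is limited to n bits:
         $n = I + b + 1$
         $\hat{L}_A = (l_{n-1} \cdots l_b \, l_{b-1} \cdots l_0)$
         $\hat{L}_A = \begin{cases} \lfloor 2^b L_A + 0.5 \rfloor / 2^b, & \text{if } L_A \ge 0 \\ \lceil 2^b L_A - 0.5 \rceil / 2^b, & \text{if } L_A < 0 \end{cases}$
•   LNS arithmetic
     – Multiplication: an addition
     – Addition/subtraction: an addition, a subtraction and a look-up table
•   Accuracy
     – No BER degradation when the fractional part has 2 or more bits
•   Power consumption
     – In experiments, power consumption drops by up to about 60% compared with a conventional butterfly FFT
     – 7.8 mW -> 3.1 mW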
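
The sign/log-magnitude split and the "multiplication becomes addition" property can be sketched as follows. This is a minimal illustration, not the design from the slide: the function names, the default of 4 fractional bits and the `min_log` stand-in for the logarithm of the smallest magnitude are all assumptions, and the n-bit range limit is not modelled.

```python
import math

def lns_encode(a, frac_bits=4, min_log=-16.0):
    """Return (sign, log2 magnitude rounded to frac_bits fractional bits).

    min_log is a hypothetical stand-in for the log of the smallest magnitude,
    used when a == 0 because log2(0) is undefined.
    """
    sign = 0 if a >= 0 else 1
    magnitude = abs(a)
    log = math.log2(magnitude) if magnitude > 0 else min_log
    scale = 1 << frac_bits
    # Round to the nearest multiple of 2^-frac_bits, ties away from zero.
    if log >= 0:
        quantised = math.floor(log * scale + 0.5) / scale
    else:
        quantised = math.ceil(log * scale - 0.5) / scale
    return sign, quantised

def lns_decode(sign, log):
    """Map (sign, log2 magnitude) back to a real number."""
    return (1 - 2 * sign) * 2.0 ** log

def lns_multiply(x, y):
    """Multiply two LNS numbers: XOR the signs, add the logs."""
    return x[0] ^ y[0], x[1] + y[1]

product = lns_multiply(lns_encode(8), lns_encode(-4))
print(lns_decode(*product))  # -32.0
```

Addition and subtraction are deliberately left out: in LNS they need a look-up table, which is where the area cost mentioned above comes from.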
                                                                          52
Low-power technique by number system conversion – I.2




                      53
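            Low-power technique by reordering operations – I.1
• Coefficient ordering
  – The order of operations is changed to reduce the power consumption of a radix-4 pipelined low-power FFT processor
     • Coefficient ordering
        – Reduces switching activity at the fixed-coefficient input of the complex multiplier
     • New commutator architecture
        – Uses an additional dual-port RAM
  – Power is reduced by 23% and 9% for 16-point and 64-point FFTs, respectively.
     • The benefit decreases for larger FFTs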
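
Why the order of coefficients matters can be seen by counting bit flips on the multiplier's coefficient input. The sketch below is only a toy illustration: the 4-bit words are made up, and sorting is used as a stand-in for an activity-aware ordering, not as the ordering used in the FFT processor above.

```python
def toggles(words):
    """Total number of bit flips between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

# Hypothetical 4-bit coefficient words in their original order.
natural = [0b0000, 0b1111, 0b0001, 0b1110]
# Stand-in reordering: place similar bit patterns next to each other.
reordered = sorted(natural)

print(toggles(natural))    # 11
print(toggles(reordered))  # 6
```

Fewer flips on the coefficient input mean less switching inside the multiplier; the data samples must of course be reordered consistently, which is what the new commutator is for.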



                                        54
Low-power technique by reordering operations – I.2




                    55
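        Low power by algorithm substitution – I.1

• Applied to a 64-point FFT
 – The equation of the 64-point FFT is rewritten by an algorithmic transformation
 – It is decomposed into a two-dimensional structure built from two 8-point FFTs.
   • Complex multiplication is implemented by shift-and-add.
• Power consumption
 – Average dynamic power of 41 mW at 20 MHz with a 1.8 V supply, in an in-house 0.25 µm BiCMOS technology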

                                           56
Low power by algorithm substitution – I.2

      $A(r) = \sum_{k=0}^{N-1} B(k) \, W_N^{rk}$

      $A(s + 8t) = \sum_{l=0}^{7} \left[ W_{64}^{sl} \left( \sum_{m=0}^{7} B(l + 8m) \, W_8^{sm} \right) \right] W_8^{lt}$


                                        57
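            Low power by signal and statistical analysis – I.1
• Share of power consumption
  – About half of the total power is consumed in the complex multiplier.
• Analysis of the butterfly multiplications
  – Coefficient multiplications
     • In a generic stage, 0.25*M + 3 of the M coefficients are equal to 1
  – Clock gating can be used when (cosine, sine) = (1, 0)
• Frequency division duplex modems
  – 4096 DMT with the ETSI-standard 4.3125 kHz tone spacing
     • 41% of the carriers are used upstream, 26% downstream, and the remaining 30% are unused.
  – 1024 DMT with the ETSI-standard 4.3125 kHz tone spacing
     • 13%, 68% and 18%, respectively.
  – 59~87% of the IFFT (up) inputs are zero and 31~74% of the FFT (down) inputs are zero.
  – Clock gating is possible.
  – Applicable at the initial input stage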
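
The link between carrier usage and gating opportunity is simple arithmetic: every unused tone is a zero input whose first-stage multiplication can be skipped. A small sketch, assuming a hypothetical frame in which the used tones are contiguous and the function name is illustrative:

```python
def gated_fraction(inputs):
    """Fraction of inputs that are zero, i.e. cycles where the input-stage
    multiplier clock could be gated."""
    return sum(1 for x in inputs if x == 0) / len(inputs)

# Hypothetical 4096-tone upstream frame with 41% of the carriers in use.
tones = 4096
used = round(0.41 * tones)
frame = [1] * used + [0] * (tones - used)

print(round(100 * gated_fraction(frame)))  # 59
```

The same calculation with 13%, 26% and 68% usage gives 87%, 74% and 32%, close to the 59~87% and 31~74% ranges quoted above.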


                                                        58
            Clock Network Power Management

• The clock network accounts for about 50% of the total power
• FIR (massively pipelined circuit):
   – Video processing: edge detection
   – Voice processing (data transmission like xDSL)
• Telephony: 50% (70%/30%) idle, because the two parties do not speak at the same time.
• With every clock cycle, data are loaded into the working register banks, even if the data have not changed.
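
The wasted register loads can be counted with a small behavioural model. This is a sketch under simple assumptions (one register, gating whenever the incoming word equals the stored one); the function name and sample stream are illustrative.

```python
def count_register_loads(samples, gated):
    """Count clocked register loads for a stream of input words.

    Without gating the register is clocked every cycle; with gating it is
    clocked only when the incoming word differs from the stored one.
    """
    loads = 0
    stored = None
    for word in samples:
        if not gated or word != stored:
            loads += 1
            stored = word
    return loads

stream = [5, 5, 5, 7, 7, 5]  # an input that is idle most of the time
print(count_register_loads(stream, gated=False))  # 6
print(count_register_loads(stream, gated=True))   # 3
```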


                                                   59
                  Wireless Interface Power-Saving
                  Ronny Krashinsky and Hari Balakrishnan
                  MIT Laboratory for Computer Science


• Sleep to save energy, periodically wake to check for pending
  data
   – PSM protocol: when to sleep and when to wake?
• A PSM-static protocol has a regular sleep/wake cycle

       Measurements of Enterasys Networks RoamAbout 802.11 NIC

   [Figure: power vs. time plots for the NIC. PSM off: 750 mW. PSM on: 50 mW, with a 100 ms interval marked.]

                                                                        60
                                        Ronny Krashinsky and
                                        Hari Balakrishnan, MIT

[Figure: message sequence between Mobile Device, Access Point and Server (SYN, ACK, DATA) over time, with PSM off and with PSM on; the PSM on timeline marks AWAKE and SLEEP periods at 0 ms, 100 ms and 200 ms.]



                                                                 61
                The PSM-static Dilemma

Compromise between performance and energy
  If PSM-static is too coarse-grained, it harms
  performance by delaying network data

   If PSM-static is too fine-grained, it wastes
   energy by waking unnecessarily


Solution: dynamically adapt to network activity to
maintain performance while minimizing energy
  – Stay awake to avoid delaying very fast RTTs
  – Back off (listen to fewer beacons) while idle
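
The "back off while idle" idea can be sketched as a wake-up schedule whose sleep interval grows after each idle wake-up. This is only an illustration: the doubling rule, the 100 ms beacon period and the cap of 8 beacons are assumptions, not the parameters of the authors' protocol.

```python
def wakeup_times(horizon_ms, beacon_ms=100, max_interval_beacons=8):
    """Wake-up instants (ms after the last network activity) within horizon_ms.

    The sleep interval starts at one beacon period and doubles after every
    idle wake-up, up to max_interval_beacons beacon periods.
    """
    times = []
    t = 0
    interval = 1
    while True:
        t += interval * beacon_ms
        if t > horizon_ms:
            break
        times.append(t)
        interval = min(interval * 2, max_interval_beacons)
    return times

print(wakeup_times(2000))                               # [100, 300, 700, 1500]
print(len(wakeup_times(2000, max_interval_beacons=1)))  # 20 (static schedule)
```

Over two idle seconds the adaptive schedule wakes 4 times instead of 20, at the price of a longer worst-case delay for data that arrives late in the idle period.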
                                                    62
          Why Hardware for Motion Estimation?

• Most Computationally demanding part
  of Video Encoding
• Example: CCIR 601 format
• 720 by 576 pixel
• 16 by 16 macro block (n = 16)
• 32 by 32 search area (p = 8)
• 25 Hz frame rate (f_frame = 25)
• 9 Giga Operations/Sec is needed for Full
  Search Block Matching Algorithm.
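
The 9 GOPS figure can be reproduced from the parameters above. The sketch assumes (2p + 1)^2 candidate positions per block and 3 operations per pixel comparison (subtract, absolute value, accumulate); both are assumptions chosen because they match the quoted number, and the function name is illustrative.

```python
def full_search_ops_per_second(width, height, p, frame_rate, ops_per_pixel=3):
    """Operations per second for full-search block matching.

    Every pixel of the frame is compared once per candidate displacement,
    so the block size n cancels out of the total.
    """
    candidates = (2 * p + 1) ** 2
    return width * height * candidates * ops_per_pixel * frame_rate

ops = full_search_ops_per_second(width=720, height=576, p=8, frame_rate=25)
print(ops)  # 8989056000, i.e. about 9 giga operations per second
```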
                                                63
            Why Reconfiguration in Motion Estimation?

• Adjusting the search area at frame rate according to the changing characteristics of video sequences
• Reducing power consumption by avoiding unnecessary computation

[Figure: Motion Vector Distributions]
                                                         64
             Architecture for Motion Estimation

From P. Pirsch et al., "VLSI Architectures for Video
Compression", Proc. of the IEEE, 1995




                                                     65
                     DIGLOG multiplier
$C_{mult}(n) = 253 n^2$, $C_{add}(n) = 214 n$, where $n$ = word length in bits

$A = 2^j + A_R$, $B = 2^k + B_R$
$A \cdot B = (2^j + A_R)(2^k + B_R) = 2^j B + 2^k A_R + A_R B_R$

                        1st Iter   2nd Iter   3rd Iter
  Worst-case error       -25%       -6%        -1.6%
  Prob. of Error<1%      10%        70%        99.8%

  With an 8 by 8 multiplier, the exact result can be obtained at a maximum of
  seven iteration steps (worst case)
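
The iteration can be sketched as follows, assuming each step strips the leading one from both operands, adds the two shift terms and passes the residual product of the remainders to the next step. The function name is illustrative, and how the slide counts its iteration steps is not stated, so the sketch makes no claim about the "seven steps" figure.

```python
def diglog_multiply(a, b, iterations):
    """Approximate a * b for non-negative integers using shifts and adds only.

    With A = 2^j + A_R and B = 2^k + B_R, one step adds 2^j * B + 2^k * A_R
    and leaves the residual product A_R * B_R for the next step.
    """
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # the residual product is zero, the result is exact
        j = a.bit_length() - 1  # position of the leading one of A
        k = b.bit_length() - 1  # position of the leading one of B
        a_r = a - (1 << j)
        b_r = b - (1 << k)
        result += (b << j) + (a_r << k)
        a, b = a_r, b_r
    return result

print(diglog_multiply(255, 255, 1))  # 48896 (exact product: 65025)
```

The neglected term after the first step is A_R * B_R, which is always below a quarter of A * B because each remainder is less than half of its operand; this is where the -25% worst case comes from.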

                                                                                66
                    Low Power CDMA Searcher

•   RTL-level low-power design and implementation of the Searcher Engine of an MSM
    (Mobile Station Modem) chip for CDMA handsets. Operating frequency: 12.5 MHz
•   Low-power design using data flow graph rescheduling, pre-computation,
    strength reduction and a Synchronous Accumulator; area and power are
    reduced by up to 67.68% and 41.35%, respectively.
•   San Kim and Jun-Dong Cho, "Low Power CDMA Searcher", CAD and
    VLSI Workshop, May 1999.
•   Inki Hwang, San Kim and Jun-Dong Cho, "CDMA Searcher Co-Design",
    ASIC Workshop, Sep. 1999.

                                                                       67
          CDMA Searcher




Figure 1. Detailed block diagram




                          68
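            Searcher

• In an IS-95-based DS/CDMA system, the searcher takes the pilot channel
  transmitted by the base station as its input and acquires initial
  synchronization
• Types of searcher
  – Correlator-based and matched-filter-based schemes
  – This design uses a correlator-based serial search with the Double Dwell scheme.
• Local (handset) PN code generator
  – Built from 15 registers.
  – Generator polynomial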
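
A 15-register PN generator can be sketched as a linear feedback shift register. The generator polynomial of the slide is not reproduced in this text, so the sketch uses x^15 + x^14 + 1, a well-known maximal-length polynomial, purely as a stand-in; the function names and the all-ones seed are also assumptions.

```python
MASK_15 = 0x7FFF  # 15 register stages

def lfsr_step(state):
    """Advance a 15-stage Fibonacci LFSR by one chip.

    Stand-in feedback taps on stages 15 and 14 (x^15 + x^14 + 1); this is
    NOT necessarily the polynomial used in the searcher described above.
    """
    new_bit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | new_bit) & MASK_15, new_bit

def pn_sequence(length, state=MASK_15):
    """Return `length` PN chips (0/1) starting from a non-zero seed."""
    chips = []
    for _ in range(length):
        state, chip = lfsr_step(state)
        chips.append(chip)
    return chips

def lfsr_period(seed=MASK_15):
    """Number of steps until the register returns to its seed."""
    state, _ = lfsr_step(seed)
    steps = 1
    while state != seed:
        state, _ = lfsr_step(state)
        steps += 1
    return steps

print(lfsr_period())  # 32767 = 2**15 - 1
```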




                                             69
Operation Flow

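 1 The pilot channel transmitted by the base station is despread with
   the PN code sequence generated in the handset.
 2 The despread result is accumulated Nc times (synchronous accumulation)
   and then passed through the energy computation (squaring).
 3 The energy values are compared with the first threshold; if it is
   exceeded, asynchronous accumulation (Nn) is performed in the next stage.
 4 If not, the PN code sequence is advanced by one chip and the steps
   above are repeated on the incoming signal.
 5 The result of the asynchronous accumulation is compared with the
   second threshold.
 6 If it is exceeded, the search ends; otherwise the PN code sequence
   is advanced by one chip and the steps above are repeated.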
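
The six steps can be sketched as a noise-free behavioural model. Everything specific here is an assumption made for illustration: a short 127-chip sequence from a 7-stage LFSR replaces the 15-register generator so the example runs quickly, a single real-valued channel replaces the I/Q pair, and the thresholds are arbitrary.

```python
def m_sequence_127():
    """127 chips (+1/-1) from a 7-stage LFSR, a short stand-in PN code."""
    state, chips = 0x7F, []
    for _ in range(127):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        chips.append(1 - 2 * bit)
    return chips

def double_dwell_search(rx, pn, nc, nn, theta1, theta2):
    """Return the first PN phase that passes both dwells, or None."""
    n = len(pn)
    for offset in range(n):  # each new offset advances the local PN by a chip
        # Steps 1-2: despread, accumulate nc chips, compute the energy.
        acc = sum(rx[i] * pn[(i + offset) % n] for i in range(nc))
        if acc * acc <= theta1:  # steps 3-4: first threshold not exceeded
            continue
        # Step 3: asynchronous accumulation of nn energy values.
        energy = 0
        for block in range(nn):
            start = block * nc
            acc = sum(rx[i] * pn[(i + offset) % n]
                      for i in range(start, start + nc))
            energy += acc * acc
        if energy > theta2:  # steps 5-6: second threshold
            return offset
    return None

pn = m_sequence_127()
nc, nn, true_offset = 127, 4, 37
rx = [pn[(i + true_offset) % 127] for i in range(nc * nn)]  # ideal pilot
found = double_dwell_search(rx, pn, nc, nn,
                            theta1=nc * nc / 4, theta2=nn * nc * nc / 2)
print(found)  # 37
```

The cheap first dwell rejects wrong phases quickly, so the longer second dwell runs only on promising candidates.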


                                 70
                  Pre-computation


◈ A comparator example: Srinivas Devadas, 1994
◈ Precomputation for external idleness: M. Alidina, 1994
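
The comparator example can be sketched behaviourally, assuming an n-bit unsigned A > B comparator in which the most significant bits are examined first: when they differ the result is already known and the registers holding the remaining bits need not be loaded. The function name and return convention are illustrative.

```python
def precomputed_greater(a, b, n):
    """Return (a > b, low_registers_loaded) for n-bit unsigned a and b."""
    msb = n - 1
    a_msb = (a >> msb) & 1
    b_msb = (b >> msb) & 1
    if a_msb != b_msb:
        # Result decided by the MSBs alone: low-order registers stay disabled.
        return a_msb > b_msb, False
    mask = (1 << msb) - 1
    return (a & mask) > (b & mask), True

# Exhaustive check for 4-bit operands, counting how often the load is avoided.
skipped = 0
for a in range(16):
    for b in range(16):
        result, loaded = precomputed_greater(a, b, 4)
        assert result == (a > b)
        skipped += not loaded
print(skipped, "of", 16 * 16)  # 128 of 256
```

For uniformly distributed inputs the low-order registers are disabled half of the time, which is the source of the power saving.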




                                                       71
Low Power Comparator




                       72
              Three Input ALU
              ( Ovadia Bat-Sheva, 1998 )




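[Figure: two datapaths side by side. Two ALUs Structure: MUL0 and MUL1 produce P0 and P1, which feed an ALU and an ALU/ASU writing acc0 and acc1. Three Input ALU Structure: MUL0 and MUL1 produce P0 and P1, which feed a single 3IALU writing acc1.]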

The three input ALU consumes much less power than an ALU
   and an ASU
A drawback of using a 3I-ALU is the added complexity in
   calculating the carry and overflow.
                                                                       73
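                                            Applying Carry Save Adder and Pre-computation

[Figure: two data flow graphs side by side. In both, the input pairs (RX I, TX I), (RX Q, TX Q), (RX I, TX Q) and (RX Q, -TX I) are XORed and then pass through the synchronous accumulation stage, the energy computation stage (squaring), max value selection, comparison with θ1, the asynchronous accumulation stage (+) and comparison with θ2. The left graph uses two cascaded adders in the synchronous accumulation stage; the right graph uses a CSA.]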
                                                                                                                                                                                      74
                    Rescheduled Data Flow Graph
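[Figure: rescheduled data flow graph.
 Inputs: (RX I, TX I), (RX Q, TX Q), (RX I, TX Q), (RX Q, -TX I), each pair fed to an XOR.
 Stages, top to bottom: CSA (coherent accumulation stage) -> | | -> select max value -> compare with θ1 -> ()2 (energy calculation stage) -> + (non-coherent accumulation stage) -> compare with θ2]

Coherent accumulation stage
  – Uses a Carry Save Adder (or a 3-input ALU)

Threshold comparison
  – Pre-computation is applied

Energy calculation stage
  – The data flow order is changed to reduce the number of multiplications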
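
The three changes on this slide can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the hardware from the slide: the function names are hypothetical, the accumulated values are modelled as plain integers, and θ1 in the rescheduled graph is assumed to be an amplitude threshold whose square equals the energy threshold of the original graph.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) pair.

    The pair satisfies a + b + c == sum + carry. Each output bit depends
    only on the input bits at the same position, so no carry ripples.
    """
    partial_sum = a ^ b ^ c                      # bitwise sum, carries ignored
    carry = ((a & b) | (b & c) | (a & c)) << 1   # majority bits at the next weight
    return partial_sum, carry


def coherent_accumulate(xor_pairs):
    """Accumulate two 1-bit XOR outputs per step with one 3-input addition."""
    acc = 0
    for x0, x1 in xor_pairs:
        # One carry-save step replaces two cascaded two-input additions.
        s, c = carry_save_add(acc, x0, x1)
        acc = s + c  # final carry-propagate add, modelled in software
    return acc


def energy_original(a, b, theta1_energy):
    """Original order: square both branches, select the max, then compare."""
    energy = max(a * a, b * b)  # two multiplications
    return energy if energy > theta1_energy else None


def energy_rescheduled(a, b, theta1_amplitude):
    """Rescheduled order: | |, select the max, compare, then square once.

    Assumption: theta1_amplitude ** 2 equals the original energy threshold.
    """
    peak = max(abs(a), abs(b))
    if peak <= theta1_amplitude:
        return None  # pre-computation: rejected before the multiplier is used
    return peak * peak  # one multiplication, only for surviving candidates
```

Because max(a², b²) equals max(|a|, |b|)², and squaring preserves order for non-negative values, both orderings give the same result while the rescheduled one performs at most one multiplication per candidate.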
                                                                      75
Image Compression




           76
              Link Adaptation Technique
              Adaptive Modulation and Coding


[Figure: throughput (vertical axis) versus C/I (horizontal axis).
 Curves: QPSK, R=1/4; 8PSK, R=1/4; 16QAM, R=1/4; 16QAM, R=1/2.
 Annotations: "Hull of AMC"; "Modulation/Coding transition, 8PSK->16QAM"]
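
The idea of the AMC hull can be sketched in Python as follows. This is a simplified model, not taken from the slide: each mode is treated as usable once C/I reaches a minimum value, and the hull is the best usable mode at each C/I. The C/I thresholds below are hypothetical placeholders chosen only for illustration; the peak throughput is computed as bits per symbol times code rate.

```python
# Bits per symbol for the modulations named on the slide.
BITS_PER_SYMBOL = {"QPSK": 2, "8PSK": 3, "16QAM": 4}

# (modulation, code rate, minimum C/I in dB).
# The modes come from the slide; the C/I thresholds are HYPOTHETICAL.
HYPOTHETICAL_MODES = [
    ("QPSK", 0.25, 0.0),
    ("8PSK", 0.25, 5.0),
    ("16QAM", 0.25, 8.0),
    ("16QAM", 0.5, 12.0),
]


def peak_throughput(modulation, code_rate):
    """Information bits carried per symbol when the mode is decodable."""
    return BITS_PER_SYMBOL[modulation] * code_rate


def select_mode(ci_db, modes=HYPOTHETICAL_MODES):
    """Return the usable (modulation, code_rate) with the highest throughput.

    Returns None when C/I is below the threshold of every mode.
    """
    usable = [(mod, rate) for mod, rate, min_ci in modes if ci_db >= min_ci]
    if not usable:
        return None
    return max(usable, key=lambda mode: peak_throughput(*mode))


def amc_hull(ci_db, modes=HYPOTHETICAL_MODES):
    """Throughput on the hull: the best achievable value at this C/I."""
    mode = select_mode(ci_db, modes)
    return 0.0 if mode is None else peak_throughput(*mode)
```

With these placeholder thresholds, `select_mode(6.0)` picks 8PSK with R=1/4, and raising C/I past the next threshold triggers the 8PSK->16QAM transition marked on the figure.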
                                                  77

								