					           SHARC
  ‘S’uper ‘H’arvard ‘ARC’hitecture



Nagendra Doddapaneni




                 Overview
•Harvard Architecture
•Super Harvard Architecture
•TigerSHARC processor




               Outline
• Background
• Harvard Architecture
  − Why?
  − What?
• Modern CPU Chip Design
• Super Harvard Architecture
• TigerSHARC Processor
                  Background
•von Neumann Architecture
  −Single storage for instructions and data




•Digital Signal Processors
   −Specialized microprocessor designed specifically for
   digital signal processing, generally in real time
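  Why Harvard Architecture?

• von Neumann bottleneck
    (‘memory bound’)
• DSP applications move large amounts of data
  and are hit hardest by it
• In a von Neumann architecture, the CPU is at any moment
  − either fetching an instruction
  − or reading/writing data from/to memory
  but never both at once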


What is Harvard Architecture?
•Physically separate storage and signal pathways for
instructions and data
•The next instruction can be fetched while the current
instruction is executing
•Program memory and data memory can differ in size and word
width (e.g., a small, wide program memory and a larger,
narrower data memory)
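
The gain from separate instruction and data paths can be illustrated with a toy cycle count. This is a minimal sketch under assumed simplifications (each bus moves one word per cycle, no caches, no pipeline stalls); the function names are hypothetical and not taken from the slides.

```python
def von_neumann_cycles(n_instructions, n_data_accesses):
    """One shared bus: instruction fetches and data accesses are serialized."""
    return n_instructions + n_data_accesses


def harvard_cycles(n_instructions, n_data_accesses):
    """Two buses: each cycle can carry one instruction fetch and one data access."""
    return max(n_instructions, n_data_accesses)


# Example workload (assumed): 1000 instructions, 600 of which touch one data word.
vn = von_neumann_cycles(1000, 600)
hv = harvard_cycles(1000, 600)
print(f"von Neumann: {vn} cycles, Harvard: {hv} cycles")
```

In this model the Harvard machine hides every data access behind an instruction fetch, which is the overlap the bullets above describe.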




     Modern CPU Chip Design
• Incorporates features from both architectures
• ‘On-chip’ cache memory is divided into an
  instruction cache and a data cache.
  The CPU accesses these caches in Harvard
  fashion.
• On a cache miss, ‘off-chip’ main memory is
  accessed in von Neumann fashion.
  Main memory is not separated into data and
  instruction sections.
  Super Harvard Architecture
• An instruction cache holds recently executed
  instructions, leaving both the instruction bus
  and the data bus free to fetch operands

• Harvard Architecture + instruction cache =
  Extended Harvard Architecture, or Super
  Harvard Architecture
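
Why the cache matters can be seen from a single FIR filter tap, which needs one instruction plus two operands (a sample and a coefficient). The sketch below is an illustrative model, not a description of any specific chip: it assumes each bus moves one word per cycle and that a cached instruction costs no bus access. The function name is hypothetical.

```python
import math


def cycles_per_tap(buses, instruction_cached):
    """Memory cycles for one FIR tap: 1 instruction + 2 operands."""
    # A cached instruction needs no bus access, so only the operands remain.
    accesses = 2 if instruction_cached else 3
    # Each bus transfers one word per cycle (assumption of this model).
    return math.ceil(accesses / buses)


von_neumann = cycles_per_tap(buses=1, instruction_cached=False)   # 3
harvard = cycles_per_tap(buses=2, instruction_cached=False)       # 2
super_harvard = cycles_per_tap(buses=2, instruction_cached=True)  # 1
print(von_neumann, harvard, super_harvard)
```

With the instruction served from cache, both buses carry operands and the tap completes in one memory cycle in this model.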

            TigerSHARC Processor
• Processor Architecture
• Instruction Parallelism and SIMD Operation
• Integer ALU
• Computational blocks
    −   X and Y Register File
    −   X and Y ALU
    −   Multiplier
    −   Shifter
    −   CLU
• Program Sequencer
• I, J and K buses
• DMA Controller
• Applications

              TigerSHARC
         Processor Architecture

•Three 128-bit data buses
•Two IALUs
•Two Computational Blocks, each with
  − ALU (floating-point and integer)
  − Shifter
  − Multiplier
  − CLU
                 TigerSHARC
Instruction Parallelism and SIMD Operation

• The core can execute one to four 32-bit
  instructions simultaneously, encoded in a
  single instruction line (VLIW).
• Whether instructions can execute in parallel
  depends on
  − the instruction-line resources each requires
  − the source and destination registers used
• Supports SIMD operations by using both
  Computational Blocks in parallel.
• Each Computational Block can execute four
  16-bit or eight 8-bit SIMD computations in
  parallel.
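
The SIMD idea above can be sketched as lane-wise arithmetic on packed words. This is a software illustration only: the lane order (lane 0 in the low bits), the wraparound on overflow, and all function names are assumptions, not documented TigerSHARC behavior.

```python
def pack16(lanes):
    """Pack four 16-bit values into one 64-bit word (lane 0 in the low bits)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack16(word, n=4):
    """Split a packed word back into n 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(n)]


def simd_add16(a, b):
    """Lane-wise 16-bit add; carries never cross a lane boundary."""
    return pack16([(x + y) & 0xFFFF for x, y in zip(unpack16(a), unpack16(b))])


def dual_block_add16(x_a, x_b, y_a, y_b):
    """The same operation issued to both compute blocks (X and Y)."""
    return simd_add16(x_a, x_b), simd_add16(y_a, y_b)


x_sum, y_sum = dual_block_add16(pack16([1, 2, 3, 0xFFFF]), pack16([10, 20, 30, 1]),
                                pack16([5, 5, 5, 5]), pack16([1, 1, 1, 1]))
print(unpack16(x_sum), unpack16(y_sum))
```

One call to `dual_block_add16` performs eight 16-bit additions, four per block, which mirrors how one instruction can drive both blocks.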
                                 TigerSHARC
                                 Integer ALU
• Thirty-one 32-bit general registers + 1 status register
  + 8 dedicated registers for circular buffers
• Performs integer ALU operations and data addressing
• ALU instructions: ADD, SUB, ARS, LRS (right shifts only),
  ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN,
  MAX, CMP
• Status flags: zero (Z), negative (N), overflow (V), carry (C)
• Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE
• Instruction options: unsigned (U), circular buffer (CB),
  bit reverse (BR), computed jump (CJMP)
• Address-related operations: data address generation,
  circular buffers, bit reverse, UREG moves, DAB control
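
Two of the addressing modes listed above, circular buffers and bit reversal, can be sketched in software as follows. This is a generic illustration of the techniques; the function names and parameters are hypothetical and do not reflect the IALU's actual register interface.

```python
def circular_next(address, step, base, length):
    """Advance an address by `step`, wrapping inside [base, base + length)."""
    return base + (address - base + step) % length


def bit_reverse(value, bits):
    """Reverse the low `bits` bits of `value` (the reordering used by FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Walk an 8-entry delay line starting at 0x100 without ever leaving it.
addr = 0x106
for _ in range(3):
    addr = circular_next(addr, 1, base=0x100, length=8)
print(hex(addr))

# Bit-reversed order for an 8-point FFT.
print([bit_reverse(i, 3) for i in range(8)])
```

Doing this wrap or reversal in the address generator means the inner loop of a filter or FFT spends no instructions on index bookkeeping.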
 TigerSHARC Computational Blocks
       X and Y Register File

•Register File Syntax
  −Each block has thirty-two 32-bit data registers
  −Each register can store 4x8-bit, 2x16-bit or
  1x32-bit words
  −Registers can be combined into dual or quad
  groups. These groups can store 8, 16, 32, 40 or
  64-bit words.
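
The different views of one 32-bit register can be sketched with Python's `struct` module. This is an illustration only: the lane order (lowest lane first) and the convention that the higher-numbered register of a dual group holds the upper half are assumptions, and the function names are hypothetical.

```python
import struct


def as_bytes(reg32):
    """View a 32-bit register as four 8-bit words (lowest byte first)."""
    return list(struct.pack("<I", reg32))


def as_shorts(reg32):
    """View a 32-bit register as two 16-bit words (lowest half first)."""
    return list(struct.unpack("<2H", struct.pack("<I", reg32)))


def dual_group(high_reg, low_reg):
    """Combine two 32-bit registers into one 64-bit dual-register value."""
    return ((high_reg & 0xFFFFFFFF) << 32) | (low_reg & 0xFFFFFFFF)


r0 = 0x11223344
print([hex(b) for b in as_bytes(r0)])
print([hex(s) for s in as_shorts(r0)])
print(hex(dual_group(0xAABBCCDD, r0)))
```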
 Volatile registers in each block

• 24 Volatile Data registers in each block
  − XR0 – XR23
  − YR0 – YR23
• 2 ALU summation registers in each block
  − XPR0, XPR1, YPR0, YPR1
• 5 MAC accumulate registers in each block
  − XMR0 – XMR3, YMR0 – YMR3
  − XMR4, YMR4 – Overflow registers
  TigerSHARC
  X and Y ALU
• 2x64-bit input paths
• 2x64-bit output paths
• 8, 16, 32 or 64-bit addition/subtraction
  (fixed-point)
• 32 or 64-bit logical operations (fixed-point)
• 32 or 40-bit floating-point operations

     Sample ALU Instruction
• Example of 16-bit addition:
  XYSR1:0 = R31:30 + R25:24
• Performs the addition in both the X and Y
  Compute Blocks

    TigerSHARC
     Multiplier
• Operates on fixed-point, floating-point and
  complex numbers.
• Fixed-point numbers
    − 32x32-bit with 32 or 64-bit results
    − 4 (16x16-bit) with 4x16 or 4x32-bit results
• Floating-point numbers
    − 32x32-bit with 32-bit result
    − 40x40-bit with 40-bit result
• Complex numbers
    − 32x32-bit with 32-bit result
    − Fixed-point only
• Results stored in the MR registers
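
The fixed-point operand and result widths listed above can be sketched in software as follows. This is a minimal two's-complement model with hypothetical function names; it ignores rounding, saturation and fractional formats, which the slides do not specify.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul32(a, b):
    """Signed 32x32 multiply, full product returned as a 64-bit pattern."""
    return (to_signed(a, 32) * to_signed(b, 32)) & 0xFFFFFFFFFFFFFFFF


def mul16x4(a_lanes, b_lanes):
    """Four independent signed 16x16 multiplies, each kept as 32 bits."""
    return [(to_signed(x, 16) * to_signed(y, 16)) & 0xFFFFFFFF
            for x, y in zip(a_lanes, b_lanes)]


print(hex(mul32(0xFFFFFFFF, 2)))  # -1 * 2 as a 64-bit pattern
print([hex(p) for p in mul16x4([2, 0xFFFF, 3, 4], [3, 2, 3, 0xFFFE])])
```

Keeping the full-width product is what makes long multiply-accumulate chains safe; truncating to the operand width would discard the upper half.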
                       TigerSHARC
                        Multiplier




XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;; //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;

XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;; //40 bit multiply, 32 bit mantissa

  TigerSHARC
    Shifter
• Operates on one 64-bit, one or two 32-bit,
  two or four 16-bit, and four or eight 8-bit
  fixed-point operands
• Shifts and rotates bits
• Bit manipulation operations, like bit set,
  clear, toggle and test
• Bit FIFO operations to support bit streams
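
The bit manipulation operations named above can be sketched on a 32-bit operand as follows. The function names are hypothetical and the code shows the generic operations, not the shifter's instruction syntax.

```python
MASK32 = 0xFFFFFFFF


def bit_set(x, n):
    """Set bit n."""
    return (x | (1 << n)) & MASK32


def bit_clear(x, n):
    """Clear bit n."""
    return x & ~(1 << n) & MASK32


def bit_toggle(x, n):
    """Invert bit n."""
    return (x ^ (1 << n)) & MASK32


def bit_test(x, n):
    """Return True if bit n is set."""
    return bool((x >> n) & 1)


def rotate_left(x, n):
    """Rotate a 32-bit value left by n positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


flags = bit_set(0, 3)
print(bit_test(flags, 3), hex(rotate_left(0x80000001, 1)))
```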
            TigerSHARC CLU
• CLU (Communications Logic Unit) instructions
  support algorithms used in communications
  applications
• Algorithms supported are
  − Viterbi decoding (a minimum-distance decoding
    algorithm)
  − Turbo-code decoding (an iterative, trellis-based
    scheme related to Viterbi decoding)
  − De-spreading for Code Division Multiple Access
    (CDMA) systems (recovering a signal that was spread
    over a wide bandwidth by a pseudo-noise code)
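
De-spreading, the last algorithm listed, can be sketched as a correlation against the pseudo-noise (PN) code. This is a textbook illustration with real-valued +1/-1 chips; the code sequence, function names and single-user setup are assumptions and say nothing about how the CLU instructions are actually encoded.

```python
def spread(bits, code):
    """Spread each data bit (+1 or -1) over len(code) chips."""
    return [b * c for b in bits for c in code]


def despread(chips, code):
    """Correlate each block of chips with the PN code and take the sign."""
    n = len(code)
    recovered = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(x * c for x, c in zip(block, code))
        recovered.append(1 if correlation >= 0 else -1)
    return recovered


pn_code = [1, -1, 1, 1, -1, 1, -1, -1]  # hypothetical 8-chip code
data = [1, -1, -1, 1]
received = spread(data, pn_code)
received[3] = -received[3]              # one chip corrupted in transit
print(despread(received, pn_code))
```

The multiply-and-sum in `despread` is repeated for every symbol, which is why dedicated hardware support for it pays off.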

              TigerSHARC
          Program Sequencer

• Supplies instruction addresses to memory
• The IAB (instruction alignment buffer) caches up to
  five fetched instruction lines waiting to execute
• Extracts an instruction line from the IAB and
  distributes it to the appropriate core component
  for execution
• Determines flow control for instructions like
  JMP and CALL
• Reduces branch delays using branch prediction
  and the BTB (branch target buffer)
[Figure: TigerSHARC architecture at a glance]
TigerSHARC Buses
• DRAM divided into 6 blocks of 4 Mbits
• The 6 blocks connect to four 128-bit-wide internal
  buses through a crossbar connection
• Internal bus architecture provides a total
  memory bandwidth of 32 Gbytes/sec
• Per cycle, the core and I/O can access
  − twelve 32-bit data words
  − four 32-bit instructions
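
The figures above can be cross-checked with a little arithmetic. The sketch below assumes one transfer per bus per cycle and 1 Gbyte = 10^9 bytes; the clock rate it derives is an inference from those assumptions, not a number stated on the slide.

```python
NUM_BUSES = 4
BUS_WIDTH_BITS = 128
WORD_BYTES = 4  # one 32-bit word

# Bytes moved per cycle if every bus transfers once per cycle (assumption).
bytes_per_cycle = NUM_BUSES * BUS_WIDTH_BITS // 8

# The same total expressed in 32-bit words: 12 data words + 4 instructions.
words_per_cycle = bytes_per_cycle // WORD_BYTES

# Clock rate implied by the quoted 32 Gbytes/sec (inferred, not from the slide).
implied_clock_hz = 32e9 / bytes_per_cycle

print(bytes_per_cycle, words_per_cycle, implied_clock_hz / 1e6, "MHz")
```

The sixteen words per cycle match the twelve data words plus four instructions quoted in the last bullet.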

            TigerSHARC
           DMA Controller

• On-chip, with 14 DMA channels
• Provides zero-overhead data transfers
• Operates independently of, and invisibly to,
  the DSP core




TigerSHARC Applications




               Summary

• What is Harvard Architecture?
• What is Super Harvard Architecture?
• TigerSHARC processor architecture
• How is TigerSHARC ‘faster’ for targeted
  DSP applications?


Questions?



 Thank You.



