Introduction to SHARC _13.03.11_





             Introduction to DSP processors

                                      Presented by:


3/13/2011                                                                     1








Contents:
 – The modern processor's classification;
 – The digital signal processing methods & algorithms;
 – The D(igital) S(ignal) P(rocessing) algorithms implementation;
 – The SHARC processor: architecture; data types & formats; C & Assembler;
 – Getting started.





                                                                                  DSP lab



The modern processor's classification.

Today chips fall into three groups:
 – ASICs (Application-Specific Integrated Circuits);
 – Chips with hardware realization of data processing algorithms (microprocessors & microcontrollers);
 – FPGA & EPLD.







The modern processor's classification.

Microprocessors & microcontrollers:
 – General-purpose microprocessors. This kind of processor is intended for computer systems: PCs, workstations & parallel supercomputers.
 – DSP microprocessors. These processors are intended for real-time digital signal processing systems.
 – Microcontrollers. Highly specialized processors intended for embedded systems and various household devices.








Review: Processor Classes

General Purpose - high performance
 – Pentiums, Alphas, SPARC
 – 64-128 bit word size
 – Used for general purpose software
 – Heavyweight OS - UNIX, NT
 – Multiple layers of cache memory
 – Workstations, PCs

Embedded processors and processor cores
 – ARM9, ARC, 486SX, Hitachi SH7000, NEC V800
 – 32 bit word size
 – Single program
 – Lightweight, often real-time OS
 – Code and Data memory cache, DSP support
 – Cellular phones, consumer electronics (e.g. CD players)

DSP processors
 – SHARC, Blackfin, TMS320C55x, TMS320C67x, TMS320C64x
 – 16-32 bit word size
 – Single program
 – Lightweight, often real-time OS
 – Super Harvard Architecture support, MAC, circular buffer, dual-port RAM
 – Audio, image and video processing, coding and decoding, cellular base stations, adaptive filtering, real-time operations

Microcontrollers
 – PIC, AVR, HC11, ARM7, 8051, 80251
 – Extremely cost sensitive
 – Small word size - 8 bit common
 – Highest volume processors by far
 – Automobiles, toasters, thermostats, ...








Review: Processor Classes

[Figure: the four classes stacked in the order General Purpose - high performance, DSP processors, Embedded processors and processor cores, Microcontrollers, with two arrows labeled "Increasing Cost" and "Increasing Volume".]





The modern processor's classification.

[Figure: performance (vertical axis) versus cost (horizontal axis). From lower left to upper right: Microcontrollers, DSP, GP – Embedded, GP – High performance.]








                          Review: Processor Classes

General Purpose - high performance:
 – Pentiums, Alphas, SPARC
 – 64-128 bit word size
 – Used for general purpose software
 – Heavyweight OS - UNIX, NT
 – Multiple layers of cache memory
 – Workstations, PCs








                     Review: Processor Classes
Embedded processors and processor cores:
 – ARM9, ARC, 486SX, Hitachi SH7000, NEC V800
 – 32 bit word size
 – Single program
 – Lightweight, often real-time OS
 – Code and Data memory cache, built-in Flash
 – DSP support
 – Cellular phones, consumer electronics (e.g. CD players)








                     Review: Processor Classes
DSP processors:
 – SHARC, Blackfin, TMS320C55x, TMS320C67x, TMS320C64x
 – 16-32 bit word size
 – Single program
 – Lightweight, often real-time OS
 – Super Harvard Architecture support, MAC, circular buffer, dual-port RAM
 – Audio, image and video processing, coding and decoding, cellular base stations, adaptive filtering, real-time operations








                      Review: Processor Classes

Microcontrollers:
 – PIC, AVR, HC11, ARM7, 8051, 80251
 – Extremely cost sensitive
 – Small word size - 8 bit common
 – Little SRAM data memory, built-in Flash code memory
 – Highest volume processors by far
 – Automobiles, toasters, thermostats, ...






                    The digital signal processing
                      methods & algorithms.
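The analog signal processing example:

[Figure: inverting op-amp low-pass filter with input resistor Ri, feedback resistor Rf and feedback capacitor Cf, input x(t), output y(t); alongside, a frequency response sketch over f with cutoff frequency fc.]

\[ \frac{y(t)}{x(t)} = -\frac{R_f}{R_i} \cdot \frac{1}{1 + j \omega R_f C_f} \]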
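As a rough numerical illustration of the transfer function above, the sketch below evaluates the filter gain at a few frequencies. The component values and the helper name `lowpass_gain` are assumptions made for this example and do not come from the slides. The cutoff fc = 1/(2π·Rf·Cf) is the frequency at which the denominator has magnitude √2.

```python
import math


def lowpass_gain(f_hz, r_i, r_f, c_f):
    """Complex gain y/x = -(Rf/Ri) / (1 + j*w*Rf*Cf) at frequency f_hz."""
    w = 2.0 * math.pi * f_hz
    return -(r_f / r_i) / (1.0 + 1j * w * r_f * c_f)


# Hypothetical component values (not from the slides).
R_I = 10e3   # input resistor, ohms
R_F = 10e3   # feedback resistor, ohms
C_F = 10e-9  # feedback capacitor, farads

# Cutoff frequency: the point where w*Rf*Cf = 1.
F_C = 1.0 / (2.0 * math.pi * R_F * C_F)

if __name__ == "__main__":
    for f in (0.0, F_C, 10.0 * F_C):
        gain = abs(lowpass_gain(f, R_I, R_F, C_F))
        print(f"f = {f:10.1f} Hz   |H| = {gain:.4f}")
```

At f = 0 the gain is simply -Rf/Ri; at the cutoff frequency its magnitude drops by a factor of √2.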






                               The digital signal processing
                                 methods & algorithms.
The digital signal processing system:

x(t) → Anti-aliasing filter → x'(t) → A/D → x'(n) → Digital filter or digital transform → y'(n) → D/A → y'(t) → Smoothing filter → y(t)

Digital filter:
\[ y(n) = \sum_{k=0}^{N} C_k \, x(n-k) \]

[Figure: magnitude response |H(f)| with cutoff frequency fc.]





                               The digital signal processing
                                 methods & algorithms.
Analog signal processing versus Digital signal processing

Analog signal processing:
 – Cheaper;
 – More compact;
 – Lower power dissipation.

Digital signal processing:
 – More accurate;
 – More stable across different environments.






                          The digital signal processing
                            methods & algorithms.
The basic concepts of DSP:

Time sampling:
 x(nT) – the sample taken at time nT;
 T – sampling period;
 f_D – sampling frequency:
 \[ f_D = \frac{1}{T} \quad \text{or} \quad T = \frac{1}{f_D} \]
 Nyquist criterion:
 \[ f_D \ge 2F \quad \text{or} \quad T \le \frac{1}{2F} \]
 F – the highest frequency of the signal.

Amplitude quantization:
 x(nT) ≈ x(n);
 Resolution:
 \[ R = 2^{N} \]
 ε_Q – quantization error:
 \[ \max \varepsilon_Q = 2^{-(N+1)} \]
 N – number of bits.
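The relations above can be tried out numerically. The following is a minimal sketch: the function names are hypothetical, and the quantizer assumes a signal normalized to a full-scale range of 1.0 with rounding to the nearest level, which is what makes the maximum error equal to 2^-(N+1).

```python
def sampling_period(f_d):
    """T = 1 / f_D."""
    return 1.0 / f_d


def satisfies_nyquist(f_d, f_max):
    """True when f_D >= 2F, with F the highest signal frequency."""
    return f_d >= 2.0 * f_max


def resolution(n_bits):
    """R = 2**N quantization levels."""
    return 2 ** n_bits


def max_quantization_error(n_bits):
    """max eps_Q = 2**-(N+1), assuming a full-scale range of 1.0."""
    return 2.0 ** -(n_bits + 1)


def quantize(x, n_bits):
    """Round x to the nearest multiple of the step 2**-N (assumed scheme)."""
    step = 2.0 ** -n_bits
    return round(x / step) * step


if __name__ == "__main__":
    print("T for 48 kHz:", sampling_period(48_000.0))
    print("48 kHz covers a 20 kHz signal:", satisfies_nyquist(48_000.0, 20_000.0))
    print("16-bit resolution:", resolution(16))
    print("8-bit error of 0.3:", abs(quantize(0.3, 8) - 0.3))
```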






                          The digital signal processing
                            methods & algorithms.
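The basic concepts of DSP:

T – sample time;
τ_a – algorithm time.

[Figure: one sample period T divided into the algorithm time τ_a followed by the remaining time τ_del.]

\[ \tau_{del} = T - \tau_a, \qquad \tau_a = \tau_{op} \cdot N, \qquad \tau_{op} = \frac{1}{f_{CPU\,CLK}} \]

where
 τ_op – operation time;
 f_CPU CLK – processor clock frequency;
 N – number of operations in the algorithm.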
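A short sketch of how these timing relations give a real-time budget. The function names and the numbers in the usage lines are assumptions for illustration; as on the slide, one operation is taken to last one clock period.

```python
def operation_time(f_cpu_clk):
    """tau_op = 1 / f_CPU_CLK (one operation per clock period)."""
    return 1.0 / f_cpu_clk


def algorithm_time(n_ops, f_cpu_clk):
    """tau_a = tau_op * N."""
    return n_ops * operation_time(f_cpu_clk)


def remaining_time(f_d, n_ops, f_cpu_clk):
    """tau_del = T - tau_a; a negative value means the deadline is missed."""
    return 1.0 / f_d - algorithm_time(n_ops, f_cpu_clk)


def max_operations(f_d, f_cpu_clk):
    """Largest whole N for which tau_a still fits into one sample period T."""
    return int(f_cpu_clk // f_d)


if __name__ == "__main__":
    f_d = 48_000          # hypothetical sampling frequency, Hz
    f_clk = 40_000_000    # hypothetical processor clock, Hz
    print("operations available per sample:", max_operations(f_d, f_clk))
    print("tau_del with 500 operations:", remaining_time(f_d, 500, f_clk))
```

If τ_del comes out negative, the algorithm cannot keep up with the incoming samples at that clock frequency.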






              The D(igital) S(ignal) P(rocessing)
                 algorithms implementation
The major tasks in DSP:
 – Filter design (Linear Filtering);
 – Speech detection, image recognition (Spectral Analysis);
 – Image & speech compression (Time-Frequency Analysis);
 – Image & signal processing (Adaptive Filtering);
 – Coding, median filters (Non-Linear Processing);
 – Interpolation & decimation (Multirate Processing).






              The D(igital) S(ignal) P(rocessing)
                 algorithms implementation
The most commonly used DSP algorithms:
 – FIR filter
 – IIR filter
 – FFT
 – Solving polynomial equations






             The D(igital) S(ignal) P(rocessing)
                algorithms implementation
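FIR filter (Finite Impulse Response)

[Figure: direct-form FIR structure. The input x(n) is weighted by coefficients b2, b1, b0; the weighted values pass through a chain of Z^-1 delays and adders (Σ) to form the output y(n).]

\[ y(n) = \sum_{i=0}^{N-1} b_i \, x(n-i) \]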
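The FIR sum can be written directly as two nested loops. This is a minimal sketch rather than an optimized implementation; the function name `fir_filter` is hypothetical, and samples before the start of the input are assumed to be zero.

```python
def fir_filter(b, x):
    """y(n) = sum over i of b[i] * x(n - i), taking x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, b_i in enumerate(b):
            if n - i >= 0:
                acc += b_i * x[n - i]  # one multiply-accumulate per tap
        y.append(acc)
    return y


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients themselves,
    # which is why the impulse response is finite.
    print(fir_filter([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))
```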






             The D(igital) S(ignal) P(rocessing)
                algorithms implementation
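IIR filter (Infinite Impulse Response)

[Figure: IIR structure. The input x(n) is weighted by feed-forward coefficients b2, b1, b0; the output y(n) is fed back through coefficients a1, a0 and subtracted at the adders (Σ) placed between the Z^-1 delays.]

\[ y(n) = \sum_{i=0}^{N-1} b_i \, x(n-i) - \sum_{k=1}^{M-1} a_k \, y(n-k) \]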
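The difference equation above translates into code almost literally. In this sketch the name `iir_filter` is hypothetical, `a[0]` is only a placeholder because the feedback sum starts at k = 1, and all samples before n = 0 are assumed to be zero.

```python
def iir_filter(b, a, x):
    """y(n) = sum_i b[i]*x(n-i) - sum_{k>=1} a[k]*y(n-k); a[0] is ignored."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted current and past inputs.
        for i in range(len(b)):
            if n - i >= 0:
                acc += b[i] * x[n - i]
        # Feedback part: weighted past outputs are subtracted.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    # One feedback coefficient already gives a response that never
    # reaches exactly zero: each output is half the previous one.
    print(iir_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))
```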






               The D(igital) S(ignal) P(rocessing)
                  algorithms implementation

Discrete Fourier Transform

\[ X(k) = \frac{1}{N} \sum_{n=0}^{N-1} x(n) \, e^{-\frac{2 \pi i k n}{N}}, \qquad k = 0, \ldots, N-1 \]

\[ W_N^{kn} = e^{-\frac{2 \pi i k n}{N}} \]






               The D(igital) S(ignal) P(rocessing)
                  algorithms implementation
THE COMPLEX DFT

Frequency Domain ⇐ DFT ⇐ Time Domain

\[ X(k) = \frac{1}{N} \sum_{n=0}^{N-1} x(n) \, W_N^{nk} \]

Frequency Domain ⇒ INVERSE DFT ⇒ Time Domain

\[ x(n) = \sum_{k=0}^{N-1} X(k) \, W_N^{-nk} \]
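The transform pair can be checked with a direct, unoptimized computation. The sketch below follows the scaling used on these slides (1/N on the forward transform, none on the inverse); the function names are hypothetical.

```python
import cmath


def twiddle(n_points, exponent):
    """W_N^exponent = exp(-2*pi*i*exponent / N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """X(k) = (1/N) * sum_n x(n) * W_N^(n*k)."""
    n_points = len(x)
    return [
        sum(x[n] * twiddle(n_points, n * k) for n in range(n_points)) / n_points
        for k in range(n_points)
    ]


def idft(spectrum):
    """x(n) = sum_k X(k) * W_N^(-n*k)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * twiddle(n_points, -n * k) for k in range(n_points))
        for n in range(n_points)
    ]


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    spectrum = dft(samples)
    restored = idft(spectrum)
    print([round(value.real, 6) for value in restored])
```

With this scaling X(0) is the mean of the input samples, and applying the inverse transform to the spectrum restores the original sequence up to rounding error.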






                      The D(igital) S(ignal) P(rocessing)
                         algorithms implementation
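THE 8-POINT DFT:

\[ X(k) = \frac{1}{N} \sum_{n=0}^{N-1} x(n) \, W_N^{nk} \]

Written out for N = 8 (the common factor 1/N is omitted in each row):

\[
\begin{aligned}
X(0) &= x(0)W_8^{0} + x(1)W_8^{0} + x(2)W_8^{0} + x(3)W_8^{0} + x(4)W_8^{0} + x(5)W_8^{0} + x(6)W_8^{0} + x(7)W_8^{0} \\
X(1) &= x(0)W_8^{0} + x(1)W_8^{1} + x(2)W_8^{2} + x(3)W_8^{3} + x(4)W_8^{4} + x(5)W_8^{5} + x(6)W_8^{6} + x(7)W_8^{7} \\
X(2) &= x(0)W_8^{0} + x(1)W_8^{2} + x(2)W_8^{4} + x(3)W_8^{6} + x(4)W_8^{8} + x(5)W_8^{10} + x(6)W_8^{12} + x(7)W_8^{14} \\
X(3) &= x(0)W_8^{0} + x(1)W_8^{3} + x(2)W_8^{6} + x(3)W_8^{9} + x(4)W_8^{12} + x(5)W_8^{15} + x(6)W_8^{18} + x(7)W_8^{21} \\
X(4) &= x(0)W_8^{0} + x(1)W_8^{4} + x(2)W_8^{8} + x(3)W_8^{12} + x(4)W_8^{16} + x(5)W_8^{20} + x(6)W_8^{24} + x(7)W_8^{28} \\
X(5) &= x(0)W_8^{0} + x(1)W_8^{5} + x(2)W_8^{10} + x(3)W_8^{15} + x(4)W_8^{20} + x(5)W_8^{25} + x(6)W_8^{30} + x(7)W_8^{35} \\
X(6) &= x(0)W_8^{0} + x(1)W_8^{6} + x(2)W_8^{12} + x(3)W_8^{18} + x(4)W_8^{24} + x(5)W_8^{30} + x(6)W_8^{36} + x(7)W_8^{42} \\
X(7) &= x(0)W_8^{0} + x(1)W_8^{7} + x(2)W_8^{14} + x(3)W_8^{21} + x(4)W_8^{28} + x(5)W_8^{35} + x(6)W_8^{42} + x(7)W_8^{49}
\end{aligned}
\]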






                      The D(igital) S(ignal) P(rocessing)
                         algorithms implementation
\[ X(7) = x(0)W_8^{8} + x(1)W_8^{7} + x(2)W_8^{14} + x(3)W_8^{21} + x(4)W_8^{28} + x(5)W_8^{35} + x(6)W_8^{42} + x(7)W_8^{49} \]

Direct computation of the DFT is basically inefficient because it does not exploit the symmetry and periodicity properties of the phase factor W_N. In particular, these two properties are:

Symmetry property:
\[ W_N^{k+N/2} = -W_N^{k} \]

Periodicity property:
\[ W_N^{k+N} = W_N^{k} \]

\[ X(7) = x(0)W_8^{0} + x(1)W_8^{7} + x(2)W_8^{6} + x(3)W_8^{5} + x(4)W_8^{4} + x(5)W_8^{3} + x(6)W_8^{2} + x(7)W_8^{1} \]
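Both properties, and the reduction of the X(7) exponents, can be verified numerically. This is a small sketch with hypothetical helper names; periodicity is what allows every exponent n·k to be replaced by n·k mod N.

```python
import cmath

N = 8  # transform length used in the 8-point example


def w(exponent, n_points=N):
    """Phase factor W_N^exponent = exp(-2*pi*i*exponent / N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def reduced_exponents(k, n_points=N):
    """Exponents n*k of row X(k), reduced modulo N by periodicity."""
    return [(n * k) % n_points for n in range(n_points)]


if __name__ == "__main__":
    for k in range(N):
        # Symmetry: W^(k + N/2) = -W^k, so the sum of the two is zero.
        assert abs(w(k + N // 2) + w(k)) < 1e-12
        # Periodicity: W^(k + N) = W^k.
        assert abs(w(k + N) - w(k)) < 1e-12
    print("X(7) exponents after reduction:", reduced_exponents(7))
```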





The D(igital) S(ignal) P(rocessing)
algorithms implementation

\[ X(7) = x(0)W_8^{0} + x(1)W_8^{7} + x(2)W_8^{6} + x(3)W_8^{13} + x(4)W_8^{4} + x(5)W_8^{11} + x(6)W_8^{10} + x(7)W_8^{17} \]

\[ X(7) = x(0)W_8^{8} + x(1)W_8^{7} + x(2)W_8^{6} + x(3)W_8^{5} + x(4)W_8^{4} + x(5)W_8^{3} + x(6)W_8^{2} + x(7)W_8^{1} \]

(W_8^{8} = W_8^{0}, W_8^{13} = W_8^{5}, W_8^{11} = W_8^{3}, W_8^{10} = W_8^{2}, W_8^{17} = W_8^{1})

[Figure: 8-point FFT flow graph. Inputs in the order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7); outputs X(0) ... X(7) in natural order. First-stage nodes: x(0) + x(4)W_N^4, x(2) + x(6)W_N^4, x(1) + x(5)W_N^4, x(3) + x(7)W_N^4. Second-stage nodes: x(0) + x(4)W_N^4 + x(2)W_N^6 + x(6)W_N^10 and x(1) + x(5)W_N^4 + x(3)W_N^6 + x(7)W_N^10. Branches carry the factors W_N^0, W_N^2, W_N^4, W_N^6 in the first two stages and W_N^0 ... W_N^7 in the last stage.]




The D(igital) S(ignal) P(rocessing)
algorithms implementation

\[ X(2) = x(0)W_8^{0} + x(1)W_8^{2} + x(2)W_8^{4} + x(3)W_8^{6} + x(4)W_8^{8} + x(5)W_8^{10} + x(6)W_8^{12} + x(7)W_8^{14} \]

\[ X(2) = x(0)W_8^{0} + x(1)W_8^{2} + x(2)W_8^{4} + x(3)W_8^{6} + x(4)W_8^{0} + x(5)W_8^{2} + x(6)W_8^{4} + x(7)W_8^{6} \]

(W_8^{8} = W_8^{0}, W_8^{10} = W_8^{2}, W_8^{12} = W_8^{4}, W_8^{14} = W_8^{6})

Bit reverse operations (input sample index – position in the flow graph):
 000 – 000
 100 – 001
 010 – 010
 110 – 011
 001 – 100
 101 – 101
 011 – 110
 111 – 111

[Figure: the same 8-point FFT flow graph, with inputs in bit-reversed order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7) and outputs X(0) ... X(7), annotated with the branch factors that contribute to X(2).]
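The bit-reversed input order and the butterfly stages can be put together in a few lines. What follows is a minimal sketch of an iterative radix-2 decimation-in-time FFT, not the SHARC library routine; the function names are hypothetical, and the final division by N is included only to match the 1/N scaling of the DFT definition used in these slides.

```python
import cmath


def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index, e.g. 001 -> 100."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(x):
    """Radix-2 FFT of a sequence whose length is a power of two."""
    n_points = len(x)
    n_bits = n_points.bit_length() - 1
    if n_points == 0 or (1 << n_bits) != n_points:
        raise ValueError("length must be a power of two")

    # Stage 0: reorder the input by bit-reversed index.
    data = [complex(x[bit_reverse(i, n_bits)]) for i in range(n_points)]

    # log2(N) butterfly stages on blocks of size 2, 4, ..., N.
    size = 2
    while size <= n_points:
        half = size // 2
        for start in range(0, n_points, size):
            for j in range(half):
                factor = cmath.exp(-2j * cmath.pi * j / size)
                top = data[start + j]
                bottom = data[start + j + half] * factor
                data[start + j] = top + bottom
                # Symmetry W^(j + size/2) = -W^j gives the minus sign.
                data[start + j + half] = top - bottom
        size *= 2

    return [value / n_points for value in data]


if __name__ == "__main__":
    print([bit_reverse(i, 3) for i in range(8)])
    print(fft_radix2([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])[0])
```

For N = 8 this needs three stages of four butterflies each, instead of the 64 multiplications of the direct sum.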








The DSP processor's architecture

Requirements for DSP processors:
 1. High-speed input data, different interface devices;
 2. Wide dynamic range of input data;
 3. Hardware implementation of ADD, MULT & SHIFT. Parallel processing;
 4. Flexible processing (possibility to "jump" from one process to another);
 5. Algorithm regularity (operation "come back").

DSP processor features:
 1. Various high-speed interface ports and timers;
 2. Parallel-access memory architecture;
 3. Three mathematical units: ALU, barrel shifter and multiplier with fast MAC operation (MBR = MBR + Rx * Ry);
 4. Fast handling of loops, branches & interrupts. Special addressing modes;
 5. Circular buffer.
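Two of the features listed above, the MAC operation and the circular buffer, work together in a streaming FIR filter. The sketch below only models the idea in software; the class name `CircularFir` is hypothetical, and on a DSP the pointer wrap-around and the multiply-accumulate are done by hardware rather than by the explicit modulo and addition used here.

```python
class CircularFir:
    """Streaming FIR filter whose delay line is a circular buffer."""

    def __init__(self, coefficients):
        self.b = list(coefficients)
        self.delay = [0.0] * len(self.b)  # past input samples
        self.head = 0                     # position of the newest sample

    def process(self, sample):
        """Accept one input sample and return one output sample."""
        n_taps = len(self.b)
        # Step the pointer back with wrap-around and overwrite the
        # oldest sample; no data is shifted through memory.
        self.head = (self.head - 1) % n_taps
        self.delay[self.head] = sample

        acc = 0.0
        index = self.head
        for b_i in self.b:
            acc += b_i * self.delay[index]  # MAC: acc = acc + b * x
            index = (index + 1) % n_taps    # circular addressing
        return acc


if __name__ == "__main__":
    fir = CircularFir([1.0, 2.0, 3.0])
    print([fir.process(s) for s in (1.0, 0.0, 0.0, 0.0)])
```

Each new sample costs one buffer write and one multiply-accumulate per coefficient, which is why a single-cycle MAC and hardware circular addressing matter for real-time filtering.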








              The DSP processor ‘s architecture.
                   “Traditional” fon Neiman architecture
                    Traditional”

                                                 Address bus
                                 Memory
                                    data &                          CPU
                                  instruction         Data bus




                                   Harvard architecture

                 Program        PM address bus                   DM address bus    Data
                 Memory                               CPU                         Memory
                  instruction                                                     data only
                     only
                                 PM data bus                      DM data bus







            The DSP processor's architecture.

                        Super Harvard architecture

              Program       PM address bus                 DM address bus    Data
              Memory                           CPU                          Memory
              instruction                   Instruction                    data only
                 only        PM data bus       Cache        DM data bus

                                                                            I/O
                                                                         Controller
               This is the SHARC DSP
                processor structure
                                                                           Data





            The DSP processor's architecture.
             SHARC DSP processor structure








                       The DSP processor's architecture.
                             The ADSP-21160 hardware structure.

   [Figure: ADSP-21160 block diagram.
    Core processor: timer, instruction cache (32 x 48-bit), DAG 1 and DAG 2 (8 x 4 x 32 each), program sequencer, data register file (16 x 40-bit), multiplier, barrel shifter, ALU, bus connect (PX).
    Buses: PM address bus (32), DM address bus (32), PM data bus (16/32/40/48/64), DM data bus (32/40/64).
    Dual-ported SRAM: two independent dual-ported memory blocks (block 0, block 1), each with a processor port and an I/O port.
    External port: address bus mux (32), data bus mux (64), multiprocessor interface, host port.
    I/O processor: IOP registers, DMA controller, serial ports (2), link ports (6); IOD bus (64), IOA bus (18).
    JTAG test & emulation.]





                       The DSP processor's architecture.
                                                       ADSP-21160 Features
                                    100 MHz, 600 MFLOPS SIMD core
                                                1024-point complex FFT benchmark: 90 us
                                    4 Mbits of on-chip SRAM
                                    14 zero-overhead DMA channels
                                    Sustained 700 Mbyte/s over the IOP bus
                                    Two 50 Mbit/s synchronous serial ports
                                    Six 100 Mbyte/s link ports
                                    64-bit synchronous external port
                                    Cluster multiprocessing support






                   The digital signal processing
                     methods & algorithms.
        Methods for measuring computer performance

       Peak (technical) performance of a microprocessor:
    The maximum theoretical speed of the microprocessor under ideal conditions. It is
     defined by the number of computing operations performed per unit of time.

      Real (sustained) performance of a microprocessor:
  The actual speed of the microprocessor under real conditions. It is
       measured by running common benchmark programs
                     (such as FIR, IIR or FFT).
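
The two definitions can be illustrated with a minimal C sketch. The 100 MHz, 600 MFLOPS and 90 us figures are taken from the ADSP-21160 feature list; the value of 6 floating-point operations per cycle is only inferred from them, and the 5 * N * log2(N) operation count for a complex FFT is a common convention assumed here, not something the slides state. All function names are hypothetical.

```c
/* Peak vs. sustained performance: a minimal sketch under stated assumptions. */

/* Peak MFLOPS = clock (MHz) x floating-point operations per cycle. */
double peak_mflops(double clock_mhz, double flops_per_cycle)
{
    return clock_mhz * flops_per_cycle;
}

/* Sustained MFLOPS = operations actually performed / run time.
   Operations divided by microseconds gives millions of operations per second. */
double sustained_mflops(double operations, double time_us)
{
    return operations / time_us;
}

/* ASSUMPTION: operation count for an N-point complex FFT is 5 * N * log2(N).
   N is expected to be a power of two. */
double fft_flop_estimate(unsigned n)
{
    unsigned log2n = 0;
    while ((1u << log2n) < n)
        log2n++;
    return 5.0 * (double)n * (double)log2n;
}
```

With these assumptions, peak_mflops(100.0, 6.0) gives the quoted 600 MFLOPS, while sustained_mflops(fft_flop_estimate(1024), 90.0) comes out somewhat lower, which is the usual relation between peak and sustained figures.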





              The DSP processor's architecture.
       Pipeline command execution:
            Instruction fetching (a);
            Decoding (b);
            Execution (c).
                    n-1 operation
               a          b          c
                                n operation
                         a           b              c
                                              n+1 operation
                                    a              b          c
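
The overlap shown in the diagram can be sketched in C as a cycle count. This is a simplified model that assumes one instruction enters the pipeline per cycle and that there are no stalls; the function names are hypothetical.

```c
/* Cycle count for a simple 3-stage pipeline (fetch, decode, execute),
   assuming one instruction enters per cycle and no stalls occur. */
#define PIPELINE_STAGES 3

unsigned long pipelined_cycles(unsigned long n_instructions)
{
    if (n_instructions == 0)
        return 0;
    /* The first instruction needs all stages; each later one finishes
       one cycle after its predecessor. */
    return PIPELINE_STAGES + (n_instructions - 1);
}

/* Without overlap every instruction occupies all stages on its own. */
unsigned long unpipelined_cycles(unsigned long n_instructions)
{
    return PIPELINE_STAGES * n_instructions;
}
```

For the three operations n-1, n and n+1 in the diagram, the pipelined count is 5 cycles against 9 without overlap.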





                  The DSP processor's architecture.
   DSP processors with fixed and floating point.
       Floating-point advantages:
              Higher accuracy;
              Wide dynamic range;
              Data overflow is far less of a problem;
              Friendly to the C compiler.

       Fixed-point advantages:
              Cheaper;
              Compact.

       Fixed versus floating:
              Fixed-point arithmetic operations are simpler to implement in hardware;
              A floating-point DSP processor has more data types and commands.
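
The overflow point can be illustrated with a small fixed-point sketch in C. The Q15 format (16-bit signed fractions, where 32768 stands for 1.0) is an assumption chosen for illustration and is not mentioned in the slides; the function names are hypothetical.

```c
#include <stdint.h>

/* Saturating add of two Q15 values. In fixed point the programmer (or the
   hardware) must handle overflow explicitly; 0.75 + 0.5 does not fit. */
int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* widen so the sum cannot wrap */
    if (sum > INT16_MAX) return INT16_MAX;   /* clamp positive overflow */
    if (sum < INT16_MIN) return INT16_MIN;   /* clamp negative overflow */
    return (int16_t)sum;
}

/* Q15 multiply: the 32-bit product is in Q30, so divide by 2^15
   (integer division truncates toward zero). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = ((int32_t)a * (int32_t)b) / 32768;
    if (product > INT16_MAX) return INT16_MAX;  /* only -1.0 * -1.0 reaches this */
    return (int16_t)product;
}
```

In floating point the same sum, 0.75f + 0.5f, is simply 1.25f, which is why overflow is rarely a concern there.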




                         Data types & formats.
  Data types in DSP processor algorithms:
     Integer (loop counters, coefficients and array indices);
     Real (input & output data);
     Complex (applications in the frequency domain);
     Logic (bitwise operations).

  Data formats in DSP processors:
     Byte: 8 bits;
     Short word: 16 bits;
     Normal word: 32 bits;
     Extended normal word: 40 bits;
     Instruction word: 48 bits;
     Long word: 64 bits.




                       Data types & formats.
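Dynamic range:

$$\mathrm{DynR} = \frac{\text{max value}}{\text{min value} \neq 0}$$

or in dB:

$$\mathrm{DynR}\,(\mathrm{dB}) = 20 \log_{10}\left(\frac{\text{max value}}{\text{min value} \neq 0}\right)$$

maximum linearity error (b is the data width):

$$\log_2\left(\frac{\text{max value}}{\text{max quantization error}}\right)$$

max. precision bits:

$$2^{-b}$$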
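
As a worked instance of the dynamic range definition, the C sketch below evaluates it for a signed integer word. It assumes max value = 2^(bits-1) - 1 and min nonzero value = 1, and it approximates the dB figure with the constant 20 * log10(2), about 6.02 dB per bit; the function names are hypothetical.

```c
/* 20 * log10(2): each additional magnitude bit adds about 6.02 dB. */
#define DB_PER_BIT 6.020599913

/* Ratio max value / min nonzero value for a signed integer word.
   Assumes 2 <= bits <= 32, max = 2^(bits-1) - 1 and min nonzero = 1. */
double dyn_range_ratio(unsigned bits)
{
    double max_value = 1.0;
    for (unsigned i = 1; i < bits; i++)
        max_value *= 2.0;
    return (max_value - 1.0) / 1.0;
}

/* Approximate dynamic range in dB, using 2^(bits-1) in place of
   2^(bits-1) - 1 so that no logarithm is needed. */
double dyn_range_db_approx(unsigned bits)
{
    return DB_PER_BIT * (double)(bits - 1);
}
```

For a 16-bit short word this gives a ratio of 32767, or roughly 90 dB; a 32-bit normal word reaches roughly 187 dB.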





                       SHARC instruction set

                SHARC programming model.

                SHARC assembly language.

                SHARC data operations.

                SHARC flow of control.






                SHARC programming model

                Register files:
                  R0-R15 (aliased as F0-F15 for floating point)


                Status registers.

                Loop registers.

                Data address generator registers.








                 SHARC assembly language

            Algebraic notation terminated by semicolon:

              R1=DM(M0,I0), R2=PM(M8,I8); // comment
              label: R3=R1+R2;




   data memory access               program memory access






                     SHARC instruction set

Hardware realization of
 program functions:

     ALU (32 bits);
     Multiplier (32 bits);
     MAC (80 bits);
     Shifter (32 bits);
     Register file.







                  Sample ALU Instructions
             Rn = Rx + Ry                   Fn = Fx + Fy
             Rn = Rx - Ry                   Fn = Fx - Fy
             Rn = Rx + Ry + CI (Carry In)   Fn = ABS(Fx + Fy)
             Rn = Rx - Ry + CI - 1          Fn = ABS(Fx - Fy)
             Rn = (Rx + Ry)/2               Fn = (Fx + Fy)/2
             COMP(Rx, Ry)                   COMP(Fx, Fy)
             Rn = Rx + CI - 1               Fn = -Fx
             Rn = Rx + 1                    Fn = ABS Fx
             Rn = Rx - 1                    Fn = PASS Fx
             Rn = -Rx                       Fn = RND Fx
             Rn = ABS Rx                    Fn = SCALB Fx BY Ry
             Rn = PASS Rx                   Rn = MANT Fx
             Rn = Rx AND Ry                 Rn = LOGB Fx
             Rn = Rx OR Ry                  Rn = FIX Fx BY Ry
             Rn = NOT Rx                    Fn = FLOAT Rx BY Ry
             Rn = MIN(Rx, Ry)               Rn = TRUNC Fx
             Rn = MAX(Rx, Ry)               Fn = RECIPS Fx



           MAC instructions -- mainly INTEGER
                Multiply and Accumulate

               Rn = Rx * Ry                 MRF = Rx * Ry
               MRB = Rx * Ry                Rn = MRF + Rx * Ry
               Rn = MRB + Rx * Ry           MRF = MRF + Rx * Ry
               MRB = MRB + Rx * Ry          Rn = MRF - Rx * Ry
               Rn = MRB - Rx * Ry           MRF = MRF - Rx * Ry
               MRB = MRB - Rx * Ry          Rn = SAT MRF
               Rn = SAT MRB                 MRF = SAT MRF
               MRB = SAT MRB                Rn = RND MRF
               Rn = RND MRB                 MRF = RND MRF
               MRB = RND MRB                MR = Rn
               Rn = MR                      Fn = Fx * Fy (floating point)
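
The accumulate and saturate forms can be modelled loosely in C. This is only a sketch: int64_t stands in for the wider multiplier result register, the saturation rule is simplified to a clamp to the 32-bit register range, and the names mr_t, mac and sat32 are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the multiplier result register (MRF or MRB).
   ASSUMPTION: 64 bits are enough for the illustration; the real
   accumulator is wider. */
typedef int64_t mr_t;

/* MRF = MRF + Rx * Ry */
mr_t mac(mr_t mr, int32_t rx, int32_t ry)
{
    return mr + (int64_t)rx * (int64_t)ry;
}

/* Rn = SAT MRF, simplified: clamp the accumulator to the 32-bit range. */
int32_t sat32(mr_t mr)
{
    if (mr > INT32_MAX) return INT32_MAX;
    if (mr < INT32_MIN) return INT32_MIN;
    return (int32_t)mr;
}
```

Accumulating in the wide register and saturating once at the end avoids losing intermediate sums that temporarily exceed 32 bits.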








          Shifter Instructions -- mainly integer

Rn = LSHIFT Rx BY Ry/<data8>
Rn = Rn OR LSHIFT Rx BY Ry/<data8>
Rn = ASHIFT Rx BY Ry/<data8>
Rn = ROT Rx BY Ry/<data8>
Rn = BCLR Rx BY Ry/<data8>
Rn = BSET Rx BY Ry/<data8>
Rn = BTGL Rx BY Ry/<data8>
BTST Rx BY Ry/<data8>
Rn = Rn OR FDEP Rx BY Ry/<bit6>:<len6> (SE)
Rn = FEXT Rx BY Ry/<bit6>:<len6> (SE)
Rn = EXP Rx (EX)           Rn = LEFTZ Rx
Rn = LEFTO Rx              Rn = FPACK Fx
Fn = FUNPACK Rx

             FPACK is a cast and means (32bit -> 16bit) Fx
            FUNPACK is a cast and means (16bit -> 32bit) Rx
            BUT WITH A LOT OF HIDDEN STUFF TOO!








                                        Flag operations

            ALU operations set:
               AZ (zero),
               AN (negative),
               AV (overflow),
               AC (fixed-point carry),
               AI (floating-point invalid),
               AF (last ALU operation was floating-point).
            Multiplier operations set:
               MN (negative),
               MV (overflow),
               MU (underflow),
               MI (floating-point invalid).
            Shifter operations set:
               SV (overflow),
               SZ (zero),
               SS (sign).

            Examples:
               Fixed-point: -1 + 1 = 0:
                  AZ = 1, AN = 0, AV = 0, AC = 1, AI = 0, AF = 0.
               Fixed-point: -2 * 3 = -6:
                  MN = 1, MV = 0, MU = 1, MI = 0.
               LSHIFT 0x7fffffff BY 3:
                  SV = 1, SZ = 0, SS = 0.
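
The fixed-point ALU flags for the -1 + 1 case can be reproduced with a simplified C model. The sketch covers only AZ, AN, AV and AC for a 32-bit add; the struct and function names are hypothetical, and the real ALU has more modes (saturation, for instance) than are modelled here.

```c
#include <stdint.h>

typedef struct {
    int az;  /* result is zero */
    int an;  /* result is negative */
    int av;  /* signed overflow */
    int ac;  /* carry out of bit 31 */
} alu_flags;

/* Rn = Rx + Ry, with flags derived from the 32-bit result. */
int32_t alu_add(int32_t rx, int32_t ry, alu_flags *f)
{
    uint32_t ux = (uint32_t)rx, uy = (uint32_t)ry;
    uint64_t wide = (uint64_t)ux + (uint64_t)uy;  /* keeps the carry in bit 32 */
    uint32_t ur = (uint32_t)wide;

    f->az = (ur == 0);
    f->an = (int)((ur >> 31) & 1u);
    /* Overflow: both operands share a sign that the result does not. */
    f->av = (int)(((~(ux ^ uy) & (ux ^ ur)) >> 31) & 1u);
    f->ac = (int)(wide >> 32);
    return (int32_t)ur;  /* two's-complement reinterpretation */
}
```

Adding -1 (0xFFFFFFFF) and 1 produces a zero result with a carry out of the top bit, which is why the slide lists AZ = 1 and AC = 1 with AN = 0 and AV = 0.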








                     Multifunction computations

        Can issue some computations in parallel:

                – dual add-subtract;

                – fixed-point multiply/accumulate and add, subtract;

                – floating-point multiply and ALU operation.






                 Example Multi-Function
                      Instruction
             f11=f1*f7, f3=f9+f14, f9=f9-f14, dm(i2,m0)=f13,
             f7=pm(i8,m8);

                 In a Single Cycle the SHARC Performs:
                   1 (2) Multiply
                   1 (2) Addition
                   1 (2) Subtraction
                   1 (2) Memory Read
                   1 (2) Memory Write
                   2 Address Pointer Updates
                 Plus the I/O Processor Performs:
                   Active Serial Port Channels (2 Transmit, 2 Receive)
                   Active Link Ports (6)
                   Memory DMA
                   2 DMA Pointer Updates




                        SHARC load/store
              Load/store architecture: no memory-direct
              operations.
              Two data address generators (DAGs):
                data memory;
                program memory.
              Must set up DAG registers to control
              loads/stores.

            Provide indexed, modulo, bit-reverse indexing.
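
Bit-reverse indexing is typically used to reorder FFT data. The DAG does this in hardware; the C sketch below is only a software model of the index calculation, with a hypothetical function name.

```c
/* Reverse the lowest `bits` bits of `index`
   (for example bits = 3 for an 8-point FFT). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);  /* move the lowest bit across */
        index >>= 1;
    }
    return reversed;
}
```

For an 8-point transform, indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.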






                        BASIC addressing


       Immediate value:
            r0 = DM(0x20000000);
       Direct load:
            r0 = DM(_a); // Loads contents of _a
       Direct store:
            DM(_a)= r0; // Stores R0 at _a








               The DSP processor's architecture.
                         Circular buffer








                            DAGs registers

            I0             M0               L0              B0
            I1             M1               L1              B1
            I2             M2               L2              B2
            I3             M3               L3              B3

            I4             M4               L4              B4
            I5             M5               L5              B5
            I6             M6               L6              B6
            I7             M7               L7              B7





                 SHARC assembly language
            The I register holds the start address.
            The M register (or an immediate) holds the modifier value.
                 r0 = DM(I3,M3); // Load
                 DM(I2,1) = r1;  // Store

            Circular buffer: the I register holds the current index into the buffer,
            B holds the buffer base address and L holds the buffer length.
            Two data values can be transferred to/from
            memory per cycle:
                 f0 = DM(I0,M0), f1 = PM(I9,M8);

            The compiler allows the programmer to specify which
            memory (DM or PM) each value is stored in.




                  SHARC assembly language


    M6 = 1;
    R0 = dm(I4, M6); // post-modify
    // means: R0 = dm(I4), and then I4 = I4 + M6
    // However:
    R0 = dm(M6, I4); // offset index only
    // means: R0 = dm(M6 + I4), and I4 is left unchanged








                  SHARC assembly language
                   Post-incrementing and Offset
   B4 = 4000;
   L4 = 0; // length 0: circular buffering disabled
   I4 = 4002;
   M6 = 1;

   R0 = dm(M6, I4); // offset index only
   R1 = dm(M6, I4); // offset index only
      // means R0 = dm(4002 + 1) and R1 = dm(4002 + 1)
      // with I4 = 4002 still unchanged at the end of the code

   R0 = dm(I4, M6); // post-modify
   R1 = dm(I4, M6); // post-modify
      // means R0 = dm(4002) and R1 = dm(4003)
      // with I4 = 4004 at the end of the code
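
The difference between the two addressing forms can be modelled in C with a toy index register over a small array. Addresses here are plain array indices rather than real SHARC addresses, and the type and function names are hypothetical.

```c
/* Toy model of one DAG index register. */
typedef struct {
    int i;  /* index register (I4 in the slide) */
} dag_reg;

/* R = dm(M, I): offset (pre-modify) access; I is left unchanged. */
int load_premodify(const int *mem, const dag_reg *dag, int m)
{
    return mem[dag->i + m];
}

/* R = dm(I, M): post-modify access; read at I, then I = I + M. */
int load_postmodify(const int *mem, dag_reg *dag, int m)
{
    int value = mem[dag->i];
    dag->i += m;
    return value;
}
```

Calling load_premodify twice returns the same element both times, while two calls to load_postmodify walk through consecutive elements and leave the index advanced by two.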




                  SHARC assembly language
                    Circular buffer implementation
            B4 = 4000;
            L4 = 3;
            I4 = 4002;
            M6 = 1;

            R0 = dm(M6, I4); // offset index only
            R1 = dm(M6, I4); // offset index only
            // means R0 = dm(4002 + 1) and R1 = dm(4002 + 1)
            // with I4 = 4002 still unchanged

            R0 = dm(I4, M6); // post-increment
            R1 = dm(I4, M6); // post-increment
            // means R0 = dm(4002); I4 + M6 = 4003 is past the end of the buffer,
            // so I4 wraps to 4003 - 3 = 4000; then R1 = dm(4000) with I4 = 4001
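
The wrap-around of the index register can be sketched in C as follows. This is a simplified software model that assumes the modifier is no larger in magnitude than the buffer length, so a single correction is enough; the struct and function names are hypothetical.

```c
typedef struct {
    int i;  /* index  (I4) */
    int b;  /* base   (B4) */
    int l;  /* length (L4); 0 disables wrapping */
} circ_dag;

/* Post-modify access: return the current address, then advance I by m
   and wrap it back into the range [b, b + l). Assumes |m| <= l. */
int circ_next_address(circ_dag *dag, int m)
{
    int address = dag->i;
    dag->i += m;
    if (dag->l > 0) {
        if (dag->i >= dag->b + dag->l)
            dag->i -= dag->l;   /* ran past the end: wrap to the start */
        else if (dag->i < dag->b)
            dag->i += dag->l;   /* ran before the base: wrap to the end */
    }
    return address;
}
```

With B = 4000, L = 3 and I = 4002, two successive calls with m = 1 return addresses 4002 and 4000 and leave I at 4001, matching the slide. With L = 0 the same calls return 4002 and 4003.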





                    Example: C assignments

                   C:
                    x = (a + b) - c;

                   Assembler:
                    r0 = DM(_a) // Load a
                    r1 = DM(_b); // Load b
                    r3 = r0+r1;
                    r2 = DM(_c); // Load c
                    r3 = r3-r2;
                    DM(_x) = r3; // Store result in x








               Example: C assignments

              C:
               y = a*(b+c);

              Assembler:
               r1 = DM(_b); //   Load b
               r2 = DM(_c); //   Load c
               r2 = r1 + r2;
               r0 = DM(_a); //   Load a
               r2 = r2*r0;
               DM(_y) = r2; //   Store result in y







               Example: C assignments

              Shorter version using pointers:
            // Load b, c
              r2 = DM(I1,M5), r1 = PM(I8,M13);
            // Add b and c, loading a in parallel with the addition
              r0 = r2+r1, r12 = DM(I0,M5);
              r8 = r12*r0;
              DM(I0,M5)= r8; // Store in y








                          Example: C assignments

                        C:
                        z = (a << 2) |   (b & 15);

                        Assembler:
                        r0 = DM(_a); // Load a
                        r0 = LSHIFT r0 by 2; // Left shift
                        r1 = DM(_b), r3 = 15;// Load immediate
                        r1 = r1 AND r3;
                        r0 = r1 OR r0;
                        DM(_z) = r0;







                                SHARC jump

            Unconditional flow of control change:
            JUMP label;


            Three addressing modes:
            – direct;

            – indirect;

            – PC-relative.







                   Example: C if statement
   C:
       if (a > b)
         y = c + d;
       else y = c - d;

   Assembler:
      // if condition
         r0 = DM(_a);
         r1 = DM(_b);
         COMP(r0,r1); // Compare
         IF GT JUMP label;
      // False block
         r0 = DM(_c);
         r1 = DM(_d);
         r1 = r0 - r1;
         DM(_y)= r1;
         JUMP other; // Skip true block
      // True block
      label: r0 = DM(_c);
         r1 = DM(_d);
         r1 = r0 + r1;
         DM(_y) = r1;
      other: // Code after if




                  The best if implementation
  C:
      if (a > b)
        y = c + d;
      else y = c - d;

  Assembler:
      // Load values
        r1 = DM(_a), r2 = PM(_b);
        r3 = DM(_c), r4 = PM(_d);
      // Compute both sum and difference
        r12 = r3 + r4, r0 = r3 - r4;
      // Choose which one to save
        comp(r1,r2);       // Compare a with b
        if GT r0 = r12;    // a > b: keep the sum
        dm(_y) = r0;       // Write to y
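
The same idea, computing both results and then selecting one instead of branching, can be written in C as a minimal sketch. The function name is hypothetical, and whether a given compiler turns the selection into a conditional move is not guaranteed.

```c
/* Compute both candidate results, then select one, mirroring the
   conditional register move in the assembly version. */
int select_sum_or_diff(int a, int b, int c, int d)
{
    int sum  = c + d;   /* corresponds to r12 = r3 + r4 */
    int diff = c - d;   /* corresponds to r0  = r3 - r4 */
    return (a > b) ? sum : diff;
}
```

Because there is no jump, the instruction pipeline is not disturbed by a taken branch, at the cost of always computing both results.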








                         DO UNTIL loops

  DO UNTIL instruction provides efficient looping:
                  LCNTR = 30, DO label UNTIL LCE;
                  r0 = DM(I0,M0), f2 = PM(I8,M8);
                  r1 = r0 - r15;
  label:          f4 = f2 + f3;




Loop length (16 bit) Last instruction in loop Termination condition

         The SHARC processor allows up to six nested loops








                       Example: FIR filter

                              C:
                               for (i=0, y=0; i<N; i++)
                                 y = y + c[i]*x[i];








                          FIR filter assembler
            // setup
              I0 = _c; I8 = _x; // c[0] (DAG1), x[0] (DAG2)
              r12 = 0;          // y = 0;
              M0 = 1; M8 = 1;   // Set up increments
            // Loop body
              LCNTR = N, DO loopend UNTIL LCE;
            // Use post-increment mode
              r1 = DM(I0,M0), r2 = PM(I8,M8);
              r8 = r1 * r2 (uui);
            loopend: r12 = r12 + r8;







                  Example: C main + ASM function
             C:
              int dm c[4] = {1,2,3,4};
              int pm x[7] = {1,2,3,4,5,6,7};

              int dm y;

              extern   int fir(int dm *,int pm *);

              //main
              void main()
              {
                y = fir(c,x);
              }
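
As a cross-check for the assembly routine, a plain-C reference for the same sum of products might look like the sketch below. It uses standard C only (no dm/pm qualifiers), the name fir_reference is hypothetical, and the tap count of 4 is taken from the size of c.

```c
/* Plain-C reference for the sum of products that the assembly fir
   routine is expected to compute over the first n taps. */
int fir_reference(const int *c, const int *x, int n)
{
    int y = 0;
    for (int i = 0; i < n; i++)
        y += c[i] * x[i];
    return y;
}
```

With c = {1,2,3,4} and x = {1,2,3,4,5,6,7}, fir_reference(c, x, 4) evaluates 1*1 + 2*2 + 3*3 + 4*4 = 30, which is the value y should hold after the call in main.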




             Example: C main + ASM function
                 Assembler:
              #include <asm_sprt.h>
              .SEGMENT/PM seg_pmco;
              .global _fir;
              .extern _c, _x, _y;
              _fir: entry;
              // setup
                 I0=_c; I8=_x; // c[0](DAG1),x[0](DAG2)
              // or I0 = r4, I8 = r8
                 r12 = 0;             // y = 0;
                 M0=1; M8=1; // Set up increments
              // Loop body
                 LCNTR = 4, DO loopend UNTIL LCE;
                 r1 = DM(I0,M0), r2 = PM(I8,M8);
                 r3 = r1 * r2 (ssi);
              loopend: r12 = r12 + r3;
              r0 = r12;       // or dm(_y)=r12;
                 exit;
              _fir.end:
              .endseg;




            Example: Using MAC operation
                Assembler:
              #include <asm_sprt.h>
              .SEGMENT/PM seg_pmco;
              .global _fir;
              .extern _c, _x, _y;
              _fir:    entry;
               //setup
                  I0=_c; I8=_x; // c[0](DAG1),x[0](DAG2)
               //or I0 = r4, I8 = r8
                  MRF = 0;             // clear the MAC accumulator (y = 0)
                  M0=1; M8=1; // Set up increments
               //Loop body
                  LCNTR = 4, DO loopend UNTIL LCE;
                  r1 = DM(I0,M0), r2 = PM(I8,M8);
              loopend: MRF = MRF + r1 * r2 (ssi);
                  r0 = MR0F;
                  exit;
              _fir.end:
              .endseg;
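
The multiply-accumulate version can be pictured in C as a running sum kept in a wider accumulator, with only the low word returned at the end. This is a sketch under assumptions: `fir_mac` is a hypothetical name, and a 64-bit integer merely stands in for the MRF register, which is wider on the real processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the MAC loop: MRF = MRF + r1 * r2 (ssi). */
int32_t fir_mac(const int32_t *c, const int32_t *x, size_t n)
{
    assert(c != NULL && x != NULL);
    int64_t mrf = 0;                      /* accumulator must start at zero */
    for (size_t i = 0; i < n; i++)
        mrf += (int64_t)c[i] * x[i];      /* signed integer multiply-accumulate */

    /* r0 = MR0F: keep the low 32 bits of the accumulator. For sums that
     * fit in 32 bits this is simply the sum itself; for larger sums the
     * conversion result is implementation-defined in C. */
    return (int32_t)(mrf & 0xFFFFFFFF);
}
```

The point of the wide accumulator is that intermediate sums cannot overflow while the loop runs, and the model also makes visible why the accumulator has to be cleared before the first pass.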
                Example: C main + ASM function
                     (work with STACK)
                int a,b,c,d,e,f;

                extern int asm_proc( int a, int b, int c,
                  int d, int e );

                void main()
                {
                  a = 0xAAAAAA;
                  b = 0xBBBBBB;
                  c = 0xCCCCCC;
                  d = 0xDDDDDD;
                  e = 0xEEEEEE;
                  f = asm_proc(a,b,c,d,e);
                }

                Example: C main + ASM function
                     (work with STACK)
#include "asm_sprt.h"
.SEGMENT/PM     seg_pmco;
.GLOBAL _asm_proc;
_asm_proc:
start:
// m7 = -1 (compiler definition)
// m6 = 1 (compiler definition)
// i6 - C sp (stack pointer), saved in r15 below
// i7 - asm sp (stack pointer)
   r15 = i6;          // save C sp
   i2 = r15;
   modify(i2,m6);
   r0 = r4;           // first argument
   r1 = r8;           // second argument
   r2 = r12;          // third argument
   r3 = dm(i2,m6);    // C sp + 2 (fourth argument place)
   r4 = dm(i2,m6);    // C sp + 3 (fifth argument place)
   r5 = 0x555555;
   r0 = r0 + r5;      // r0 = return()
   exit;
_asm_proc.end:
.endseg;
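
The argument passing used above can be mimicked in plain C to check the expected return value on a PC. This is only a sketch: `call_frame`, `pass_args` and `asm_proc_model` are hypothetical names, and the two-slot `stack` array is a simplification that does not reproduce the real stack offsets (C sp + 2, C sp + 3).

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_WORDS = 8 };

/* Where the caller leaves the five arguments. */
typedef struct {
    int r4, r8, r12;          /* first three arguments travel in registers */
    int stack[STACK_WORDS];   /* remaining arguments travel on the stack   */
    int nstack;               /* number of stack words in use              */
} call_frame;

/* Caller side: a, b, c go to r4, r8, r12; d and e go to the stack. */
call_frame pass_args(int a, int b, int c, int d, int e)
{
    call_frame f = { a, b, c, {0}, 0 };
    f.stack[f.nstack++] = d;
    f.stack[f.nstack++] = e;
    return f;
}

/* Callee side: same data flow as the assembler routine. */
int asm_proc_model(const call_frame *f)
{
    assert(f != NULL && f->nstack >= 2);
    int r0 = f->r4;          /* r0 = r4 */
    int r1 = f->r8;          /* r1 = r8 */
    int r2 = f->r12;         /* r2 = r12 */
    int r3 = f->stack[0];    /* fourth argument, read from memory */
    int r4 = f->stack[1];    /* fifth argument, read from memory  */
    (void)r1; (void)r2; (void)r3; (void)r4;   /* loaded but not used */
    int r5 = 0x555555;
    return r0 + r5;          /* r0 holds the return value */
}
```

With the values from the C main (a = 0xAAAAAA), the model returns 0xAAAAAA + 0x555555 = 0xFFFFFF, which is the value expected in `f`.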






                  Special instructions to handle “C”
CJUMP -- calling a “C”-compatible subroutine
 – Processor architecture customized for C
 – Replaces 3 instructions for faster operation
 – Difficult to use in ENCM515
   • Assembly code will rarely call other subroutines (95% will not), so there is
     little reason to bother, since such calls are slow
RFRAME -- returning to “C” environment
 – Processor architecture customized for C
 – Part of MAGIC lines of code
 – See reference card





                      “C” interface to assembly code
       C/ASSEMBLY LANGUAGE INTERFACE
              Special Purpose Registers – usage predetermined by compiler
              I7 – C runtime stack pointer – next empty place – NOT last used
              I6 – C runtime frame pointer – start of frame of current function
              (cdefines.i -- I7 = CTOPstack, I6 = FP)
              L6/L7 – must remain as zero – controls stack memory
              characteristics
              DAG1 registers – M5 (0), M6 (1), M7 (-1) – in “C” runtime header
              (cdefines.i -- zeroDM, plus1DM, minus1DM)
              DAG2 registers – M13 (0), M14 (1), M15 (-1) – in “C” header
              (cdefines.i -- zeroPM, plus1PM, minus1PM)
              LENGTH registers MUST RETURN to 0 – don’t touch L6/L7
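
Why the length registers matter can be seen from a simplified C model of a post-modify address update. It is a sketch under assumptions: `dag_post_modify` is a hypothetical helper, and addresses are modeled as plain integers. A length of zero gives ordinary linear addressing, which is what the C stack relies on; a non-zero length makes the index wrap inside the buffer.

```c
#include <assert.h>

/* Simplified model of a DAG post-modify step.
 * i = index register, m = modify value, b = buffer base, l = length.
 * Returns the updated index. */
int dag_post_modify(int i, int m, int b, int l)
{
    assert(l >= 0);
    i += m;
    if (l == 0)
        return i;               /* L = 0: linear addressing, no wrap */

    assert(m > -l && m < l);    /* the step must be shorter than the buffer */
    if (i >= b + l)
        i -= l;                 /* stepped past the end: wrap to the start */
    else if (i < b)
        i += l;                 /* stepped before the start: wrap to the end */
    return i;
}
```

If L6 or L7 were left non-zero, the stack and frame pointers would wrap in this way instead of moving linearly, which is why an assembler routine that uses length registers has to return them to 0.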


              21k Volatile registers when using “C”

       R4 (INPAR1), R8 (INPAR2), R12 (INPAR3), R0 (retvalue)
       Scratch or Volatile registers (cdefines.i definitions)
       Don’t keep useful values in them across subroutine calls
       R0, R1, R2 (cdefines.i -- retvalue, scratchR1, scratchR2)
       R4, I4, M4 (cdefines.i -- INPAR1, scratchDMpt, scratchMDM)
       R8 (cdefines.i -- INPAR2)
       R12, I12, M12 (cdefines.i -- INPAR3, scratchPMpt, scratchMPM)








              Important programming reminders

            Registers for parameter passing: r4, r8, r12 (arguments)
            and r0 (return value);

            An interrupt does not occur until 2 instructions after a
            delayed branch (needs 2 NOPs);

            Some DAG register transfers are disallowed in
            assembler routines;

            It is preferable not to use the following pairs in any
            combination: (M7,I6), (M14,I12), (M6,I5),
            (M5,I6).








            Getting started








                         Paths to examples


            C:\Program Files\Analog Devices\VisualDSP\211xx\examples
            C:\Program Files\Analog Devices\VisualDSP\21k\examples
            or
            D:\Program Files\Analog Devices\VisualDSP\211xx\examples
            D:\Program Files\Analog Devices\VisualDSP\21k\examples








            Introduction to DSP processors



              The END


				