					To DSP or Not to DSP?
Chad Erven
Words to Bits – Your Options
   ASIC
   FPGA
   DSP
   Embedded RISC
   General Purpose Processor (GPP)
Why Go Programmable?

1.       Building the chip wrong
     –     Systems are increasingly too complex for RTL designers to
           describe efficiently
     –     Errors are orders of magnitude more difficult to find in
           hardware than in software
     –     Defects are extremely costly in hardware

2.       Building the wrong chip
     –     Only software is flexible enough to adapt during and after
           system design

              HARDWARE IS TOO HARD!
So Software and Processors, Right?
   Using processors has its drawbacks – especially in
    SOC designs

    – Never a perfect match between the application and the
      hardware

    – Performance costs, power penalties, wasted silicon will
      ALWAYS happen to some extent

    – Integrating multiple disparate cores with each other is difficult
Splitting the Difference – ASIPs
   Ever wish you were the processor designer?

   Now you are! Write the exact instructions
    you need and nothing more.

   An Application-Specific Instruction-set Processor
    (ASIP) offers the best of both worlds
Back Up!
   Isn’t hardware too much work?
    – Yes
   So doesn’t an ASIP defeat the
    purpose?
    – No
   Why not?
    – Extending a base processor is much easier
    – Readily amenable to automation
    – You only have to verify the instruction description; integration
      into the processor is guaranteed
Cool, Show Me How It Works
    ASIPs derive their performance from addressing
     three problems of conventional processors
    1.   Operations that are innately parallel must be expressed
         serially
            –   Somewhat solved by SIMD or MIMD processors


    2.   Memory space is addressed as one continuous space
            –   Somewhat solved by modifiers and/or pragmas (dm/pm)


    3.   Applications are complicated by their expression as
         operations on C types
            –   Somewhat alleviated by powerful instructions in hardware
Working with the Innate Nature of the Algorithm
   Example – byte swap (common telecom task)
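unsigned int *a, *b ;   /* unsigned, so the shifts are well defined */
  …
for (int i = 0; i < 4096; i++)
{
  /* reverse the order of the four bytes in b[i] */
  a[i] = ((b[i] & 0x000000ffu) << 24) |
         ((b[i] & 0x0000ff00u) << 8)  |
         ((b[i] & 0x00ff0000u) >> 8)  |
         ((b[i] & 0xff000000u) >> 24);
}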
Working with the Innate Nature of the Algorithm
           Write your own instruction:

        operation swap {in AR x, out AR y}{}
        {y = {x[7:0],x[15:8],x[23:16],x[31:24]};}


           Making the C Code:
        for(int i = 0 ; i < 4096 ; i++) a[i] = swap(b[i]) ;
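A minimal C sketch of what the custom operation computes, for checking the idea on a host machine. The names swap_shift, swap_model, and check_swap are hypothetical and not part of the slides or of any vendor toolchain; swap_model only mimics the bit concatenation in software, whereas the real instruction is generated hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-and-shift byte swap, as in the plain C loop body. */
uint32_t swap_shift(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Hypothetical software model of the custom operation: concatenate
   the four bytes of x in reverse order, mirroring
   {x[7:0], x[15:8], x[23:16], x[31:24]}. */
uint32_t swap_model(uint32_t x)
{
    uint32_t b0 = x & 0xffu;          /* x[7:0]   */
    uint32_t b1 = (x >> 8) & 0xffu;   /* x[15:8]  */
    uint32_t b2 = (x >> 16) & 0xffu;  /* x[23:16] */
    uint32_t b3 = (x >> 24) & 0xffu;  /* x[31:24] */
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}

/* Self-check: both forms agree, and swapping twice is the identity. */
void check_swap(void)
{
    assert(swap_model(0x11223344u) == 0x44332211u);
    assert(swap_model(0x11223344u) == swap_shift(0x11223344u));
    assert(swap_model(swap_model(0xdeadbeefu)) == 0xdeadbeefu);
}
```

In hardware the concatenation is just wiring, which is why it can complete in a single instruction, while the C form needs four masks, four shifts, and three ORs.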


Execution cycles without TIE extension:   4,915,300
Execution cycles with TIE extension:      1,638,524

                                  3X SPEED UP!!! (4,915,300 / 1,638,524 ≈ 3.0)
Instruction Fusion
[Figure: Unfused operation: op1 reads reg1 and reg2 and writes reg3; op2 reads reg3 and reg4 and writes reg5. Fused operation: op1 reads reg1 and reg2 and feeds op2 directly; op2 also reads reg4 and writes reg5.]
   Example
for(i=0 ; i<n ; i++ ) c[i] = (a[i] * b[i]) >> 4 ;

Assembly:

loop:
   l8ui       a12,a11,0
   l8ui       a13,a10,0
   addi       a11,a11,1
   addi       a10,a10,1
   mul16u     a8,a12,a13
   srai       a8,a8,4
   s8i        a8,a9,0
   addi       a9,a9,1
Example
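[Figure: dataflow graph of the loop body. Two l8ui loads (base registers a11 and a10, offset 0) feed mul16u; the product goes through srai with shift amount 4; s8i stores the result at a9. Three addi operations increment a11, a10, and a9 by 1.]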
Example

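[Figure: the same dataflow graph after fusion. The two l8ui loads and the addi increments of a11 and a10 are unchanged; the multiply, srai, s8i, and the addi on a9 are merged into a single node, fusion.mul16u.srai.s8i.addi, which reads and updates a9.]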
Example

New assembly code:
loop:
  l8ui      a12,a11,0
  l8ui      a13,a10,0
  addi      a10,a10,1
  addi      a11,a11,1
  fusion.mul16u.srai.s8i.addi a9,a12,a13
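A rough C model of what the fused instruction does in one step may help. The function names below are hypothetical, and the 8-bit unsigned element type is an assumption inferred from the byte loads and byte store in the assembly. This is a sketch of the behaviour, not the generated hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fused instruction: multiply, shift right
   by 4, store the low byte, and advance the destination pointer. */
uint8_t *fused_mul_shift_store(uint8_t *dst, uint8_t x, uint8_t y)
{
    uint32_t product = (uint32_t)x * (uint32_t)y;  /* multiply step      */
    uint32_t shifted = product >> 4;               /* shift-right-by-4   */
    *dst = (uint8_t)shifted;                       /* byte store         */
    return dst + 1;                                /* pointer increment  */
}

/* The loop from the slide, c[i] = (a[i] * b[i]) >> 4, written so that
   each iteration maps onto two loads plus one fused operation. */
void scale_products(uint8_t *c, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c = fused_mul_shift_store(c, a[i], b[i]);
    }
}
```

The product of two unsigned bytes is never negative, so the logical shift used here matches the arithmetic shift in the assembly.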
Benchmarking

[Figures: EEMBC ConsumerMarks (performance) and EEMBC Summary (performance/MHz). From [Rowen].]

  • Hand-coded assembly for the other processors
And I Haven’t Even Gotten To…
   Sharing input operands

   Substituting variables with constants

   Replacing memory tables with logic

   Limiting immediate values to the minimum required width

   Placing operands in special registers

   Creating SIMD instructions

   Reducing the size of operand specifiers

   Custom input/output queues
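As one illustration of "replacing memory tables with logic" from the list above, here is a small C sketch that computes a 4-bit reversal both ways. The example function and the names REV4_TABLE, rev4_table, and rev4_logic are hypothetical and do not come from the slides; in a custom instruction the logic version would reduce to wiring rather than a memory access.

```c
#include <assert.h>
#include <stdint.h>

/* Table-based 4-bit reversal: one memory lookup per call. */
static const uint8_t REV4_TABLE[16] = {
    0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15
};

uint8_t rev4_table(uint8_t x)
{
    return REV4_TABLE[x & 0x0fu];
}

/* The same function as pure logic: each input bit is routed to its
   mirrored position, with no table and no memory access. */
uint8_t rev4_logic(uint8_t x)
{
    return (uint8_t)(((x & 0x1u) << 3) |
                     ((x & 0x2u) << 1) |
                     ((x & 0x4u) >> 1) |
                     ((x & 0x8u) >> 3));
}
```

Both functions agree for every 4-bit input, so the table can be dropped wherever the logic form is available.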
OK, Let Me Have It, Dr. Smith
(The rest of you can ask questions too)

				