An Architectural Space Exploration Tool for Domain-Specific Reconfigurable Computing




Gayatri Mehta                    Alex Jones
Electrical Engineering             Electrical & Computer Engineering
University of North Texas          University of Pittsburgh
email: gayatri.mehta@unt.edu       email: akjones@ece.pitt.edu
                          April 19, 2010
    Outline

 Motivation
 A domain-specific fabric
 Design space exploration case studies
 Automation of design space exploration
 Results
 Conclusions


Motivation
   Designing a complex SoC requires evaluating many potential architectural options

   Exploring the design space manually is very time-consuming and may be infeasible for complex designs

   One approach is to develop design space exploration tools that
    – allow application developers to explore architectural tradeoffs efficiently and reach solutions quickly
    – take into account the applications of interest that will run on the device
Signal and image processing applications

   Core signal processing benchmarks from the MediaBench benchmark suite

   Edge detection benchmarks from the image processing domain




Mapping of a benchmark onto the fabric
 temp1 = x * y;
 temp2 = z * temp1;
 if (temp2 < 255)
   result1 = temp2 << 16;
 else
   result1 = temp1;
 result1 += ((temp1 >> 16) - 0xffff);
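To make the mapping concrete, the fragment above can be viewed as a dataflow graph in which each operator becomes one ALU node and the if/else becomes a comparison feeding a mux. The sketch below is illustrative only: it assumes 32-bit unsigned data (DW = 32), and the node encoding, the names `eval_dfg`, `kernel_dfg`, and `kernel_ref`, and the mux operand order (operand 2 selects) are hypothetical, not the tool's internal representation. It checks that evaluating the graph node by node reproduces the original fragment.

```c
#include <stdint.h>

/* Hypothetical op set for this sketch (a subset of the ALU operations). */
typedef enum { OP_IN, OP_CONST, OP_MUL, OP_LT, OP_SHL, OP_SHR, OP_SUB, OP_ADD, OP_MUX } Op;

typedef struct {
    Op op;
    int a, b, c;      /* indices of operand nodes (-1 if unused) */
    uint32_t value;   /* value for OP_IN / OP_CONST nodes */
} Node;

/* Evaluate nodes in topological order: each node reads only earlier nodes. */
static uint32_t eval_dfg(const Node *g, int n, uint32_t *v) {
    for (int i = 0; i < n; i++) {
        const Node *nd = &g[i];
        switch (nd->op) {
        case OP_IN: case OP_CONST: v[i] = nd->value; break;
        case OP_MUL: v[i] = v[nd->a] * v[nd->b]; break;
        case OP_LT:  v[i] = v[nd->a] < v[nd->b]; break;
        case OP_SHL: v[i] = v[nd->a] << (v[nd->b] & 31u); break; /* mask avoids UB */
        case OP_SHR: v[i] = v[nd->a] >> (v[nd->b] & 31u); break;
        case OP_SUB: v[i] = v[nd->a] - v[nd->b]; break;
        case OP_ADD: v[i] = v[nd->a] + v[nd->b]; break;
        case OP_MUX: v[i] = v[nd->c] ? v[nd->a] : v[nd->b]; break; /* operand 2 selects */
        }
    }
    return v[n - 1];
}

/* The benchmark fragment as a DFG; one ALU per non-input node. */
static uint32_t kernel_dfg(uint32_t x, uint32_t y, uint32_t z) {
    Node g[] = {
        {OP_IN, -1, -1, -1, x},          /* 0: x */
        {OP_IN, -1, -1, -1, y},          /* 1: y */
        {OP_IN, -1, -1, -1, z},          /* 2: z */
        {OP_CONST, -1, -1, -1, 255},     /* 3 */
        {OP_CONST, -1, -1, -1, 16},      /* 4 */
        {OP_CONST, -1, -1, -1, 0xffff},  /* 5 */
        {OP_MUL, 0, 1, -1, 0},           /* 6: temp1 = x * y */
        {OP_MUL, 2, 6, -1, 0},           /* 7: temp2 = z * temp1 */
        {OP_LT, 7, 3, -1, 0},            /* 8: temp2 < 255 */
        {OP_SHL, 7, 4, -1, 0},           /* 9: temp2 << 16 */
        {OP_MUX, 9, 6, 8, 0},            /* 10: if/else becomes a mux */
        {OP_SHR, 6, 4, -1, 0},           /* 11: temp1 >> 16 */
        {OP_SUB, 11, 5, -1, 0},          /* 12: (temp1 >> 16) - 0xffff */
        {OP_ADD, 10, 12, -1, 0},         /* 13: result1 */
    };
    uint32_t v[sizeof g / sizeof g[0]];
    return eval_dfg(g, (int)(sizeof g / sizeof g[0]), v);
}

/* Reference: the original fragment with 32-bit unsigned operands. */
static uint32_t kernel_ref(uint32_t x, uint32_t y, uint32_t z) {
    uint32_t temp1 = x * y;
    uint32_t temp2 = z * temp1;
    uint32_t result1;
    if (temp2 < 255)
        result1 = temp2 << 16;
    else
        result1 = temp1;
    result1 += ((temp1 >> 16) - 0xffff);
    return result1;
}
```

Because the nodes are stored in topological order, a single pass computes every value; on the fabric, each non-input node would occupy one ALU, and nodes at the same depth could share a row.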




  Domain specific fabric
The fabric consists of
   Power-optimized Arithmetic and Logic Units (ALUs)
   Configurable interconnect

                                           Characteristics
                                                  Combinational structure
                                                  ASIC-like power
                                                  Programmable




Different architectural implementations of functional units

                   Inputs                   Output
                            Hardware




          Inputs                                     Output




                              ALU




                                                     Output
          Inputs



                            Optimized ALU
Interconnect Options
[Figure: 4:1 cardinality interconnect and 5:1 cardinality interconnect]
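One way to read multiplexer cardinality: each ALU operand selects from a contiguous window of cells in the row above, and the window width is the cardinality. This interpretation, the edge handling (out-of-range columns are simply dropped, with no wrap-around), and the helper names `mux_cardinality` and `mux_sources` are assumptions for illustration. The FIM describes each operand with `<range left="-3" right="4"/>`, which under this reading is a window of 8 cells, i.e. an 8:1 mux.

```c
/* Hypothetical helper: a window [left, right] of relative column offsets
   yields a (right - left + 1):1 multiplexer. */
static int mux_cardinality(int left, int right) {
    return right - left + 1;
}

/* Write into `out` the previous-row columns that an operand mux at column
   `col` can select in a fabric `width` cells wide; return how many.
   Assumption: offsets that fall outside the fabric are dropped (no wrap). */
static int mux_sources(int col, int width, int left, int right, int *out) {
    int n = 0;
    for (int off = left; off <= right; off++) {
        int src = col + off;
        if (src >= 0 && src < width)
            out[n++] = src;
    }
    return n;
}
```

Under the same reading, a 4:1 interconnect could use a window such as [-1, 2] and a 5:1 interconnect a window such as [-2, 2]; near the fabric edges fewer sources are reachable.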


Implementation of the fabric
    The fabric is implemented in parameterized VHDL
      – a single base VHDL description serves as the fabric model
      – varying its parameters generates different architectural instances

                 Parameter                            Symbol   Values
                 Global parameters
                   Data width                         DW       {8, 16, 32}
                 Fabric parameters
                   Width of the fabric                W
                   Height of the fabric               H
                 Arithmetic and Logic Unit parameters
                   Number of operands                 O        {3}
                   Number of operations               OP       {8, 10, 16, 23}
                 Interconnect parameters
                   Multiplexer cardinality            C        {2, 4, 8, 16, 32}
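The table defines a discrete design space. As a rough sketch of how the parameterized VHDL could be instantiated repeatedly, the C code below enumerates every combination of the listed values for a fixed fabric size. `FabricParams` and `enumerate_instances` are hypothetical names; in practice each combination would presumably be passed to the VHDL model as generics.

```c
#include <stddef.h>

/* Value sets from the parameter table; W and H are chosen by the caller. */
static const int DW_VALUES[] = {8, 16, 32};
static const int O_VALUES[]  = {3};
static const int OP_VALUES[] = {8, 10, 16, 23};
static const int C_VALUES[]  = {2, 4, 8, 16, 32};

#define LEN(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* One architectural instance (hypothetical struct for this sketch). */
typedef struct { int dw, w, h, o, op, c; } FabricParams;

/* Visit every combination for a fixed W x H fabric and return the count.
   `visit` may be NULL when only the count is needed. */
static int enumerate_instances(int w, int h, void (*visit)(const FabricParams *)) {
    int count = 0;
    for (int i = 0; i < LEN(DW_VALUES); i++)
        for (int j = 0; j < LEN(O_VALUES); j++)
            for (int k = 0; k < LEN(OP_VALUES); k++)
                for (int m = 0; m < LEN(C_VALUES); m++) {
                    FabricParams p = {DW_VALUES[i], w, h, O_VALUES[j],
                                      OP_VALUES[k], C_VALUES[m]};
                    if (visit != NULL)
                        visit(&p);
                    count++;
                }
    return count;
}
```

For any fixed W and H, the listed values give 3 × 1 × 4 × 5 = 60 instances, which hints at why exploring the space by hand quickly becomes impractical.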

    Design space exploration case studies

 ALU granularity
 Multiplexer cardinality
 Dedicated pass gates
 Heterogeneous ALUs


    Conclusion: 10 ops per ALU (32-bit) with 8:1
    interconnect and 33% dedicated pass gates
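"33% dedicated pass gates" suggests that roughly a third of the cells are routing-only pass gates rather than full ALUs. The sketch below assumes the pass gates are spread evenly along each row and expresses the fraction as a ratio (1/3 for 33%, 1/2 for 50%); the even spacing, the `Cell` encoding, and `build_row` are illustrative assumptions, not the fabric's documented layout.

```c
/* Cell types in one fabric row (hypothetical encoding for this sketch). */
typedef enum { CELL_ALU, CELL_PASS } Cell;

/* Spread a dp_num/dp_den fraction of dedicated pass gates evenly across a
   row of `width` cells; return the number of pass gates placed. */
static int build_row(Cell *row, int width, int dp_num, int dp_den) {
    int passes = 0;
    for (int i = 0; i < width; i++) {
        /* cell i is a pass gate when floor((i+1)*num/den) steps past floor(i*num/den) */
        int is_pass = ((i + 1) * dp_num) / dp_den > (i * dp_num) / dp_den;
        row[i] = is_pass ? CELL_PASS : CELL_ALU;
        passes += is_pass;
    }
    return passes;
}
```

With 1/3 the row repeats ALU, ALU, pass gate, a regular pattern that a repeating row description such as the FIM's `ftupattern` element could express.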


Automated Generation of a Domain-Specific Fabric
[Figure: tool flow with blocks Applications, Design Space Exploration, Fabric Instance Model (FIM), Fabric Mapper, Fabric Generator, Synthesis/Simulation, Power/Performance/Area Analysis, Physical Characteristics, and Final Domain-Specific Fabric]
           Parameters varied by design space exploration:
           •Width/height of the fabric
           •Number and type of operators
           •Data width
           •Dedicated pass gates
           •Interconnect cardinality
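At its core, automated exploration evaluates candidate architectures and keeps the best one. A minimal sketch of that selection step follows. The `Candidate` fields mirror the axes varied in the case studies, while `CostFn`, `best_candidate`, and `toy_cost` are hypothetical. In the real flow the cost would come from mapping the applications with the Fabric Mapper (for example, average path length increase or energy); `toy_cost` is a placeholder with no relation to the tool's measurements.

```c
#include <float.h>

/* A candidate architecture: the axes varied in the case studies. */
typedef struct {
    int ops_per_alu;
    int cardinality;        /* N for an N:1 interconnect */
    int dp_num, dp_den;     /* fraction of dedicated pass gates */
} Candidate;

/* Cost of one candidate; lower is better. */
typedef double (*CostFn)(const Candidate *);

/* Return the index of the lowest-cost candidate (earliest wins ties),
   or -1 if no candidate costs less than DBL_MAX (e.g. n == 0). */
static int best_candidate(const Candidate *cands, int n, CostFn cost) {
    int best = -1;
    double best_cost = DBL_MAX;
    for (int i = 0; i < n; i++) {
        double c = cost(&cands[i]);
        if (c < best_cost) {
            best_cost = c;
            best = i;
        }
    }
    return best;
}

/* Placeholder cost for demonstration only: prefers fewer operators and a
   smaller interconnect. It is NOT the tool's energy or path-length model. */
static double toy_cost(const Candidate *c) {
    return c->ops_per_alu + c->cardinality / 4.0;
}
```

Passing the cost as a function pointer keeps the search loop independent of how each candidate is mapped and measured.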
 Fabric Instance Model (FIM)
<ftudefine name="alu0" noop="10111" useic="false">
     <op code="00001"> + </op>
     <op code="00010"> - </op>
     <op code="00011"> * </op>
     <op code="10011"> == </op>
     <op code="00111"> ^ </op>
     <op code="01110"> &gt; </op>
     <op code="10000"> &gt;= </op>
     <op code="01111"> &lt; </op>
     <op code="10001"> &lt;= </op>
     <op code="10010"> != </op>
     <op code="00100"> &amp; </op>
     <op code="00101"> | </op>
     <op code="01001"> &lt;&lt; </op>
     <op code="01011"> &gt;&gt; </op>
     <op code="00000"> pass </op>
     <op code="10100" order="reverse"> pass </op>
     <op code="11111"> mux </op>
     <op code="01000"> ! </op>
</ftudefine>

<rowpattern repeat="forever">
  <row>
    <ftupattern repeat="forever">
      <FTU type="alu0">
        <operand number="0">
          <range left="-3" right="4"/>
        </operand>
        <operand number="1">
          <range left="-3" right="4"/>
        </operand>
        <operand number="2">
          <range left="-3" right="4"/>
        </operand>
      </FTU>
    </ftupattern>
  </row>
</rowpattern>
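To illustrate how the ftudefine could drive a behavioral model, the sketch below transcribes the alu0 opcodes into a C enum and evaluates one cell. The opcode values come directly from the FIM; the operand semantics are assumptions, since the FIM lists only symbols: `pass` forwards operand 0 and the `order="reverse"` pass forwards operand 1, `mux` uses operand 2 as its select, comparisons return 1 or 0, shift amounts are masked to 5 bits for 32-bit data, and the `noop` code yields 0. `parse_opcode` and `alu0_eval` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* 5-bit opcodes transcribed from the ftudefine "alu0" (binary in comments). */
enum {
    ALU_PASS = 0,      /* 00000 */
    ALU_ADD = 1,       /* 00001 */
    ALU_SUB = 2,       /* 00010 */
    ALU_MUL = 3,       /* 00011 */
    ALU_AND = 4,       /* 00100 */
    ALU_OR = 5,        /* 00101 */
    ALU_XOR = 7,       /* 00111 */
    ALU_NOT = 8,       /* 01000 */
    ALU_SHL = 9,       /* 01001 */
    ALU_SHR = 11,      /* 01011 */
    ALU_GT = 14,       /* 01110 */
    ALU_LT = 15,       /* 01111 */
    ALU_GE = 16,       /* 10000 */
    ALU_LE = 17,       /* 10001 */
    ALU_NE = 18,       /* 10010 */
    ALU_EQ = 19,       /* 10011 */
    ALU_PASS_REV = 20, /* 10100, order="reverse" */
    ALU_NOOP = 23,     /* 10111, the noop attribute */
    ALU_MUX = 31       /* 11111 */
};

/* Parse a FIM code string such as "01011" as a base-2 number. */
static unsigned parse_opcode(const char *bits) {
    return (unsigned)strtoul(bits, NULL, 2);
}

/* Evaluate one alu0 cell on operands a (0), b (1), s (2).
   Sets *ok to 0 for codes the ftudefine does not list. */
static uint32_t alu0_eval(unsigned op, uint32_t a, uint32_t b, uint32_t s, int *ok) {
    *ok = 1;
    switch (op) {
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    case ALU_MUL: return a * b;
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    case ALU_XOR: return a ^ b;
    case ALU_NOT: return !a;
    case ALU_SHL: return a << (b & 31u);
    case ALU_SHR: return a >> (b & 31u);
    case ALU_GT:  return a > b;
    case ALU_LT:  return a < b;
    case ALU_GE:  return a >= b;
    case ALU_LE:  return a <= b;
    case ALU_NE:  return a != b;
    case ALU_EQ:  return a == b;
    case ALU_PASS:     return a;          /* assumption: forwards operand 0 */
    case ALU_PASS_REV: return b;          /* assumption: forwards operand 1 */
    case ALU_MUX:      return s ? a : b;  /* assumption: operand 2 selects */
    case ALU_NOOP:     return 0;
    default: *ok = 0; return 0;
    }
}
```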
Energy Consumption
   Two factors affect the energy consumption of the device:
    – the increase in total path length when the application is mapped onto the device
    – the number of ALUs used as pass gates
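The two factors can be made concrete with a simple model, assumed here for illustration only: the fabric is a feed-forward array of rows, every DFG edge travels downward from its producer's row to its consumer's row, and each row crossed in between costs one ALU configured as a pass gate. Comparing each edge's mapped span with its span in an ASAP-levelled DFG gives a path length increase, while counting intermediate rows gives the pass-gate count. `MappedEdge`, `EnergyFactors`, and `energy_factors` are hypothetical names, and the tool may define both metrics differently.

```c
/* One DFG edge after mapping (hypothetical representation). */
typedef struct {
    int ideal_src_level, ideal_dst_level; /* ASAP levels in the DFG */
    int mapped_src_row, mapped_dst_row;   /* rows chosen by the mapper */
} MappedEdge;

typedef struct {
    int path_length_increase;
    int pass_gates;
} EnergyFactors;

/* Sum both energy-related factors over all mapped edges. */
static EnergyFactors energy_factors(const MappedEdge *e, int n) {
    EnergyFactors f = {0, 0};
    for (int i = 0; i < n; i++) {
        int ideal = e[i].ideal_dst_level - e[i].ideal_src_level;
        int mapped = e[i].mapped_dst_row - e[i].mapped_src_row;
        f.path_length_increase += mapped - ideal; /* rows beyond the DFG's own span */
        f.pass_gates += mapped - 1;               /* each intermediate row needs a pass-through ALU */
    }
    return f;
}
```

The two counts differ: an edge that already spans several DFG levels needs pass gates even when the mapper places it optimally, so it adds to the second factor but not the first.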




Design Space Exploration
 Observation: 11 ops per ALU with 8:1 interconnect and 33% dedicated pass gates is the best candidate
[Figure: average path length increase (y-axis, 0 to 10) for each architecture (x-axis): 33:1, 32:1, 17:1, 16:1, 9:1, 8:1, and 5:1 interconnect with no dedicated pass gates (DPs) and homogeneous ALUs; 8:1 interconnect with 25%, 33%, and 50% DPs and homogeneous ALUs; 8:1 interconnect with 33% DPs and 14, 13, 12, 11, or 10 ops per ALU]
Design Space Exploration




Comparison of manual and tool DSE results
[Figure: energy (pJ, 0 to 450) for each benchmark (adpcm encoder, adpcm decoder, idct row, idct col, gsm, sobel, laplace, dwt) and their average, comparing 10ops-8:1-33%DP (M) with 11ops-8:1-33%DP (T); M = manual DSE, T = tool DSE]
Design Space Exploration
 Observation: 11 ops per ALU with 8:1 interconnect and 50% dedicated pass gates is the best candidate
[Figure: average path length increase (y-axis, 0 to 10) for each architecture (x-axis): 33:1, 32:1, 17:1, 16:1, 9:1, 8:1, and 5:1 interconnect with no dedicated pass gates (DPs) and homogeneous ALUs; 8:1 interconnect with 25%, 33%, 50%, and 66% DPs and homogeneous ALUs; 8:1 interconnect with 50% DPs and 14, 13, 12, 11, or 10 ops per ALU]
Design Space Exploration




Comparison of manual and tool DSE results
[Figure: energy (pJ, 0 to 450) for each benchmark (adpcm encoder, adpcm decoder, idct row, idct col, gsm, sobel, laplace, dwt) and their average, comparing 10ops-8:1-33%DP (M) with 11ops-8:1-50%DP (T); M = manual DSE, T = tool DSE]
Reconfigurable Fabric Comparison
Observation: the optimized fabric consumes about 3X the energy of an ASIC
[Figure: energy (pJ, log scale, 1 to 1,000,000) for each benchmark (adpcm enc, adpcm dec, idct row, idct col, gsm, sobel, laplace) and their average, comparing XScale (0.18 µm), Virtex-2P (0.13 µm), Fabric (initial, 0.16 µm), Fabric (optimized, 0.16 µm), Fabric (optimized, 0.13 µm), ASIC (0.16 µm), and ASIC (0.13 µm)]
Thank You

       Final thoughts and discussion


