									                                                               (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                                                         Vol. 9, No.4, 2011

   ASIP Design Space Exploration: Survey and Issues
                       Deepak Gour                                                                 Dr. M. K. Jain
            Assistant Professor – Dept. of CSE                                           Assistant Professor – Dept. of CS
            Sir Padampat Singhania University                                            Mohan Lal Sukhadia University
                      Udaipur, India                                                               Udaipur, India

Abstract- An Application Specific Instruction set Processor (ASIP) is a processor designed for a particular application or for a set of applications. An ASIP exploits special characteristics of the application(s) to meet the desired performance, cost and power requirements. The main steps involved in ASIP design methodology include application analysis, design space exploration, instruction set generation, code synthesis and hardware synthesis. This paper is an attempt to survey the design space exploration of ASIP. Important contributions made by various researchers are also highlighted. A list of explored design space parameters is included in this paper.

   Keywords- Application Specific Instruction set Processor (ASIP), Design Space Exploration (DSE), Performance estimation, Simulator based approach.

                       I.    INTRODUCTION
    An Application Specific Instruction set Processor (ASIP) is a processor designed for a particular application or for a set of applications. An ASIP exploits special characteristics of the application(s) to meet the desired performance, cost and power requirements. According to Liem et al [1], ASIPs are a balance between two extremes: ASICs (Application Specific Integrated Circuits) and GPPs (General Programmable Processors). Since an ASIC is specially designed for one behavior, it is difficult to make any changes at a later stage. In such a situation, ASIPs offer the required flexibility at lower cost than GPPs.
    ASIPs can readily be used in many embedded systems, such as automotive control, household appliances, cellular phones and avionics. GPPs are designed for general use, and a specific application often needs a resource mix that does not match the one a GPP provides. If an ASIC is designed to meet the performance, power and area constraints of the given application, the design becomes rigid. In ASIP design, it is important to search for a processor architecture that matches the target application. To achieve this goal, it is essential to estimate the design quality of the various candidate architectures in terms of area, performance and power consumption. Table I shows the comparison among GPP, ASIP and ASIC.

                        GPP           ASIP                ASIC
   Performance          Low           High                Very High
   Flexibility          Excellent     Good                Poor
   HW design effort     Nil           Large               Very Large
   SW design effort     Small         Large               Nil
   Power                Large         Medium              Small
   Reuse                Excellent     Good                Poor
   Markets              Very large    Relatively large    Small

              TABLE I.       COMPARISON AMONG GPP, ASIP AND ASIC

                       II.    RELATED WORK
    This section highlights the major work carried out in ASIP design space exploration. Gloria et al [2] defined some major requirements for the design of application specific architectures. Liem et al [1] described the differences between ASIC, ASIP and GPP. MK Jain et al [3, 4, 5, 6, 7] surveyed ASIP design methodologies and identified the various steps involved. That survey was published in early 2001, and various researchers have made significant contributions since then. Sato et al [8] developed an application program analyzer which is very useful in application analysis. The methodology suggested by Gupta et al [9] takes the application as well as the processor architecture as inputs. Using SUIF [10] as an intermediate format, a number of application parameters are extracted.
    In addition, Swarnalatha Radhakrishnan et al [11] explore DSE on heterogeneous multiple pipelines. Ascia et al [12] explore DSE using genetic algorithms on parameterized SOC platforms. Kwon et al [13] explore cache misses and memory architecture issues. Bossuet, Gogniat et al [14] explore DSE using a special tool called Design Trotter. Ryu et al [15] explore DSE on issues related to bus architecture. Kim et al [16] explore DSE on the issues of area and critical path delay. Kunzli et al [17] explore DSE on issues such as the number of cache lines, block size and replacement strategy. Ascia, Catania et al [18] explore DSE on issues related to register file size (GPR, FPR, PR, CR, BTR) and L1 and L2 caches. Pasricha et al [19] explore DSE on issues related to bus architecture.
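
    The parameters named in the related work above (number of cache lines, block size, replacement strategy, register file size) can be pictured as the axes of a design space, with each candidate architecture being one combination of values. The following Python sketch only illustrates that idea. The class and function names, the parameter values and the area formula are all hypothetical and are not taken from any of the cited tools; the area proxy in particular is a toy stand-in for the area, performance and power estimates a real flow would compute.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """One point in a hypothetical cache/register-file design space."""
    cache_lines: int
    block_size: int      # bytes per cache block
    replacement: str     # "LRU" or "FIFO"
    gpr_count: int       # number of general purpose registers


def enumerate_space(cache_lines, block_sizes, replacements, gpr_counts):
    """Return every combination of the given parameter values."""
    return [Candidate(*values)
            for values in product(cache_lines, block_sizes,
                                  replacements, gpr_counts)]


def toy_area(c: Candidate) -> int:
    """Illustrative area proxy in bytes of storage; NOT a real cost model.

    Assumes 4-byte registers and ignores tags, control logic and wiring.
    """
    return c.cache_lines * c.block_size + 4 * c.gpr_count


def within_budget(candidates, max_area):
    """Keep only the candidates whose toy area fits the budget."""
    return [c for c in candidates if toy_area(c) <= max_area]


# Assumed example values, chosen only to keep the space small.
space = enumerate_space(
    cache_lines=[64, 128, 256],
    block_sizes=[16, 32],
    replacements=["LRU", "FIFO"],
    gpr_counts=[8, 16, 32],
)
feasible = within_budget(space, max_area=2200)
print(len(space), len(feasible))
```

    Even these four small axes give 3 x 2 x 2 x 3 = 36 candidates, of which 18 fit the assumed budget. Each additional parameter multiplies the count, which is why estimating the quality of every candidate quickly becomes the dominant cost.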

                                                                                           III.   ASIP DESIGN METHODOLOGY
                                                                                Gloria et al [2] defined some main requirements of the
                                                                            design of application-specific architectures. Important among
                                                                            these are as follows:

                                                                                                         ISSN 1947-5500
   •    Design starts with the application behavior.
   •    Evaluate several architectural options.
   •    Identify hardware functionalities to speed up the application.
   •    Introduce hardware resources for frequently used operations only if it can be supported during

    An ASIP fits in between ASICs and GPPs and provides flexibility at lower cost than general programmable processors. According to MK Jain et al [3, 4, 5, 6, 7], the design of an ASIP can typically be divided into five steps, as shown in Figure 1:
   •    Application analysis
   •    Architecture design space exploration
   •    Instruction set generation
   •    Code synthesis
   •    Hardware synthesis

         Figure 1. Flow Diagram of ASIP design Methodology

A. Application Analysis
    ASIP design starts with analysis of the application, its test data and the design constraints. An application written in a high level language is analyzed both statically and dynamically, and the results are stored in a suitable intermediate format, which is then used in the subsequent steps.

B. Architecture Design Space Exploration
    This step involves identifying the broad architectural features of the ASIP. First of all, the architectural space to be explored is defined, keeping in view the parameters extracted during application analysis and the input constraints. The architecture is defined using a standard Architecture Description Language (ADL) such as EXPRESSION [20] or LISA [21, 22, 23].

C. Instruction Set Generation
    An instruction set is generated for the particular application and for the architecture selected. This instruction set is used during code synthesis and hardware synthesis.

D. Code Synthesis
    A compiler generator or retargetable code generator is used to synthesize code for the particular application or set of applications.

E. Hardware Synthesis
    In this step the hardware is synthesized using the ASIP architecture template and the instruction set architecture, starting from a description in VHDL/Verilog and using standard tools.

                       IV.    DESIGN SPACE EXPLORATION
    Architecture exploration starts with application analysis. The parameters extracted by application analysis, together with the identified architecture design space, are input to the block responsible for performance estimation. Performance is then estimated for each candidate architecture under the direction of the search control, and an architecture is selected. Figure 2 shows the structure of the architecture explorer.

         Figure 2. Block Diagram of an Architecture Explorer

    Performance estimation, which drives the design space exploration, is done by a simulator based approach (e.g. Gloria et al [2], Kienhuis et al [24], Binh, Imai et al [25]). The architectural design space to be explored is usually defined in terms of a parameterized architectural model.

    The main focus points are as follows:
   •    The parameterized architectural model suggested by all the researchers includes the number of functional units of different types.

   •    Architectures considered by different researchers also differ in terms of the instruction level parallelism they support.
   •    Most of these approaches consider only flat memory.

    The most popular approach for ASIP design space exploration is the simulator based approach. In this approach, a simulation model of the architecture based on the selected features is generated, and the application is simulated on this model to compute the performance. Figure 3 illustrates the simulator based approach.

         Figure 3. Architecture exploring using simulator based approach

 V. PARAMETERS EXPLORED IN DESIGN SPACE EXPLORATION
    In the recent past, the major work in design space exploration has used the simulator based approach. The major contributions are as follows:

    Swarnalatha Radhakrishnan et al [11] explore DSE on heterogeneous multiple pipelines. They proposed Application Specific Instruction Set Processors with heterogeneous multiple pipelines to efficiently exploit the available parallelism at instruction level, and developed a design system based on the Thumb processor architecture. Given an application specified in the C language, the design system can generate a processor with a number of pipelines specifically suited to the application, together with the parallel code for that processor. Each pipeline in such a processor is customized and implements its own special instruction set, so that instructions can be executed in parallel with low hardware overhead.

    Ascia, Catania, Palesi et al [12] explore DSE using genetic algorithms on parameterized SOC platforms. The basic idea is to avoid designing a chip from scratch. They proposed an approach based on genetic algorithms for exploring the design space of parameterized system-on-a-chip (SOC) platforms. The strategy focuses on exploration of the architectural parameters of the processor, memory subsystem and bus, which make up the hardware kernel of a parameterized SOC platform for the design of embedded systems with strict power consumption and performance constraints. The approach has been validated on two different parameterized architectures: one based on a RISC processor and another based on a parameterized very long instruction word (VLIW) architecture.

    Kwon, Lee, Kim, Ha et al [13] explore cache misses and memory architecture issues using the Y-chart approach to DSE. The Y-chart consists of two loops: 1) a co-synthesis loop for component selection and mapping of the function blocks to the processing components, and 2) a communication DSE loop for communication architecture optimization.

    Bossuet, Gogniat, Philippe et al [14] explore DSE using a special tool called Design Trotter. This tool allows exploration of the design space to choose the best architecture characteristics. They proposed an original approach based on a high-level representation of the application and on a hierarchical functional model of the architecture. This approach targets fine-grain, coarse-grain, and heterogeneous architectures.

    Ryu, Mooney et al [15] explore DSE on issues related to bus architecture. They present a methodology to generate a custom bus system for a multiprocessor System-on-a-Chip (SoC). Their bus synthesis tool (BusSyn) uses this methodology to generate five different bus systems as examples: Bi-FIFO Bus Architecture (BFBA), Global Bus Architecture Version I (GBAVI), Global Bus Architecture Version III (GBAVIII), Hybrid bus architecture (Hybrid) and Split Bus Architecture (SplitBA). They verified and evaluated the performance of each bus system in the context of two applications: an Orthogonal Frequency Division Multiplexing (OFDM) wireless transmitter and an MPEG2 decoder. This methodology gives the designer a great benefit in fast design space exploration of bus architectures across a variety of performance-impacting factors such as bus types, processor types and software programming style.

    Kim, Kiemb, Choi et al [16] explore DSE on the issues of area and critical path delay. They proposed an efficient design space exploration flow with two optimization techniques, based on pipelining and on sharing of functional resources in the processing elements (PEs) of the array. For fast architecture exploration, the optimization techniques are applied to a SystemC model. They estimated overall performance at an early stage by transaction level simulation, which enables early detection of the optimal architecture specification. With the proposed design space exploration, one can effectively reduce the hardware cost without any performance degradation for a specific application domain.

    Kunzli, Thiele et al [17] explore DSE on issues such as the number of cache lines, block size and replacement strategy. A generic approach is described based on multi-objective decision making, black-box optimization and randomized search strategies. The interface between the problem-specific and generic parts of the exploration framework is made explicit by defining an interface called PISA. This specification and implementation interface, and the availability of a wide range of randomized multi-objective search methods, make the proposed framework accessible to a wide range of exploration problems. It resolves the problem that existing optimization methods cannot be coupled easily to the problem-specific part of a design exploration tool.

    Ascia, Catania et al [18] explore DSE on issues related to register file size (GPR, FPR, PR, CR, BTR) and L1

and L2 caches. They presented EPIC-Explorer, a framework for the simulation of a parameterized SOC platform based on a VLIW processor. The platform is designed primarily to provide a powerful, flexible simulation and estimation framework that can be used to develop design space exploration algorithms. The high degree of parameterization of the platform generates an enormous configuration space, exhaustive exploration of which would be computationally infeasible, and so it is an excellent testbed for comparing different design space exploration algorithms.

    Pasricha, Dutt et al [19] explore DSE on issues related to bus architecture. They proposed an automated application specific co-synthesis framework for memory and communication architectures (COSMECA) in MPSoC designs. The primary objective is to design a communication architecture having the least number of buses which satisfies performance and memory area constraints, while the secondary objective is to reduce the memory area cost.

    Table II lists the parameters explored using the simulator based approach.

Sr. No.   Explored design space parameters using simulator based approach
1         Instruction cache size [Kunzli, Thiele et al [17]]
2         Data cache size [Kunzli, Thiele et al [17]]
3         Processor to address bus encoding [Pasricha, Dutt et al [19]]
4         Processor to data bus width [Pasricha, Dutt et al [19]]
5         Processor to data bus encoding [Pasricha, Dutt et al [19]]
6         Processor to address bus width [Pasricha, Dutt et al [19]]
7         Cache to memory address bus width [Pasricha, Dutt et al [19]]
8         Cache to memory address bus encoding [Pasricha, Dutt et al [19]]
9         Cache to memory data bus width [Pasricha, Dutt et al [19]]
10        Cache to memory data bus encoding [Pasricha, Dutt et al [19]]
11        GPR (General Purpose Register) file size [Ascia, Catania et al [18]]
12        FPR (Floating Point Register) file size [Ascia, Catania et al [18]]
13        PR (Predicate Register) file size [Ascia, Catania et al [18]]
14        CR (Control Register) file size [Ascia, Catania et al [18]]
15        BR (Branch Register) file size [Ascia, Catania et al [18]]
16        # of IU (Integer Units) [Kim, Kiemb, Choi et al [16]]
17        # of FPU (Floating Point Units) [Kim, Kiemb, Choi et al [16]]
18        # of MU (Memory Units) [Kim, Kiemb, Choi et al [16]]
19        # of cache lines [Kunzli, Thiele et al [17]]
20        Block size [Kunzli, Thiele et al [17]]
21        Associativity [Kunzli, Thiele et al [17]]
22        Replacement strategy (LRU / FIFO) [Kunzli, Thiele et al [17]]
23        Bus speed [Kunzli, Thiele et al [15]]
24        Arbitration speed [Kunzli, Thiele et al [15]]
25        OO buffer size [Kwon, Lee, Kim, Ha et al [13]]
26        Memory mapping [Kwon, Lee, Kim, Ha et al [13]]
27        Pipelined function [Swarnalatha Radhakrishnan et al [11]]
28        Latency of functional units [Kim, Kiemb, Choi et al [16]]
29        Number of operational slots [Kim, Kiemb, Choi et al [16]]

      TABLE II.      PARAMETERS OF DESIGN SPACE EXPLORATION USING SIMULATOR BASED APPROACH

                       VI.    CONCLUSION
    In this paper, we have surveyed the state of the art in this new processor technology. The paper laid out the issues related to design space exploration using the simulator based approach, which is one of the most popular approaches. It also highlighted the important contributions made by various researchers, with a list of explored design space parameters.
    This paper also identifies two major issues in design space exploration: the unexplored design space parameters, and the inability to cover a large design space using the simulator based approach. This survey indicates a strong need for an approach other than the simulator based one for effective design space exploration.

                       VII. REFERENCES
[1]  Liem, C.; May, T.; Paulin, P., “Instruction-set matching and selection for DSP and ASIP code generation”, In Proc. EURODAC-94, 28 Feb.-3 March 1994, pp. 31-37.
[2]  Gloria A. D.; Faraboschi, P., “An evaluation system for application specific architectures”, In Proc. Micro-23, 27-29 Nov. 1990, pp. 80-89.
[3]  M.K. Jain, M. Balakrishnan, and A. Kumar, “ASIP Design Methodologies: Survey and Issues”, In Proceedings of the IEEE/ACM International Conference on VLSI Design (VLSI 2001), pp. 76-81, January 2001.
[4]  M.K. Jain, L. Wehmeyer, S. Steinke, P. Marwedel, and M. Balakrishnan, “Evaluating Register File Size in ASIP Design”, In Proceedings of the Ninth International Symposium on Hardware/Software Codesign (CODES 2001), pp. 109-114, April 2001.
[5]  Manoj Kumar Jain, M. Balakrishnan and Anshul Kumar, “An Efficient Technique for Exploring Register File Size in ASIP Design”, In Proceedings of the Fifth International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2002).
[6]  Manoj Kumar Jain, Lars Wehmeyer, Peter Marwedel, M. Balakrishnan, “Register File Synthesis in ASIP Design”, Technical Report #746, 07.12.2000, Lehrstuhl Informatik XII, University of Dortmund.
[7]  Manoj Kumar Jain, M. Balakrishnan and Anshul Kumar, “Exploring Storage Organization in ASIP Synthesis”, In Proceedings of the Euromicro Symposium on Digital System Design, 1-6 Sept. 2003, pp. 120-127.
[8]  J. Sato, M. Imai, T. Hakata, A. Y. Alomary, N. Hikichi, “An integrated design environment for application specific integrated processor”, In Proc. ICCD-91, pp. 414-417, October 1991.
[9]  T. V. K. Gupta, P. Sharma, M. Balakrishnan, S. Malik, “Processor evaluation in an embedded systems design environment”, In Proc. VLSI Design 2000, pp. 98-103, January 2000.
[10] SUIF Homepage.
[11] Radhakrishnan Swarnalatha, “Customization of application specific heterogeneous multi pipeline processors”, In Proc. EDAA 2006, pp. 746-751.
[12] Giuseppe Ascia, Vincenzo Catania, and Maurizio Palesi, “A GA-Based Design Space Exploration Framework for Parameterized System-On-A-Chip Platforms”, In IEEE Transactions on Evolutionary Computation, Vol. 8, No. 4, August 2004, pp. 329-346.

[13] Seongnam Kwon, Choonseung Lee, Sungchan Kim, Youngmin Yi, Soonhoi Ha, “Fast Design Space Exploration Framework with an Efficient Performance Estimation Technique”, In 2nd Workshop on Embedded Systems for Real-Time Multimedia (ESTImedia 2004), 6-7 Sept. 2004, pp. 27-32.
[14] Lilian Bossuet, Guy Gogniat, and Jean-Luc Philippe, “Communication-Oriented Design Space Exploration for Reconfigurable Architectures”, In EURASIP Journal on Embedded Systems, Volume 2007, Article ID 23496, 20 pages.
[15] Kyeong Keol Ryu and Vincent J. Mooney III, “Automated Bus Design Space Exploration for Multiprocessor SoC”, In Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 282-287.
[16] Yoonjin Kim, Mary Kiemb, Kiyoung Choi, “Efficient Design Space Exploration for Domain-Specific Optimization of Coarse-Grained Reconfigurable Architecture”, In Proceedings of Design, Automation and Test in Europe, 7-11 March 2005, Vol. 1, pp. 12-17.
[17] S. Kunzli, L. Thiele and E. Zitzler, “Modular design space exploration framework for embedded systems”, In IEE Proceedings - Computers and Digital Techniques, Volume 152, Issue 2, March 2005, pp. 183-192.
[18] Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and David Patti, “EPIC Explorer: A parameterized VLIW based Platform Framework for Design Space Exploration”, In First Workshop on Embedded Systems for Real time Multimedia (ESTIMedia), Newport Beach, California, USA, Oct. 3-4, 2003.
[19] Sudeep Pasricha and Nikil Dutt, “A Framework for Memory and Communication Architecture Co-synthesis in MPSoCs”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 26, Issue 3, March 2007, pp. 408-420.
[20] A. Halambi, P. Grun, A. Khare, V. Ganesh, N. Dutt, A. Nicolau, “EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability”, In Proceedings of the Design Automation and Test in Europe (DATE), pp. 485-490, March 1999.
[21] S. Pees, V. Zivojnovic, H. Mey, “LISA - Machine Description Language for Cycle Accurate Models of Programmable DSP Architectures”, In Proceedings of the Design Automation Conference (DAC), pp. 933-938, June 1999.
[22] A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, O. Wahlen, A. Wieferink, H. Meyr, “A Novel Methodology for the Design of Application-Specific Instruction-Set Processors (ASIPs) Using a Machine Description Language”, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(11), pp. 1338-1354, November 2001.
[23] O. Schliebusch, A. Hoffmann, A. Nohl, G. Braun, H. Meyr, “Architecture Implementation Using the Machine Description Language LISA”, In Proceedings of the IEEE/ACM International Conference on VLSI Design and ASP Design Automation Conference (VLSI/ASPDAC 2002), pp. 239-244, January 2002.
[24] B. Kienhuis, E. Deprettere, K. Vissers, “The Construction of a Retargetable Simulator for an Architecture Template”, In Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE '98), 15-18 March 1998, pp. 125-129.
[25] N. N. Binh, M. Imai, A. Shiomi, “A new HW/SW partitioning algorithm for synthesizing the highest performance pipelined ASIPs with multiple identical FUs”, In Proc. DAC-96, pp. 126-131, September 1996.

Deepak Gour, Assistant Professor, Dept. of Computer Science & Engineering, School of Engineering, Sir Padampat Singhania University, Udaipur, received his B.Sc. (Computer Science) in 1998 and Master in Computer Application (MCA) in 2001. He is currently pursuing a Ph.D. in the Department of Computer Science, Mohan Lal Sukhadia University, Udaipur. His research area is ASIP design space exploration. His area of specialization is embedded systems and his research interest lies in Application Specific Instruction set

M.K. Jain received the M.Sc. degree from M.L. Sukhadia University, Udaipur, India, in 1989. He received the M.Tech. degree in Computer Applications and the Ph.D. in Computer Science & Engineering from IIT Delhi, India, in 1993 and 2004 respectively. He has been Assistant Professor in Computer Science at M.L. Sukhadia University, Udaipur, India, since 1993. His current research interests include application-specific instruction-set processor design and embedded systems.
