hands on project

                                   By Robert Cravotta, Technical Editor

   PERFORMANCE                                    Illustration by Doug Fraser

The options and development tools available to designers for implementing custom hardware acceleration are evolving.

Custom hardware acceleration is an increasingly viable method for implementing parallel processing that balances performance, power consumption, and cost and brings these features to a new threshold. Albert Wang, chief technical officer of Stretch, explains that the processor vendor recently introduced a software-programmable processor with an integrated reconfigurable hardware-acceleration technology, “because time to performance is critical in meeting system design requirements for today’s leading-edge compute-intensive applications.”

This first of a two-part hands-on series provides an overview of the project effort, options, and tools for hardware acceleration, as well as the process for analyzing the software, identifying what to accelerate, and implementing the hardware acceleration. Part two, which will appear in EDN’s Dec 7, 2004, issue, will focus on hardware-acceleration tools that provide an additional layer of abstraction by operating directly on system models or the software source.
Sidebars: At a glance; Comparing performance; Reconfigurable acceleration; For more information

Design efficiency and development productivity are paramount concerns for embedded-system designers. Design efficiency, in the context of this article, encompasses how well a design handles its processing-performance requirements and meets all of its timing, power, and cost constraints. Embedded designs that can surmount complex performance and constraint requirements present an opportunity for a development team to identify and deliver differentiated, value-added features, because the complexity can act as a barrier to competitors. It

  50 edn | November 11, 2004                                                                                            
BENEFIT FROM EVOLVING DEVELOPMENT TOOLS.                         November 11, 2004 | edn 51
    hands-onproject Hardware acceleration

is difficult to offer unique, value-added, differentiating features when a design does not tackle and incorporate sufficiently complex requirements, because competitors can quickly duplicate and incorporate the best ideas into their own product offerings.

However, that complexity barrier is temporary, because competitors will focus their efforts to compete with your best ideas. You will lose your first-to-market advantage if you cannot quickly and adequately adjust your product’s features and costs to stay ahead of your competitors’ efforts. Development productivity balances out design efficiency. It encompasses not only the time and resources required to complete your current design and implementation effort, but also the reusability of the results in the inevitable follow-up projects. If your development processes and tools cannot abstract and automate nondifferentiating portions of your design effort better than your competitors’ processes and tools can, you may open the door for those competitors to outdo you in innovation. You may have to concentrate more valuable development time and resources than your competitors on those nondifferentiating details.

An ASIC is a semiconductor device that you can optimize to an application by implementing only those performance features an application needs to meet requirements. An ASIC can best balance performance, cost, and power, but its implementation incurs a longer development cycle and substantial nonrecurring engineering costs that are difficult to recover when there are rapidly evolving feature requirements.

Programmable platforms reside at the other end of the spectrum. They emphasize productivity over efficiency and allow both reusability and the flexibility to quickly modify or substitute code or logic to implement a new feature or function. Software-programmable systems can efficiently implement sequential processing but can be inefficient for parallel processing. Programmable hardware systems excel at implementing parallel operations but are generally less efficient than software-programmable systems at performing sequential operations. Software also excels at incorporating layers of abstraction that hardware cannot for cost efficiency and reusability. For example, you might redesign hardware to accommodate a different but functionally equivalent hardware component, but software for such a system may have to operate on both the old and the new configurations. As a result, the software abstracts a small variation in the hardware that is not functionally obvious.

The development tools for software- and hardware-programmable systems fundamentally differ. The tools that software and hardware engineers use provide appropriate but different visualizations for the system behavior because of the predominant sequential or parallel nature of each type of programmability. Jim Hwang, director of DSP-design tools and methodologies at Xilinx, contends that HDL designers represent a small portion of the number of developers working on signal-processing applications. Most designers in the DSP world, he explains, use C and Matlab. According to Hwang, development tools, such as System Generator for DSP, “lower the barrier to use FPGA-based products by application programmers and systems architects who do not know VHDL.” Many of the companies that support custom hardware acceleration are investing significant development resources to offer tools and devices that allow nonhardware engineers, such as system designers and software engineers, to implement parallelism in software as hardware acceleration.

Early on in this hands-on project, EDN had to make a decision about its scope.

AT A GLANCE

   The strength of software is to the strength of hardware as sequential operations are to parallel operations.

   Custom hardware-acceleration logic is but one of many ways to accelerate system performance.

   Hardware-acceleration tools are targeting and becoming friendlier to designers lacking a hardware background.

   Choosing between instruction extensions or custom coprocessor logic depends on a design’s hardware needs.

COMPARING PERFORMANCE

During the early planning stages of this hands-on project, EDN considered comparing the acceleration difference of the same algorithm across each architecture. But we dumped the idea, because no application space is common to every architecture, and it would quickly become an apples-to-oranges comparison.

We next considered comparing and discussing the before-and-after performance metrics for each system. However, although this comparison might be useful, it would invite incomplete and inappropriate conclusions. For example, when accelerating the CRC (cyclic-redundancy-check) algorithm with Atmel, the hardware-implemented algorithm performed 23 times faster than the original software implementation. The software implementation was a loop-driven CRC; measuring against a table-driven CRC would have reduced the apparent speedup. With more time and effort, we could also have improved the performance of the hardware implementation. Analogous conditions exist for each acceleration effort.

Be careful not to compare a worst-case software implementation with a best-case hardware acceleration. Hardware performance is measured on a stand-alone execution unit, but software performance is part of a greater whole. For example, the memory organization or size of the processor cache or register files can affect the performance of an algorithm; therefore, some systems support configurability of these types of architectural features.

It is important to avoid implementing hardware acceleration to compensate for executing the software on an inappropriate processor architecture. For example, whether the processor includes a MAC (multiply/accumulate) unit or appropriate data-bus structure can significantly affect the software performance of a signal-processing algorithm. Our experience at Tensilica drove home this point when we implemented a five-operand adder that required more data than the bus structure could support in one clock cycle. Another example is implementing streaming DSP functions on a microprocessor that includes a hardware MAC unit but cannot handle the streaming-data load.
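The difference between a loop-driven and a table-driven CRC is easiest to see in code. The following is a minimal sketch, not the code used in the Atmel session: the article does not state which CRC width or polynomial the project used, so the 16-bit CCITT polynomial (0x1021), the 0xFFFF initial value, and all function names here are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC16_POLY 0x1021u  /* assumed polynomial (CCITT); not from the article */
#define CRC16_INIT 0xFFFFu  /* assumed initial value */

/* Loop-driven CRC: eight shift/XOR steps for every input byte. */
uint16_t crc16_loop(const uint8_t *data, size_t len)
{
    uint16_t crc = CRC16_INIT;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* 256 entries of 16 bits: 512 bytes of data memory traded for speed. */
static uint16_t crc16_table[256];

/* Precompute the eight shift/XOR steps for every possible byte value. */
void crc16_table_init(void)
{
    for (unsigned value = 0; value < 256; value++) {
        uint16_t crc = (uint16_t)(value << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ CRC16_POLY)
                                  : (uint16_t)(crc << 1);
        }
        crc16_table[value] = crc;
    }
}

/* Table-driven CRC: one lookup, one shift, and one XOR per input byte. */
uint16_t crc16_table_driven(const uint8_t *data, size_t len)
{
    uint16_t crc = CRC16_INIT;
    for (size_t i = 0; i < len; i++) {
        uint8_t index = (uint8_t)((crc >> 8) ^ data[i]);
        crc = (uint16_t)((crc << 8) ^ crc16_table[index]);
    }
    return crc;
}
```

After one call to crc16_table_init(), both functions return the same value for the same input; for the nine ASCII bytes "123456789" under these assumed parameters, that value is 0x29B1. The table version removes the inner eight-iteration loop at the cost of 512 bytes of memory, which is why the choice of software baseline changes the speedup you report for a hardware implementation.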


The project could have a narrow focus and deeply explore the process of accelerating an algorithm on a single platform, or it could approach hardware acceleration broadly (and consequently spend less time on each architecture) by examining several architecture and tool offerings. We chose the broader and shallower approach to better highlight the methods of the growing number of companies offering or supporting hardware acceleration. Just within the last few months, two companies announced their first products. Poseidon Design Systems introduced tools that synthesize application-specific hardware accelerators directly from standard ANSI C source code, and Stretch rolled out its catalog processor family and tool set, which enables hardware acceleration from ANSI C source code.

Figure 1: A custom instruction is an extension of the processor’s ALU (arithmetic-logic unit) and instruction decoder. The custom instruction logic may be a critical path that can affect the processor clock rate unless you implement pipeline-stall controls or instruction pipelining. (Diagram blocks: instruction, instruction decode, operand A, operand B, extension-instruction logic, result, instruction critical path.)

We spent time at Altera, ARC, Atmel, MIPS, Tensilica, and Xilinx, implementing hardware acceleration of algorithms of their choice on their workbenches. Performing the task at each site alleviated time constraints and simplified our effort to license and configure each development environment; it also meant an engineer could direct us through the process.

Each company chose an algorithm to accelerate without knowing what the other companies chose. Therefore, each company could choose the algorithm acceleration with which it had the most experience and that would demonstrate how the company’s architecture and tools supported the analysis and implementation for each acceleration. By discouraging common algorithm acceleration between companies, the project effort could better focus on the process, tools, and architectural mechanisms and avoid the temptation to compare the efficiency of each acceleration effort (see sidebar “Comparing performance”).

Altera provides programmable-logic devices, associated software tools, and IP (intellectual-property) software blocks. Its Excalibur devices integrate an ARM922T processor (that can operate at 200 MHz) with the Apex 20KE FPGA architecture. For this project, the programming and acceleration targets were the 32-bit Nios and Nios II soft-processor cores, which can reside in Altera’s Stratix, Cyclone, and HardCopy FPGA devices. The Nios II core is available in three configurations, all of which support custom instructions. The standard-configuration Nios II/s balances

RECONFIGURABLE ACCELERATION

A reconfigurable platform is one way to reduce a design’s size and cost, and it allows designers to realize the benefits of using hardware accelerators with software in an embedded design. Some systems, including one from Morpho Technologies, can reconfigure their logic in one clock cycle to accelerate application-specific tasks. Some FPGA-based systems can support runtime reconfiguration of the system; however, their reconfiguration requires many clock cycles.

The Reconf project (www. ...) is attempting to demonstrate a transparent infrastructure for dynamic FPGA reconfiguration (Reference A). This effort addresses not only the actual dynamic reconfigurability of an FPGA device, but also the necessary development-tool support. Atmel’s AT94K FPSLIC (field-programmable system-level IC) is the demonstration device for the current project. It supports runtime reconfiguration by relying on its ability to reconfigure part of the FPGA without affecting the operation of the rest of the device. Atmel’s Figaro design-implementation tool does not natively support partial reconfiguration, so the project defines a special implementation procedure to generate the bit streams for all necessary coprocessor contexts.

To access and use each of the coprocessors (in this case, different floating-point operations), the application software calls a CallFPGA function. Because the FPGA is too small to simultaneously contain all the operations, the FPGA reconfigures the coprocessor as necessary within 50 msec, using an infrastructure that includes a reconfiguration controller in the FPGA.

Analogously, Stretch’s ISEF (instruction-set-extension fabric) supports runtime reconfiguration by partitioning the ISEF into two dynamically loadable sections. This setup allows the device to reconfigure one section of the ISEF over many clock cycles while the rest of the device continues to operate normally.

The Reconf demonstration and the Stretch product demonstrate how an application programmer can use many accelerated operations in a programmable-logic fabric tied to a processor without being familiar with the digital-circuit design.

Reference
A. Danek, M, P Honzik, J Kadlec, R Matsousek, Z Pohl, “Reconfigurable System-on-a-Programmable-Chip Platform,” Seventh IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, April 2004.
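The reconfigure-on-demand behavior behind a call such as CallFPGA can be modeled in a few lines of C. The sketch below is a software-only illustration under stated assumptions: the article does not give the real CallFPGA signature, so call_fpga, fpga_state, the context names, and the three floating-point operations are all hypothetical, and the "hardware" is simulated with ordinary C arithmetic.

```c
/* Hypothetical coprocessor contexts. The article says only that the
   contexts are different floating-point operations. */
typedef enum {
    CTX_NONE = -1,
    CTX_FADD,
    CTX_FMUL,
    CTX_FDIV
} fpga_context;

typedef struct {
    fpga_context loaded;      /* context currently configured in the fabric */
    unsigned reconfig_count;  /* number of times the fabric was reloaded */
} fpga_state;

/* Stand-in for loading a partial bit stream. On the demonstration device
   this is the expensive step (the article cites a bound of 50 msec). */
static void fpga_load_context(fpga_state *fpga, fpga_context ctx)
{
    fpga->loaded = ctx;
    fpga->reconfig_count++;
}

/* Software model of what each context would compute in hardware. */
static float fpga_execute(fpga_context ctx, float a, float b)
{
    switch (ctx) {
    case CTX_FADD: return a + b;
    case CTX_FMUL: return a * b;
    case CTX_FDIV: return a / b;
    default:       return 0.0f;
    }
}

/* Hypothetical analogue of CallFPGA: reconfigure only when the requested
   context is not already resident, then run the operation. */
float call_fpga(fpga_state *fpga, fpga_context ctx, float a, float b)
{
    if (fpga->loaded != ctx)
        fpga_load_context(fpga, ctx);
    return fpga_execute(ctx, a, b);
}
```

Starting from a state of { CTX_NONE, 0 }, two consecutive CTX_FADD calls trigger one reconfiguration, and a following CTX_FMUL call triggers a second. Because each reload is slow relative to the operation itself, an application built on this kind of infrastructure benefits from grouping calls that share a context.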


performance and size (cost); the economy-configuration Nios II/e implements the smallest core (roughly half the size of the Nios II/s standard core) at the expense of performance and maintains compatibility with the Nios II ISA (instruction-set architecture); and the Nios II/f configuration maximizes the core for processing performance. Altera’s SOPC (system-on-programmable-chip) Builder system wizard-driven development tool enables you to work with the MathWorks’ Matlab and Simulink to develop, implement, and port algorithmic designs to HDL files for use with the Quartus II design software.

ARC licenses configurable and preconfigured processor and IP cores along with development software, profiling tools, and a real-time operating system. Its configurable processor cores support optional DSP extensions and scale from the four-stage-pipelined ARCtangent-A4 architecture to the seven-stage-pipelined ARC 700 architecture. The cores are synthesizable and configurable, and designers can extend the ISA.

This project targeted the 32-bit ARCtangent-A4 processor but also focused on how the project would differ using an ARC 600 processor (most notably a longer instruction pipeline and additional ALU/DSP extensions). The acceleration effort focused on implementing a packet-header-processing function as a custom instruction. The ARChitect configuration tool configures the cores and the Extension Instruction Automation tool suite to integrate Verilog instruction extensions to the pipeline. The tool automates the insertion of all control signals and structures to integrate your instruction to the pipeline and creates a library that the ARChitect tool suite can use.

At nearly all of the companies, we worked with a 32-bit architecture. However, at Atmel, we explored an 8-bit AVR microcontroller core integrated with an FPGA on an FPSLIC (field-programmable system-level IC) device. Hardware acceleration with an 8-bit device is more about lowering your processing costs than tackling a high-performance state-of-the-art algorithm. This project focused on implementing a CRC (cyclic-redundancy-check) function on a block of data as a fire-and-forget peripheral processor. The AVR core is not configurable, and the ISA is not extensible, but the FPGA enables designers to implement hardware acceleration as a coprocessor or peripheral to the AVR. Atmel’s System Designer environment includes FPGA-development tools, AVR studio tools, and co-verification tools to allow concurrent hardware and software development and debugging.

Figure 2: Custom acceleration as a coprocessor or peripheral allows the custom logic to operate loosely coupled to or independently of the processor architecture. The custom logic may also access system resources, such as memory, without processor intervention. (Diagram blocks: CPU, custom hardware, arbitrator, peripherals, program memory, data memory.)

MIPS Technologies licenses 32- and 64-bit processor architectures and cores. Its Pro Series processor core family features the CorExtend capability, which allows designers to add and integrate proprietary instructions and tightly couple hardware to the core. We focused on the MIPS32 24Kc Pro processor core, which includes the CorExtend capability. The session involved implementing and verifying a UDI (user-defined instruction) that performed computations on a block of data. To implement the UDI, designers build a CorExtend block in Verilog RTL and integrate it with the processor core. The MIPS software-tool kit includes the software-development environment and the MIPSsim software simulator. MIPSsim uses an API for creating customized CorExtend libraries, so you can add any UDIs to MIPSsim for functional and cycle-accurate simulations.

Tensilica’s configurable, extensible, and synthesizable processor cores emphasize configuring predefined elements of the architecture and building new instructions and hardware-execution units. The Xtensa Processor Generator produces a software-development environment, including operating-system support, for each processor configuration. Its Xtensa V processor core was the object of interest. (The Xtensa LX core was not available in time to support the project.)

FOR MORE INFORMATION

For more information on products such as those discussed in this article, contact any of the following manufacturers directly, and please let them know you read about their products in EDN.

   AccelChip                  1-408-943-0700
   Altera                     1-408-544-7000
   ARC                        1-408-437-3400
   ARM                        44-01223-400400
   Atmel                      1-408-441-0311
   Celoxica                   44-0-1235-863656
   Critical Blue              1-408-467-5091
   The MathWorks              1-508-647-7000
   MIPS Technologies          1-650-567-5000
   Morpho Technologies        1-949-475-0626
   Pentek                     1-201-818-5900
   Poseidon Design Systems    1-770-937-0611
   QuickLogic                 1-408-990-4000
   Stretch                    1-650-864-2700
   Tensilica                  1-408-986-8000
   Texas Instruments          1-800-336-5236
   Xilinx                     1-408-559-7778


The hands-on session at Tensilica focused on implementing computational instruction extensions, including one that used more data than the data-flow structure could support in a single cycle. To create instruction extensions, engineers use the TIE (Tensilica Instruction Extension) language and compiler. The TIE language is a hybrid of Verilog and C that describes custom instructions, including multicycle and pipelined instructions. The TIE compiler generates the files to customize the software-tool chain, extend the instruction-set-simulator and C-modeling environment, and estimate the hardware resources for processor configurations and custom instructions.

At Xilinx, which provides programmable-logic devices, advanced ICs, IP, and software-design tools, we worked with the 32-bit MicroBlaze soft processor core loaded in Virtex and Spartan FPGA devices. Xilinx also offers the 8-bit PicoBlaze soft core and the PowerPC 405 hard core. The acceleration effort focused on implementing functions for MP3 as a custom logic block or peripheral. The MicroBlaze does not support custom instruction extensions, but it does support user-defined hardware acceleration as a peripheral or coprocessor through an FSL (Fast Simplex Link) interface or the CoreConnect On-chip Peripheral Bus. The FSL interface provides a low-latency dedicated interface between the MicroBlaze register file and the custom acceleration logic. The embedded development kit and the Xilinx Platform Studio provide an integrated environment for creating MicroBlaze and PowerPC designs.

Figure 3: A generalized design flow for accelerating software highlights the performance-simulation and implementation-verification loops when building custom hardware acceleration. (Flowchart steps: build/compile application; simulate/debug/profile application; meets performance goals?; modify application software; build/modify hardware acceleration; update co-development tools; verify implementation and layout timing; design goals?; perform remaining hardware flow.)

OPTIONS

Sometimes, designers need a time-critical algorithm to execute more quickly than the software can handle. In such cases, they can try to accelerate the performance of the algorithm by restructuring it; throwing more data memory at the problem, such as by using a look-up table; or implementing the algorithm in hand-coded assembly. They can use faster processors, repartition the problem over multiple processing engines, or use application-optimized processor architectures, such as ASSPs (application-specific standard products) that integrate application-specific hardware accelerators. They may even consider creating their own custom instruction or hardware accelerator.

Throwing processor resources, such as memory, at an algorithm can increase the algorithm’s speed but requires a trade-off in hardware resources. Working with on-chip rather than off-chip memory can also make a material performance difference. Implementing the algorithm as hand-coded assembly allows a designer to stay with a fixed-ISA processor but can tie the designer closely to it. Using a faster processor can incur adverse side effects, such as increased power consumption. Repartitioning the design over multiple processing engines, such as by using a microprocessor with a DSP, allows designers to use the device that best suits each set of functions but increases complexity. ASSPs implement optimized peripheral and hardware-acceleration features for a specific application. However, using them can make differentiating a product challenging, because they are narrowly targeted, and competitors can access them. Licensing IP cores to accelerate algorithms can make sense, but, because these cores are licensable, competitors can also access them. Therefore, they are not a differentiating feature.

Custom hardware acceleration is an increasingly viable method of implementing parallel processing. And, for some parallel operations, it can improve performance by an order of magnitude or two over a software implementation. You can implement custom hardware acceleration with a discrete FPGA interfaced to a standard processor device. Several processor providers, including Altera, Atmel, QuickLogic, Stretch, and Xilinx, offer a processor integrated with an FPGA on a single device. A growing number of software providers, such as AccelChip, Altera, Celoxica, Critical Blue, Poseidon Design Systems, Stretch, and Xilinx, are offering software tools to make hardware acceleration approachable for designers lacking extensive hardware-design training. Incorporating the acceleration into the application software often consists of substituting the source code with compiler intrinsics or inline assembly.

Designers can implement custom hardware acceleration as a custom instruction that is tightly coupled with the processor architecture. Custom instructions are effectively extensions of the processor’s ALU (arithmetic logic unit, Figure 1). The custom instruction logic couples to the processor clock rate and must provide control signals to stall the pipeline or implement a custom pipeline when the critical path of the custom logic is too long. Designers may need to multicycle or pipeline a custom instruction if there is a data dependency. They may be unable to efficiently implement
a custom instruction for data-intensive algorithms if they need to access data
beyond the capacity of the ALU datapath (typically, two operands and one
result), unless the processor architecture supports local memories, registers,
and data-access mechanisms.
   Another approach to implementing hardware acceleration is as a custom logic
block that loosely couples with the processor as a coprocessor or peripheral
via a data interface, such as a memory controller (Figure 2). The coprocessor
performs its logic outside of the processor, so critical-path timing issues
need not directly affect the clock-to-clock operation of the processor core.
However, they can affect your application’s overall operation. A coprocessor
may also be able to access other peripherals or memory in the system without
processor intervention.

THE PROCESS
   Despite the fact that we operated in what could have been a highly
controlled, lablike demonstration environment, each session offered some level
of troubleshooting fun. During one session, the workstation died, and none too
gracefully, either. However, the malfunction did explain some of the
intermittent behaviors we were experiencing. Another session included a
mismatch between drivers and tool versions, but the blame here lay with porting
design code to a new version of the tools and a different target.
   The most common troubleshooting problems arose from manually entering
changes into code or configuration files. Because these sessions dealt with
“simple” examples, we did not enforce strict version control. However, this
approach turned out to be a mistake. Building, executing, and profiling each
configuration means using indirect support files, and it is easy to forget a
single change here or there. Script files ultimately made the difference in
keeping things straight.
   The time constraints of examining different architectures and tool sets
didn’t allow the project to explore an entire EDA-tool chain. For example, we
didn’t have time to wait for the physical-synthesis tool to complete its
operations, so we effectively talked through that portion of the process. The
generalized hardware-acceleration design flow appropriate for this project
focuses on the iterative loop for identifying, creating, and integrating the
hardware acceleration into the cosimulation environment (Figure 3).
   Each session began with implementing an algorithm solely in software.
Simulation tools are essential components of any co-development-tool suite; by
using the simulation and profiling tools, we could identify software
bottlenecks. Good candidates for hardware acceleration include operations that
allow the merging of multiple sequential operations to produce a single set of
outputs from a single set of inputs, and operations that let you execute
parallel computations on a set of independent inputs to produce independent
outputs.
   After identifying what code to accelerate and deciding whether to implement
the acceleration as a custom instruction or coprocessor, designers need to
create the hardware design. Some tools, which Part Two of this project will
explore, assist with or automate the creation of the hardware acceleration
straight from the source code. If designers manually implement acceleration
blocks, they will create VHDL or Verilog files. However, if they are using
Tensilica’s tool suite, they will create the acceleration instructions/blocks
via TIE modules.
   To use the new (or modified) acceleration logic, designers need to
instantiate the acceleration logic in the cosimulation tools and modify the
algorithm software. Using a custom instruction can involve inserting or
substituting a compiler-intrinsic or inline assembly statement in the original
code. To access a custom coprocessor or peripheral, they may need to add code
to use a driver or API. After rebuilding the system, they cosimulate the system
and profile the performance and behavior. They repeat this process, possibly
improving on previously implemented acceleration logic, until the system design
can meet the performance goals. The next steps in the process are validating
the functioning of the acceleration-logic RTL with a verification-test program,
followed by gate-level timing verification. If the testing shows that the
design meets all of the functional and design requirements, the designer can
proceed with the rest of the hardware flow.
   Development tools are evolving to make interchanging hardware and software
modules available to designers without a hardware background. Weigh any
decision to use a better-performing hardware implementation instead of a
software implementation against the application’s needs. Designers may not
want to incur the recurring hardware costs if simplifying the software
algorithms is sufficient. For example, if they have unused memory in their
systems, building a look-up table may prevent additional costs for resources to
implement a hardware accelerator. On the other hand, they may be able to reduce
the total system cost with hardware acceleration, and implementing a
runtime-reconfigurable system may yield even larger cost benefits (see sidebar
“Reconfigurable acceleration”).

You can reach Technical Editor Robert Cravotta at 1-661-296-5096, fax
1-661-296-1087, e-mail [...]

Acknowledgment
Special thanks to the people at Altera, ARC, Atmel, MIPS, Tensilica, and Xilinx
for coordinating their teams and hosting a hands-on session with us at each of
their sites.

Talk to us
Post comments via TalkBack at the online version of this article at [...]
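
The look-up-table option raised under OPTIONS, and again in the closing cost discussion, trades data memory for computation. A minimal sketch of that trade follows, written in Python for readability. Nothing here comes from the hands-on sessions: the bit-reversal task and every name are assumptions chosen only to illustrate the idea. A 256-entry table replaces an eight-step loop with a single indexed read.

```python
# Illustrative sketch only: bit reversal is an assumed example task,
# not one of the algorithms used in the hands-on sessions.

def reverse_bits_loop(value: int) -> int:
    """Reverse the 8 bits of a byte one bit at a time (compute-heavy path)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

# Spend 256 bytes of memory once so that every later call is one table read.
REVERSE_TABLE = bytes(reverse_bits_loop(v) for v in range(256))

def reverse_bits_lut(value: int) -> int:
    """Reverse the 8 bits of a byte with a single look-up."""
    return REVERSE_TABLE[value & 0xFF]
```

The table costs 256 bytes. Whether that is cheaper than adding acceleration hardware depends on how much memory the system has to spare, which is the balance the article asks designers to weigh.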

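
THE PROCESS describes starting from a software-only implementation and profiling it to find bottlenecks. The article does not show the vendors’ cosimulation and profiling tools, so the sketch below substitutes Python’s standard cProfile and pstats modules to illustrate the same step. The two stage functions are hypothetical stand-ins for application code.

```python
import cProfile
import pstats

def filter_stage(samples):
    """Hypothetical stage with a heavy inner loop (stand-in for a bottleneck)."""
    total = 0
    for s in samples:
        for tap in range(64):
            total += (s * tap) & 0xFFFF
    return total

def framing_stage(samples):
    """Hypothetical lightweight stage."""
    return len(samples)

def application(samples):
    """Software-only version of the assumed application."""
    return filter_stage(samples) + framing_stage(samples)

def find_hotspot(func, *args):
    """Profile func(*args); return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    # pstats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    stats = pstats.Stats(profiler).stats
    (_, _, name), _ = max(stats.items(), key=lambda item: item[1][2])
    return name

hotspot = find_hotspot(application, list(range(2000)))
print("candidate for acceleration:", hotspot)
```

Ranking by self time rather than cumulative time points at the function that does the work itself, which is the code a designer would consider moving into a custom instruction or coprocessor.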
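
The article describes good acceleration candidates as sequences of operations that merge into one operation with a single set of inputs and outputs. It also notes that a custom instruction typically sees two operands and one result, and that the acceleration logic is later validated with a verification-test program. The sketch below models one such candidate in Python as a golden reference that test vectors can be checked against. The Q15 saturating multiply, the vector set, and all names are assumptions for illustration and are not taken from the sessions.

```python
# Hypothetical two-operand, one-result operation: a Q15 fractional multiply
# with saturation. In software it is a multiply, a shift, and a clamp.

Q15_MAX = 0x7FFF
Q15_MIN = -0x8000

def q15_mul_sat(a: int, b: int) -> int:
    """Golden software model of the fused operation."""
    product = a * b                            # step 1: 16x16 multiply
    scaled = product >> 15                     # step 2: rescale to Q15
    return max(Q15_MIN, min(Q15_MAX, scaled))  # step 3: saturate to 16 bits

def mismatches(candidate, vectors):
    """Return the input pairs where candidate disagrees with the golden model."""
    return [(a, b) for a, b in vectors if candidate(a, b) != q15_mul_sat(a, b)]

# Corner cases plus a small sweep; real vectors would come from the application.
VECTORS = [(Q15_MIN, Q15_MIN), (Q15_MAX, Q15_MAX), (Q15_MIN, Q15_MAX), (0, 0)]
VECTORS += [(a, b)
            for a in range(-3000, 3001, 1500)
            for b in range(-32768, 32768, 8191)]

def missing_saturation(a: int, b: int) -> int:
    """A deliberately wrong candidate that forgets the clamp."""
    return (a * b) >> 15

print(mismatches(q15_mul_sat, VECTORS))         # []
print(mismatches(missing_saturation, VECTORS))  # [(-32768, -32768)]
```

The only failing vector is the one corner case where the product overflows the 16-bit range, which is the kind of discrepancy a verification-test program is meant to catch before gate-level timing verification.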