

Reconfigurable Processors
Author: Shailesh Kulkarni, 6th sem, E&C, GIT, Belgaum. kulkarni.shailesh@gmail.com
Co-Author: Naveen Bhat, 6th sem, E&C, GIT, Belgaum. bhatnaveen2005@yahoo.com
Abstract
Computing has evolved steadily from the day transistors were first fabricated. As
research in the field of computation progressed, two directions of thought emerged.
At one extreme are general-purpose processors, which are fully programmable but
expensive (and relatively slow); at the other are custom circuits called
application-specific integrated circuits (ASICs), which are fast and cheap but
inflexible.
To bridge this gap we investigate reconfigurable processors as one potential
solution. A reconfigurable processor is a microprocessor with ‘erasable hardware’ that
can rewire itself dynamically. This allows the chip to adapt effectively to the
programming tasks demanded by the particular software it is interfacing with at
any given time. Ideally, the reconfigurable processor can transform itself from a video
chip to a central processing unit to a graphics chip, for example, all optimized to
allow applications to run at the highest possible speed. Its key feature is the ability to
perform computations in hardware to increase performance, while retaining much of
the flexibility of a software solution.
In this paper we focus in particular on the architectural aspects of reconfigurable
processors and on their applications.
Reconfigurable Processors Page 1
Contents
1. Introduction
2. Reconfigurable Computing Paradigm
3. Reconfigurable Systems
4. Research Challenges
5. Integration of Computing Elements
5.1. Coupling
5.2. Instructions
5.3. Operands
6. Reconfiguration Unit
6.1. Granularity
6.2. Interconnect
6.3. Reconfiguration time
7. Programming model and program transformation example
8. Applications
9. Conclusions
10. References
Gogte Institute of Technology, Belgaum.
1. Introduction
Advances in technology over the past few years have brought us to the brink of an
exciting new development: a completely new type of computer with characteristics
quite unlike anything that has been seen before.
For many years the field of computing has been centered on general-purpose
processors (GPPs). Great advances in technology have been made, with current
general purpose Central Processing Units (CPUs) being many orders of magnitude
more powerful than the first ones. The great flexibility of these processors encouraged
investigation of widely varying applications and fostered great advances in software
engineering, with many new leaps being made possible by the continual introduction
of ever-more-powerful processors. They are flexible because their versatile instruction
sets allow any computational task to be implemented. Yet all this time both the
hardware and software fields were guided by a von Neumann-derived approach
to computing, and the people involved with computing naturally developed a way
of thinking shaped by the technology they were accustomed to using.
Designers of digital systems face a fundamental trade-off between flexibility and
efficiency when selecting computing elements. The available alternatives span a wide
spectrum with general-purpose microprocessors and application-specific integrated
circuits (ASICs) at opposite ends.
ASICs are dedicated hardware circuits tuned to a very small number of applications or
even to just one task. ASICs are mainly used in high-volume embedded system
markets such as telecommunications, consumer electronics, or the automotive
industry. For a given task, dedicated circuits execute faster, require less silicon area,
and are more power efficient than general-purpose architectures. The drawback of
such highly specialized architectures is their lack of flexibility: if the applications
change, the ASIC must be redesigned.
Besides speed and flexibility, silicon area is also an interesting point of comparison.
General-purpose processors (GPPs) try to extract as much parallelism as possible from
the software, using superscalar and pipelined execution. Supporting these features,
however, requires a large control overhead. Even with this control logic, pipeline
stalls and flushes and idle execution units cannot be avoided, because of data and
control dependencies between instructions. The latest GPPs are simply huge: they
contain more than 40 million transistors. ASICs, on the other hand, are usually small,
since they are application specific and therefore do not need the large control
overhead of GPPs. This raises the question of whether these features can be combined
in a single device. Is it possible to have a flexible architecture on a device that
implements some application-specific algorithms? And would such a solution be faster
than a GPP?
In the last decade, new classes of reconfigurable computing devices have
emerged that promise to combine the flexibility of processors with the efficiency
of ASICs. The hardware of a reconfigurable device is not static but is adapted to each
individual application. Through hardware customization, reconfigurable devices
can potentially achieve higher efficiency than microprocessors, while the dynamic nature
of the customization process allows a higher level of flexibility than ASICs.
Figure 1 Tradeoffs between flexibility and performance
Figure 1 outlines the trade-off between flexibility and efficiency as well as the
position of reconfigurable devices compared to processors and ASICs.
2. Reconfigurable Computing Paradigm
Figure 2 sketches the computing paradigms of processors and ASICs, respectively.
Processors have a general, fixed architecture that allows tasks to be implemented by
temporally composing atomic operations, which are provided for example by the
arithmetic and logic unit (ALU) or the floating-point unit. In contrast, ASICs
implement tasks by spatially composing operations, which are provided by dedicated
computational units like adders or multipliers. Reconfigurable computing combines
these computing paradigms by means of reconfigurable hardware structures, which
allow tasks to be implemented both ‘in time’ and ‘in space’.
Figure 2 Computation of the expression $y = Ax^2 + Bx + C$
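To make the contrast between composing operations 'in time' and 'in space' concrete, the following Python sketch counts steps for the expression in Figure 2. It is an illustration under idealized assumptions, not taken from the sources: every operation takes one step, the processor has a single ALU, and the ASIC has one dedicated unit per operation. The node names are hypothetical.

```python
import operator

# Dataflow graph for y = A*x^2 + B*x + C (node names are hypothetical).
# Each node: name -> (operation, left operand, right operand).
# Nodes are listed in dependency order, so one pass can evaluate them.
GRAPH = {
    "x2":  (operator.mul, "x", "x"),      # x^2
    "Ax2": (operator.mul, "A", "x2"),     # A*x^2
    "Bx":  (operator.mul, "B", "x"),      # B*x
    "BxC": (operator.add, "Bx", "C"),     # B*x + C
    "y":   (operator.add, "Ax2", "BxC"),  # A*x^2 + B*x + C
}


def evaluate(inputs):
    """Compute y by executing the graph nodes one after another."""
    values = dict(inputs)
    for name, (op, a, b) in GRAPH.items():
        values[name] = op(values[a], values[b])
    return values["y"]


def temporal_steps():
    """Processor: a single ALU performs one operation per step."""
    return len(GRAPH)


def spatial_depth():
    """ASIC: one dedicated unit per operation; independent units work in
    parallel, so the step count equals the longest dependency chain."""
    level = {}
    for name, (_, a, b) in GRAPH.items():
        level[name] = 1 + max(level.get(a, 0), level.get(b, 0))
    return max(level.values())


print(evaluate({"A": 2, "B": 3, "C": 4, "x": 5}))  # 69
print(temporal_steps(), spatial_depth())            # 5 3
```

The spatial version finishes in three steps but needs five operators, while the temporal version reuses one ALU for five steps; a reconfigurable device can implement the same graph anywhere between these two extremes.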
The characteristics of the computing paradigms are also reflected in the respective
system compositions, outlined in Figure 3. In the processor case, the instructions
(composed into the program code) define the behavior of the computing element. The
behavior of a reconfigurable device is specified by its configuration. The behavior of
an ASIC is typically hard-wired and does not allow for any dynamic adaptation,
except perhaps for some adjustable coefficients.
Figure 3 System outlines corresponding to the computing paradigms of
processors, reconfigurable devices, and ASICs
Within the domain of reconfigurable computing, two fundamental kinds of
(re)configurability are distinguished:
Static or compile-time reconfiguration (CTR) – where the configuration of the
device is loaded once at the outset, after which it does not change during the
execution of the task at hand, and
Dynamic or run-time reconfiguration (RTR) – where the configuration of the
device may change at any arbitrary moment during run time.
This paper focuses on dynamically reconfigurable devices. Static reconfiguration is
therefore not discussed further.
3. Reconfigurable Systems
The enabling technology for building reconfigurable systems was the field-programmable
gate array (FPGA). FPGAs were introduced to the market at the high end of
programmable logic devices (PLDs) in the mid-1980s. FPGAs consist of an
array of logic blocks, routing channels to interconnect the logic blocks, and
surrounding I/O blocks. SRAM-based FPGAs use static RAM (SRAM) cells to
control the functionality of the logic, I/O blocks, and routing. They can be
reprogrammed in-circuit arbitrarily often by downloading a bit stream of
configuration data to the device. While early FPGA generations were quite limited in
capacity, today’s devices feature millions of gates of programmable logic,
dense enough to host complete computing systems.
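The idea that SRAM cells define the logic function can be sketched in a few lines of Python. This is a simplified model, not a description of any vendor's architecture: a 4-input look-up table (LUT) stores one output bit for each of its 16 input combinations, and loading a different bit pattern changes its function without touching the 'hardware'. The class and function names are hypothetical.

```python
class LUT4:
    """Hypothetical model of a 4-input look-up table whose function is set
    by 16 configuration bits, as held in SRAM cells of an SRAM-based FPGA."""

    def __init__(self):
        self.config = [0] * 16  # one output bit per input combination

    def load(self, bits):
        """Download a (tiny) configuration bit stream into the LUT."""
        if len(bits) != 16 or any(b not in (0, 1) for b in bits):
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.config = list(bits)

    def output(self, a, b, c, d):
        """The inputs select one stored bit; a is the least significant index bit."""
        return self.config[a | (b << 1) | (c << 2) | (d << 3)]


def truth_table(func):
    """Build the 16-bit configuration that makes a LUT4 behave like func."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) for i in range(16)]


lut = LUT4()
lut.load(truth_table(lambda a, b, c, d: a & b & c & d))  # configure as 4-input AND
print(lut.output(1, 1, 1, 1), lut.output(1, 0, 1, 1))    # 1 0
lut.load(truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))  # reconfigure as 4-input XOR
print(lut.output(1, 1, 1, 1), lut.output(1, 0, 1, 1))    # 0 1
```

Reloading the bits while the system is running corresponds to the run-time reconfiguration (RTR) described in Section 2; in a real FPGA the same bit stream also configures the routing and I/O blocks.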
4. Research Challenges
The main challenges in designing a reconfigurable processor are:
1. The integration of the computing elements (processor core and reconfigurable unit),
2. The design of the reconfigurable unit itself, and
3. The hybrid programming model that utilizes both the static and the reconfigurable
units.
Projects that rely on fine-grained reconfigurable elements typically target general-purpose
(GP) computing as their application domain, whereas those using coarse-grained elements
target multimedia (MM) applications. We now describe each of these three design
challenges in more detail.
5. Integration of Computing Elements
The interaction between the processor core and the reconfigurable unit is a critical aspect
that requires extensive design-space exploration. The architectural integration of these
elements concerns the coupling between processor core and reconfigurable unit, the way
instructions are issued to the reconfigurable unit, and the way operands are
transferred to and from the reconfigurable unit.
Figure 4 Possible couplings of core and reconfigurable units
5.1. Coupling
The relative position of processor core and reconfigurable unit determines the type of
applications that benefit most from the hybrid architecture. Generally, a tighter
coupling leads to a smaller communication overhead. Loose couplings therefore require
larger amounts of computation to be assigned to the reconfigurable unit, so that the
overhead is amortized. Couplings can be classified into the following three main
categories, illustrated in Figure 4:
1. Reconfigurable functional unit (RFU): The reconfigurable unit is integrated into
the processor core as any other functional unit. Examples are OneChip [3], and
Chimaera [4].
2. Coprocessor: The reconfigurable unit is part of the processor and placed next to
the core. Examples are Garp [5] and the Stretch S5000 series [7].
3. Attached processing unit: The reconfigurable unit is placed outside the processor
and connected to a memory or I/O bus. Contrary to RFUs and coprocessors, there
is no extension to the core processor’s instruction set. An example is Triscend.
Most earlier reconfigurable computers used attached processing units, connecting
a processor to a number of FPGAs via an I/O bus such as the PCI bus. There are no
instruction-set extensions for the reconfigurable unit. In the remainder of this
paper, we concentrate on the RFU and coprocessor approaches.
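The statement that loose couplings need larger amounts of computation can be checked with a simple cost model. In this sketch (an illustration under stated assumptions, not a measured result), each invocation of the reconfigurable unit pays a fixed communication overhead, after which the unit processes each data item faster than the core. All cycle counts are hypothetical.

```python
def min_items_to_offload(overhead, core_cycles_per_item, ru_cycles_per_item):
    """Smallest work size n for which offloading beats the core:
    overhead + n * ru_cycles < n * core_cycles  <=>  n > overhead / gain."""
    gain = core_cycles_per_item - ru_cycles_per_item
    if gain <= 0:
        return None  # the reconfigurable unit is never faster per item
    return overhead // gain + 1


# Hypothetical per-invocation communication overheads in cycles; real values
# depend entirely on the particular architecture.
OVERHEAD = {"RFU": 1, "coprocessor": 10, "attached unit (I/O bus)": 1000}

for coupling, cycles in OVERHEAD.items():
    n = min_items_to_offload(cycles, core_cycles_per_item=4, ru_cycles_per_item=1)
    print(f"{coupling}: offload pays off from {n} items")
```

With these assumed numbers the break-even point moves from 1 item for an RFU to 4 items for a coprocessor and 334 items for an attached unit, which is the amortization effect described above.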
5.2. Instructions
Both RFU and coprocessor approaches extend the core's instruction set with
customized instructions. The processor core fetches and decodes instructions, and
issues the new instructions to the corresponding units. For RFUs two types of
instructions exist:
1. Instructions that start the reconfiguration of the RFU, and
2. Instructions that actually execute the RFU function.
Instructions for coprocessors also include reconfiguration and execution instructions,
but coprocessors additionally require instructions that transfer data and synchronize
the core with the reconfigurable unit. Synchronization is required whenever two
computing elements operate concurrently. RFUs can operate concurrently with other
functional units, because the core’s control logic synchronizes activities and controls
access to the register file. Simple approaches for coprocessors force the core to stall
until the execution of the reconfigurable unit has completed. More advanced
techniques allow concurrent operation and synchronize by semaphore-like
mechanisms.
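The difference between the simple stall-based scheme and semaphore-like synchronization can be sketched with Python threads. This models the coprocessor as a thread and is only an analogy for hardware; the class and its `configure`, `execute` and `wait` methods are hypothetical and do not correspond to any real instruction set.

```python
import threading


class Coprocessor:
    """Hypothetical coprocessor model: the configured function runs in its own
    thread, so the core can keep working until it explicitly synchronizes."""

    def __init__(self):
        self._func = None
        self._result = None
        self._done = threading.Event()

    def configure(self, func):
        """Reconfiguration instruction: select the function to implement."""
        self._func = func

    def execute(self, *operands):
        """Execution instruction: start the operation and return immediately.
        Passing the operands here stands in for the data-transfer instructions."""
        self._done.clear()

        def run():
            self._result = self._func(*operands)
            self._done.set()  # completion flag, similar to releasing a semaphore

        threading.Thread(target=run).start()

    def wait(self):
        """Synchronization instruction: block until the result is ready."""
        self._done.wait()
        return self._result


cop = Coprocessor()
cop.configure(lambda xs: sum(v * v for v in xs))
cop.execute([1, 2, 3])        # the coprocessor starts working
core_work = sum(range(10))    # meanwhile the core does independent work
print(core_work, cop.wait())  # 45 14
```

Calling `wait()` immediately after `execute()` reproduces the simple scheme in which the core stalls; placing independent core work in between gives the concurrent scheme.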
While the RFU approach delivers the fastest interaction between core and
reconfigurable unit, it requires a major redesign of the core. Coprocessors need less
core redesign but can require more effort for synchronization. RFUs are presently
gaining interest for embedded very-long-instruction-word (VLIW) architectures,
where optimizing compilers extract parallelism and schedule customized functional
units at compile time.
5.3. Operands
An RFU – as any other functional unit – uses the core's register file to read and write
data. Coprocessors can use several options: First, data may be transferred between
the coprocessor and the core via registers. Second, coprocessors can have access to
the same memory hierarchy as the core including several levels of caches, on-chip
memories, and the external memory interface. Third, to increase the overall memory
bandwidth some approaches equip the reconfigurable units with dedicated memory
ports. While this certainly increases bandwidth, it can also lead to data consistency
problems.
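The consistency problem can be illustrated with a toy model in which the core has a write-back cache and the reconfigurable unit reads memory through its own dedicated port. The model and its names are hypothetical; an explicit flush, as shown here, is only one possible remedy alongside hardware coherence mechanisms.

```python
class Core:
    """Hypothetical core with a write-back data cache in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.dirty = {}  # cached values not yet written back to memory

    def store(self, addr, value):
        self.dirty[addr] = value  # write-back policy: memory is not updated yet

    def flush(self):
        """Write all dirty cache contents back to memory."""
        self.memory.update(self.dirty)
        self.dirty.clear()


def ru_load(memory, addr):
    """The reconfigurable unit reads through its dedicated port, bypassing the cache."""
    return memory[addr]


memory = {0x10: 0}
core = Core(memory)
core.store(0x10, 42)
print(ru_load(memory, 0x10))  # 0: stale, the new value is still in the core's cache
core.flush()
print(ru_load(memory, 0x10))  # 42: consistent after an explicit write-back
```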
6. Reconfiguration Unit
We will now discuss the design aspects of the reconfiguration unit.
6.1. Granularity
The granularity can be fine-grained or coarse-grained. Fine-grained arrays use logic
blocks with two to four inputs and single flip-flops. These structures are well suited
to implementing bit-manipulation operations and random logic. Coarse-grained
architectures accommodate 8-bit to 16-bit ALUs and registers and are better suited to
implementing regular arithmetic operations on byte- and word-sized data, as found in
most multimedia applications. Since most real-world workloads contain both types of
operations, there is an obvious trade-off. Researchers are currently investigating
multi-granular elements that are well suited to bit-manipulation operations,
but can also be efficiently arranged to suit byte operations. A parameter strongly
related to granularity is configuration size. Given a certain silicon area for the
reconfigurable unit, one can implement many fine-grained elements or fewer
coarse-grained ones. A large number of fine-grained elements requires more configuration
data than a smaller number of coarse-grained elements.
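A rough calculation shows why configuration size grows with finer granularity. The sketch below compares the function-configuration bits of an 8-bit adder built from 4-input LUTs with those of a single 8-bit ALU that supports eight operations. Both mapping rules are simplifying assumptions, and routing configuration bits are left out entirely.

```python
def lut_config_bits(inputs):
    """A k-input LUT stores one output bit per input combination: 2**k bits."""
    return 2 ** inputs


def fine_grained_adder_bits(width, lut_inputs=4):
    """Assumption: one LUT for the sum bit and one for the carry bit per bit
    position of a ripple-carry adder; routing configuration is ignored."""
    return width * 2 * lut_config_bits(lut_inputs)


def coarse_grained_alu_bits(num_operations):
    """Assumption: a coarse-grained ALU selects its function with a binary opcode."""
    return max(1, (num_operations - 1).bit_length())


print(fine_grained_adder_bits(8))  # 256
print(coarse_grained_alu_bits(8))  # 3
```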
6.2. Interconnect
The structure of the reconfigurable unit is not only determined by the granularity of
the computing elements, but also by their interconnect. Reconfigurable elements are
placed in 2-D arrays. The simplest interconnect connects each element to its four
neighbours horizontally and vertically. Additional buses may exist that connect all
elements in a row and in a column. Today, most interconnects are hierarchically
structured. The reconfigurable unit is divided into compounds, which themselves
consist of several computing elements. Both the compounds that form the unit and
the elements that form a compound use their own interconnect systems. Compounds
can also contain specialized resources, e.g., memory blocks.
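The three interconnect styles described above can be expressed as simple reachability rules on a 2-D array. This is a hypothetical model: the square compounds of `compound_size` by `compound_size` elements are an assumption made for illustration.

```python
def mesh_neighbours(row, col, rows, cols):
    """Nearest-neighbour links: up, down, left and right (no wrap-around)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def bus_reachable(row, col, rows, cols):
    """Elements reachable in one hop over the row bus and the column bus."""
    same_row = {(row, c) for c in range(cols)}
    same_col = {(r, col) for r in range(rows)}
    return (same_row | same_col) - {(row, col)}


def compound_of(row, col, compound_size):
    """Hierarchical interconnect: the square compound that contains an element."""
    return (row // compound_size, col // compound_size)


print(len(mesh_neighbours(0, 0, 4, 4)), len(mesh_neighbours(1, 1, 4, 4)))  # 2 4
print(len(bus_reachable(0, 0, 4, 4)))                                      # 6
print(compound_of(3, 2, compound_size=2))                                  # (1, 1)
```

Elements in the same compound would communicate over the local interconnect, while traffic between different compounds would use the higher-level interconnect.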
6.3. Reconfiguration time
Reconfiguration time is an important parameter that should be kept as small as
possible. It depends on the configuration size and on the location from which the
configuration data has to be read. The ultimate goal is single-cycle reconfiguration, i.e.,
the whole reconfigurable unit is reprogrammed in a single clock cycle. This requires
the configuration data to be stored on chip, near the reconfigurable
elements. The data required to configure the reconfigurable unit is commonly
called a context. Multi-context reconfigurable processors are able to store several
contexts on the chip. The simplest context-fetching mechanism is load on demand.
Single-context as well as multi-context units use this mechanism when a
configuration is required that is not present in the context memory. For multi-context
architectures there are more sophisticated fetching mechanisms. The context
memory can be used as a cache, where recently used contexts may be found.
Alternatively, a context can be prefetched concurrently with the execution of a
different context. We now turn to the last, but no less important, challenge:
the programming model.
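The benefit of using the context memory as a cache, compared with a single context loaded on demand, can be estimated with a small simulation. The sketch assumes a least-recently-used (LRU) replacement policy, a fixed cost of 100 cycles per on-demand fetch and one cycle to activate a resident context; these numbers, the trace and the names are hypothetical, and prefetching is not modelled.

```python
from collections import OrderedDict


class ContextMemory:
    """Hypothetical on-chip context memory managed as an LRU cache.
    A miss means the context is loaded on demand from off-chip memory."""

    def __init__(self, slots, load_cycles, switch_cycles=1):
        self.slots = slots                  # number of contexts stored on chip
        self.load_cycles = load_cycles      # cost of fetching a context on a miss
        self.switch_cycles = switch_cycles  # cost of activating a resident context
        self.contexts = OrderedDict()
        self.cycles = 0

    def activate(self, name):
        if name in self.contexts:
            self.contexts.move_to_end(name)  # hit: mark as most recently used
        else:
            if len(self.contexts) >= self.slots:
                self.contexts.popitem(last=False)  # evict least recently used
            self.contexts[name] = True
            self.cycles += self.load_cycles
        self.cycles += self.switch_cycles


def total_cycles(sequence, slots, load_cycles=100):
    memory = ContextMemory(slots, load_cycles)
    for name in sequence:
        memory.activate(name)
    return memory.cycles


trace = ["fir", "dct", "fir", "dct", "aes", "fir"]
print(total_cycles(trace, slots=1))  # 606: single context, every switch is a miss
print(total_cycles(trace, slots=2))  # 406: two contexts, recent ones are reused
```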
7. Programming model and program transformation example [6]
Programming models for reconfigurable processors have not yet received sufficient
attention. This will certainly change as the success of these hybrid architectures
strongly depends on reasonable programming models that allow for the construction
of automated code generation tools.
Current commercial programming environments consist of two separate tool flows,
one for software and one for hardware. Processor code and configuration data for the
reconfigurable units are handcrafted and wrapped into library functions that are linked
with the user code. This approach is also used to develop applications for most
research processors. The next step is compilers that automatically generate code
and configurations from a general-purpose programming language such as ‘C’. Such a
compiler constructs a control flow graph from the source program and then decides
which operations will go into the reconfigurable unit. Generally, inner loops of
programs are good candidates for reconfigurable implementation, since these loops
account for a large share of the application's total execution time.
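Why inner loops are attractive, and why the rest of the program still matters, can be quantified with Amdahl's law (a standard result, not taken from the sources). If a fraction $f$ of the execution time is moved to the reconfigurable unit and runs $s$ times faster there, the overall speedup $S$ is:

```latex
S = \frac{1}{(1 - f) + \dfrac{f}{s}}
\qquad \text{e.g. } f = 0.8,\ s = 10: \quad
S = \frac{1}{0.2 + 0.08} = \frac{1}{0.28} \approx 3.57,
\qquad \lim_{s \to \infty} S = \frac{1}{1 - f} = 5
```

The values $f = 0.8$ and $s = 10$ are hypothetical. The formula also ignores reconfiguration and communication overheads, which reduce the achievable speedup further.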
In this section, we present the general concept of transforming an existing program to
one that can be executed on a reconfigurable computing platform. The conceptual view
of how program P is transformed into program P’ is depicted in Figure 5. The
purpose is to obtain a functionally equivalent program P’ from program P which
(using specialized instructions) can initiate both the configuration processes and
execution processes on the reconfigurable hardware.
Figure 5 Program transformation example.
The steps involved in this transformation are the following:
1. Identify code ‘α’ in program P to be mapped in reconfigurable hardware
2. Replace the identified code with ‘equivalent’ code (A), assuming that A ‘calls’
the hardware with functionality ‘α’
3. Show hardware feasibility of ‘α’ in a current technology (e.g., field-
programmable gate array (FPGA)) and map ‘α’ into reconfigurable hardware
4. Execute program P’ with original code plus code having functionality A
(equivalent to functionality ‘α’) on the reconfigurable processor
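The four steps can be mimicked in ordinary Python to show what 'functionally equivalent' means. This is only a sketch under stated assumptions: the kernel, the `ReconfigurableUnit` class and its `configure`/`execute` methods are hypothetical, and a Python function stands in for the hardware configuration that step 3 would produce.

```python
# Program P: 'alpha' is the piece of code identified for hardware (step 1).
def alpha(samples, coeffs):
    """Hypothetical kernel: a dot product such as one FIR filter output."""
    return sum(s * c for s, c in zip(samples, coeffs))


class ReconfigurableUnit:
    """Hypothetical stand-in for the reconfigurable hardware (step 3)."""

    def __init__(self):
        self.loaded = None
        self.configurations = 0

    def configure(self, configuration):
        """Configure the hardware, skipping the reload if already present."""
        if self.loaded is not configuration:
            self.loaded = configuration
            self.configurations += 1

    def execute(self, *operands):
        """Run the currently configured functionality."""
        return self.loaded(*operands)


RU = ReconfigurableUnit()
# Assumption: the mapped hardware behaves exactly like alpha, so the Python
# function doubles as its "configuration".
ALPHA_CONFIGURATION = alpha


# Program P': code A replaces alpha and "calls" the hardware (steps 2 and 4).
def A(samples, coeffs):
    RU.configure(ALPHA_CONFIGURATION)
    return RU.execute(samples, coeffs)


print(alpha([1, 2, 3], [4, 5, 6]), A([1, 2, 3], [4, 5, 6]))  # 32 32
print(RU.configurations)                                     # 1
```

Because A only reconfigures when a different configuration is requested, repeated calls reuse the loaded hardware, mirroring the separation between configuration and execution processes described above.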
These steps illustrate the new programming paradigm, in which both
software and hardware descriptions are present in the same program. It should also
be noted that because the only constraint on ‘α’ is that it can be implemented, the
microarchitecture must also support emulation. This implies the use of microcode,
termed reconfigurable microcode (ρ-µcode) because it differs from traditional
microcode: such microcode does not execute on fixed hardware facilities, but
operates on facilities that it ‘designs’ itself.
The methodology for obtaining a program for the reconfigurable computing platform
is depicted in Figure 6. First, the code to be run on the reconfigurable hardware must
be determined. This is achieved by source-to-source instrumentation of the high-level
code and by benchmarking, which results in several candidate pieces of code.
Second, we must determine which piece of code is suitable for implementation on the
reconfigurable hardware. Suitability is determined solely by whether the piece of
code is ‘hardware implementable’, which can be decided manually or
automatically. The end result is a new program that comprises the following
elements:
Repair code is inserted in order to communicate parameters and results between
the general-purpose processor core and the reconfigurable hardware.
VHDL code and emulation code are inserted to configure the reconfigurable
hardware to perform the functionality that is initiated by the ‘execute code’.
Instead of inserting explicit code into the new program, each piece of code can be
initiated by special ‘ultra complex’ instructions. It should be noted that in this
programming paradigm, software code co-exists in the program with hardware
descriptions (implemented in the reconfigurable fabric).
If the code is general-purpose code, several problems arise. First, it is
quite difficult to extract a set of operations with matching granularity at a sufficient
level of parallelism. Second, inner loops of general-purpose programs often contain
excess code, i.e., code that must be run on the core, such as exceptions, function calls,
and system calls. Researchers in the areas of compiler construction for VLIW
architectures and hardware/software co-design face and tackle the same problems.
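The selection step of this methodology, including the rejection of loops that contain excess code, might be sketched as a simple filter over benchmarking results. The profile data, the threshold `min_fraction` and the set of disallowed operations are hypothetical; a real tool would have to derive 'hardware implementability' from the code itself.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    time_fraction: float  # share of execution time measured by benchmarking
    operations: set = field(default_factory=set)


# Assumption: operations that must stay on the core ("excess code").
EXCESS_CODE = {"call", "syscall", "throw"}


def select_for_hardware(candidates, min_fraction=0.05):
    """Keep hot candidates that contain no excess code, hottest first."""
    chosen = [
        c for c in candidates
        if c.time_fraction >= min_fraction and not (c.operations & EXCESS_CODE)
    ]
    return sorted(chosen, key=lambda c: c.time_fraction, reverse=True)


profile = [
    Candidate("fir_loop", 0.45, {"load", "mul", "add"}),
    Candidate("parse_loop", 0.30, {"load", "call"}),  # contains a function call
    Candidate("init", 0.01, {"store"}),               # too cold to matter
    Candidate("dct_loop", 0.20, {"mul", "add", "shift"}),
]
print([c.name for c in select_for_hardware(profile)])  # ['fir_loop', 'dct_loop']
```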
Figure 6 Program transformation methodology for reconfigurable computing
An issue that has not yet been investigated is the software layers above the compiler.
Current research uses programming languages such as ‘C’ as specification models.
For embedded systems, code generation starting from more formal and domain-specific
models of computation is a viable alternative. These restricted models of
computation may compile efficiently to reconfigurable processors. Operating
systems aware of the underlying reconfigurability of the hardware could control the
use of the reconfigurable units depending on context. This is of special interest for
systems that operate in modes. For example, a handheld device may be used heavily
to encrypt data at one time, putting the device computationally in a
‘bit-manipulation mode’. At another time, the same handheld streams multimedia data,
requiring ‘arithmetic-array mode’.
8. Applications
The ongoing evolution of the Internet has led to a series of new communication
standards. These changing standards have made it difficult to plan investment and
costs in time when designing new network equipment. Reconfigurable processors
enable hardware to be multiplexed: the same hardware can be quickly reconfigured to
support multiple protocols and communication standards, thereby meeting a wide
range of market needs. The telecommunications industry relies mainly on a few
specific types of mathematical computation, which gives reconfigurable
processors a large potential market. The main areas of use are the following:
Video and Image Processing: General two-dimensional image processing, including
transforms such as the Discrete Cosine Transform (DCT) and wavelet transforms,
is used heavily in cellular phones and base stations.
FIR Filtering: Telecommunication chips filter signals before sending them, for
example to limit the power emitted outside the allotted frequency band; FIR filters
are used heavily in all areas of telecommunications.
High speed networking: Network routers and switches operating at multi-gigabit
rates cannot be implemented with existing GPPs because their performance is too low,
and ASICs are hard to deploy because the standards change too often. Reconfigurable
processors are therefore ideally suited to such applications.
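FIR filtering shows why such workloads suit reconfigurable hardware. The reference implementation below computes each output with a loop of multiply-accumulate operations, which a processor executes one after another; the products for different taps are independent, so a spatial implementation could instead use one multiplier per tap followed by an adder tree. The code is a plain software sketch for illustration, not a hardware description.

```python
def fir(samples, taps):
    """Direct-form FIR filter with zero initial state:
    y[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


print(fir([1, 2, 3, 4], [1, 1]))     # [1, 3, 5, 7]
print(fir([1, 0, 0, 0], [3, 2, 1]))  # [3, 2, 1, 0]: the impulse response is the taps
```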
Some other important areas using reconfigurable processors are cryptographic
applications and automatic target recognition (ATR). Hardware cryptography uses
reconfigurable processors when a low-cost solution is needed that is still much
faster than software. ATR aims to automatically detect, classify, recognize and
identify an object, and is used in many applications.
9. Conclusions
In this paper we have presented an introduction to reconfigurable processors as a
solution to the need for cheaper, more flexible computing elements with low power
consumption. In particular we have focussed on the architectural details of how these
processors function, the challenges in designing such computing elements, and the
required software infrastructure to deploy such processors. Lastly we have presented a
brief survey of three different application domains where such processors are highly
useful. In summary, we believe that reconfigurable processors are a very promising
alternative to the existing processor design paradigm, and they have the potential to
transform the way computation is thought about and implemented, both scientifically
and commercially.
10. References
[1] M. Platzner and R. Enzler. Dynamically Reconfigurable Processors,
pages 3-4, 2000.
[2] R. Enzler. Architectural Trade-offs in Dynamically Reconfigurable Processors.
Swiss Federal Institute of Technology, Zurich, pages 2-9, 2004.
[3] R. D. Wittig and P. Chow. OneChip: An FPGA processor with reconfigurable
logic. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing
Machines (FCCM'96), pages 126-135, 1996.
[4] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao. The Chimaera reconfigurable
functional unit. In Proceedings of the IEEE Symposium on FPGAs for Custom
Computing Machines (FCCM'97), pages 87-96, 1997.
[5] J. R. Hauser and J. Wawrzynek. Garp: A MIPS processor with a reconfigurable
coprocessor. In Proceedings of the IEEE Symposium on FPGAs for Custom
Computing Machines (FCCM'97), pages 24-33, 1997.
[6] S. Vassiliadis, S. Wong, G. Gaydadjiev, and K. Bertels. Polymorphic Processors:
How to Expose Arbitrary Hardware Functionality to Programmers. IEEE
proceedings, pages 2-3.
[7] http://www.stretchinc.com