							                     Reconfigurable Processors




Author:                               Co-Author:

Shailesh Kulkarni,                    Naveen Bhat,

6th sem, E&C,                         6th sem, E&C,

GIT, Belgaum.                         GIT, Belgaum.

kulkarni.shailesh@gmail.com           bhatnaveen2005@yahoo.com
                                          Abstract

Computation has been advancing since the day transistors were first

fabricated. As research progressed in the field of computation, there have been two

directions of thought. At one extreme we use general-purpose processors that are

totally programmable but are expensive (and relatively slow), and at the other we use

custom circuits called application-specific integrated circuits (ASICs) that are fast and cheap

but are not flexible.

        To bridge this gap we investigate reconfigurable processors as one potential

solution. A reconfigurable processor is a microprocessor with ‘erasable hardware’ that

can rewire itself dynamically. This allows the chip to adapt effectively to the

programming tasks demanded by the particular software it is interfacing with at

any given time. Ideally, the reconfigurable processor can transform itself from a video

chip to a central processing unit to a graphics chip, for example, all optimized to

allow applications to run at the highest possible speed. Its key feature is the ability to

perform computations in hardware to increase performance, while retaining much of

the flexibility of a software solution.


        In particular we focus on the architectural aspects of reconfigurable

processors and their applications in this paper.


Contents



1. Introduction
2. Reconfigurable Computing Paradigm
3. Reconfigurable Systems
4. Research Challenges
5. Integration of Computing Elements
   5.1. Coupling
   5.2. Instructions
   5.3. Operands
6. Reconfiguration Unit
   6.1. Granularity
   6.2. Interconnect
   6.3. Reconfiguration time
7. Programming model and program transformation example
8. Applications
9. Conclusions
10. References




Gogte Institute of Technology, Belgaum.


1. Introduction




Advances in technology over the past few years have brought us to the brink of an

exciting new discovery - a completely new type of computer with characteristics quite

unlike anything that has been seen before.

For many years the field of computing has been centered on general-purpose

processors (GPPs). Great advances in technology have been made, with current

general purpose Central Processing Units (CPUs) being many orders of magnitude

more powerful than the first ones. The great flexibility of these processors encouraged

investigation of widely varying applications and fostered great advances in software

engineering, with many new leaps being made possible by the continual introduction

of ever-more-powerful processors. They are flexible due to their versatile instruction

sets that allow the implementation of any computation task. Yet all this time both

hardware and software fields were being guided by a Von Neumann-derived approach

to computing and naturally the people involved with computing have developed a way

of thinking that corresponds to the technology they are accustomed to using.



Designers of digital systems face a fundamental trade-off between flexibility and

efficiency when selecting computing elements. The available alternatives span a wide

spectrum with general-purpose microprocessors and application-specific integrated

circuits (ASICs) at opposite ends.



ASICs are dedicated hardware circuits tuned to a very small number of applications or

even to just one task. ASICs are mainly used in high-volume embedded system

markets such as telecommunications, consumer electronics, or the automotive

industry. For a given task, dedicated circuits execute faster, require less silicon area,

and are more power efficient than general-purpose architectures. The drawback of

such highly specialized architectures is their lack of flexibility: if the applications

change, a redesign of the ASIC is required.



Besides speed and flexibility, silicon area is also an interesting point of comparison.

General-purpose processors (GPPs) try to get a lot of parallelism out of the software.

They use superscalar and pipelined execution. However, in order to be able to

support these features a lot of control overhead is necessary. But even with a lot of

control the flushing of a pipeline and idle execution units cannot be avoided due to

data dependencies in the instructions. The latest GPPs are simply huge: they

contain more than 40 million transistors. On the other hand, ASICs are

usually small, since they are application specific and therefore do not need the large

control overhead of GPPs. It is therefore interesting to ask whether it is now

possible to combine those features in a single device. Is it possible to have a

flexible architecture on a device that implements some application specific

algorithms? And will this solution be faster than a GPP?



In the last decade, a new class of reconfigurable computing devices has

emerged, which promises to combine the flexibility of processors with the efficiency

of ASICs. The hardware of reconfigurable devices is not static but adapted to each

individual application. Through hardware customization, reconfigurable devices

potentially achieve a higher efficiency than microprocessors, while the dynamics of

the customization process allow a higher level of flexibility than ASICs.






             Figure 1 Tradeoffs between flexibility and performance

Figure 1 outlines the trade-off between flexibility and efficiency as well as the

position of reconfigurable devices compared to processors and ASICs.



2. Reconfigurable Computing Paradigm

Figure 2 sketches the computing paradigms of processors and ASICs, respectively.

Processors have a general, fixed architecture that allows tasks to be implemented by

temporally composing atomic operations, which are provided for example by the

arithmetic and logic unit (ALU) or the floating-point unit. In contrast, ASICs

implement tasks by spatially composing operations, which are provided by dedicated

computational units like adders or multipliers. Reconfigurable computing combines

these computing paradigms by means of reconfigurable hardware structures, which

allow tasks to be implemented both ‘in time’ and ‘in space’.




             Figure 2 Computation of the expression y = Ax^2 + Bx + C
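
The difference between computing 'in time' and 'in space' can be illustrated with a small sketch. This is a toy model under stated assumptions, not a reproduction of Figure 2: the function names, the particular operation ordering, and the step and unit counts are all chosen here for illustration only.

```python
# Toy comparison of temporal and spatial computation of y = A*x^2 + B*x + C.
# The step and unit counts are illustrative assumptions, not measured values.

def compute_temporal(a, b, c, x):
    """Processor style: a single ALU executes one atomic operation per step."""
    steps = 0
    t = x * x      # step 1: multiply
    steps += 1
    t = a * t      # step 2: multiply
    steps += 1
    u = b * x      # step 3: multiply
    steps += 1
    t = t + u      # step 4: add
    steps += 1
    y = t + c      # step 5: add
    steps += 1
    return y, steps


def compute_spatial(a, b, c, x):
    """ASIC style: dedicated units; operations on one level run side by side."""
    # Level 1: two multipliers work in parallel.
    x_squared, bx = x * x, b * x
    # Level 2: one multiplier and one adder work in parallel.
    ax_squared, bx_plus_c = a * x_squared, bx + c
    # Level 3: the final adder produces the result.
    y = ax_squared + bx_plus_c
    levels, units = 3, 5  # 3 multipliers + 2 adders in this assumed layout
    return y, levels, units


print(compute_temporal(2, 3, 4, 5))  # (69, 5)
print(compute_spatial(2, 3, 4, 5))   # (69, 3, 5)
```

In this sketch the temporal version reuses one unit for five steps, while the spatial version spends five units to finish in three levels. A reconfigurable device can sit between these two points.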

The characteristics of the computing paradigms are also reflected in the respective

system compositions, outlined in Figure 3. In the processor case, the instructions

(composed into the program code) define the behavior of the computing element. The

behavior of a reconfigurable device is specified by its configuration. The behavior of

an ASIC is typically hard-wired and does not allow for any dynamic adaptation,

except maybe for some adjustable coefficients.




     Figure 3 System outlines corresponding to the computing paradigms of

                  processors, reconfigurable devices, and ASICs



Within the domain of reconfigurable computing, two fundamental kinds of

(re)configurability are distinguished:

   • Static or compile-time reconfiguration (CTR): the configuration of the

    device is loaded once at the outset, after which it does not change during the

    execution of the task at hand, and

   • Dynamic or run-time reconfiguration (RTR): the configuration of the

    device may change at any arbitrary moment during run time.

This paper focuses on dynamically reconfigurable devices. Static reconfiguration is

therefore not considered further.



3. Reconfigurable Systems

The enabling technology for building reconfigurable systems was the field-

programmable gate array (FPGA). FPGAs were introduced to the market at the high-

end of programmable logic devices (PLDs) in the mid 1980s. FPGAs consist of an

array of logic blocks, routing channels to interconnect the logic blocks, and

surrounding I/O blocks. SRAM based FPGAs use static RAM (SRAM) cells to

control the functionality of the logic, I/O blocks, and routing. They can be

reprogrammed in-circuit arbitrarily often by downloading a bit stream of

configuration data to the device. While early FPGA generations were quite limited in

their capacity, today’s devices feature millions of gates of programmable logic,

dense enough to host complete computing systems.





4. Research Challenges

The main challenges in designing a reconfigurable processor are:

1. The integration of computing elements, processor core and reconfigurable unit,

2. The design of the reconfigurable unit itself, and

3. The hybrid programming model that utilizes both the static and reconfigurable

   units.

Projects that rely on fine-grained reconfigurable elements target general-purpose

computing (GP) as their application domain, whereas those that rely on coarse-grained elements target

multimedia (MM) applications. We will now describe in further detail the

design issues involved in the above three challenges.



5. Integration of Computing Elements

The interaction between a processor core and reconfigurable unit is a critical aspect

and needs a lot of design space exploration. The architectural integration of these

elements concerns the coupling between processor core and reconfigurable unit, the way

instructions are issued to the reconfigurable unit, and the way operands are

transferred to and from the reconfigurable unit.




           Figure 4 Possible couplings of core and reconfigurable units



5.1. Coupling

The relative position of processor core and reconfigurable unit determines the type of

applications that benefit most from the hybrid architecture. Generally, a tighter

coupling leads to a smaller communication overhead. Loose couplings thus pay off only when

larger amounts of computation are assigned to the reconfigurable unit. Couplings can be

classified into the following three main categories that are illustrated in Figure 4:

1. Reconfigurable functional unit (RFU): The reconfigurable unit is integrated into

   the processor core as any other functional unit. Examples are OneChip [3], and

   Chimaera [4].

2. Coprocessor: The reconfigurable unit is part of the processor and placed next to

   the core. Examples are Garp [5] and the Stretch S5000 series [7].

3. Attached processing unit: The reconfigurable unit is placed outside the processor

   and connected to a memory or I/O bus. Contrary to RFUs and coprocessors, there

   is no extension to the core processor’s instruction set. An example is Triscend.

Most of the past reconfigurable computers use attached processing units and connect

a processor to a number of FPGAs via an I/O bus, e.g., the PCI bus. There are no

instruction set extensions for the reconfigurable unit. In the remaining part of this

paper, we concentrate on RFU and coprocessor approaches.
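
The observation in Section 5.1 that looser couplings need larger amounts of computation can be made concrete with a simple break-even model. The sketch below is an assumption-laden illustration: the function name, the cycle counts for 'tight' and 'loose' coupling, and the speedup factor are hypothetical and do not describe any of the processors cited above.

```python
def min_offload_cycles(comm_cycles, speedup):
    """Smallest software run time (in cycles) for which offloading pays off.

    Offloading wins when  t_sw > t_sw / speedup + comm_cycles,
    i.e. when             t_sw > comm_cycles * speedup / (speedup - 1).
    """
    if speedup <= 1:
        raise ValueError("the reconfigurable unit must be faster than software")
    return comm_cycles * speedup / (speedup - 1)


# Hypothetical communication overheads per call, in core clock cycles.
TIGHT_COUPLING_OVERHEAD = 10     # assumed, e.g. an RFU fed from the register file
LOOSE_COUPLING_OVERHEAD = 1000   # assumed, e.g. a unit attached to an I/O bus

print(min_offload_cycles(TIGHT_COUPLING_OVERHEAD, speedup=5))  # 12.5
print(min_offload_cycles(LOOSE_COUPLING_OVERHEAD, speedup=5))  # 1250.0
```

Under these assumed numbers, a tightly coupled unit is worthwhile even for very short code sequences, whereas a loosely coupled unit only helps for tasks that take well over a thousand cycles in software.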



5.2. Instructions

Both RFU and coprocessor approaches extend the core's instruction set with

customized instructions. The processor core fetches and decodes instructions, and

issues the new instructions to the corresponding units. For RFUs two types of

instructions exist:


   1. Instructions that start the reconfiguration of the RFU, and

   2. Instructions that actually execute the RFU function.

Instructions for coprocessors also include reconfiguration and execution instructions,

but additionally instructions that transfer data and synchronize the core with the

reconfigurable unit are required. Synchronization is required whenever two

computing elements operate concurrently. RFUs can operate concurrently with other

functional units, because the core’s control logic synchronizes activities and controls

access to the register file. Simple approaches for coprocessors force the core to stall

until the execution of the reconfigurable unit has completed. More advanced

techniques allow concurrent operation and synchronize by semaphore-like

mechanisms.
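
The two coprocessor synchronization styles described above can be sketched in software. This is only an analogy: the `Coprocessor` class and its `configure`, `execute`, and `wait` methods are hypothetical names, a Python thread stands in for the reconfigurable unit, and `threading.Event` plays the role of the semaphore-like mechanism.

```python
import threading


class Coprocessor:
    """Hypothetical stand-in for a reconfigurable coprocessor."""

    def __init__(self):
        self.done = threading.Event()  # semaphore-like completion flag
        self.function = None
        self.result = None

    def configure(self, function):
        """Reconfiguration instruction: load the function to be executed."""
        self.function = function

    def execute(self, operands):
        """Execution instruction: start the unit and return immediately."""
        self.done.clear()

        def run():
            self.result = self.function(operands)
            self.done.set()  # signal completion to the core

        threading.Thread(target=run).start()

    def wait(self):
        """Synchronization instruction: block until the unit has finished."""
        self.done.wait()
        return self.result


def run_stalling(cop, operands):
    """Simple approach: the core stalls until the coprocessor completes."""
    cop.execute(operands)
    return cop.wait()


def run_concurrent(cop, operands, core_work):
    """Advanced approach: the core keeps working, then synchronizes."""
    cop.execute(operands)
    core_result = core_work()  # core and coprocessor overlap here
    return core_result, cop.wait()


cop = Coprocessor()
cop.configure(sum)
print(run_stalling(cop, [1, 2, 3]))                # 6
print(run_concurrent(cop, [4, 5], lambda: 7))      # (7, 9)
```

The stalling variant is simpler to reason about; the concurrent variant hides coprocessor latency behind core work but needs the explicit `wait` before the result is used.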

While the RFU approach delivers the fastest interaction between core and

reconfigurable unit, it requires a major redesign of the core. Coprocessors need less

core redesign but can require more effort for synchronization. RFUs are presently

gaining interest for embedded very-long-instruction-word (VLIW) architectures,

where optimizing compilers extract parallelism and schedule customized functional

units at compile time.



5.3. Operands

An RFU, like any other functional unit, uses the core's register file to read and write

data. Coprocessors can use several options: First, data may be transferred between

the coprocessor and the core via registers. Second, coprocessors can have access to

the same memory hierarchy as the core including several levels of caches, on-chip

memories, and the external memory interface. Third, to increase the overall memory

bandwidth some approaches equip the reconfigurable units with dedicated memory


ports. While this certainly increases bandwidth, it can also lead to data consistency

problems.



6. Reconfiguration Unit

We will now discuss the design aspects of the reconfiguration unit.

6.1. Granularity

The granularity can be fine-grained or coarse-grained. Fine-grained arrays use logic

blocks with 2-bit to 4-bit inputs and single flip-flops. These structures are well suited

to implement bit-manipulation operations and random logic. Coarse-grained

architectures accommodate 8-bit to 16-bit ALUs and registers and are better suited to

implement regular arithmetic operations on byte and word-sized data found in most

multimedia applications. There is obviously a trade-off involved, as most real-world

workloads contain both types of operations. Researchers are currently investigating

multi-granular elements that are well suited to implementing bit-manipulation operations

but can also be efficiently arranged to suit byte operations. A parameter strongly

related to granularity is configuration size. Given a certain silicon area for the

reconfigurable unit, one can implement many fine-grained elements or fewer

coarse-grained ones. A large number of fine-grained elements requires more configuration

data than a smaller number of coarse-grained elements.



6.2. Interconnect

The structure of the reconfigurable unit is not only determined by the granularity of

the computing elements, but also by their interconnect. Reconfigurable elements are

placed in 2-D arrays. The simplest interconnect connects each element to its four

neighbours horizontally and vertically. Additional buses may exist that connect all


elements in a row and in a column. Today, most interconnects are hierarchically

structured. The reconfigurable unit is divided into compounds, which themselves

consist of several computing elements. Both the compounds that form the unit and

the elements that form a compound use their own interconnect systems. Compounds

can also contain specialized resources, e.g., memory blocks.
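
The simplest interconnect schemes from Section 6.2 can be modelled in a few lines. The sketch below is illustrative only: the function names are hypothetical, positions are (row, column) pairs in a 2-D array, and the bus model assumes that an element can forward data from a row bus onto a column bus.

```python
def mesh_neighbours(row, col, rows, cols):
    """Elements directly reachable in a four-neighbour mesh of rows x cols."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def mesh_hops(src, dst):
    """Hops needed with nearest-neighbour links only (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def bus_hops(src, dst):
    """Hops needed when every row and every column also has a shared bus."""
    if src == dst:
        return 0
    if src[0] == dst[0] or src[1] == dst[1]:
        return 1  # one transfer over the shared row or column bus
    return 2      # row bus to an intermediate element, then column bus


print(mesh_neighbours(0, 0, 4, 4))   # [(1, 0), (0, 1)]
print(mesh_hops((0, 0), (3, 3)))     # 6
print(bus_hops((0, 0), (3, 3)))      # 2
```

The comparison hints at why additional buses and hierarchical structures are attractive: they shorten long-distance connections that would otherwise pass through many elements.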




6.3. Reconfiguration time

Reconfiguration time is an important parameter that should be kept as small as

possible. It depends on the configuration size and on the location from where the

configuration data has to be read. The clear goal is single-cycle reconfiguration, i.e.,

the whole reconfigurable unit is reprogrammed in a single clock cycle. This requires

the configuration data to be stored on the processor, near the reconfigurable

elements. The data required to configure the reconfigurable unit is commonly

called a context. Multi-context reconfigurable processors are able to store several

contexts on the chip. The simplest context fetching mechanism is load on demand.

Single-context as well as multi-context units use this mechanism when a

configuration is required which is not present in the context memory. For multi-

context architectures there are more sophisticated fetching mechanisms. The context

memory can be used as cache, where recently used contexts may be found.

Alternatively, a context can be pre-fetched concurrently with the execution of a

different context. We now turn to the last, but important, challenge: the

programming model.
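
The context-fetching mechanisms of Section 6.3 (load on demand, context memory used as a cache, and pre-fetching) can be sketched as follows. The `ContextMemory` class, its least-recently-used replacement policy, and the cycle counts are assumptions made for this illustration; the text above does not prescribe a particular policy.

```python
from collections import OrderedDict


class ContextMemory:
    """Hypothetical on-chip context memory with least-recently-used eviction."""

    def __init__(self, slots, load_cycles):
        self.slots = slots              # 1 models a single-context unit
        self.load_cycles = load_cycles  # assumed cost of fetching from off chip
        self.contexts = OrderedDict()   # context name -> present on chip

    def _load(self, name):
        """Bring a context on chip; return True if it had to be fetched."""
        if name in self.contexts:
            return False
        if len(self.contexts) >= self.slots:
            self.contexts.popitem(last=False)  # evict least recently used
        self.contexts[name] = True
        return True

    def prefetch(self, name):
        """Fetch a context while another one is executing (cost is hidden)."""
        self._load(name)

    def activate(self, name):
        """Switch to a context and return the cycles spent by the switch."""
        fetched = self._load(name)             # load on demand if missing
        self.contexts.move_to_end(name)        # mark as most recently used
        return (self.load_cycles if fetched else 0) + 1  # 1 cycle to switch


memory = ContextMemory(slots=2, load_cycles=100)
print(memory.activate("fir"))   # 101: loaded on demand
print(memory.activate("fir"))   # 1: found in the context memory
memory.prefetch("dct")
print(memory.activate("dct"))   # 1: pre-fetched earlier
print(memory.activate("aes"))   # 101: loaded on demand, evicts "fir"
```

In this model a single-cycle switch is only reached when the requested context is already on chip, which is what caching and pre-fetching try to arrange.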






7. Programming model and program transformation example [6]

Programming models for reconfigurable processors have not yet received sufficient

attention. This will certainly change as the success of these hybrid architectures

strongly depends on reasonable programming models that allow for the construction

of automated code generation tools.

Current commercial programming environments consist of two separate tool flows,

one for software and one for hardware. Processor code and configuration data for the

reconfigurable units are handcrafted and wrapped into library functions that are linked

with the user code. This approach is also used to develop applications for most

research processors. The next steps are compilers that automatically generate code

and configurations from a general-purpose programming language such as ‘C’. Such a

compiler constructs a control flow graph from the source program and then decides

which operations will go into the reconfigurable unit. Generally, inner loops of

programs are good candidates for reconfigurable implementation since these loops are

responsible for a large share of the total execution time of an application.
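
Why inner loops are good candidates can be quantified with Amdahl's law, which is brought in here as a supporting illustration and is not taken from the cited sources. The function name and the example figures (a loop taking 90% of the run time, accelerated tenfold) are hypothetical.

```python
def overall_speedup(accelerated_fraction, unit_speedup):
    """Amdahl's law: whole-program speedup when only a fraction is accelerated.

    accelerated_fraction: share of execution time moved to the reconfigurable unit
    unit_speedup: how much faster that share runs on the reconfigurable unit
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / unit_speedup)


# Hypothetical inner loop: 90% of the run time, ten times faster in hardware.
print(round(overall_speedup(0.9, 10), 2))   # 5.26
# Hypothetical cold code: 10% of the run time, same hardware speedup.
print(round(overall_speedup(0.1, 10), 2))   # 1.1
```

Under these assumed figures, accelerating the hot loop gives roughly a fivefold gain, while the same effort spent on rarely executed code yields almost nothing.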

In this section, we present the general concept of transforming an existing program to

one that can be executed on a reconfigurable computing platform. The conceptual view

of how program P is transformed into program P’ is depicted in Figure 5. The

purpose is to obtain a functionally equivalent program P’ from program P which

(using specialized instructions) can initiate both the configuration processes and

execution processes on the reconfigurable hardware.






                    Figure 5 Program transformation example.




The steps involved in this transformation are the following:

   1. Identify code ‘α’ in program P to be mapped in reconfigurable hardware

   2. Eliminate the identified code and add code to have ‘equivalent’ code (A)

       assuming that A ‘calls’ the hardware with functionality ‘α’

   3. Show hardware feasibility of ‘α’ in a current technology (e.g., field-

       programmable gate array (FPGA)) and map ‘α’ into reconfigurable hardware

   4. Execute program P’ with original code plus code having functionality A

       (equivalent to functionality ‘α’) on the reconfigurable processor
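
The four steps above can be mimicked in a small software sketch. It is a loose analogy under clear assumptions: `ReconfigurableUnit`, its `set` and `execute` methods, and the example computation are hypothetical, and a Python function stands in for what would really be a hardware description of ‘α’.

```python
class ReconfigurableUnit:
    """Hypothetical stand-in for the reconfigurable hardware."""

    def __init__(self):
        self.configuration = None

    def set(self, configuration):
        """Configuration phase: load the functionality onto the unit."""
        self.configuration = configuration

    def execute(self, *operands):
        """Execution phase: run the configured functionality."""
        return self.configuration(*operands)


def alpha(samples):
    """Step 1: the code identified in P for mapping (a hot inner loop)."""
    total = 0
    for sample in samples:
        total += sample * sample
    return total


def program_p(samples):
    """Original program P: runs alpha in software."""
    return alpha(samples) + 1


def program_p_prime(samples, unit):
    """Transformed program P': alpha is replaced by code A that calls the unit."""
    unit.set(alpha)                    # step 3: alpha mapped onto the unit
    return unit.execute(samples) + 1   # step 2: A 'calls' the hardware


# Step 4: P' must be functionally equivalent to P.
print(program_p([1, 2, 3]))                               # 15
print(program_p_prime([1, 2, 3], ReconfigurableUnit()))   # 15
```

The point of the sketch is the shape of P': the surrounding code is unchanged, and only the selected region is swapped for a configure-then-execute pair.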



The mentioned steps illustrate the new programming paradigm in which both

software and hardware descriptions are present in the same program. It should also

be noted that because the only constraint on ‘α’ is implementability, it is also

implied that the microarchitecture has to support emulation. This implies the

utilization of microcode. This is termed reconfigurable microcode (ρ-µcode) in [6],

as it is different from traditional microcode. The difference is that such

microcode does not execute on fixed hardware facilities. It operates on facilities that

it itself ‘designs’ to operate upon.





The methodology for obtaining a program for the reconfigurable computing platform

is depicted in Figure 6. First, the code to be run on the reconfigurable hardware must

be determined. This is achieved by high-level source code to high-level source code

instrumentation and benchmarking. This results in several candidate pieces of code.

Second, we must determine which piece of code is suitable for implementation on the

reconfigurable hardware. The suitability is solely determined by whether the piece of

code is ‘hardware implementable’. This can be determined manually or

automatically. The end result will be a new program that comprises the following

elements:

   • Repair code is inserted in order to communicate parameters and results to/from

    the reconfigurable hardware from/to the general-purpose processor cores.

   • VHDL code and emulation codes are inserted to configure the reconfigurable

    hardware to perform the functionality that is initialized by the ‘execute code’.



Instead of inserting explicit code into the new program, each piece of code can be

initialized by special ‘ultra complex’ instructions. It should be noted that in the

programming paradigm, software code co-exists in the program with hardware

(implemented in reconfigurable fabric) descriptions.

Now if the code is general-purpose code this leads to several problems: First, it is

quite difficult to extract a set of operations with matching granularity at a sufficient

level of parallelism. Second, inner loops of general-purpose programs often contain

excess code, i.e., code that must be run on the core such as exceptions, function calls,

and system calls. These problems are also being faced and tackled by researchers in

the areas of compiler construction for VLIW architectures and hardware/software co-

design.




 Figure 6 Program transformation methodology for reconfigurable computing




An issue that has not yet been investigated is the software levels above the compiler.

Current research uses programming languages such as ‘C’ as specification models.

For embedded systems, code generation starting from more formal and domain-

specific models of computation is a viable alternative. These restricted models of

computation may compile efficiently to reconfigurable processors. Operating

systems aware of the underlying reconfigurability of hardware could context-

specifically control the use of reconfigurable units. This is of special interest for

systems that operate in modes. For example, a handheld device may be used heavily

to encrypt data at a certain time, putting the device computationally in a ‘bit-

manipulation mode’. At another time, the same handheld streams multimedia data,

requiring ‘arithmetic-array mode’.



8. Applications

The ongoing evolution of the Internet has led to a series of new communication

standards. These changing standards have made it difficult to provide timely

investment and cost planning for the design of new network equipment. Using reconfigurable

processors enables multiplexing of hardware, which can be quickly reconfigured to

support multiple protocols and communication standards and thereby meet a wide

range of market needs. The telecommunications industry depends mainly on a few

specific types of mathematical calculation, which gives reconfigurable

processors a good opportunity to find customers. The following are the main areas of use:


   • Video and Image Processing: This is general 2-dimensional image processing,

    which makes heavy use of transforms such as the Discrete Cosine Transform (DCT)

    and is found in devices such as cellular phones.


   • FIR Filtering: Telecommunication chips filter signals before sending them so that

    power consumption can be kept low. FIR filters are heavily used in all areas of

    telecommunications.


   • High-speed networking: Network routers and switches operating at multi-gigabit

    rates are difficult to implement using existing GPPs because of their lower performance.

    ASICs are hard to deploy because standards keep changing too often. Thus

    reconfigurable processors are well suited to such applications.
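
To show the kind of regular arithmetic that makes FIR filtering a natural fit for reconfigurable hardware, here is a minimal software sketch of a direct-form FIR filter, y[n] = sum over k of h[k] * x[n - k]. The function name and the sample values are illustrative, and samples before the start of the input are assumed to be zero.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: each output is a weighted sum of recent inputs."""
    output = []
    for n in range(len(samples)):
        accumulator = 0
        for k, coefficient in enumerate(taps):
            if n - k >= 0:  # inputs before the first sample are taken as zero
                accumulator += coefficient * samples[n - k]
        output.append(accumulator)
    return output


# Two-tap filter that adds each sample to its predecessor.
print(fir_filter([1, 2, 3, 4], [1, 1]))   # [1, 3, 5, 7]
```

The inner loop is a fixed pattern of multiply-accumulate operations, so each tap could be given its own multiplier and adder in a spatial implementation instead of being executed one after another.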


Some other important areas using reconfigurable processors are cryptographic

applications and automatic target recognition (ATR). Hardware cryptography uses

reconfigurable processors when a cheap solution is needed that still gives results much

faster than software. ATR automatically detects, classifies, recognizes, and identifies

an object, and it can be used in many applications.




9. Conclusions

In this paper we have presented an introduction to reconfigurable processors as a

solution to the need for cheaper and more flexible computing elements at low power

consumption. In particular we have focussed on the architectural details of how these

processors function, the challenges in designing such computing elements, and the

required software infrastructure to deploy such processors. Lastly we have presented a

brief survey of three different application domains where such processors are highly

useful. In summary, we believe that reconfigurable processors are a very promising

alternative to the existing processor design paradigm and that they have the potential to

transform the way computation is thought about and implemented, both scientifically

and commercially.





10. References

[1] M. Platzner and R. Enzler. Dynamically Reconfigurable Processors,

pages 3-4, 2000.



[2] R. Enzler. Architectural Trade-offs in Dynamically Reconfigurable Processors.

Swiss Federal Institute of Technology, Zurich, pages 2-9, 2004.



[3] R. D. Wittig and P. Chow. OneChip: An FPGA processor with reconfigurable

logic. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing

Machines (FCCM'96), pages 126-135, 1996.



[4] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao. The Chimaera reconfigurable

functional unit. In Proceedings of the IEEE Symposium on FPGAs for Custom

Computing Machines (FCCM'97), pages 87-96, 1997.



[5] J. R. Hauser and J. Wawrzynek. Garp: A MIPS processor with a reconfigurable

coprocessor. In Proceedings of the IEEE Symposium on FPGAs for Custom

Computing Machines (FCCM'97), pages 24-33, 1997.



[6] S. Vassiliadis, S. Wong, G. Gaydadjiev, and K. Bertels. Polymorphic Processors:

How to Expose Arbitrary Hardware Functionality to Programmers. IEEE

proceedings, pages 2-3.



[7] http://www.stretchinc.com



