The CMU Reconfigurable Computing Project


The CMU Reconfigurable Computing Project
April 9, 1999
Mihai Budiu
mihaib@cs.cmu.edu
SSS 4/9/99 CMU Reconfigurable Computing 1
Current Project Members
CS Department: Seth Copen Goldstein, Mihai Budiu
ECE Department: Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer
Why Study Reconfigurable Hardware?
It is a nice computation paradigm
(wire your own computer)
Why Study Reconfigurable Hardware? (ctd)
Algorithm             Year  System          Versus               Speedup (x)
DNA matching          1992  SPLASH 2        SPARC 10             4300
FIR Filter            1998  PipeRench       UltraSparc 300 MHz   90
IDEA Encryption       1998  PipeRench       UltraSparc 300 MHz   61
SAT solver            1997  Pamette         SPARC 5 110 MHz      17-1100
Ray Casting           1995  RIPP-10         Pentium 75 MHz       33.8
Hidden Markov Model   1996  1 Xilinx FPGA   SPARC 10             24.4
DES Encryption        1996  GARP            UltraSparc 170 MHz   24
SPEC92                1994  MIPS+RC         MIPS                 1.22
Commercial Players
Source: In-stat April 1998
*Does not include software, hardwire or support EPROMs
What Is “Reconfigurable Hardware”?
[Figure: universal gates and/or storage elements linked by an interconnection network of switches.]
Basic Ingredient: RAM cell
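[Figure: a 4-entry RAM addressed by inputs a0 and a1 and storing the data bits 0, 0, 0, 1, so that its output is a0 & a1.]
Universal gate = RAM: a RAM with k address inputs can implement any k-input Boolean function.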
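To make "Universal gate = RAM" concrete, here is a minimal Python sketch (the `LUT` class is a hypothetical name, not part of any PipeRench tool): the inputs form the RAM address, and the bit stored at that address is the output. Writing 0, 0, 0, 1 gives a0 & a1; rewriting the same cells with 0, 1, 1, 0 gives XOR, so only the configuration data changes.

```python
# A k-input "universal gate" modeled as a 2**k-entry RAM (lookup table).
# The inputs form the address; the stored bit at that address is the output.

class LUT:
    def __init__(self, contents):
        # contents[i] is the output for address i; length must be a power of two
        k = len(contents).bit_length() - 1
        if len(contents) != 1 << k:
            raise ValueError("LUT size must be a power of two")
        self.k = k
        self.contents = list(contents)

    def __call__(self, *inputs):
        # Address bit j comes from input j (a0 is the least significant bit).
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = sum(bit << j for j, bit in enumerate(inputs))
        return self.contents[address]

# Storing 0, 0, 0, 1 makes the RAM compute a0 & a1.
and_gate = LUT([0, 0, 0, 1])
# Rewriting the same cells turns it into XOR: only the configuration changes.
xor_gate = LUT([0, 1, 1, 0])

for a1 in (0, 1):
    for a0 in (0, 1):
        print(a1, a0, and_gate(a0, a1), xor_gate(a0, a1))
```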
Basic Ingredients (ctd)
[Figure: a switch and the 1-bit RAM cell that sets it.]
A switch is controlled by a 1-bit RAM cell
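A minimal sketch of how 1-bit switch cells define routing, assuming a simplified model in which each switch joins two wire segments when its configuration bit is 1 (`nets` and the wire numbering are hypothetical). A union-find pass groups the segments into the electrical nets that a given configuration creates.

```python
# Sketch: routing switches whose on/off state is a 1-bit configuration cell.
# Wires joined by closed switches form one electrical net.

def nets(num_wires, switches, config_bits):
    """switches[i] = (wire_a, wire_b); config_bits[i] = 1 closes switch i."""
    parent = list(range(num_wires))

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), bit in zip(switches, config_bits):
        if bit:  # a closed switch shorts the two wires together
            parent[find(a)] = find(b)

    groups = {}
    for w in range(num_wires):
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())

# Four wire segments and three switches between neighbours.
switches = [(0, 1), (1, 2), (2, 3)]
print(nets(4, switches, [1, 0, 1]))  # [[0, 1], [2, 3]]
print(nets(4, switches, [1, 1, 0]))  # [[0, 1, 2], [3]]
```

Reprogramming the three configuration bits is all it takes to rewire the segments.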
Outline
• What is reconfigurable hardware
• RH vs other computation paradigms
• Challenges in RH research
• PipeRench, the CMU project:
– the hardware
– the software
• Conclusions
RH vs ASICs
• Generally Application-Specific Integrated Circuits
will be faster than RH:
– RH wires are slow & big
– RH bit-slices are costly to interconnect
– RH devices must store configuration on the chip
but
• RH can be reprogrammed
– to implement new algorithms
– to fix bugs
• RH is cheaper for small production runs
• RH tolerates faults better
• RH is sometimes faster with staged computation
RH vs Microprocessors
• RH is less flexible (like a VLIW with fixed instructions)
but
• RH provides more (customized)
computation elements
• RH can decrease memory traffic
• RH can be tailored for specific algorithms
and data types
RH will not replace microprocessors (μP), but will complement them
Types of RH
• FPGAs: bit-level logic functionality
(the basic processing elements compute on 1 bit)
• word-based architectures: PipeRench (CMU)
(basic PE operates on 8 bits)
(basic PE is a small ALU)
• coarse architectures: RAW (MIT)
(basic PE is a MIPS R2000 core)
RH In A System
[Figure: coupling of RH to the rest of the system.]
Challenges In RC
• Software tools:
– Making RC programming like software development
– Automatic compilation from HLL
– Automatic program partitioning
• Mapping algorithms efficiently (there is no ISA to target)
• System issues
– interfaces
– finding the “ideal” RC fabric
The CMU Reconfigurable
Computing Project
Hardware Goals
• To build a complete reconfigurable
hardware device
• To build the system integration hardware
• To host the device in a PC
Our Device:
• Word-based processing elements
• Pipelined architecture
• Virtualized hardware
• Local interconnection network
• Wide pipelined bus
[Figure: configuration memory and a data & configuration controller feeding the stripes, each built from processing elements.]
Hardware Virtualization
[Figure: the actual available hardware holds the instructions currently in hardware; the remaining instructions are paged out.]
Hardware Virtualization (2)
[Figure: configurations are paged in from the program's configuration memory and configured into the hardware while other stripes compute, then paged out.]
Overlap configuration with computation.
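One way to picture overlapping configuration with computation, assuming (as a simplification) that exactly one physical stripe is reconfigured per cycle in round-robin order while the others keep computing. The sketch records which virtual stripe each physical stripe holds; `*` marks the stripe being configured in that cycle. With 3 physical and 5 virtual stripes, each virtual stripe stays resident for two compute cycles after its configuration cycle before it is paged out. Function names and parameters are illustrative.

```python
# Sketch of pipelined reconfiguration. Assumptions (hypothetical): one
# physical stripe is (re)configured per cycle, in round-robin order, and the
# virtual pipeline of num_virtual stripes repeats as the data stream goes on.

def schedule(num_physical, num_virtual, cycles):
    """Return, for each cycle, the virtual stripe held by each physical stripe.

    '-' means the physical stripe has not been configured yet; a '*' suffix
    marks the stripe being configured in that cycle (it does not compute).
    """
    resident = [None] * num_physical
    rows = []
    for t in range(cycles):
        target = t % num_physical           # physical stripe configured now
        resident[target] = t % num_virtual  # next virtual stripe paged in
        row = []
        for p, v in enumerate(resident):
            if v is None:
                row.append("-")
            elif p == target:
                row.append(f"v{v}*")
            else:
                row.append(f"v{v}")
        rows.append(row)
    return rows

for t, row in enumerate(schedule(num_physical=3, num_virtual=5, cycles=7)):
    print(t, " ".join(f"{cell:>4}" for cell in row))
```

The design point the slide makes is visible in every row: while one stripe is being written, the rest are still doing useful work.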
Processing Elements
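[Figure: a row of processing elements (PE2, PE1, PE0); each takes inputs a and b plus a carry-in Cin and produces out.]
• Look-up table
• Any 3-input, 1-output function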
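A sketch of how 3-input look-up tables can be chained into a word-wide operation. It assumes, hypothetically, that each bit-slice has one LUT for its output bit and a second LUT for the carry into the next slice; the slide does not say how PipeRench generates carries. Changing only the table contents turns the same eight slices from an adder into a bitwise XOR.

```python
# Sketch: bit-slice PEs built from 3-input lookup tables. Hypothetical model:
# each slice has one LUT producing its output bit and a second LUT producing
# the carry passed to the next slice (PE0 = least significant bit).

def truth_table(f):
    """8-entry LUT contents for a 3-input Boolean function f(a, b, cin)."""
    return [f(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]

def lut3(table, a, b, cin):
    """Look up the stored bit addressed by the three inputs."""
    return table[a | (b << 1) | (cin << 2)]

def word_op(out_lut, carry_lut, x, y, width=8):
    """Chain `width` identical slices, rippling the carry from PE0 upward."""
    result, cin = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= lut3(out_lut, a, b, cin) << i
        cin = lut3(carry_lut, a, b, cin)
    return result, cin  # cin now holds the carry out of the top slice

# Configured as an 8-bit adder.
SUM = truth_table(lambda a, b, c: a ^ b ^ c)
CARRY = truth_table(lambda a, b, c: (a & b) | (a & c) | (b & c))
# The same slices reconfigured as a bitwise XOR with no carry.
XOR = truth_table(lambda a, b, c: a ^ b)
ZERO = truth_table(lambda a, b, c: 0)

print(word_op(SUM, CARRY, 200, 100))       # (44, 1): 300 = 256 + 44
print(word_op(XOR, ZERO, 0b1100, 0b1010))  # (6, 0)
```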
The Interconnection Network
[Figure: the PEs of a stripe (PE 1 to PE N), each B bits wide, connected through a word-level cross-bar carrying P*B bits; the pass registers hold P*B*N bits.]
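Why route words rather than bits: a rough count of crossbar configuration bits, assuming a full crossbar in which every output has a binary-encoded select field. The sizes used (16 PEs, 8-bit words) are hypothetical, and pass registers are left out of the count.

```python
# Sketch of a word-level crossbar (hypothetical sizes). Each output selects
# one whole B-bit word from any source, so every output needs only one
# binary-encoded select field instead of routing each bit separately.
import math

def crossbar(words, selects):
    """Output i receives words[selects[i]]; the selects are configuration data."""
    return [words[s] for s in selects]

def config_bits(num_sources, num_outputs):
    """Select bits for num_outputs multiplexers, each choosing 1 of num_sources."""
    return num_outputs * math.ceil(math.log2(num_sources))

N, B = 16, 8  # hypothetical: 16 PEs per stripe, 8-bit words

print(crossbar([10, 20, 30, 40], [3, 3, 0, 1]))  # [40, 40, 10, 20]
print(config_bits(N, N))          # word-level routing: 16 * 4 = 64 bits
print(config_bits(N * B, N * B))  # bit-level routing: 128 * 7 = 896 bits
```

Under these assumptions, word-level routing needs 64 select bits where a bit-level crossbar of the same width would need 896, which is one reason coarser elements mean fewer configuration bits.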
The PCI Board
[Figure: chip.eps]
Software Goal
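To program reconfigurable devices using the standard software development process:
– Compile C or Java
– Do it quickly
[Figure: a C or Java program passes through a partitioner; part of it runs on the CPU, and the rest is expressed in DIL (Data-flow Intermediate Language) and built into a configuration for the reconfigurable HW.]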
Building Circuits From DIL
a = b + c * d;
e = c - d;
• variables → wires
• operators → gates
[Figure: inputs b, c, d; c * d feeds an adder together with b to produce a, and c - d produces e.]
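The variables-to-wires, operators-to-gates idea can be sketched in Python. This is not the DIL compiler: it handles only `+`, `-`, and `*` on straight-line code, and it borrows Python's `ast` parser only because these two statements also happen to be valid Python. `build_circuit` and `simulate` are hypothetical names.

```python
# Sketch: turning straight-line assignments into a circuit graph.
# Assumptions: only + - * on integers and no control flow. Python's own
# `ast` parser is reused because these statements are also valid Python.
import ast
import operator

GATES = {ast.Add: ("+", operator.add), ast.Sub: ("-", operator.sub),
         ast.Mult: ("*", operator.mul)}

def build_circuit(source):
    """Return (gates, wires): gates is a list of (symbol, fn, in1, in2, out)."""
    gates, wires, counter = [], {}, [0]

    def wire_for(node):
        if isinstance(node, ast.Name):          # a variable is just a wire
            return wires.get(node.id, node.id)
        if isinstance(node, ast.BinOp):         # an operator becomes a gate
            left, right = wire_for(node.left), wire_for(node.right)
            symbol, fn = GATES[type(node.op)]
            counter[0] += 1
            out = f"t{counter[0]}"
            gates.append((symbol, fn, left, right, out))
            return out
        raise ValueError("unsupported construct")

    for stmt in ast.parse(source).body:
        target = stmt.targets[0].id
        wires[target] = wire_for(stmt.value)    # name the output wire
    return gates, wires

def simulate(gates, wires, inputs):
    values = dict(inputs)
    for _, fn, a, b, out in gates:              # gates are in dependency order
        values[out] = fn(values[a], values[b])
    return {name: values[w] for name, w in wires.items()}

gates, wires = build_circuit("a = b + c * d; e = c - d")
print([(s, x, y, o) for s, _, x, y, o in gates])
print(simulate(gates, wires, {"b": 1, "c": 5, "d": 3}))  # {'a': 16, 'e': 2}
```

The gate list comes out as `*` (c, d), then `+` (b and the product), then `-` (c, d), matching the circuit in the figure.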
Mapping Circuits To
[Figure: a circuit computing + and - over inputs a, b, c, shown at successive steps of being mapped onto the hardware.]
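A simple way to assign such a circuit to pipeline stages is an as-soon-as-possible (ASAP) schedule. The sketch assumes, hypothetically, that every gate fits in one stripe and that a value needed in a later stripe than the one where it becomes available must be carried forward unchanged; it is not the placer used by the DIL compiler.

```python
# Sketch: assigning the gates of a circuit to pipeline stripes with an
# as-soon-as-possible (ASAP) schedule. Hypothetical model: every gate fits
# in one stripe, each stripe registers its results, and a value consumed
# later than the stripe where it becomes available is carried forward.

def assign_stripes(gates, primary_inputs):
    """gates: list of (output_name, [input_names]) in dependency order."""
    ready = {name: 0 for name in primary_inputs}  # first stripe that can read it
    stripe_of = {}
    for out, ins in gates:
        stripe = max(ready[i] for i in ins)       # earliest stripe with all inputs
        stripe_of[out] = stripe
        ready[out] = stripe + 1                   # registered for the next stripe
    return stripe_of, ready

def pass_registers(gates, stripe_of, ready):
    """Number of stripe boundaries each value must be carried across unchanged."""
    last_use = {}
    for out, ins in gates:
        for i in ins:
            last_use[i] = max(last_use.get(i, 0), stripe_of[out])
    return {v: u - ready[v] for v, u in last_use.items() if u > ready[v]}

# a = b + c * d; e = c - d  (t1 = c * d, t2 = a, t3 = e)
circuit = [("t1", ["c", "d"]), ("t2", ["b", "t1"]), ("t3", ["c", "d"])]
stripe_of, ready = assign_stripes(circuit, ["b", "c", "d"])
print(stripe_of)                                  # {'t1': 0, 't2': 1, 't3': 0}
print(pass_registers(circuit, stripe_of, ready))  # {'b': 1}
```

Here `b` arrives at stripe 0 but is consumed by the adder in stripe 1, so it has to travel one stripe unchanged.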
The DIL Compiler Front-End
[Figure: DIL input file → parser → evaluator → circuit → backend; loaders bring component circuits from the component library into the evaluator.]
The DIL Compiler Backend
[Figure: front-end → circuit → optimizer → circuit (expanded) → placer-router → circuit (placed) → code generator, which emits xfig, C++, or Asm output.]
The whole compilation process is very fast compared to classical CAD tools: we can compile two orders of magnitude faster.
Processing Element Size Tradeoffs
Small                     Big
Efficient usage           Wasteful
Slower                    Faster bit-slice
Flexible interconnect     Coarse routing
Bigger configuration      Fewer configuration bits
Place and route easier    Constrains the compiler
Stripe Width Tradeoffs
Wider                     Narrower
Fewer stripes             More will fit
Virtualize more           Fewer page-ins
Bandwidth waste           Less bandwidth available
Placer freedom            Placement constrained
Bus Width Tradeoffs
Wider                     Narrower
More area                 Less area
High bandwidth            Time-mux bus
Clock Speed Tradeoffs (run-time)
Faster                    Slower
Short critical path       Big chains
Long pipeline built       Compact circuits
Decomposition overhead    Little decomposition
Virtualized more          Less virtualized
[Figure: a 24-bit addition implemented either as one 24-bit adder or decomposed into a chain of 8-bit additions.]
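The decomposition in the figure can be checked with a short sketch: a 24-bit addition carried out as three 8-bit additions, each passing its carry to the next. Each 8-bit step has a short critical path, which is what permits the faster clock, but the work now spans more stages. `add_decomposed` is a hypothetical helper.

```python
# Sketch: a 24-bit addition decomposed into three 8-bit additions that pass
# a carry between them, as a fabric with 8-bit PEs must do for wide operands.

WIDTH = 8
MASK = (1 << WIDTH) - 1

def add_decomposed(x, y, total_bits=24):
    """Add x and y in WIDTH-bit pieces; return (sum mod 2**total_bits, carry)."""
    result, carry = 0, 0
    for shift in range(0, total_bits, WIDTH):
        s = ((x >> shift) & MASK) + ((y >> shift) & MASK) + carry
        result |= (s & MASK) << shift
        carry = s >> WIDTH  # 0 or 1, fed to the next 8-bit piece
    return result, carry

x, y = 0x12FFFF, 0x000001
print(hex(add_decomposed(x, y)[0]))  # 0x130000
```

In this example the carry ripples through two 8-bit pieces, which is the kind of chain a single 24-bit adder would hide inside one long critical path.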
Configuration Bits per Stripe
[Chart: configuration bits per stripe (0 to 1600) versus stripe width (64 to 144 bits), one curve per PE bit width (2, 4, 8, 16, 32).]
[Figure: fir-throughput.eps, FIR throughput.]
Project Status
• Operational:
– Behavioral and structural models of PipeRench in Verilog
– Assembler, simulator
– Tools for visualization and debugging
– One tile fabricated and tested
– Very fast compiler from intermediate language
• In work:
– Prototype PipeRench to be taped out this summer
– PCI board to host PipeRench in a PC
Simulated Speed-up vs. UltraSparc @ 300 MHz
[Chart, log scale from 1 to 1000: simulated speed-ups for ATR, Cordic, DCT, FIR, IDEA, Nqueens, and Over; bar values shown are 328.8, 90.9, 76.1, 61.8, 29.0, 26.0, and 20.6.]
Future Work
• Build the PCI board
• Build the OS device drivers
• Start investigating HLL issues:
– automatic partitioning
– translation to DIL
– special code transformations
Conclusions
• A set of important applications can benefit from RC devices
• RC devices offer the potential for substantial performance improvement at a low cost
• RC devices will soon be mainstream in the embedded computing world; perhaps in the future they will also permeate the desktop