SPREE Tutorial
Peter Yiannacouras April 13, 2006
Processors on FPGAs
You all used FPGAs (ECE241)
Adders, 7-segment decoders, etc.
We are putting whole microprocessors on them
We call these soft processors
Hard Versus Soft Processors
Hard Processors
  Made of transistors
  Costs millions to make
  Faster, smaller, less power
Soft Processors
  Written in HDL (Verilog)
  Programmed onto chip
Processors and FPGA Systems
FPGAs are a common platform for digital systems
[Figure: FPGA system connecting UART, Custom Logic, Soft Processor, Memory Interface, and Ethernet]
The soft processor performs coordination and even computation
Better processors => less hardware to design
We aim to improve soft processors by customizing them
Our Research Problem
Soft processors have worse area, speed, and power
But are flexible
Use the flexibility to counteract the drawbacks: HOW???
Customize the processor’s architecture (e.g. Intel vs AMD, Motorola 68360 vs 68010): HOW????
Research Goals
1. Understand tradeoffs in soft processors
   E.g. A hardware multiplier is big but can perform multiplies fast
2. Customize it to the application
   E.g. Bubble sort doesn’t use multiplies, therefore remove the hardware multiplier and save on area
We developed SPREE, software to help us do both
SPREE System
(Soft Processor Rapid Exploration Environment)
■ Input: Processor description (ISA, Datapath)
■ SPREE System
  1. Verify ISA against datapath
  2. Datapath Instantiation
  3. Control Generation
■ Output: Synthesizable Verilog
Input: Instruction Set Architecture (ISA) Description
■ Graph of Generic Operations (GENOPs)
■ Edges indicate flow of data
MIPS ADD – add rd, rs, rt
[Figure: GENOP graph for ADD: FETCH → RFREAD, RFREAD → ADD → RFWRITE]
ISA currently fixed (subset of MIPS I)
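To make the "graph of GENOPs" idea concrete, here is a minimal sketch of the ADD instruction as a small graph. The `GenopGraph` type, its methods, and the exact edges are assumptions made for illustration (the edges are one reading of the slide figure); this is not SPREE's actual ISA description format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ISA description: a graph of generic operations.
struct GenopGraph {
    std::vector<std::string> nodes;                          // GENOP names
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // producer -> consumer

    // Adds a GENOP node and returns its index.
    std::size_t addNode(const std::string &name) {
        nodes.push_back(name);
        return nodes.size() - 1;
    }

    // Adds a data-flow edge between two existing nodes.
    void addEdge(std::size_t from, std::size_t to) {
        assert(from < nodes.size() && to < nodes.size());
        edges.emplace_back(from, to);
    }

    // Counts the edges that feed data into a node.
    std::size_t fanIn(std::size_t node) const {
        std::size_t count = 0;
        for (const auto &edge : edges) {
            if (edge.second == node) ++count;
        }
        return count;
    }
};

// Builds the graph for "add rd, rs, rt" (assumed edge set).
GenopGraph buildMipsAdd() {
    GenopGraph g;
    const std::size_t fetch = g.addNode("FETCH");
    const std::size_t readA = g.addNode("RFREAD");   // reads rs
    const std::size_t readB = g.addNode("RFREAD");   // reads rt
    const std::size_t add = g.addNode("ADD");
    const std::size_t write = g.addNode("RFWRITE");  // writes rd
    g.addEdge(fetch, readA);
    g.addEdge(fetch, readB);
    g.addEdge(readA, add);
    g.addEdge(readB, add);
    g.addEdge(add, write);
    return g;
}
```

Calling `buildMipsAdd()` yields five nodes, and `fanIn` on the ADD node reports two incoming operands, one from each register-file read.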
Input: Datapath Description
■ Interconnection of hand-coded components
■ Allows efficient synthesis
■ Described using C++
[Figure: example datapaths built from Ifetch, Reg File, Mul, Data Mem, ALU, Shifter, and Write Back components; SPREE turns the description into RTL]
SPREE Component Library
Component Selection
Select by name
Names looked up in library
Stored in cpugen/rtl_lib
RTLComponent *ifetch=new RTLComponent("ifetch");
RTLComponent *reg_file=new RTLComponent("reg_file");
Datapath Wiring Example
Ifetch ports: rd, rs, rt, offset
Regfile ports: dst, a_reg, a_data, b_reg, b_data, writedata
ALU ports: opA, opB, result
proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
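The wiring calls above can be mimicked with a small self-contained model. The sketch below uses hypothetical stand-in types (`SketchComponent`, `SketchProcessor`) rather than SPREE's real `RTLComponent` classes, and the port check is an assumption about what a wiring step could validate, not a description of what SPREE does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a library component: a name plus its port names.
struct SketchComponent {
    std::string name;
    std::set<std::string> ports;
};

// One wire from a source port to a destination port.
struct SketchConnection {
    const SketchComponent *src;
    std::string srcPort;
    const SketchComponent *dst;
    std::string dstPort;
};

// Hypothetical stand-in for the processor object that collects the wiring.
class SketchProcessor {
public:
    // Records the connection only if both ports exist; returns whether it was accepted.
    bool addConnection(const SketchComponent &src, const std::string &srcPort,
                       const SketchComponent &dst, const std::string &dstPort) {
        if (src.ports.count(srcPort) == 0 || dst.ports.count(dstPort) == 0) {
            return false;
        }
        connections_.push_back({&src, srcPort, &dst, dstPort});
        return true;
    }

    std::size_t connectionCount() const { return connections_.size(); }

private:
    std::vector<SketchConnection> connections_;
};

// Wires the fetch unit to the register file using the port names from the slide.
std::size_t wireFetchToRegfile() {
    SketchComponent ifetch{"ifetch", {"rd", "rs", "rt", "offset"}};
    SketchComponent regFile{"reg_file",
                            {"dst", "a_reg", "a_data", "b_reg", "b_data", "writedata"}};
    SketchProcessor proc;
    const bool ok1 = proc.addConnection(ifetch, "rs", regFile, "a_reg");
    const bool ok2 = proc.addConnection(ifetch, "rt", regFile, "b_reg");
    // A misspelled port is rejected instead of silently creating a dangling wire.
    const bool bad = proc.addConnection(ifetch, "rs", regFile, "no_such_port");
    assert(ok1 && ok2 && !bad);
    return proc.connectionCount();
}
```

`wireFetchToRegfile()` returns the number of accepted connections, which is two for the wiring shown on the slide.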
SPREE System + Backend
(Soft Processor Rapid Exploration Environment)
[Figure: SPREE flow]
Processor Description → SPREE generator (spegen) → Verilog
Benchmarks → Mint MIPS Simulator (simulator/run) and Modelsim Verilog Simulator (spebenchmark) → Compare traces
Modelsim Verilog Simulator (spebenchmark) → 4. Cycle Count
Quartus II CAD Software (specadflow) → 1. Area, 2. Clock Frequency, 3. Power
Walking through an Example (see README.txt)
Choose a pre-built processor
cpugen/src/arch lists all the processors
Let’s choose pipe3_serialshift
3-stage pipeline with serial shifter
Using SPREE on a Processor
Generate, benchmark, synthesize
% spegen pipe3_serialshift        ← Generates Verilog
% spebenchmark pipe3_serialshift  ← Runs benchmarks
% specadflow pipe3_serialshift    ← Synthesizes processor
% specompare pipe3_serialshift    ← Display results
spegen – Generating Processors
Input: Processor description
Syntax: spegen
Output:
A folder named after the processor
  Hand-coded Verilog modules
  system.v: Generated hookup and control
  OUT.cpugen: stages per instruction, hazard window/branch penalty
  test_bench.v: test bench for Modelsim simulation
Benchmarking
Run programs on the processor
Measure time taken till completion
Verify functionality
Can do this without knowing anything about the benchmarks themselves
spebenchmark – Benchmarking
Input: Processor implementation
Syntax: spebenchmark
Output: (ideally)
Cycle counts of all benchmarks
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
Traces: /tmp/modelsim_trace.txt
Benchmarking – under the hood
[Figure: benchmarking flow]
C source benchmarks → Compiler (gcc - MIPS) → Binary Executable
Binary Executable → Mint MIPS Simulator (simulator/run) → Trace (applications//mint)
Binary Executable + Verilog → Modelsim Verilog Simulator (spebenchmark) → Trace (/tmp/modelsim_trace.txt, /tmp/modelsim_store_trace.txt) and Cycle Count
spebenchmark compares the two traces
specompiler - Setup compiler
Choose the path to your compiler (prebuilt)
Default: /jayar/b/b0/yiannac/spe/compiler
  GCC 3.3.3, software division
Another: /jayar/b/b0/yiannac/spe/compiler-softmul
  GCC 3.3.3, software division and software multiplication
% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul
specompiler will:
1. Compile all benchmarks (and store binaries)
2. Simulate all benchmarks (and store traces)
After this point, you can just run spebenchmark
spebenchmark - failure
Shows discrepancy between MINT and Modelsim
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064
Clues to where the error occurred
  Last two fields of each line: destination register (05, 07) and value being written (00000000, 00000064)
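The trace check can be pictured as a search for the first entry where the two simulators disagree. The sketch below is an illustration only: the `TraceEntry` layout and the choice to compare all four fields are assumptions based on the fields printed in the error message, not the actual spebenchmark implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical trace record mirroring the fields printed in the error message.
struct TraceEntry {
    std::uint32_t pc;     // program counter
    std::uint32_t ir;     // instruction word
    std::uint32_t dst;    // destination register
    std::uint32_t value;  // value being written
};

// Two entries agree only if every field matches.
bool sameEntry(const TraceEntry &a, const TraceEntry &b) {
    return a.pc == b.pc && a.ir == b.ir && a.dst == b.dst && a.value == b.value;
}

// Returns the index of the first differing entry, or -1 if the traces match.
// If one trace is a strict prefix of the other, the discrepancy is reported
// at the index where the shorter trace ends.
std::ptrdiff_t firstDiscrepancy(const std::vector<TraceEntry> &hardware,
                                const std::vector<TraceEntry> &reference) {
    const std::size_t common = std::min(hardware.size(), reference.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (!sameEntry(hardware[i], reference[i])) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    if (hardware.size() != reference.size()) {
        return static_cast<std::ptrdiff_t>(common);
    }
    return -1;
}
```

Fed the Modelsim entry (PC=04000064, register 05, value 00000000) and the Mint entry (PC=040000b8, register 07, value 00000064) from the slide at the same position, `firstDiscrepancy` returns that position, which is where debugging would start.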
spebenchmark - waveforms
Can see any signal within the processor
% sim_gui bubble_sort pipe3_serialshift
Modelsim: LEARN IT!!!
Quartus Simulator is vastly inferior, and even unusable for our purposes
The Testbench (test_bench.v)
What is it?
  The stimulus and monitor for your circuit
SPREE automatically generates it
  And hence it works right away
Handcoding your own processor means
  You have to interface with the test bench
  Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
Need only 6 wires
To track writes to register file and data mem
Wires between test_bench.v and your soft processor:
  regfile_we, regfile_dst, regfile_data
  datamem_we, datamem_addr, datamem_data
specadflow – Synthesis
Input: Processor implementation
Syntax: specadflow
Performs a “seed sweep”
  Average several runs since results are noisy
  Run several instances of quartus
  Across several machines in parallel
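The averaging step of a seed sweep can be sketched in a few lines. The function name and the use of a plain arithmetic mean are assumptions for illustration; the slides do not say how specadflow combines the per-seed results.

```cpp
#include <cassert>
#include <vector>

// Averages one metric (e.g. clock frequency in MHz) over the runs of a seed sweep.
// Assumes a plain arithmetic mean; the tool's real aggregation may differ.
double meanOverSeeds(const std::vector<double> &perSeed) {
    assert(!perSeed.empty());  // a sweep needs at least one run
    double sum = 0.0;
    for (double value : perSeed) {
        sum += value;
    }
    return sum / static_cast<double>(perSeed.size());
}
```

For example, hypothetical per-seed clock frequencies of 74.0, 76.0, and 78.0 MHz average to 76.0 MHz, which smooths out the run-to-run noise that a single seed would carry.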
specadflow Output
Output:
Synthesis results (hidden)
Summary output
Started Tue 6:27PM, Waiting for processes: 10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081 Area (LEs or ALUTs)
75.7812 Clock Frequency (MHz)
0.99822 Estimated Energy/cycle dissipated (nJ/cycle)
...
Waiting on eda writer
Any Questions?
For technical support, ask me
EXTRAS
Setup/Install
Copy and unpack the SPREE tarball:
/jayar/b/b0/yiannac/spree.tar.gz
Build all the SPREE software
% cd spree
% make
Follow the instructions in INSTALL.txt
If there are any errors, email me
SPREE Directory Structure
spree
  applications: Benchmarks C source
  compiler: binutils, gcc, newlib
  cpugen: the cpu generator + processor descriptions
  modelsim: Verilog simulator
  simulator: MIPS simulator
  quartus: synthesis
Setup cluster
Choose the cluster you’re using
aenao – high performance, limited access
  % specluster aenao
OR
eecg – any eecg-connected machine
  % specluster eecg
  Edit quartus/machines.txt
  Put a list of 11 or so good eecg machines