
FPGA Implementation and Design Considerations


John Archambeault

Agenda

● What's in an FPGA
● FPGA as Emulation Platforms
● FPGA as Production
● FPGA Implementation
● FPGA vs ASIC Cost Benefit Analysis

Agenda Overview
1. What is in an FPGA
● Slices, Block RAMs, PLLs, IO

2. FPGA as Emulation Platform
● Goals, Partitioning, Clock/Reset Issues

3. FPGA as Production Element
● Implementation Tricks, FPGA Features

4. FPGA Implementation Issues
● Suggested layouts, Organization, Features

5. FPGA vs ASIC Cost Benefit Analysis


FPGA Guts
● Main unit is the CLB.
● 1 CLB contains 4 Slices.
● 1 Slice contains two 4-input LUTs.
● A 4-input LUT is realized as a 16x1 RAM.
● A 4-input LUT can implement any logic function of 4 inputs.
● Half of the LUTs in the Slices can be used as RAM (distributed RAM).
● Some dedicated carry logic (good for larger adders).
● 2 FFs.
● The MUXes are used to combine CLBs.
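
The "LUT is a 16x1 RAM" point can be sketched in a few lines of Python. This is only a behavioral model, not vendor code: the helper names `make_lut4` and `lut4_read` are made up for illustration, and the bit ordering of the address is an assumption.

```python
def make_lut4(func):
    """Build the 16 one-bit entries (the '16x1 RAM' contents) for a 4-input function."""
    table = []
    for addr in range(16):
        # Assumed bit order: input a is address bit 0, d is address bit 3.
        a = (addr >> 0) & 1
        b = (addr >> 1) & 1
        c = (addr >> 2) & 1
        d = (addr >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply form the RAM read address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


# Any 4-input function fits in the same 16 bits; only the contents change.
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because the logic function lives entirely in the table contents, the same storage can be written at run time instead, which is what makes distributed RAM possible.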

FPGA Guts
● Block RAM
  – Dedicated RAM devices
  – Can save lots of space
  – Usually organized in 4Kb or 16Kb quanta
● DCMs
  – Digital Clock Manager (DLLs, Phase Aligner)
● Multipliers
● IOB: many different configurations
● Tristate buffers (some families)
● Long lines

FPGA Guts (Two Families)


FPGA as Emulation Platforms
● Primary purpose is to emulate a design as part of the verification flow of an ASIC.
● Excellent for FW development / flushing out architectural issues.
● To date, probably the most common use of FPGAs.
● Usefulness is directly proportional to the similarity between FPGA and ASIC. Does FPGA = ASIC?
  – Design partition
  – Clock speed
  – Size of FPGA

Design Partitioning
Smart partitioning of the design in the FPGA will carry over well to the ASIC.
● Break the design down into its component parts.
● For the most part, if the FPGA is big enough and fast enough, you can just synthesize the design, provided you organize it correctly.
● If it is too big and you need multiple FPGAs, be aware that you'll need another level of hierarchy.

FPGA Organization 1
• Keep Clock / Reset generation separate from the rest of the code
– Clocks require PLLs and clock buffering resources. The availability and composition of each differ widely between FPGA and ASIC (and even between different ASIC libraries).
– Resets also require custom buffering, with different implementations in ASIC and FPGA.
[Figure: Clock Generation and Reset Generation blocks drawn separately from Core Logic]

FPGA Organization 2
• Separate pads from the main design
– Pads also change from library to library (ASIC / FPGA).
– Use generic pad signals:
  • xxx_in (input)
  • xxx_out (output)
  • xxx_oe (output enable)
[Figure: Core Logic connected to a pad through _oe, _out and _in signals]

FPGA Organization 3
• In some cases, it may make sense to use an external chip to emulate a piece of 3rd party IP:
– Large RAMs
– Standard processors (ARM, 8051)
– Analog IF (Eth Phy)
• Anything that is too big / fast / complicated for the FPGA AND where there is no value add in manually instantiating it.
• You must know what your product / value add is to define what you should emulate.
[Figure: Core Logic connected to an external ARM926 over AHB and to an external Eth Phy over MII]

FPGA Organization 4
Stitch It All Together

[Figure: FPGA Implementation containing Clock Generation, Reset Generation and Core Logic, with AHB to an ARM926 and MII to an Eth Phy]

FPGA Organization 5
If you do it right, you can share 100% of the design with the ASIC. If you do it wrong, you must redesign, doubling your work and negating the purpose of emulation.
[Figure: "FPGA Implementation = ASIC Implementation ?" Both sides show Clock Generation, Reset Generation and Core Logic, with AHB to an ARM926 and MII to an Eth Phy]

Clock Speed
● You want to pick a speed that will allow you to test as much as you can while emulating the function of the chip.
Things to keep in mind:
– You want the same design that works in the ASIC to work in the FPGA.
– If you must change the design to fit FPGA timing, make it permanent (i.e. change it on the ASIC too).
– Sometimes it is best to just divide by 10.
  • Allows interfacing with the previous generation of some interfaces (10/100MHz, PCI, etc.)
  • Easy to switch between emulation speed and designed reality.

Clock Speed Examples
Between FPGA and ASIC Designs
Simple designs, all Reg in, Reg out

Design           FPGA (Virtex 4, XC4VLX25)   130nm ASIC (NEC library)   FPGA / ASIC (ratio)
32-bit Adder     6.66ns, 150MHz              1.15ns, 868MHz             5.79
32-bit Compare   2.55ns, 281MHz              894ps, 1.12GHz             2.85
32-1 Mux         5.67ns, 176MHz              952ps, 1.05GHz             5.96

So, 10X is good enough, with a nice margin.
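
The ratio column can be re-derived from the delay figures on this slide. A minimal sketch in Python; only the delays come from the slide, the variable names are illustrative.

```python
# (FPGA delay, ASIC delay) in nanoseconds, copied from the slide.
delays_ns = {
    "32-bit Adder": (6.66, 1.15),
    "32-bit Compare": (2.55, 0.894),
    "32-1 Mux": (5.67, 0.952),
}


def slowdown(fpga_ns, asic_ns):
    """How many times slower the FPGA path is than the ASIC path."""
    return fpga_ns / asic_ns


ratios = {name: round(slowdown(f, a), 2) for name, (f, a) in delays_ns.items()}

# Margin left over when the emulation clock is simply the ASIC clock / 10.
worst = max(ratios.values())
margin = 10 / worst

for name, ratio in ratios.items():
    print(f"{name}: FPGA is {ratio}x slower")
print(f"worst case {worst}x, so div-by-10 leaves about {margin:.2f}x margin")
```

With these three paths the worst slowdown is just under 6X, which is where the "10X with margin" rule of thumb comes from.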

Design Size
● Best to get the entire design into one FPGA.
● The biggest (currently) is Virtex 5, which contains effectively > 100K Slices (i.e. ~200K LUTs/FFs), or around 2 million gates. This is about enough for most basic applications.
● But if your design is too big, you must use multiple FPGAs, which makes things very difficult.

Multiple FPGA Issue 1
Clock skew between FPGAs
• Very similar to the problem inside the chip, but there is no centralized database (board + FPGA = ?).
• Can be solved using FPGA components to generate clocks (DLLs).
• But it can still be tricky to sync.
[Figure: xtal driving clk into one FPGA, and skew_clk arriving at another FPGA]

Multiple FPGA Issue 2a
I/O Concerns
• Recall that as FPGA size increases, relative I/O shrinks.
• Especially bad if you have a few “central” blocks like DMA, processors, etc.
• May have to route “through” other FPGAs or MUX ports (shown in next slides).
[Figure: Disk Drive, FPGA0 (DMA), FPGA1 and FPGA3]

Multiple FPGA Issue 2b
I/O Concerns
• Using MUXes to increase pin IO can be done in two ways. On one hand, you can slow down the FPGA core by the same factor as the mux (i.e. an 8-1 mux requires a core clock running 8 times slower, or more for overhead).
• Or you make the architectural decision that the mux is acceptable and reduced throughput is OK (i.e. restricting an interface to only one transaction every 8 cycles). This can be advantageous. “Full” signals work better, and self throttling reduces design requirements. But the muxes should stay in the final product.
[Figure: FPGA0 Core Logic and FPGA1 Core Logic, each with Clk2 (1MHz) and Clk1 (10MHz)]
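
A rough behavioral sketch of the first option (time-multiplexing a wide bus over a few pins) in Python. The function names and the `overhead_slots` parameter are assumptions made for illustration; real pin muxing also needs framing and synchronization logic that is not modeled here.

```python
def tdm_send(signals, pins):
    """Split a wide list of bits into time slots of `pins` bits each."""
    if len(signals) % pins:
        raise ValueError("signal count must be a multiple of the pin count")
    return [signals[i:i + pins] for i in range(0, len(signals), pins)]


def tdm_receive(frames):
    """Reassemble the wide bus on the far FPGA from the received time slots."""
    return [bit for frame in frames for bit in frame]


def core_clock_mhz(pin_clock_mhz, mux_ratio, overhead_slots=0):
    """Core clock allowed if every core cycle must move one full wide word."""
    return pin_clock_mhz / (mux_ratio + overhead_slots)


# 64 logical signals squeezed over 8 physical pins: an 8-1 mux.
wide_bus = [(i * 7) % 3 == 0 for i in range(64)]
frames = tdm_send(wide_bus, pins=8)
restored = tdm_receive(frames)
```

With a 10MHz pin clock, an 8-1 mux plus two overhead slots gives a 1MHz core clock, which matches the Clk1 / Clk2 values shown on the slide.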

Multiple FPGA Issue 2c
I/O Concerns
• Routing through FPGAs can cause many problems:
– What was a short, combinatorial path is now long, with unknown / hard to predict setup/hold time.
– You can break the timing path with a FF, but again, it is difficult to ensure timing and it may break emulation.
– Stealing from one interface to feed another.
[Figure: logical vs. actual routing across FPGA0 (tx_a), FPGA1 (tx_a), FPGA2 (rx_a) and FPGA3 (rx_b)]

Multiple FPGA Issue 3
FPGA Bringup
• Several FPGAs are usually programmed serially, so one comes up before the others. This can cause problems.
• Again, you need to synchronize reset and ensure that FPGAs coming alive at different times are handled correctly.
• Depending on circumstances, there may be seconds between the beginning and end of FPGA configuration.
[Figure: Serial PROM driving dout and cclk to FPGA0, FPGA1 and FPGA2]

Emulation != Verification
Verification:
● Best way to verify functionality (does the DUT perform as expected?)
● Excellent at targeted corner cases (what happens when ...).
● Much easier to debug (trivial to look at new signals).
● Much easier to replicate (0 randomness).
● Requires support (verification team / good designers)

Emulation != Verification
Emulation:
● Best at finding out if long term issues exist (flow control, FIFO depths).
● Only practical way to determine if the product will run for hours at a time.
● Good at discovering (very late in the game) architectural holes.
● Usefulness is still directly proportional to how similar the FPGA is to the ASIC.
● You have all the difficulties of FPGA implementation to deal with, along with design debug.

Simulation != Reality
● Things are not always what they seem
  – RAM in simulation can have different timing compared to the FPGA
  – Just because it works in sim doesn't mean it will work in reality
● Flow control is the biggest differential
● Data stream can be different
● Best verification practice:
  – Random data
  – Random time
  – Random flow control
● Buy / build a small library of tools
  – Random ranges
  – Random edges (64 vs 1512)
  – Self checking tests
● Prove you can come out of reset cleanly!!!
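
A small Python sketch of what "random data, random time, random flow control, self checking" can look like. Everything here is illustrative: the `Fifo` class is only a stand-in for the real DUT, the probabilities are arbitrary, and 64 / 1512 are used as the edge lengths simply because the slide names them.

```python
import random
from collections import deque


class Fifo:
    """Stand-in for the DUT: a bounded FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def full(self):
        return len(self.items) >= self.depth

    def empty(self):
        return not self.items

    def push(self, value):
        assert not self.full(), "push into full FIFO"
        self.items.append(value)

    def pop(self):
        return self.items.popleft()


def random_length(rng, lo=64, hi=1512):
    """Random range, biased so the two edge lengths are hit often."""
    roll = rng.random()
    if roll < 0.25:
        return lo
    if roll < 0.50:
        return hi
    return rng.randint(lo, hi)


def run_test(seed, packets=20, depth=16):
    """Return (bytes transferred, cycles used); raises on any mismatch."""
    rng = random.Random(seed)  # a fixed seed keeps a failing run reproducible
    dut = Fifo(depth)
    scoreboard = deque()
    stimulus = []
    for _ in range(packets):
        stimulus.extend(rng.randrange(256) for _ in range(random_length(rng)))
    sent = received = cycles = 0
    while received < len(stimulus):
        cycles += 1
        # Random time: the source idles on some cycles.
        if sent < len(stimulus) and not dut.full() and rng.random() < 0.7:
            dut.push(stimulus[sent])
            scoreboard.append(stimulus[sent])
            sent += 1
        # Random flow control: the sink stalls on some cycles.
        if not dut.empty() and rng.random() < 0.6:
            got, want = dut.pop(), scoreboard.popleft()
            if got != want:
                raise AssertionError(f"cycle {cycles}: got {got}, want {want}")
            received += 1
    return len(stimulus), cycles
```

Seeding the generator is the design choice that matters: the test is random, but any failure can be replayed exactly, both in simulation and on the bench.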

FPGA Emulation Summary
● Great addition to Verification
● Best way to detect issues that take a long time to trigger
● Excellent platform for FW/SW development
● Usefulness is proportional to similarity to the final ASIC


FPGA as Product
● FPGA cost reductions are making them more and more attractive as the target for a final product, i.e. “taping out” with an FPGA.
● FPGA prices are getting competitive with ASIC.
● FPGA allows for “tricks” that are horribly illegal in ASIC, but acceptable in FPGA. These can decrease design time and increase functionality.
● Plus adding interesting features

FPGA as Product Features
• Allows for in-field upgrades
– Allows for bug fixes after the fact (à la Microsoft)
– Adding new features is possible
– Depending on the design, you might be able to have a feature rich design on a per user basis
• Design Issue
– In-field upgrade is tricky. If a bad image gets loaded, the product is dead, probably permanently.
– Need 2x memory, an external processor and a way to check that the FPGA image is good (is CRC enough? Is a functional check needed? etc.)
– Next generation of FPGA will have both internal FLASH AND the ability to boot from multiple images, i.e. if the first fails.
[Figure: Ext. Proc. loads, checks and reprograms FPGA0 from flash holding Proc image, Flash Image 1 and Flash Image 2]
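
A minimal Python sketch of the "two images plus a check" idea as the external processor might run it. The image layout (payload followed by a 4-byte CRC-32) and the function names are assumptions made for this example; as the slide asks, a CRC alone may not be enough, and a functional check is not modeled.

```python
import zlib


def make_image(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 to the payload (assumed image layout)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def image_ok(image: bytes) -> bool:
    """True when the stored CRC matches the payload."""
    if len(image) < 5:
        return False
    payload, stored = image[:-4], int.from_bytes(image[-4:], "big")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


def select_image(images):
    """Index of the first image that passes the check, or None if all fail."""
    for index, image in enumerate(images):
        if image_ok(image):
            return index
    return None


# Image 1 was corrupted during a field upgrade; image 2 is the known-good copy.
golden = make_image(b"known-good bitstream")
damaged = bytearray(make_image(b"new bitstream"))
damaged[0] ^= 0xFF
chosen = select_image([bytes(damaged), golden])
```

Keeping the known-good image untouched while the new one is written is what costs the 2x memory, and it is also what keeps a failed upgrade from killing the product.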

FPGA as Product Features
● FPGAs have UIDs
  – Critical to security / encryption
  – Next generation will have 64-bit unique IDs.
● Plus mainstay FPGA IP
  – PLLs
  – Block RAMs
  – Different IO configurations
  – Without these, the backend is much more complicated and requires a senior experience level to ensure there are no problems.

FPGA as Product Available IP
FPGAs have prebuilt IP that can speed up development
● Built in Ethernet MAC, PCI IP
● Available IP includes
  – FEC: ReedSol, TurboCodes, etc.
  – Memory interfaces (DDR, SDRAM)
  – ADPCM codes
  – As well as generic blocks (RAMs, FIFOs, etc.)
● They can be used to decrease design time or to simply experiment with.
● Remember: reinventing the wheel will not add value to your company or product.
● Typically, FPGA IP cannot be used in ASIC.


FPGA Implementation
● One choice is to design for ASIC and implement in FPGA, letting the synthesis tool organize everything for you. More reproducible down the line, etc.
● Another choice is to use unorthodox FPGA-only implementations to increase the amount of functionality you can squeeze into the FPGA. However, these implementation tricks will never work in ASIC.

FPGA for ASIC? Pros vs. Cons
Pros
● FPGA development is quicker.
● FPGA chips are cheaper (for low quantity).
● Since designing for DFT/scan is not an issue, you don’t have to obey good design practices.
● FPGA synthesis tools are cheap (free) and don’t require additional backend tools.
Cons
● Code tends to be parasitic and hard to change later, making it very difficult to ever go to ASIC.

Unorthodox FPGA Implementations
● MUXes
  – FPGAs support large muxes
  – Large muxes can use tristate buffers for savings.
  – Very useful for HWDBG.
● Chip Wide Reset for “Free”
  – All FPGA FFs start out with a known value defined in the bitstream.
  – Can save reset routing resources by hooking into GSR (I've never gotten it to work)
● Using many small, tiny RAMs (distributed RAMs in LUTs) instead of FF arrays
  – This gives you the equivalent of 16 FFs / LUT, much more than the ~1 real FF / LUT you normally get.
  – But you must code a generic “regfile”

More Unorthodox FPGA Implementations
● Throw memory at it
  – FPGAs use block RAM in X quanta (4kb, 16kb, etc.) and designers tend to allocate in those quanta.
  – Becomes a full rearchitecture if you go to ASIC (or even change the design).
● AsyncFIFO
  – Just use dual port BRAM.
  – Saves a lot of time
  – Easy fix for cross clock timing issues.
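
The "allocate in quanta" habit from the "Throw memory at it" bullet is easy to see with a little arithmetic. A simplified Python sketch: it only counts bits and ignores the width/depth aspect ratios that real block RAMs impose, and the function name and the 16kb default are illustrative.

```python
def brams_needed(depth_words, width_bits, bram_bits=16 * 1024):
    """Return (block RAM count, unused bits) for a memory of the given size."""
    bits = depth_words * width_bits
    count = -(-bits // bram_bits)  # ceiling division
    wasted = count * bram_bits - bits
    return count, wasted


# A 1024 x 16 buffer fills one 16kb block exactly.
exact = brams_needed(1024, 16)

# One extra word costs a whole second block, almost all of it unused.
one_more = brams_needed(1025, 16)
```

On the FPGA the unused bits are free, so buffers tend to get rounded up to the next block; in an ASIC every one of those bits costs area, which is why the sizing has to be revisited.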

FPGA Debug
● Emulation requires significant debug
● It is hard to look at an FPGA from outside and debug it (ergo simulation)
  – Having the ability to run the same test in sim as in reality is great.
  – Self testing RTL / known test patterns can be life savers.
● Need some kind of visibility into the FPGA (to look at internal nodes)
  – HWDBG vs Chipscope
● HWDBG bus is big, but works in ASIC
● Chipscope is FPGA only; theoretically you get it for free (cost of gates).
● Cannot have enough debug regs/LEDs
  – Can be very hard to look at registers in real time
  – RS232 & TTERM work well (cheap)

Better Design Practices
• One FPGA Image to Rule them All
– Add a rev mux to support one image
– Sits between core and “pads”, easy to remove / hardcode.
– Allows backwards compatibility without extra effort.
• Autogenerate REVID
– I like “mmddhhmm” in ~BCD.
[Figure: a Rev Detect block reads the board revid at reset and selects between the FPGA Rev 1.0 Func. (led1_out) and the FPGA Rev 2.0 Func. (dbg_out) to drive pin10_out]

Here we changed boards and had to sacrifice an LED for an extra dbg pin. With a simple block to detect board revision, we can have one FPGA image support multiple boards.
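
One way the “mmddhhmm” REVID could be autogenerated by a build script, sketched in Python. The function name is made up, and packing one decimal digit per nibble is this example's reading of "~BCD".

```python
from datetime import datetime


def revid_bcd(timestamp: datetime) -> int:
    """Pack month, day, hour and minute as 8 BCD digits in a 32-bit word."""
    digits = timestamp.strftime("%m%d%H%M")  # e.g. "03071405"
    # Reading the decimal digits as hex puts one digit in each nibble.
    return int(digits, 16)


# A build on March 7 at 14:05 reads back as 0x03071405 on the debug port.
build_time = datetime(2024, 3, 7, 14, 5)
revid = revid_bcd(build_time)
print(f"REVID = 32'h{revid:08X}")
```

The appeal of BCD here is that the register value is readable as a date directly in a hex dump, with no decoding step.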

FPGA Board Organization
● As part of Implementation, let's talk about some board issues / suggestions.

FPGA Board Organization 1
• HWDBG
– A 32-bit or 64-bit bus that runs through all FPGAs.
– Eventually connects to a Mictor or other high density port
– Good for hooking up to a logic analyzer and seeing state machine transactions.
– Can be used in FPGA and ASIC
[Figure: FPGA Board where the HWDBG bus runs through block_a, block_b and block_c in FPGA0, FPGA1 and another FPGA, ending at a MICTOR HEADER]

FPGA Board Organization 2
● Chipscope
  – Similar to hand generated HWDBG, but is a Xilinx tool.
  – Uses gates and RAM to store vectors internally so they can be piped out at a later time.
  – Seems to use a relatively large number of gates to implement.
  – Never used it, but I’m not a big fan.

FPGA Board Organization 3
• RS232 is simple, quick, fast
– Connect to newer versions of Tera Term (free, with reasonably good scripting)
– Build up a simple Read/Write bus inside the chip.
  • Simple Bus = Easy Debug = Easy adapting to other interfaces.
– Definitely the lower end of the spectrum, but the ability to plug into any PC is pretty nice.
[Figure: FPGA1 connected through a simple, 2-wire interface to an RS232 chip (max232, ~$0.90) and a DB9 connector (~$0.49)]
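
To make the "simple Read/Write bus" concrete, here is a Python model of a text command handler of the kind that could sit behind the UART. The command syntax ("R addr" / "W addr data", all hex) and the reply strings are invented for this sketch; the slides do not specify a protocol.

```python
def handle_command(line, regs):
    """Execute one terminal command against a register map (a dict)."""
    parts = line.split()
    try:
        if len(parts) == 2 and parts[0].upper() == "R":
            addr = int(parts[1], 16)
            # Unwritten registers read back as zero in this model.
            return f"{regs.get(addr, 0):08X}"
        if len(parts) == 3 and parts[0].upper() == "W":
            addr = int(parts[1], 16)
            regs[addr] = int(parts[2], 16) & 0xFFFFFFFF
            return "OK"
    except ValueError:
        pass  # malformed hex falls through to the error reply
    return "ERR"


# A short session, as it might be typed or scripted from a terminal program.
registers = {}
session = ["W 10 DEADBEEF", "R 10", "R 11", "hello"]
replies = [handle_command(cmd, registers) for cmd in session]
```

Keeping the protocol this plain is the point of the slide: a human can type it, a script can drive it, and the same read/write bus can later be fed from another interface.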

FPGA Board Organization 4
• Board Revision
– Very useful for visualizing the board revision and implementing the “one image to rule them all” methodology based off of the board rev.
– Can be very cheap / simple.
– If implemented as shown, you can indicate a board rev just by modifying the board (removing a resistor).
– Using Gray code encoding can minimize resistor changes.
[Figure: FPGA pins rev[0], rev[1], rev[2] and rev[3] on the FPGA Board, each tied to vcc through a 0 Ohm resistor (short)]

By populating or not populating the zero Ohm resistors, you can change the board rev id.
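
Why a Gray code helps with the rev resistors: consecutive revisions differ in exactly one bit, so each new board rev means touching one resistor. A short Python check, with helper names chosen for this example:

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse of to_gray, as the FPGA's rev detect block would decode it."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def resistor_changes(rev_a, rev_b, encode):
    """Number of rev[] pins whose resistor must change between two revisions."""
    return bin(encode(rev_a) ^ encode(rev_b)).count("1")


for rev in range(4):
    print(f"rev {rev}: rev[3:0] = {to_gray(rev):04b}")

# Going from rev 7 to rev 8 in plain binary flips all four pins.
plain_binary_worst = resistor_changes(7, 8, lambda n: n)
gray_worst = max(resistor_changes(r, r + 1, to_gray) for r in range(15))
```

With four rev pins, plain binary can need up to four resistor changes for one revision step, while the Gray-coded scheme never needs more than one.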

FPGA Board Organization 5
• Debug Area
– Collect all debug into one section, separated by a perforation.
– If a demo / non-debug situation arises, break (or saw) off the perforation, thus breaking off the debug area.
– Great for convincing bad managers to let designers add debug.
[Figure: FPGA Board with FPGA0, FPGA1 and Proc, plus a debug area holding an LED Bank, MICTOR HEADER, RS232 Chip, DIP Switches and DB9 connector]

FPGA Board Organization 6
• Dog Bones
– Are an older idea, but still relevant.
– Allow for both probing of mid-FPGA signals and re-routing of FPGA signals (with a board mod).
– A dog bone consists of a pair of vias and a surface trace. If needed, traces can be cut and wires dropped into the vias to reroute the signal.
– Good for a debug board, where size is not an issue.
[Figure: a signal between FPGA0 and FPGA1 passing through a Via, a Surface Trace and a second via]

FPGA Board Organization 7
● Off Board Support
  – Adding an additional Mictor header or other generic header is a great way to provide an extra interface to more FPGAs, etc.
  – Basically, there is no reason NOT to route unused FPGA pins to some header. Especially on a debug board.


FPGA vs ASIC Cost Benefit Analysis
● Let's take a semi-nonsensical product
  – Spread Spectrum cordless phone system with special encryption, custom built for a lawyer's office.
  – Need ~5K units, maybe more
  – Use home built Encryption Algorithm (our Value Add)
  – Use Reed Solomon Forward Error Correction (RS FEC)
● We work for a larger company, which already has a SS Modulator and Demodulator in house, so no need to redesign.
● We’ll probably need to buy FEC.
● No need for an onboard processor

Estimated Gate Count
Est. Gate Counts
● 30K for Mod, 70K for Demod
● Encryption/Decrypt is ~200K
● We don’t know anything about FEC
  – Turns out Xilinx IP exists, and we can put it in our FPGA.
  – Xilinx FEC RS Enc/Dec = 122/826 Slices ~= 4K + 7K gates ~= 11K gates
  – Purchase IP from Xilinx or ASIC vendor (Xilinx is about $5K, ASIC vendor is similar).
● Total estimated gate count is ~350K gates (including ~20% overhead for routing/gates/etc.)

Manpower Count
● Human resources:
  – 2 Encryption Algorithm Engineers
  – 3 Design / Implementation
  – 1 FPGA / Board Designer
  – 2 Verification
● Total of 8 man team
● Tools are expensive:
  – Verification (3 seats, couple of years ~= $90K)
  – About 10 high-end PCs ~= $15K
  – Lab equipment is to be provided by the company.
● Total cost ~= $1 million / year

FPGAvsASIC ASIC Costs1
● Typical “cheap” process currently is 0.13um.
  – Costs about $520K to go from RTL to mask set
  – Takes about 4-5 months without problems
  – Chip comes back and is ~$2 a part (2mm x 2mm)
● Timing closure can be tough
● Production is being done in parallel with verification and sometimes design!

FPGAvsASIC ASIC Costs2
● Typically, an ASIC will take more time
● Typically one full engineer's time is spent interfacing with the vendor / verifying the backend:
  – Verifying / tracking new builds
  – Identifying timing paths (are they real?)
  – Getting low level IP (pads, PLL, etc.)
● Tool issues (my synth report != your synth report)
● Assume 1 person for 1 year; this is $100K.
● Tools are expensive; add another $50K even if using ASIC flow.
● Also, since tape out is a one shot event, the verification must be solid. It makes financial sense to add another verification engineer to ensure it works (if you don't get it right, the chip is dead).
  – For verification, assume at least 1 person for 1 year: ~$100K.

FPGAvsASIC FPGA Costs
● We can easily fit in a Spartan3e 1200
  – Spartan 3e 1200 = ~30K LUTs + 30K FFs
  – Equivalent gate count = ~120K + 210K gates
  – Assume 100K+ quantities, get ~$16.
● SPI Flash is cheap (~$1).
● So, the per-“chip” cost is ~$17.

Final Costs
● Extra Verification: 100K
● Backend Guy: 100K
● Backend Tool: 50K
● Chip NRE: 520K
● Total Extra for ASIC: 770K
● Get ASIC @ $2
● Get FPGA @ $17
● At ~50K units, they break even
● So how many are we making?
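
The break-even point follows from the line items on this slide: the extra ASIC cost is paid back at $15 per unit (the difference between the $17 FPGA and the $2 ASIC). A worked version in Python, using the four listed line items (which sum to 770K); the variable names are illustrative.

```python
# Extra up-front cost of the ASIC route, in dollars, from the slide's line items.
extra_asic_cost = {
    "extra verification": 100_000,
    "backend engineer": 100_000,
    "backend tool": 50_000,
    "chip NRE": 520_000,
}
ASIC_UNIT_COST = 2
FPGA_UNIT_COST = 17


def total_cost(units, unit_cost, up_front=0):
    """Up-front cost plus per-unit cost for a production run."""
    return up_front + units * unit_cost


def break_even_units(up_front, fpga_unit, asic_unit):
    """Volume at which the ASIC's up-front cost is paid back by cheaper parts."""
    return up_front / (fpga_unit - asic_unit)


up_front = sum(extra_asic_cost.values())
units = break_even_units(up_front, FPGA_UNIT_COST, ASIC_UNIT_COST)
print(f"extra ASIC cost ${up_front:,}, break-even at about {units:,.0f} units")

# At the ~5K units the example product needs, the FPGA is far cheaper.
fpga_5k = total_cost(5_000, FPGA_UNIT_COST)
asic_5k = total_cost(5_000, ASIC_UNIT_COST, up_front)
```

At the ~5K units of the example product the FPGA route costs $85K in parts against $780K for the ASIC, so the answer to "how many are we making?" decides the question.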

ASIC vs FPGA Pro/Con
ASIC
● Risk that the chip is a brick
  – Best case, can make a metal fix, ~80K.
● Never really sure about timing.
● Long term is always the cheapest solution.
● Can add exotic features (fused ids, power, analog integration)

FPGA
● Know exactly what you have
● Can upgrade in field
● Always more expensive in the long run.
● Stuck with what is in the FPGA (no integration of PHY)

ASIC vs FPGA Pro/Con2
ASIC
● Can take a lot longer to get an ASIC done
● Initial NRE is large, and requires a lot of cash on hand.
● Requires toolset (at least synth)
● Easily 10x faster than FPGA.

FPGA
● FW/RTL dev can go on in parallel
● Cheap NRE
● FPGA tools can be free, but you get what you pay for.
● May not hit speed in FPGA
● Can get stuck with a design for FPGA but want to ramp up to ASIC.

Summation
● FPGA Emulation can be a very powerful, quick method of decreasing verification / design time.
● FPGA Emulation is no substitute for ASIC verification.
● FPGAs can be used in a final product, especially if you are cash poor OR have limited product numbers (i.e. startup)

Summation
● FPGA targeted coding can increase utilization of the FPGA, but may limit the design in future.
● FPGA Emulation should be architected correctly. Pay attention to critical details:
  – Organization of design
  – Size of designs and number of FPGAs
  – Clock speed design
● If done right, the Emulation Design can be directly reused for the ASIC Design.

Couple War Stories
● Opencores
  – Worth every penny I spent on it
  – Simple SPI, I2C to complex MAC
● Easics
  – Quickest, simplest CRC gen ever.
● Partitioning / Incremental Synth
  – Great idea
  – Perpetual “will work in next rev”
● FPGA build process
  – Should be 100% automated
  – Is never 100% automated
● Assertions
  – Not worth the time

Couple War Stories
● Know the design
  – Power, Size, Cost, Functionality
  – Simple usually works. Complicated is irritating / pointless.
  – Assume a year long design
● Would you rather be
  – 50% smaller
  – Done 3 months earlier
Remember, functionality is paramount
“The performance of a non functional system is zero.” - Eric Rose


				