Introduction to Reconfigurable Computing

Introduction to Reconfigurable Computing
- Configurable Computing (CC) attempts to increase performance and silicon utilization efficiency through logic recycling, using FPGAs and FPGA-like devices
- Hardware algorithms can be "paged" into and out of CC modules, much as operating systems perform software paging
- Factors impacting performance:
  - Logic speed
  - Speed of reconfiguration
  - Flexibility of configuration

Resource Utilization
- Standard microprocessor
  - Specialized unit for each essential task
  - Unit functionality fixed
  - Idle units lower silicon utilization
  - Basic algorithms fixed
- Reconfigurable processor
  - Each unit specialized to fit its task
  - Unit functionality alterable at run time
  - Idle units reconfigured for new tasks
  - Basic algorithms can be tailored to the application
[Figure: processor block diagram with microcode, address generation, clock generation, ALU, cache and I/O, registers, and FPU units.]

FPGAs vs. DSPs
- FPGAs can support multiple memory ports
- FPGAs outperform DSPs with:
  - Parallelism in the algorithm
  - Simple operations in a fixed sequence
  - Greater computational density using less power
  - Large data sets at low resolution (8-12 bits)
  - Simple control
- DSPs outperform FPGAs with:
  - MAC (multiply-accumulate) operations
  - Complex arithmetic

Colt Integrated Circuit
- Colt prototype
  - HP 0.5 µm, 3 metal layers, PGA-132 package (MOSIS)
  - 16 FUs, crossbar (XBar), data ports (DPs)
  - 5.5 mm x 6.1 mm
  - 50 MHz
- Full-scale device: Stallion

2nd-Generation Processor: The Stallion
- Successor to the Colt chip
- Six data ports providing basic pipelined dataflow control
- "Smart" crossbar that passes programming and data words to and from the data ports and meshes
- Two IFU meshes and four multipliers
- Ready for fabrication

The Stallion Organization
[Figure: Stallion organization: programmable data ports handling stream I/O, a "smart" crossbar network, computational IFU meshes, and allocable integer multipliers, grouped as allocable resources.]

Example Sub-Mesh Mapping
[Figure: factorial computation mapped onto IFU cells (Pass, Delay, Dec, Select, and a multiplier with high/low outputs) governed by valid-flag conditions; input on Port 1, overflow on Port 3, result on Port 4.]
- 4x4 sub-matrix of IFUs
- Factorial computation
- Demonstrates conditional execution capabilities
- Configured in < 30 µs

System Board Layout
- Features:
  - Each slot contains a single port
  - Clusters connected using a module that bridges adjacent slots
  - Bridging extendible to other system boards
  - System is inherently scalable
[Figure: system board layout showing slots linked by crossbars.]

Core Computing Component
- Xilinx FPGA (currently used in the test-bed)
- Problem: pipeline processing is fast but not readily modified with current ASIC design practice
- Solution: Colt chip (fabricated and tested)
  - 0.8 µm HP CMOS process, fabricated by MOSIS
  - Run-time configurable
  - 50 MHz clock
- Stallion chip (designed but not yet fabricated)
  - 0.5 µm HP CMOS process
  - 64 functional units in mesh
  - Dedicated multiplier
  - Six data ports
  - 100 MHz clock
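The "paging" analogy from the introduction can be made concrete with a small model. The sketch below is hypothetical: the class `ConfigurationPager`, the LRU replacement policy, the task names, and the per-load reconfiguration cost (set to the < 30 µs sub-mesh configuration bound purely for illustration) are assumptions, not details of the Colt or Stallion design. Each reconfigurable module slot plays the role of a page frame, and the model counts how many configuration loads a task sequence triggers, which is where speed of reconfiguration enters overall performance.

```python
from collections import OrderedDict


class ConfigurationPager:
    """Hypothetical model: page hardware configurations into a fixed number
    of reconfigurable module slots, evicting the least recently used one."""

    def __init__(self, num_slots: int, reconfig_cost_us: float) -> None:
        if num_slots < 1:
            raise ValueError("need at least one slot")
        self.num_slots = num_slots
        self.reconfig_cost_us = reconfig_cost_us  # assumed cost of one load
        self.slots = OrderedDict()  # resident configs, least recently used first
        self.loads = 0              # number of reconfigurations performed

    def request(self, config: str) -> bool:
        """Make `config` resident; return True on a hit, False if it was loaded."""
        if config in self.slots:
            self.slots.move_to_end(config)  # mark as most recently used
            return True
        if len(self.slots) >= self.num_slots:
            self.slots.popitem(last=False)  # evict least recently used
        self.slots[config] = None
        self.loads += 1
        return False

    def reconfig_time_us(self) -> float:
        """Total time spent reconfiguring, under the assumed per-load cost."""
        return self.loads * self.reconfig_cost_us


if __name__ == "__main__":
    pager = ConfigurationPager(num_slots=2, reconfig_cost_us=30.0)
    for task in ["fir", "fft", "fir", "factorial", "fft"]:
        hit = pager.request(task)
        print(f"{task:10s} {'hit' if hit else 'load'}")
    print("loads:", pager.loads, "time (us):", pager.reconfig_time_us())
```

With two slots, the sequence fir, fft, fir, factorial, fft hits only on the second fir, so four loads (120 µs under the assumed cost) are charged. More slots or a different replacement policy would change that count, which is the trade-off the slides summarize as speed and flexibility of reconfiguration.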

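The factorial example on the sub-mesh mapping slide combines a decrementing counter, a multiplier with high and low outputs, and conditional (valid/select) cells. The sequential Python below only mirrors that behavior at a high level; the 16-bit word width, the rule that a nonzero high multiplier word signals overflow, and the name `factorial_dataflow` are assumptions for illustration, not the actual IFU configuration.

```python
WORD_BITS = 16                   # assumed datapath width (not stated on the slides)
WORD_MASK = (1 << WORD_BITS) - 1


def factorial_dataflow(n: int) -> tuple[int, bool]:
    """Return (low word of n!, overflow flag), computed iteratively like a
    counter/multiplier loop: multiply, keep the low word, and flag overflow
    whenever the multiplier's high word is nonzero."""
    if n < 0:
        raise ValueError("n must be non-negative")
    acc = 1            # running product, fed back each iteration
    y = n              # counter value arriving from the input port
    overflow = False
    while y > 0:       # conditional step: iterate only while the counter is positive
        product = acc * y
        high, low = product >> WORD_BITS, product & WORD_MASK
        if high != 0:  # result no longer fits in one word
            overflow = True
        acc = low
        y -= 1         # decrement the counter
    return acc, overflow


if __name__ == "__main__":
    for n in (0, 5, 8, 9):
        print(n, factorial_dataflow(n))
```

Under these assumptions, `factorial_dataflow(8)` returns `(40320, False)`, while `factorial_dataflow(9)` returns `(35200, True)`: 9! = 362880 does not fit in 16 bits, so only its low word survives and the overflow flag is raised, analogous to routing an overflow indication to a separate port.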