FPGA Implemented Transforms
Final Report
May05-31

Advisor and Client: Arun K. Somani
Team Members:
Christopher Miller (CprE & EE)
Sean Casey (CprE & EE)
Ibrahim Ali (EE)
Chii Aik Fang (EE)

REPORT DISCLAIMER NOTICE
DISCLAIMER: This document was developed as a part of the requirements of an electrical and computer engineering course at Iowa State University, Ames, Iowa. This document does not constitute a professional engineering design or a professional land-surveying document. Although the information is intended to be accurate, the associated students, faculty, and Iowa State University make no claims, promises, or guarantees about the accuracy, completeness, quality, or adequacy of the information. The user of this document shall ensure that any such use does not violate any laws with regard to professional licensing and certification requirements. This use includes any work resulting from this student-prepared document that is required to be under the responsible charge of a licensed engineer or surveyor. This document is copyrighted by the students who produced this document and the associated faculty advisors. No part may be reproduced without the written permission of the senior design course coordinator.

April 27, 2005

Table of Contents
1. List of Figures
2. List of Tables
3. List of Symbols and Definitions
4. Introductory Material
  4.1 Project Description
  4.2 Executive Summary
    4.2.1 Need for the Project
    4.2.2 Actual Project Activities
    4.2.3 Final Results
    4.2.4 Recommendations for Follow-On Work
  4.3 Acknowledgements
  4.4 Problem Statement
    4.4.1 General Problem Statement
    4.4.2 General Solution Approach
  4.5 Operating Environment
  4.6 Intended Users
  4.7 Intended Uses
  4.8 Assumptions
  4.9 Limitations
  4.10 Expected End product and Other Deliverables
5. Project Approach and Results
  5.1 Functional Requirements
  5.2 Design Requirements
    5.2.1 Design Constraints
  5.3 Approach Used
    5.3.1 Technical Approach Considerations and Results
      5.3.1.1 Programming Language
      5.3.1.2 Software/hardware used for design
      5.3.1.3 Transform chosen to implement
      5.3.1.4 Design of FPGA
  5.4 Detailed Design
    5.4.1 Discrete Fourier transform Algorithm
      5.4.1.1 Complexity of the Direct DFT Computation
    5.4.2 Fast Fourier Transform Algorithm
      5.4.2.1 Complexity of FFT Algorithm
      5.4.2.2 Frequency-decimated FFT Algorithm
      5.4.2.3 Inverse FFT Algorithm (IFFT)
    5.4.3 Detailed Design of the FFT Algorithm on an FPGA Chip
      5.4.3.1 Overall Description of System
      5.4.3.2 Overall Design
      5.4.3.3 Transform Control Design
      5.4.3.4 Transform Control Pipeline Design Description
      5.4.3.5 Memory Block
      5.4.3.6 Address Generation Block
      5.4.3.7 PC
      5.4.3.8 Address Generator
      5.4.3.9 Multiplier Block
      5.4.3.10 Adder/Subtractor Block
  5.5 Implementation Process Description
  5.6 Testing of the End Product and its Results
  5.7 End Results of the Project
    5.7.1 Research of Radon Transform
    5.7.2 Final Status of Major Components
6. Estimated Resources and Schedules
  6.1 Estimated Resources
    6.1.1 Personnel Effort Requirements
    6.1.2 Other Resource Requirements
    6.1.3 Financial Requirements
  6.2 Schedules
7. Closing Materials
  7.1 Project Evaluation
  7.2 Commercialization
  7.3 Recommendations for Additional Work
  7.4 Lessons Learned
    7.4.1 What Went Well
    7.4.2 What Did Not Go Well
    7.4.3 Technical Knowledge Gained
    7.4.4 Non-technical Knowledge Gained
    7.4.5 What Would Be Done Differently If Do Again
  7.5 Risk and Risk Management
    7.5.1 Anticipated Potential Risks and Planned Management Thereof
    7.5.2 Anticipated Risks Encountered and Management Thereof
    7.5.3 Unanticipated Risks Encountered and Management Thereof
    7.5.4 Resultant Changes in Risk Management Made
  7.6 Project Team Information
    7.6.1 Faculty Advisor and Client
    7.6.2 Student Team Members
  7.7 Closing Summary
  7.8 References
  7.9 Appendices
A. Testing Forms
  A.1 Unit Testing Form
  A.2 Integration Testing Form
  A.3 System Testing Form
  A.4 Acceptance Testing Form

1. List of Figures
Figure 1: FPGA
Figure 2: Functional Block Diagram
Figure 3: The butterfly of a 2-point frequency-decimated FFT
Figure 4: The butterfly of a 4-point frequency-decimated FFT
Figure 5: Three stages in the computation of a 8-point frequency-decimated FFT
Figure 6: FFT algorithm for computing 8-point input signal
Figure 7: IFFT algorithm for computing 8-point input signal
Figure 8: High-level block diagram of the FFT Implementation
Figure 9: High Level Xilinx Chip Layout
Figure 10: Transform Wrapper Schematic Symbol
Figure 11: High Level Design of Transform Control
Figure 12: Memory Block Control
Figure 13: PC Schematic Symbol
Figure 14: Address Generator Schematic Symbol
Figure 15: n-bit Complex Multiplier
Figure 16: 16-bit Complex Multiplier
Figure 17: n-bit Complex Carry-Lookahead Adder
Figure 18: 16-bit Complex Carry-Lookahead Adder
Figure 19: Adder/Subtractor Block
Figure 20: Testing Plan
Figure 21: Representation of an image in x-y plane
Figure 22: Representation of Strips for Summation along a Single Direction, θ
Figure 23: Overlap between Strips at Neighboring Angles is Depicted
Figure 24: Illustration of the Segments Computed in the First Three Passes
Figure 25: Mapping DRT Algorithm into a Butterfly for N =16 Image
Figure 26: Project Schedules Part 1
Figure 27: Project Schedules Part 2
Figure 28: Project Schedules Part 3
Figure 29: Project Schedules Part 4
Figure 30: Circuit Board

2. List of Tables
Table 1: Pros and Cons of Verilog
Table 2: Pros and Cons of VHDL
Table 3: Pros and Cons of Xilinx™
Table 4: Pros and Cons of Altera™
Table 5: Pros and Cons of Fast Fourier Transform
Table 6: Pros and Cons of Radon Transform
Table 7: Pros and Cons of Pipeline Design
Table 8: Pros and Cons of Combinational Design
Table 9: Timing of Transform Control Components
Table 10: Original Space-Time Diagram for Pipeline
Table 11: Revised Space-Time Diagram
Table 12: Level and Memory Block Operation
Table 13: Address Generation for FFT/IFFT
Table 14: Twiddle Factor Addresses
Table 15: Signals in Adder/Subtractor Block
Table 16: End Result of Project Components
Table 17: Original Estimate of Personnel Effort Requirements
Table 18: Revised Estimate of Personnel Effort Requirements
Table 19: Actual Personnel Effort Requirements
Table 20: Original Estimate of Other Resource Requirements
Table 21: Revised Estimate of Other Resource Requirements
Table 22: Actual Other Resource Requirements
Table 23: Original Estimate of Financial Requirements
Table 24: Revised Estimate of Financial Requirements
Table 25: Actual Financial Requirements
Table 26: Original Estimate of Deliverables Schedule
Table 27: Original Estimate of Deliverables Schedule
Table 28: Actual Deliverables Schedule
Table 29: Milestone Evaluation
Table 30: Project Milestones and their Importance
Table 31: Project Evaluation

3.
List of Symbols and Definitions
Altera™ - Manufacturer of software and hardware for FPGA design
ASIC - Application-specific integrated circuit
Balanced computing - Use of dynamic resource of on-chip cache memory to offset gate usage
DFT - Discrete Fourier transform
DTFT - Discrete-time Fourier transform
FIFO - First in first out
FPGA - Field-programmable gate array
FFT - Fast Fourier transform
IFFT - Inverse fast Fourier transform
FT - Fourier transform
HDL - Hardware description language
IDFT - Inverse discrete Fourier transform
IRT - Inverse Radon transform
LUT - Look-up table
PC - Program counter
ModelSim - Software used to simulate VHDL code
OPB - On-chip peripheral bus
RT - Radon transform
Transform engine - A device that performs transform computations
VHDL - VHSIC hardware description language; a programming language used to design hardware
VHSIC - Very high speed integrated circuit
Xilinx™ - Manufacturer of software and hardware for FPGA design
Xilinx MultiLinx - Xilinx communication device used to download an FPGA design onto a Xilinx board

4. Introductory Material
This section provides an overview of the project by defining the problem, operating environment, intended users and uses, assumptions, limitations, deliverables, and expected end product.

4.1 Project Description
Software calculations of the FFT and IFFT are very time consuming due to their use of complex trigonometric functions, and do not work well in real-time systems. In this project, the team designed an FPGA to calculate the FFT and IFFT. This hardware implementation provides a faster method of calculating these transforms than software is capable of.

4.2 Executive Summary
The following document is the final report for the May 05-31 senior design project “FPGA Implemented Transforms”. It details and summarizes the design of an FFT and IFFT calculation engine.
4.2.1 Need for the Project
Mathematical calculations such as the Fourier transform (FT) play an important role in many digital signal processing applications, including telecommunications and image pattern extraction. Applications based on the FT require high computational power, which gives rise to the need to experiment with efficient algorithms. Reconfigurable hardware devices in the form of field-programmable gate arrays (FPGAs) have been proposed as a way of obtaining high performance, more efficient implementation, and maximum speed. The goal of this project was to implement the design of a hardware device that would calculate the FFT and IFFT in real time. Implementation was done using a Xilinx™ FPGA (similar to Figure 1). This design can be used as a component in more complex projects, such as a piano keyboard system that transcribes the notes played in real time, or any other system in which real-time processing of the discrete Fourier transform is desired.

Figure 1: FPGA

4.2.2 Actual Project Activities
The main activity of this project was designing an FPGA that could calculate the FFT and IFFT in hardware. This activity included studying the FFT and IFFT algorithms and looking for ways to improve their speed and efficiency. The project also included studying the design and layout of the hardware in order to maximize speed while minimizing chip size.

4.2.3 Final Results
The final result of this project is an FPGA design that can calculate a 1024-point FFT and IFFT. The design has been optimized for speed, size, and efficiency. The design also includes several smaller sub-blocks that can be used as components in larger systems.

4.2.4 Recommendations for Follow-On Work
The FPGA design the team created can be studied further to optimize its speed and size in calculating the FFT and IFFT.
Also, other transforms, including the RT and IRT, can be studied for similarities to the FFT and integrated into the design to form a chip that can calculate multiple transforms. The FPGA can be integrated into systems that currently use software to calculate the FFT/IFFT, as a way to speed up the calculation process.

4.3 Acknowledgements
The design group members would like to thank Professor Arun Somani, who contributed his expertise on a variety of subjects throughout the project and coordinated and guided the team through the project planning process. Professor Somani also provided the hardware for the project implementation. The team would also like to thank Ganesh Surbramanian and Michael Frederick, graduate students in electrical and computer engineering, for their time and resource contributions to this project.

4.4 Problem Statement
This section defines the general problem statement and the general solution approach that was used by the team.

4.4.1 General Problem Statement
Software calculations of the Fourier transform are very time consuming due to their use of complex trigonometric functions, and are too slow for use in real-time systems.

4.4.2 General Solution Approach
In this project, the team used Xilinx™ FPGAs to implement the hardware design for calculating the discrete Fourier transform (DFT) in real time. This hardware implementation provided a faster method of calculating the DFT. The end product of the design was a transportable implementation in a hardware description language (HDL), which could be used in more complex projects.

4.5 Operating Environment
A controlled lab is the intended operating environment for this product. Since the product is a hardware design, it can be used and modified on a computer that has the appropriate software. The design can also be studied on a chip to which it has been downloaded, or on a chip built exclusively for the design.
4.6 Intended Users
The specific intended users of this project are graduate students in electrical and computer engineering who design more complex systems that need to perform time-intensive transform calculations. Additional end users could be ASIC (application-specific integrated circuit) designers who need to do similar calculations and who would be able to use the VHDL sub-blocks and the design methodology to build a complete transform engine.

4.7 Intended Uses
The transform engine was expected to be used as a component in larger systems, such as a piano keyboard system that could transcribe the notes played in real time, or any other system in which real-time processing of the discrete Fourier transform was desired.

4.8 Assumptions
For this project to be a success, several assumptions were made about the project, including hardware and software availability and the feasibility of the project as a whole. The following assumptions were made:
It is possible to implement real-time transform calculation engines in hardware.
Xilinx™ produces an FPGA with the needed gate count to implement such algorithms.
All hardware and software needed for development and testing is provided by the client.
Each chip designed implements one or more transforms.
All numbers are represented as real/imaginary pairs, using 2’s complement, fixed decimal notation.
The number of inputs is a power of 2.
The project can be held to a budget of $150.
The project can be completed in two semesters.

4.9 Limitations
For this project to proceed as planned, the limitations imposed by the technology being used were considered.
The following limitations were identified in the project:
The clock speed of the circuits was limited to the FPGA's specified maximum clock speed; thus the algorithms chosen must compute the transforms as efficiently as possible to achieve real-time status.
The number of I/O pads available for data was set for each FPGA; thus I/O formats must be optimized.
The client had only three versions of Xilinx™ FPGAs on which testing was done; thus the designs must be optimized for those specific chips.
Knowledge of various discrete transform algorithms by the team.
Knowledge of VHDL and specific Xilinx™ functions by the team.

4.10 Expected End product and Other Deliverables
The deliverables of this project include the following:
Design Methodology: A method for designing real-time transform engines by using parallel computations and generic hardware sub-blocks arranged to produce the desired output. The design methodology was completed by May 4, 2005.
Sub-Block VHDL Code: Implementations of generic blocks needed for transform calculations. Examples of such blocks include memory controllers, on-chip storage, and computational sub-blocks including a PC, address generator, multiplier, and adder. The sub-block VHDL code will be delivered to the client by May 4, 2005.
Transform VHDL Code and finalized FPGA: Implementation of two transform engines: FFT and IFFT. The code for the engines was composed of several VHDL sub-blocks, as defined above, with additional logic to put them together. The code was downloaded to the FPGA to produce a functioning transform engine with the desired output within the targeted time constraints. The finalized FPGA was demonstrated to the client by April 31, 2005, and the code delivered by May 4, 2005.
Final Report: Because the implemented transform engine would be used in larger systems, documentation of the finalized FPGA was critical.
This final report included documentation of the design methodology used, the VHDL sub-block designs, and the overall implemented transform design. The final report was submitted to the client by May 4, 2005.

5. Project Approach and Results
The following sections provide a detailed description of the team’s approach and product results.

5.1 Functional Requirements
The hardware design fulfills certain functional requirements that define exactly what the end product should and should not do. The end product must accomplish the following:
User input - The hardware receives complex numbers as input from the user.
Initiate calculation - The user commands the chip to perform the calculation.
Perform high-speed calculations - The hardware outputs the transform (either FFT or IFFT) of the input complex numbers. The hardware design must achieve high efficiency.
Termination - The chip indicates when the computation is complete.
User output - The user retrieves the complex output numbers from the chip.
Figure 2 depicts these operations in a very high-level block diagram.

[Figure 2: Functional Block Diagram. The user inputs complex numbers (real and imaginary parts) to the FPGA; the chip is commanded to do the computation; the hardware performs a high-speed calculation of the FT; the chip indicates when the computation is complete; the user retrieves the FT of the complex numbers (real and imaginary parts).]

5.2 Design Requirements
The requirements of this hardware design were developed as a result of the problem statement. These design requirements were expanded and clarified during the project. Design requirements were to provide the following:
Fast method of calculating FT - Software calculations of the FT are time-consuming due to their use of complex trigonometric functions. The hardware design (FPGA-based transform engine) described herein uses application-specific hardware to calculate those complex trigonometric functions and output the FT.
The completely hardware-based computation provides an extremely fast method of calculating the Fourier transform.
Component in complex systems - The hardware end product enables the designed FPGA to be used as a component in complex systems in which real-time transforms are needed. A piano keyboard system that can transcribe the notes played in real time is one example of such a system. Systems such as these are currently being designed by graduate students in electrical and computer engineering.

5.2.1 Design Constraints
The constraints of the project were derived from the assumptions and limitations mentioned earlier. These constraints are:
Speed vs. size - The trade-off between the time to calculate the FT and the size of the FPGA (gate count) was limited by the processing time and the size of the input signals. The end product depended greatly on the speed of the operation and the number of gates used in the FPGA design.
I/O format - All numbers were represented as real/imaginary pairs, using 2’s complement, fixed decimal notation. The number of inputs is a power of 2.
Functionality - The FPGA was only responsible for the calculation of the FFT and its inverse, the IFFT.
Finances - With a $150 budget, the team relied on the resources already available to them as students at Iowa State University. The team had to ensure access to the needed labs for a sufficient amount of time.
Time - With only two semesters to complete the project, the team needed to budget time effectively.

5.3 Approach Used
This section covers the following components, which were considered to ensure a high probability of project success.

5.3.1 Technical Approach Considerations and Results
To complete the design of the project, the team considered several different technological approaches, weighed their advantages and disadvantages, and decided which approach would be most beneficial to the project.
The approaches considered are listed in the following sections.

5.3.1.1 Programming Language
The team had two options for programming language: Verilog or VHDL. The trade-offs are summarized in Table 1 and Table 2.

Table 1: Pros and Cons of Verilog
  Advantages of Verilog: IEEE standard; supported by EDA vendors.
  Disadvantages of Verilog: Limited support of system level modeling; limited simulation.

Table 2: Pros and Cons of VHDL
  Advantages of VHDL: IEEE standard; supported by EDA vendors; high support for modeling; simulation is far more comprehensive; VHDL preferred by customer; readily available resources.
  Disadvantages of VHDL: Harder to learn; not as easy to use.

Result: The team used VHDL to code the hardware. VHDL was chosen mainly because it was the preferred language of the client, and also because resources and guides for VHDL were readily available. In addition, per the tables above, VHDL offers better support than Verilog for modeling systems and for simulating the FPGA.

5.3.1.2 Software/hardware used for design
The team had two options for the tools used to design the FPGA. Both Xilinx™ and Altera™ were available at Iowa State. The tradeoffs are summarized in Table 3 and Table 4.

Table 3: Pros and Cons of Xilinx™
  Advantages of Xilinx™: Readily available at Iowa State; numerous resources available, including examples of similar projects completed using Xilinx™; numerous tutorials to help learn; Xilinx™ boards provided by customer.
  Disadvantages of Xilinx™: Never been used by team members.

Table 4: Pros and Cons of Altera™
  Advantages of Altera™: Readily available at Iowa State; previous experience of team members.
  Disadvantages of Altera™: Lack of resources available for complex projects.

Result: The team used Xilinx™ to complete the project. Even though the team had experience using Altera™ in the past, the amount of resources available for Xilinx™ was much greater.
These resources included complex designs similar to the team’s project and were important in helping the team learn the complex tools needed to complete the project.

5.3.1.3 Transform chosen to implement

The team had narrowed the transforms to implement down to two choices: the fast Fourier transform and the Radon transform. The trade-offs are summarized in Table 5 and Table 6.

Table 5: Pros and Cons of Fast Fourier Transform
Advantages of fast Fourier transform:
- The team has experience using the FFT
- The team has studied the FFT and developed an algorithm that can be implemented in hardware
Disadvantages of fast Fourier transform:
- The fast Fourier transform has limited room to maximize speed

Table 6: Pros and Cons of Radon Transform
Advantages of Radon transform:
- Growing applications in the avionics field
- Limited past research into the Radon transform has been done, so research would be innovative
Disadvantages of Radon transform:
- Customer already has a working design of an FPGA to calculate the Radon transform

Result: The team designed an FPGA to calculate the fast Fourier transform. Because the customer already had an FPGA design for the Radon transform, the team designed only an FPGA to calculate the fast Fourier transform. The customer had expressed interest in an FPGA design for the fast Fourier transform, and the team aimed to improve its speed as much as possible.

5.3.1.4 Design of FPGA

The FPGA could be designed in two different ways: as a pipelined design or as a combinational design. The trade-offs are summarized in Table 7 and Table 8.
Table 7: Pros and Cons of Pipeline Design
Advantages of Pipeline Design:
- A pipeline design will improve the speed and efficiency of the hardware
- The FFT and IFFT algorithms support a pipeline design
Disadvantages of Pipeline Design:
- A pipeline design is complex to design, implement, and test

Table 8: Pros and Cons of Combinational Design
Advantages of Combinational Design:
- A combinational design will be easy to implement, design, and test
Disadvantages of Combinational Design:
- No advantages in speed or size are gained by using a combinational design
- Some delay is encountered because of the FFT algorithm

Result: The team designed a pipelined FPGA to calculate the fast Fourier transform. The benefits of increased speed and efficiency, as well as decreased size, outweighed the simplicity of a combinational design.

5.4 Detailed Design

The design of the project spans several areas of hardware, software, and fast Fourier transforms. The details of the design are given below.

5.4.1 Discrete Fourier transform Algorithm

The discrete Fourier transform (DFT) is defined as the frequency samples of the Fourier transform. It should not be confused with the discrete-time Fourier transform; the two are not the same. Before examining the DFT in detail, consider the following two cases:

1. x[n] is an infinite sequence
A discrete-time signal x[n] can be recovered unambiguously from its Fourier transform through the inverse Fourier transform. To do this, the values of its Fourier transform must be known for all frequencies in the range [-π, π]. Knowing the values at only a finite number of frequencies in this range is not sufficient to recover the signal, since x[n] is in general an infinite sequence.

2. x[n] is a finite sequence
If x[n] has a finite number of terms, say {0 ≤ n ≤ N-1}, then knowing the values of its Fourier transform at N frequency points is sufficient to recover the signal, provided these frequency points are chosen properly.
In other words, the Fourier transform of this signal can be sampled at N points and the signal can be recovered from these samples. One way to justify this claim is as follows. The Fourier transform is a linear operation, so the values of the Fourier transform at N frequency points provide N linear equations in N unknowns, where the N unknowns are the signal values. From linear algebra, such a system of equations has a unique solution if the coefficient matrix is nonsingular. Therefore, if the frequency points are chosen to satisfy this condition, the signal values can be computed unambiguously.

The sampled Fourier transform of a finite-duration, discrete-time signal is known as the discrete Fourier transform. The DFT contains a finite number of samples, equal to the number of input signal samples. By definition, the DFT is

$$X[k] = \sum_{n=0}^{N-1} x[n] W_N^{kn}, \quad \text{where } W_N = e^{-j2\pi/N} \text{ and } 0 \le k \le N-1 \qquad [1]$$

5.4.1.1 Complexity of the Direct DFT Computation

For an input sequence of length N, the number of arithmetic operations in direct computation of the DFT is proportional to $N^2$. (Direct DFT computation means computing the DFT using Equation 1.) In general, the DFT operation is a multiplication of a complex N×N matrix by a complex N-dimensional vector. Therefore, the operation requires $N^2$ complex multiplications and N(N-1) complex additions. Since the elements of the DFT matrix in the first row and the first column are 1, the number of multiplications can be reduced by 2N-1. The DFT operation then involves $N^2 - 2N + 1 = (N-1)^2$ complex multiplications and N(N-1) complex additions.

5.4.2 Fast Fourier Transform Algorithm

The fast Fourier transform (FFT) algorithm is another computational scheme for computing the DFT. As its name implies, the FFT algorithm computes the DFT faster by reducing its computational complexity. This invention, by Cooley and Tukey in 1965, was a major breakthrough in digital signal processing.
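For reference, the direct computation of Equation 1 and the operation counts from Section 5.4.1.1 can be sketched in a few lines of Python. This is only an illustrative software model, not part of the team's hardware design; the function names `direct_dft` and `direct_dft_op_counts` are hypothetical.

```python
import cmath

def direct_dft(x):
    """Direct evaluation of Equation 1: X[k] = sum over n of x[n] * W_N^(k*n)."""
    n_points = len(x)
    w_n = cmath.exp(-2j * cmath.pi / n_points)  # W_N = e^(-j*2*pi/N)
    return [sum(x[n] * w_n ** (k * n) for n in range(n_points))
            for k in range(n_points)]

def direct_dft_op_counts(n_points):
    """Complex operation counts from Section 5.4.1.1, after skipping the
    multiplications by 1 in the first row and column of the DFT matrix."""
    multiplications = (n_points - 1) ** 2
    additions = n_points * (n_points - 1)
    return multiplications, additions

# A unit impulse has a flat spectrum: every X[k] equals 1.
spectrum = direct_dft([1, 0, 0, 0])

counts_4 = direct_dft_op_counts(4)        # (9, 12)
counts_1024 = direct_dft_op_counts(1024)  # grows roughly as N squared
```

For the 1024-point transform targeted by this project, the direct method needs on the order of a million complex multiplications, which is the cost the FFT is meant to avoid.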
Cooley and Tukey discovered that when the DFT length N is a composite number, the DFT can be decomposed into a number of DFTs of shorter length. They showed that the total number of operations needed to compute the shorter DFTs is less than that of direct computation of the DFT.

5.4.2.1 Complexity of FFT Algorithm

Each of the shorter DFTs can be decomposed into even shorter DFTs until all the DFTs are of prime length, the lengths being the prime factors of N. The DFTs of prime length are then computed directly. The total number of operations in this scheme depends on the factorization of N into prime factors. In this project, N was chosen to be an integer power of 2. Therefore, the total number of operations is proportional to $N \log_2 N$, which is much smaller than $N^2$. Since the DFT is decomposed until all of the DFT computations are of prime length, Cooley and Tukey arrived at the 2-point DFT butterfly (Figure 3).

Figure 3: The butterfly of a 2-point frequency-decimated FFT (inputs x[0] and x[1], outputs X[0] and X[1], with one branch multiplied by -1)

In Figure 3, the following conventions are used:
- A line with an arrow indicates signal flow.
- A circle around a ‘+’ sign, with two or more lines leading to it, indicates addition.
- A constant number above a line indicates multiplication of the signal flowing in that line by the constant number.

To show the reduction in the total number of operations, consider the following example.

Example: Suppose x[n] has length N = 4. Then its DFT is

$$X[k] = \sum_{n=0}^{3} x[n] W_4^{kn}$$
$$X[0] = \sum_{n=0}^{3} x[n] = x[0] + x[1] + x[2] + x[3]$$
$$X[1] = \sum_{n=0}^{3} x[n] W_4^{n} = x[0] + x[1]W_4 + x[2]W_4^{2} + x[3]W_4^{3}$$
$$X[2] = \sum_{n=0}^{3} x[n] W_4^{2n} = x[0] + x[1]W_4^{2} + x[2]W_4^{4} + x[3]W_4^{6}$$
$$X[3] = \sum_{n=0}^{3} x[n] W_4^{3n} = x[0] + x[1]W_4^{3} + x[2]W_4^{6} + x[3]W_4^{9}$$

By inspection, and consistent with the counts $(N-1)^2$ and N(N-1) from Section 5.4.1.1,
Total number of operations = 9 multiplications + 12 additions = 21

As depicted in Figure 4, the DFT is decomposed into shorter DFTs, length-2 DFTs in this case.
Therefore,
Total number of operations = 1 multiplication + 8 additions = 9
The number of operations is thus reduced significantly by employing the FFT algorithm.

Figure 4: The butterfly of a 4-point frequency-decimated FFT (inputs x[0] through x[3], outputs X[0] through X[3], with branch multipliers -1 and W4)

According to Figure 4, the values of the DFT are the following:

$$X[0] = \sum_{n=0}^{3} x[n] = x[0] + x[1] + x[2] + x[3]$$
$$X[1] = \sum_{n=0}^{3} x[n] W_4^{n} = x[0] + x[1]W_4 + x[2]W_4^{2} + x[3]W_4^{3} = x[0] + x[1]W_4 - x[2] - x[3]W_4$$
$$X[2] = \sum_{n=0}^{3} x[n] W_4^{2n} = x[0] + x[1]W_4^{2} + x[2]W_4^{4} + x[3]W_4^{6} = x[0] - x[1] + x[2] - x[3]$$
$$X[3] = \sum_{n=0}^{3} x[n] W_4^{3n} = x[0] + x[1]W_4^{3} + x[2]W_4^{6} + x[3]W_4^{9} = x[0] - x[1]W_4 - x[2] + x[3]W_4$$

Simplification of the twiddle factors: using Euler’s formula,

$$e^{jw} = \cos(w) + j\sin(w)$$

For N = 4,

$$W_4 = e^{-j2\pi/4} = \cos(\pi/2) - j\sin(\pi/2) = -j$$
$$W_4^{3} = -W_4 = j, \quad W_4^{5} = W_4 = -j, \quad W_4^{7} = -W_4 = j, \quad W_4^{9} = W_4 = -j$$

In general,

$$W_4^{2n+1} = \begin{cases} -j, & n \text{ even} \\ j, & n \text{ odd} \end{cases}$$

As shown in the above example, the twiddle factors $W_N^{kn}$ can be pre-calculated and stored in a look-up table. Every time a twiddle factor is needed, it can be retrieved from the look-up table.

5.4.2.2 Frequency-decimated FFT Algorithm

The frequency-decimated FFT algorithm is obtained by using the divide-and-conquer approach. To derive the algorithm, the DFT formula is divided into two summations, one over the first N/2 data points and the other over the last N/2 data points.
$$X[k] = \sum_{n=0}^{N-1} x[n] W_N^{kn}$$
$$= \sum_{n=0}^{N/2-1} x[n] W_N^{kn} + \sum_{n=N/2}^{N-1} x[n] W_N^{kn}$$

Let n = r + N/2 in the second summation:

$$= \sum_{n=0}^{N/2-1} x[n] W_N^{kn} + \sum_{r=0}^{N/2-1} x[r+N/2] W_N^{k(r+N/2)}$$
$$= \sum_{n=0}^{N/2-1} x[n] W_N^{kn} + \sum_{r=0}^{N/2-1} x[r+N/2] W_N^{kr} W_N^{kN/2}$$

Note: $W_N^{kN/2} = (-1)^k$

$$= \sum_{n=0}^{N/2-1} x[n] W_N^{kn} + (-1)^k \sum_{r=0}^{N/2-1} x[r+N/2] W_N^{kr}$$
$$= \sum_{n=0}^{N/2-1} \{x[n] + (-1)^k x[n+N/2]\} W_N^{kn}$$

Thus, the decimated DFT can be divided into even and odd samples as follows:

$$X[2k] = \sum_{n=0}^{N/2-1} \{x[n] + (-1)^{2k} x[n+N/2]\} W_N^{2kn} = \sum_{n=0}^{N/2-1} \{x[n] + x[n+N/2]\} W_{N/2}^{kn}$$
$$X[2k+1] = \sum_{n=0}^{N/2-1} \{x[n] + (-1)^{2k+1} x[n+N/2]\} W_N^{(2k+1)n} = \sum_{n=0}^{N/2-1} \{x[n] - x[n+N/2]\} W_N^{n} W_{N/2}^{kn}$$

The DFT of the input signal is computed using the frequency-decimated FFT algorithm. Adders, multipliers, and registers are used to perform the computation. Figure 5 shows the breakdown of the DFT computation using the frequency-decimated FFT algorithm.

Figure 5: Three stages in the computation of an 8-point frequency-decimated FFT (the 8-point DFT of x[0] through x[7] is split into two 4-point DFTs and then four 2-point DFTs; the outputs appear in the order X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7])

5.4.2.3 Inverse FFT Algorithm (IFFT)

Once the FFT algorithm is obtained, the IFFT is computed by going backward through the FFT algorithm. The IFFT is described by the following equation:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] W_N^{-kn}, \quad \text{where } n = 0, 1, \ldots, N-1$$

The procedure for an 8-point signal is as follows:

1. Conjugate the FFT coefficients X[k] to obtain X*[k].
2. Compute the FFT of X*[k] as shown in Figure 6.

Figure 6: FFT algorithm for computing 8-point input signal (same structure as Figure 5, with inputs X*[0] through X*[7] and outputs in the order x*[0], x*[4], x*[2], x*[6], x*[1], x*[5], x*[3], x*[7])

3. Scale the resulting x*[n] from Figure 6 by 1/N.
4. Conjugate x*[n] to obtain the IFFT x[n]. If the signal is real-valued, this final conjugation is not needed.
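The even/odd split derived in Section 5.4.2.2 and the four IFFT steps above can be checked with a small software model. The following Python sketch is an assumption-laden illustration, not the team's VHDL: the names `fft_dif` and `ifft_via_fft` are hypothetical, the recursion stands in for the hardware pipeline, and the outputs are returned in natural order rather than in the bit-reversed order shown in the figures.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 frequency-decimated FFT; len(x) must be a power of 2."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    half = n_points // 2
    # Even outputs X[2k]: (N/2)-point DFT of x[n] + x[n+N/2]
    sums = [x[n] + x[n + half] for n in range(half)]
    # Odd outputs X[2k+1]: (N/2)-point DFT of (x[n] - x[n+N/2]) * W_N^n
    diffs = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / n_points)
             for n in range(half)]
    result = [0j] * n_points
    result[0::2] = fft_dif(sums)   # interleave even-indexed outputs
    result[1::2] = fft_dif(diffs)  # interleave odd-indexed outputs
    return result

def ifft_via_fft(spectrum):
    """IFFT using only the forward FFT: conjugate, FFT, scale by 1/N, conjugate."""
    n_points = len(spectrum)
    conjugated = [v.conjugate() for v in spectrum]      # step 1
    transformed = fft_dif(conjugated)                   # step 2
    scaled = [v / n_points for v in transformed]        # step 3
    return [v.conjugate() for v in scaled]              # step 4

# Round trip on an arbitrary 8-point test signal (values chosen for illustration).
signal = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, -3)]
spectrum = fft_dif(signal)
recovered = ifft_via_fft(spectrum)
```

Reusing the forward transform for the inverse is what allows a single hardware datapath to serve both the FFT and the IFFT.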
The conjugate, FFT, scale, and conjugate steps together make up the IFFT algorithm, as shown in Figure 7.

Figure 7: IFFT algorithm for computing 8-point input signal (Re(X[k]) and Im(X[k]) enter the 8-point FFT structure of Figure 6; the imaginary part is multiplied by -1 before and after the FFT, and both outputs are scaled by 1/N to give Re(x[n]) and Im(x[n]))

5.4.3 Detailed Design of the FFT Algorithm on an FPGA Chip

Implementing the algorithms just described on an FPGA is not an easy task. The structure of the system must be optimized for speed and size. The project team implemented a design to compute the FFT and IFFT. As described in Section 5.4.2 Fast Fourier Transform Algorithm, the IFFT and FFT can be implemented using just a multiplier, an adder, and a subtractor. The team’s design uses complex carry-lookahead adders and complex multipliers built from carry-save additions to maximize dataflow through the system. A description of the hardware design to calculate the FFT and IFFT is given next.

5.4.3.1 Overall Description of System

The project team used a Xilinx board to implement the FPGA. The FPGA is designed to handle a 1024-point FFT and IFFT calculation, with each point having a real and an imaginary part. Figure 8 shows a high-level block diagram of the system.

Figure 8: High-level block diagram of the FFT Implementation (input: 1024-point image; memory on chip; transform algorithm on the FPGA; output: transformed 1024-point image)

5.4.3.2 Overall Design

The FPGA design is divided into three separate parts: the transform control, the transform wrapper, and the OPB transform wrapper. The design of the FPGA for a Xilinx board is shown in Figure 9.

Figure 9: High Level Xilinx Chip Layout (blocks: OPB bus, OPB memory controller, memory, OPB transform wrapper, transform wrapper, transform control, OPB control)

As shown in the layout, the Xilinx chip includes an OPB bus, a memory controller, and memory.
The other components, including the OPB transform wrapper, transform wrapper, transform control, and OPB control, were designed and implemented by the project team.

The user loads a 1024-point image of complex values into the Xilinx chip using C code. Each input value must be stored as a 16-bit fixed-point number, with 8 bits dedicated to the fraction and 8 bits dedicated to the whole number. The on-chip memory contains all of the numbers needed for the computation. Once the memory has been loaded with the 1024-point image, the user starts the transform calculation by writing to the “start transform” register. The user also selects whether to calculate the FFT or the IFFT by writing to the “transform to be calculated” register. When the computation is complete, the system notifies the user by writing a 1 to the “transform complete” register. The user may then retrieve the resulting complex values from memory.

The OPB bus is used to communicate between the processor on the chip, the chip’s memory, and the user-designed FPGA logic. The actual FFT/IFFT transform hardware is in the transform control, which is discussed in the next section. The transform wrapper and the OPB transform wrapper are used to simplify interaction with the OPB bus and the memory.

The memory is a dual-port RAM, with one port dedicated to communication with the OPB memory controller. The other memory port is dedicated to the OPB transform wrapper, which sees the memory as a single-port RAM, since only one port is available to it. The memory controller is included with the chip and was not designed or written by the project team.

The transform wrapper maps the ports of the transform control so that it can interact with the memory as well as the OPB control. It is at this level that the signals to start the transform, select the transform to calculate, signal overflow, and signal that the transform is complete are registered. These signals can be modified through the OPB bus.
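The 16-bit fixed-point input format described in this section (8 whole-number bits and 8 fraction bits, stored in 2’s complement) can be illustrated with a short conversion sketch. This is a Python illustration under stated assumptions rather than the team's C code: the helper names `float_to_q8_8` and `q8_8_to_float` are hypothetical, and the round-to-nearest and saturation behavior are assumptions, since the report does not say how out-of-range or inexact values were handled.

```python
FRACTION_BITS = 8
SCALE = 1 << FRACTION_BITS  # 256 steps per unit

def float_to_q8_8(value):
    """Convert a float to a 16-bit 2's complement word with 8 fraction bits."""
    scaled = int(round(value * SCALE))
    # Assumption: clamp to the representable range instead of wrapping around.
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF  # keep the low 16 bits as an unsigned word

def q8_8_to_float(word):
    """Interpret a 16-bit word as a signed value with 8 fraction bits."""
    if word & 0x8000:       # sign bit set: value is negative
        word -= 0x10000
    return word / SCALE

word_pos = float_to_q8_8(1.5)    # 0x0180
word_neg = float_to_q8_8(-0.25)  # 0xFFC0
```

With this format the representable range is -128 to just under 128, in steps of 1/256.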
The schematic diagram of the transform wrapper is shown in Figure 10. The signals Bus2IP_Clk, Bus2IP_CS, Bus2IP_RdCE, Bus2IP_WrCE, Bus2IP_Reset, Bus2IP_Data, Bus2IP_Addr, IP2Bus_Data, and IP2Bus_Ack are used to communicate with the OPB bus. The ports real_doa, real_dob, imag_doa, imag_dob, real_addra, real_addrb, real_dina, real_dinb, real_enablea, real_enableb, real_wea, real_web, imag_addra, imag_addrb, imag_dina, imag_dinb, imag_enablea, imag_enableb, imag_wea, imag_web, twiddle_real_doa, twiddle_real_addra, twiddle_real_dina, twiddle_real_enablea, twiddle_real_wea, twiddle_imag_doa, twiddle_imag_addra, twiddle_imag_dina, twiddle_imag_enablea, and twiddle_imag_wea are used to communicate with the memory. The port names can be deciphered using the following rules:
- real indicates the port communicates with the real component memory
- imag indicates the port communicates with the imaginary component memory
- a at the end of the port name means the signal belongs to memory block 1
- b at the end of the port name means the signal belongs to memory block 2
- do indicates the port is the data output from memory
- addr indicates the port is an address port
- din indicates the port is the data input to memory
- enable is the enable for the memory port
- we is the write enable for the memory port

The memory ports are discussed further in Section 5.4.3.5 Memory Block.

Figure 10: Transform Wrapper Schematic Symbol

The OPB wrapper allows the transform wrapper to interact with the OPB bus. This wrapper was provided by Michael Frederick (listed in the acknowledgements). It decodes the OPB bus to determine when the bus is communicating with the FPGA, when it is writing to the chip, and when it is reading from the chip.

5.4.3.3 Transform Control Design

The transform control is the heart of the FPGA. The calculation of the FFT and IFFT is implemented in hardware in the transform control.
The team decided to implement these transforms using a pipeline design in order to make the calculation as fast as possible, as discussed in Section 5.3.1 Technical Approach Considerations and Results. The pipeline design consists of 9 stages, and a high-level design is shown in Figure 11. As shown in the figure, two stages are dedicated to the address generation block, one stage to the memory block, four stages to the multiplier, and two stages to the adder/subtractor block. These blocks of the transform control are made up of several sub-components, including a PC, shifter, look-up table, address generator, multiplier, adder/subtractor, multiplexers, and registers. The memory block in the figure refers to the Xilinx chip memory shown in Figure 9; it is not actually part of the transform control, but is shown for clarity. Details of each component are given in the following sections.

Figure 11: High Level Design of Transform Control (stages 1-2: address generation block; stage 3: memory block; stages 4-7: multiplier block; stages 8-9: adder/subtractor block)

5.4.3.4 Transform Control Pipeline Design Description

The team implemented a pipeline design in order to maximize the computation speed of the transform. A 9-stage design was chosen based on the time delays associated with the components of the transform control. Timing delays for the components are given in Table 9. The timing delays for all of the components designed by the team were obtained from the Xilinx synthesis report and were used to estimate computation time. The memory access timing was obtained from p. 58 of the Virtex II data sheet provided by Xilinx.

Table 9: Timing of Transform Control Components
Component             Min. Period Time    Max Clock Frequency
PC                    3.79 ns             263.644 MHz
Address Generator     3.45 ns             289.855 MHz
Multiplier            17.34 ns            57.67 MHz
Add/Subtractor        6.54 ns             152.91 MHz
Memory Access Time    1.54 ns             650.35 MHz

As the table shows, the pipeline design is governed mainly by the multiplier, which has by far the longest delay. Since the multiplier consists of levels of addition, it can easily be broken up into stages for a pipeline design. The adder/subtractor can also be divided into stages, since it is dependent on two memory reads. Therefore, the team broke the multiplier block into four stages, the adder/subtractor block into two stages, and the PC and address generator into two stages. To study the pipeline design, the team drew up a space-time diagram depicting the pipeline stages. The team’s original space-time diagram is shown in Table 10. The diagram was simplified by using only an 8-point image.

Table 10: Original Space-Time Diagram for Pipeline (rows: instructions I1 through I9; columns: time steps 0 through 11; each instruction passes through the PC, Addr, Read, Mult S1, Mult S2, Mult S3, Mult S4, and Add or Sub stages; instructions I1 through I9 use PC values 0, 1, 2, 3, 4, 5, 6, 7, 0 and read x[0], x[4], x[1], x[5], x[2], x[6], x[3], x[7], x[0] from addresses 0, 4, 1, 5, 2, 6, 3, 7, 0)

In the original space-time diagram, both an adder and a subtractor are needed at the same time. After studying this diagram, the team minimized chip space by using the same adder to calculate the subtraction.
This was done by shifting the subtraction stage one time step to the right. The revised space-time diagram is shown in Table 11.

Table 11: Revised Space-Time Diagram (same layout as Table 10: rows are instructions I1 through I9 and columns are time steps 0 through 11, with the Sub stage shifted one time step to the right so that it no longer coincides with an Add stage)

In this diagram it is clear that addition and subtraction take place in separate time steps, so only one adder is needed.

5.4.3.5 Memory Block

A 1024x16-bit single-port RAM memory block is used to store the 1024 data points in 2’s complement form. A 512x16-bit single-port RAM memory block is used as a look-up table to store the constant twiddle multiplier values. During the FFT and IFFT calculations, memory reads and writes occur at the same time. Rather than losing a clock cycle by alternating between a memory read and a memory write, the team used two single-port RAMs for the 1024 data points and switched between reading and writing on each block. Therefore, a total of six memory blocks are used in the design of the system. Four single-port RAM memory blocks (two for the real part, two for the imaginary part), of size 1024x16, are used to store the input/output.
Two single-port RAM memory blocks (one for the real part and one for the imaginary part), of size 512x16, are used to store the twiddle multiplication factors.

Since each level of the FFT needs to read the results calculated in the previous level, the memory blocks must swap between reading and writing at each level. Figure 12 shows the design of the memory and memory controller. For a 1024-point FFT and IFFT, there are 10 levels of calculation. The level and corresponding memory block operation are given in Table 12.

Figure 12: Memory Block Control (a read/write switch decoder drives Real Memory Block 1, Imaginary Memory Block 1, Real Memory Block 2, and Imaginary Memory Block 2)

Table 12: Level and Memory Block Operation
Level   Real Memory   Real Memory   Imaginary Memory   Imaginary Memory
        Block 1       Block 2       Block 1            Block 2
1       Read          Write         Read               Write
2       Write         Read          Write              Read
3       Read          Write         Read               Write
4       Write         Read          Write              Read
5       Read          Write         Read               Write
6       Write         Read          Write              Read
7       Read          Write         Read               Write
8       Write         Read          Write              Read
9       Read          Write         Read               Write
10      Write         Read          Write              Read

Because of the 9-stage pipeline, the last 9 data points of a level have not yet been written back to memory when the ports are switched. The team used a buffer to store the last 9 data points calculated in the FPGA; whenever these addresses need to be read, they are read from the buffer.

In calculating the FFT and IFFT, data is written back to memory sequentially. However, as described in Section 5.4.2, the data read for the DFT is not always sequential. The next section describes how these read addresses are calculated.

5.4.3.6 Address Generation Block

To calculate the FFT/IFFT, data must be written back sequentially, but read addresses are not always sequential. Through analysis of the FFT and IFFT algorithms, the team discovered that the read address in each level is calculated by rotating a PC-generated address to the right by 1 bit.
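The rotation described above can be modeled in a few lines. The Python sketch below is only a software illustration of the addressing rule, not the team's VHDL; the function name `rotate_right` and its parameters are hypothetical. For a 16-point transform the address width is 4 bits, and for the 1024-point design it would be 10 bits.

```python
def rotate_right(value, width, count=1):
    """Rotate the low `width` bits of `value` to the right by `count` bits."""
    count %= width
    mask = (1 << width) - 1
    return ((value >> count) | (value << (width - count))) & mask

# Level 1 read addresses for a 16-point transform (4-bit PC values 0..15).
read_addresses = [rotate_right(pc, 4) for pc in range(16)]
# read_addresses is [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
```

Applying the rotation once more per level gives the data locations for later levels; for example, `rotate_right(3, 4, 2)` gives 12, and after four rotations a 4-bit address returns to its original value.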
Table 13 shows this address generation for the four levels of a 16-point transform calculation; the table can be expanded to the 10 levels of a 1024-point transform calculation.

Table 13: Address Generation for FFT/IFFT
The first two columns give the PC-generated address and the read address for levels 1-4. The Level 1 through Level 4 columns give the actual data location of the data written to memory in sequential order.
PC generated   Read
address        address   Level 1   Level 2   Level 3   Level 4
0              0         0         0         0         0
1              8         8         4         2         1
2              1         1         8         4         2
3              9         9         12        6         3
4              2         2         1         8         4
5              10        10        5         10        5
6              3         3         9         12        6
7              11        11        13        14        7
8              4         4         2         1         8
9              12        12        6         3         9
10             5         5         10        5         10
11             13        13        14        7         11
12             6         6         3         9         12
13             14        14        7         11        13
14             7         7         11        13        14
15             15        15        15        15        15

As the table shows, rotating the PC-generated address one bit to the right at each level results in the correct data being read from memory at each level of the transform calculation.

The twiddle factor look-up table addresses are also generated in the address generation block. The twiddle addresses are found by taking the read address and shifting it to the left by N bits, with 0s shifted in, where N is equal to the level number minus one. If the most significant bit of the shifted address is a 0, then the twiddle address is 0. If the most significant bit is a 1, then the twiddle address is the remaining least significant bits (9 bits for a 1024-point image). In the first level of calculation, the twiddle address is always 0. Table 14 shows the twiddle addresses generated for a 16-point image. This table can be expanded similarly to a 1024-point image.
Table 14: Twiddle Factor Addresses
The first two columns give the PC-generated address and the read address for stages 1-4. The Stage 1 through Stage 4 columns give the twiddle address at each stage.
PC generated   Read
address        address   Stage 1   Stage 2   Stage 3   Stage 4
0              0         0         0         0         0
1              8         0         0         0         0
2              1         0         0         0         0
3              9         0         4         2         2
4              2         0         0         0         0
5              10        0         0         0         0
6              3         0         1         0         0
7              11        0         5         2         2
8              4         0         0         0         0
9              12        0         0         0         0
10             5         0         2         1         0
11             13        0         6         3         2
12             6         0         0         0         0
13             14        0         0         0         0
14             7         0         3         1         0
15             15        0         7         3         2

The address generation block consists of two components: a PC and an address generator.

5.4.3.7 PC

The PC is a simple block that is very important to the system, and it is similar to most standard PCs. The PC counts up on every rising clock edge, from 0 to $2^{10} - 1$. The PC is also responsible for keeping track of the level of the transform being calculated. After each rollover, the PC increments its level count by 1, and this level count is used in the twiddle factor address generation. Once the PC level count reaches 10, the transform-completed signal is set on the next PC rollover and can be read from the register by the user.

The PC block is shown in Figure 13. The clk port is the clock for the PC. The i_run port is the register signal that starts the transform calculation. The reset_f port clears the PC value on a reset. The o_level port keeps track of the transform level and is fed into the twiddle address generator. The o_PC port is the 10-bit PC output. The o_done signal indicates when a transform has been completed.

Figure 13: PC Schematic Symbol

5.4.3.8 Address Generator

The look-up table address is generated according to the algorithm described in Section 5.4.3.6 Address Generation Block. The address generator includes a bit rotator that rotates a 10-bit number to the left by 1, and rotates in a 0. The 10-bit number is the read address generated by the shifter.
For the twiddle address, if the transform is in the first level, the generator returns the address 0. For the remaining levels, the read address is rotated to the left by the level number minus one bits. If the 10th bit is a zero, the look-up table twiddle address generator returns address 0. If the 10th bit is a one, it returns the nine least significant bits of the rotated number. The team implemented the shifter and the LUT address generator as one unit, called the address generator, because both use a rotation to generate addresses.

The address generator block is shown in Figure 14. The i_level and i_PC signals are the outputs of the PC, after they have been registered from stage 1 of the pipeline. The clk signal is the clock input, and the reset_f signal clears the address generator on a reset. The o_data_addr port is the 10-bit read address, and the o_twiddle_addr port is the 9-bit twiddle address.

Figure 14: Address Generator Schematic Symbol

5.4.3.9 Multiplier Block

The 16-bit complex multiplier takes two complex numbers and multiplies them, truncating the output to a 16-bit number. Figure 15 shows the construction of an n-bit multiplier using n-bit carry-save adders, carry-lookahead adders, and carry-lookahead subtractors, which was used as a reference for the 16-bit multiplier. Carry-save adders were chosen for the multiplier because of their very high-speed calculation and their adaptability to a pipelined process. For large data widths, a different multiplication algorithm may need to be explored.

Figure 15: n-bit Complex Multiplier

Figure 16 shows the VHDL schematic symbol for the multiplier. The ports i_a_imag, i_a_real, i_b_imag, and i_b_real are the 16-bit input signals to multiply. The clk port is used to register the values in the multiplier for the pipeline, as discussed in Section 5.4.3.4 Transform Control Pipeline Design Description, and the reset_f pin is used to clear these registers on a reset.
The o_y_imag and o_y_real ports are the 32-bit output signals. Only the least significant 16 bits of these signals are actually used. The o_ovfl_imag and o_ovfl_real ports indicate whether an overflow exception occurred during multiplication.

Figure 16: 16-bit Complex Multiplier

5.4.3.10 Adder/Subtractor Block

The adder/subtractor block is used to add and subtract two points in the transform calculation. The team implemented one 16-bit adder. To calculate a subtraction, the same adder is used with the carry-in bit set to one and the subtracted input inverted.

The n-bit complex carry-lookahead adder was used as a model for the adder block of the system. This block performs fast complex addition using the carry-lookahead technique. The team implemented a 16-bit complex carry-lookahead adder. An n-bit complex carry-lookahead adder is shown in Figure 17.

Figure 17: n-bit Complex Carry-Lookahead Adder

The VHDL schematic symbol of the 16-bit complex carry-lookahead adder is shown in Figure 18. The ports i_a and i_b are the two 16-bit input signals, and o_sum is the 16-bit output signal carrying the sum of i_a and i_b. The signal i_carry is used as the initial carry bit, and o_ovfl indicates whether an overflow occurred in the addition.

Figure 18: 16-bit Complex Carry-Lookahead Adder

The adder/subtractor block also includes one stage of registers and two multiplexers. The extra components are needed because of the structure of the FFT/IFFT algorithms. In the algorithm, each DFT calculation performs an addition and a subtraction of two points, after each point is multiplied by the appropriate twiddle factor. The team’s system uses only one multiplier; therefore, the output for the first point is held for three additional clock cycles: one to wait for the second point to finish multiplication, one to be added to the second point, and one to have the second point subtracted from it.
The second data point is held for two additional clock cycles: one to be added to the first point, and one to be subtracted from the first point. For the subtraction, the carry-in bit is set and the second data point is inverted. The team implemented the logic circuit shown in Figure 19 to handle this. Table 15 details the operation of this circuit for an 8-point transform calculation.

Figure 19: Adder/Subtractor Block (the multiplier output feeds two multiplexers, Register 1, and Register 2; an inverter and the carry-in cin, selected by the Add/Sub signal, feed the adder)

Table 15: Signals in Adder/Subtractor Block
Time   Multiplier   Add/Sub   Register 1   Register 2   Operation
0      x[0]         1         x[0]         -            -
1      x[4]         0         x[0]         x[0]         x[0]+x[4]
2      x[1]         1         x[0]         x[4]         x[0]-x[4]
3      x[5]         0         x[1]         x[1]         x[1]+x[5]
4      x[2]         1         x[1]         x[5]         x[1]-x[5]
5      x[6]         0         x[2]         x[2]         x[2]+x[6]
6      x[3]         1         x[2]         x[6]         x[2]-x[6]
7      x[7]         0         x[3]         x[3]         x[3]+x[7]
8      x[0]         1         x[3]         x[7]         x[3]-x[7]

For the calculation of an FFT, no additional blocks are needed. The IFFT, however, requires the complex conjugate of the result. To calculate the IFFT, a multiplexer has been added after the adder. The multiplexer chooses between the output of the adder and the conjugate of the output of the adder. The conjugate is taken only in the 10th level of the IFFT calculation.

These four main blocks make up the design of the transform control and are used to implement an FPGA that can calculate both an FFT and an IFFT.

5.5 Implementation Process Description

To design an FPGA that could be used to calculate transforms, the team had to map the mathematical algorithms for the transforms onto hardware. This was done through research and discussion. The team studied the algorithms to find how they could be broken into hardware components, and how the hardware components could interact to calculate the transform. The team implemented the hardware by breaking down the hardware design into smaller units.
These smaller units were used as building blocks for the larger system. Once a smaller unit was designed, it was tested extensively until the team was satisfied with its success. At this point the unit was integrated into its sub-block based on the pipeline design. These sub-blocks were again tested extensively until the team was satisfied with their completion. Once all of the sub-blocks were completed, they were integrated together to form the FPGA design.

In order to design an FPGA to calculate transforms, the team had to use several different tools. The majority of the components were written in VHDL. All of the components were designed using the Xilinx Integrated Student Edition 6 program, and ModelSim was used to test the components. Once a working design of the FPGA was completed, it was downloaded onto a Xilinx board using a Xilinx MultiLINX. The FPGA was then tested on hardware using a serial port to communicate with the board. C code and Java were used to communicate with the chip and interpret data input and output.

One major problem was encountered during implementation. The hardware that calculates the transform was designed for 16-bit fixed-point numbers, with 8 bits dedicated to the whole number and 8 bits dedicated to the fraction, while the data was fed into the system through C code. The team spent a considerable amount of time working out how to convert a float or double into a 16-bit fixed-point number.

This implementation process was fairly successful, although one area could have been improved. The team needed to design the FPGA more thoroughly before trying to implement the system. This could have saved valuable time spent trying to overcome implementation problems.

5.6 Testing of the End Product and its Results

The team conducted four different types of testing during the project. The test plan is illustrated in the diagram in Figure 20.
Project design and planning is given on the left side of the diagram, and testing is given on the right side. Unit testing, integration testing, system testing, and acceptance testing were used to verify the functionality of the FPGA design. Each type of testing corresponded to a particular aspect of the planning of the project. Requirements Acceptance Architecture System Design Integration Code Unit Time Figure 20: Testing Plan Two members of the group, Sean Casey and Chris Miller, wrote all of the code for the system. Therefore the other two members, Ibrahim Ali and Chii-Aik Fang, were unfamiliar with the underlying code of the subcomponents of the system. These two members were valuable in the testing process because they were able to extensively test the components without the knowledge of how the system worked. The testing was conducted by breaking down the stages of the pipeline into the four main components shown in Figure 11: High Level Design of Transform Control. This way, each stage could be verified as working, before integrating it with any other stage. Each stage was broken down into its subcomponents, which were unit tested before being integrated into the stage component. This type of testing allowed us to identify problems early in the design process. Each of the types of tests is detailed on the next several pages. 38 1. Unit testing: Unit testing was performed after the initial code for the project sub- blocks was developed. Unit testing was performed by the team in the hardware lab in Coover, using ModelSim. Unit testing was used to test code to see if the individual components of the system were working and were coded correctly. Unit testing only tested the functionality of the smallest components of the system. If the tests showed failures, the code was fixed and retested until the individual components were functional, and the team could not find any errors. 
The tests were both automated and interactive, with the team vigorously testing boundary conditions to ensure that all cases were covered. A sample unit testing form is given in Appendix A.1.

The following sub-blocks of the system were unit tested, and their completed testing forms are given in the Appendix.

PC – The PC was tested using a ModelSim test bench. The ModelSim display was analyzed to see if the PC was incrementing properly and outputting the appropriate signals. Once the team had analyzed the PC for the entire 1024-point transform and checked all values, the PC was verified to be accurate. The test form and sample ModelSim output are shown in the appendix.

Twiddle address generator – The twiddle address generator was tested using a ModelSim test bench. The ModelSim display was analyzed to see if the twiddle address generator was incrementing properly and outputting the appropriate signals. Once the team had analyzed the twiddle address generator for the entire 1024-point transform and checked all values, the twiddle address generator was verified to be accurate. The test form and sample ModelSim output are shown in the appendix.

Multiplier – The multiplier was tested using a ModelSim test bench. The boundary values of the multiplier were tested as well as several other values. Once extensive testing was done on the multiplier and the output values were verified, the multiplier passed the testing phase. The results of the test bench are given in the appendix.

Adder – The adder was tested using a ModelSim test bench. The boundary values of the adder were tested as well as several other values. Once extensive testing was done on the adder and the output values were verified, the adder passed the testing phase. The results of the test bench are given in the appendix.

2. Integration Testing: As the team completed the testing of individual components of the system, the components were integrated together.
Through integration testing, the team determined if the components were interacting correctly as described in the design. Integration testing was also performed only by the team, and took place in Coover. Once again, the test was both automated and interactive, with the team testing for functionality, and for boundary conditions, making sure that the code functioned properly when boundaries were reached. The team continued integration testing until satisfied that components worked together as specified in the design. A sample integration testing form is given in Appendix A.2. 39 The following blocks of the system were integration tested, and their completed testing forms are given in the Appendix. Address generation block – The address generator block was tested using a ModelSim test bench. The block was tested by members of the team without knowledge of the inner workings of the block. Once the team analyzed the address generator block for the entire 1024-point transform, and all values were checked, the address generator block was verified to be accurate. The results of the test bench are given in the appendix. Adder/subtractor Block – The adder/subtractor block was tested using a ModelSim test bench. The boundary values of the adder/subtractor block were tested as well as several other values. Once extensive testing was done on the adder/subtractor block, and the outputted values were verified, the adder/subtractor block passed the testing phase. The results of the test bench are given in the appendix. 3. System Testing: Once the integration testing completed, the overall system was tested. System testing was performed by the team. The testing was both automated and interactive. A sample system testing form is given in Appendix A.3. The main system testing was performed on the transform control wrapper; the results of the testing are given in the appendix. One problem with memory switching was discovered in this testing phase, and it is shown in the appendix. 4. 
Acceptance Testing: At this time, acceptance testing has yet to be completed. During acceptance testing the FPGA will be tested by the client for acceptance. The client will test the functionality of the device as well as its speed of calculation, design size, and improvement over current similar technologies. The criteria for judging success in acceptance testing are determined by the client and are specified by the functional and non-functional requirements of the project. Satisfaction of the client will mean the acceptance testing is completed. A sample acceptance testing form is given in Appendix A.4.

Through this testing, the team has tested both the functionality and performance of the FPGA. Through this testing and retesting the team has maximized the performance of the FPGA.

5.7 End Results of the Project

The end result of the project was an FPGA design that could calculate the FFT and IFFT transforms. Two members of the team, Ibrahim Ali and Chii-Aik Fang, spent time researching the Radon transform and inverse Radon transform. Due to time constraints and differences between the RT and FFT algorithms, these transforms could not be implemented in the FPGA design. Their research is given in the following section.

5.7.1 Research of Radon Transform

The Radon transform is another transform that can be used in many image processing applications. The following section describes the Radon transform. Before the discrete Radon algorithm is introduced, some important points need to be explained. In the x-y plane, an image I(x,y) is represented as an N x N array of pixels, as shown in Figure 21.

Figure 21: Representation of an image in x-y plane

Every pixel represents the average gray level of a unit square in the image.
The discrete Radon transform (DRT) is the set of projections of this image taken by integrating along lines defined by the equation

x cos θ + y sin θ = d

where d is the distance between a line and the origin, and θ is the angle of the line with respect to the y-axis. Refer to Figure 22. An image in (x, y) space is thus transformed into Radon space (d, θ).

Figure 22: Representation of Strips for Summation along a Single Direction, θ

The DRT can now be computed by the following procedure. For any given angle θ, each pixel lies in exactly one strip; therefore, for each pixel we simply compute its strip δ (relative to θ) and add the pixel value to the current total for (δ, θ). This procedure is repeated for each value of θ. A simple code description of this algorithm is given below:

for (θ = 0; θ ≤ (N−1)π/N; θ += π/N)
{
    for (x = 0; x < N; x++)
    {
        for (y = 0; y < N; y++)
        {
            d = [x cos θ + y sin θ − 1/2];
            R[d][θ] = R[d][θ] + I[x][y];
        }
    }
}

Fast approximate discrete Radon transform: For neighboring angles, large subsets of pixels may be shared by different strips. On the left of Figure 23, the discrete lines represented by the two angles are shown; on the right, their representations as unit-width strips are shown. Thus, one could potentially save time by computing such shared partial sums only once for use in two or more lines.

Figure 23: Overlap between Strips at Neighboring Angles is Depicted

A parallel algorithm is constructed to compute an approximation to the desired DRT, which is designed to take maximum advantage of intermediate terms. The computation is divided into four parts corresponding to four equal-sized ranges of angles: [0, π/4], [π/4, π/2], [π/2, 3π/4], and [3π/4, π]. The algorithm is then applied to each range of angles.
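For reference, the direct strip-summing procedure given in pseudocode earlier in this section can be written as a short runnable Python sketch. It is an illustration only, not code produced by the team. Two points are assumptions: the square brackets in the pseudocode are read as rounding up after subtracting 1/2, and the result is stored in a dictionary keyed by (strip, angle index) because the strip index can be negative for angles beyond π/2. The function name discrete_radon is hypothetical.

```python
import math


def discrete_radon(image):
    """Direct DRT of a square N x N image given as a list of lists."""
    n = len(image)
    radon = {}  # maps (strip index d, angle index k) -> accumulated sum
    for k in range(n):  # theta = 0, pi/N, ..., (N-1)*pi/N
        theta = k * math.pi / n
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x in range(n):
            for y in range(n):
                # Assumed reading of d = [x cos(theta) + y sin(theta) - 1/2]
                d = math.ceil(x * cos_t + y * sin_t - 0.5)
                radon[(d, k)] = radon.get((d, k), 0) + image[x][y]
    return radon
```

Because every pixel falls into exactly one strip per angle, the strip totals for any single angle add up to the sum of all pixels, which gives a simple sanity check on an implementation.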
The following steps are taken to compute the fast approximate discrete Radon transform (DRT):

- In the first pass, a set of segments of approximate length 2 is computed (the segments are the sums of two pixels).
- Next, pairs of length-2 segments are combined to form a set of segments of approximate length 4.
- In successive passes, segments of approximate length 2^i are computed, using only the length-2^(i-1) segments from the previous pass.
- After log N passes, strips of approximate length N are computed, each representing the sum of N pixels from the original image. These sums constitute the approximated DRT data.

The segment computations are illustrated in Figure 24. For each pass in the figure, one complete set of angles is highlighted, beginning in the lower right corner of the N x N original image.

Figure 24: Illustration of the Segments Computed in the First Three Passes

The N x N DRT algorithm is mapped onto an N x (log N + 1) processor butterfly, yielding an N-time pipelined algorithm. The data is pipelined by sending one column at a time into the N processors in the first stage. For a vertical line, this is a straightforward sweep through the log N stages. Other angles require data from different columns, which is achieved by inserting delays for different angles. At each step, a processor receives and stores two elements representing length-2^(i-1) segments, and adds two delayed elements to create a length-2^i segment.

Considering Figure 25, which maps the first three passes of the algorithm onto a butterfly network, one can conclude that:

- Row 0 is added to row 1 and saved in row 0.
- Row 0 is also added to a shifted row 1, and the result is saved in row 1.
- The figure represents this as the initial row 0 contributing to rows 0 and 1 in pass 1.
- Pass 2 and pass 3 are computed in the same manner.
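The row operations listed above for pass 1 can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the butterfly hardware: the shift is assumed to be by one column and to wrap around at the image edge, neither of which the text specifies, and the function name first_pass is hypothetical.

```python
def first_pass(row0, row1):
    """Combine two image rows into two sets of length-2 segments."""
    n = len(row0)
    # Assumption: "shifted" means moved by one column, wrapping at the edge.
    shifted = [row1[(j + 1) % n] for j in range(n)]
    new_row0 = [a + b for a, b in zip(row0, row1)]     # row 0 + row 1
    new_row1 = [a + b for a, b in zip(row0, shifted)]  # row 0 + shifted row 1
    return new_row0, new_row1
```

Each output element is the sum of two pixels, matching the length-2 segments of the first pass; later passes would combine these results pairwise in the same way.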
Figure 25: Mapping DRT Algorithm into a Butterfly for N = 16 Image

5.7.2 Final Status of Major Components

The final statuses of the major components of the product are listed in Table 16. Most of the major components have been successfully completed at the time of this report. The team has looked into using the FPGA to bypass a software transform calculation for a music translation on a digital keyboard as an application of the product. At the time of this report, the team has not been able to successfully interface the board and the keyboard, mainly due to time constraints. Also, the team has not been able to design an FPGA that can calculate the RT/IRT due to time constraints and differences between the RT and FFT algorithms.

Table 16: End Result of Project Components

Component                     Success/Failure   Date Completed
FPGA Design                   Success           3/19/05
OPB Transform Wrapper         Success           3/5/05
Transform Control Wrapper     Success           3/5/05
Transform Control             Success           3/4/05
Address Generation Block      Success           2/27/05
PC                            Success           2/15/05
Address Generator             Success           2/27/05
Memory Block                  Success           3/18/05
Multiplier Block              Success           2/15/05
Multiplier                    Success           2/15/05
Adder/Subtractor Block        Success           3/20/05
Logic Design                  Success           3/19/05
Adder                         Success           2/15/05
Interface with keyboard       FAILED
FPGA to Calculate RT/IRT      FAILED

Although the team was unable to complete two major components, the overall project has been a success. The team has successfully implemented an FPGA to calculate transforms, which was the goal of the project.

6. Estimated Resources and Schedules

The following section provides the original estimate, revised estimate, and actual usage of the resources needed to complete the project, including physical resources, labor, and a time schedule.

6.1 Estimated Resources

This section shows the original estimate, revised estimate, and actual man-hours performed during the project by the team, and the amount of financial resources required for completion of the project.
Although the team had performed the work to fulfill a curriculum requirement, estimated labor costs were figured into the overall project cost to simulate an industry setting. 6.1.1 Personnel Effort Requirements Table 17 contains the original estimate of personnel effort requirements for the project. The table was divided to show personal effort by each team member on each task of the statement of work defined in the project plan. Table 17 : Original Estimate of Personnel Effort Requirements Task1 Task2 Task3 Task4 Task5 Task6 Task7 Task8 End Product Documentation End product Demonstration Technology Considerations End Product Prototype End Product Testing End Product Design Problem Definition Project Reporting Implementation and Selections Personnel Name Total Sean 10 35 30 55 55 15 5 10 215 Casey Chris 12 32 35 60 50 13 5 20 227 Miller Chii Aik 13 30 33 58 52 14 5 11 216 Fang Ibrahim 14 34 35 57 58 13 5 11 227 Ali Total 49 131 133 230 215 55 20 52 885 48 Table 18 contains the revised estimate of personnel effort requirements for the project. Table 18 : Revised Estimate of Personnel Effort Requirements Task1 Task2 Task3 Task4 Task5 Task6 Task7 Task8 End Product Documentation End product Demonstration Technology Considerations End Product Prototype End Product Testing End Product Design Problem Definition Project Reporting Implementation and Selections Personnel Name Total Sean 10 35 30 55 55 15 5 10 215 Casey Chris 12 32 35 60 50 13 5 20 227 Miller Chii Aik 13 30 33 58 52 14 5 11 216 Fang Ibrahim 14 34 35 57 58 13 5 11 227 Ali Total 49 131 133 230 215 55 20 52 885 49 Table 19 contains the actual personnel effort requirements for the project. The actual total hours spent for the project was less than that of the estimate total hours. This was because the team was estimating the total hours for implementing four different transforms: FFT, IFFT, RT, and IRT. However, due to the limitation of time, only FFT and IFFT were decided to be implemented. 
Table 19: Actual Personnel Effort Requirements

The eight tasks are, in no particular order: End Product Documentation; End Product Demonstration; Technology Considerations and Selections; End Product Prototype Implementation; End Product Testing; End Product Design; Problem Definition; Project Reporting.

Personnel Name   Task1   Task2   Task3   Task4   Task5   Task6   Task7   Task8   Total
Sean Casey       10      35      30      60      55      15      5       10      220
Chris Miller     12      32      35      75      60      13      5       20      252
Chii Aik Fang    13      30      33      30      31      19      5       11      172
Ibrahim Ali      14      34      35      29      30      11      5       11      169
Total            49      131     133     194     176     58      20      52      813

6.1.2 Other Resource Requirements

Table 20 defines the original estimate of miscellaneous resources required for this project.

Table 20: Original Estimate of Other Resource Requirements

Item                 Team Hours   Cost
3 × FPGA Boards      0            Provided by the client
Xilinx Software      0            Downloaded
VHDL Materials       0            Checked out from library
Project Poster       12           $50
Total                12           $50

Table 21 defines the revised estimate of miscellaneous resources required for this project.

Table 21: Revised Estimate of Other Resource Requirements

Item                 Team Hours   Cost
3 × FPGA Boards      0            Provided by the client
Keyboard             0            Provided by the client
Xilinx Software      0            Downloaded
VHDL Materials       0            Checked out from library
Project Poster       12           $60
Bound Final Report   16           $10
Total                28           $70

Table 22 defines the actual miscellaneous resources required for this project.

Table 22: Actual Other Resource Requirements

Item                 Team Hours   Cost
3 × FPGA Boards      0            Provided by the client
Keyboard             0            Provided by the client
Xilinx Software      0            Downloaded
VHDL Materials       0            Checked out from library
Project Poster       12           $60
Bound Final Report   16           $10
Total                28           $70

6.1.3 Financial Requirements

Table 23 contains the original estimate of financial requirements of the project. The top half of the table defines the physical resources needed to successfully fulfill the requirements of the senior design course and project.
The bottom half of the table defined an estimate of the cost incurred by employing the team members to perform work on the project. Table 23 : Original Estimate of Financial Requirements Parts and Materials Cost ($) a. Course Manual 50.00 b. Project Poster 60.00 c. FPGA Boards Provided by client d. Development Tools No cost Subtotal $110.00 Labor at $10.50/hr Total Hours Cost ($) a. Sean Casey 215 2257.50 b. Chris Miller 227 2383.50 c. Chii-Aik Fang 216 2268.00 d. Ibrahim Ali 227 2383.50 Subtotal (labor) 9292.50 Project Total $9,402.50 Table 24 contains the revised estimate of financial requirements of the project. Table 24 : Revised Estimate of Financial Requirements Parts and Materials Cost ($) a. Course Manual 50.00 b. Project Poster 60.00 c. Bound Final Report 10.00 d. FPGA Boards Provided by client e. Keyboard Provided by client f. Development Tools No cost Subtotal $120.00 Labor at $10.50/hr Total Hours Cost ($) a. Sean Casey 215 2257.50 b. Chris Miller 227 2383.50 c. Chii-Aik Fang 216 2268.00 d. Ibrahim Ali 227 2383.50 Subtotal (labor) 9292.50 Project Total $9,412.50 52 Table 25 contains the actual financial requirements of the project. The actual financial requirements were less than that of the estimate financial requirements was because the client had decided to implement only the FFT and IFFT using the FPGA-chip. Table 25 : Actual Financial Requirements Parts and Materials Cost ($) a. Course Manual 50.00 b. Project Poster 60.00 c. Bound Final Report 10.00 d. FPGA Boards Provided by client e. Keyboard Provided by client f. Development Tools No cost Subtotal $120.00 Labor at $10.50/hr Total Hours Cost ($) a. Sean Casey 220 2310.00 b. Chris Miller 252 2646.00 c. Chii-Aik Fang 172 1806.00 d. Ibrahim Ali 169 1774.50 Subtotal (labor) 8536.50 Project Total $8,656.50 53 6.2 Schedules This section depicts the schedules for the project. Microsoft Project Professional 2002 was used to design the following project schedules defined by the project team. 
Figure 26, Figure 27, Figure 28, and Figure 29 show the original estimate, revised estimate, and actual project schedules. According to Figure 27, the team started implementing the FFT a week later than scheduled because of issues with the binary number representation of the input. The team spent a week resolving this problem. In addition, the team discovered that the available memory was single-port memory. This was a constraint because the team had previously decided to work with dual-port memory. Fortunately, the team was able to resolve both problems.

Figure 26: Project Schedules Part 1
Figure 27: Project Schedules Part 2
Figure 28: Project Schedules Part 3
Figure 29: Project Schedules Part 4

The following schedule, shown in Table 26, was the original estimate of the deliverables schedule for the senior design course.

Table 26: Original Estimate of Deliverables Schedule

Due date             Deliverable
September 17, 2004   Unbound project plan will be completed.
October 5, 2004      Bound project plan will be completed and posted on project webpage.
October 12, 2004     Poster will be completed.
November 12, 2004    Unbound design report will be completed.
December 15, 2004    Bound design report will be completed and posted on project webpage.
March 31, 2005       Unbound final report will be completed.
May 4, 2005          Bound final report will be completed and posted on project webpage.

Table 27 shows the revised estimate of the deliverables schedule for the senior design course.

Table 27: Revised Estimate of Deliverables Schedule

Due date             Deliverable
September 17, 2004   Unbound project plan was completed.
October 5, 2004      Bound project plan was completed and posted on project webpage.
October 12, 2004     Poster was completed.
November 12, 2004    Unbound design report was completed.
December 15, 2004    Bound design report was completed and posted on project webpage.
March 31, 2005       Unbound final report was completed.
May 4, 2005          Bound final report will be completed and posted on project webpage.

Table 28 shows the actual deliverables schedule for the senior design course.

Table 28: Actual Deliverables Schedule

Due date             Deliverable
September 17, 2004   Unbound project plan was completed.
October 5, 2004      Bound project plan was completed and posted on project webpage.
October 12, 2004     Poster was completed.
November 12, 2004    Unbound design report was completed.
December 15, 2004    Bound design report was completed and posted on project webpage.
March 31, 2005       Unbound final report was completed.
May 4, 2005          Bound final report will be completed and posted on project webpage.

7. Closing Materials

This section provides informational materials including project evaluation, commercialization, recommendations for additional work, lessons learned, risk and risk management, team contact information, closing summary, references, and appendixes.

7.1 Project Evaluation

The project has several milestones and evaluation criteria that give a concrete measure of how well the team completed the project. Each milestone was graded based on how the team performed. The criteria for judging each milestone are given in Table 29.

Table 29: Milestone Evaluation

Evaluation Result   Numerical Score
Met or Exceeded     100%
Partially Met       75%
Not Met             50%
Not Attempted       0%

The following items were identified as the project milestones. The criteria to evaluate the milestones are also given.

Problem Definition – The project will be clearly defined through a project plan. The project plan will include the operating environment, intended uses, intended users, functional requirements, assumptions and limitations, constraint considerations, and possible problems. This milestone will be evaluated on how clear the problem definition is, and whether it meets the customer's desired definition.
Research of Transforms – This milestone will be accomplished when the team has successfully researched various transforms and has decided upon which transforms should be implemented in the design. This milestone will be evaluated by analyzing the chosen transform’s ability to be implemented in hardware. Familiarity with Development Tools – In order to successfully design a chip, the team must familiarize itself with the tools needed in the design process. This milestone will be evaluated on the team’s knowledge of each tool, and each member’s ability to use the tool at an advanced skill level. Design of Chosen Algorithm – Once the team has chosen an algorithm, a design will be made that will implement the algorithm on a Xilinx™ FPGA. This milestone will be evaluated on the success of the design based on the designs functionality and size. Implementation of Algorithm – The transform algorithm chosen by the team will need to be successfully implemented and loaded into a Xilinx™ FPGA. The implementation should exhibit speed and efficiency. The team will be judged on this milestone by the success of the implementation of the algorithm in an FPGA design. 60 Testing of FPGA – The FPGA will need to be rigorously tested and benchmarked to judge performance, ability, and inability. Testing will be a valuable part of the project and will be evaluated on the team’s ability to vigorously test all areas of the design, and show the design strengths and weaknesses. Demonstration to Client – The team will present the end product and all deliverables to the client in the form of a presentation. The demonstration will be based on the team’s ability to display the full functionality of the product to the client. Final Documentation of Product – The team will prepare final documentation on the end product. The final documentation will be evaluated on the team’s ability to successfully document all phases of the project, in a form that is easy for the customer to understand and use. 
Table 30 summarizes the milestones of the project, as well as their relative importance and percentage of value to the overall project. These percentages show how the individual milestones were combined into the total project evaluation. In the project definition, the team stated that an overall score of 80% or above would be a successful project.

Table 30: Project Milestones and their Importance

Milestone                            Importance   Relative Percentage
Problem Definition                   High         10%
Research of Transforms               High         10%
Familiarity with Development Tools   Medium       5%
Design of Chosen Algorithm           High         15%
Implementation of Algorithm          High         15%
Testing of FPGA                      High         15%
Demonstration to Client              Low          5%
Final Documentation of Project       High         15%
Total                                             100%

Based on the evaluation process stated earlier, the project milestones have been evaluated using the scale in Table 29. Table 31 details the evaluation of the milestones of the project.

Table 31: Project Evaluation

Milestone                            Evaluation      Evaluation Percentage
Problem Definition                   Met             10%
Research of Transforms               Exceeded        10%
Familiarity with Development Tools   Exceeded        5%
Design of Chosen Algorithm           Partially Met   11.25%
Implementation of Algorithm          Met             15%
Testing of FPGA                      Met             15%
Demonstration to Client              Not Met         2.5%
Final Documentation of Project       Met             15%
Total                                                84%

As shown in the table above, the majority of the project's milestones were met or exceeded expectation based on the evaluation criteria given. The design of the chosen algorithm was only partially met because a few problems with the design were discovered during the testing phase. Although these problems were small, a lower evaluation was given because more time spent on the design phase would have fixed these problems. Demonstration to the client has not been met at the time of this report. The team still intends to demonstrate the project to the client. The overall evaluation of the project is that the project is a success.
Most of the milestones have exceeded expectation and the total project evaluation was 83.75%. The team considered 80% or above to indicate that the project is successful, and that score has been exceeded.

7.2 Commercialization

Software calculations of the Fourier transform are very time consuming and do not work well in real-time systems. Because of the need for the FFT and the inefficiency of the software that computes it, the commercialization of this hardware design is possible and practical. The total design cost is $8,751.00, which is a one-time cost. Any additional cost will include just the price of the chip, which is in the range of 20 to 30 dollars, according to Xilinx's website, www.xilinx.com. The street selling price, with a 25% markup, would be roughly 25 to 38 dollars. The chip could play an important role in many digital signal processing applications including optics, telecommunications, speech, and image processing. The end FPGA design or chip could be marketed to real-time circuit applications that need fast computation of Fourier transforms. It could also be marketed as portable hardware that couples to any system to perform FFT calculations. One possible system is a piano keyboard that can transcribe the notes played in real time.

7.3 Recommendations for Additional Work

Although the project is considered a success, there are some areas of the project that could be expanded into additional work.

1. Integrate the RT and IRT into the FPGA design. The team was able to successfully design an FPGA that could calculate the FFT and IFFT, but was unable to also implement the RT and IRT in the design. Future work could include research into these algorithms to find the similarities between them and the FFT/IFFT. This would allow an FPGA to be designed that could calculate four transforms on a single dedicated chip.

2. Integrate the chip into the music translation system.
The hardware chip to calculate the FFT could be used to bypass a software system that calculates the same transform on a digital keyboard used at Iowa State University. Future work could integrate the hardware chip into the system to improve translation time. 3. Improve and optimize the design of the FPGA. Although considerable time was spent analyzing and designing the FPGA, future work could include studying the team’s design and finding areas for improvement to speed up the calculation or decrease the size of the hardware. The team has recommended these three areas for additional work to future individuals or groups who would like to expand upon the project. 7.4 Lessons Learned This section provides the lessons learned by the team technically and non-technically, throughout the project. It included what went well, what did not go well, what technical knowledge was gained, what non-technical knowledge was gained, and what the team would do differently if the team had project to do over again. 7.4.1 What Went Well The team had several successes throughout the course of the project. The team was able to improve the efficiency of the pipelining of the overall system. The team was also able to improve the efficiency of the first designed n-bit complex multiplier. In addition, the team was able to reduce the number of stages needed to implement the 2-point butterfly frequency-decimated FFT. 7.4.2 What Did Not Go Well The team had a few difficulties throughout the course of the project. The team could hardly set up a meeting time other than the regular meeting time among the team members. This was because the team members were involved in on-site interviews, honor society activities, and projects and presentations for other classes. 7.4.3 Technical Knowledge Gained The team has gained knowledge in configuring an FPGA-chip to perform the FFT and IFFT. 
The team also gained knowledge and understanding of several transforms, including the FFT, IFFT, RT, and IRT. Members of the team also learned VHDL to complete the project.

7.4.4 Non-technical Knowledge Gained

Not everything that the team learned was technical. The team also gained experience in giving oral presentations and preparing formal report documentation, and learned the proper way to give an oral presentation.

7.4.5 What Would Be Done Differently If Done Again

The team would do a few aspects of the project differently. The team would start the algorithm structuring and coding phase earlier. By structuring the algorithm earlier, the team would have had more time to implement and refine the design on the FPGA chip. The team would also have researched and studied the specifics of the Xilinx chip more thoroughly. The team faced several challenges in interfacing the transform design with the memory used on the chip. A better initial understanding of the chip would have made the interface easier.

7.5 Risk and Risk Management

This section describes the anticipated potential risks of the project and the measures taken to address them. It covers the anticipated potential risks and their planned management, the anticipated risks encountered and the success in managing them, the unanticipated risks encountered and the attempts to manage them, and the resulting changes in risk management.

7.5.1 Anticipated Potential Risks and Planned Management Thereof

The first anticipated potential risk that the team planned for was the loss of a team member. To minimize the damage caused by this risk, each team member documented their work and meeting details individually.

The second anticipated potential risk was the loss of code. To minimize the damage caused by this risk, every team member kept a copy of the completed code.
The third anticipated potential risk was that the development tools would become obsolete and lose their maintenance and support resources. To minimize the damage caused by this risk, the client assured the team that the tools the client provided would function properly.

The fourth anticipated potential risk was that the technologies used (VHDL and Xilinx™ software) would be difficult and time consuming to learn. To minimize the damage caused by this risk, the team relied on one member who already had experience with VHDL, and the rest of the team learned VHDL during the fall semester. The team also used a Xilinx™ software tutorial provided by the faculty advisor.

7.5.2 Anticipated Risks Encountered and Management Thereof

Fortunately, no team member or previously completed code was lost during the project. However, the team did encounter one of the anticipated potential risks described above: the technologies were difficult and time consuming to learn. The team was not familiar with the functionality of the FPGA chip, and only one team member was familiar with VHDL. The team resolved this problem by seeking help from the graduate student and the faculty advisor.

7.5.3 Unanticipated Risks Encountered and Management Thereof

The team also encountered an unanticipated risk: the difficulty of scheduling meetings outside the regular meeting time with the advisor. This was because the team members were involved in on-site interviews, honor society activities, and projects and presentations for other classes. To resolve this problem, the team established a weekend meeting to accommodate team member schedules.
7.5.4 Resultant Changes in Risk Management Made

Because of the unanticipated risk it encountered, the team decided to hold a weekend meeting whenever necessary to discuss and resolve issues that arose during the project.

7.6 Project Team Information

This section provides contact information for the project advisor and the student team members.

7.6.1 Faculty Advisor and Client

Professor Arun Somani
Iowa State University
2215 Coover
Ames, IA 50011-0001
Phone # (515) 294-0442
Fax # (515) 294-3637
arun@iastate.edu

7.6.2 Student Team Members

Sean Casey
Electrical and Computer Engineering
218 Stanton Apt # 6
Ames, IA 50014
Phone # (515) 278-4429
caseysm@iastate.edu

Ibrahim Ali
Electrical Engineering
2609 Ferndale Ave Apt #9
Ames, IA 50010
Phone # (515) 451-1500
imali@iastate.edu

Chii-Aik Fang
Electrical Engineering
246 N Hyland #311
Ames, IA 50014
Phone # (515) 296-2194
cafang@iastate.edu

Christopher Miller
Electrical and Computer Engineering
1232 Frederiksen Court
Ames, IA 50010
Phone # (515) 572-7687
cbmiller@iastate.edu

7.7 Closing Summary

Faster, smaller, and more efficient chip designs are needed to calculate the Fourier transform in real time. The FPGA design has far-reaching, important applications in the fields of electrical and computer engineering. The FPGA chip that the team designed can be used as a building block for larger systems that, for example, could accurately record a note strummed by a skilled musician. Digital signal processing is becoming an important part of everyday life, but the current technology for calculating transforms trails the demand for speed. Through research and study, an efficient algorithm for the Fourier transform, in this case the FFT, was adapted from the software world and implemented in the design of an FPGA. The end product of this design is a hardware chip for calculating Fourier transforms that would ultimately improve current industry technology.
Figure 30: Circuit Board

7.8 References

Hue-Sung Kim. “Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RGCs)”, 2001.

Kathryn Foutaian Gossett. “The Use of a Reconfigurable Functional Cache in a Digital Signal Processor: power and performance”, 2002.

Nathan A. VanderHorn, Michael T. Frederick, Jonathan A. Lucas, and Arun K. Somani. “Real-Time Radon Transform Engine Optimized for Hardware Implementation”, Dec 19, 2003.

7.9 Appendices

The following appendices provide additional information relating to the project.

A: Testing Forms - Forms to be used for testing and project evaluation.
B: Testing Forms Completed - These forms were used in the actual testing.

A. Testing Forms

The following appendix includes sample forms for recording test results. The forms correspond to the four testing stages developed in Section 5.1.5 of this document. The following pages contain:

Unit testing form
Integration testing form
System testing form
Acceptance testing form

A.1 Unit Testing Form

Name: _____________________
Date: _____________________
Component used in testing: _____________________

Description of test:

Description of results:

Description of problems/failures:

Overall Testing (Circle One): Successful   Failed

A.2 Integration Testing Form

Name: _____________________
Date: _____________________
Components used in testing: _____________________
Integration used in testing: ________________________

Description of test:

Description of results:

Description of problems/failures:

Overall Testing (Circle One): Successful   Failed

A.3 System Testing Form

Name: _____________________
Date: _____________________

Description of test:

Description of results:

Listing of results (speed, size, etc.):

Description of problems/failures:

Overall Testing (Circle One): Successful   Failed

A.4 Acceptance Testing Form

Client Name: _____________________
Date: ___________________________

Description of test:

Description of results:

Overall Testing (Circle One): Successful   Failed

Overall system design (Circle One): Incomplete   Below Average   Satisfactory   Above Average   Excellent