									                       FPGA Implemented Transforms

                                              Final Report
                                                   May05-31

                                          Advisor and Client:
                                           Arun K. Somani

                                      Team Members:
                            Christopher Miller  ( CprE & EE )
                            Sean Casey          ( CprE & EE )
                            Ibrahim Ali                 ( EE )
                            Chii Aik Fang               ( EE )




                              REPORT DISCLAIMER NOTICE
DISCLAIMER: This document was developed as a part of the requirements of an electrical and computer engineering
course at Iowa State University, Ames, Iowa. This document does not constitute a professional engineering design or a
professional land-surveying document. Although the information is intended to be accurate, the associated students,
faculty, and Iowa State University make no claims, promises, or guarantees about the accuracy, completeness, quality,
or adequacy of the information. The user of this document shall ensure that any such use does not violate any laws
with regard to professional licensing and certification requirements. This use includes any work resulting from this
student-prepared document that is required to be under the responsible charge of a licensed engineer or surveyor. This
document is copyrighted by the students who produced this document and the associated faculty advisors. No part may
be reproduced without the written permission of the senior design course coordinator.


                                               April 27, 2005
                                                 Table of Contents

1. List of Figures
2. List of Tables
3. List of Symbols and Definitions
4. Introductory Material
   4.1 Project Description
   4.2 Executive Summary
      4.2.1 Need for the Project
      4.2.2 Actual Project Activities
      4.2.3 Final Results
      4.2.4 Recommendations for Follow-On Work
   4.3 Acknowledgements
   4.4 Problem Statement
      4.4.1 General Problem Statement
      4.4.2 General Solution Approach
   4.5 Operating Environment
   4.6 Intended Users
   4.7 Intended Uses
   4.8 Assumptions
   4.9 Limitations
   4.10 Expected End Product and Other Deliverables
5. Project Approach and Results
   5.1 Functional Requirements
   5.2 Design Requirements
      5.2.1 Design Constraints
   5.3 Approach Used
      5.3.1 Technical Approach Considerations and Results
         5.3.1.1 Programming Language
         5.3.1.2 Software/hardware used for design
         5.3.1.3 Transform chosen to implement
         5.3.1.4 Design of FPGA
   5.4 Detailed Design
      5.4.1 Discrete Fourier transform Algorithm
         5.4.1.1 Complexity of the Direct DFT Computation
      5.4.2 Fast Fourier Transform Algorithm
         5.4.2.1 Complexity of FFT Algorithm
         5.4.2.2 Frequency-decimated FFT Algorithm
         5.4.2.3 Inverse FFT Algorithm (IFFT)
      5.4.3 Detailed Design of the FFT Algorithm on an FPGA Chip
         5.4.3.1 Overall Description of System
         5.4.3.2 Overall Design
         5.4.3.3 Transform Control Design
         5.4.3.4 Transform Control Pipeline Design Description
         5.4.3.5 Memory Block
         5.4.3.6 Address Generation Block
         5.4.3.7 PC
         5.4.3.8 Address Generator
         5.4.3.9 Multiplier Block
         5.4.3.10 Adder/Subtractor Block
   5.5 Implementation Process Description
   5.6 Testing of the End Product and its Results
   5.7 End Results of the Project
      5.7.1 Research of Radon Transform
      5.7.2 Final Status of Major Components
6. Estimated Resources and Schedules
   6.1 Estimated Resources
      6.1.1 Personnel Effort Requirements
      6.1.2 Other Resource Requirements
      6.1.3 Financial Requirements
   6.2 Schedules
7. Closing Materials
   7.1 Project Evaluation
   7.2 Commercialization
   7.3 Recommendations for Additional Work
   7.4 Lessons Learned
      7.4.1 What Went Well
      7.4.2 What Did Not Go Well
      7.4.3 Technical Knowledge Gained
      7.4.4 Non-technical Knowledge Gained
      7.4.5 What Would Be Done Differently If Do Again
   7.5 Risk and Risk Management
      7.5.1 Anticipated Potential Risks and Planned Management Thereof
      7.5.2 Anticipated Risks Encountered and Management Thereof
      7.5.3 Unanticipated Risks Encountered and Management Thereof
      7.5.4 Resultant Changes in Risk Management Made
   7.6 Project Team Information
      7.6.1 Faculty Advisor and Client
      7.6.2 Student Team Members
   7.7 Closing Summary
   7.8 References
   7.9 Appendices
A. Testing Forms
   A.1 Unit Testing Form
   A.2 Integration Testing Form
   A.3 System Testing Form
   A.4 Acceptance Testing Form

1. List of Figures
Figure 1: FPGA
Figure 2: Functional Block Diagram
Figure 3: The butterfly of a 2-point frequency-decimated FFT
Figure 4: The butterfly of a 4-point frequency-decimated FFT
Figure 5: Three stages in the computation of a 8-point frequency-decimated FFT
Figure 6: FFT algorithm for computing 8-point input signal
Figure 7: IFFT algorithm for computing 8-point input signal
Figure 8: High-level block diagram of the FFT Implementation
Figure 9: High Level Xilinx Chip Layout
Figure 10: Transform Wrapper Schematic Symbol
Figure 11: High Level Design of Transform Control
Figure 12: Memory Block Control
Figure 13: PC Schematic Symbol
Figure 14: Address Generator Schematic Symbol
Figure 15: n-bit Complex Multiplier
Figure 16: 16-bit Complex Multiplier
Figure 17: n-bit Complex Carry-Lookahead Adder
Figure 18: 16-bit Complex Carry-Lookahead Adder
Figure 19: Adder/Subtractor Block
Figure 20: Testing Plan
Figure 21: Representation of an image in x-y plane
Figure 22: Representation of Strips for Summation along a Single Direction, θ
Figure 23: Overlap between Strips at Neighboring Angles is Depicted
Figure 24: Illustration of the Segments Computed in the First Three Passes
Figure 25: Mapping DRT Algorithm into a Butterfly for N = 16 Image
Figure 26: Project Schedules Part 1
Figure 27: Project Schedules Part 2
Figure 28: Project Schedules Part 3
Figure 29: Project Schedules Part 4
Figure 30: Circuit Board

2. List of Tables
Table 1: Pros and Cons of Verilog
Table 2: Pros and Cons of VHDL
Table 3: Pros and Cons of Xilinx™
Table 4: Pros and Cons of Altera™
Table 5: Pros and Cons of Fast Fourier Transform
Table 6: Pros and Cons of Radon Transform
Table 7: Pros and Cons of Pipeline Design
Table 8: Pros and Cons of Combinational Design
Table 9: Timing of Transform Control Components
Table 10: Original Space-Time Diagram for Pipeline
Table 11: Revised Space-Time Diagram
Table 12: Level and Memory Block Operation
Table 13: Address Generation for FFT/IFFT
Table 14: Twiddle Factor Addresses
Table 15: Signals in Adder/Subtractor Block
Table 16: End Result of Project Components
Table 17: Original Estimate of Personnel Effort Requirements
Table 18: Revised Estimate of Personnel Effort Requirements
Table 19: Actual Personnel Effort Requirements
Table 20: Original Estimate of Other Resource Requirements
Table 21: Revised Estimate of Other Resource Requirements
Table 22: Actual Other Resource Requirements
Table 23: Original Estimate of Financial Requirements
Table 24: Revised Estimate of Financial Requirements
Table 25: Actual Financial Requirements
Table 26: Original Estimate of Deliverables Schedule
Table 27: Original Estimate of Deliverables Schedule
Table 28: Actual Deliverables Schedule
Table 29: Milestone Evaluation
Table 30: Project Milestones and their Importance
Table 31: Project Evaluation

3. List of Symbols and Definitions
Altera™ - Manufacturer of software and hardware for FPGA design

ASIC – Application-specific integrated circuit

Balanced computing - use of dynamic resource of on-chip cache memory to offset gate
usage

DFT – Discrete Fourier transform

DTFT – Discrete-time Fourier transform

FIFO – First in first out

FPGA – Field-programmable gate array

FFT - Fast Fourier transform

IFFT – Inverse fast Fourier transform

FT - Fourier transform

HDL - Hardware description language

IDFT - Inverse discrete Fourier transform

IRT - Inverse Radon transform

LUT – Look-up table

PC – Program counter

ModelSim – Software used to simulate VHDL code

OPB – On-chip peripheral bus

RT - Radon transform

Transform engine - A device that performs transform computations

VHDL - VHSIC hardware description language - A programming language used to
design hardware

VHSIC - Very high speed integrated circuit


Xilinx™ - Manufacturer of software and hardware for FPGA design

Xilinx MultiLinx – Xilinx communication device used to download FPGA design onto a
       Xilinx board




4.      Introductory Material
This section provides an overview of the project by defining the problem, operating
environment, intended users and uses, assumptions, limitations, deliverables and
expected end product.


4.1     Project Description
Software calculations of the FFT and IFFT are very time-consuming because of their use
of complex trigonometric functions, and they do not work well in real-time systems. In
this project, the team has designed an FPGA implementation that calculates the FFT and
IFFT. This hardware implementation provides a faster method of calculating these
transforms than software is capable of.


4.2     Executive Summary
The following document is the final report for the May 05-31 senior design project “FPGA
Implemented Transforms”. It details and summarizes the design of an FFT and IFFT
calculation engine.


4.2.1         Need for the Project
Mathematical calculations such as the Fourier transform (FT) play an important role in
many digital signal processing applications, including telecommunications and image
pattern extraction. Applications based on the FT require high computational power, which
gives rise to the need to experiment with efficient algorithms. Reconfigurable hardware
devices in the form of field-programmable gate arrays (FPGAs) have been proposed as a
way of obtaining high performance, more efficient implementation, and maximum speed.
The goal of this project was to implement the design of a hardware device that would
calculate the FFT and IFFT in real time. Implementation was done using a Xilinx™ FPGA
(similar to Figure 1). This design can be used as a component in more complex projects,
such as the design of a piano keyboard system that could transcribe the notes played in
real time, or any other system in which real-time processing of the discrete Fourier
transform is desired.




                                      Figure 1: FPGA

4.2.2          Actual Project Activities
The main activity of this project was designing an FPGA that could calculate the FFT and
IFFT in hardware. This activity included studying the FFT and IFFT algorithms and
looking for ways to improve their speed and efficiency. The project also included
studying the design and layout of the hardware in order to maximize speed while
minimizing chip size.


4.2.3          Final Results
The final result of this project is an FPGA design that can calculate a 1024-point FFT and
IFFT. The design has been optimized for speed, size, and efficiency. The design also
includes several smaller sub-blocks that can be used as components in larger systems.


4.2.4          Recommendations for Follow-On Work
The FPGA design the team created can be researched and studied further to optimize its
speed and size in calculating the FFT and IFFT. Other transforms, including the RT and
IRT, can also be researched and studied for similarities to the FFT and integrated into
the design to form a chip that can calculate multiple transforms. The FPGA design can be
integrated into systems that currently use software to calculate the FFT/IFFT, as a way
to speed up the calculation process.



4.3     Acknowledgements
The design group members would like to thank Professor Arun Somani, who contributed his
expertise on a variety of subjects throughout the project and coordinated and guided the
team through the project planning process. Professor Somani also provided the hardware
for the project implementation. The team would also like to thank Ganesh Surbramanian
and Michael Frederick, graduate students in electrical and computer engineering, for
their time and resource contributions to this project.

4.4     Problem Statement
This section defines the general problem statement and the general solution approach that
was used by the team.


4.4.1          General Problem Statement
Software calculations of the Fourier transform are very time consuming due to their use
of complex trigonometric functions, and are too slow for use in real-time systems.
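
To make this cost concrete, the following minimal Python sketch evaluates the DFT
directly from its definition and counts how many complex exponentials (each one a
cosine/sine pair) are evaluated. It is an illustration only and is not part of the
team's design; the function name `direct_dft` and the counter are assumptions
introduced here.

```python
import cmath


def direct_dft(x):
    """Direct DFT from the definition X[k] = sum_n x[n] * exp(-2j*pi*k*n/N).

    Returns the transform and the number of complex exponentials evaluated.
    """
    n_points = len(x)
    exp_evals = 0
    result = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            # Each twiddle factor costs one cosine and one sine evaluation.
            acc += x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            exp_evals += 1
        result.append(acc)
    return result, exp_evals
```

For an N-point input the inner statement runs N * N times, so an 8-point input
already needs 64 exponential evaluations when nothing is precomputed or reused.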


4.4.2          General Solution Approach
In this project, the team used Xilinx™ FPGAs to implement the hardware design for
calculating the discrete Fourier transform (DFT) in real time. This hardware
implementation provided a faster method of calculating the DFT than software. The end
product of the design was a portable implementation in a hardware description language
(HDL), which could be used in more complex projects.


4.5     Operating Environment
A controlled lab is the intended operating environment for this product. Since the
product is a hardware design, it can be used and modified on a computer that has the
appropriate software. The design can also be studied on a chip to which it has been
downloaded, or on a chip built exclusively for the design.

4.6     Intended Users
The specific intended users of this project are graduate students in electrical and
computer engineering who design more complex systems that need to perform
time-intensive transform calculations. Additional end users could be designers of
application-specific integrated circuits (ASICs) who need to do similar calculations.
They would be able to use the VHDL sub-blocks and the design methodology to build a
complete transform engine.

4.7     Intended Uses
The transform engine designed was expected to be used as a component in larger systems
such as the design of a piano keyboard system that could transcribe the notes played in
real-time, or any other system in which real-time processing of the discrete Fourier
transform was desired.

4.8     Assumptions
In order for this project to be a success, several assumptions were made about the
project, including the availability of hardware and software and the feasibility of the
project as a whole. The following assumptions were made:

• It is possible to implement real-time transform calculation engines in hardware.
• Xilinx™ produces an FPGA with the needed gate count to implement such algorithms.
• All hardware and software needed for development and testing is provided by the
  client.
• Each chip designed implements one or more transforms.
• All numbers are represented as real/imaginary pairs, using 2’s complement,
  fixed-point notation.
• The number of inputs is a power of 2.
• The project can be held to a budget of $150.
• The project can be completed in two semesters.
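
The two number-format assumptions can be illustrated with a short Python sketch. The
16-bit word width and the 15 fractional bits are assumptions made here purely for
illustration; this section of the report does not state the word length or the position
of the binary point. All helper names are hypothetical.

```python
WORD_BITS = 16   # assumed word width (illustrative only)
FRAC_BITS = 15   # assumed number of fractional bits (illustrative only)


def to_fixed(value):
    """Encode a real number as a two's-complement fixed-point word."""
    scaled = int(round(value * (1 << FRAC_BITS)))
    lo, hi = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1
    scaled = max(lo, min(hi, scaled))        # saturate to the representable range
    return scaled & ((1 << WORD_BITS) - 1)   # two's-complement bit pattern


def from_fixed(word):
    """Decode a two's-complement fixed-point word back to a float."""
    if word & (1 << (WORD_BITS - 1)):        # sign bit set: negative value
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)


def encode_complex(z):
    """Represent a complex number as a (real, imaginary) pair of words."""
    return to_fixed(z.real), to_fixed(z.imag)


def is_power_of_two(n):
    """True when n is a positive power of 2, as assumed for the input length."""
    return n > 0 and (n & (n - 1)) == 0
```

Under these assumed widths, `encode_complex(0.5 - 0.5j)` yields the pair
`(0x4000, 0xC000)`, and `is_power_of_two(1024)` holds for the 1024-point transform
mentioned in the Final Results.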

4.9      Limitations
For this project to proceed as planned, the limitations imposed by the technology being
used were considered. The following limitations were identified in the project:
• The clock speed of the circuits was limited to the FPGA's specified maximum clock
  speed; thus the algorithms chosen must compute the transforms as efficiently as
  possible to achieve real-time status.
• The number of I/O pads available for data was set for each FPGA; thus I/O formats
  must be optimized.
• The client had only three versions of Xilinx™ FPGAs on which testing was done;
  thus the designs must be optimized for those specific chips.
• The team's limited knowledge of various discrete transform algorithms.
• The team's limited knowledge of VHDL and specific Xilinx™ functions.




4.10    Expected End Product and Other Deliverables
The deliverables of this project include the following:
• Design Methodology: A method for designing real-time transform engines by using
  parallel computations and generic hardware sub-blocks arranged to produce the
  desired output. The design methodology was completed by May 4, 2005.
• Sub-Block VHDL Code: Implementations of generic blocks needed for transform
  calculations. Examples of such blocks include memory controllers, on-chip storage,
  and computational sub-blocks including a PC, address generator, multiplier, and
  adder. The sub-block VHDL code will be delivered to the client by May 4, 2005.
• Transform VHDL Code and finalized FPGA: Implementation of two transform
  engines: FFT and IFFT. The code for the engines was composed of several VHDL
  sub-blocks, as defined above, with additional logic to put them together. The code
  was downloaded to the FPGA to produce a functioning transform engine with the
  desired output within the targeted time constraints. The finalized FPGA was
  demonstrated to the client by April 31, 2005, and the code delivered by May 4, 2005.
• Final Report: Because the implemented transform engine would be used in larger
  systems, documentation of the finalized FPGA was critical. This final report includes
  documentation of the design methodology used, the VHDL sub-block designs, and
  the overall implemented transform design. The final report was submitted to the
  client by May 4, 2005.




5.      Project Approach and Results
The following sections provide a detailed description of the team’s approach and
product results.

5.1     Functional Requirements
The hardware design fulfilled certain functional requirements that define exactly what
the end product should and should not do. The end product must accomplish the following:
• User input - The hardware receives complex numbers as input from the user.
• Initiate calculation - The user commands the chip to perform the calculation.
• Perform high-speed calculations - The hardware outputs the transform (either FFT
  or IFFT) of the input complex numbers. The hardware design must achieve high
  efficiency.
• Termination - The chip indicates when the computation is complete.
• User output - The user retrieves the complex output numbers from the chip.

Figure 2 depicts these operations in a very high-level block diagram.


[Figure 2: Functional Block Diagram. The user inputs the complex numbers (real and
imaginary parts) to the FPGA, and the chip is commanded to do the computation. The
hardware performs a high-speed calculation of the FT, the chip indicates when the
computation is complete, and the user retrieves the output FT of the complex numbers
(real and imaginary parts).]

        5.2    Design Requirements
        The requirements of this hardware design were developed as a result of the problem
        statement. These design requirements have been expanded and clarified during the
        project. Design requirements were to provide the following:



     • Fast method of calculating FT - Software calculations of the FT are time-consuming
       due to their use of complex trigonometric functions. The hardware design (FPGA-based
       transform engine) described herein uses application-specific hardware to evaluate
       those complex trigonometric functions and output the FT. The completely
       hardware-based computation provides a fast method of calculating the
       Fourier transform.
     • Component in complex systems - The hardware end product enables the designed
       FPGA to be used as a component in complex systems in which real-time transforms
       are needed. A piano keyboard system that transcribes the notes played in real time
       is one example of such a system. Systems such as these are currently being
       designed by graduate students in electrical and computer engineering.

5.2.1            Design Constraints
The constraints of the project were derived from the assumptions and limitations
mentioned earlier. These constraints are:
• Speed vs. size - The trade-off between the time to calculate the FT and the size of
  the FPGA design (gate count) is limited by the processing time and the size of the
  input signals. The end product depends greatly on the speed of the operation and the
  number of gates used in the FPGA design.
• I/O format - All numbers are represented as real/imaginary pairs in 2’s
  complement, fixed-point notation. The number of inputs is a power of 2.
• Functionality - The FPGA is only responsible for the calculation of the FFT and
  its inverse, the IFFT.
• Finances - With a $150 budget, the team relied on the resources already available to
  students at Iowa State University. The team had to ensure access to the needed labs
  for a sufficient amount of time.
• Time - With only two semesters to complete the project, the team needed to budget
  time effectively.


5.3      Approach Used
This section describes the approach the team used to ensure a high probability
of project success.

5.3.1            Technical Approach Considerations and Results
In order to complete the design of the project, the team considered several different
technological approaches, weighed their advantages and disadvantages, and decided
which approach would be most beneficial to the project. The approaches
considered are listed in the following sections.

5.3.1.1       Programming Language
The team had two options for programming language: Verilog or VHDL. The trade-offs
are summarized in Table 1 and Table 2.




                           Table 1: Pros and Cons of Verilog
Advantages of Verilog                       Disadvantages of Verilog
• IEEE standard                             • Limited support of system level
• Supported by EDA vendors                    modeling
                                            • Limited simulation


                            Table 2: Pros and Cons of VHDL
Advantages of VHDL                          Disadvantages of VHDL
• IEEE standard                             • Harder to learn
• Supported by EDA vendors                  • Not as easy to use
• High support for modeling
• Simulation is far more comprehensive
• VHDL preferred by customer
• Readily available resources

Result: The team used VHDL to code the hardware. VHDL was chosen mainly because it
was the preferred language of the client, and because resources and guides for VHDL
were readily available. The team also judged that VHDL offers stronger support than
Verilog for system modeling and for simulation of the FPGA design.

5.3.1.2      Software/hardware used for design
The team had two options for tools to use to help design the FPGA. Both Xilinx™ and
Altera™ were available at Iowa State. The tradeoffs are summarized in Table 3 and
Table 4.
                           Table 3: Pros and Cons of Xilinx™
Advantages of Xilinx™                      Disadvantages of Xilinx™
• Readily available at Iowa State          • Never been used by team members
• Numerous resources available,
  including examples of similar projects
  completed using Xilinx™
• Numerous tutorials to help learn
• Xilinx™ boards provided by customer



                           Table 4: Pros and Cons of Altera™
Advantages of Altera™                       Disadvantages of Altera™
• Readily available at Iowa State           • Lack of resources available for
• Previous experience of team members         complex projects



Result: The team used Xilinx™ to complete the project. Even though the team had
prior experience with Altera™, far more resources were available for Xilinx™.
These resources included complex designs similar to the team’s project
and were important in helping the team learn the complex tools needed to complete the
project.

5.3.1.3        Transform chosen to implement
The team narrowed the candidate transforms down to two choices: the fast Fourier
transform and the Radon transform. The tradeoffs are summarized in
Table 5 and Table 6.

                     Table 5: Pros and Cons of Fast Fourier Transform
Advantages of fast Fourier transform        Disadvantages of fast Fourier transform
• The team has experience using FFT         • Fast Fourier transform has limited room
• The team has studied the FFT, and           to maximize speed
  developed an algorithm that can be
  implemented into hardware


                       Table 6: Pros and Cons of Radon Transform
Advantages of Radon transform               Disadvantages of Radon transform
• Growing applications in the avionics      • Customer already has working design
  field                                       of an FPGA to calculate the Radon
• Limited past research into Radon            transform
  transform has been done, so research
  would be innovative

Result: The team designed an FPGA to calculate the fast Fourier transform. Because the
customer already had an FPGA design for the Radon transform, the team designed only
the fast Fourier transform. The customer had expressed interest in an FPGA design of
the fast Fourier transform, and the team aimed to improve its speed as much as
possible.




5.3.1.4     Design of FPGA
The design of the FPGA could be done in two different ways, as either a pipelined
design, or a combinational design. The tradeoffs are summarized in Table 7 and Table 8.
                          Table 7: Pros and Cons of Pipeline Design
Advantages of Pipeline Design               Disadvantages of Pipeline Design
• A pipeline design will improve the        • A pipeline design is complex to design,
  speed and efficiency of the hardware        implement and test
• The FFT and IFFT algorithm support a
  pipeline design

                       Table 8: Pros and Cons of Combinational Design
Advantages of Combinational Design          Disadvantages of Combinational Design
• A combinational design will be easy to    • No advantages in speed or size are
  implement, design, and test                 gained by using a combinational design
                                            • Some delay is encountered because of
                                              the FFT algorithm

Result: The team designed a pipelined FPGA to calculate the fast Fourier transform.
The benefits of increased speed and efficiency, as well as decreased size, outweighed the
simplicity of a combinational design.




5.4      Detailed Design
The design of the project includes many different areas of hardware, software, and fast
Fourier transforms. The details of the design are listed below:

5.4.1            Discrete Fourier transform Algorithm
The discrete Fourier transform (DFT) is defined as the frequency samples of the Fourier
transform. This should not be confused with the discrete-time Fourier transform. They are
not the same! Before examining the DFT in detail, consider the following two cases:

      1. x[n] is an infinite sequence
      A discrete-time signal x[n] can be recovered unambiguously from its Fourier
      transform through the inverse Fourier transform. In order to do this, the values of its
      Fourier transform for all frequencies in the range [-π, π] must be known. Knowing
      the values at only a finite number of frequency points in this range is not
      sufficient to recover the signal, since the signal x[n] is an infinite sequence in general.

      2. x[n] is a finite sequence
      If x[n] has a finite number of terms, say {0 ≤ n ≤ N-1}, then knowing the values of
      its Fourier transform at N frequency points is sufficient to recover the signal, if these
      frequency points are chosen properly. In other words, the Fourier transform of this
      signal can be sampled at N points and the signal can be recovered from these
      samples. One way to justify this claim is as follows: The Fourier transform is a
      linear operation. Therefore, the values of the Fourier transform at N frequency points
      provide N linear equations in N unknowns. The N unknowns here are the signal
      values. From algebra, such a system of equations has a unique solution if the
      coefficient matrix is nonsingular. Therefore, if the frequency points are chosen to
      satisfy this condition, the signal values can be computed unambiguously.

      The sampled Fourier transform of a finite-duration, discrete-time signal is known as
      the discrete Fourier transform. The DFT contains a finite number of samples, equal
      to the number of input signal samples. By definition, the DFT is

      $$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad W_N = e^{-j2\pi/N}, \quad 0 \le k \le N-1 \tag{1}$$



5.4.1.1       Complexity of the Direct DFT Computation
For an input sequence of length N, the number of arithmetic operations in direct
computation of the DFT is proportional to N^2. (Direct DFT computation means
computing the DFT using Equation 1.) In general, the DFT operation is a multiplication
of a complex N×N matrix by a complex N-dimensional vector. Therefore, the operation
requires N^2 complex multiplications and N(N-1) complex additions. Since the elements
in the first row and the first column of the DFT matrix are all 1, the number of
multiplications can be reduced by 2N-1. The DFT operation then involves
N^2 - 2N + 1 = (N-1)^2 complex multiplications and N(N-1) complex additions.
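
The direct computation and its operation count can be illustrated with a minimal Python sketch. This is a software illustration only, not the team's hardware design; the function names direct_dft and direct_dft_cost are hypothetical and chosen for this example.

```python
import cmath

def direct_dft(x):
    """Evaluate Equation 1 literally: X[k] = sum over n of x[n] * W_N^(k*n)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N = e^(-j*2*pi/N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

def direct_dft_cost(N):
    """Operation counts quoted in the text for the direct computation."""
    multiplications = (N - 1) ** 2  # N^2 minus the 2N-1 products by 1
    additions = N * (N - 1)
    return multiplications, additions
```

For N = 4, direct_dft_cost returns 9 complex multiplications and 12 complex additions.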

5.4.2          Fast Fourier Transform Algorithm
The fast Fourier transform (FFT) algorithm is a more efficient computational scheme for
computing the DFT. As its name implies, the FFT algorithm computes the DFT faster
by reducing its computational complexity. Its publication by Cooley and Tukey in 1965
was a major breakthrough in digital signal processing. They showed that when the
DFT length N is a composite (factorable) number, the DFT can be
decomposed into a number of DFTs of shorter length, and that the total number
of operations needed to compute the shorter DFTs is less than that of direct computation
of the original DFT.

5.4.2.1        Complexity of FFT Algorithm
Each of the shorter DFTs can be decomposed into even shorter DFTs until all the
DFTs are of prime length, the prime factors of N. The DFTs of prime length are then
computed directly. The total number of operations in this scheme depends on the
factorization of N into prime factors. In this project, N was chosen to be an integer
power of 2. Therefore, the total number of operations is proportional to N*log2(N),
which is much smaller than N^2 for large N. Because the only prime factor of N is 2,
the decomposition ends in 2-point DFTs, each computed by the 2-point DFT butterfly
(Figure 3).
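
To give a sense of scale, the following Python sketch compares the two growth rates for the 1024-point transform targeted by this project. The counts are proportionality estimates only (constant factors are ignored), and the function names are hypothetical.

```python
import math

def direct_ops(N):
    """Order-of-magnitude operation count for the direct DFT."""
    return N * N

def fft_ops(N):
    """Order-of-magnitude operation count for a radix-2 FFT (N a power of 2)."""
    return N * int(math.log2(N))

N = 1024
ratio = direct_ops(N) / fft_ops(N)  # how many times fewer operations the FFT needs
```

For N = 1024 this gives 1,048,576 versus 10,240, a reduction by a factor of about 100.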

[Figure 3: The butterfly of a 2-point frequency-decimated FFT. Inputs x[0] and x[1]
produce outputs X[0] and X[1]; the lower branch carries a multiplication by -1.]




In Figure 3, the following conventions are used:
     A line with an arrow indicates signal flow.
     A circle around a ‘+’ sign, with two or more lines leading to it, indicates addition.
     A constant number above a line indicates multiplication of the signal flowing in
       that line by the constant number.




To show visually the reduction of total number of operations, refer to the following
example:
Example:
Suppose x[n] had length N = 4; then its DFT is

$$X[k] = \sum_{n=0}^{3} x[n]\, W_4^{kn}$$

$$X[0] = \sum_{n=0}^{3} x[n] = x[0] + x[1] + x[2] + x[3]$$

$$X[1] = \sum_{n=0}^{3} x[n]\, W_4^{n} = x[0] + x[1]W_4 + x[2]W_4^{2} + x[3]W_4^{3}$$

$$X[2] = \sum_{n=0}^{3} x[n]\, W_4^{2n} = x[0] + x[1]W_4^{2} + x[2]W_4^{4} + x[3]W_4^{6}$$

$$X[3] = \sum_{n=0}^{3} x[n]\, W_4^{3n} = x[0] + x[1]W_4^{3} + x[2]W_4^{6} + x[3]W_4^{9}$$
By inspection,

         Total number of operations = 9 multiplications + 12 additions = 21

As depicted in Figure 4, the DFT is decomposed into shorter DFTs, length-2 DFTs in this
case. Therefore,

         Total number of operations = 1 multiplication + 8 additions = 9

The number of operations is reduced significantly by employing the FFT algorithm.



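[Figure 4: The butterfly of a 4-point frequency-decimated FFT. Inputs, top to bottom:
x[0], x[2], x[1], x[3]; outputs, top to bottom: X[0], X[1], X[2], X[3]. The signal lines
carry one multiplication by the twiddle factor W4 and two multiplications by -1.]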




According to Figure 4, the values of the DFT are the following:

$$X[0] = \sum_{n=0}^{3} x[n] = x[0] + x[1] + x[2] + x[3]$$

$$X[1] = \sum_{n=0}^{3} x[n]\, W_4^{n} = x[0] + x[1]W_4 + x[2]W_4^{2} + x[3]W_4^{3} = x[0] + x[1]W_4 - x[2] - x[3]W_4$$

$$X[2] = \sum_{n=0}^{3} x[n]\, W_4^{2n} = x[0] + x[1]W_4^{2} + x[2]W_4^{4} + x[3]W_4^{6} = x[0] - x[1] + x[2] - x[3]$$

$$X[3] = \sum_{n=0}^{3} x[n]\, W_4^{3n} = x[0] + x[1]W_4^{3} + x[2]W_4^{6} + x[3]W_4^{9} = x[0] - x[1]W_4 - x[2] + x[3]W_4$$

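Simplification of the twiddle factors:
Using Euler’s formula, $e^{j\omega} = \cos(\omega) + j\sin(\omega)$.
For N = 4:

$$W_4 = e^{-j2\pi/4} = \cos(\pi/2) - j\sin(\pi/2) = -j$$

$$W_4^{3} = -W_4 = j, \qquad W_4^{5} = W_4 = -j, \qquad W_4^{7} = -W_4 = j, \qquad W_4^{9} = W_4 = -j$$

In general,

$$W_4^{2n+1} = \begin{cases} -j, & n \text{ even} \\ \phantom{-}j, & n \text{ odd} \end{cases}$$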

As shown in the above example, the twiddle factors, WNkn can be pre-calculated and
stored in a look-up table. Every time the twiddle factor is needed, it can be retrieved from
the look-up table.
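
The look-up table idea can be sketched in Python as follows. This is an illustration in floating point, not the fixed-point table stored on the FPGA; twiddle_table and twiddle are hypothetical names, and the full N-entry table is an assumption made for simplicity.

```python
import cmath

def twiddle_table(N):
    """Pre-compute W_N^m = e^(-j*2*pi*m/N) for m = 0 .. N-1."""
    return [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]

def twiddle(table, k, n):
    """Look up W_N^(k*n); the exponent wraps around because W_N^N = 1."""
    return table[(k * n) % len(table)]

table4 = twiddle_table(4)
w = twiddle(table4, 3, 3)  # W_4^9, which the example above reduces to W_4 = -j
```

Because the exponent wraps modulo N, no trigonometric function is evaluated after the table has been built.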


5.4.2.2        Frequency-decimated FFT Algorithm
The frequency-decimated FFT algorithm is obtained by using the divide-and-conquer
approach. To derive the algorithm, the DFT formula is divided into two summations:
one over the first N/2 data points and the other over the last N/2 data points.

               N 1
X[k]     =      x[n] WNkn
               n 0
               N / 2 1           N 1
         =        x[n] WNkn +  x[n] WNkn
                 n 0            n N / 2
                                                    let n = r + N/2
               N / 2 1          N / 2 1
         =        x[n] WNkn +
                 n 0
                                   x[r  N / 2] WNk(r+N/2)
                                  r 0




                                                                                         14
             N / 2 1              N / 2 1
        =      x[n] WNkn +
              n 0
                                     x[r  N / 2] WNkr WNkN/2
                                    r 0
                                                                     Note: WNkN/2 = (-1)k
             N / 2 1                         N / 2 1
        =      x[n] WNkn + (-1)k
              n 0
                                                x[r  N / 2] WNkr
                                               r 0
            N / 2 1
        =     {x[n] +(-1)kx[n+N/2]}WNkn
             n 0




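Thus, the decimated DFT can be divided into even and odd samples by the following:

$$\begin{aligned}
X[2k]   &= \sum_{n=0}^{N/2-1} \left\{ x[n] + (-1)^{2k} x[n+N/2] \right\} W_N^{2kn} \\
        &= \sum_{n=0}^{N/2-1} \left\{ x[n] + x[n+N/2] \right\} W_{N/2}^{kn} \\
X[2k+1] &= \sum_{n=0}^{N/2-1} \left\{ x[n] + (-1)^{2k+1} x[n+N/2] \right\} W_N^{(2k+1)n} \\
        &= \sum_{n=0}^{N/2-1} \left\{ x[n] - x[n+N/2] \right\} W_{N/2}^{kn}\, W_N^{n}
\end{aligned}$$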


The DFT of the input signal is computed using the frequency-decimated FFT algorithm.
Adders, multipliers, and registers are used to perform the computation. Figure 5 shows
the breakdown in computation of DFT using the frequency-decimated FFT algorithm.

[Figure 5: Three stages in the computation of an 8-point frequency-decimated FFT. The
inputs x[0] through x[7] pass through one 8-point DFT stage, two 4-point DFT stages, and
four 2-point DFT stages. The outputs appear, top to bottom, in the order X[0], X[4], X[2],
X[6], X[1], X[5], X[3], X[7].]




5.4.2.3       Inverse FFT Algorithm (IFFT)
Once the FFT algorithm is obtained, the IFFT is computed by reusing the FFT
algorithm. The IFFT is described by the following equation:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \qquad n = 0, 1, \ldots, N-1$$


The procedure for an 8-point signal is as follows:
    • Conjugate the FFT coefficients X[k] to obtain X*[k].
    • Compute the FFT of X*[k] as shown in Figure 6.

[Figure 6: FFT algorithm for computing 8-point input signal. The inputs X*[0] through
X*[7] pass through one 8-point DFT stage, two 4-point DFT stages, and four 2-point DFT
stages. The outputs appear, top to bottom, in the order x*[0], x*[4], x*[2], x*[6],
x*[1], x*[5], x*[3], x*[7].]

    • Scale the resulting x*[n] from Figure 6 by 1/N.
    • Conjugate x*[n] to obtain the IFFT x[n]. If the signal is real-valued, this final
      conjugation operation is not needed.
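
The four steps above can be sketched in Python as follows. The forward transform here is a direct DFT used as a stand-in (any FFT routine could take its place), and the function names are hypothetical; this is a software illustration, not the hardware data path.

```python
import cmath

def fft(x):
    """Forward transform stand-in: a direct DFT, used here only for brevity."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def ifft_via_fft(X):
    """Inverse transform built from the forward transform and two conjugations."""
    N = len(X)
    conj_in = [v.conjugate() for v in X]    # step 1: conjugate X[k]
    y = fft(conj_in)                        # step 2: forward transform of X*[k]
    scaled = [v / N for v in y]             # step 3: scale by 1/N
    return [v.conjugate() for v in scaled]  # step 4: conjugate the result
```

The benefit of this formulation is that the same forward-transform hardware serves both directions; only the conjugations and the 1/N scaling differ.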

These procedures constitute the IFFT algorithm and are shown together in Figure 7:




[Figure 7: IFFT algorithm for computing 8-point input signal. The input signal is split
into Re(X[k]) and Im(X[k]), and the imaginary part is multiplied by (-1) to form X*[k].
The FFT algorithm of Figure 6 is applied, the outputs are scaled by (1/N), and the
imaginary part is multiplied by (-1) again to give the IFFT of the signal, Re(x[n]) and
Im(x[n]).]



5.4.3           Detailed Design of the FFT Algorithm on an FPGA Chip
Implementing the algorithms just described on an FPGA is not an easy task. The structure
of the system must be optimized for speed and size. The project team implemented a
design to compute the FFT and IFFT. As described in Section 5.4.2 Fast Fourier
Transform Algorithm, the IFFT and FFT can be implemented using just a multiplier,
adder, and subtractor. The team’s design uses complex carry-lookahead adders and
complex multipliers built on carry-save addition to maximize dataflow through the system.
A description of the hardware design to calculate the FFT and IFFT is given next.


5.4.3.1        Overall Description of System
The project team used a Xilinx board to implement the FPGA. The FPGA is designed to
handle a 1024-point FFT and IFFT transform calculation, with each point having a real
and imaginary part. Figure 8 shows a high-level block diagram of the system.




[Figure 8: High-level block diagram of the FFT Implementation. Input: 1024-point image.
The FPGA contains the on-chip memory and the transform algorithm. Output: transformed
1024-point image.]




5.4.3.2       Overall Design
The FPGA design is divided into three separate parts: the transform control, transform
wrapper, and OPB transform wrapper. The design of the FPGA for a Xilinx board is
shown in Figure 9.




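[Figure 9: High Level Xilinx Chip Layout. The OPB transform wrapper contains the
transform wrapper, which in turn contains the transform control and the OPB control.
The memory sits between the transform logic and the OPB memory controller, and the
OPB bus connects to the OPB transform wrapper and the OPB memory controller.]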


 As shown in the layout, the Xilinx chip includes an OPB bus, memory controller, and
memory. The other components, including the OPB transform wrapper, transform
wrapper, transform control and OPB control, were designed and implemented by the
project team.

The user loads a 1024-point image of complex values into the Xilinx chip using C code.
Each input value must be stored as a 16-bit fixed-point number, with 8 bits dedicated to
the fraction and 8 bits dedicated to the whole number. The on-chip memory holds all of
the numbers needed for the computation. Once the memory has been loaded with the
1024-point image, the user starts the transform calculation by writing to the “start
transform” register. The user also selects whether to calculate the FFT or IFFT by
writing to the “transform to be calculated” register. When the computation is complete,
the system notifies the user by writing a 1 to the “transform complete” register. The
user may then retrieve the resulting complex values from memory.
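
The number format can be illustrated with a short Python sketch. It assumes the 16-bit word is 2’s complement (as stated in Section 5.2.1) with the sign carried in the 8 whole-number bits, and it assumes round-to-nearest on conversion; the function names are hypothetical and this is not the team's C loader code.

```python
def to_q8_8(value):
    """Encode a real number as a 16-bit two's-complement word with 8 fraction bits."""
    scaled = round(value * 256)  # 2**8 steps per unit
    if not -32768 <= scaled <= 32767:
        raise OverflowError("value does not fit in a 16-bit word with 8 fraction bits")
    return scaled & 0xFFFF       # keep the low 16 bits (two's-complement wrap)

def from_q8_8(word):
    """Decode a 16-bit word with 8 fraction bits back to a float."""
    if word & 0x8000:            # sign bit set: the value is negative
        word -= 0x10000
    return word / 256
```

Under these assumptions the representable range is -128 to just under +128 in steps of 1/256, so 1.5 is stored as 0x0180 and -1.0 as 0xFF00.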




The OPB bus is used to communicate between the processor on the chip, the chip’s
memory, and the user-designed FPGA logic. The actual design of the FFT/IFFT transform
hardware is in the transform control and is discussed in the next section. The transform
wrapper and OPB transform wrapper are used to simplify interaction with the OPB bus
and memory.

The memory is a dual-port RAM, with one port dedicated to communication with the OPB
memory controller. The other memory port is dedicated to the OPB transform wrapper,
which sees the memory as a single-port RAM, since only one port is available to it.

The memory controller is included with the chip, and was not designed or written by the
project team.

The transform wrapper maps the ports of the transform control to interact with the
memory, as well as the OPB control. It is in this level that the signals to start the
transform, select the transform to calculate, signal overflow, and signal the transform is
complete are registered. These signals can be modified through the OPB bus. The
schematic diagram of the transform wrapper is shown in Figure 10. The signals
Bus2IP_Clk, Bus2IP_CS, Bus2IP_RdCE, Bus2IP_WrCE, Bus2IP_Reset, Bus2IP_Data,
Bus2IP_Addr, IP2Bus_Data, and IP2Bus_Ack are used to communicate with the OPB
Bus. The ports real_doa, real_dob, imag_doa, imag_dob, real_addra, real_addrb,
real_dina, real_dinb, real_enablea, real_enableb, real_wea, real_web, imag_addra,
imag_addrb, imag_dina, imag_dinb, imag_enablea, imag_enableb, imag_wea, imag_web,
twiddle_real_doa, twiddle_real_addra, twiddle_real_dina, twiddle_real_enablea,
twiddle_real_wea, twiddle_imag_doa, twiddle_imag_addra, twiddle_imag_dina,
twiddle_imag_enablea, and twiddle_imag_wea are used to communicate with the
memory. The port names can be deciphered using the following rules:
     • real indicates the port communicates with the real component memory
     • imag indicates the port communicates with the imaginary component memory
     • An a at the end of the port name means the port belongs to memory block 1
     • A b at the end of the port name means the port belongs to memory block 2
     • do indicates the port is the data output from memory
     • addr indicates the port is an address port
     • din indicates the port is the data input to memory
     • enable is the enable for the memory port
     • we is the write enable for the memory port
The memory ports are discussed further in Section 5.4.3.5 Memory Block.
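
The naming rules can be expressed as a small Python decoder. This helper is hypothetical and exists only to illustrate the convention; reading the twiddle_ prefix as "twiddle-factor memory" is an assumption, since the rules above do not define it.

```python
import re

# prefix (optional) _ component _ function + block letter, e.g. twiddle_real_addra
PORT_RE = re.compile(r"^(twiddle_)?(real|imag)_(do|addr|din|enable|we)(a|b)$")

FUNCTION = {
    "do": "data output from memory",
    "addr": "address",
    "din": "data input to memory",
    "enable": "port enable",
    "we": "write enable",
}

def decode_port(name):
    """Split a memory port name into its component, function, and block number."""
    m = PORT_RE.match(name)
    if m is None:
        raise ValueError("not a memory port name: " + name)
    prefix, part, func, block = m.groups()
    component = "real" if part == "real" else "imaginary"
    if prefix:  # assumption: the prefix selects the twiddle-factor memory
        component = "twiddle " + component
    return {"memory": component,
            "function": FUNCTION[func],
            "block": 1 if block == "a" else 2}
```

For example, decode_port("imag_dinb") reports the data input of the imaginary component memory, block 2.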




                     Figure 10: Transform Wrapper Schematic Symbol



The OPB wrapper allows the transform wrapper to interact with the OPB bus. This
wrapper was provided by Michael Frederick (listed in the acknowledgements), and is
used to handle the decoding of the OPB bus to decipher when the bus is communicating
with the FPGA, when it is writing to the chip, and when it is reading from the chip.


5.4.3.3        Transform Control Design
The transform control is the heart of the FPGA design. The calculations of the FFT
and IFFT are implemented in hardware in the transform control.


The team decided to implement these transforms using a pipeline design in order to make
the calculation as fast as possible, as discussed in Section 5.3.1 Technical Approach
Considerations and Results.

The pipeline design consists of 9 stages and a high level design is shown in Figure 11.
As shown in the figure, two stages are dedicated to the address generation block, one
stage is dedicated for the memory block, four stages are dedicated to the multiplier, and
two stages are dedicated to the adder/subtractor block. These blocks of the transform
control are made up of several sub-components including a PC, shifter, look up table,
address generator, multiplier, adder/subtractor, and multiplexers and registers. The
memory block in the figure refers to the Xilinx chip memory shown in Figure 9, and is
not actually included in the transform control, but is shown for clarity. Details of each
component are given in the following sections.


[Figure 11: High Level Design of Transform Control. Stages 1-2: Address Generation
Block; Stage 3: Memory Block; Stages 4-7: Multiplier Block; Stages 8-9: Adder/Subtractor
Block.]

5.4.3.4        Transform Control Pipeline Design Description
The team implemented a pipeline design to maximize the computation speed of the
transform. A 9-stage design was chosen based on the time delays associated with the
components of the transform control. Timing delays for the components are given in
Table 9. The timing delays for all of the components designed by the team were obtained
from the Xilinx synthesis report and were used to estimate the computation time. The
memory access time was obtained from p. 58 of the Virtex II data sheet provided by
Xilinx.




                     Table 9: Timing of Transform Control Components
Component                     Min. Period Time               Max Clock Frequency
PC                            3.79ns                         263.644 MHz
Address Generator             3.45ns                         289.855 MHz
Multiplier                    17.34ns                        57.67 MHz
Add/Subtractor                6.54ns                         152.91 MHz
Memory Access Time            1.54ns                         650.35 MHz

As the table shows, the pipeline design will be based mainly on the multiplier. Since the
multiplier consists of levels of addition, it can easily be broken up into stages for a
pipeline design. The adder/subtractor can also be divided into stages, since it is
dependent on two memory reads. Therefore, the team has broken the multiplier block
into four stages, the adder/subtractor block into two stages, and the PC and address
generator into two stages.
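
As a rough check of why the multiplier sets the pace, the per-stage delay of each block
can be estimated from Table 9. The sketch below is only a back-of-the-envelope estimate:
it assumes that each block's delay divides evenly across its pipeline stages and that
register overhead is negligible, neither of which is stated in this report. It also
assumes the PC and the address generator occupy one stage each. The names `BLOCKS` and
`per_stage_delays` are illustrative.

```python
# Minimum period (ns) from Table 9 and the number of pipeline stages per block.
# Assumption: a block's delay splits evenly across its stages (register overhead ignored).
BLOCKS = {
    "PC": (3.79, 1),
    "Address Generator": (3.45, 1),
    "Memory Access": (1.54, 1),
    "Multiplier": (17.34, 4),
    "Add/Subtractor": (6.54, 2),
}


def per_stage_delays(blocks):
    """Return {block name: estimated delay of one pipeline stage in ns}."""
    return {name: period / stages for name, (period, stages) in blocks.items()}


delays = per_stage_delays(BLOCKS)
slowest = max(delays, key=delays.get)           # block with the longest single stage
total_stages = sum(stages for _, stages in BLOCKS.values())
max_clock_mhz = 1000.0 / delays[slowest]        # 1 / period, with the period in ns

print(f"{total_stages} stages, slowest stage: {slowest} at {delays[slowest]:.3f} ns")
print(f"estimated clock limit: {max_clock_mhz:.1f} MHz")
```

Under these assumptions the multiplier stage (about 4.3 ns) is still the longest stage,
which is consistent with basing the pipeline on the multiplier.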


In order to study the pipeline design, the team came up with a space-time diagram to
depict the pipeline stages. The team’s original space-time diagram is shown in
Table 10. The diagram was simplified by using only an 8-point image.




                                          Table 10: Original Space-Time Diagram for Pipeline
Rows are instructions; columns are time steps.
(R = memory read, W = memory write, S1-S4 = multiplier stages 1-4, "-" = idle cycle)

Time    0     1       2       3        4        5        6        7        8          9              10             11
I1      PC 0  Addr 0  R x[0]  S1 x[0]  S2 x[0]  S3 x[0]  S4 x[0]  -        x[0]+x[4]  PC 1           Addr 4         R x[4]/W x[2]
I2            PC 1    Addr 4  R x[4]   S1 x[4]  S2 x[4]  S3 x[4]  S4 x[4]  x[0]-x[4]  -              PC 2           Addr 1
I3                    PC 2    Addr 1   R x[1]   S1 x[1]  S2 x[1]  S3 x[1]  S4 x[1]    -              x[1]+x[5]      PC 3
I4                            PC 3     Addr 5   R x[5]   S1 x[5]  S2 x[5]  S3 x[5]    S4 x[5]        x[1]-x[5]      -
I5                                     PC 4     Addr 2   R x[2]   S1 x[2]  S2 x[2]    S3 x[2]        S4 x[2]        -
I6                                              PC 5     Addr 6   R x[6]   S1 x[6]    S2 x[6]        S3 x[6]        S4 x[6]
I7                                                       PC 6     Addr 3   R x[3]     S1 x[3]        S2 x[3]        S3 x[3]
I8                                                                PC 7     Addr 7     R x[7]/W x[0]  S1 x[7]        S2 x[7]
I9                                                                         PC 0       Addr 0         R x[0]/W x[1]  S1 x[0]


In the original space-time diagram, both an adder and a subtractor are needed at the same
time. After studying this diagram, the team minimized chip space by using the same adder
to calculate the subtraction. This was done by shifting the subtraction stage one time
slot to the right. The revised space-time diagram is shown in Table 11.




                                               Table 11: Revised Space-Time Diagram
Rows are instructions; columns are time steps.
(R = memory read, W = memory write, S1-S4 = multiplier stages 1-4, "-" = idle cycle)

Time    0     1       2       3        4        5        6        7        8          9              10             11
I1      PC 0  Addr 0  R x[0]  S1 x[0]  S2 x[0]  S3 x[0]  S4 x[0]  -        x[0]+x[4]  PC 1           Addr 4         R x[4]/W x[2]
I2            PC 1    Addr 4  R x[4]   S1 x[4]  S2 x[4]  S3 x[4]  S4 x[4]  -          x[0]-x[4]      PC 2           Addr 1
I3                    PC 2    Addr 1   R x[1]   S1 x[1]  S2 x[1]  S3 x[1]  S4 x[1]    -              x[1]+x[5]      PC 3
I4                            PC 3     Addr 5   R x[5]   S1 x[5]  S2 x[5]  S3 x[5]    S4 x[5]        -              x[1]-x[5]
I5                                     PC 4     Addr 2   R x[2]   S1 x[2]  S2 x[2]    S3 x[2]        S4 x[2]        -
I6                                              PC 5     Addr 6   R x[6]   S1 x[6]    S2 x[6]        S3 x[6]        S4 x[6]
I7                                                       PC 6     Addr 3   R x[3]     S1 x[3]        S2 x[3]        S3 x[3]
I8                                                                PC 7     Addr 7     R x[7]/W x[0]  S1 x[7]        S2 x[7]
I9                                                                         PC 0       Addr 0         R x[0]/W x[1]  S1 x[0]


In the revised diagram, addition and subtraction never take place in the same time slot;
therefore only one adder is needed.
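
The difference between the two schedules can be checked mechanically. The following
sketch lists the time slots in which the add and subtract operations of Tables 10 and 11
fall and reports any slot that would need more than one adder. The function name
`adder_conflicts` and the list layout are illustrative and are not part of the team's
design files.

```python
from collections import Counter


def adder_conflicts(schedule):
    """Return the time slots in which more than one add/subtract is scheduled."""
    counts = Counter(slot for _, slot in schedule)
    return sorted(slot for slot, n in counts.items() if n > 1)


# (operation, time slot) pairs read from the space-time diagrams
original = [("x[0]+x[4]", 8), ("x[0]-x[4]", 8), ("x[1]+x[5]", 10), ("x[1]-x[5]", 10)]
revised = [("x[0]+x[4]", 8), ("x[0]-x[4]", 9), ("x[1]+x[5]", 10), ("x[1]-x[5]", 11)]

print("original conflicts:", adder_conflicts(original))
print("revised conflicts:", adder_conflicts(revised))
```

An empty conflict list for the revised schedule means a single shared adder is enough.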




5.4.3.5         Memory Block
A 1024x16-bit single-port RAM memory block is used to store the 1024 data points in
2’s complement form. A 512x16-bit single-port RAM memory block was used as a look-
up table to store constant twiddle multiplier numbers.

During the FFT and IFFT transform calculations memory reads and writes occur at the
same time. Rather than losing a clock cycle by alternating between a memory read and
write, the team used two single-port RAMs for the 1024 data points, and switched
between reading and writing to each block. Therefore, a total of six memory blocks are
used in the design of the system. Four single-port RAM memory blocks (two for the real
part, two for the imaginary part), of size 1024x16, are used to store the input/output. Two
single-port RAM memory blocks (one for the real part and one for the imaginary part), of
size 512x16, are used to store the twiddle multiplication factors.

Since each level of the FFT needs to read the results calculated in the previous level,
reading and writing must switch between memory blocks at every level. Figure 12 shows
the design of the memory and memory controller. For a 1024-point FFT and IFFT, there
will be 10 levels of calculation. The level and corresponding memory block operation are
given in Table 12.


                                         Read/Write
                Switch                    Decoder




             Real              Imaginary             Real              Imaginary
            Memory              Memory              Memory              Memory
            Block 1             Block 1             Block 2             Block 2

                              Figure 12: Memory Block Control




                        Table 12: Level and Memory Block Operation
Level    Real Memory    Real Memory    Imaginary Memory    Imaginary Memory
         Block 1        Block 2        Block 1             Block 2
1        Read           Write          Read                Write
2        Write          Read           Write               Read
3        Read           Write          Read                Write
4        Write          Read           Write               Read
5        Read           Write          Read                Write
6        Write          Read           Write               Read
7        Read           Write          Read                Write
8        Write          Read           Write               Read
9        Read           Write          Read                Write
10       Write          Read           Write               Read
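
The alternation in Table 12 is a ping-pong buffering scheme. The sketch below models it
in software under simplifying assumptions: each memory block is a plain list, one level
is applied by a caller-supplied function, and the pipeline write-back delay is ignored.
The names `block_roles`, `run_levels`, and `level_fn` are hypothetical.

```python
def block_roles(level):
    """Return (Block 1 operation, Block 2 operation) for a 1-based level, as in Table 12."""
    return ("Read", "Write") if level % 2 == 1 else ("Write", "Read")


def run_levels(data, levels, level_fn):
    """Apply level_fn once per level, swapping the read and write banks each time."""
    banks = [list(data), [0] * len(data)]    # banks[0] = Block 1, banks[1] = Block 2
    for level in range(1, levels + 1):
        read_bank = (level - 1) % 2          # odd levels read Block 1, even levels read Block 2
        write_bank = 1 - read_bank
        src, dst = banks[read_bank], banks[write_bank]
        for i in range(len(src)):
            dst[i] = level_fn(level, src, i)
    return banks[levels % 2]                 # the bank written during the final level


# Placeholder level function: add 1 to every point at every level.
result = run_levels([1, 2, 3, 4], 10, lambda level, src, i: src[i] + 1)
print(result)
```

With 10 levels the final results end up in Block 1, matching the last row of Table 12.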

Due to the 9-stage pipeline design, the last 9 data points of a level will not have been
written back to memory when the ports are switched, because of the stage delay. The team
simply used a buffer to store the last 9 data points calculated in the FPGA; whenever
these addresses need to be read, they are read from the buffer.

In calculating the FFT and IFFT, data is written back to memory sequentially. But as
described in Section 5.4.2, the read data for the DFT is not always sequential. The next
section describes how these read addresses are calculated.


5.4.3.6        Address Generation Block
To calculate the FFT/IFFT, data must be written back sequentially, but read addresses are
not always sequential. Through analysis of the FFT and IFFT algorithms, the team
discovered that the read address is calculated in each level by rotating a PC-generated
address to the right by 1 bit. Table 13 shows this address generation for the four levels
of a 16-point transform calculation; the table can be expanded to the 10 levels of a
1024-point transform calculation.




                            Table 13: Address Generation for FFT/IFFT
(Level columns give the actual data location read at each level; data is written to
memory in sequential order.)
PC generated    Read address    Level 1    Level 2    Level 3    Level 4
  address
      0               0             0          0          0          0
      1               8             8          4          2          1
      2               1             1          8          4          2
      3               9             9         12          6          3
      4               2             2          1          8          4
      5              10            10          5         10          5
      6               3             3          9         12          6
      7              11            11         13         14          7
      8               4             4          2          1          8
      9              12            12          6          3          9
     10               5             5         10          5         10
     11              13            13         14          7         11
     12               6             6          3          9         12
     13              14            14          7         11         13
     14               7             7         11         13         14
     15              15            15         15         15         15

As the table shows, rotating the PC-generated address 1 bit to the right each time
results in the correct data being read from memory at each level of the transform
calculation.
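
The rotation can be reproduced in a few lines of software. The sketch below is a model
for the 16-point case (4-bit addresses), not the team's VHDL; the function name
`rotate_right` is illustrative. Because data is written back sequentially and read with a
1-bit rotation at every level, the Level n columns of Table 13 are consistent with
rotating the PC value right by n bits in total.

```python
def rotate_right(value, amount, width):
    """Rotate a width-bit value right by amount bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value >> amount) | (value << (width - amount))) & mask


WIDTH = 4  # 16-point example; a 1024-point transform would use 10 bits

# Read address column of Table 13: PC rotated right by 1 bit
read_addresses = [rotate_right(pc, 1, WIDTH) for pc in range(16)]
print(read_addresses)

# Level n column of Table 13: PC rotated right by n bits
for level in range(1, 5):
    print(level, [rotate_right(pc, level, WIDTH) for pc in range(16)])
```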

The twiddle factor look-up table addresses are also generated in the address generation
block. The twiddle addresses are found by taking the read address, and shifting to the left
by N bits, with 0s shifted in, where N is equal to the level number minus one. If the most
significant bit of the shifted address is a 0, then the twiddle address is 0. However, if the
most significant bit is a one, then the twiddle address is the remaining least significant
bits (9 for a 1024-point image). In the first level of calculation, the twiddle address is
always 0. Table 14: Twiddle Factor Addresses shows the twiddle addresses generated for
a 16-point image. This table can be expanded similarly to a 1024-point image.




                             Table 14: Twiddle Factor Addresses
(Level columns give the twiddle address generated at each level.)
PC generated    Read address    Level 1    Level 2    Level 3    Level 4
  address
      0               0             0          0          0          0
      1               8             0          0          0          0
      2               1             0          0          0          0
      3               9             0          4          2          2
      4               2             0          0          0          0
      5              10             0          0          0          0
      6               3             0          1          0          0
      7              11             0          5          2          2
      8               4             0          0          0          0
      9              12             0          0          0          0
     10               5             0          2          1          0
     11              13             0          6          3          2
     12               6             0          0          0          0
     13              14             0          0          0          0
     14               7             0          3          1          0
     15              15             0          7          3          2

The address generation block consists of two components: a PC, and an address
generator.


5.4.3.7         PC
The PC is a simple block, but it is very important to the system. It is similar to most
standard PCs: it counts up on every rising clock edge, from 0 to 2^10 - 1. The PC is also
responsible for keeping track of the level of the transform being calculated. After each
rollover, the PC increments its level count by 1, and this level count is used in the
twiddle factor address generation. Once the PC level count reaches 10, the transform
completed signal is set on the next PC rollover and can be read from the register by the
user.

The PC block is shown in Figure 13. The clk port is the clock for the PC. The i_run port
is the register signal that starts the transform calculation. The reset_f port clears the
PC value on a reset. The o_level output keeps track of the transform level and is input
to the twiddle address generator. The o_PC output is the 9-bit PC output. The o_done
signal is used to indicate when a transform has been completed.
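
The counting behaviour described above can be modelled in software as follows. This is a
behavioural sketch, not the team's VHDL: the class name `ProgramCounter`, the `tick`
method, and the choice to start the level count at 1 are assumptions, and the i_run and
reset_f inputs are left out for brevity.

```python
class ProgramCounter:
    """Behavioural model: counts 0 .. 2**bits - 1 and tracks the transform level."""

    def __init__(self, bits=10, levels=10):
        self.size = 1 << bits
        self.levels = levels
        self.pc = 0          # models o_PC
        self.level = 1       # models o_level (assumed to start at 1)
        self.done = False    # models o_done

    def tick(self):
        """Advance one rising clock edge."""
        if self.done:
            return
        if self.pc == self.size - 1:          # rollover
            self.pc = 0
            if self.level == self.levels:     # rollover during the last level
                self.done = True
            else:
                self.level += 1
        else:
            self.pc += 1


# Small example: 16 points (4 bits) and 4 levels
counter = ProgramCounter(bits=4, levels=4)
ticks = 0
while not counter.done:
    counter.tick()
    ticks += 1
print("clock edges until done:", ticks)
```

In this model a 1024-point transform asserts done after 10 x 1024 clock edges; the
pipeline fill and drain latency is not included.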




                                Figure 13: PC Schematic Symbol

5.4.3.8         Address Generator
The lookup table address is generated according to the algorithm described in Section
5.4.3.6 Address Generation Block. The address generator includes a bit rotator that
rotates a 10-bit number to the left by 1, and rotates in a 0. The 10-bit number is the
read address generated by the shifter. For the twiddle address, if the transform is in
the first level, the generator returns address 0. For the remaining levels, the read
address is rotated to the left by the level number minus one bits. If the 10th bit is a
zero, the look-up table twiddle address generator returns address 0. If the 10th bit is a
one, it returns the nine least significant bits of the rotated number.

The team implemented the shifter and LUT address generator as one unit, called the
address generator, because both use a rotation to generate addresses.

The address generator block is shown in Figure 14. The i_level and i_PC signals are
output from the PC, once they have been registered from stage 1 in the pipeline. The clk
signal is the clock input, and the reset_f signal clears the address generator on a reset.
The o_data_addr is the 10-bit read address, and the o_twiddle_addr is the 9-bit twiddle
address.




                        Figure 14: Address Generator Schematic Symbol




5.4.3.9        Multiplier Block
The 16-bit complex multiplier takes two complex numbers and multiplies them, truncating
the output to a 16-bit number. Figure 15 shows the construction of an n-bit multiplier
using n-bit carry-save adders, carry-lookahead adders, and carry-lookahead subtractors,
which was used as a reference for the 16-bit multiplier. Carry-save adders were chosen
for the multiplier because of their very high calculation speed and their adaptability to
a pipelined process. For large data widths, a different algorithm for computing the
multiplication may need to be explored.




                             Figure 15: n-bit Complex Multiplier


Figure 16 shows the VHDL schematic symbol for the multiplier. The ports i_a_imag,
i_a_real, i_b_imag, and i_b_real are the 16-bit input signals to multiply. The clk port
is used to register the values in the multiplier for the pipeline, as discussed in
Section 5.4.3.4 Transform Control Pipeline Design Description, and the reset_f pin is
used to clear these registers on a reset. The o_y_imag and o_y_real ports are the 32-bit
output signals. Only the least significant 16 bits of these signals are actually used.
The o_ovfl_imag and o_ovfl_real ports are used to indicate whether an overflow exception
occurred during multiplication.
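
The arithmetic performed by the multiplier can be sketched in software as below. This is
a numerical model only: it computes the product directly instead of modelling the
carry-save structure, and it assumes that overflow means the full result does not fit in
a signed 16-bit value, which this report does not define. The function names are
illustrative; the returned values mirror o_y_real, o_y_imag, o_ovfl_real, and o_ovfl_imag.

```python
def to_signed16(value):
    """Keep the least significant 16 bits and interpret them as 2's complement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def complex_multiply_16(a_real, a_imag, b_real, b_imag):
    """Return (y_real, y_imag, ovfl_real, ovfl_imag) for (a_real + j*a_imag)*(b_real + j*b_imag)."""
    full_real = a_real * b_real - a_imag * b_imag
    full_imag = a_real * b_imag + a_imag * b_real
    y_real = to_signed16(full_real)
    y_imag = to_signed16(full_imag)
    # Assumed overflow rule: flag set when truncation changed the value
    return y_real, y_imag, y_real != full_real, y_imag != full_imag


print(complex_multiply_16(1, 2, 3, 4))      # (1 + 2j) * (3 + 4j) = -5 + 10j
print(complex_multiply_16(300, 0, 300, 0))  # 90000 does not fit in 16 bits
```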




                            Figure 16: 16-bit Complex Multiplier




5.4.3.10        Adder/Subtractor Block
The adder/subtractor block is used to add and subtract two points in the transform
calculation. The team implemented one 16-bit adder. To calculate a subtraction, the same
adder is used with the carry-in bit set to one and the subtracted input inverted.
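
This works because, in 2's complement arithmetic, a - b equals a + (NOT b) + 1. A minimal
software sketch of the idea, with illustrative function names and 16-bit values held as
unsigned bit patterns, is:

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit adder: returns the low 16 bits of a + b + carry_in."""
    return (a + b + carry_in) & MASK16


def sub16(a, b):
    """Subtract using the same adder: invert b and set the carry-in to one."""
    return add16(a, ~b & MASK16, carry_in=1)


print(sub16(7, 3))       # 4
print(hex(sub16(3, 7)))  # 0xfffc, the 16-bit 2's complement pattern for -4
```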

The n-bit complex carry-lookahead adder was used as a model for the system's adder block.
This block performs fast complex addition by using the carry-lookahead technique. The
team implemented a 16-bit complex carry-lookahead adder. An n-bit complex carry-lookahead
adder is shown in Figure 17.




                           Figure 17: n-bit Complex Carry-Lookahead Adder


The 16-bit complex carry-lookahead adder’s VHDL schematic symbol is shown in Figure
18. The ports i_a and i_b are the two 16-bit input signals, and o_sum is the 16-bit
output signal carrying the sum of i_a and i_b. The signal i_carry is used as the initial
carry bit, and o_ovfl indicates whether an overflow occurred in the addition.




                      Figure 18: 16-bit Complex Carry-Lookahead Adder


The adder/subtractor block also includes one stage of registers and two multiplexers.
The extra components are needed due to the design of the FFT/IFFT algorithms. In the
algorithm, each DFT calculation performs an addition and a subtraction of two points,
after each point is multiplied by the appropriate twiddle factor. Because the system uses
only one multiplier, the output for the first point is held for three additional clock
cycles: one to wait for the second point to finish multiplication, one to be added to the
second point, and one to perform a subtraction with the second point. The second data
point is held for two additional clock cycles: one to perform an addition with the first
point, and one to perform a subtraction with the first point. For the subtraction, the
carry-in bit is set and the second data point is inverted.

The team implemented the logic circuit shown in Figure 19: Adder/Subtractor Block to
handle this problem. Table 15 details the operations of this circuit for an 8-point
transform calculation.




      [Block diagram with the following labeled elements: Multiplier, Add/Sub control
       signal, Mux, Register 1, Register 2, Inverter, Mux, and Adder with carry-in (cin)]

                              Figure 19: Adder/Subtractor Block




                          Table 15: Signals in Adder/Subtractor Block
                                        Signal
Time             Multiplier     Add/Sub     Register 1            Register 2   Operation
0                x[0]           1           x[0]                  -            -
1                x[4]           0           x[0]                  x[0]         x[0]+x[4]
2                x[1]           1           x[0]                  x[4]         x[0]-x[4]
3                x[5]           0           x[1]                  x[1]         x[1]+x[5]
4                x[2]           1           x[1]                  x[5]         x[1]-x[5]
5                x[6]           0           x[2]                  x[2]         x[2]+x[6]
6                x[3]           1           x[2]                  x[6]         x[2]-x[6]
7                x[7]           0           x[3]                  x[3]         x[3]+x[7]
8                x[0]           1           x[3]                  x[7]         x[3]-x[7]
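
The Operation column of Table 15 follows a regular pattern: each pair of points is added on an odd cycle and subtracted on the following even cycle. The short Python sketch below regenerates that column. The function name butterfly_schedule is hypothetical, and applying it to sizes other than the 8-point case shown in the table is an assumption.

```python
def butterfly_schedule(n_points):
    """List (time, operation) pairs for one stage, as in Table 15."""
    half = n_points // 2  # distance between the two points of each pair
    schedule = []
    for t in range(1, n_points + 1):
        k = (t - 1) // 2                  # index of the first point of the pair
        op = "+" if t % 2 == 1 else "-"   # add on odd cycles, subtract on even
        schedule.append((t, f"x[{k}]{op}x[{k + half}]"))
    return schedule

for time, operation in butterfly_schedule(8):
    print(time, operation)
```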


No additional blocks are needed to calculate an FFT. The IFFT, however, requires a
complex conjugate to be taken, so a multiplexer has been added after the adder. The
multiplexer chooses between the output of the adder and the conjugate of the output of
the adder. The conjugate is taken only in the 10th level of the IFFT transform
calculation.
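
Conjugation lets forward-transform hardware produce an inverse transform. The Python sketch below shows the textbook identity IDFT(X) = conj(DFT(conj(X))) / N using a naive DFT. It is an assumption that the design relies on this identity. The report states only that the conjugate of the adder output is selected in the 10th level, and does not say whether the inputs are also conjugated or where the 1/N scaling is applied. The function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive forward DFT, used here only as a reference model."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft_via_conjugate(spectrum):
    """Inverse DFT built from the forward DFT and two conjugations."""
    n = len(spectrum)
    forward = dft([complex(v).conjugate() for v in spectrum])
    # Conjugating the forward result and scaling by 1/N yields the inverse.
    return [v.conjugate() / n for v in forward]
```

Passing the output of dft back through idft_via_conjugate recovers the original samples up to floating-point rounding.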


These four main blocks make up the design of the transform control and are used to
implement an FPGA that can calculate both an FFT and IFFT.

5.5    Implementation Process Description
In order to design an FPGA that could be used to calculate transforms, the team had to
apply the mathematical algorithms to calculate the transforms to hardware. This process
was implemented through research and discussion. The team studied the algorithms to
find how they could be broken into hardware components, and how the hardware
components could interact to calculate the transform.

The team implemented the hardware by breaking down the hardware design into smaller
units. These smaller units were used as building blocks for the larger system. Once a
smaller unit was designed, it was tested extensively until the team was satisfied with its
success. At this point the unit was integrated into its sub-block based on the pipeline
design. These sub-blocks were again tested extensively until the team was satisfied
that they were complete. Once all of the sub-blocks were completed, they were integrated
together to form the FPGA.

In order to design an FPGA to calculate transforms, the team had to use several different
tools. The majority of the components were written in VHDL. All of the hardware
components were designed in the Xilinx Integrated Student Edition 6 program, and
ModelSim was used to test them. Once a working design of the FPGA was completed, it
was downloaded onto a Xilinx board using a Xilinx MultiLinx. The FPGA was then tested
on hardware using a serial port to communicate with the board. C code and Java were
used to communicate with the chip and to interpret data input and output.

One major problem was encountered during the implementation. The transform hardware
was designed for 16-bit fixed-point numbers, with 8 bits dedicated to the whole number
and 8 bits dedicated to the fraction, while the data was fed into the system through C
code. The team spent a considerable amount of time working out how to convert a float
or double into a 16-bit fixed-point number.
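
One way to perform this conversion is to scale the value by 2^8 and round to an integer. The Python sketch below illustrates the arithmetic, which carries over directly to C. It assumes a signed two's complement format, round-to-nearest, and saturation at the range limits. The report does not state which of these choices the team made, and the function names are hypothetical.

```python
import math

FRACTION_BITS = 8
SCALE = 1 << FRACTION_BITS  # 256 steps per unit

def float_to_q8_8(value):
    """Convert a float to a 16-bit word: 8 whole-number bits, 8 fraction bits."""
    scaled = math.floor(value * SCALE + 0.5)   # round to the nearest step
    scaled = max(-32768, min(32767, scaled))   # assumed: saturate, do not wrap
    return scaled & 0xFFFF                     # two's complement bit pattern

def q8_8_to_float(word):
    """Interpret a 16-bit word as a signed fixed-point value."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / SCALE
```

With these assumptions, 1.5 maps to 0x0180 and -1.0 maps to 0xFF00, and the smallest representable step is 1/256.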

This implementation process was fairly successful, although one area could have been
improved. The team needed to better design the FPGA before trying to implement the
system. This could have saved valuable time spent trying to overcome implementation
problems.




5.6    Testing of the End Product and its Results
The team conducted four different types of testing during the project. The test plan is
illustrated in the diagram in Figure 20. Project design and planning is given on the left
side of the diagram, and testing is given on the right side. Unit testing, integration
testing, system testing, and acceptance testing were used to verify the functionality of the
FPGA design. Each type of testing corresponded to a particular aspect of the planning of
the project.




      Planning (left side)                Testing (right side)
      Requirements            <->         Acceptance
      Architecture            <->         System
      Design                  <->         Integration
      Code                    <->         Unit
      (horizontal axis: Time)

                                    Figure 20: Testing Plan

Two members of the group, Sean Casey and Chris Miller, wrote all of the code for the
system. Therefore, the other two members, Ibrahim Ali and Chii Aik Fang, were
unfamiliar with the underlying code of the subcomponents of the system. These two
members were valuable in the testing process because they were able to test the
components extensively without knowledge of how the system worked internally.

The testing was conducted by breaking down the stages of the pipeline into the four main
components shown in Figure 11: High Level Design of Transform Control. This way,
each stage could be verified as working, before integrating it with any other stage. Each
stage was broken down into its subcomponents, which were unit tested before being
integrated into the stage component. This type of testing allowed us to identify problems
early in the design process.

Each of the types of tests is detailed on the next several pages.



1. Unit testing: Unit testing was performed after the initial code for the project sub-
   blocks was developed. Unit testing was performed by the team in the hardware lab in
   Coover, using ModelSim. Unit testing was used to test code to see if the individual
   components of the system were working and were coded correctly. Unit testing only
   tested the functionality of the smallest components of the system. If the tests showed
   failures, the code was fixed and retested until the individual components were
   functional, and the team could not find any errors. The tests were both automated and
   interactive, with the team vigorously testing boundary conditions to ensure that
   all cases were covered. A sample unit testing form is given in Appendix A.1.

   The following sub-blocks of the system were unit tested, and their completed testing
   forms are given in the Appendix.

          PC – The PC was tested using a ModelSim test bench. The ModelSim display
           was analyzed to see if the PC was incrementing properly and outputting the
           appropriate signals. Once the team analyzed the PC for the entire 1024-point
           transform, and all values were checked, the PC was verified to be accurate.
           The test form and sample ModelSim output are shown in the appendix.
          Twiddle address generator – The twiddle address generator was tested using a
           ModelSim test bench. The ModelSim display was analyzed to see if the
           twiddle address generator was incrementing properly and outputting the
           appropriate signals. Once the team analyzed the twiddle address generator for
           the entire 1024-point transform, and all values were checked, the twiddle
           address generator was verified to be accurate. The test form and sample
           ModelSim output are shown in the appendix.
          Multiplier – The multiplier was tested using a ModelSim test bench. The
           boundary values of the multiplier were tested as well as several other values.
           Once extensive testing was done on the multiplier, and the outputted values
           were verified, the multiplier passed the testing phase. The results of the test
           bench are given in the appendix.
          Adder – The adder was tested using a ModelSim test bench. . The boundary
           values of the adder were tested as well as several other values. Once
           extensive testing was done on the adder, and the outputted values were
           verified, the adder passed the testing phase. The results of the test bench are
           given in the appendix.


2. Integration Testing: As the team completed the testing of individual components of
   the system, the components were integrated together. Through integration testing, the
   team determined if the components were interacting correctly as described in the
   design. Integration testing was also performed only by the team, and took place in
   Coover. Once again, the test was both automated and interactive, with the team
   testing for functionality, and for boundary conditions, making sure that the code
   functioned properly when boundaries were reached. The team continued integration
   testing until satisfied that components worked together as specified in the design. A
   sample integration testing form is given in Appendix A.2.


   The following blocks of the system were integration tested, and their completed
   testing forms are given in the Appendix.

          Address generation block – The address generator block was tested using a
           ModelSim test bench. The block was tested by members of the team without
           knowledge of the inner workings of the block. Once the team analyzed the
           address generator block for the entire 1024-point transform, and all values
           were checked, the address generator block was verified to be accurate. The
           results of the test bench are given in the appendix.
          Adder/subtractor Block – The adder/subtractor block was tested using a
           ModelSim test bench. The boundary values of the adder/subtractor block
           were tested as well as several other values. Once extensive testing was done
           on the adder/subtractor block, and the outputted values were verified, the
           adder/subtractor block passed the testing phase. The results of the test bench
           are given in the appendix.

3. System Testing: Once integration testing was completed, the overall system was
   tested. System testing was performed by the team. The testing was both automated
   and interactive. A sample system testing form is given in Appendix A.3.

          The main system testing was performed on the transform control wrapper; the
           results of the testing are given in the appendix. One problem with memory
           switching was discovered in this testing phase, and it is shown in the
           appendix.

4. Acceptance Testing: At the time of this report, acceptance testing has yet to be
   completed. During acceptance testing, the client will test the FPGA for acceptance.
   The client will evaluate the functionality of the device, the speed of calculation,
   the size of the design, and the improvement over current similar technologies. The
   criteria for judging success in acceptance testing are determined by the client, and
   are specified by the functional and non-functional requirements of the project.
   Acceptance testing will be complete when the client is satisfied. A sample acceptance
   testing form is given in Appendix A.4.

Through this testing, the team has tested both the functionality and performance of the
FPGA. Through this testing and retesting the team has maximized the performance of
the FPGA.




5.7 End Results of the Project
The end result of the project was an FPGA design that could calculate the FFT and IFFT
transforms.

Two members of the team, Ibrahim Ali and Chii Aik Fang, spent time researching the
Radon transform and inverse Radon transform. Due to time constraints and differences
between the RT and FFT algorithms, these transforms could not be implemented in
the FPGA design. Their research is given in the following section.




5.7.1          Research of Radon Transform
The Radon transform is another transform that can be used in many image processing
applications. The following section describes the Radon transform.

Before introducing the discrete Radon algorithm, some important points need to be
explained. These points are addressed below:

       In the x-y plane, an image I(x,y) is represented as an N x N array of pixels, as
        shown in Figure 21.




                       [Diagram: grid of pixels making up the image I(x,y), with X and Y axes]

                       Figure 21: Representation of an image in x-y plane


       Every pixel represents the average gray level of a unit square in the image.
       The discrete Radon transform (DRT) is the set of projections of this image taken by
        integrating along lines defined by this equation:

   x cos θ + y sin θ = d

       d is the distance between a line and the origin, and θ is the angle of the line with
        respect to the y-axis. Refer to Figure 22.
       An image in (x, y) space is thus transformed into Radon space (d, θ).




          Figure 22: Representation of Strips for Summation along a Single Direction, θ




Now the discrete Radon transform (DRT) can be computed with the following
procedure:

      For any given angle θ, each pixel lies in exactly one strip. Therefore, for each
       pixel we simply compute its strip δ (relative to θ) and add the pixel value to the
       current total for (δ, θ).
      This procedure is repeated for each value of θ. A simple code description of this
       algorithm is given below:


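   /* N equally spaced angles: 0, π/N, ..., (N - 1)π/N */
   for (θ = 0; θ ≤ (N - 1)π/N; θ += π/N)
   {
               for (x = 0; x < N; x++)
               {
                       for (y = 0; y < N; y++)
                       {
                               /* [ ] rounds to the integer index of the strip */
                               d = [x cos θ + y sin θ - 1/2];
                               /* add the pixel to the total for strip (d, θ) */
                               R[d][θ] = R[d][θ] + I[x][y];
                       }
               }
   }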
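
The triple loop above can be written as a runnable Python sketch. Two choices here are assumptions that the report does not specify: the bracket is read as rounding to the nearest integer, which makes the strip index floor(x cos θ + y sin θ), and the sums are stored in a dictionary keyed by (d, angle index) so that negative strip indices need no offset. The function name discrete_radon is hypothetical.

```python
import math

def discrete_radon(image):
    """Direct DRT of a square image given as a list of N lists of N values."""
    n = len(image)
    sums = {}
    for t in range(n):                      # angle index; theta = t * pi / N
        theta = t * math.pi / n
        c, s = math.cos(theta), math.sin(theta)
        for x in range(n):
            for y in range(n):
                # Assumed rounding: nearest integer of (x cos + y sin - 1/2).
                d = math.floor(x * c + y * s)
                sums[(d, t)] = sums.get((d, t), 0) + image[x][y]
    return sums
```

Because each pixel lies in exactly one strip per angle, the strip totals for any single angle add up to the sum of all pixels in the image.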



Fast approximate discrete Radon transform:
For neighboring angles, large subsets of pixels may be shared by different strips. On the
left of Figure 23, the discrete lines represented by two neighboring angles are shown; on
the right, their representation as unit-width strips is shown.

Thus, one could potentially save time by computing such shared partial sums only once
for use in two or more lines.




              Figure 23: Overlap between Strips at Neighboring Angles is Depicted


A parallel algorithm, designed to take maximum advantage of intermediate terms, is
constructed to compute an approximation to the desired DRT. The computation is
divided into four parts corresponding to four equal-sized ranges of angles:

                         [0, π/4], [π/4, π/2], [π/2, 3π/4], and [3π/4, π].

The algorithm is then applied to each range of angles. The following steps are
taken to compute the fast approximation of the discrete Radon transform (DRT):

       In the first pass, a set of segments of approximate length 2 is computed (each
        segment is the sum of two pixels).
       Next, pairs of length-2 segments are combined to form a set of segments of
        approximate length 4.
       In successive passes, segments of approximate length 2^i are computed, using
        only the length-2^(i-1) segments from the previous pass.
       After log2 N passes, strips of approximate length N have been computed, each
        representing the sum of N pixels from the original image. These sums constitute
        the approximated DRT data.
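
The doubling passes above can be sketched for one of the four angle ranges. The Python code below uses a common dyadic recurrence for fast approximate Radon transforms. It is an assumption that this matches the algorithm the team researched, and the function name, the (slope, start row) parameterization, and the orientation (segments grow across columns) are all hypothetical choices. N must be a power of two.

```python
def fast_drt_quadrant(image):
    """Approximate line sums for slopes 0..N-1 (rise in rows over N columns)."""
    n = len(image)
    rows = range(-n + 1, n)  # start rows, including lines entering from outside
    # Pass 0: every length-1 segment is a single pixel (zero outside the image).
    seg = {(0, h, a): (image[h][a] if 0 <= h < n else 0)
           for h in rows for a in range(n)}
    length = 1
    while length < n:
        nxt = {}
        for a in range(n // (2 * length)):      # block of 2*length columns
            for s in range(length):             # rise of each half segment
                for h in rows:
                    left = seg[(s, h, 2 * a)]
                    # Right half continues where the left half ended.
                    nxt[(2 * s, h, a)] = left + seg.get((s, h + s, 2 * a + 1), 0)
                    # Odd slopes step up one extra row between the halves.
                    nxt[(2 * s + 1, h, a)] = left + seg.get((s, h + s + 1, 2 * a + 1), 0)
        seg = nxt
        length *= 2
    return {(s, h): v for (s, h, _a), v in seg.items()}
```

Each pass reuses only the segments of the previous pass, so the work per pass is shared across neighboring slopes. For slope 0 the result is simply the sum of each image row.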

Segment computations are illustrated in Figure 24. For each pass in the figure, one
complete set of angles is highlighted, beginning in the lower right corner of the NxN
original image.


        Figure 24: Illustration of the Segments Computed in the First Three Passes


The N x N DRT algorithm is mapped onto an N x (log N + 1) processor butterfly, yielding
an N-time pipelined algorithm. The data is pipelined by sending one column at a time
into the N processors in the first stage. For a vertical line, this is a straightforward
sweep through the log N stages. Other angles require data from different columns,
which is achieved by inserting delays for different angles. At each step, a processor
receives and stores two elements representing length-2^(i-1) segments, and adds two
delayed elements to create a length-2^i segment.

Considering Figure 25 for mapping the algorithm, for the first three passes, into a
butterfly network, one could conclude that:

      Row 0 is added to row 1 and saved in row 0
      Row 0 is also added to a shifted row 1, and the result saved in row 1
      The figure represents this as the initial row 0 contributing to rows 0 and 1 in
       pass 1
      Pass 2 and pass 3 are computed in the same manner.



Figure 25: Mapping DRT Algorithm into a Butterfly for N =16 Image




5.7.2           Final Status of Major Components
The final statuses of the major components of the product are listed in Table 16. Most of
the major components have been successfully completed at the time of this report. The
team has looked into using the FPGA to bypass a software transform calculation for a
music translation on a digital keyboard as an application of the product. At the time of
this report, the team has not been able to successfully interface the board and the
keyboard, mainly due to time constraints. Also, the team has not been able to design an
FPGA that can calculate the RT/IRT due to time constraints and differences between the
RT and FFT algorithms.

                          Table 16: End Result of Project Components
             Component                   Success/Failure               Date Completed
                      FPGA Design           Success                        3/19/05
          OPB Transform Wrapper             Success                         3/5/05
        Transform Control Wrapper           Success                         3/5/05
                 Transform Control          Success                         3/4/05
         Address Generation Block           Success                        2/27/05
                                PC          Success                        2/15/05
                 Address Generator          Success                        2/27/05
                     Memory Block           Success                        3/18/05
                   Multiplier Block         Success                        2/15/05
                          Multiplier        Success                        2/15/05
            Adder/Subtractor Block          Success                        3/20/05
                      Logic Design          Success                        3/19/05
                              Adder         Success                        2/15/05
            Interface with keyboard         FAILED
        FPGA to Calculate RT/IRT            FAILED

Though the team has been unable to complete two major components, the
overall project has been a success. The team has successfully implemented an FPGA to
calculate transforms, which was the goal of the project.




6.            Estimated Resources and Schedules
The following section provides the original estimate, the revised estimate, and the
actual usage of the resources needed to complete the project, including physical
resources, labor, and a time schedule.

6.1           Estimated Resources
This section shows the original estimate, revised estimate, and actual man-hours to be
performed during the project by the team, and the amount of financial resources required
for completion of the project. Although the team had performed the work to fulfill a
curriculum requirement, estimated labor costs were figured into the overall project cost to
simulate an industry setting.

6.1.1                     Personnel Effort Requirements
Table 17 contains the original estimate of personnel effort requirements for the project.
The table is divided to show the personnel effort of each team member on each task of the
statement of work defined in the project plan.

             Table 17: Original Estimate of Personnel Effort Requirements (hours)
Personnel Name   Task 1   Task 2   Task 3   Task 4   Task 5   Task 6   Task 7   Task 8   Total
Sean Casey       10       35       30       55       55       15       5        10       215
Chris Miller     12       32       35       60       50       13       5        20       227
Chii Aik Fang    13       30       33       58       52       14       5        11       216
Ibrahim Ali      14       34       35       57       58       13       5        11       227
Total            49       131      133      230      215      55       20       52       885

Task 1: Problem Definition
Task 2: Technology Considerations and Selections
Task 3: End Product Design
Task 4: End Product Prototype Implementation
Task 5: End Product Testing
Task 6: End Product Documentation
Task 7: End Product Demonstration
Task 8: Project Reporting




Table 18 contains the revised estimate of personnel effort requirements for the project.

             Table 18: Revised Estimate of Personnel Effort Requirements (hours)
Personnel Name   Task 1   Task 2   Task 3   Task 4   Task 5   Task 6   Task 7   Task 8   Total
Sean Casey       10       35       30       55       55       15       5        10       215
Chris Miller     12       32       35       60       50       13       5        20       227
Chii Aik Fang    13       30       33       58       52       14       5        11       216
Ibrahim Ali      14       34       35       57       58       13       5        11       227
Total            49       131      133      230      215      55       20       52       885

Task 1: Problem Definition
Task 2: Technology Considerations and Selections
Task 3: End Product Design
Task 4: End Product Prototype Implementation
Task 5: End Product Testing
Task 6: End Product Documentation
Task 7: End Product Demonstration
Task 8: Project Reporting




Table 19 contains the actual personnel effort requirements for the project. The actual total
hours spent on the project were fewer than the estimated total hours. This was because
the team had estimated the total hours for implementing four different transforms: FFT,
IFFT, RT, and IRT. However, due to time limitations, the team decided to implement
only the FFT and IFFT.

                    Table 19: Actual Personnel Effort Requirements (hours)
Personnel Name   Task 1   Task 2   Task 3   Task 4   Task 5   Task 6   Task 7   Task 8   Total
Sean Casey       10       35       30       60       55       15       5        10       220
Chris Miller     12       32       35       75       60       13       5        20       252
Chii Aik Fang    13       30       33       30       31       19       5        11       172
Ibrahim Ali      14       34       35       29       30       11       5        11       169
Total            49       131      133      194      176      58       20       52       813

Task 1: Problem Definition
Task 2: Technology Considerations and Selections
Task 3: End Product Design
Task 4: End Product Prototype Implementation
Task 5: End Product Testing
Task 6: End Product Documentation
Task 7: End Product Demonstration
Task 8: Project Reporting




6.1.2          Other Resource Requirements
Table 20 defines the original estimate of miscellaneous resources required for this
project.

                 Table 20 : Original Estimate of Other Resource Requirements
           Item                   Team Hours          Cost
           3 × FPGA Boards        0                   Provided by the client
           Xilinx Software        0                   Downloaded
           VHDL Materials         0                   Checked out from library
           Project Poster         12                  $50
                          Total   12                  $50

Table 21 defines the revised estimate of miscellaneous resources required for this project.

                 Table 21 : Revised Estimate of Other Resource Requirements
           Item                   Team Hours          Cost
           3 × FPGA Boards        0                   Provided by the client
           Keyboard               0                   Provided by the client
           Xilinx Software        0                   Downloaded
           VHDL Materials         0                   Checked out from library
           Project Poster         12                  $60
           Bound Final Report     16                  $10
                          Total   28                  $70

Table 22 defines the actual miscellaneous resources required for this project.

                       Table 22 : Actual Other Resource Requirements
           Item                   Team Hours          Cost
           3 × FPGA Boards        0                   Provided by the client
           Keyboard               0                   Provided by the client
           Xilinx Software        0                   Downloaded
           VHDL Materials         0                   Checked out from library
           Project Poster         12                  $60
           Bound Final Report     16                  $10
                          Total   28                  $70




6.1.3          Financial Requirements
Table 23 contains the original estimate of the financial requirements of the project. The
top half of the table lists the physical resources needed to fulfill the requirements of
the senior design course and project. The bottom half of the table estimates the cost
that would be incurred by employing the team members to perform work on the
project.

                    Table 23 : Original Estimate of Financial Requirements
    Parts and Materials                                                   Cost ($)
    a. Course Manual                                                        50.00
    b. Project Poster                                                       60.00
    c. FPGA Boards                                                    Provided by client
    d. Development Tools                                                   No cost
                        Subtotal                                           $110.00

    Labor at $10.50/hr                     Total Hours                        Cost ($)
    a. Sean Casey                              215                            2257.50
    b. Chris Miller                            227                            2383.50
    c. Chii-Aik Fang                           216                            2268.00
    d. Ibrahim Ali                             227                            2383.50
                 Subtotal (labor)                                             9292.50
                    Project Total                                            $9,402.50

Table 24 contains the revised estimate of financial requirements of the project.

                    Table 24 : Revised Estimate of Financial Requirements
    Parts and Materials                                                   Cost ($)
    a. Course Manual                                                        50.00
    b. Project Poster                                                       60.00
    c. Bound Final Report                                                   10.00
    d. FPGA Boards                                                    Provided by client
    e. Keyboard                                                       Provided by client
    f. Development Tools                                                   No cost
                        Subtotal                                           $120.00

    Labor at $10.50/hr                     Total Hours                        Cost ($)
    a. Sean Casey                              215                            2257.50
    b. Chris Miller                            227                            2383.50
    c. Chii-Aik Fang                           216                            2268.00
    d. Ibrahim Ali                             227                            2383.50
                 Subtotal (labor)                                             9292.50
                    Project Total                                            $9,412.50




Table 25 contains the actual financial requirements of the project. The actual financial
requirements were lower than the estimated financial requirements because the
client decided to implement only the FFT and IFFT on the FPGA chip.

                         Table 25 : Actual Financial Requirements
    Parts and Materials                                                 Cost ($)
    a. Course Manual                                                      50.00
    b. Project Poster                                                     60.00
    c. Bound Final Report                                                 10.00
    d. FPGA Boards                                                  Provided by client
    e. Keyboard                                                     Provided by client
    f. Development Tools                                                 No cost
                        Subtotal                                         $120.00

    Labor at $10.50/hr                   Total Hours                     Cost ($)
    a. Sean Casey                            220                         2310.00
    b. Chris Miller                          252                         2646.00
    c. Chii-Aik Fang                         172                         1806.00
    d. Ibrahim Ali                           169                         1774.50
                 Subtotal (labor)                                        8536.50
                    Project Total                                       $8,656.50
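
The arithmetic behind Table 25 can be checked with a short script. The following is a minimal sketch, assuming only the hourly rate, hours, and materials subtotal printed in the table; the names HOURLY_RATE, MATERIALS, ACTUAL_HOURS, and labor_costs are illustrative and do not come from the project.

```python
from decimal import Decimal

# Values copied from Table 25; all names here are illustrative assumptions.
HOURLY_RATE = Decimal("10.50")
MATERIALS = Decimal("120.00")  # parts-and-materials subtotal
ACTUAL_HOURS = {
    "Sean Casey": 220,
    "Chris Miller": 252,
    "Chii-Aik Fang": 172,
    "Ibrahim Ali": 169,
}


def labor_costs(hours, rate=HOURLY_RATE):
    """Return the labor cost per person as hours multiplied by the hourly rate."""
    return {name: h * rate for name, h in hours.items()}


costs = labor_costs(ACTUAL_HOURS)
labor_subtotal = sum(costs.values())
project_total = labor_subtotal + MATERIALS

for name, cost in costs.items():
    print(f"{name:<15} {ACTUAL_HOURS[name]:>4} h  ${cost:>9.2f}")
print(f"Labor subtotal: ${labor_subtotal:.2f}")
print(f"Project total:  ${project_total:.2f}")
```

Decimal is used instead of floating point so that the dollar amounts are reproduced exactly as printed in the table.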




6.2    Schedules
This section presents the schedules for the project.

Microsoft Project Professional 2002 was used to create the following project schedules
defined by the project team. Figures 26 through 29 show the original estimate, revised
estimate, and actual project schedules.
According to Figure 27, the team started implementing the FFT a week later than
scheduled because of issues with the binary number representation of the input, which
took the team a week to resolve. In addition, the team discovered that the available
memory was single-port memory. This was a constraint because the team had
previously decided to work with dual-port memory. Fortunately, the team was
able to resolve both problems.




Figure 26: Project Schedules Part 1

Figure 27: Project Schedules Part 2

Figure 28: Project Schedules Part 3

Figure 29: Project Schedules Part 4

The following schedule, shown in Table 26, was the original estimate of the deliverables
schedule for the senior design course.

                     Table 26 : Original Estimate of Deliverables Schedule
Due date              Deliverable
September 17, 2004    Unbound project plan will be completed.
October 5, 2004       Bound project plan will be completed and posted on project webpage.
October 12, 2004      Poster will be completed.
November 12, 2004     Unbound design report will be completed.
December 15, 2004     Bound design report will be completed and posted on project webpage.
March 31, 2005        Unbound final report will be completed.
May 4, 2005           Bound final report will be completed and posted on project webpage.

Table 27 shows the revised estimate of the deliverables schedule for the senior design
course.

                     Table 27 : Revised Estimate of Deliverables Schedule
Due date              Deliverable
September 17, 2004    Unbound project plan was completed.
October 5, 2004       Bound project plan was completed and posted on project webpage.
October 12, 2004      Poster was completed.
November 12, 2004     Unbound design report was completed.
December 15, 2004     Bound design report was completed and posted on project webpage.
March 31, 2005        Unbound final report was completed.
May 4, 2005           Bound final report will be completed and posted on project webpage.

Table 28 shows the actual deliverables schedule for the senior design course.

                           Table 28 : Actual Deliverables Schedule
Due date              Deliverable
September 17, 2004    Unbound project plan was completed.
October 5, 2004       Bound project plan was completed and posted on project webpage.
October 12, 2004      Poster was completed.
November 12, 2004     Unbound design report was completed.
December 15, 2004     Bound design report was completed and posted on project webpage.
March 31, 2005        Unbound final report was completed.
May 4, 2005           Bound final report will be completed and posted on project webpage.




7.       Closing Materials
This section provides informational materials including project evaluation,
commercialization, recommendations for additional work, lessons learned, risk and risk
management, team contact information, closing summary, references, and appendixes.

7.1      Project Evaluation
The project has several milestones and evaluation criteria that give a concrete measure
of how well the team completed the project. Each milestone was graded based
on how the team performed. The criteria for judging each milestone are given
in Table 29:

                                 Table 29: Milestone Evaluation
Evaluation Result                                Numerical Score
Met or Exceeded                                  100%
Partially Met                                    75%
Not Met                                          50%
Not Attempted                                    0%

      The following items were identified as the project milestones. The criteria to evaluate
      the milestones are also given.
          • Problem Definition – The project will be clearly defined through a project
             plan. The project plan will include the operating environment, intended uses,
             intended users, functional requirements, assumptions and limitations,
             constraint considerations, and possible problems. This milestone will be
             evaluated on how clear the problem definition is, and whether it meets the
             customer's desired definition.
          • Research of Transforms – This milestone will be accomplished when the
             team has successfully researched various transforms and has decided upon
             which transforms should be implemented in the design. This milestone will
             be evaluated by analyzing the chosen transform's ability to be implemented in
             hardware.
          • Familiarity with Development Tools – In order to successfully design a
             chip, the team must familiarize itself with the tools needed in the design
             process. This milestone will be evaluated on the team's knowledge of each
             tool, and each member's ability to use the tool at an advanced skill level.
          • Design of Chosen Algorithm – Once the team has chosen an algorithm, a
             design will be made that will implement the algorithm on a Xilinx™ FPGA.
             This milestone will be evaluated on the success of the design based on the
             design's functionality and size.
          • Implementation of Algorithm – The transform algorithm chosen by the team
             will need to be successfully implemented and loaded into a Xilinx™ FPGA.
             The implementation should exhibit speed and efficiency. The team will be
             judged on this milestone by the success of the implementation of the
             algorithm in an FPGA design.
          • Testing of FPGA – The FPGA will need to be rigorously tested and
             benchmarked to judge performance, ability, and inability. Testing will be a
             valuable part of the project and will be evaluated on the team's ability to
             rigorously test all areas of the design, and show the design's strengths and
             weaknesses.
          • Demonstration to Client – The team will present the end product and all
             deliverables to the client in the form of a presentation. The demonstration will
             be evaluated on the team's ability to display the full functionality of the
             product to the client.
          • Final Documentation of Product – The team will prepare final
             documentation on the end product. The final documentation will be evaluated
             on the team's ability to successfully document all phases of the project, in a
             form that is easy for the customer to understand and use.

Table 30 summarizes the milestones of the project, as well as their relative
importance and percentage of value to the overall project. These percentages show how
the individual milestones were combined into the total project evaluation. In the project
definition, the team stated that an overall score of 80% or above would indicate a
successful project.

                      Table 30: Project Milestones and their Importance

               Milestone                                        Importance
                                                  Relative                      Percentage
Problem Definition                           High                         10%
Research of Transforms                       High                         10%
Familiarity with Development Tools           Medium                       5%
Design of Chosen Algorithm                   High                         15%
Implementation of Algorithm                  High                         15%
Testing of FPGA                              High                         15%
Demonstration to Client                      Low                          5%
Final Documentation of Project               High                         15%
                             Total                                                100%

Based on the evaluation process stated earlier, the project milestones have been evaluated
using the scale in Table 29. Table 31 details the evaluation of the milestones of the
project.




                                Table 31: Project Evaluation

                Milestone                                       Evaluation
                                                   Evaluation                Percentage
Problem Definition                           Met                      10%
Research of Transforms                       Exceeded                 10%
Familiarity with Development Tools           Exceeded                 5%
Design of Chosen Algorithm                   Partially Met            11.25%
Implementation of Algorithm                  Met                      15%
Testing of FPGA                              Met                      15%
Demonstration to Client                      Not Met                  2.5%
Final Documentation of Project               Met                      15%
                             Total                                             83.75%

As shown in the table above, the majority of the project's milestones met or
exceeded expectations based on the evaluation criteria given. The design of the chosen
algorithm was only partially met because a few problems with the design were
discovered during the testing phase. Although these problems were small, a lower
evaluation was given because more time spent on the design phase would have prevented
them. Demonstration to the client has not been met at the time of this report.
The team still intends to demonstrate the project to the client.

The overall evaluation is that the project is a success. Most of the
milestones met or exceeded expectations, and the total project evaluation was 83.75%. The
team considered 80% or above to indicate a successful project, and that score has
been exceeded.
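
The 83.75% figure can be reproduced by weighting each milestone's percentage by its evaluation result. The sketch below assumes the scale from Table 29 (with "Met" and "Exceeded" both treated as 100%), the weights from Table 30, and the results from Table 31; the names SCALE, MILESTONES, and weighted_score are illustrative. Note that the weights as listed in Table 30 add up to 90, so the sketch reports the raw weighted sum rather than normalizing it.

```python
# Evaluation scale from Table 29; "Met" and "Exceeded" both map to 100%.
SCALE = {
    "Met": 1.00,
    "Exceeded": 1.00,
    "Partially Met": 0.75,
    "Not Met": 0.50,
    "Not Attempted": 0.00,
}

# (milestone, weight in percent from Table 30, result from Table 31)
MILESTONES = [
    ("Problem Definition", 10, "Met"),
    ("Research of Transforms", 10, "Exceeded"),
    ("Familiarity with Development Tools", 5, "Exceeded"),
    ("Design of Chosen Algorithm", 15, "Partially Met"),
    ("Implementation of Algorithm", 15, "Met"),
    ("Testing of FPGA", 15, "Met"),
    ("Demonstration to Client", 5, "Not Met"),
    ("Final Documentation of Project", 15, "Met"),
]


def weighted_score(milestones, scale=SCALE):
    """Sum each milestone's weight multiplied by the score of its result."""
    return sum(weight * scale[result] for _, weight, result in milestones)


score = weighted_score(MILESTONES)
print(f"Total project evaluation: {score}%")
print("Successful" if score >= 80 else "Not successful")
```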

7.2    Commercialization
Software calculations of the Fourier transform are very time consuming and do not work
well in real-time systems. Because the FFT is widely needed and the software that
computes it is inefficient, commercialization of this hardware design is possible
and practical.

The total design cost is $8,751.00, which is a one-time cost. Any additional cost would
include just the price of the chip, which is in the range of 20 to 30 dollars, according to
Xilinx's website, www.xilinx.com. The street selling price, with a 25% markup, would be
around 25 to 37.50 dollars.

The chip could play an important role in many digital signal processing applications
including optics, telecommunications, speech, and image processing.

The end FPGA design or chip could be marketed for real-time circuit applications that
need fast computation of Fourier transforms. It could also be marketed as a portable
hardware module that couples to any system to perform FFT calculations. One possible
system is a piano keyboard that can transcribe the notes played in real time.



7.3     Recommendations for Additional Work
Although the project is considered a success, there are some areas of the project that
could be expanded into additional work.
       1. Integrate the RT and IRT into the FPGA design. The team was able to
           successfully design an FPGA that could calculate the FFT and IFFT, but was
           unable to also implement the RT and IRT into the design. Future work could
           include research into these algorithms to find the similarities between them
           and the FFT/IFFT. This would allow for an FPGA to be designed that could
           calculate all four transforms on a single dedicated chip.
       2. Integrate the chip into the music translation system. The hardware chip to
           calculate the FFT could be used to bypass a software system that calculates
           the same transform on a digital keyboard used at Iowa State University.
           Future work could integrate the hardware chip into the system to improve
           translation time.
       3. Improve and optimize the design of the FPGA. Although considerable time
           was spent analyzing and designing the FPGA, future work could include
           studying the team’s design and finding areas for improvement to speed up the
           calculation or decrease the size of the hardware.

The team has recommended these three areas for additional work to future individuals or
groups who would like to expand upon the project.

7.4     Lessons Learned
This section provides the technical and non-technical lessons learned by the team
throughout the project. It includes what went well, what did not go well, what technical
knowledge was gained, what non-technical knowledge was gained, and what the team
would do differently if it had the project to do over again.

7.4.1         What Went Well
The team had several successes throughout the course of the project. The team was able
to improve the efficiency of the pipelining of the overall system. The team was also able
to improve the efficiency of its initial n-bit complex multiplier design. In addition, the
team was able to reduce the number of stages needed to implement the 2-point butterfly
decimation-in-frequency FFT.
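
As a rough software model of the butterfly structure mentioned above, the following minimal sketch implements a generic radix-2 decimation-in-frequency FFT in Python. It is not the team's VHDL design and does not model its pipelining or fixed-point arithmetic; it assumes a power-of-two input length, and the function names fft_dif and ifft_dif are illustrative.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # 2-point butterflies: the sums feed the even-indexed outputs, and the
    # differences, scaled by twiddle factors, feed the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [
        (x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
        for k in range(half)
    ]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bottom)
    return out


def ifft_dif(spectrum):
    """Inverse FFT computed by conjugating, running the forward FFT, and rescaling."""
    n = len(spectrum)
    forward = fft_dif([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in forward]


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 0, -1, 2.5, 0.5]
    print(fft_dif(samples))
```

Reusing the forward transform for the inverse mirrors the common hardware practice of sharing one butterfly datapath between the FFT and IFFT, although whether the team's design did so is not stated here.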

7.4.2         What Did Not Go Well
The team had a few difficulties throughout the course of the project. The team had
trouble scheduling meetings outside the regular meeting time because
team members were involved in on-site interviews, honor
society activities, and projects and presentations for other classes.

7.4.3         Technical Knowledge Gained
The team has gained knowledge in configuring an FPGA chip to perform the FFT and
IFFT. Knowledge and understanding of several transforms, including the FFT, IFFT, RT,
and IRT, has also been gained through the project. Members of the team also learned
VHDL to complete the project.

7.4.4         Non-technical Knowledge Gained
Not everything that the team learned was technical. The team also gained
experience in giving oral presentations and preparing formal report documentation,
and learned the proper way of giving an oral presentation.

7.4.5         What Would Be Done Differently If Done Again
The team would do a few aspects of the project differently. The team would
start working on the algorithm structuring and coding phase earlier. By structuring the
algorithm earlier, the team would have more time to implement and refine the design on
the FPGA chip.

Also, the team would research and study the specifics of the Xilinx chip more thoroughly.
The team faced several challenges in interfacing the transform design with the memory
being used on the chip. A better initial understanding of the chip would have made the
interface easier.

7.5     Risk and Risk Management
This section describes the risks of the project and how they were managed. It covers
the anticipated potential risks and their planned management, the anticipated risks
that were encountered and how successfully they were managed, the unanticipated risks
that were encountered and the attempts to manage them, and the resulting changes in
risk management.

7.5.1         Anticipated Potential Risks and Planned Management Thereof
The first anticipated potential risk that the team planned for was the loss of a team
member. To minimize the damage caused by this risk, team members documented their
work and meeting details individually.

The second anticipated potential risk was the loss of code. To minimize the
damage caused by this risk, every team member kept a copy of the completed
code.

The third anticipated potential risk was that the development tools would become
obsolete and lose their maintenance and support resources. To address this
risk, the client assured the team that the tools provided by the
client would function properly.

The fourth anticipated potential risk was that the technologies involved (VHDL and
Xilinx™ software) would be difficult and time consuming to learn. To minimize the
damage caused by this risk, the team relied on one member who had prior experience
with VHDL, and the rest of the team learned VHDL during the fall semester. The team
also used the Xilinx™ software tutorial provided by the faculty advisor.



7.5.2          Anticipated Risks Encountered and Management Thereof
Fortunately, no team member or previously completed code was lost during the
course of the project. However, the team did encounter one of the anticipated potential
risks mentioned earlier: the technologies were difficult and time consuming to learn.
The team was not familiar with the FPGA chip's functionality, and only one team member
was familiar with VHDL. The team was able to resolve this problem by seeking help from
the graduate student and faculty advisor.

7.5.3          Unanticipated Risks Encountered and Management Thereof
The team encountered one risk that was unanticipated: the difficulty of scheduling
meetings outside the regular meeting time with the
advisor. This was because the team members were involved in on-site interviews, honor
society activities, and projects and presentations for other classes. To resolve this
problem, the team established a weekend meeting to accommodate team member
schedules.

7.5.4          Resultant Changes in Risk Management Made
Because of the unanticipated risk it encountered, the team decided to hold a
weekend meeting whenever necessary to discuss and resolve the issues that arose
throughout the course of the project.

7.6     Project Team Information
This section provides project team information for the project advisor and student team
members.


7.6.1          Faculty Advisor and Client
                      • Professor Arun Somani
                        Iowa State University
                        2215 Coover
                        Ames, IA 50011-0001
                        Phone # (515) 294-0442
                        Fax # (515) 294-3637
                        arun@iastate.edu


7.6.2          Student Team Members
                      • Sean Casey
                        Electrical and Computer Engineering
                        218 Stanton Apt #6
                        Ames, IA 50014
                        Phone # (515) 278-4429
                        caseysm@iastate.edu
                      • Ibrahim Ali
                        Electrical Engineering
                        2609 Ferndale Ave Apt #9
                        Ames, IA 50010
                        Phone # (515) 451-1500
                        imali@iastate.edu
                      • Chii-Aik Fang
                        Electrical Engineering
                        246 N Hyland #311
                        Ames, IA 50014
                        Phone # (515) 296-2194
                        cafang@iastate.edu
                      • Christopher Miller
                        Electrical and Computer Engineering
                        1232 Frederiksen Court
                        Ames, IA 50010
                        Phone # (515) 572-7687
                        cbmiller@iastate.edu

7.7    Closing Summary
Faster, smaller, and more efficient chip designs are needed to calculate the Fourier
transform in real time. The FPGA design has far-reaching, important applications in
the fields of electrical and computer engineering. The FPGA chip that the team has
designed can be used as a building block for larger systems that, for example, could
accurately record a note strummed by a skilled musician. Digital signal processing is
becoming an important part of everyday life, but the current technology for calculating
transforms is trailing the demand for speed. Through research and study, an efficient
algorithm for the Fourier transform, the FFT, was adapted from the software
world and implemented in the design of an FPGA. The end product of this design is
a hardware chip for calculating Fourier transforms that could ultimately improve
current industry technology.




Figure 30 : Circuit Board




7.8    References
Hue-Sung Kim. “Towards adaptive balanced computing (ABC) using reconfigurable
functional caches (RFCs)”, 2001.

Kathryn Foutaian Gossett. “The Use of a Reconfigurable Functional Cache in a Digital
Signal Processor: power and performance”, 2002.

Nathan A. VanderHorn, Michael T. Frederick, Jonathan A. Lucas, and Arun K. Somani.
“Real-Time Radon Transform Engine Optimized for Hardware Implementation”, Dec 19,
2003.



7.9    Appendices
The following appendices provide additional information relating to the project.

A: Testing Forms - Forms to be used for testing and project evaluation.

B: Testing Forms Completed – These forms were used in the actual testing.




A.         Testing Forms
The following appendix includes sample forms for recording test results. The forms
correspond to the four testing stages developed in Section 5.1.5 of this document.
The following pages contain:

       • Unit testing form
       • Integration testing form
       • System testing form
       • Acceptance testing form




A.1 Unit Testing Form

Name: _____________________
Date: _____________________

Component used in testing: _____________________

Description of test:




Description of results:




Description of problems/failures:




Overall Testing (Circle One)


       Successful                                  Failed




A.2 Integration Testing Form

Name: _____________________
Date: _____________________

Components used in testing: _____________________
Integration used in testing: ________________________

Description of test:




Description of results:




Description of problems/failures:




Overall Testing (Circle One)

              Successful                                Failed




A.3 System Testing Form

Name: _____________________
Date: _____________________

Description of test:




Description of results:




Listing of results (speed, size, etc.):




Description of problems/failures:




Overall Testing (Circle One)


       Successful                         Failed




A.4 Acceptance Testing Form

Client Name: _____________________
Date: ___________________________

Description of test:




Description of results:




Overall Testing (Circle One)


       Successful                                       Failed


Overall system design (circle one):

Incomplete      Below Average         Satisfactory   Above Average   Excellent



