
Low Power FIR Design Project

Abstract

This report details the low-power design of a 7-tap programmable Finite Impulse Response (FIR) filter. First, a 7-tap FIR filter is designed. The folding transformation is then used to systematically determine the control circuits in DSP architectures where multiple algorithm operations are time-multiplexed onto a single functional unit. By executing multiple algorithm operations on a single functional unit, the number of functional units in the implementation is reduced, resulting in an integrated circuit with low silicon area. The simulation is done in Veribest, which is used for simulating VHDL code. The inputs are the coefficients of the FIR filter, which are generated using MATLAB. The decimal coefficients are converted into 8-bit binary, and after the multiplication and addition the most significant 8 bits are taken.

Table of Contents

1 Introduction
    1.1 FIR Filters
    1.2 7-TAP FIR FILTER
2 Data Processing
    2.1 Radix-2
    2.2 Sign Magnitude
    2.3 1s Complement
    2.4 2s Complement
    2.5 Mantissa Exponent
    2.6 Final Number Representation
3 Folding Technique
    3.1 Folding Transformation
    3.2 Cutset Method
4 Software Implementations
    4.1 Matlab
    4.2 OrCAD
    4.3 VHDL
    4.4 Veribest
5 Conclusion

1 Introduction

This report investigates the power consumption of digital arithmetic circuits for use in the design and implementation of a 7-tap programmable Finite Impulse Response (FIR) filter. This section introduces the mathematical model of an FIR filter and discusses how it can be realized in digital hardware. The report then investigates a technique for reducing power consumption, discusses how the method applies to our filter, and shows how it leads to low-power hardware.

1.1 FIR Filters

The mathematical structure of a 3-tap FIR filter is shown in Figure 1. The signal input is a number representing the magnitude of a sampled analogue signal. The z^-1 blocks store their input and delay it by one sample period. The Co_x blocks (Co1, Co2 and Co3 in the figure) contain the coefficients that shape the FIR filter frequency response.
The design specification for the system gives the coefficients for testing the system as signed floating-point numbers.

[Figure 1: FIR Filter Structure. The signal input passes through z^-1 delay blocks; the taps are multiplied by the coefficients Co1, Co2 and Co3, and an adder sums the products to give the filtered output.]

As can be seen from Figure 1, the arithmetic circuits needed for the design of a digital FIR filter are multipliers and adders, as well as storage elements. To keep the design of the system from becoming excessively complex, two-input adders will be used in the system. This means that for the 3-tap filter above, the single adder shown would be built from two adders. The multipliers are to be parallel array multipliers, constructed with 1-bit full adders.

Another aspect of the design considered in this report is the number representation scheme, and how the data is to be processed. The coefficients, which have been provided as signed floating-point numbers, need not be used in that form within the digital system. This report investigates how to convert the data to a form better suited to simple and fast operation. This enables the system designer to choose the appropriate arithmetic and control architecture for an efficient design.

Using the information gained from the analysis of each component of the FIR structure, an overall design has been achieved. This design uses power-reduction techniques without compromising functionality. The design, test, and detailed description of this system will be included in the second assessed report.

1.2 7-TAP FIR FILTER

Building on the 3-tap filter, we design the 7-tap FIR filter, which has four multipliers and six adders, as shown in Figure 2. The filter is symmetric, so mirrored taps share a coefficient, which is why four multipliers are enough for seven taps.
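The saving that symmetry brings can be sketched in Python. This is only an illustration under stated assumptions: the function names and the integer tap values are hypothetical, and the pre-adder arrangement used here is one common way to exploit symmetry, which may not match the adder arrangement of the report's own structure. It shows that seven symmetric taps can be computed with four multiplications and six additions per output sample.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:          # samples before the start are taken as zero
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_symmetric7(half, samples):
    """7-tap symmetric FIR with taps [h0, h1, h2, h3, h2, h1, h0].

    Mirrored samples are added first, so each output needs
    four multiplications and six additions.
    """
    h0, h1, h2, h3 = half

    def x(i):
        return samples[i] if i >= 0 else 0

    out = []
    for n in range(len(samples)):
        s0 = x(n) + x(n - 6)        # both samples share h0
        s1 = x(n - 1) + x(n - 5)    # both samples share h1
        s2 = x(n - 2) + x(n - 4)    # both samples share h2
        out.append(h0 * s0 + h1 * s1 + h2 * s2 + h3 * x(n - 3))
    return out


# Hypothetical integer taps, chosen only so that the comparison is exact.
half = [1, 2, 3, 4]
full = half + half[-2::-1]          # [1, 2, 3, 4, 3, 2, 1]
impulse = [1, 0, 0, 0, 0, 0, 0]
print(fir_symmetric7(half, impulse))                                # the impulse response equals the taps
print(fir_direct(full, impulse) == fir_symmetric7(half, impulse))   # True
```

Both functions give the same output; the symmetric version simply reuses each of the three outer coefficients for two samples.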
[Figure 2: Structure of a 7-tap FIR filter, with input X(n), delay elements D and the filter output.]

2 Data Processing

The specification for the FIR filter contains the 4 filter coefficients to be stored in the filter and used to carry out operations. These coefficients are shown below:

    [0.0284, 0.2370, 0.4692, 0.2370]

These numbers give the frequency response shown in Figure 3.

[Figure 3: FIR Filter Frequency Response]

These numbers are floating-point numbers and must be converted into a form that is easily stored and operated upon in 8-bit binary. There are numerous possible ways to encode numbers using binary. Below is a summary of several possible formats that could be used to represent the above coefficients.

2.1 Radix-2

Radix-2 is the simplest possible encoding format. It allows positive integers in the range 0 to 2^n - 1 to be encoded in n bits. In this format each bit of the binary number has a weight associated with it, as in the following example:

    Weight:  128  64  32  16   8   4   2   1
    Number:    1   0   1   1   0   0   1   0

    128 + 32 + 16 + 2 = 178

Table 1: Radix-2 encoding

2.2 Sign Magnitude

Sign magnitude notation allows negative numbers to be encoded by allocating one bit to specify the sign of the number. If the bit is a logical '1' then the number is negative, otherwise it is positive. The remaining bits are then encoded using the radix-2 method explained above. This allows the system to store numbers in the range -(2^n - 1) to 2^n - 1, where n is the number of magnitude bits (n = 7 for an 8-bit word). Example:

    Weight:  Sign  64  32  16   8   4   2   1
    Number:     1   0   1   1   0   0   1   0

    -(32 + 16 + 2) = -50

Table 2: Sign Magnitude encoding

Sign magnitude has the disadvantage that there are two separate encodings for 0. In the case of an 8-bit number these are 10000000 and 00000000.

2.3 1s Complement

1s complement also allows numbers in the range -(2^n - 1) to 2^n - 1 to be encoded. To encode a negative number in 1s complement, the bit-by-bit complement of the corresponding positive binary number is taken.
For example:

    1s Complement    Decimal Equivalent
    01111111          127
    10000000         -127
    11111001           -6
    00000110            6

Table 3: 1s complement encoding

2.4 2s Complement

2s complement has become the standard method of storing signed binary integers. It allows the representation of numbers in the range -(2^n) to 2^n - 1, and has the major advantage of only having one encoding for 0. To encode a negative number in 2s complement, the bits of the corresponding positive binary number are complemented, and then 1 is added. For example:

    2s Complement    Decimal Equivalent
    01111111          127
    10000000         -128
    11111010           -6
    00000110            6

Table 4: 2s complement encoding

2.5 Mantissa Exponent

The standard method of storing floating-point numbers is a mantissa exponent representation. In this, a number is represented in scientific notation as

    mantissa × radix^(sign × exponent)

Normally the radix is fixed and only the mantissa, sign and exponent are encoded. Using this method up to 2^n numbers can be encoded, although with limited accuracy. IEEE standard 754 defines binary floating-point arithmetic using either 32-bit or 64-bit numbers. The 32-bit format is shown below:

    S (1 bit) | Exponent (8 bits) | Mantissa (23 bits)

Table 5: Mantissa Exponent encoding

The range of normalized magnitudes that can be stored using this standard is approximately 1.18 × 10^-38 to 3.40 × 10^38. However, floating-point arithmetic is not feasible for this project due to the complexity of the adders and multipliers that would be required. For example, a block diagram of an IEEE 754 adder is shown in Figure 4.

[Figure 4: IEEE 754 Adder]

2.6 Final Number Representation

Sign-magnitude representation has been chosen as the number representation used to store the coefficients in this project. Mantissa exponent encoding was rejected on the basis that the multipliers and adders involved would be too complex to build and to apply power-reducing techniques to.
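The three signed encodings compared in Sections 2.2 to 2.4 can be reproduced with a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and are not part of the filter design. It regenerates the example rows of Tables 2 to 4 for an 8-bit word.

```python
def sign_magnitude(value, bits=8):
    """Sign bit followed by the magnitude in plain radix-2."""
    limit = 2 ** (bits - 1) - 1
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {bits}-bit sign magnitude")
    sign = 1 if value < 0 else 0
    return format((sign << (bits - 1)) | abs(value), f"0{bits}b")


def ones_complement(value, bits=8):
    """Negative numbers are the bit-by-bit complement of the positive value."""
    limit = 2 ** (bits - 1) - 1
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {bits}-bit 1s complement")
    mask = (1 << bits) - 1
    pattern = value if value >= 0 else ~(-value) & mask
    return format(pattern, f"0{bits}b")


def twos_complement(value, bits=8):
    """Negative numbers are the complement of the positive value plus 1."""
    if not -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1:
        raise ValueError(f"{value} does not fit in {bits}-bit 2s complement")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


for v in (-50, -6, 6, 127):
    print(v, sign_magnitude(v), ones_complement(v), twos_complement(v))
```

For positive values all three encodings agree; they differ only in how a negative value is formed.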
Radix-2 cannot represent negative numbers and so is not suitable for this filter. While 1s complement and 2s complement are both sound number representations, the adders and multipliers needed to operate on these numbers are more complicated than those needed for sign-magnitude operations, and so would be harder to make power efficient.

After sign-magnitude representation was chosen, the original coefficients were converted into a form easily stored in this encoding (i.e. positive or negative integers). To make this conversion, the following formula was used (except in the case of zero coefficients, which are left stored as zero, "00000000"):

    Signed_Magnitude_Coeff = Round(Coeff × 128)

The coefficients after this conversion are shown below:

    [4 30 60 30]

The final binary representation of the coefficients is shown in Table 6.

    Coefficient    Original Value    Converted Value    Binary Representation
    0              0.0284             4                 00000100
    1              0.2370            30                 00011110
    2              0.4692            60                 00111100
    3              0.2370            30                 00011110

Table 6: Final Number Representation

3 Folding Technique

In synthesizing DSP architectures, it is important to minimize the silicon area of the integrated circuits, which is achieved by reducing the number of functional units (such as adders and multipliers), registers, multiplexers, and interconnection wires. The folding transformation is used to systematically determine the control circuits in DSP architectures where multiple algorithm operations are time-multiplexed onto a single functional unit. By executing multiple algorithm operations on a single functional unit, the number of functional units in the implementation is reduced, resulting in an integrated circuit with low silicon area.
Figures 5 and 6 give an example of how two addition operations can be time-multiplexed on a single pipelined hardware adder. The DSP program in Figure 5 computes y(n) = a(n) + b(n) + c(n). In Figure 6, the two addition operations shown in Figure 5 are time-multiplexed on a single pipelined adder. The time-multiplexed hardware in Figure 6 operates as follows. In cycle 0, the samples a(0) and b(0) are switched into the adder, and the sum (a(0) + b(0)) is stored in the delay element until cycle 1, when (a(0) + b(0)) is switched into the adder along with c(0). The sum (a(0) + b(0) + c(0)) is stored in the delay element until cycle 2, when it is output and the intermediate result (a(1) + b(1)) is computed by the adder.

[Figure 5: A simple DSP program with 2 addition operations, with inputs a(n), b(n), c(n) and output y(n).]

[Figure 6: A folded architecture where the 2 addition operations are folded to a single hardware adder with 1-stage pipelining. The inputs are switched at time instances 2l + 0 and 2l + 1.]

    Cycle    Adder Input (left)    Adder Input (top)    System Output
    0        a(0)                  b(0)                 -
    1        a(0) + b(0)           c(0)                 -
    2        a(1)                  b(1)                 a(0) + b(0) + c(0)
    3        a(1) + b(1)           c(1)                 -
    4        a(2)                  b(2)                 a(1) + b(1) + c(1)
    5        a(2) + b(2)           c(2)                 -

Table 7: Operation of the first six cycles of the folded hardware in Figure 6

This process continues as shown in Table 7. One output sample is produced every 2 clock cycles, and one sample of each input signal is consumed every 2 clock cycles. As a result, an input sample must be valid for 2 clock cycles before changing. In general, the data on the input of the folded realization is assumed to be valid for N cycles before changing, where N is the number of algorithm operations executed on a single functional unit in hardware.
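The schedule in Table 7 can be checked with a small behavioural model. The sketch below is an assumption-laden illustration rather than the project's hardware description: the function name is hypothetical, the delay element is modelled as a single variable, and numbers stand in for the samples.

```python
def folded_adder_trace(a, b, c):
    """Model one pipelined adder shared by both additions of y = a + b + c.

    Returns one row per clock cycle: (cycle, left input, top input, output).
    The output is None in cycles where no finished sum is available.
    """
    rows = []
    delay = None                     # contents of the delay element D
    for cycle in range(2 * len(a)):
        n, phase = divmod(cycle, 2)
        if phase == 0:               # time instance 2l + 0: start a new sample
            left, top = a[n], b[n]
            output = delay           # finished sum of the previous sample, if any
        else:                        # time instance 2l + 1: add c to the partial sum
            left, top = delay, c[n]
            output = None
        rows.append((cycle, left, top, output))
        delay = left + top           # adder result is registered for the next cycle
    return rows


for row in folded_adder_trace([1, 2, 3], [10, 20, 30], [100, 200, 300]):
    print(row)
```

With these inputs the finished sums 111 and 222 appear in cycles 2 and 4, one output every two cycles, as in the table.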
The architecture in Figure 6 is simple enough to be designed using ad hoc techniques; however, general DSP programs are more complex than the program in Figure 5. In such cases, systematic folding techniques can be used to design the time-multiplexed architectures.

Folding provides a means for trading area for time in a DSP architecture. One way to implement the DSP program in Figure 5 is to use two adders in hardware with a pipelining delay between the two adders. This implementation requires two hardware adders and computes one iteration of the program in the time required to perform one addition, T_add. On the other hand, the folded implementation in Figure 6 uses one hardware adder and computes one iteration of the program in time 2 T_add. In general, folding can be used to reduce the number of hardware functional units by a factor of N at the expense of increasing the computation time by a factor of N. One extreme of this area-time trade-off is a fully parallel implementation, in which no functional unit is shared.

3.1 Folding Transformation

The folding transformation provides a systematic technique for designing control circuits for hardware where several algorithm operations are time-multiplexed on a single functional unit. The derivation of the folding equation, which is the basis for this technique, is included in this section along with the derivation of the retiming for folding, which is used to retime a data-flow graph (DFG) prior to folding.

[Figure 7: An edge U → V with w(e) delays]

[Figure 8: The corresponding folded data path, from H_U through P_U pipeline delays and D_F(U → V) delays to H_V, switched at Nl + v]

Consider the edge e connecting nodes U and V with w(e) delays, as shown in Figure 7.
Let the executions of the l-th iteration of the nodes U and V be scheduled at the time units Nl + u and Nl + v, respectively, where u and v are the folding orders of the nodes U and V and satisfy 0 ≤ u, v ≤ N - 1. The folding order of a node is the time partition to which the node is scheduled to execute in hardware. The functional units that execute the nodes U and V are denoted H_U and H_V, respectively. Note that N is the number of operations folded to a single functional unit and is also referred to as the folding factor. If H_U is pipelined by P_U stages, then the result of the l-th iteration of the node U is available at the time unit Nl + u + P_U. Since the edge U → V has w(e) delays, the result of the l-th iteration of the node U is used by the (l + w(e))-th iteration of the node V, which is executed at N(l + w(e)) + v. Therefore, the result must be stored for

    D_F(U → V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u

time units, which is independent of the iteration number l. The edge U → V is implemented as a path from H_U to H_V in the architecture with D_F(U → V) delays, and data on this path are input to H_V at Nl + v, as illustrated in Figure 8.

A folding set is an ordered set of operations executed by the same functional unit. Each folding set contains N entries, some of which may be null operations. The operation in the j-th position within the folding set (where j goes from 0 to N - 1) is executed by the functional unit during the time partition j. For example, consider the folding set S1 = {A1, ∅, A2} for N = 3. The operation A1 belongs to the folding set S1 with folding order 0 (this is also denoted as (S1|0)), and the operation A2 belongs to the folding set S1 with folding order 2 (this is also denoted as (S1|2)). Due to the null operation in position 1 within S1, the functional unit that executes operations A1 and A2 will not be utilized at time instances 3l + 1.
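The folding equation and the notion of folding order translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, null operations are represented by None, and the edge values in the example are invented to exercise the formula.

```python
def folding_delay(N, w, P_U, u, v):
    """Number of delays on the folded edge: D_F(U -> V) = N*w(e) - P_U + v - u."""
    return N * w - P_U + v - u


def folding_orders(folding_set):
    """Map each operation in a folding set to its folding order (its position).

    Null operations are written as None and are skipped.
    """
    return {op: pos for pos, op in enumerate(folding_set) if op is not None}


# Folding set S1 = {A1, null, A2} with folding factor N = 3, as in the text.
S1 = ["A1", None, "A2"]
orders = folding_orders(S1)
print(orders)

# Hypothetical edge A1 -> A2 with one delay and a 1-stage pipelined unit.
print(folding_delay(N=3, w=1, P_U=1, u=orders["A1"], v=orders["A2"]))

# Hypothetical edge A2 -> A1 with no delay: the result is negative, meaning
# the value would be needed before the pipelined unit has produced it.
print(folding_delay(N=3, w=0, P_U=1, u=orders["A2"], v=orders["A1"]))
```

The first edge needs 4 delays in the folded architecture, while the second gives -3, which cannot be built as it stands.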
For a folded system to be realizable, D_F(U → V) ≥ 0 must hold for all of the edges in the DFG. Because our 7-tap filter is symmetric, D_F(U → V) ≥ 0 does not hold for some of its edges. To satisfy the condition, we use the cutset method, which introduces delays without affecting the operation of the filter.

3.2 Cutset Method

The cycle cutset method cuts each cycle of the given constraints graph by instantiating one variable per cycle. A delay going into each cutset is replaced by a delay coming out of each cutset. This results in the signal-flow graph (SFG) of a form II FIR filter, shown in Figure 9. The current input sample is simultaneously multiplied by all the filter coefficients. The output of each adder is added to a previously computed partial result and then delayed. With this method the filter operation is not changed; only the number of delay elements is increased. The 7-tap filter after applying the cutset method is shown below.

[Figure 9: The retimed 7-tap filter with folding sets assigned]

The use of systematic folding techniques is demonstrated by folding the 7-tap programmable FIR filter in Figure 9. Assume that addition and multiplication require 1 and 2 units of time (u.t.), respectively, and that 1-stage pipelined adders and 2-stage pipelined multipliers are available. The functional units can be clocked with a period of 1 u.t. This filter is folded with folding factor N = 6 using the folding sets shown in the figure. The folding factor N = 6 means that the iteration period of the folded hardware is 6 u.t., i.e., each node of the 7-tap filter is executed exactly once every 6 u.t. in the folded architecture. It also means that a functional unit in the folded hardware executes up to 6 operations (nodes) of the DSP program. To see this, the folding sets in Figure 9 can be written as S1 = {2, 3, 4, 1}, whose two remaining positions are null operations, and S2 = {8, 10, 5, 6, 7, 9}.
The folding set S1 contains only multiplication operations, and the nodes in this folding set are executed by the same hardware multiplier. Similarly, the folding set S2 contains only addition operations, and the nodes in this folding set are executed by the same hardware adder.

For example, node 3 has folding order 1 in S1, so it is executed in the folded architecture at time instances 6l + 1 on the multiplier that implements the operations in the folding set S1. The folded architecture is shown in Figure 10. It is obtained from the DFG in Figure 9 by writing the folding equation

    D_F(U → V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u

for each of the 12 edges in the DFG. These 12 equations are

    D_F(1 → 8)  = 6(1) - 2 + 0 - 3 = 1
    D_F(1 → 5)  = 6(1) - 2 + 2 - 3 = 3
    D_F(2 → 5)  = 6(0) - 2 + 2 - 0 = 0
    D_F(2 → 9)  = 6(1) - 2 + 5 - 0 = 9
    D_F(3 → 6)  = 6(0) - 2 + 3 - 1 = 0
    D_F(3 → 10) = 6(1) - 2 + 1 - 1 = 4
    D_F(4 → 7)  = 6(0) - 2 + 4 - 2 = 0
    D_F(5 → 6)  = 6(1) - 1 + 3 - 2 = 6
    D_F(6 → 7)  = 6(1) - 1 + 4 - 3 = 6
    D_F(7 → 10) = 6(2) - 1 + 1 - 4 = 8
    D_F(10 → 9) = 6(1) - 1 + 5 - 1 = 9
    D_F(9 → 8)  = 6(1) - 1 + 0 - 5 = 0

For example, D_F(2 → 9) = 9 means that there is an edge from the multiplier to the adder in the folded DFG with 9 delays. Since this edge ends at node 9, which has folding order 5 in Figure 9, the folded edge is switched at the input of the adder in the folded DFG at 6l + 5. This folded edge can be seen in Figure 10.

[Figure 10: The folded 7-tap filter using the folding sets given in Figure 9]

Figure 10 is the final hardware circuit that we need to simulate. It contains only one adder and one multiplier, which gives a lower-power design than our original circuit.

4 Software Implementations

4.1 Matlab

Matlab (Matrix laboratory) is an interactive software system for numerical computations and graphics.
As the name suggests, Matlab is especially designed for matrix computations: solving systems of linear equations, computing eigenvalues and eigenvectors, factoring matrices, and so forth. In addition, it has a variety of graphical capabilities, and can be extended through programs written in its own programming language. Many such programs come with the system; a number of these extend Matlab's capabilities to nonlinear problems, such as the solution of initial value problems for ordinary differential equations. Matlab is one of the fastest and most enjoyable ways to solve problems numerically. The computational problems arising in most courses can be solved much more quickly with Matlab than with standard programming languages (Fortran, C, Java, etc.). It is particularly easy to generate some results, draw graphs to look at the interesting features, and then explore the problem further. By minimizing human time, Matlab is particularly useful in the initial investigation of real problems, even though they may eventually have to be solved using more computationally efficient methods on supercomputers.

Matlab is useful in our project for generating the coefficients required for our circuits. We need four coefficients for our circuit. We take two filters with frequencies 50 Hz and 100 Hz and generate the coefficients for these filters. The code for generating these coefficients and converting them into binary is as follows.
    f1 = 50; f2 = 100;               % Sine wave frequencies (Hz)
    fs = 500;                        % Sampling frequency (Hz)
    N = 1:1000;
    for n = 1:1000
        y1(n) = sin(2*pi*f1*n/fs);   % Sine wave generation with f = 50Hz
        y2(n) = sin(2*pi*f2*n/fs);   % Sine wave generation with f = 100Hz
    end
    figure
    plot(N, y1)
    figure
    plot(N, y2)

    % Design of lowpass FIR filter
    n = 4;                           % Order of the filter
    Wn1 = 100/500; Wn2 = 200/500     % Cutoffs for the filter
    b1 = fir1(n, Wn1)
    b2 = fir1(n, Wn2)
    figure
    freqz(b1, 1, 512, fs)            % Denominator is 1 for an FIR filter
    figure
    freqz(b2, 1, 512, fs)
    k1 = round(b1*128)               % Signed Magnitude representation
    a1 = dec2bin(k1, 8)              % Decimal to Binary conversion
    k2 = round(b2*128)               % Signed Magnitude representation
    a2 = dec2bin(k2, 8)              % Decimal to Binary conversion

The outputs obtained when this code is run in Matlab are

    Wn2 =
        0.4000

    b1 =
        0.0284    0.2370    0.4692    0.2370    0.0284

    b2 =
        0.0101    0.2203    0.5391    0.2203    0.0101

    k1 =
        4    30    60    30     4

    a1 =
        00000100
        00011110
        00111100
        00011110
        00000100

    k2 =
        1    28    69    28     1

    a2 =
        00000001
        00011100
        01000101
        00011100
        00000001

The waveforms obtained are

[Figure 11: Sine Wave for Freq 50Hz]

[Figure 12: Sine Wave for Freq 100Hz]

[Figure 13: Wave form for Freq 50Hz]

[Figure 14: Wave form for Freq 100Hz]

These results are tabulated as

    Frequency    Coefficient    Signed Magnitude Value    Binary Conversion
    50Hz         0.0284          4                        00000100
                 0.2370         30                        00011110
                 0.4692         60                        00111100
                 0.2370         30                        00011110
                 0.0284          4                        00000100
    100Hz        0.0101          1                        00000001
                 0.2203         28                        00011100
                 0.5391         69                        01000101
                 0.2203         28                        00011100
                 0.0101          1                        00000001

Table 8: Matlab Results

4.2 OrCAD

The two different adder structures were schematically designed in Capture CIS, which has the ability to produce a SPICE netlist. The designs were then simulated and analysed within pSPICE, producing waveforms that can be inspected by the user.
However, pSPICE has no built-in capability to detect and count the number of 0-1 transitions on gates, especially those deep within the design hierarchy. It was suggested to use 4-bit binary counters to detect the 0-1 gate transitions. This was achieved by creating a part from a 4-bit binary counter and adding these parts to the design being examined. An example is shown in Figure 15. The output of every gate was fed into the clock input of an incrementing counter that is reset at the start of the stimulus. Every time a 0-1 transition occurs, the counter increments once, producing a set of numbers from all the gates to sum at the end of the simulation. This provides an accurate representation of the switching activity, and therefore power consumption, of the design.

The test setup is reproduced in Figure 15. A typical test schematic is shown in Appendix 1.

[Figure 15: Transition monitoring setup. Gate outputs A, B and C of the system each drive the CLK input of a 0-1 transition counter, a binary counter with outputs Q0 to Q3 and an nRST reset.]

The stimuli for each test were applied using the Capture CIS digital stimulus part from the Source.olb library file. There were three main limitations with this technique:

- The maximum number of stimuli that could be applied at any one input was 19. This is a limitation of pSPICE.
- If a glitch occurred, the binary counters' clock width could be violated, resulting in a persistent error.
- The technique was labour intensive; adding the necessary counters to the design was a time-consuming and error-prone task.

By applying Gray code to one of the adder inputs and keeping the other input constant, it was possible to keep glitches to a reasonable level, and good, usable results were achieved. The detailed results for the alternative architectures can be seen in Appendix 2.
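The role of the transition counters, and the reason Gray-coded stimuli keep switching low, can be illustrated with a short Python sketch. This is not the pSPICE setup itself: the function names are hypothetical, and the "nets" here are simply the bits of the applied stimulus words rather than internal gate outputs.

```python
def gray_code(n):
    """Reflected binary Gray code of n: successive values differ in one bit."""
    return n ^ (n >> 1)


def count_rising_edges(waveform):
    """Count 0-1 transitions on one net, as a counter clocked by that net would."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:])
               if prev == 0 and cur == 1)


def rising_edges_per_bit(words, width):
    """Count 0-1 transitions separately on each bit of a sequence of words."""
    return [count_rising_edges([(w >> bit) & 1 for w in words])
            for bit in range(width)]


binary_stimulus = list(range(16))
gray_stimulus = [gray_code(i) for i in range(16)]

print(rising_edges_per_bit(binary_stimulus, 4), sum(rising_edges_per_bit(binary_stimulus, 4)))
print(rising_edges_per_bit(gray_stimulus, 4), sum(rising_edges_per_bit(gray_stimulus, 4)))
```

Over the same 16 input words, the Gray-coded sequence produces 8 rising edges on the input bits against 15 for plain binary counting, and never changes more than one bit at a time.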
Power analysis of a 1-bit Ripple adder versus a 1-bit Carry Look-ahead adder would be trivial; at 1-bit size, they are identical in structure. However, to estimate the performance of the 1-bit adders in a large multiplier structure (where they are chained together), larger adders were also analysed. 1-bit, 4-bit and 8-bit adders of both designs were analysed. The overall results are shown in Table 9, and a more detailed tabulation of the results can be found in Appendix 2.

    Adder Size    Ripple-Through    Carry Look-Ahead
    1-Bit          5                 5
    4-Bit         26.55             31.33
    8-Bit         26.2              41.8

Table 9: Average 0-1 transitions per test

To generate stimulus, one input to the adder was kept constant, while several possible values were applied to the other input. The same tests were performed on each of the designs. Comparing these results, and then extrapolating for 16-bit and 32-bit adders, the power consumption of the Carry Look-ahead adder is expected to be considerably higher than that of the Ripple adder, with an estimated 49% difference for a 16-bit adder.

    Adder Size    Ripple    Carry    Percentage Difference
    1 Bit          5.00      5        0%
    4 Bit         26.55     31.33    15.25%
    8 Bit         26.20     41.80    37.32%
    16 Bit*       26        51       49%
    32 Bit*       26        61       57.38%

Table 10: Adder Transition Test Comparison (*Estimates)

These results, although displaying the trends expected of the two adder designs, are limited by the small number of stimuli that were applied to the adders and by the data dependence intrinsic to the process. In addition, pSPICE has several limitations with regard to simulation of this type. As a result, the OrCAD layout was used to generate a VHDL netlist, to be simulated with VHDL testbenches in Veribest to give more comprehensive and representative results.
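The percentage column of Table 10 appears to be the difference between the two transition counts expressed relative to the Carry Look-ahead count. The Python sketch below checks that reading against the tabulated values; the function name is hypothetical and the formula is inferred from the table rather than stated in the report.

```python
def percent_difference(ripple, carry):
    """Extra transitions of the carry look-ahead adder, relative to its own count."""
    return 100.0 * (carry - ripple) / carry


# (adder size, ripple transitions, carry look-ahead transitions) from Table 10
table_10 = [
    ("1 Bit", 5.00, 5.00),
    ("4 Bit", 26.55, 31.33),
    ("8 Bit", 26.20, 41.80),
    ("16 Bit*", 26.0, 51.0),
    ("32 Bit*", 26.0, 61.0),
]

for size, ripple, carry in table_10:
    print(f"{size:8s} {percent_difference(ripple, carry):6.2f}%")
```

Under this assumption the computed values agree with the table to within rounding; the 4-bit row computes to about 15.26% against the tabulated 15.25%.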
4.3 VHDL

VHDL is a programming language that has been designed and optimized for describing the behavior of digital systems. VHDL has many features appropriate for describing the behavior of electronic components ranging from simple logic gates to complete microprocessors and custom chips. Features of VHDL allow electrical aspects of circuit behavior (such as rise and fall times of signals, delays through gates, and functional operation) to be precisely described. The resulting VHDL simulation models can then be used as building blocks in larger circuits (using schematics, block diagrams or system-level VHDL descriptions) for the purpose of simulation.

VHDL is also a general-purpose programming language: just as high-level programming languages allow complex design concepts to be expressed as computer programs, VHDL allows the behavior of complex electronic circuits to be captured into a design system for automatic circuit synthesis or for system simulation. Like Pascal, C and C++, VHDL includes features useful for structured design techniques, and offers a rich set of control and data representation features. Unlike these other programming languages, VHDL provides features allowing concurrent events to be described. This is important because the hardware described using VHDL is inherently concurrent in its operation.

One of the most important applications of VHDL is to capture the performance specification for a circuit, in the form of what is commonly referred to as a test bench. Test benches are VHDL descriptions of circuit stimuli and corresponding expected outputs that verify the behavior of a circuit over time. Test benches should be an integral part of any VHDL project and should be created in tandem with other descriptions of the circuit.
For our project we use VHDL code for both our initial 7-tap filter and the folded architecture. The individual elements used for our circuits are:

- 6-to-1 multiplexer
- Delay element
- 8-bit multiplier
- 8-bit adder
- Clock signals

The input samples are given as a sine wave. The sine wave package code is given as

    library ieee;
    use ieee.std_logic_1164.all;

    -- Lookup table of sine samples: index 0 to 255, value 0 to 255
    package sine_package is
        constant max_table_value: integer := 255;
        subtype table_value_type is integer range 0 to max_table_value;
        constant max_table_index: integer := 255;
        subtype table_index_type is integer range 0 to max_table_index;
        function get_table_value (table_index: table_index_type) return table_value_type;
    end;

    package body sine_package is
        function get_table_value (table_index: table_index_type) return table_value_type is
            variable table_value: table_value_type;
        begin
            case table_index is
                when 0 => table_value := 1;
                when 1 => table_value := 2;
                when 2 => table_value := 4;
                when 3 => table_value := 5;
                when 4 => table_value := 7;
                when 5 => table_value := 9;
                when 6 => table_value := 10;
                when 7 => table_value := 12;
                when 8 => table_value := 13;
                when 9 => table_value := 15;
                when 10 => table_value := 16;
                when 11 => table_value := 18;
                when 12 => table_value := 20;
                when 13 => table_value := 21;
                when 14 => table_value := 23;
                when 15 => table_value := 24;
                when 16 => table_value := 26;
                when 17 => table_value := 27;
                when 18 => table_value := 29;
                when 19 => table_value := 30;
                when 20 => table_value := 32;
                when 21 => table_value := 34;
                when 22 => table_value := 35;
                when 23 => table_value := 37;
                when 24 => table_value := 38;
                when 25 => table_value := 40;
                when 26 => table_value := 41;
                when 27 => table_value := 43;
                when 28 => table_value := 44;
                when 29 => table_value := 46;
                when 30 => table_value := 47;
                when 31 => table_value := 49;
                when 32 => table_value := 51;
                when 33 => table_value := 52;
                when 34 => table_value := 54;
                when 35 => table_value := 55;
                when 36 => table_value := 57;
                when 37 => table_value := 58;
                when 38 => table_value := 60;
                when 39 => table_value := 61;
                when 40 => table_value := 63;
                when 41 => table_value := 64;
                when 42 => table_value := 66;
                when 43 => table_value := 67;
                when 44 => table_value := 69;
                when 45 => table_value := 70;
                when 46 => table_value := 72;
                when 47 => table_value := 73;
                when 48 => table_value := 75;
                when 49 => table_value := 76;
                when 50 => table_value := 78;
                when 51 => table_value := 79;
                when 52 => table_value := 81;
                when 53 => table_value := 82;
                when 54 => table_value := 84;
                when 55 => table_value := 85;
                when 56 => table_value := 87;
                when 57 => table_value := 88;
                when 58 => table_value := 90;
                when 59 => table_value := 91;
                when 60 => table_value := 93;
                when 61 => table_value := 94;
                when 62 => table_value := 95;
                when 63 => table_value := 97;
                when 64 => table_value := 98;
                when 65 => table_value := 100;
                when 66 => table_value := 101;
                when 67 => table_value := 103;
                when 68 => table_value := 104;
                when 69 => table_value := 105;
                when 70 => table_value := 107;
                when 71 => table_value := 108;
                when 72 => table_value := 110;
                when 73 => table_value := 111;
                when 74 => table_value := 113;
                when 75 => table_value := 114;
                when 76 => table_value := 115;
                when 77 => table_value := 117;
                when 78 => table_value := 118;
                when 79 => table_value := 120;
                when 80 => table_value := 121;
                when 81 => table_value := 122;
                when 82 => table_value := 124;
                when 83 => table_value := 125;
                when 84 => table_value := 126;
                when 85 => table_value := 128;
                when 86 => table_value := 129;
                when 87 => table_value := 130;
                when 88 => table_value := 132;
                when 89 => table_value := 133;
                when 90 => table_value := 134;
                when 91 => table_value := 136;
                when 92 => table_value := 137;
                when 93 => table_value := 138;
                when 94 => table_value := 140;
                when 95 => table_value := 141;
                when 96 => table_value := 142;
                when 97 => table_value := 144;
                when 98 => table_value := 145;
                when 99 => table_value := 146;
                when 100 => table_value := 147;
                when 101 => table_value := 149;
                when 102 => table_value := 150;
                when 103 => table_value := 151;
                when 104 => table_value := 153;
                when 105 => table_value := 154;
                when 106 => table_value := 155;
                when 107 => table_value := 156;
                when 108 => table_value := 158;
                when 109 => table_value := 159;
                when 110 => table_value := 160;
                when 111 => table_value := 161;
                when 112 => table_value := 162;
                when 113 => table_value := 164;
                when 114 => table_value := 165;
                when 115 => table_value := 166;
                when 116 => table_value := 167;
                when 117 => table_value := 168;
                when 118 => table_value := 170;
                when 119 => table_value := 171;
                when 120 => table_value := 172;
                when 121 => table_value := 173;
                when 122 => table_value := 174;
                when 123 => table_value := 175;
                when 124 => table_value := 176;
                when 125 => table_value := 178;
                when 126 => table_value := 179;
                when 127 => table_value := 180;
                when 128 => table_value := 181;
                when 129 => table_value := 182;
                when 130 => table_value := 183;
                when 131 => table_value := 184;
                when 132 => table_value := 185;
                when 133 => table_value := 186;
                when 134 => table_value := 187;
                when 135 => table_value := 188;
                when 136 => table_value := 189;
                when 137 => table_value := 191;
                when 138 => table_value := 192;
                when 139 => table_value := 193;
                when 140 => table_value := 194;
                when 141 => table_value := 195;
                when 142 => table_value := 196;
                when 143 => table_value := 197;
                when 144 => table_value := 198;
                when 145 => table_value := 199;
                when 146 => table_value := 200;
                when 147 => table_value := 201;
                when 148 => table_value := 202;
                when 149 => table_value := 202;
                when 150 => table_value := 203;
                when 151 => table_value := 204;
                when 152 => table_value := 205;
                when 153 => table_value := 206;
                when 154 => table_value := 207;
                when 155 => table_value := 208;
                when 156 => table_value := 209;
                when 157 => table_value := 210;
                when 158 => table_value := 211;
                when 159 => table_value := 212;
                when 160 => table_value := 212;
                when 161 => table_value := 213;
                when 162 => table_value := 214;
                when 163 => table_value := 215;
                when 164 => table_value := 216;
                when 165 => table_value := 217;
                when 166 => table_value := 218;
                when 167 => table_value := 218;
                when 168 => table_value := 219;
                when 169 => table_value := 220;
                when 170 => table_value := 221;
                when 171 => table_value := 221;
                when 172 => table_value := 222;
                when 173 => table_value := 223;
                when 174 => table_value := 224;
                when 175 => table_value := 225;
                when 176 => table_value := 225;
                when 177 => table_value := 226;
                when 178 => table_value := 227;
                when 179 => table_value := 227;
                when 180 => table_value := 228;
                when 181 => table_value := 229;
                when 182 => table_value := 230;
                when 183 => table_value := 230;
                when 184 => table_value := 231;
                when 185 => table_value := 232;
                when 186 => table_value := 232;
                when 187 => table_value := 233;
                when 188 => table_value := 233;
                when 189 => table_value := 234;
                when 190 => table_value := 235;
                when 191 => table_value := 235;
                when 192 => table_value := 236;
                when 193 => table_value := 236;
                when 194 => table_value := 237;
                when 195 => table_value := 238;
                when 196 => table_value := 238;
                when 197 => table_value := 239;
                when 198 => table_value := 239;
                when 199 => table_value := 240;
                when 200 => table_value := 240;
                when 201 => table_value := 241;
                when 202 => table_value := 241;
                when 203 => table_value := 242;
                when 204 => table_value := 242;
                when 205 => table_value := 243;
                when 206 => table_value := 243;
                when 207 => table_value := 244;
                when 208 => table_value := 244;
                when 209 => table_value := 245;
                when 210 => table_value := 245;
                when 211 => table_value := 246;
                when 212 => table_value := 246;
                when 213 => table_value := 246;
                when 214 => table_value := 247;
                when 215 => table_value := 247;
                when 216 => table_value := 248;
                when 217 => table_value := 248;
                when 218 => table_value := 248;
                when 219 => table_value := 249;
                when 220 =>
table_value := 249;
      when 221 => table_value := 249;
      when 222 => table_value := 250;
      when 223 => table_value := 250;
      when 224 => table_value := 250;
      when 225 => table_value := 251;
      when 226 => table_value := 251;
      when 227 => table_value := 251;
      when 228 => table_value := 251;
      when 229 => table_value := 252;
      when 230 => table_value := 252;
      when 231 => table_value := 252;
      when 232 => table_value := 252;
      when 233 => table_value := 253;
      when 234 => table_value := 253;
      when 235 => table_value := 253;
      when 236 => table_value := 253;
      when 237 => table_value := 253;
      when 238 => table_value := 254;
      when 239 => table_value := 254;
      when 240 => table_value := 254;
      when 241 => table_value := 254;
      when 242 => table_value := 254;
      when 243 => table_value := 254;
      when 244 => table_value := 254;
      when 245 => table_value := 254;
      when 246 => table_value := 255;
      when 247 => table_value := 255;
      when 248 => table_value := 255;
      when 249 => table_value := 255;
      when 250 => table_value := 255;
      when 251 => table_value := 255;
      when 252 => table_value := 255;
      when 253 => table_value := 255;
      when 254 => table_value := 255;
      when 255 => table_value := 255;
    end case;
    return table_value;
  end;
end;

The code for the sine wave samples is

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.sine_package.all;

entity sine_wave is
  port(
    clk: in std_logic;
    reset: in std_logic;
    enable: in std_logic;
    wave_out: out std_logic_vector(7 downto 0)
  );
end;

architecture arch1 of sine_wave is
  type state_type is ( counting_up, change_down, counting_down, change_up );
  signal state, next_state: state_type;
  signal table_index: table_index_type;
  signal positive_cycle: boolean;
begin
  -- State register
  process( clk, reset )
  begin
    if reset = '1' then
      state <= counting_up;
    elsif rising_edge( clk ) then
      if enable = '1' then
        state <= next_state;
      end if;
    end if;
  end process;

  -- Next-state logic
  process( state, table_index )
  begin
    next_state <= state;
    case state is
      when counting_up =>
        if table_index = max_table_index then
          next_state <= change_down;
        end if;
      when change_down =>
        next_state <= counting_down;
      when counting_down =>
        if table_index = 0 then
          next_state <= change_up;
        end if;
      when others => -- change_up
        next_state <= counting_up;
    end case;
  end process;

  -- Table index counter and half-cycle sign flag
  process( clk, reset )
  begin
    if reset = '1' then
      table_index <= 0;
      positive_cycle <= true;
    elsif rising_edge( clk ) then
      if enable = '1' then
        case next_state is
          when counting_up =>
            table_index <= table_index + 1;
          when counting_down =>
            table_index <= table_index - 1;
          when change_up =>
            positive_cycle <= not positive_cycle;
          when others =>
            null; -- nothing to do
        end case;
      end if;
    end if;
  end process;

  -- Table lookup and sign selection
  process( table_index, positive_cycle )
    variable table_value: table_value_type;
  begin
    table_value := get_table_value( table_index );
    if positive_cycle then
      wave_out <= std_logic_vector(to_signed(table_value,8));
    else
      wave_out <= std_logic_vector(to_signed(-table_value,8));
    end if;
  end process;
end;

The code for our initial circuit (the non-folded 7-tap FIR filter circuit) is

Library IEEE;
Use IEEE.std_logic_1164.all;
Use IEEE.std_logic_arith.all;
Use IEEE.std_logic_unsigned.all;

Entity FIR2 is
  Port(
    CLK   :in std_logic;
    RESET :in std_logic;
    DIN   :in std_logic_vector(7 downto 0); --8bit Input Data
    A0    :in std_logic_vector(7 downto 0); --8bit Coefficients
    A1    :in std_logic_vector(7 downto 0);
    A2    :in std_logic_vector(7 downto 0);
    A3    :in std_logic_vector(7 downto 0);
    DOUT  :out std_logic_vector(21 downto 0) --22bit Data Out
  );
end;

Architecture BEH2 of FIR2 is
  Signal X0, X1, X2, X3 : std_logic_vector(15 downto 0);
Begin
  X0 <= DIN*A0; --4 multipliers
  X1 <= DIN*A1;
  X2 <= DIN*A2;
  X3 <= DIN*A3;

  Process(CLK, RESET)
    variable REG5 : std_logic_vector(15 downto 0);
    variable REG6 : std_logic_vector(16 downto 0);
    variable REG7 : std_logic_vector(17 downto 0);
    variable REG10: std_logic_vector(18 downto 0);
    variable REG9 : std_logic_vector(19 downto 0);
    variable REG8 : std_logic_vector(20 downto 0);
  begin
    if
(RESET = '1') then
      DOUT <= (others =>'0');
      REG5 := (others =>'0');
      REG6 := (others =>'0');
      REG7 := (others =>'0');
      REG8 := (others =>'0');
      REG9 := (others =>'0');
      REG10:= (others =>'0');
    elsif (CLK'event and CLK = '1') then
      DOUT <= ('0' & (REG8+X3)); --6 adders
      REG5 := X3;
      REG6 := ('0' & (X0+REG5));
      REG7 := ('0' & (X1+REG6));
      REG10:= ('0' & (X2+REG7));
      REG9 := ('0' & (X1+REG10));
      REG8 := ('0' & (X0+REG9));
    end if;
  end process;
end;

The VHDL code for our final circuit (folded architecture) is

Library IEEE;
Use IEEE.std_logic_1164.all;
Use IEEE.std_logic_arith.all;
Use IEEE.std_logic_unsigned.all;

Entity FIR3 is
  Port(
    CLK   :in std_logic;
    RESET :in std_logic;
    DIN   :in std_logic_vector(7 downto 0); --8bit Input Data
    A0    :in std_logic_vector(7 downto 0); --8bit Coefficients
    A1    :in std_logic_vector(7 downto 0);
    A2    :in std_logic_vector(7 downto 0);
    A3    :in std_logic_vector(7 downto 0);
    DOUT  :out std_logic_vector(21 downto 0) --22bit Data Out
  );
end;

Architecture BEH3 of FIR3 is
  Signal SUM_OF_TAPS : std_logic_vector(21 downto 0);
  Signal PROD_OF_COEFF : std_logic_vector(15 downto 0);
  Signal MUX_0: std_logic_vector(7 downto 0);
  Signal MUX_1: std_logic_vector(15 downto 0);
  Signal MUX_2: std_logic_vector(21 downto 0);
Begin
  PROD_OF_COEFF <= MUX_0*DIN; --1 multiplier
  SUM_OF_TAPS <= MUX_1+MUX_2; --1 adder

  Process(CLK, RESET)
    variable N1: std_logic_vector(2 downto 0);
    variable N2: std_logic_vector(2 downto 0);
    variable TEMP1, TEMP2, TEMP3, TEMP4: std_logic_vector(15 downto 0);
    variable TEMP5, TEMP6, TEMP7, TEMP8: std_logic_vector(15 downto 0);
    variable TEMP9, TEMP10, TEMP11: std_logic_vector(15 downto 0);
    variable TEMP12, TEMP13, TEMP14, TEMP15: std_logic_vector(21 downto 0);
    variable TEMP16, TEMP17, TEMP18: std_logic_vector(21 downto 0);
    variable TEMP19, TEMP20, TEMP21: std_logic_vector(21 downto 0);
  begin
    if (RESET = '1') then
      DOUT <= (others => '0');
      N1 := "000";
      N2 := "000";
      MUX_0 <= A0;
      MUX_1 <= TEMP3;
      MUX_2 <= TEMP12;
      TEMP1 := (others => '0');
      TEMP2
:= (others => '0');
      TEMP3 := (others => '0');
      TEMP4 := (others => '0');
      TEMP5 := (others => '0');
      TEMP6 := (others => '0');
      TEMP7 := (others => '0');
      TEMP8 := (others => '0');
      TEMP9 := (others => '0');
      TEMP10 := (others => '0');
      TEMP11 := (others => '0');
      TEMP12 := (others => '0');
      TEMP13 := (others => '0');
      TEMP14 := (others => '0');
      TEMP15 := (others => '0');
      TEMP16 := (others => '0');
      TEMP17 := (others => '0');
      TEMP18 := (others => '0');
      TEMP19 := (others => '0');
      TEMP20 := (others => '0');
      TEMP21 := (others => '0');
    else
      if (CLK'event and CLK = '1') then
        DOUT <= TEMP12;

        -- Coefficient select for the single multiplier
        if (N1 = 0) then
          MUX_0 <= A0;
        elsif (N1 = 1) then
          MUX_0 <= A1;
        elsif (N1 = 2) then
          MUX_0 <= A2;
        elsif (N1 = 3) then
          MUX_0 <= A3;
        end if;
        if (N1 < 4) then
          N1 := N1 + 1;
        else
          N1 := "000";
        end if;

        -- Operand select for the single adder
        if (N2 = 0) then
          MUX_1 <= TEMP3;
          MUX_2 <= TEMP12;
        elsif (N2 = 1) then
          MUX_1 <= TEMP6;
          MUX_2 <= TEMP20;
        elsif (N2 = 2) then
          MUX_1 <= TEMP5;
          MUX_2 <= ("000000" & (TEMP2));
        elsif (N2 = 3) then
          MUX_1 <= TEMP2;
          MUX_2 <= TEMP18;
        elsif (N2 = 4) then
          MUX_1 <= TEMP2;
          MUX_2 <= TEMP18;
        elsif (N2 = 5) then
          MUX_1 <= TEMP11;
          MUX_2 <= TEMP21;
        end if;
        if (N2 < 6) then
          N2 := N2 + 1;
        else
          N2 := "000";
        end if;

        TEMP1 := PROD_OF_COEFF;
        TEMP2 := TEMP1;         --TAP 5 (2)
                                --TAPS 6,7 (1)
        TEMP3 := TEMP2;         --TAP 8 (1)
        TEMP4 := TEMP3;
        TEMP5 := TEMP4;         --TAP 5 (1)
        TEMP6 := TEMP5;         --TAP 10 (1)
        TEMP7 := TEMP6;
        TEMP8 := TEMP7;
        TEMP9 := TEMP8;
        TEMP10 := TEMP9;
        TEMP11 := TEMP10;       --TAP 9 (1)
        TEMP12 := SUM_OF_TAPS;  --TAP 8 (2)
        TEMP13 := TEMP12;
        TEMP14 := TEMP13;
        TEMP15 := TEMP14;
        TEMP16 := TEMP15;
        TEMP17 := TEMP16;
        TEMP18 := TEMP17;       --TAPS 6,7 (2)
        TEMP19 := TEMP18;
        TEMP20 := TEMP19;       --TAP 10 (2)
        TEMP21 := TEMP20;       --TAP 9 (2)
      end if;
    end if;
  end Process;
end;

These codes are simulated using a simulator tool called Veribest.
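As an aside on the folded listing above, the idea of time-multiplexing every tap onto one multiplier and one adder can be shown in isolation. The following is only a minimal sketch under stated assumptions, not the FIR3 architecture of this report: the entity name folded_mac_fir, the start/done handshake, the coefficient values and their tap order are all hypothetical, and numeric_std is used instead of std_logic_arith. It needs one clock to load a sample and seven clocks to accumulate the seven products.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a 7-tap FIR folded onto one multiplier and one adder.
entity folded_mac_fir is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    start : in  std_logic;              -- one-clock pulse: new sample on din
    din   : in  unsigned(7 downto 0);
    dout  : out unsigned(18 downto 0);  -- seven 16-bit products fit in 19 bits
    done  : out std_logic               -- one-clock pulse: dout is valid
  );
end;

architecture rtl of folded_mac_fir is
  constant TAPS : natural := 7;
  type byte_array is array (0 to TAPS - 1) of unsigned(7 downto 0);
  -- Illustrative coefficients only; values and tap order are assumptions.
  constant COEFF : byte_array := (
    to_unsigned(8, 8),  to_unsigned(30, 8), to_unsigned(60, 8),
    to_unsigned(30, 8), to_unsigned(60, 8), to_unsigned(30, 8),
    to_unsigned(8, 8));
  signal delay_line : byte_array := (others => (others => '0'));
  signal index      : natural range 0 to TAPS - 1 := 0;
  signal busy       : std_logic := '0';
  signal acc        : unsigned(18 downto 0) := (others => '0');
  signal product    : unsigned(15 downto 0);
  signal sum        : unsigned(18 downto 0);
begin
  -- The only multiplier and the only adder; the counter "index" decides
  -- which tap they serve in the current clock cycle.
  product <= COEFF(index) * delay_line(index);
  sum     <= acc + product;

  process (clk, reset)
  begin
    if reset = '1' then
      delay_line <= (others => (others => '0'));
      acc   <= (others => '0');
      dout  <= (others => '0');
      index <= 0;
      busy  <= '0';
      done  <= '0';
    elsif rising_edge(clk) then
      done <= '0';
      if start = '1' and busy = '0' then
        -- Shift the new sample into the delay line and restart the sum.
        for i in TAPS - 1 downto 1 loop
          delay_line(i) <= delay_line(i - 1);
        end loop;
        delay_line(0) <= din;
        acc   <= (others => '0');
        index <= 0;
        busy  <= '1';
      elsif busy = '1' then
        if index = TAPS - 1 then
          dout <= sum;          -- last product added: result complete
          done <= '1';
          busy <= '0';
        else
          acc   <= sum;         -- accumulate and move to the next tap
          index <= index + 1;
        end if;
      end if;
    end if;
  end process;
end;
```

In this sketch the price of the smaller datapath is throughput: the clock must run about eight times faster than the sample rate, which is the usual trade-off of a folded design.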
4.4 Veribest

The Veribest VHDL simulator is a simulation environment that includes a VHDL simulator which conforms to VHDL ’93 and also supports VHDL ’87. It simulates existing designs created in the VHDL simulator’s own window, designs created with the Veribest HDL writer or another text editor, and designs converted to VHDL from schematics and graphical HDL tools. It provides tools for creating new libraries. It supports Synopsys SmartModels and Modelling systems hardware models. It includes optimizations for the VITAL specification (IEEE 1076.4). It supports the Standard Delay Format (SDF) for back-annotation of timing data to VITAL-compliant designs. The VHDL code given above is simulated using the Veribest simulator.

The test bench for the sine wave is

Library IEEE;
Use IEEE.std_logic_1164.all;
Use IEEE.numeric_std.all;
Use work.sine_package;

entity sine_wave_tb is
end;

architecture bench of sine_wave_tb is
  component sine_wave
    port(
      CLK: in std_logic;
      RESET: in std_logic;
      ENABLE: in std_logic;
      wave_out: out std_logic_vector(7 downto 0)
    );
  end component;
  signal CLK: std_logic:= '0';
  signal RESET: std_logic:= '0';
  signal ENABLE: std_logic:= '1';
  signal wave_out: std_logic_vector(7 downto 0):= (others => '0');
begin
  u0: sine_wave port map ( CLK, RESET, ENABLE, wave_out );
  CLK <= not CLK after 10 ns; -- 20 ns clock period
end;

The test bench for our initial 7-tap filter circuit is

Library IEEE;
Use IEEE.std_logic_1164.all;
Use IEEE.std_logic_arith.all;

Entity FIR2_TB is
end;

Architecture BEH2_TB of FIR2_TB is
  Component FIR2 is
    Port (
      CLK  :in std_logic;
      RESET:in std_logic;
      DIN  :in std_logic_vector(7 downto 0); --8bit Input Data
      A0   :in std_logic_vector(7 downto 0); --8bit Coefficients
      A1   :in std_logic_vector(7 downto 0);
      A2   :in std_logic_vector(7 downto 0);
      A3   :in std_logic_vector(7 downto 0);
      DOUT :out std_logic_vector(21 downto 0) --22bit Data Out
    );
  end component;
  Component sine_wave is
    port(
      CLK  : in std_logic;
      RESET: in std_logic;
      ENABLE:in std_logic;
      wave_out: out std_logic_vector(7 downto 0)
    );
  end component;
  Signal CLK  :std_logic:='0';
  Signal RESET:std_logic:='0';
  Signal ENABLE:std_logic:='0';
  Signal DIN  :std_logic_vector(7 downto 0):="00000000";
  Signal A0   :std_logic_vector(7 downto 0):="00000000";
  Signal A1   :std_logic_vector(7 downto 0):="00000000";
  Signal A2   :std_logic_vector(7 downto 0):="00000000";
  Signal A3   :std_logic_vector(7 downto 0):="00000000";
  Signal DOUT :std_logic_vector(21 downto 0):=(others => '0');
begin
  U0: FIR2 port map(CLK, RESET, DIN, A0, A1, A2, A3, DOUT);
  U1: sine_wave port map(CLK, RESET, ENABLE, DIN);
  ENABLE<= '1';
  RESET <= '1', '0' after 50 ns;
  CLK <= not CLK after 10 ns;
  A0 <= "00011110"; --30
  A1 <= "00111100"; --60
  A2 <= "00011110"; --30
  A3 <= "00001000"; --08
end;

The test bench for our final circuit (folded architecture) is

Library IEEE;
Use IEEE.std_logic_1164.all;
Use IEEE.std_logic_arith.all;

Entity FIR3_TB is
end;

Architecture BEH3_TB of FIR3_TB is
  Component FIR3 is
    Port (
      CLK  :in std_logic;
      RESET:in std_logic;
      DIN  :in std_logic_vector(7 downto 0); --8bit Input Data
      A0   :in std_logic_vector(7 downto 0); --8bit Coefficients
      A1   :in std_logic_vector(7 downto 0);
      A2   :in std_logic_vector(7 downto 0);
      A3   :in std_logic_vector(7 downto 0);
      DOUT :out std_logic_vector(21 downto 0) --22bit Data Out
    );
  end component;
  Component sine_wave is
    port(
      CLK  : in std_logic;
      RESET: in std_logic;
      ENABLE:in std_logic;
      wave_out: out std_logic_vector(7 downto 0)
    );
  end component;
  Signal CLK  :std_logic:='0';
  Signal RESET:std_logic:='0';
  Signal ENABLE:std_logic:='0';
  Signal DIN  :std_logic_vector(7 downto 0):="00000000";
  Signal A0   :std_logic_vector(7 downto 0):="00000000";
  Signal A1   :std_logic_vector(7 downto 0):="00000000";
  Signal A2   :std_logic_vector(7 downto 0):="00000000";
  Signal A3   :std_logic_vector(7 downto 0):="00000000";
  Signal DOUT :std_logic_vector(21 downto 0):=(others => '0');
begin
  U0: FIR3 port map(CLK, RESET, DIN, A0, A1, A2, A3, DOUT);
  U1:
sine_wave port map(CLK, RESET, ENABLE, DIN);
  ENABLE<= '1';
  RESET <= '1', '0' after 35 ns;
  CLK <= not CLK after 10 ns;
  A0 <= "00011110"; --30
  A1 <= "00111100"; --60
  A2 <= "00011110"; --30
  A3 <= "00001000"; --08
end;

These are all implemented in the Veribest simulation tool, and the outputs of the two circuits are tabulated below.

Outputs for initial circuit    Outputs for final circuit
0                              0
8                              30
234                            120
468                            270
912                            326

5 Conclusion

This report has described the analysis performed to design a low-power FIR filter system using the folding technique. The hand analysis showed that folding reduces the number of adders and multipliers in the system to a single adder and a single multiplier, which in turn reduces the silicon area and thus the power. The hardware structures of both designs (the folded and the non-folded) were simulated, and the results came out close, which was taken as confirmation that the design was right. Thus the low-power FIR filter was designed.
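The comparison of the two circuits described above could also be run in a single simulation rather than by tabulating two separate runs. The bench below is a sketch under assumptions: it relies on the FIR2, FIR3 and sine_wave entities listed in this report being compiled into library work, while the names FIR_COMPARE_TB, DOUT2 and DOUT3 are hypothetical. It drives both filters with the same sine samples and coefficients and prints both outputs on every rising clock edge after reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical side-by-side bench for the two filters of this report.
entity FIR_COMPARE_TB is
end;

architecture BENCH of FIR_COMPARE_TB is
  signal CLK    : std_logic := '0';
  signal RESET  : std_logic := '1';
  signal ENABLE : std_logic := '1';
  signal DIN    : std_logic_vector(7 downto 0);
  -- Same coefficient values as the test benches in this report.
  signal A0 : std_logic_vector(7 downto 0) := "00011110"; -- 30
  signal A1 : std_logic_vector(7 downto 0) := "00111100"; -- 60
  signal A2 : std_logic_vector(7 downto 0) := "00011110"; -- 30
  signal A3 : std_logic_vector(7 downto 0) := "00001000"; -- 08
  signal DOUT2 : std_logic_vector(21 downto 0); -- non-folded output
  signal DOUT3 : std_logic_vector(21 downto 0); -- folded output
begin
  CLK   <= not CLK after 10 ns;        -- 20 ns period, runs until stopped
  RESET <= '1', '0' after 35 ns;

  -- One stimulus source shared by both filters.
  U_SINE : entity work.sine_wave
    port map (clk => CLK, reset => RESET, enable => ENABLE,
              wave_out => DIN);

  U_FIR2 : entity work.FIR2
    port map (CLK => CLK, RESET => RESET, DIN => DIN,
              A0 => A0, A1 => A1, A2 => A2, A3 => A3, DOUT => DOUT2);

  U_FIR3 : entity work.FIR3
    port map (CLK => CLK, RESET => RESET, DIN => DIN,
              A0 => A0, A1 => A1, A2 => A2, A3 => A3, DOUT => DOUT3);

  -- Print both outputs as unsigned integers once per clock after reset.
  process (CLK)
  begin
    if rising_edge(CLK) and RESET = '0' then
      report "FIR2 = " & integer'image(to_integer(unsigned(DOUT2))) &
             ", FIR3 = " & integer'image(to_integer(unsigned(DOUT3)))
        severity note;
    end if;
  end process;
end;
```

The clock never stops on its own, so the simulation has to be run for a fixed time, as with the other test benches in this report. The outputs are only reported, not asserted equal, because the two circuits are not expected to match sample for sample.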
