					                  Introduction to DSP

                         Maurizio Palesi




Maurizio Palesi                                                            1




What is a DSP?
   Digital
       Operating by the use of discrete signals to represent data in the
       form of numbers
   Signal
       A variable parameter by which information is conveyed through an
       electronic circuit
   Processing
       To perform operations on data according to programmed
       instructions
   Digital Signal Processing
       Changing or analysing information which is measured as discrete
       sequences of numbers



Main Characteristics
   Compared to other embedded computing applications,
   DSP applications are differentiated by the following
       Computationally demanding
            Iterative numeric algorithms
       Sensitivity to small numeric errors (audible noise)
       Stringent real-time requirements
       Streaming data
       High data bandwidth
       Predictable (though often eccentric) memory access pattern
       Predictable program flow (nested loops)








DSP Processors
   1970: DSP techniques in telecommunication equipment
       Microprocessor: not adequate performance
       Custom fixed function hardware: not adequate flexibility and reusability
       Both limitations led to DSP processors


DSP vs. General Purpose
   DSPs adopt a range of specialized features
       Single-cycle multiplier
       Multiply-accumulate operations
       Saturation arithmetic
       Separate program and data memories
       Dedicated, specialized addressing hardware
       Complex, specialized instruction sets
   Figure: GP and DSP sets, labelled with VLIW, superscalar, SIMD, multiprocessing, ...

   Today, virtually every commercial 32-bit microprocessor architecture
   (from ARM to 80x86) has been subject to some kind of DSP-oriented
   enhancement
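
Saturation arithmetic and multiply-accumulate can be illustrated in a few lines. This is a minimal sketch, assuming a signed 16-bit data word; the function names are hypothetical and do not correspond to any particular DSP instruction set.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(value):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))

def wrap16(value):
    """Two's-complement wrap-around, as a plain 16-bit adder would behave."""
    return (value + 32768) % 65536 - 32768

def mac(acc, a, b):
    """One multiply-accumulate step: add the product a * b to the accumulator."""
    return acc + a * b

overflowing_sum = 30000 + 10000
print(saturate16(overflowing_sum))  # stays at the positive limit
print(wrap16(overflowing_sum))      # wraps round to a large negative value
```

Clamping keeps an overflow close to the intended value, whereas wrap-around turns a loud positive sample into a loud negative one.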






Converting Analogue Signals
   A continuous signal
       Is measured against a clock
       Is first held at each clock tick
       The held signal is then measured, and the measurement converted to
       a digital value


Aliasing
   Some higher frequencies can be incorrectly interpreted
       Aliasing problem: One frequency looks like another


   Figure: a high frequency signal, sampled at too low a rate, looks like a lower frequency signal






Aliasing
   We must sample faster than twice the frequency of the highest
   frequency component [Nyquist’s theorem]
       This avoids aliasing
            Strictly, Nyquist says that we have to sample faster than twice the signal bandwidth

   Figure: a signal sampled twice per cycle has enough information to be reconstructed
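
A small numerical check of the aliasing effect, as a sketch only. The 10 Hz sampling rate and the 9 Hz and 1 Hz tones are illustrative assumptions, not values from the slides.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit sine wave taken at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

# 9 Hz is above half the 10 Hz sampling rate, so it cannot be represented.
high = sample_sine(9.0, 10.0, 20)
low = sample_sine(1.0, 10.0, 20)

# The 9 Hz samples coincide with those of a 1 Hz sine of inverted sign.
largest_difference = max(abs(h + l) for h, l in zip(high, low))
print(largest_difference)
```

Once sampled, nothing distinguishes the two tones, which is why the high frequency has to be removed before conversion.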


Frequency Resolution
   We cannot see slow changes in the signal if we don't wait
   long enough
       We must sample for at least one complete cycle of the lowest
       frequency we want to resolve
   Compromise
       We must sample fast to avoid aliasing, and for a long time to achieve good
       frequency resolution
       Sampling fast for a long time means we will have a lot of samples
            Lots of samples means lots of computation
       So we will have to compromise between resolving frequency
       components of the signal, and being able to see high frequencies
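
The compromise can be put into numbers. The sketch below assumes a 44100 Hz sampling rate and a 20 Hz lowest frequency purely for illustration; both helper functions are hypothetical.

```python
import math

def samples_needed(sample_rate_hz, lowest_freq_hz):
    """Samples required to cover one complete cycle of the lowest frequency."""
    return math.ceil(sample_rate_hz / lowest_freq_hz)

def frequency_resolution_hz(sample_rate_hz, sample_count):
    """Spacing between resolvable frequencies: one over the observation time."""
    return sample_rate_hz / sample_count

count = samples_needed(44100, 20)
print(count, frequency_resolution_hz(44100, count))
```

Halving the lowest frequency of interest doubles the number of samples, and with it the amount of computation.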







Quantisation
   When the signal is converted to digital form, the precision is limited by
   the number of bits available
   The errors introduced by digitisation are both
       Non linear: We cannot calculate their effects using normal maths
       Signal dependent: the errors are coherent and so cannot be reduced by
       simple means




   Figure: limited precision leads to errors, which are signal dependent
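
The size of the quantisation error can be checked numerically. This is a sketch under assumptions: a signed converter with a full scale of ±1.0 and rounding to the nearest level. Real converters differ in detail.

```python
import math

def quantise(value, bits, full_scale=1.0):
    """Round `value` to the nearest level of a signed converter with `bits` bits."""
    step = 2 * full_scale / (2 ** bits)
    code = round(value / step)
    # Keep the code inside the range a signed `bits`-bit word can hold.
    code = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, code))
    return code * step

signal = [0.9 * math.sin(2 * math.pi * n / 100) for n in range(100)]
worst_4_bit = max(abs(s - quantise(s, 4)) for s in signal)
worst_8_bit = max(abs(s - quantise(s, 8)) for s in signal)
print(worst_4_bit, worst_8_bit)
```

The error never exceeds half a quantisation step, and each extra bit halves that step.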

Time Domain Processing
   Correlation
       Autocorrelation to extract a signal from noise
       Cross correlation to locate a known signal
       Cross correlation to identify a signal
   Convolution








Correlation
   Correlation is a weighted moving average

       $r(n) = \sum_{k} x(k)\, y(k + n)$

   To evaluate r(n) for signals x and y
       Shift y by n
       Multiply the two together
       Integrate
   Requires a lot of calculation

       If one signal is of length M and the other is of length N, then we
       need (N * M) multiplications, to calculate the whole correlation
       function
            Note that really, we want to multiply and then accumulate the result -
            this is typical of DSP operations and is called a multiply & accumulate
            operation
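
The correlation sum and its cost can be sketched directly from the formula. The function name and the dictionary return type are choices made for this illustration.

```python
def correlate(x, y):
    """Return ({n: r(n)}, mac_count) with r(n) = sum over k of x(k) * y(k + n)."""
    result = {}
    macs = 0
    for n in range(-(len(x) - 1), len(y)):
        acc = 0
        for k in range(len(x)):
            if 0 <= k + n < len(y):
                acc += x[k] * y[k + n]  # one multiply & accumulate
                macs += 1
        result[n] = acc
    return result, macs

r, macs = correlate([1, 2, 3], [1, 2, 3])
print(r, macs)
```

Every pair of samples is multiplied exactly once, so the count equals the product of the two lengths.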

Correlation
   Correlation is a maximum when two signals are similar in shape
   Correlation is a measure of the similarity between two signals as a
   function of time shift between them

   If two signals are similar and unshifted, their product is all positive
   But as the shift increases, parts of it become negative
   The correlation function shows where the signals are similar and unshifted






Detecting Periodicity
   Figure: an EEG signal (upper trace) and its autocorrelation (lower trace)




   Autocorrelation as a way to detect periodicity in signals

Detecting Periodicity
   Figure: an EEG signal with noise (upper trace) and its autocorrelation (lower trace)




   Although a rhythm is not even visible (upper trace) it is detected by
   autocorrelation (lower trace)
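
The same effect can be reproduced with synthetic data. The sketch below uses a sine wave with a period of 20 samples buried in Gaussian noise as a stand-in for the EEG trace; the signal, noise level and seed are all assumptions made for illustration.

```python
import math
import random

def autocorrelation(x, max_lag):
    """r(lag) = sum over k of x(k) * x(k + lag), for lag = 0 .. max_lag."""
    return [sum(x[k] * x[k + lag] for k in range(len(x) - lag))
            for lag in range(max_lag + 1)]

random.seed(0)
period = 20
noisy = [math.sin(2 * math.pi * n / period) + random.gauss(0.0, 1.0)
         for n in range(1000)]
r = autocorrelation(noisy, 2 * period)

# A shift of one full period lines the rhythm up with itself again,
# while a shift of half a period puts it in antiphase.
print(r[period], r[period // 2])
```

The noise is uncorrelated from sample to sample, so it largely averages out of every shift except zero, while the rhythm adds up coherently.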




Align Signals
   Figure: signal x, signal y and corr(x,y); a shift of 6 is marked on signal y and on the correlation




Align Signals


   Figure: signals x and y








Cross correlation
   Cross correlation (correlating a signal with another) can be
   used to detect and locate a known reference signal in noise
       A radar or sonar ‘chirp’ signal bounced off a target may be buried in noise
       But correlating with the ‘chirp’ reference clearly reveals when the echo comes
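
Locating an echo by cross correlation can be sketched as a search for the shift with the largest correlation. The five-sample reference, the delay of 6 samples and the noise level are made-up values for illustration; `locate` is a hypothetical helper.

```python
import random

def locate(reference, received):
    """Return the shift at which `reference` correlates best with `received`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(reference) + 1):
        score = sum(ref * received[lag + k] for k, ref in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

random.seed(1)
chirp = [1.0, -1.0, 2.0, -2.0, 1.0]
delay = 6
received = [random.uniform(-0.2, 0.2) for _ in range(20)]  # background noise
for k, value in enumerate(chirp):
    received[delay + k] += value  # the echo, added on top of the noise

print(locate(chirp, received))
```

The same search aligns two recordings of one event: the shift with the largest correlation is the offset between them.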




Cross Correlation to Identify a Signal
   Cross correlation (correlating a signal with another) can be
   used to identify a signal by comparison with a library of
   known reference signals
       The chirp of a nightingale correlates strongly with another nightingale,
       but weakly with a dove or a heron








Cross Correlation to Identify a Signal
   Cross correlation is one way in which sonar can
   identify different types of vessel
        Each vessel has a unique sonar signature
       The sonar system has a library of pre-recorded echoes
       from different vessels
       An unknown sonar echo is correlated with a library of
       reference echoes
        The largest correlation is the most likely match




Convolution
   Convolution is a weighted moving average with one signal
   flipped back to front

       $r(n) = \sum_{k} x(k)\, y(n - k)$

   To convolve one signal with another signal
       First flip the second signal
       Then shift it
       Then multiply the two together
       And integrate under the curve
   Requires a lot of calculation

       If one signal is of length M and the other is of length N, then we
       need (N * M) multiplications, to calculate the whole convolution
       function
            We need to multiply and then accumulate the result - this is typical of
            DSP operations and is called a multiply & accumulate operation
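
A direct implementation of the convolution sum, given as a sketch. The names `convolve`, `x` and `h` are chosen for this illustration.

```python
def convolve(x, h):
    """y(n) = sum over k of x(k) * h(n - k): flip h, shift by n, multiply, accumulate."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]  # one multiply & accumulate
    return y

print(convolve([1, 2, 3], [0, 1, 0.5]))
```

The index n - k is what flips the second signal; replacing it with k + n gives the correlation sum instead.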





Convolution vs. Correlation
   Convolution is used for digital filtering
       Convolving two signals is equivalent to multiplying the
       frequency spectra of the two signals together
            It is easily understood, and is what we mean by filtering
       Correlation is equivalent to multiplying the complex
       conjugate of the frequency spectrum of one signal by
       the frequency spectrum of the other
            It is not so easily understood and so convolution is used for
            digital filtering
   Convolving by multiplying frequency spectra is
   called fast convolution
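
The equivalence between convolution and multiplication of spectra can be checked numerically. The sketch below uses a plain DFT written out in full for clarity, so it is not actually fast; a practical version would use an FFT. Both signals are zero-padded to the length of the result, an assumption needed to avoid wrap-around.

```python
import cmath

def dft(x, inverse=False):
    """Plain discrete Fourier transform, or its inverse when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def convolve_via_spectra(x, h):
    """Convolve by multiplying the two frequency spectra point by point."""
    size = len(x) + len(h) - 1
    xp = list(x) + [0.0] * (size - len(x))
    hp = list(h) + [0.0] * (size - len(h))
    product = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [v.real for v in dft(product, inverse=True)]

print(convolve_via_spectra([1, 2, 3], [0, 1, 0.5]))
```

The result matches direct convolution of the same two sequences up to rounding error.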


Fourier Transform
   The Fourier Transform is a mathematical procedure that converts a signal from
   the time domain to the frequency domain
   Any signal or waveform can be made up just by adding together a series of sine waves
   with appropriate amplitude and phase
       A square wave can be made by adding the fundamental, minus 1/3 of the third
       harmonic, plus 1/5 of the fifth harmonic, minus 1/7 of the 7th harmonic, ...
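
The harmonic recipe above can be tried numerically. This sketch assumes cosine-phase harmonics, which is the phase for which the alternating signs apply, and leaves the sum unscaled; the function name is hypothetical.

```python
import math

def square_wave_partial_sum(t, harmonics):
    """Sum cos(t) - cos(3t)/3 + cos(5t)/5 - ... over the given number of terms."""
    total = 0.0
    for i in range(harmonics):
        k = 2 * i + 1                       # odd harmonic number
        sign = 1.0 if i % 2 == 0 else -1.0  # alternating plus and minus
        total += sign * math.cos(k * t) / k
    return total

top = square_wave_partial_sum(0.0, 500)
bottom = square_wave_partial_sum(math.pi, 500)
print(top, bottom)
```

With enough harmonics the sum settles at +pi/4 on one half cycle and -pi/4 on the other, a square wave of amplitude pi/4.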







Fourier Transform
   The Fourier transform is an equation to calculate the frequency,
   amplitude and phase of each sine needed to make up any given signal
       The Fourier Transform (FT) is a mathematical formula using integrals
       The Discrete Fourier Transform (DFT) is a discrete numerical equivalent
       using sums instead of integrals
       The Fast Fourier Transform (FFT) is just a computationally fast way to
       calculate the DFT
   The Discrete Fourier Transform involves a summation

       $H(f) = \sum_{k} c[k]\, e^{-2\pi j k (f\Delta)}$

   The DFT and the FFT involve a lot of multiplying and then accumulating the result
       This is typical of DSP operations and is called a multiply & accumulate
       operation
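
The summation can be evaluated literally. In this sketch the symbol Δ is assumed to be the sampling interval, and the eight-sample cosine at 1 Hz is an illustrative input; `spectrum_at` is a hypothetical name.

```python
import cmath
import math

def spectrum_at(c, f, delta):
    """H(f) = sum over k of c[k] * exp(-2*pi*j*k*f*delta)."""
    return sum(ck * cmath.exp(-2j * cmath.pi * k * f * delta)
               for k, ck in enumerate(c))

delta = 1.0 / 8.0  # assumed sampling interval: 8 samples per second
samples = [math.cos(2 * math.pi * 1.0 * k * delta) for k in range(8)]

at_signal = abs(spectrum_at(samples, 1.0, delta))
elsewhere = abs(spectrum_at(samples, 2.0, delta))
print(at_signal, elsewhere)
```

Each output value costs one multiply & accumulate per input sample, which is the workload the FFT reduces.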


Filtering
   Raw signal -> Filter -> Filtered signal

   The function of a filter is to
       Remove unwanted parts of the signal
            Random noise
       Extract useful parts of the signal
            Components lying within a certain frequency range

   Filters
       Analog
       Digital





Analog Filters
   An analog filter uses analog electronic
   circuits
       Use components such as resistors, capacitors
       and op amps
   Widely used in applications such as
        Noise reduction
        Video signal enhancement
        Graphic equalisers in hi-fi systems
        ..., and many other areas




Digital Filters
   A digital filter uses a digital processor to
   perform numerical calculations on sampled
   values of the signal
        Specialised DSP chip


   Unfiltered analog signal -> A/D -> sampled, digitised signal -> DSP ->
   digitally filtered signal -> D/A -> filtered analog signal








Advantage of Digital Filters
   Programmability
       The digital filter can easily be changed without affecting the circuitry
   Analog filter circuits are subject to drift and are dependent
   on temperature
   Digital filters can handle low frequency signals accurately
   As the speed of DSP technology continues to increase,
   digital filters are being applied to high frequency signals in
   the RF domain
   Versatility
        Adapt to changes in the characteristics of the signal




DSP Processors
   Characteristic features of DSP processors
   Special features for arithmetic
   I/O interfaces
   Memory architectures
   Data formats
   Some basic DSP chip designs
   Brief overview of some major DSP
   processors





Characteristics of DSP Processors
   DSP processors are mostly designed with
   the same few basic operations in mind
   They share the same set of basic
   characteristics
        Specialised high speed arithmetic
        Data transfer to and from the real world
        Multiple access memory architectures




Characteristics of DSP Processors
   The basic DSP operations
       Additions and multiplications
            Fetch two operands
            Perform the addition or multiplication (usually both)
            Store the result or hold it for a repetition
       Delays
            Hold a value for later use
       Array handling
            Fetch values from consecutive memory locations
            Copy data from memory to memory
   Figure: filter structure with input x, delays Z-1 and Z-2, coefficients c[0], c[1], c[2] and output y
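
The three basic operations come together in a small filter of the kind drawn on the slide. This is a sketch, with three illustrative coefficient values that are not taken from the slides.

```python
def fir_filter(samples, coefficients):
    """Filter `samples` with one delay line tap per coefficient."""
    delay_line = [0.0] * len(coefficients)
    output = []
    for x in samples:
        # Delay: the newest sample enters, the oldest one drops off the end.
        delay_line = [x] + delay_line[:-1]
        # Array handling plus multiply & accumulate over consecutive values.
        acc = 0.0
        for c, d in zip(coefficients, delay_line):
            acc += c * d
        output.append(acc)
    return output

print(fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))
```

Feeding in a single pulse returns the coefficients one after another, which is a quick way to check the wiring of such a filter.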






Characteristics of DSP Processors
   To suit these fundamental operations DSP processors
   often have
       Parallel multiply and add
       Multiple memory accesses (to fetch two operands and store the
       result)
       Lots of registers to hold data temporarily
       Efficient address generation for array handling
       Special features such as delays or circular addressing
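
Circular addressing can be imitated in software with a wrapping index. The class below is a hypothetical sketch of the idea, not a model of any particular processor; on a DSP the wrap-around is done by the address generator at no extra cost.

```python
class CircularBuffer:
    """Fixed-length delay line whose write index wraps round at the end."""

    def __init__(self, length):
        self.data = [0.0] * length
        self.index = 0  # plays the role of the address register

    def push(self, sample):
        """Overwrite the oldest sample, then advance the index with wrap-around."""
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)

    def newest_first(self):
        """Return the stored samples ordered from newest to oldest."""
        n = len(self.data)
        return [self.data[(self.index - 1 - i) % n] for i in range(n)]

buffer = CircularBuffer(3)
for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
    buffer.push(sample)
print(buffer.newest_first())
```

No data is copied when a new sample arrives; only the index moves.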




Address Generation
   The ability to generate new addresses efficiently is a characteristic
   feature of DSP processors
   Usually, the next needed address can be generated during the data
   fetch or store operation, and with no overhead
   DSP processors have rich sets of address generation operations

   Syntax      Mode                     Action
   *rP         register indirect        read the data pointed to by the address in register rP
   *rP++       postincrement            having read the data, postincrement the address pointer to point to the next value in the array
   *rP--       postdecrement            having read the data, postdecrement the address pointer to point to the previous value in the array
   *rP++rI     register postincrement   having read the data, postincrement the address pointer by the amount held in register rI to point to values further down the array
   *rP++rIr    bit reversed             having read the data, postincrement the address pointer to point to the next value in the array, as if the address bits were in bit reversed order







Bit Reversed Addressing
   DSPs are tightly targeted to a small number of algorithms
       It is surprising that an addressing mode has been specifically
       defined for just one application (the FFT)

   Addresses generated by a radix-2 FFT (binary values in parentheses)
       0 (000) -> 0 (000)
       1 (001) -> 4 (100)
       2 (010) -> 2 (010)
       3 (011) -> 6 (110)
       4 (100) -> 1 (001)
       5 (101) -> 5 (101)
       6 (110) -> 3 (011)
       7 (111) -> 7 (111)

   Without special support such address transformations would
       Take an extra memory access to get the new address
       Involve a fair amount of logical instructions
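
Done in software, the address transformation looks like the sketch below, which shows the kind of logical instructions the dedicated addressing mode avoids. The function name is hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the order of the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit across
        index >>= 1
    return result

print([bit_reverse(i, 3) for i in range(8)])
```

Each address costs a loop of shifts and masks here, whereas the bit reversed addressing mode produces it as a side effect of the data access.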



Memory Addressing
   As DSP programmers migrate toward larger programs, they are more
   attracted to compilers
        Such compilers are not able to fully exploit such specific addressing modes
        DSP community routinely uses library routines
             Programmers may benefit even if they write at a high level

   Addressing mode                                                              Percent
   Immediate                                                                    30.02%
   Displacement                                                                 10.82%
   Register indirect                                                            17.42%
   Direct                                                                       11.99%
   Autoincrement, postincrement                                                 18.84%
   Autoincrement, preincrement with 16 bit immediate                             0.77%
   Autoincrement, preincrement with circular addressing                          0.08%
   Autoincrement, postincrement by contents of AR0                               1.54%
   Autoincrement, postincrement by contents of AR0, with circular addressing     2.15%
   Autodecrement, postdecrement                                                  6.08%
   The first five modes together account for ~90% of the total





DSP Processors: Input/Output
   A DSP mostly deals with the real world
       Communication with an overall system controller
       Signals coming in and going out
       Communication with other DSP processors
   Figure: a DSP with Signal In and Signal Out, linked to a system controller and to other DSPs


DSP Evolution
   When DSP processors first came out, they were rather fast processors
       The first floating point DSP, the AT&T DSP32, ran at 16 MHz at a time when PC
       computer clocks were 5 MHz
       A fashionable demonstration at the time was to plug a DSP board into a PC and run
       a fractal (Mandelbrot) calculation on the DSP and on a PC side by side
            The DSP fractal was of course faster

   Today…
       The fastest DSP processor is the Texas TMS320C6201 which runs at 200 MHz
            This is no longer very fast compared with an entry level PC
              – ...And the same fractal today will actually run faster on the PC than on the DSP!

       But…
            Try feeding eight channels of high quality audio data in and out of a Pentium simultaneously
            in real time, without impacting on the processor performance








Signals
   They are usually handled by high speed
   synchronous serial ports
   Serial ports are inexpensive
        Having only two or three wires
       Well suited to audio or telecommunications
       data rates up to 10 Mbit/s
        Usually operate under DMA
            Data presented at the port is automatically written
            into DSP memory without stopping the DSP


Host Communications
   Many systems will have another, general purpose,
   processor to supervise the DSP
       For example, the DSP might be on a PC plug-in card
   Whereas signals tend to be continuous, host
   communication tends to require data transfer in batches
       for instance to download a new program or to update filter
       coefficients
   Some DSP processors have dedicated host ports
       Lucent DSP32C has a host port which is effectively an 8 bit or 16
       bit ISA bus
       The Motorola DSP56301 and the Analog Devices ADSP21060 have
       host ports which implement the PCI bus





Interprocessor Communications
   Interprocessor communication is needed when a
   DSP application is too much for a single processor
   The Texas TMS320C40 and the Analog Devices
   ADSP21060 both have six link ports
       These would ideally be parallel ports at the word length of the
       processor, but this would use up too many pins
       A hybrid called serial/parallel is used
            On the 'C40, comm ports are 8 bits wide and it takes four transfers to
            move one 32 bit word
            On the 21060, link ports are 4 bits wide and it takes 8 transfers to move
            one 32 bit word
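
The serial/parallel idea amounts to cutting each word into port-sized pieces. The sketch below assumes the least significant piece is sent first, which is an illustrative choice and not taken from either processor's documentation.

```python
def split_word(word, port_bits):
    """Cut a 32 bit word into port-sized pieces, least significant piece first."""
    mask = (1 << port_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, port_bits)]

def join_word(pieces, port_bits):
    """Rebuild the 32 bit word from pieces produced by split_word."""
    word = 0
    for position, piece in enumerate(pieces):
        word |= piece << (position * port_bits)
    return word

word = 0x12345678
print(len(split_word(word, 8)), len(split_word(word, 4)))
```

An 8 bit port needs four transfers per word and a 4 bit port needs eight, matching the figures quoted for the two processors.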


Memory Architectures
   Additions and multiplications require us to
        Fetch two operands
        Perform the addition or multiplication (usually both)
        Store the result or hold it for a repetition
   To fetch the two operands in a single instruction
   cycle, we need to be able to make two memory
   accesses simultaneously
        Plus one access to write back the result
        Plus one access to fetch the instruction itself






Memory Architectures
   There are two common methods to achieve
   multiple memory accesses per instruction cycle
        Harvard architecture
        Modified von Neumann architecture




Harvard Architecture

   Figure: Program <-> DSP <-> Data


   DSP operations usually involve at least two operands
       DSP Harvard architectures usually allow the program bus to be
       used for operand fetches as well
       It is often necessary to fetch the instruction too
            The plain Harvard architecture cannot support this in the same cycle
            Super Harvard architecture (SHARC)
              – DSP Harvard architectures often also include a cache memory, leaving
                both Harvard buses free for fetching operands







Modified von Neumann Architectures
   The Harvard architecture requires two memory
   buses
        This makes it expensive to bring off the chip
   Even the simplest DSP operation requires four
   memory accesses (three to fetch the two operands
   and the instruction, plus a fourth to write the result)
        This exceeds the capabilities of a Harvard architecture
        Some processors get around this by using a modified
        von Neumann architecture



Modified von Neumann Architectures
   [Figure: a single memory holding program and data, connected to the DSP]

   The modified von Neumann architecture allows multiple
   memory accesses per instruction
       The memory clock runs faster than the instruction cycle
   The Lucent DSP32C runs with an 80 MHz clock
       This is divided by four to give 20 MIPS
       The memory clock runs at the full 80 MHz
            Each instruction cycle is divided into four 'machine states' and a
            memory access can be made in each machine state






Example Processor
   [Figure: generic DSP processor]
       Features labelled in the figure: address generation, lots of registers,
       parallel multiply/add, multiple memories, efficient I/O
       External connections: signal in, signal out, system controller, other DSP


Example Processor: Lucent DSP32C
   Modified von Neumann architecture
   22 x 24-bit registers
       Also serve for integer arithmetic
   [Figure: DSP32C block diagram; widths labelled in the figure: address 24, data 32, and one further path marked 40]






Example Processor: ADSP21060
   Super Harvard architecture
   Six link ports
   Two serial ports
   [Figure: ADSP21060 block diagram; bus widths labelled in the figure: program address 24, program data 48, data address 32, data data 40]



Data Formats
   DSP processors store data in fixed or floating point formats
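       Integer (8-bit two's complement)
           Bit weight:  -2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
           Bit value:     0     1     0     1     0     0     1     1
           Value = 2^6 + 2^4 + 2^1 + 2^0 = 64 + 16 + 2 + 1 = 83
       Fixed point (8-bit fractional)
           Bit weight:  -2^0   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7
           Bit value:     0     1     0     1     0     0     0     0
           Value = 2^-1 + 2^-3 = 0.5 + 0.125 = 0.625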

   The programmer has to make some decisions
       If a fixed point number becomes too large for the available word
       length, the programmer has to scale it down by shifting it to the right
       If a fixed point number is small, the programmer has to scale it up in
       order to use more of the available word length
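
   A minimal sketch of the 8-bit fractional format above, assuming the usual Q7 convention (value = stored integer / 128). The function names are hypothetical. Scaling is written with multiplication and division rather than shifts so that negative values stay well defined in ISO C; on a DSP these would be single shift instructions.

```c
#include <stdint.h>

/* Interpret an 8-bit word with weights -2^0, 2^-1 ... 2^-7 as a real number. */
double q7_to_double(int8_t q)
{
    return (double)q / 128.0;
}

/* Scale down by `shift` bits (equivalent to a right shift for q >= 0). */
int8_t q7_scale_down(int8_t q, unsigned shift)
{
    return (int8_t)(q / (1 << shift));
}

/* Scale up by `shift` bits; the caller must make sure the result still fits. */
int8_t q7_scale_up(int8_t q, unsigned shift)
{
    return (int8_t)(q * (1 << shift));
}
```

   The bit pattern 0101 0000 (0x50) decodes to 0.625, as in the example above. The scale factor has to be tracked by the programmer, since nothing in the word records it.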





Fixed Point
   Fixed point can be thought of as low-cost
   floating point
       It does not include an exponent in every word
       No hardware automatically aligns and normalizes
       operands
            The DSP programmer takes care of keeping the exponent in
            a separate variable
            Often this variable is shared by a set of fixed-point
            variables
              – Block floating point
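
   Block floating point can be sketched as a set of 16-bit mantissas sharing one exponent. The type bfp_block and the function bfp_normalize are hypothetical names chosen for this illustration; the 16-bit word length is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A block of fixed-point values sharing one exponent:
   real value of element i = mant[i] * 2^exp */
typedef struct {
    int16_t *mant;
    size_t   len;
    int      exp;
} bfp_block;

/* Scale the whole block up as far as the largest magnitude allows,
   and adjust the shared exponent so the real values do not change. */
void bfp_normalize(bfp_block *b)
{
    int max_mag = 0;
    for (size_t i = 0; i < b->len; i++) {
        int m = abs((int)b->mant[i]);
        if (m > max_mag)
            max_mag = m;
    }
    if (max_mag == 0)
        return;                      /* all zeros: nothing to scale */

    int shift = 0;
    while ((max_mag << (shift + 1)) <= INT16_MAX)
        shift++;

    for (size_t i = 0; i < b->len; i++)
        b->mant[i] = (int16_t)(b->mant[i] * (1 << shift));
    b->exp -= shift;
}
```

   Only the largest element decides the scaling, so small elements in the same block keep fewer significant bits than they would with a per-word exponent.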



Floating Point
   Floating point format has the remarkable property of
   automatically scaling all numbers by moving, and keeping
   track of, the binary point so that all numbers use the full
   word length available but never overflow

                    Mantissa (9 bits)                                       Exponent (4 bits)
       Bit value:    0     1     1     0     1     0     0     0     0        0     1     1     0
       Bit weight: -2^1   2^0  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7     -2^3   2^2   2^1   2^0

                       Mantissa = 2^0 + 2^-1 + 2^-3 = 1 + 0.5 + 0.125 = 1.625
                       Exponent = 2^2 + 2^1 = 6
                       Decimal value = 1.625 × 2^6
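
   The toy format above can be decoded with a few lines of C. This is a sketch of the slide's 13-bit example only, assuming a 9-bit two's complement mantissa with weights -2^1 ... 2^-7 and a 4-bit two's complement exponent; it is not the format of any particular processor, and toy_float_decode is a hypothetical name.

```c
/* Decode the 9-bit mantissa / 4-bit exponent example into a double. */
double toy_float_decode(unsigned mant_bits, unsigned exp_bits)
{
    int mant = (int)(mant_bits & 0x1FFu);
    if (mant & 0x100)
        mant -= 0x200;               /* sign-extend the 9-bit mantissa */

    int exp = (int)(exp_bits & 0xFu);
    if (exp & 0x8)
        exp -= 0x10;                 /* sign-extend the 4-bit exponent */

    double value = (double)mant / 128.0;   /* LSB weight is 2^-7 */
    for (; exp > 0; exp--)
        value *= 2.0;
    for (; exp < 0; exp++)
        value /= 2.0;
    return value;
}
```

   For the bit patterns shown (mantissa 0 1101 0000, exponent 0110) this gives 1.625 × 2^6 = 104.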







Data Formats
   In floating point the hardware automatically scales and
   normalises every number
   Errors due to truncation and rounding depend on the size
   of the number
   These errors can be seen as a source of quantisation
   noise
       The noise is therefore modulated by the size of the signal
       This signal-dependent modulation of the noise is undesirable
       because it is audible
            The audio industry prefers fixed point DSP processors to
            floating point ones




Saturating Arithmetic
   DSPs are often used in real-time applications
       No exception is raised on arithmetic overflow
            Handling it could cause an event to be missed
       To support such an environment, DSP architectures use saturating
       arithmetic
            If the result is too large to be represented, it is set to the largest representable
            number
   [Figure: overflow in normal two's complement arithmetic (the result wraps around) versus saturating arithmetic (the result is clipped)]
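
   The difference between the two behaviours can be sketched for 16-bit words. The function names are hypothetical. On a DSP, saturation is done by the hardware; here it is emulated with a wider intermediate result.

```c
#include <stdint.h>

/* Normal two's complement addition: the result wraps around on overflow.
   The final conversion to int16_t is implementation-defined in ISO C,
   but wraps on common compilers. */
int16_t add_wrap(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Saturating addition: the result is clipped to the representable range. */
int16_t add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```

   Adding 30000 and 10000 saturates to 32767, whereas the wrapping version produces a large negative number, which in an audio signal would be heard as a click.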








Programming a DSP Processor
   A simple FIR filter program
   Using pointers
   Avoiding memory bottlenecks
   Assembler programming




A Simple FIR Filter
   The simple FIR filter equation is
                           y[n] = \sum_{k=0}^{N-1} c[k] \times x[n-k]

   Which can be implemented quite directly in C language

                  y[n] = 0.0;
                  for (k=0; k<N; k++)
                     y[n] = y[n] + c[k] * x[n-k];



       y[n]: accessed repeatedly
       c[k]: accessing by array index is inefficient
       x[n-k]: arithmetic is needed to calculate this array index





Problem in Addressing
   Five operations are needed to calculate the address of
   the element x[n-k]
        Load the start address of the array in memory
        Load the value of the index n
        Load the value of the index k
        Calculate the offset [n - k]
        Add the offset to the start address of the array
   Only after all five operations can the
   processor actually read the array element

Using Pointers
                  y[n] = 0.0;
                  for (k=0; k<N; k++)
                     y[n] = y[n] + c[k] * x[n-k];




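            float *y_ptr, *c_ptr, *x_ptr;
            y_ptr = &y[n];    /* output sample */
            c_ptr = &c[0];    /* first coefficient */
            x_ptr = &x[n];    /* newest input sample */
            *y_ptr = 0.0;
            for (k=0; k<N; k++)
               *y_ptr = *y_ptr + *c_ptr++ * *x_ptr--;

   [Figure: c_ptr, x_ptr and y_ptr pointing into the arrays c, x and y]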





Using Pointers
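            float *y_ptr, *c_ptr, *x_ptr;
            y_ptr = &y[n];    /* output sample */
            c_ptr = &c[0];    /* first coefficient */
            x_ptr = &x[n];    /* newest input sample */
            *y_ptr = 0.0;
            for (k=0; k<N; k++)
               *y_ptr = *y_ptr + *c_ptr++ * *x_ptr--;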

    Each pointer still has to be initialised
        But only once, before the loop
        Not requiring any arithmetic to calculate offsets
    Using pointers is more efficient than array indices on any
    processor
        It is especially efficient for DSP processors
            Address increments often come for free


Using Pointers
          *rP       register indirect        read the data pointed to by the address in register rP
          *rP++     postincrement            having read the data, postincrement the address pointer
                                             to point to the next value in the array
          *rP--     postdecrement            having read the data, postdecrement the address pointer
                                             to point to the previous value in the array
          *rP++rI   register postincrement   having read the data, postincrement the address pointer
                                             by the amount held in register rI to point to rI values
                                             further down the array


   The address increments are performed in the same
   instruction as the data access to which they refer
       They incur no overhead at all
       Most DSP processors can perform two or three address increments
       for free in each instruction
            So the use of pointers is crucially important for DSP processors






Limiting Memory Accesses
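           float *y_ptr, *c_ptr, *x_ptr;
           y_ptr = &y[n];    /* output sample */
           c_ptr = &c[0];    /* first coefficient */
           x_ptr = &x[n];    /* newest input sample */
           *y_ptr = 0.0;
           for (k=0; k<N; k++)
              *y_ptr = *y_ptr + *c_ptr++ * *x_ptr--;

   Memory accesses in the loop statement, left to right
       *y_ptr (store), *y_ptr (load), *c_ptr++ (load), *x_ptr-- (load)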

   Four memory accesses
       Even without counting the need to load the instruction,
       this exceeds the capacity of a DSP processor
   Fortunately, DSP processors have lots of registers


   Limiting Memory Accesses
                       register float temp;
                       temp = 0.0;               /* this initialization is wasted! */

                       for (k=0; k<N; k++)
                          temp = temp + *c_ptr++ * *x_ptr--;




                       register float temp;
                       temp = *c_ptr++ * *x_ptr--;
                       for (k=1; k<N; k++)
                          temp = temp + *c_ptr++ * *x_ptr--;
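
   Putting the last few slides together, a complete version might look like the sketch below. The function names and signatures are hypothetical and are not taken from the slides. fir_indexed is the direct array version and fir_register is the pointer version with a register accumulator and the first product peeled out of the loop. The sketch assumes N >= 1 and n >= N, so that the final post-decrement of x_ptr still lands inside the array.

```c
#include <stddef.h>

/* Reference version: y[n] = sum over k = 0..N-1 of c[k] * x[n-k] */
float fir_indexed(const float *c, const float *x, size_t n, size_t N)
{
    float y = 0.0f;
    for (size_t k = 0; k < N; k++)
        y = y + c[k] * x[n - k];
    return y;
}

/* Pointer version: two loads per tap, accumulation in a local variable
   that the compiler can keep in a register, no wasted initialisation. */
float fir_register(const float *c, const float *x, size_t n, size_t N)
{
    const float *c_ptr = c;        /* first coefficient    */
    const float *x_ptr = &x[n];    /* newest input sample  */
    float temp = *c_ptr++ * *x_ptr--;
    for (size_t k = 1; k < N; k++)
        temp = temp + *c_ptr++ * *x_ptr--;
    return temp;                   /* caller stores it once into y[n] */
}
```

   The result is written to memory once by the caller, so the inner loop performs only the two operand loads per tap.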






   Compiler for DSPs
        Despite the well documented advantages in programmer productivity
        and software maintenance...

        Ratio of compiled C to hand-written assembly (>1 means slower or bigger)

        TMS320C54 D (C54) for DSPstone kernels
        Kernel                       Execution time    Code space
        Convolution                       11.8            16.5
        FIR                               11.5             8.7
        Matrix 1x3                         7.7             8.1
        FIR2dim                            5.3             6.5
        Dot product                        5.2            14.1
        LMS                                5.1             0.7
        N real update                      4.7            14.1
        IIR n biquad                       2.4             8.6
        N complex update                   2.4             9.8
        Matrix                             1.2             5.1
        Complex update                     1.2             8.7
        IIR one biquad                     1.0             6.4
        Real update                        0.8            15.6
        C54 geometric mean                 3.2             7.8

        TMS320C6203 (C62) for EEMBC Telecom kernels
        Kernel                       Execution time    Code space
        Convolution encoder               44.0             0.5
        Fixed-point complex FFT           13.5             1.0
        Viterbi GSM decoder               13.0             0.7
        Fixed-point bit allocation         7.0             1.4
        Autocorrelation                    1.8             0.7
        C62 geometric mean                10.0             0.8




Introduction
   The TMS320C3x generation of DSPs are
   high performance 32-bit floating-point
   devices in the TMS320 family
   Extensive internal busing
   Powerful DSP instruction set
   60 MFLOPS
   High degree of on-chip parallelism
        Up to 11 operations in a single instruction






General Features
   General-purpose register file
   Program cache
   Dedicated auxiliary register arithmetic units
   (ARAU)
   Internal dual-access memories
   Direct memory access (DMA)
   Short machine-cycle time


Block Diagram








Central Processing Unit (CPU)
   The ’C3x devices have a register-based CPU
   architecture
   The CPU consists of the following components
        Floating-point/integer multiplier
        Arithmetic logic unit (ALU)
        32-bit barrel shifter
        Internal buses (CPU1/CPU2 and REG1/REG2)
        Auxiliary register arithmetic units (ARAUs)
        CPU register file



                    Block diagram of the CPU








                  Single-cycle multiplications
                     24-bit integer
                         Result 32-bit
                     32-bit floating-point
                         Result 40-bit




                  The ALU performs single-
                  cycle operations on
                      32-bit integer
                      32-bit logical
                      40-bit floating-point data
                  Single-cycle integer and
                  floating point conversions
                  The barrel shifter is used to
                  shift up to 32 bits left or right
                  in a single cycle








                  Four internal buses
                      CPU1, CPU2, REG1, and
                      REG2 carry
                          Two operands from
                          memory
                          Two operands from the
                          register file
                      Allowing parallel
                      multiplies and
                      adds/subtracts on four
                      integer or floating-point
                      operands in a single cycle




                  Two auxiliary register
                  arithmetic units (ARAU0 and
                  ARAU1) can generate two
                  addresses in a single cycle
                  The ARAUs operate in
                  parallel with the multiplier and
                  ALU
                  They support addressing with
                      Displacements
                      Index registers (IR0 and IR1)
                      Circular
                      Bit-reversed addressing








                  28 registers in a multiport register
                  file
                  All of the primary registers can be
                      Operated upon by the multiplier and
                      ALU
                      Used as general-purpose registers
                  The registers also have some
                  special functions
                      The eight extended-precision
                      registers are especially suited for
                      maintaining extended-precision
                      floating-point results
                      The eight auxiliary registers support
                      a variety of indirect addressing
                      modes and can be used as general-
                      purpose 32-bit integer and logical
                      registers
                      The remaining registers provide
                      such system functions as
                      addressing, stack management,
                      processor status, interrupts, and
                      block repeat

Peripherals
                                Timers
                                   The two timer modules
                                   are general-purpose 32-
                                   bit timer/event counters
                                Serial ports
                                   The bidirectional serial
                                   ports are totally
                                   independent
                                   Each serial port can be
                                   configured to transfer 8,
                                   16, 24, or 32 bits of data
                                   per word








Direct Memory Access (DMA)
                  The DMA controller can read/write any
                  location in the memory map without
                  interfering with the CPU operation
                  Dedicated DMA address and data buses
                  minimize conflicts between the CPU and
                  the DMA controller
                  When the CPU and DMA access the
                  same resources, a priority must be
                  established
                     CPU priority
                     DMA priority
                     Rotating priority




Extended Precision Registers (R7-R0)

   Can store and support operations on 32-bit
   integer and 40-bit floating-point numbers








Auxiliary Registers (AR7-AR0)
   The CPU can access the
       Eight 32-bit auxiliary registers (AR7−AR0)
       Two auxiliary register arithmetic units (ARAUs)
   The primary function of the auxiliary registers is the
   generation of 24-bit addresses
       They can also operate as
            Loop counters in indirect addressing
            32-bit general purpose registers that can be modified by the
            multiplier and ALU




  Other Registers
      Index Registers (IR1, IR0)
            Used by the ARAU for indexing the address
      Block size register (BK)
            Used by the ARAU in circular addressing to specify the data block
            size
      System-stack Pointer (SP)
            Contains the address of the top of the system stack
            SP always points to the last element pushed onto the stack
            SP is manipulated by interrupts, traps, calls, returns, and the
            PUSH, PUSHF, POP, and POPF instructions








  Status Register (ST)
      Contains global information about the state of the CPU
            Operations usually set the condition flags of the status register
            according to whether the result is 0, negative, etc.
            Fields of ST labelled in the figure
                Global interrupt enable, clear cache, cache enable, cache freeze,
                repeat mode, overflow mode, latched floating-point underflow,
                latched overflow, floating-point underflow, negative, zero,
                overflow, carry




Repeat Counter (RC) and Block Repeat (RS,RE)

   RC is a 32-bit register that specifies the number of
   times a block of code is to be repeated when a
   block repeat is performed
        If RC=n, the loop is executed n+1 times
   RS register contains the starting address of the
   program-memory block to be repeated when the
   CPU is operating in the repeat mode
   RE register contains the ending address of the
   program-memory block to be repeated when the
   CPU is operating in the repeat mode





Instruction Cache
   64×32-bit instruction cache
       2-way set associative
       LRU replacement policy
   It allows the use of slow external memories while still
   achieving single-cycle access performance
   The cache also frees external buses from program fetches
   so that they can be used by the DMA or other system
   elements




Addressing Modes
   Five types of addressing
        Register addressing
        Direct addressing
        Indirect addressing
        Immediate addressing
        PC-relative addressing
   Plus two specialized addressing modes
        Circular addressing
        Bit-reversed addressing





Register Addressing
   A CPU register contains the operand

             ABSF     R1 ; R1 = |R1|


       Any CPU register can be used (R0-R7,
       AR0-AR7, DP, IR0, IR1, …)



Direct Addressing
   The data address is formed by the concatenation of the
   eight LSBs of the data-page pointer (DP) with the 16 LSBs
   of the instruction word (expr)
       This results in 256 pages (64K words per page)








Direct Addressing

                   ADDI                 @0BCDEh, R7
                  Before Instruction               After Instruction

           R7        00 0000 0000             R7     00 1234 5678

          DP                       8A        DP                    8A

                     Data memory                     Data memory


  8ABCDEh               1234 5678        8ABCDEh         1234 5678
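
   The address formation in this example can be written out in C. This is only a sketch of the concatenation described above; direct_address is a hypothetical helper name.

```c
#include <stdint.h>

/* 24-bit data address = eight LSBs of DP, followed by the
   16 LSBs of the instruction word. */
uint32_t direct_address(uint32_t dp, uint32_t instr_word)
{
    return ((dp & 0xFFu) << 16) | (instr_word & 0xFFFFu);
}
```

   With DP = 8Ah and an instruction operand of 0BCDEh this gives 8ABCDEh, the memory location read by the ADDI above.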




Indirect Addressing
   Specifies the address of an operand in memory through
   the contents of
       an auxiliary register,
       optional displacements,
       and index registers
   The auxiliary register arithmetic units (ARAUs) perform the
   unsigned arithmetic








Indirect Addressing
                  Indirect addressing with displacement




Indirect Addressing
                  Indirect addressing with index register IR0








Indirect Addressing
                     Indirect addressing (special cases)




Immediate Addressing
   The operand is a 16-bit (short) or 24-bit (long) immediate
   value contained in the instruction word
   Depending on the data types assumed for the instruction,
   the immediate operand can be
       A 2s-complement integer
       An unsigned integer
       A floating-point number

                             SUB        1, R0
                  Before Instruction         After Instruction

           R0        00 0000 0000       R0    00 FFFF FFFF






PC-relative Addressing
   It adds the contents of the 16 or 24 LSBs of the instruction
   word to the PC register
   The assembler takes the src (a label or address) specified
   by the user and generates a displacement
       The displacement is equal to
       [src − (instruction address+1)]

                                             ; pc=1001h,
                    BU            Label      ; Label = 1005h
                                             ; --> displacement = 3

                  Before Instruction         After Instruction
                  Decode phase               Execution phase
           PC                    1002   PC                1005
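
   The displacement arithmetic of this example can be checked with a short C sketch. The function names are hypothetical; the formula is the one given above.

```c
#include <stdint.h>

/* Displacement generated by the assembler:
   src - (instruction address + 1) */
int32_t pc_rel_displacement(uint32_t src, uint32_t instr_addr)
{
    return (int32_t)(src - (instr_addr + 1u));
}

/* Target computed by the processor: the PC already points to
   instruction address + 1 when the displacement is added. */
uint32_t pc_rel_target(uint32_t instr_addr, int32_t disp)
{
    return instr_addr + 1u + (uint32_t)disp;
}
```

   For the branch at 1001h to Label = 1005h the displacement is 3, and adding it to the PC value 1002h gives 1005h again.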


Circular Addressing
   Many DSP algorithms, such as convolution and correlation, require a
   circular buffer in memory
   In convolution and correlation, the circular buffer acts as a sliding
   window that contains the most recent data to process
   As new data is brought in, the new data overwrites the oldest data

   [Figure: logical (ring) and physical (linear, from Start to End) representation of a circular buffer]








Circular Addressing

   [Figure: logical and physical representation of a six-element circular buffer]
       Physical layout from Start to End: value6, value7, value2, value3, value4, value5
       value6 and value7 have overwritten the oldest entries, value0 and value1




  Implementation
         BK = length of the circular buffer
                (16 bit, < 64K)
         The K LSBs of the start address of the buffer
         must be 0
                K is such that 2^K > buffer length

        Length of buffer BK register value     Starting address of buffer
              31                31            XXXXXXXXXXXXXXXXXXX00000
              32                32            XXXXXXXXXXXXXXXXXX000000
             1024              1024           XXXXXXXXXXXXX00000000000
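
   The alignment rule can be sketched in C as follows. The helper names are hypothetical; the rule itself (smallest K with 2^K > buffer length, K low address bits zero) is the one stated above.

```c
#include <stdint.h>

/* Smallest K such that 2^K > length (length is below 64K, so K <= 16). */
unsigned circ_align_bits(uint32_t length)
{
    unsigned k = 0;
    while ((1u << k) <= length)
        k++;
    return k;
}

/* Non-zero if `start` is a legal start address for a buffer of `length` words. */
int circ_start_ok(uint32_t start, uint32_t length)
{
    uint32_t mask = (1u << circ_align_bits(length)) - 1u;
    return (start & mask) == 0;
}
```

   Note that a buffer of exactly 32 words needs 6 zero bits, not 5, because the inequality is strict.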







  Algorithm for Circular Addressing

   [Figure: buffer of length BK between Start and End, with Index pointing into it]

       if (0 ≤ index+step < BK)
          index = index+step;
       else if (index+step ≥ BK)
          index = index+step-BK;
       else
          index = index+step+BK;




Circular Addressing - Example
        *ARn++(disp)%        ; addr = ARn
                             ; ARn = circ(ARn+disp)

             ; AR0 is 0
             ; BK is 6
             *AR0++(5)%
             ; Now AR0 is circ(0+5)=5
             *AR0++(2)%
             ; Now AR0 is circ(5+2)=1
             *AR0--(3)%
             ; Now AR0 is circ(1-3)=4
             *AR0++(6)%
             ; Now AR0 is circ(4+6)=4
             *AR0--%
             ; Now AR0 is circ(4-1)=3
             *AR0%

        [Figure: memory map with addresses 0 to 8]
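
The post-modify behaviour (the access uses the old value of ARn, then ARn is updated circularly) can be replayed in C. This is a sketch of the semantics described in the comments above, not an emulator; `post_modify_circ` and `trace_example` are hypothetical names.

```c
/* Model of the *ARn++(disp)% operand: returns the address used for the
   access (the old ARn), then updates ARn circularly within [0, bk). */
int post_modify_circ(int *ar, int disp, int bk)
{
    int addr = *ar;            /* addr = ARn */
    int next = *ar + disp;     /* ARn + disp, may fall outside [0, bk) */

    if (next >= bk)
        next -= bk;
    else if (next < 0)
        next += bk;
    *ar = next;                /* ARn = circ(ARn + disp) */
    return addr;
}

/* Replays the sequence of the example and records each accessed address. */
void trace_example(int addrs[6])
{
    int ar0 = 0;               /* AR0 is 0 */
    const int bk = 6;          /* BK is 6  */

    addrs[0] = post_modify_circ(&ar0,  5, bk);   /* *AR0++(5)% */
    addrs[1] = post_modify_circ(&ar0,  2, bk);   /* *AR0++(2)% */
    addrs[2] = post_modify_circ(&ar0, -3, bk);   /* *AR0--(3)% */
    addrs[3] = post_modify_circ(&ar0,  6, bk);   /* *AR0++(6)% */
    addrs[4] = post_modify_circ(&ar0, -1, bk);   /* *AR0--%    */
    addrs[5] = ar0;                              /* *AR0% : no update */
}
```

The recorded addresses are 0, 5, 1, 4, 4, 3: each access sees the value AR0 had before the update written in the following comment line.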





ISA Overview
   The instruction set contains 113 instructions, organized in the
   following groups
        Load and store
        2-operand arithmetic/logical
        3-operand arithmetic/logical
        Program control
        Low-power control
        Interlocked operations
        Parallel operations




Load & Store
   The ’C3x supports 13 load and store instructions
        Load a word from memory into a register
        Store a word from a register into memory
        Manipulate data on the system stack








2-Operand Instructions
   The ’C3x supports 35 2-operand arithmetic
   and logical instructions
   The two operands are the source and the
   destination
        The source operand can be
            A memory word
            A register
            Part of the instruction word (an immediate value)
        The destination operand is always a register






3-Operand Instructions
   3-operand instructions have two source operands and a destination
   operand
       A source operand can be
            Memory word
            Register
       The destination is always a register




Program-Control Instructions
   The program-control instruction group consists of all
   instructions that affect program flow








Low-Power Control Instructions
   The low-power control instruction group consists
   of 3 instructions that affect the low-power modes




Interlocked-Operations Instructions
   The five interlocked-operations instructions support multiprocessor
   communication and the use of external signals, allowing powerful
   synchronization mechanisms
   They also ensure the integrity of the communication and result in
   high-speed operation








Parallel Operations
   The 13 parallel-operations instructions make a
   high degree of parallelism possible
   Some of the ’C3x instructions can occur in pairs
   that are executed in parallel
       Parallel loading of registers
       Parallel arithmetic operations
       Arithmetic/logical instructions used in parallel with a
       store instruction




Parallel Operations
   Parallel arithmetic with store instructions
   [Table: examples of parallel arithmetic with store instructions, among many others]




Parallel Operations
   Parallel load instructions




   Parallel multiply and add/subtract instructions




Examples
   FIR Filter
   Matrix-Vector Multiplication








Data Structure for FIR Filters
   Circular addressing is especially useful for the implementation of FIR filters

              Impulse response                        Input samples
  AR0 ->          h(N-1)                AR1 ->            x(0)
                  h(N-2)                                  x(1)
                  h(N-3)                                  x(2)
                   ...                                     ...
                   h(2)                                  x(N-3)
                   h(1)                                  x(N-2)
                   h(0)                                  x(N-1)
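
The layout above can be sketched in C as a filter whose input buffer is used circularly. This is an illustration under assumptions: the type `Fir`, the function `fir_step`, the filter length `FIR_N` and the `oldest` index (which plays the role of AR1) are all hypothetical names, and the modulo operation stands in for the hardware circular addressing.

```c
#include <stddef.h>

#define FIR_N 4   /* hypothetical filter length, kept small for the sketch */

typedef struct {
    float  h[FIR_N];  /* stored as in the figure: h[0] holds h(N-1), h[N-1] holds h(0) */
    float  x[FIR_N];  /* circular buffer of the last N input samples */
    size_t oldest;    /* index of the oldest sample (the role of AR1) */
} Fir;

/* Stores one new sample over the oldest one and returns
   y(n) = sum over k of h(k) * x(n-k). */
float fir_step(Fir *f, float sample)
{
    f->x[f->oldest] = sample;               /* overwrite the oldest sample */
    f->oldest = (f->oldest + 1) % FIR_N;    /* now points at the new oldest */

    float  acc = 0.0f;
    size_t xi  = f->oldest;
    for (size_t k = 0; k < FIR_N; k++) {
        acc += f->h[k] * f->x[xi];          /* h(N-1) meets the oldest sample */
        xi = (xi + 1) % FIR_N;              /* circular step through x */
    }
    return acc;
}
```

No sample is ever moved: only the index advances, which is the reason circular addressing suits FIR filters. Feeding a unit impulse into a filter with h(0..3) = 1, 2, 3, 4 returns 1, 2, 3, 4 and then 0.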




FIR Filter Code
* Impulse Response
      .sect "Impulse_Resp"
H     .float 1.0
      .float 0.99
      .float 0.95
      ...
      .float 0.1

* Input Buffer
X     .usect "Input_Buf",128

      .data
HADDR .word H
XADDR .word X
N     .word 128

Memory map
      Addr      Memory      Section
                 ...
      H          1.0        Impulse_Resp
                 0.99
                 ...
                 0.1
                 ...
      X           ?         Input_Buf
                  ?
                 ...
                  ?
                 ...
      HADDR       H
      XADDR       X
      N          128
                 ...





FIR Filter Code (cont'd)
* Initialization
       LDP     HADDR
       LDI     @N,BK           ; Load block size
       LDI     @HADDR,AR0      ; Load pointer to impulse response
       LDI     @XADDR,AR1      ; Load pointer to input samples

TOP    LDF     IN,R3           ; Read input sample
       STF     R3,*AR1++%      ; Store the sample
       LDF     0,R0            ; Initialize R0
       LDF     0,R2            ; Initialize R2

* Filter
       RPTS    N-1             ; Repeat next instruction N times
       MPYF3   *AR0++%,*AR1++%,R0
    || ADDF3   R0,R2,R2        ; MAC
       ADDF    R0,R2           ; Last product accumulated

       STF     R2,Y            ; Save result
       B       TOP             ; Repeat
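
The reason for the final ADDF can be seen in a small C model of the repeated pair. The sketch assumes that the two halves of a parallel instruction read their source registers before either one writes its result, so ADDF3 adds the product computed in the previous repetition; `mac_pipelined` is a hypothetical name and plain arrays replace the circular buffers.

```c
/* Model of RPTS N-1 / MPYF3 || ADDF3 / ADDF from the filter loop.
   r0 and r2 stand for registers R0 and R2. */
float mac_pipelined(const float *h, const float *x, int n)
{
    float r0 = 0.0f;                 /* LDF 0,R0 */
    float r2 = 0.0f;                 /* LDF 0,R2 */

    for (int i = 0; i < n; i++) {    /* RPTS N-1: N repetitions */
        float old_r0 = r0;           /* both halves read R0 before it changes */
        r0 = h[i] * x[i];            /* MPYF3 *AR0++%,*AR1++%,R0 */
        r2 = old_r0 + r2;            /* || ADDF3 R0,R2,R2 (previous product) */
    }
    r2 = r0 + r2;                    /* ADDF R0,R2: last product accumulated */
    return r2;
}
```

After the loop the last product is still waiting in R0, which is why one more addition is needed before the result is stored.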



Matrix-Vector Multiplication
   [P]_{K×1} = [M]_{K×N} × [V]_{N×1}

        for (i=0; i<K; i++)
        {
             p[i] = 0
             for (j=0; j<N; j++)
                  p[i] = p[i] + m[i,j] * v[j]
        }
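
A compilable C version of the pseudocode above is sketched here. It assumes the matrix is stored row by row in a flat array, so that element m[i,j] sits at index i*n + j; the function name `mat_vec_mul` is hypothetical.

```c
#include <stddef.h>

/* p = M * v, where M has k rows and n columns and is stored row by row
   (row-major layout is an assumption of this sketch). */
void mat_vec_mul(const float *m, const float *v, float *p, size_t k, size_t n)
{
    for (size_t i = 0; i < k; i++) {
        float acc = 0.0f;                    /* p[i] = 0 */
        for (size_t j = 0; j < n; j++)
            acc += m[i * n + j] * v[j];      /* p[i] += m[i,j] * v[j] */
        p[i] = acc;
    }
}
```

With this layout the inner loop walks both the matrix row and the vector with unit stride, which maps naturally onto two auto-incremented address registers.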








Matrix-Vector Multiplication
   Data memory organization




Matrix-Vector Multiplication
     * AR0 : ADDRESS OF M(0,0)
     * AR1 : ADDRESS OF V(0)
     * AR2 : ADDRESS OF P(0)
     * AR3 : NUMBER OF ROWS - 1 (K-1)
     * R1  : NUMBER OF COLUMNS - 2 (N-2)

     MAT     LDI     R1,IR0                   ; Number of columns-2 -> IR0
             ADDI    2,IR0                    ; Number of columns -> IR0
     ROWS    LDF     0.0,R2                   ; Initialize R2
             MPYF3   *AR0++(1),*AR1++(1),R0   ; m(i,0) * v(0) -> R0
             RPTS    R1                       ; Multiply a row by a column
             MPYF3   *AR0++(1),*AR1++(1),R0   ; m(i,j) * v(j) -> R0
          || ADDF3   R0,R2,R2                 ; m(i,j-1) * v(j-1) + R2 -> R2
             SUBI    1,AR3
             BNZD    ROWS                     ; Counts the no. of rows left
     * Delay slots
             ADDF    R0,R2                    ; Last accumulate
             STF     R2,*AR2++(1)             ; Result -> p(i)
             NOP     *--AR1(IR0)              ; Set AR1 to point to v(0)





C Programming Tips
   After writing your application in C, debug the
   program and determine whether it runs efficiently
   If the program does not run efficiently
       Use the optimizer with the -o2 or -o3 option when compiling
       Use registers to pass parameters (-ms compiling option)
       Use inlining (-x compiling option)
       Remove the -g option when compiling
       Follow some of the efficient code generation tips
            Use register variables for often-used variables
            Precompute subexpressions
            Use *++ to step through arrays
            Use structure assignments to copy blocks of data



Use Register Variables
   Exchange one object in memory with another

             register float *src, *dest, temp;

             do {
                  temp = *++src;
                  *src = *++dest;
                  *dest = temp;
             } while (--n);








Precompute Subexpression and use *++
main() {
      float a[10], b[10];
      int i;
      for (i = 0; i < 10; ++i)                 /* 19 cycles */
            a[i] = (a[i] * 20) + b[i];
}

main() {
      float a[10], b[10];
      int i;
      register float *p = a, *q = b;           /* 12 cycles */
      /* p is advanced in the loop header: reading *p and post-incrementing p
         in the same expression has undefined behavior in standard C */
      for (i = 0; i < 10; ++i, ++p)
            *p = (*p * 20) + *q++;
}


Structure Assignments
   The compiler generates very efficient code for structure
   assignments
       Nest objects within structures and use simple assignments to copy
       them

   Separate variables, copied one by one
     int x1, y1, c1;
     int x2, y2, c2;
     x1 = x2;
     y1 = y2;
     c1 = c2;

   Objects nested in a structure, copied with one assignment
     struct Pixel {
          int x, y, c;
     };
     struct Pixel p1, p2;
     p1 = p2;
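
As a compilable illustration of the tip, the sketch below wraps the structure assignment in a small function. The helper name `copy_pixel` is hypothetical; the `struct Pixel` layout follows the slide.

```c
struct Pixel {
    int x, y, c;
};

/* Copies all three fields with a single structure assignment,
   leaving the choice of copy sequence to the compiler. */
void copy_pixel(struct Pixel *dst, const struct Pixel *src)
{
    *dst = *src;
}
```

The single assignment replaces three member-by-member copies and keeps working unchanged if fields are later added to the structure.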








Hints for Assembly Coding
   Use delayed branches
       Delayed branches execute in a single cycle
       Regular branches execute in four cycles
       The three instructions that follow a delayed branch are executed
       whether the branch is taken or not
            If fewer than three useful instructions are available, use the
            delayed branch anyway and pad the remaining slots with NOPs
              – A reduction in machine cycles still occurs
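
The saving can be estimated with simple arithmetic. The sketch below uses the cycle figures quoted above (four cycles for a regular branch, one for a delayed branch, three delay slots) and additionally assumes that every instruction placed in a delay slot takes one cycle; the function names are hypothetical.

```c
/* Cycle figures from the slide; one cycle per delay-slot instruction
   is an assumption of this sketch. */
enum { REGULAR_BRANCH_CYCLES = 4, DELAYED_BRANCH_CYCLES = 1, DELAY_SLOTS = 3 };

/* `useful` instructions (0..3) executed before a regular branch. */
int cycles_regular(int useful)
{
    return useful + REGULAR_BRANCH_CYCLES;
}

/* Delayed branch whose three slots hold the same `useful` instructions,
   padded with NOPs: the cost does not depend on how many are useful. */
int cycles_delayed(int useful)
{
    (void)useful;
    return DELAYED_BRANCH_CYCLES + DELAY_SLOTS;
}
```

Under these assumptions each useful instruction moved into a delay slot saves one cycle: with two useful instructions and one NOP the delayed form costs 4 cycles against 6 for the regular branch.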




Hints for Assembly Coding
   Apply the repeat single/block construct
        In this way, loops are achieved with no
        overhead
        Note that with the RPTS instruction, the repeated
        instruction is not refetched on each iteration
            This frees the buses for operand fetches








Hints for Assembly Coding
   Use parallel instructions
   Maximize the use of registers
   Use the cache
   Use internal memory instead of external
   memory
   Avoid pipeline conflicts



