					H-QuAD: A Lossless High Quality Audio Decoder
              Detailed Design

Mark Eaves    Colin Lancaster    Jason Shirtliff   Cole Stewart


                       Group 2008.021
       Department of Electrical & Computer Engineering
                   University of Waterloo
                        Waterloo, ON

                        June 4, 2007
Contents

I     Overall Design

II    Component Design
1 Stream Decoder Design
  1.1 Stream Synchronization
  1.2 Stream Parsing
  1.3 Memory Control
      1.3.1 Memory Selection
      1.3.2 Memory Organization
      1.3.3 Simultaneous Memory Access
  1.4 Cyclic Redundancy Check (CRC)
  1.5 Channel Normalization
  1.6 Rice Decoder
      1.6.1 Rice Decoder State Machine
      1.6.2 Unsigned to Signed Conversion

2 Frame Decoder Design
  2.1 LPC Decoder Design
      2.1.1 LPC Decoder State Machine
  2.2 Fixed Decoder Design
      2.2.1 Fixed Decoder State Machine
  2.3 Constant Decoder Design
  2.4 Verbatim Decoder Design

3 System Controller Design
  3.1 System Controller Hardware Selection and Design
  3.2 System Controller Software Design
      3.2.1 Metadata Decoder Design
      3.2.2 LCD Controller Design
      3.2.3 DAC Controller Design
      3.2.4 USB Controller Design

III    Work Breakdown

IV     Appendices
A LPC Decoder
B Fixed Decoder
C Rice Decoder

List of Tables
  1   Stream Synchronization Comparison
  2   UTF-8 Data Encoding
  3   Finite State Machine for Rice Decoder
  4   LPC Decoder Performance Metrics
  5   Fixed Decoder Performance Metrics
  6   Finite State Machine for Fixed Decoder
  7   Metadata Block Types
  8   Register configuration data for the WM8731L audio CODEC (X denotes a reserved bit)
  9   Group Responsibilities

List of Figures
  1   System Block Diagram
  2   Sync Code Detector Schematic Diagram
  3   Stream Decoder Finite State Machine
  4   LPC Data Flow Diagram
  5   LPC Decoder State Diagram
  6   Fixed Decoder Data Flow Diagrams
  7   Fixed Decoder State Diagram

Listings
  1    Channel Coding Inversion Implementation [1]
  2    Unsigned to Signed Algorithm
  3    Constant Decoder Algorithm
  4    Verbatim Decoder Algorithm
  5    C-Style Metadata Decoder Implementation
  6    Hitachi HD44780 LCD Interface
  7    C code for configuring the I2C Core and WM8731L Audio CODEC
  8    LPC Filter RTL Implementation
  9    Fixed Decoder RTL Implementation
  10   Rice Decoder RTL Implementation

                                           Abstract

    Digital music has revolutionized the consumer audio market, as well as the music industry
as a whole. Current popular audio formats offer portability, and convenience of acquisition
and storage due to their small file size. This reduction of file size comes at the cost of degraded
quality due to the use of lossy compression algorithms. This is suboptimal for use with a
high-end audio system. The use of lossless audio compression is now practical due
to advances in storage capacity and the increased availability of high-speed Internet access.
Among the lossless audio formats, the FLAC digital audio format is quite popular and offers
a sufficient compression rate. This project will implement a device containing a hardware
FLAC decoder for use by audiophiles who demand quality, but desire the convenience of
digital audio formats. The decoder will be implemented on an FPGA using VHDL.
Part I

Overall Design
Our project is an end-to-end hardware decoder that decodes and plays FLAC encoded audio
streams, and is implemented on an Altera DE-2 board.
    The system will first read in a FLAC audio file through the board’s USB port by means
of a USB controller. Each frame within the stream can be either a metadata frame or an
audio encoded frame. The system’s stream decoder will distinguish between the two types
of frames, and route them accordingly. A Metadata controller will decode the metadata bit
stream and send the appropriate information to the LCD controller, which will output to
the board’s LCD screen. This information includes song title and artist information.
    For audio frames, the stream decoder must extract key information from the bit stream
in order to determine how to process each of the subframes. The stream decoder must
determine which of the four encoding mechanisms was used when encoding the subframe, as
well as extract all the compressed residual samples, the filter order (if applicable), and the
CRC checksum. Based on the type of subframe encoding, the stream decoder will route
the frame to the appropriate decoder filter. These decoders will decompress the stream and
output the result to an output buffer. These samples must be read from the buffer and output
to the DAC to be played on the speakers. As such, a DAC controller will be responsible for
setting up the DAC and ensuring the synchronization of the buffering.
    A block diagram of the system is provided in Figure 1. It provides an overall view of
the system architecture of the H-QuAD reference implementation. The interfaces between
functional blocks, as well as initial block feasibility analysis are described in the H-QuAD
block verification document [2].




    Figure 1: System Block Diagram
    [Blocks shown: Inputs; Data Store; USB Controller; Metadata Decoder; LCD Controller;
    DAC Controller; System Controller; Stream Decoder (Stream Parser, CRC, Rice Decoder,
    Memory Control); Frame Decoder (Constant Decode, Fixed Decode, LPC Decode, Verbatim
    Decode); Channel Normalization; SRAM Buffer; Wolfson DAC; 16x2 LCD Display; Outputs
    (L and R). The blocks are linked by data and control paths.]

Part II

Component Design
1     Stream Decoder Design
1.1     Stream Synchronization
The FLAC subset specification requires that a decoder be able to synchronize on a frame
boundary. To enable this feature, the format provides a synchronization code at the begin-
ning of each frame. The code is a series of 13 ones followed by three zeros. This value is
expressed as 0xFFF8 in hexadecimal notation. The format guarantees that the beginning
of each new frame will be byte aligned through the addition of padding bits at the end of
previous frames.
    There are two potential ways of implementing stream synchronization:

    1. Serial: The stream decoder reads data one bit at a time. A counter will keep track of
       the number of consecutive 1s detected and reset the count when a 0 is detected. When
       the count reaches 13 and the next three bits are 0s, a sync code has been detected and
       a state transition occurs. This approach has a high latency.

    2. Parallel: The decoder reads data 8/16 bits at a time.

   Reading 16 bits in parallel when searching for sync codes has one clear advantage: a
valid sync code is 16 bits long, so it can fit entirely within a single data word read
into the decoder.
   The problem with reading data in parallel is that a sync code may begin in one word
transfer, but finish in the next word transfer. The following example illustrates this point.

Example: Data is byte aligned but not word aligned.

      data[0]:     00101100{11111111} ← end of data

      data[1]:     start of data → {11111000}10101010

     The sync code begins at the end of data[0] and continues at the start
     of data[1].

   This problem can be solved by setting a flag if the last 8 bits of the previous data
word were 0xFF, and checking if the first 8 bits of the current data word are 0xF8. In this
manner a sync code will be detected when contained within a single data word, or when split
between two consecutive data words. Figure 2 shows a schematic diagram that implements
the approach to sync code detection discussed above.




                    Figure 2: Sync Code Detector Schematic Diagram
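
   The flag-based check described above can also be modelled in software. The following C
sketch is illustrative only and is not the schematic of Figure 2: the names sync_detector,
sync_result and sync_detect are hypothetical, and it assumes 16-bit data words in which the
first byte received occupies the upper eight bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the sync code detector. */
typedef struct {
    bool prev_low_ff;   /* flag: last 8 bits of the previous word were 0xFF */
} sync_detector;

typedef enum { SYNC_NONE, SYNC_ALIGNED, SYNC_SPLIT } sync_result;

/* Examine one 16-bit word and report whether a sync code ends in it. */
sync_result sync_detect(sync_detector *d, uint16_t word)
{
    sync_result r = SYNC_NONE;

    if (word == 0xFFF8u) {
        r = SYNC_ALIGNED;                 /* whole code inside one word */
    } else if (d->prev_low_ff && (word >> 8) == 0xF8u) {
        r = SYNC_SPLIT;                   /* 0xFF in previous word, 0xF8 here */
    }

    /* Remember the low byte for the next word. */
    d->prev_low_ff = ((word & 0xFFu) == 0xFFu);
    return r;
}
```

   Fed the two words of the example above (0x2CFF followed by 0xF8AA), this model reports
no sync code for the first word and a split sync code for the second.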

   A comparison of the two implementation methods is provided in Table 1. Comparison is
based on complexity, latency, speed, and area.

                      Table 1: Stream Synchronization Comparison

             Category      Serial                     Parallel
             Complexity    Low                        High
             Latency       High                       Lower (1+ cycles)
             Area          Low (counter)              Moderate
             Speed         Fast (simple comparison)   Fast (AND gate delays)



   The dominant consideration for real-time audio decoding is latency. Excessive delays
between decoding audio frames will cause audio playback to be choppy. For this reason, the
parallel approach will be utilized in our design.


1.2    Stream Parsing
Following the detection of a sync code, the stream decoder will begin extracting the important
information for the frame from the frame header. If the sync code was spread across two data
words, then the stream must be realigned so that specific data remains at the same position
of the input. For example, the frame's block size immediately follows the end of a sync code,
and the input should be realigned so that the block size can always be found at data_in[0..4].
To accomplish this realignment the stream decoder must register at least 1.5 data words (24
bits) at its input. If the sync code is data word aligned then new input will be registered
in data_in[0..15]. If the sync code is not data word aligned then data_in[8..15] will
be shifted to data_in[0..7] and the new input registered in data_in[8..23]. This can be
accomplished by setting a flag during sync code detection to use as a mux select line for
data_in.
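
    As a rough software illustration of this realignment, the sketch below models the 24-bit
input register as the low 24 bits of an integer. It is a simplification under stated
assumptions, not the RTL: the names realigner and realign are hypothetical, bit 23 of the
returned value is assumed to correspond to index 0 of the input register (the first bit in
stream order), and the leftover byte is held in a separate field rather than shifted within
the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the input realignment mux. */
typedef struct {
    uint8_t carry;   /* low byte of the previous input word */
    bool    split;   /* mux select, set when the sync code straddled two words */
} realigner;

/* Register a new 16-bit word and return the 24-bit input register contents. */
uint32_t realign(realigner *r, uint16_t word)
{
    uint32_t data_in;

    if (r->split) {
        /* Leftover byte goes to bits [0..7], new word to bits [8..23]. */
        data_in = ((uint32_t)r->carry << 16) | word;
    } else {
        /* Word-aligned: new word goes to bits [0..15]. */
        data_in = (uint32_t)word << 8;
    }

    r->carry = (uint8_t)(word & 0xFFu);
    return data_in;
}
```

    With the split flag set after the word 0xF8AA, the next word 0x1234 yields 0xAA1234, so
the byte that followed the sync code sits at the front of the register.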
    Frame information extracted from the stream must be stored so that it can be retrieved
when the frame is ready for playback. This frame information will be buffered for output in
the same way that the frame will be buffered for output.
    The frame header contains a UTF-8 encoded frame number. Because UTF-8 is a variable
length encoding, the stream decoder must determine how many bytes the frame number is
composed of. It should be noted that we are not interested in determining what the frame
number is, but rather how many bytes must be skipped to reach the end of the frame
number. Table 2 shows how UTF-8 encoded characters are formatted. The first byte in
a UTF-8 character can be examined to determine how many bytes are in the character.
Depending on the number of bytes contained in the UTF-8 character, an input realignment
may be required. This realignment will be performed in the same manner described earlier
in this section.
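
    The byte count can be read directly from the leading bits of the first byte, following
the patterns of Table 2. The C sketch below is one minimal way to express that check; the
function name utf8_length is hypothetical, and returning 0 for a byte that matches none of
the six patterns is an assumption made for illustration.

```c
#include <stdint.h>

/* Number of bytes in a UTF-8 coded value, judged from its first byte
 * using the six patterns of Table 2. Returns 0 if the byte is a
 * continuation byte (10XXXXXX) or matches no pattern in the table. */
int utf8_length(uint8_t first)
{
    if ((first & 0x80u) == 0x00u) return 1;   /* 0XXXXXXX */
    if ((first & 0xE0u) == 0xC0u) return 2;   /* 110XXXXX */
    if ((first & 0xF0u) == 0xE0u) return 3;   /* 1110XXXX */
    if ((first & 0xF8u) == 0xF0u) return 4;   /* 11110XXX */
    if ((first & 0xFCu) == 0xF8u) return 5;   /* 111110XX */
    if ((first & 0xFEu) == 0xFCu) return 6;   /* 1111110X */
    return 0;
}
```

    In hardware the same decision reduces to a priority encoder on the leading ones of the
first byte.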

                               Table 2: UTF-8 Data Encoding

         Bytes     Binary Formatting
           1       0XXXXXXX
           2       110XXXXX 10XXXXXX
           3       1110XXXX 10XXXXXX 10XXXXXX
           4       11110XXX 10XXXXXX 10XXXXXX 10XXXXXX
           5       111110XX 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX
           6       1111110X 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX

    Following the frame number is the CRC-8 checksum for the frame header. The CRC-8
checksum in the frame header must match the calculated CRC-8 checksum. If an error is
detected then the stream decoder will start searching for sync codes. CRC-8 calculations are
performed as data is read into the stream decoder. For more details on this, see §1.4.
    Subframe header information is contained in the stream after the frame header. Contained
within a subframe header is a code which specifies the subframe type. The subframe
type must be determined so that the stream decoder can forward the subframe’s contents
to the appropriate frame decoder. Each type of subframe contains different amounts of
data including initialization data. Data contained within Constant and Verbatim encoded
subframes is forwarded directly to their respective decoder units using a valid bit protocol.
Data contained within LPC and Fixed Order encoded subframes is part initialization data,
and the rest of the data is in the form of residuals. Residual data is transferred serially to
the Rice decoder unit, which performs computations on the data and then forwards it to the
correct frame decoder. More information about the format of the various subframe types
can be found in the FLAC format specification [3].
    At the end of a frame (after one or more subframes) is a CRC-16 checksum. Like the CRC-8
calculations performed on the frame header, CRC-16 calculations are performed as data is
read into the stream decoder and include all data from the sync code to the end of
the frame. If an error is detected, a signal is sent to the memory controller for the output
buffer so the frame can be dropped, and the stream decoder will start searching for sync
codes.
    Figure 3 shows a finite state machine diagram for the stream decoder which summarizes
its operation. An error state exists so that the appropriate signals can be sent to other parts
of the FLAC decoder before starting on the next frame.




Figure 3: Stream Decoder Finite State Machine




1.3     Memory Control
The subset specification limits the size of an individual frame to no more than 4608 sam-
ples [3]. Therefore, the maximum size of one frame is:

\[
4608\ \text{samples} \times \frac{16\ \text{bits}}{\text{channel} \cdot \text{sample}} \times 2\ \text{channels} = 147{,}456\ \text{bits} = 18\ \text{kB} \tag{1}
\]

which is the minimum size that the buffer memory must be in order to accommodate an
entire frame. However, a memory array must be sized to powers of 2 using the Quartus II
MegaFunction Wizard, so to accommodate one frame, 32 kB of on-chip memory must be
used.
    The decoder will not buffer the input, which means that a frame must be placed in an
output buffer and held until the end of the frame has been processed in order to verify
the CRC checksum. Therefore, in order to prevent playback skipping on the output, it is
necessary to buffer two frames on the output so that an already-verified frame can be output
while the decoder completes the subsequent frame. This requires a total of 64 kB of memory.
    The Cyclone II’s on-chip memory is not large enough to accommodate two entire frames,
as it contains only 59.06 kB of memory. Therefore, it is necessary to utilize off-chip memory
contained on the DE2 Board.
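
    The sizing argument above can be checked with a few lines of arithmetic. The C sketch
below is only a worked calculation; the helper names frame_bytes and next_pow2 are
hypothetical and are not part of the design.

```c
#include <stdint.h>

/* Bytes needed to hold one decoded frame. */
uint32_t frame_bytes(uint32_t samples, uint32_t bits_per_sample,
                     uint32_t channels)
{
    return samples * bits_per_sample * channels / 8u;
}

/* Smallest power of two that is greater than or equal to n (n >= 1). */
uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1u;
    while (p < n)
        p <<= 1;
    return p;
}
```

    For 4608 samples of 16 bits on 2 channels, frame_bytes gives 18,432 bytes (18 kB),
next_pow2 rounds this up to 32,768 bytes (32 kB), and two such buffers need 65,536 bytes
(64 kB), which exceeds the 59.06 kB available on chip.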

1.3.1   Memory Selection

The DE2 contains three different types of off-chip memory [4]:

  1. 512 kB of SRAM

  2. 8 MB of SDRAM

  3. 4 MB of Flash Memory

All three are of sufficient size to satisfy the desired application, but the easiest to use and
fastest is the SRAM.
    The SRAM is 512 kB in size and is word-addressable, which means that it has an 18-bit
address space and a 16-bit data port. It executes a read or write cycle in a maximum of 10 ns,
which means that its maximum clock frequency is 100 MHz. For the purpose of the prototype,
this is acceptable because the board’s clock frequency is only 50 MHz. The SRAM is also
single-port [5].


1.3.2   Memory Organization

The first 64 kB of the SRAM will be used; the remaining address lines will be fixed to zero
because they will not be needed. The topmost of the 15 active address lines will select
between the two frame buffers. The remaining lines will index into the memory array.
    During operation, the decoder will write to one of the buffers, while the DAC synchro-
nization unit will read from the other. It is possible that these two memory operations will
coincide with each other in the same clock cycle. Section 1.3.3 addresses this issue. The
decoder will never read from the buffer, and the DAC synchronization unit will never write
to the buffer, so the only possible memory access collision that can occur is the decoder
writing at the same time that the DAC synchronization unit requests a read.
    The DAC synchronization unit will have a state machine that allows it to switch between
the two buffers. When it arrives at the end of the data in the frame buffer it is reading, it
will toggle the upper bit of the address space, switching to the other buffer.
    In order for a frame to be sent to the DAC, it must first be verified by the CRC check
in the frame footer. In order to prevent the DAC synchronization unit from sending data
that has not been verified, or that has failed CRC verification, a single-bit flag will be used
for each buffer that indicates that it contains a valid frame. The stream decoder will set
this bit high once it has completed decoding a frame and verified its checksum. The DAC
synchronization unit will set it low when it has completed sending the entire frame to the
DAC. This bit will also be used by the stream decoder to determine if both buffers are full,
in which case it must wait to continue decoding until the DAC synchronization unit has sent
an entire frame to the DAC.
    The DAC synchronization unit will not begin sending a frame to the DAC unless the
valid bit is high. Therefore, in the case where there is not a valid frame in either buffer,
‘0000000000000000’ will be sent to the DAC repeatedly until a valid frame arrives.
    This buffering scheme will allow seamless transition from frame to frame without silence
in between, except in the case of a frame that fails its CRC check. However, the specification
permits the decoder to output blank data if a bad frame is encountered, so this is acceptable
behaviour.
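
    The two-buffer scheme with its valid flags can be summarised in a short software model.
The C sketch below is a simplified illustration, not the planned hardware: all names are
hypothetical, each buffer is assumed to hold 16,384 sixteen-bit words (32 kB), and reusing
the same buffer after a CRC failure is an assumption about how a dropped frame is handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_WORDS 16384u   /* assumed: 32 kB per buffer, 16-bit words */

/* Hypothetical model of the double-buffer bookkeeping. */
typedef struct {
    bool     valid[2];   /* one flag per buffer: holds a verified frame */
    unsigned wr_buf;     /* buffer the decoder is currently filling */
    unsigned rd_buf;     /* buffer the DAC synchronization unit is reading */
} frame_buffers;

/* 15-bit SRAM word address: the top bit selects the buffer. */
uint16_t sram_address(unsigned buf, uint16_t index)
{
    return (uint16_t)(((buf & 1u) << 14) | (index & (BUF_WORDS - 1u)));
}

/* Decoder side: called when a frame has been fully decoded. */
void decoder_frame_done(frame_buffers *fb, bool crc_ok)
{
    if (crc_ok) {
        fb->valid[fb->wr_buf] = true;   /* frame verified, release it */
        fb->wr_buf ^= 1u;               /* move on to the other buffer */
    }
    /* Assumed: on CRC failure the flag stays low and the buffer is reused. */
}

/* Decoder side: both buffers full, so decoding must pause. */
bool decoder_must_wait(const frame_buffers *fb)
{
    return fb->valid[0] && fb->valid[1];
}

/* DAC side: a frame may be played only if its valid flag is high. */
bool dac_can_play(const frame_buffers *fb)
{
    return fb->valid[fb->rd_buf];
}

/* DAC side: called after the last sample of a frame has been sent. */
void dac_frame_done(frame_buffers *fb)
{
    fb->valid[fb->rd_buf] = false;
    fb->rd_buf ^= 1u;                   /* toggle the upper address bit */
}
```

    Whenever dac_can_play returns false, the DAC synchronization unit would send zero
samples, matching the behaviour described above.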

1.3.3   Simultaneous Memory Access

During operation, the decoder may produce an output that needs to be written to the output
buffer during the same cycle that the DAC synchronization unit requests a read for the next
sample to be sent to the DAC. In this case, the DAC synchronization unit takes precedence,
because it must remain in sync and the arrival of the sample cannot be delayed.
    To solve the problem of this memory access collision, a simple memory controller will be
implemented. This controller will detect the occurrence of a simultaneous write and read.
It will execute the read request in that cycle, and register the write data and address. It
will then perform the write in the subsequent cycle. This is acceptable, because the DAC
synchronization unit will not request samples any faster than twice the audio sample rate
(two channels), which means that it will never make requests in two consecutive cycles at
a system clock speed of 50 MHz.
    If the decoder requests a second write in the following cycle, the first write will be executed
first and the second will be registered and performed in the subsequent cycle. This only
becomes a problem when a run of consecutive writes follows a read-write collision, because
every write in the run is then delayed by one cycle. The solution is to ensure that write
requests never arrive in consecutive cycles, which is done by requiring at least one cycle of
latency between valid decoder outputs. The LPC and fixed decoding blocks will have higher
latency than this, but the constant and verbatim decoding blocks effectively require no
latency, so it will be necessary to design them with one cycle of latency between outputs.
This is discussed further in Sections 2.3 and 2.4.
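
    The arbitration rule can be modelled cycle by cycle in software. The C sketch below is a
behavioural illustration under stated assumptions, not the controller RTL: the names
sram_arbiter and arbiter_cycle are hypothetical, the memory is modelled as the 32,768
sixteen-bit words (64 kB) used by the two frame buffers, and a single holding register is
assumed for the delayed write.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_WORDS 32768u   /* 64 kB of 16-bit words */

/* Hypothetical model of the single-port memory controller. */
typedef struct {
    uint16_t mem[SRAM_WORDS];
    bool     pending;       /* a delayed write is waiting */
    uint16_t pend_addr;
    uint16_t pend_data;
} sram_arbiter;

/* Simulate one clock cycle. Returns false only if a write had to be
 * dropped because the holding register was already occupied. */
bool arbiter_cycle(sram_arbiter *a,
                   bool rd_req, uint16_t rd_addr, uint16_t *rd_data,
                   bool wr_req, uint16_t wr_addr, uint16_t wr_data)
{
    bool port_busy = false;
    bool ok = true;

    if (rd_req) {
        /* The read always wins the port. */
        *rd_data = a->mem[rd_addr % SRAM_WORDS];
        port_busy = true;
    } else if (a->pending) {
        /* No read this cycle: retire the delayed write first. */
        a->mem[a->pend_addr % SRAM_WORDS] = a->pend_data;
        a->pending = false;
        port_busy = true;
    }

    if (wr_req) {
        if (!port_busy) {
            a->mem[wr_addr % SRAM_WORDS] = wr_data;   /* port is free */
        } else if (!a->pending) {
            a->pend_addr = wr_addr;                   /* delay by one cycle */
            a->pend_data = wr_data;
            a->pending = true;
        } else {
            ok = false;   /* would need a second holding register */
        }
    }
    return ok;
}
```

    In this model the failure case can only arise when a read arrives while a delayed write
is still waiting and a new write is requested in the same cycle, which is the situation the
one-cycle output latency requirement is meant to rule out.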


1.4    Cyclic Redundancy Check (CRC)
Error detection within the FLAC audio stream is handled on a frame by frame basis through
the use of Cyclic Redundancy Check (CRC) checksums. The header of each audio frame is
protected by a CRC-8 checksum, while the frame itself is protected using a CRC-16 checksum.
By performing an exclusive-OR (XOR) operation on the incoming data with a known value
(the CRC polynomial), errors may be detected if the accumulated output is not zero after
the checksum value has passed through the checking circuit.
    The output of the CRC checking circuit is an input to the Stream Decoder state machine.
If an error is detected, decoding of the frame is abandoned, and the decoder attempts to
synchronize on a new frame sync code.
    The current reference circuit takes in 16 bit inputs and performs the calculation in par-
allel. As the operation is a combinational operation, it is very fast and takes up little area.
The operation is performed on the data as it is received from the USB Controller (on the
fly). This allows the stream decoder to keep a very small input buffer of only a few data
words, instead of buffering an entire frame.

    Since CRC is such a common operation in digital electronics, especially in communi-
cations, many optimal reference implementations exist and are freely available. Reference
implementations of both CRC-8 and CRC-16 have been written and verified for functionality.
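
    For reference, the two checksums can be modelled in software with a bit-serial update
per byte. The C sketch below is not the 16-bit parallel circuit described above; it is a
minimal model that assumes the polynomials given in the FLAC format specification [3]
(x^8 + x^2 + x + 1 for CRC-8 and x^16 + x^15 + x^2 + 1 for CRC-16), both initialized to
zero and processed most significant bit first. The function names are hypothetical.

```c
#include <stdint.h>

/* Update a CRC-8 (polynomial 0x07) with one data byte, MSB first. */
uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++) {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                            : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Update a CRC-16 (polynomial 0x8005) with one data byte, MSB first. */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int i = 0; i < 8; i++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8005u)
                              : (uint16_t)(crc << 1);
    }
    return crc;
}
```

    Starting from zero and feeding every byte of the protected data followed by the
transmitted checksum leaves the accumulator at zero when no error has occurred, which is
the condition the checking circuit tests.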


1.5    Channel Normalization
One of the methods FLAC uses for file compression is interchannel decorrelation [6]. This
process allows for coding smaller residual values by exploiting the high level of cross cor-
relation between the left and right audio channels. The format specifies four schemes with
which any frame may be coded. These are:

  1. Independent: Independently coded channels do not employ interchannel decorrela-
     tion. They do not require any extra computation by the decoder.

  2. Mid-side: Mid-side coding transmits the mean of the stereo signal in one subframe,
     and the difference between the channels (the error) in the other subframe. For highly
     correlated signals, the residual error signal will be significantly smaller, improving
     compression.

  3. Left-side: In this scheme, the left audio channel is independently coded, and the right
     channel is transmitted as a difference from the left channel.

  4. Right-side: This is the opposite of Left-side coding. The right channel is transmitted
     as is, and the left channel is coded as a difference from the right channel.

     If the FLAC audio frame has been coded with a method other than Independent coding,
it is necessary for the decoder to invert the encoder's operation and restore the left and right
audio channels. This operation must take place after the normal frame decoding operations
(see §2) have been completed. This requires the decoder to buffer the output for each
decoded subframe to allow for the inversion of the decorrelation operation. The following
scheme is proposed to allow for this:
     The first subframe of every frame will have its decoded outputs placed in a temporary
buffer composed of on-chip memory (utilizing M4K blocks on the FPGA). When the second
subframe is being decoded, the proper inversion operation will be performed using the freshly
calculated sample and the corresponding sample from the first subframe. Following this,
both samples will be interleaved and placed in an output buffer implemented in SRAM. The
structure of this output buffer is described in §1.3.2.

    The logic for inverting the encoder’s decorrelation operation is provided in Listing 1.
The logic was obtained from the libFLAC reference stream decoder implementation [1].
Based on the value of the frame channel assignment stored by the stream decoder, the
decorrelation state machine will perform a different operation on the decoder output
samples. In all cases, these operations are simple additions and subtractions and, in the
case of mid-side coding, multiplication and division by 2 (which will be implemented
as wired shifts).

                           Listing 1: Channel Coding Inversion Implementation [1]
  /* Undo any special channel coding */
  switch (channel_assignment) {
      case INDEPENDENT:
          /* do nothing */
          break;
      case LEFT_SIDE:
          /* right = left - side */
          for (i = 0; i < blocksize; i++)
              output[1][i] = output[0][i] - output[1][i];
          break;
      case RIGHT_SIDE:
          /* left = side + right */
          for (i = 0; i < blocksize; i++)
              output[0][i] += output[1][i];
          break;
      case MID_SIDE:
          for (i = 0; i < blocksize; i++) {
              mid = output[0][i];
              side = output[1][i];
              mid <<= 1;
              if (side & 1)   /* i.e. if 'side' is odd... */
                  mid++;
              left = mid + side;
              right = mid - side;
              output[0][i] = left >> 1;
              output[1][i] = right >> 1;
          }
          break;
  }




1.6       Rice Decoder
In order to achieve its high compression ratios, the FLAC format has built in support for
multiple residual coding schemes. The only scheme currently implemented by version 1.4 of
the specification is a form of run-length encoding known as partitioned Rice encoding [7].
Rice coding is a computationally efficient form of the more generalized entropy encoding
scheme known as Golomb Coding [8]. The general idea behind Golomb coding is to assign
the shortest code word to the smallest number, based on the assumption that the probability
of transmitting small values is much higher than the probability of transmitting large values.
Since FLAC uses this coding scheme to encode the residual (error) signal, the difference
between the actual input and the model of the signal generated by the linear predictor (see
§2.1 and §2.2), the assumption of small valued signals having high probability is valid.

    The Golomb code for a given number is calculated through an operation similar to long
division. The quotient is represented by a string of 0’s of length equal to the quotient.
The remainder is appended to this value, unencoded, separated by a stop bit of 1. Rice’s
extension to Golomb’s coding scheme is computationally efficient because it limits the value
of the divisor to a power of two. This makes it ideal for implementation using a computer
or digital hardware.
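
    The code word layout described above can be sketched in C as follows. This is a minimal
illustration and not the project's implementation: the bit_reader type and both function
names are hypothetical, the input is modelled as an array holding one bit per element, and
the remainder is assumed to be transmitted most significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit source: one bit (0 or 1) per array element. */
typedef struct {
    const uint8_t *bits;  /* bit values in transmission order */
    size_t         len;   /* number of bits available */
    size_t         pos;   /* index of the next bit to read */
} bit_reader;

/* Returns the next bit, or -1 when the input is exhausted. */
static int next_bit(bit_reader *br)
{
    return (br->pos < br->len) ? br->bits[br->pos++] : -1;
}

/*
 * Decodes one Rice code word with parameter k (divisor 2^k).
 * Returns 0 and stores the unsigned value in *out, or returns -1
 * if the input ends in the middle of a code word.
 */
static int rice_decode(bit_reader *br, unsigned k, uint32_t *out)
{
    uint32_t quotient = 0;
    uint32_t remainder = 0;
    int bit;

    assert(k < 32);

    /* Unary quotient: count zeros up to the stop bit of 1. */
    while ((bit = next_bit(br)) == 0)
        quotient++;
    if (bit < 0)
        return -1;

    /* Binary remainder: k bits, assumed most significant bit first. */
    for (unsigned i = 0; i < k; i++) {
        bit = next_bit(br);
        if (bit < 0)
            return -1;
        remainder = (remainder << 1) | (uint32_t)bit;
    }

    /* value = quotient * 2^k + remainder */
    *out = (quotient << k) | remainder;
    return 0;
}
```

    For example, with k = 2 the bit sequence 0 0 1 0 1 holds a quotient of 2 (two zeros), a
stop bit, and a remainder of 01, giving 2 * 4 + 1 = 9.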

1.6.1   Rice Decoder State Machine

Based on the format of a Rice encoded number, there are two logically separate operations
that must be performed during the decoding process. These are:

  1. Reading and decoding the quotient, and

  2. Reading the remainder and restoring the value

    This lends itself to implementation via a state machine with three states (including an
idle state). These states are shown in Table 3. The Rice decoder will read bits in serially
and provide a parallel output that is 16 bits wide (to account for pathological cases of
incompressible residual samples). The implementation of the Rice decoder can be found in
Appendix C.

                      Table 3: Finite State Machine for Rice Decoder

              State                       Description
               S0               Idle State - Waiting for i_valid
               S1            Decode Unary Quotient (Count Zeros)
               S2     Read binary remainder and perform decode operation
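
    The three states in Table 3 can be modelled in software as a bit-serial machine that is
clocked once per input bit. The sketch below is only a behavioural model under stated
assumptions, not the RTL in Appendix C: the names rice_fsm and rice_fsm_clock are
hypothetical, the Rice parameter k is assumed to be latched when the machine leaves the idle
state, and the machine is assumed to hold its state whenever i_valid is low.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { S0_IDLE, S1_QUOTIENT, S2_REMAINDER } rice_state;

typedef struct {
    rice_state state;
    unsigned   k;          /* Rice parameter latched on leaving S0 */
    unsigned   bits_left;  /* remainder bits still to read in S2 */
    uint16_t   quotient;   /* zeros counted in S1 */
    uint16_t   remainder;  /* remainder bits shifted in so far */
} rice_fsm;

/*
 * Advances the model by one clock. Returns 1 when *o_data holds a
 * completed 16-bit value, 0 otherwise.
 */
static int rice_fsm_clock(rice_fsm *m, int i_valid, int i_bit, unsigned k,
                          uint16_t *o_data)
{
    assert(k < 16);
    if (!i_valid)
        return 0;                      /* assumed: hold state with no input */

    if (m->state == S0_IDLE) {         /* first bit of a new code word */
        m->k = k;
        m->quotient = 0;
        m->remainder = 0;
        m->state = S1_QUOTIENT;
    }

    if (m->state == S1_QUOTIENT) {
        if (i_bit == 0) {              /* still inside the unary run */
            m->quotient++;
            return 0;
        }
        if (m->k == 0) {               /* stop bit and no remainder bits */
            *o_data = m->quotient;
            m->state = S0_IDLE;
            return 1;
        }
        m->bits_left = m->k;           /* stop bit: remainder follows */
        m->state = S2_REMAINDER;
        return 0;
    }

    /* S2: shift in one remainder bit, most significant bit first. */
    m->remainder = (uint16_t)((m->remainder << 1) | (i_bit & 1));
    if (--m->bits_left == 0) {
        *o_data = (uint16_t)((m->quotient << m->k) | m->remainder);
        m->state = S0_IDLE;
        return 1;
    }
    return 0;
}
```

    Feeding the bits 0 0 1 0 1 with k = 2 takes five clocks and produces the value 9 on the
fifth, after which the model is back in the idle state.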




1.6.2   Unsigned to Signed Conversion

Rice encoding is best suited to unsigned numbers, and coding signed residuals directly would
be wasteful in two ways. In a sign-and-magnitude representation, the value ‘-0’
would be assigned the second shortest code word, even though the number does not exist
in real samples. In the ‘two’s complement’ representation used by computer systems, small
negative deviations from the predicted value read as large unsigned numbers and would be
encoded using some of the largest code words in the scheme, which is undesirable.


Since the error signal is a signed number representing the deviation from the statistical model
of the signal, an alternative representation for small signed numbers is required to optimize
the use of the code domain for the given application. To do this, FLAC converts the signed
values to unsigned values using the following mapping, where x' denotes the unsigned value:

    x' = \begin{cases} 2|x| = 2x, & x \ge 0 \\ 2|x| - 1 = -2x - 1, & x < 0 \end{cases}        (2)

     The decoding operation of the mapping written in C is presented in Listing 2.

                                       Listing 2: Unsigned to Signed Algorithm
uval = unsigned_value;
if (uval & 1)   // The number is odd
    *val = -((int)(uval >> 1)) - 1;   // Divide by 2, negate, subtract 1
else            // The number is even
    *val = (int)(uval >> 1);          // Divide by 2
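
    As a cross-check, the encoder-side mapping can be derived by inverting Listing 2 and run
against it. The sketch below is illustrative only; fold_signed, unfold_unsigned and
check_roundtrip are hypothetical names, and the forward mapping shown is the one implied by
Listing 2 (0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4).

```c
#include <assert.h>
#include <stdint.h>

/* Encoder side: interleave negative and non-negative values. */
static uint32_t fold_signed(int32_t x)
{
    if (x >= 0)
        return 2u * (uint32_t)x;
    /* Widen first so that negating INT32_MIN cannot overflow. */
    return (uint32_t)(-2 * (int64_t)x - 1);
}

/* Decoder side: same operation as Listing 2. */
static int32_t unfold_unsigned(uint32_t u)
{
    int32_t half = (int32_t)(u >> 1);
    return (u & 1u) ? -half - 1 : half;
}

/* Aborts if the two mappings are not inverses for the given value. */
static void check_roundtrip(int32_t x)
{
    assert(unfold_unsigned(fold_signed(x)) == x);
}
```

    Small magnitudes of either sign land on small unsigned values, which is what gives them
short Rice code words.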




2     Frame Decoder Design
The Frame Decoder functional block consists of four sub-blocks which implement the four
types of decoding in the FLAC subset specification. These decoding methodologies are:

    1. Verbatim Decoding

    2. Constant Decoding

    3. Fixed Order Decoding

    4. LPC Decoding

    Verbatim samples do not require mathematical decoding operations since they are
uncompressed. Constant decoding is also a simple process since a constant-encoded subframe
consists of a block of samples of the same value. LPC and Fixed Order decoding involve more
complex mathematical computations. Initially, a number of uncompressed warm-up samples
equal to the predictor order, together with a compressed residual sample, are used by each
decoder to compute the first output sample. Once the first output sample is computed, it is
fed back to be used for the next computation, replacing the oldest warm-up sample.
Sections 2.1 and 2.2 discuss the design of each of these two decoders in greater detail.


2.1     LPC Decoder Design
Restoration of a signal that has been encoded using an n-th order linear predictor involves
the implementation of an n-th order finite impulse response (FIR) digital filter. Such a filter
may be implemented in hardware using a series of multipliers and accumulators. The FLAC
subset specification requires that a decoder must be able to restore frames encoded with a
12th order linear predictor [3].
    The 12th order digital filter has been implemented using a parallel, non-pipelined ar-
chitecture. Pipelining was avoided due to area constraints of the implementation technol-
ogy, namely the number of available hardware multipliers. Additionally, the speed metrics
achieved during prototyping were deemed acceptable. These achieved metrics are shown in
Table 4.
    The structural implementation of the LPC filter is shown in Figure 4. The design uses
12 of the available 35 hardware multipliers. This is currently the only portion of the de-
sign that has required multiplications by two variable arguments, and therefore is the only
component requiring dedicated multiplication hardware. In all other cases, multiplication

                                                       Table 4: LPC Decoder Performance Metrics

                                                                     Metric    Prototype Value
                                                                      Speed        54 MHz
                                                                       Area        888 LEs
                                                                    Throughput        1



by a constant can be implemented using additions and logical shifts, which are efficient both
computationally and with respect to area. The filter’s accumulation function is implemented
using an adder tree. This structure was chosen to balance the tradeoff between area and
register-to-register delay. When compared with a cascaded structure, the delay is reduced
by almost a factor of 3. Other components used within the LPC decoder are two shift
registers to hold the delayed samples required for the filter calculation, and a set of registers
to hold the filter coefficients. There is also a variable shift that is determined by the LPC
quantization resolution. This resolution is an input that is decoded by the stream decoder.
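
    The arithmetic performed by this data path can be written as a short C model. The sketch
below follows the structure of Figure 4 (multiply, accumulate, shift, add the residual) and is
not the RTL from Appendix A; lpc_restore and its parameter names are hypothetical, a 64-bit
accumulator is assumed for simplicity, and the right shift of a negative sum is assumed to be
arithmetic, as it is on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restores samples in place. On entry data[0..order-1] holds the warm-up
 * samples; residual[i] is the residual for sample data[order + i].
 * qlp_coeff[0] multiplies the most recent sample, as in Figure 4.
 */
static void lpc_restore(const int32_t *residual, unsigned n_residual,
                        const int32_t *qlp_coeff, unsigned order,
                        unsigned lpc_shift, int32_t *data)
{
    assert(order >= 1 && order <= 12);
    assert(lpc_shift < 32);

    for (unsigned i = 0; i < n_residual; i++) {
        int64_t sum = 0;

        /* Multiply each delayed sample by its coefficient and accumulate. */
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * data[order + i - 1 - j];

        /* Scale the prediction, then add the residual. */
        data[order + i] = residual[i] + (int32_t)(sum >> lpc_shift);
    }
}
```

    With a single coefficient of 2 and a shift of 1 the prediction is simply the previous
sample, so a warm-up sample of 5 and residuals 1, -2, 0 restore to 6, 4, 4.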

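[Figure 4 shows the delayed samples data[-1] to data[-12], each multiplied by the matching
coefficient qlp_coeff[0] to qlp_coeff[11]. The twelve products are summed by an adder tree,
the sum is shifted right by i_lpcshift, residual[0] is added, and the result is stored in a
register (D).]

                                Figure 4: LPC Data Flow Diagram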



2.1.1        LPC Decoder State Machine

The state machine for the LPC decoder is shown in Figure 5. The state machine was designed
to follow the Mealy model for finite state machines. That is, the outputs of the system are
a function of both the current state and the inputs to the system. There are three states
that the decoder can be in. These are:

S0: In S0, the idle state, the decoder waits for the i_valid signal to go high, and clocks in
     the filter coefficient and warmup sample data on their respective input ports.

S1: In S1, the decoder loads all warmup samples and filter coefficients, based on the
     order of the filter that is required. The order is obtained from the i_order input.

S2: In S2, the decoder decompresses the data obtained from its i_residual port using
     the previously decoded samples. It remains in this state until it is reset by the stream
     decoder.


[Figure 5 shows the states S0, S1 and S2. The transition labels in the diagram are
i_valid = '0', i_valid = '1', order < 1 and i_reset = '1'.]

                          Figure 5: LPC Decoder State Diagram

   One change is planned from the current LPC decoder prototype state machine. Currently,
both warmup samples and filter coefficients are loaded simultaneously. However, this would
require the stream decoder to buffer samples and coefficients before initializing the filter,
which adds complexity. The planned modification is to make the loading of coefficients and
warmup samples independent of each other. This would potentially add a fourth state, but
will be a simple modification from the current prototype.
    The RTL code used to implement the filter is presented in Appendix A.


2.2    Fixed Decoder Design
The Fixed decoder will restore a signal using a fixed order FIR digital filter, with predictor
order between 1 and 4. This filter can be implemented in hardware using a series of shift
registers, adders and subtractors to compute the appropriate mathematical operations to
decode the data frames.
    Since the incoming warm-up samples and residual samples are 16-bit numbers, the
decoder will be implemented using a parallel architecture in order to speed up processing.
Additionally, since area is a major design constraint, the decoder will be implemented using
a non-pipelined architecture in order to use fewer registers and re-use components.
    The algorithms used for each of the different fixed predictor orders have been translated
into the data flow diagrams in Figure 6. Figure 6.a corresponds to order one, Figure 6.b
corresponds to order two, and so on.
    The number of warm-up samples that are loaded into the decoder (through the i_data
port) will correspond to the predictor order value. Thus, a minimum of one warm-up sample
will be loaded, up to a maximum of four warm-up samples. As these samples are received,
they are shifted through the four shift registers. These shift registers will serve as our
mechanism for delaying a sample. Once these samples have been received, the decoder will
load each residual signal (through the i_residual port) and execute the algorithm. The
resulting signal will be output (through the o_data port) and will also be fed back into the
first shift register, while all other samples are shifted over by one.
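
    The per-order arithmetic can be summarized in a short C model. The sketch below uses
the standard fixed predictor polynomials for orders 1 to 4 as found in the libFLAC reference
decoder; fixed_restore and its parameter names are hypothetical, and the array indexing
stands in for the shift registers described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restores samples in place. On entry data[0..order-1] holds the warm-up
 * samples; residual[i] is the residual for sample data[order + i].
 */
static void fixed_restore(const int32_t *residual, unsigned n_residual,
                          unsigned order, int32_t *data)
{
    assert(order >= 1 && order <= 4);

    for (unsigned i = 0; i < n_residual; i++) {
        const int32_t *d = &data[order + i];  /* d[-1] is the newest sample */
        int32_t pred = 0;

        switch (order) {
        case 1: pred = d[-1]; break;
        case 2: pred = 2 * d[-1] - d[-2]; break;
        case 3: pred = 3 * d[-1] - 3 * d[-2] + d[-3]; break;
        case 4: pred = 4 * d[-1] - 6 * d[-2] + 4 * d[-3] - d[-4]; break;
        default: break;
        }

        /* The restored sample is fed back through data[] on the next pass. */
        data[order + i] = residual[i] + pred;
    }
}
```

    All of the constant factors (2, 3, 4 and 6) decompose into shifts and additions, for
example 6x = (x << 2) + (x << 1), which is why no hardware multipliers are needed here.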
    The performance metrics for the fixed decoder prototype are summarized in Table 5.

                       Table 5: Fixed Decoder Performance Metrics

                              Metric    Prototype Value
                               Speed        100 MHz
                                Area        404 LEs
                             Throughput       1/2




2.2.1   Fixed Decoder State Machine

Based on the above analysis, the decoder can be in one of three different states. These three
states are described in Table 6 and depicted in Figure 7.

                     Table 6: Finite State Machine for Fixed Decoder

                 State                   Description
                  S0             Waiting for warm-up samples
                  S1             Waiting for residual samples
                  S2     Computing the appropriate fixed order algorithm


    States S0 and S1 are idle states in which the decoder is waiting for either a warm-
up sample or a residual sample. The distinction between the states is necessary since the
warm-up samples must be loaded into the shift registers, while the residual samples must
be loaded into a separate register and computed with the pre-loaded warm-up samples.
Therefore, depending on which of these two states the decoder is in, incoming data will be
placed into separate locations for future use. State S2 is the state in which the algorithm is
computed, since the decoder has all the necessary information (all warm-up samples and a
residual sample) it requires for this computation.




Figure 6: Fixed Decoder Data Flow Diagrams




[Figure 7 shows the states S0, S1 and S2. The only transition label legible in the diagram
is i_reset = '1', which appears twice.]

           Figure 7: Fixed Decoder State Diagram




2.3       Constant Decoder Design
Constant encoding is used when an entire block consists of the same value. This means
that all samples within the subframe have a constant value. As a result, the subframe contains
only one sample, which will be output n times, where n corresponds to the frame’s blocksize.
Since the subset specification requires frames to have a resolution of 16 bits per sample,
the sample carried by a constant subframe will be exactly 16 bits [3].
    Some C code is provided in Listing 3 for the constant decoder implementation.

                                     Listing 3: Constant Decoder Algorithm
      for (i = 0; i < frame.header.blocksize; i++) {
              output[i] = x;   // where x is a 16-bit sample
      }



    As discussed in Section 1.3, the constant decoder cannot output data at each clock cycle.
In order to avoid consecutive memory writes, the constant decoder will have a single cycle
of latency between consecutive outputs.


2.4       Verbatim Decoder Design
Verbatim signals have zero compression, and therefore a verbatim signal is the same as
the raw signal. The verbatim decoder needs only to output each of the signal’s samples.
Therefore the number of bits contained in a verbatim subframe will equal the frame’s bits
per sample multiplied by the corresponding frame’s blocksize. The subset specification limits
the size of an individual frame to no more than 4608 samples and to a resolution of 16 bits
per sample [3]. Therefore, the maximum size of a verbatim subframe is:

    4608\ \text{samples} \times \frac{16\ \text{bits}}{\text{sample}} = 73728\ \text{bits} = 9.216\ \text{kB}        (3)
   Some C code is provided in Listing 4 for the verbatim decoder implementation.

                                     Listing 4: Verbatim Decoder Algorithm
      for (i = 0; i < frame.header.blocksize; i++) {
              x = getNextSample();
              residual[i] = x;   // where x is a 16-bit sample
      }



    The decoder will read the samples, 16 bits at a time, for up to 4608 iterations. As
discussed in Section 1.3, the verbatim decoder cannot output data at each clock cycle. In
order to avoid consecutive memory writes, the verbatim decoder will have a single cycle of
latency between consecutive outputs.




3     System Controller Design
The system must have an intelligent controller that is responsible for configuration of pe-
ripherals, including the USB controller and the audio DAC. The controller will also be
coordinating the transfer of data from the filesystem on the USB mass storage device to the
decoder. The decoder will be acting as master of this relationship, meaning it will use a
flag to indicate to the system controller that it is ready for the next data. In the process of
controlling this data flow, the system controller will have software that will decode metadata
in the FLAC files for display on the DE2’s LCD display.


3.1    System Controller Hardware Selection and Design
The system controller will be implemented on the same Cyclone II FPGA as the decoder.
Altera’s Nios II soft-core processor is well suited for building the system controller, as its
hardware is configurable, so that interfaces to all the peripheral hardware will be customized
to suit the purpose of the system.
    The Nios II will need a 16-bit wide parallel output port to transfer data to the decoder.
It will also require a single-bit input pin with a rising-edge triggered interrupt so that the
decoder can indicate that it is ready for more data.
    In order to drive the LCD display, the Nios II will have a 16x2 character LCD display
controller built into it. This hardware interface is included in the SOPC Builder, the tool
used to configure the Nios II hardware. A software API is also included.
    The system controller must have four single-bit parallel input ports with interrupts en-
abled on the rising edge. These ports will be connected to the pushbuttons on the DE2
for playback control. One button will be used for play, one for stop, and one each for skip
forward and backward.
    In order to communicate with and configure the Wolfson WM8731L audio codec, the
Nios II processor must have an I2C master core attached to it. As outlined in Section 3.1.3.3
of [2], there exists an I2C core on OpenCores.org that is compatible with the Nios II. Software
to use the I2C core to communicate with the WM8731L has been written and is presented
in Section 3.2.3.
    The Nios II will also have an interface for the Philips ISP1362 USB controller chip. Altera
provides the hardware in a demo project that accompanies the DE2. Altera also provides
example software that is used to communicate with the ISP1362. This software will be
modified to work with the USB Mass Storage subtype.


    The Nios II requires system memory, so the soft-core will include an SDRAM controller
so that it can make use of the 8 MB of SDRAM on the board. This will allow it to buffer
data from the USB Mass Storage device to send to the decoder.
    The Nios II will be implemented without hardware multipliers, to preserve the multiplier
units on the Cyclone II for use by the LPC frame decoder. This will slow operation of
the Nios II somewhat, but the necessary software should not require many multiplication
operations.
    For testing and debugging purposes, the Nios II will have parallel input/output ports
connected to the LEDs on the DE2. These can be used when debugging to indicate that
operations are completing appropriately without sending a string to the console. This is done
because the JTAG UART that is used to send strings to the console consumes processor time
in doing so, and has been known to cause unpredictable behaviour regarding interrupts.


3.2     System Controller Software Design
The system controller must run software that manages the operation of the decoder, the
USB Mass Storage device, and the audio DAC. It must also decode the metadata contained
at the beginning of files for the purpose of displaying artist and track information on the
DE2’s LCD display.

3.2.1   Metadata Decoder Design

The metadata decoder is a software function responsible for decoding FLAC metadata
frames. These are frames that are not essential for audio playback as they do not con-
tain any encoded audio subframes. Instead, metadata frames contain information such as
artist name, song title, and album art. Table 7 lists all the different types of metadata blocks
that exist [3].
    For this implementation we will only be using decoded metadata information to display
the stream’s song title and artist information on the board’s LCD display. As such, the
VORBIS_COMMENT is the only metadata block type that we are interested in; the decoder will
skip all other blocks. There is only a single VORBIS_COMMENT block per audio stream, which
consists of name/value pairs encoded in UTF-8.
    The first metadata block is a STREAMINFO block which immediately follows the 32-bit
FLAC stream marker at the start of each new stream. This STREAMINFO block is skipped
because the relevant information for this implementation that is found in this block is also


                              Table 7: Metadata Block Types

                   Metadata Block Type             Number of Bits
                      STREAMINFO                          272
                        PADDING                   integer multiple of 8
                      APPLICATION              32 + integer multiple of 8
                       SEEKTABLE                integer multiple of 144
                    VORBIS_COMMENT                      variable
                       CUESHEET                         variable
                        PICTURE                         variable



found in all audio frame blocks.
    Zero or more metadata blocks follow the STREAMINFO block. The metadata decoder
will be required to identify each block, skip the appropriate number of bits for all
non-VORBIS_COMMENT blocks, extract the information from the VORBIS_COMMENT block and pass
this information on to the LCD controller. The number of bits found in each type of metadata
block is listed in Table 7.
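
    Identifying and skipping blocks can be sketched as follows. The sketch assumes the
metadata block header layout of the FLAC format: a 1-bit last-block flag, a 7-bit block type
and a 24-bit big-endian body length in bytes, with block type 4 denoting VORBIS_COMMENT.
The function name find_vorbis_comment and the in-memory buffer are hypothetical; the actual
controller would read from the USB mass storage device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FLAC_BLOCK_VORBIS_COMMENT = 4 };  /* block type code, FLAC format */

/*
 * Scans the metadata blocks in buf, which must start just after the
 * 32-bit stream marker. Returns the offset of the VORBIS_COMMENT body
 * and stores its length in bytes in *body_len. Returns -1 if no such
 * block exists or the data is truncated.
 */
static long find_vorbis_comment(const uint8_t *buf, size_t len,
                                size_t *body_len)
{
    size_t pos = 0;

    assert(buf != NULL && body_len != NULL);

    while (len - pos >= 4) {
        int      is_last = buf[pos] >> 7;      /* 1-bit last-block flag */
        unsigned type    = buf[pos] & 0x7Fu;   /* 7-bit block type */
        size_t   size    = ((size_t)buf[pos + 1] << 16) |
                           ((size_t)buf[pos + 2] << 8) |
                            (size_t)buf[pos + 3];

        pos += 4;
        if (size > len - pos)
            return -1;                         /* body runs past the buffer */
        if (type == FLAC_BLOCK_VORBIS_COMMENT) {
            *body_len = size;
            return (long)pos;
        }
        pos += size;                           /* skip an uninteresting block */
        if (is_last)
            break;
    }
    return -1;
}
```

    Because every header carries the body length, the decoder can skip any block without
knowing its internal layout.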
    A VORBIS_COMMENT block contains a list of user comments, each preceded by its length,
which is encoded as a 32-bit little-endian unsigned integer. The pseudocode for decoding this
block is given below as per the Ogg Vorbis I Specification [9].

  1. [vendor_length] = read an unsigned integer of 32 bits
  2. [vendor_string] = read a UTF-8 vector as [vendor_length] octets
  3. [user_comment_list_length] = read an unsigned integer of 32 bits
  4. iterate [user_comment_list_length] times {
       5. [length] = read an unsigned integer of 32 bits
       6. this iteration’s user comment = read a UTF-8 vector as [length] octets
     }
  7. [framing_bit] = read a single bit as boolean
  8. if ( [framing_bit] unset or end of packet ) then ERROR
  9. done

   Steps 1 and 2 do not provide any useful data for this implementation, as they only read
vendor information. In step 3, we determine the total number of user comments by reading
an unsigned 32-bit value. For each comment, we then read its length in octets, which is
represented as an unsigned 32-bit number (step 5), and then read the user comment itself
(step 6). We iterate n times, where n is the total number of user comments. Once we have
finished iterating and obtained all the user comments, we do a quick error check by
ensuring that the framing bit is set.
   The example below shows how user comments would appear in C [9]:

  comment[0]="ARTIST=me";
  comment[1]="TITLE=the sound of Vorbis";

   There are other standard fields that may be included within the block, however the above
two fields are the only ones of interest for this implementation.
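
    Picking out these two fields amounts to splitting each comment at the first '=' and
comparing the field name. A minimal sketch is given below; comment_value is a hypothetical
helper, the comment is assumed to be held in memory with an explicit length (it is not
NUL-terminated in the stream), and field names are compared without regard to case, as the
Vorbis comment specification requires.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * If comment (len bytes) has the form NAME=value and NAME matches field
 * case-insensitively, returns a pointer to the value and stores its
 * length in *value_len. Otherwise returns NULL.
 */
static const char *comment_value(const char *comment, size_t len,
                                 const char *field, size_t *value_len)
{
    size_t n = strlen(field);

    assert(comment != NULL && value_len != NULL);

    /* The comment must hold the whole name followed by '='. */
    if (len <= n || comment[n] != '=')
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (toupper((unsigned char)comment[i]) !=
            toupper((unsigned char)field[i]))
            return NULL;
    }

    *value_len = len - n - 1;
    return comment + n + 1;
}
```

    Calling comment_value on "ARTIST=me" with the field "artist" yields a pointer to "me"
and a length of 2, which can then be handed to the LCD controller.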
   Finally, the last task in decoding the metadata will be to decode the UTF-8 encoded user
comment and then pass it to the LCD controller. A C-style implementation of the metadata
decoding algorithm is presented in Listing 5.

                              Listing 5: C-Style Metadata Decoder Implementation
  vendor_length = read_unsigned_int32();
  vendor_string = read_UTF8_vector(vendor_length * 8);   /* read vendor_length # of octets */
  user_comment_list_length = read_unsigned_int32();
  for (i = 0; i < user_comment_list_length; i++) {
      length = read_unsigned_int32();
      user_comment[i] = read_UTF8_vector(length * 8);    /* read length # of octets */
  }
  framing_bit = (BOOL) read_bit();
  if (framing_bit == 0 || end_of_packet())
      return ERROR;
  return DONE;




3.2.2       LCD Controller Design

The LCD Controller functional block allows the reference decoder implementation to in-
terface with the on board Hitachi HD44780 LCD display [10]. The control logic will be
implemented in software on the system controller, and will prepare character data for the
screen.
    The VHDL interface to the LCD panel is presented in Listing 6.
    Altera provides a simple API for writing character arrays to the display, in addition to
such tasks as clearing the display and setting the cursor position. This API is described
in greater detail in [11]. Primary functionality for the menu system and interface will be
implemented using the provided API.
    To enable the display of artist information, a pointer to the current memory location of
the decoded metadata will be passed to the LCD controller from the metadata decoder.

                              Listing 6: Hitachi HD44780 LCD Interface
-- LCD Module (16 x 2) ------------------------------------------------------

LCD_DATA : INOUT std_logic_vector(7 DOWNTO 0);  -- LCD Data Bus 8 Bits
LCD_ON   : OUT std_logic;                       -- LCD Power On / Off
LCD_BLON : OUT std_logic;                       -- LCD Back Light On / Off
LCD_RW   : OUT std_logic;                       -- LCD Read / Write Select, 0 = Write, 1 = Read
LCD_EN   : OUT std_logic;                       -- LCD Enable
LCD_RS   : OUT std_logic;                       -- LCD Command / Data Select, 0 = Command, 1 = Data




This simple form of message passing via function call will be sufficient for the needs of the
system reference implementation.

3.2.3   DAC Controller Design

The Nios II processor will be responsible for configuring the Wolfson WM8731L audio chip
with settings appropriate to the audio stream being played. The WM8731L is designed to
communicate via the I2C bus for configuration. Therefore, it is necessary to include an I2C
core on the Nios II processor’s bus. OpenCores.org provides an I2C master core that is freely
available under the GPL [12].
    The I2C core must be configured with the bus frequency and then enabled. To do this,
two registers on the core must be written: the period register and the control register. The
period register contains a prescale that is used to divide the system clock to achieve the I2C
internal clock, which is five times the I2C bus clock. The desired bus clock is 100 kHz, and
the system clock is 50 MHz. The prescale is the system clock divided by five times the bus
clock, minus 1. Therefore,

    \text{prescale} = \frac{50\ \text{MHz}}{5 \times 100\ \text{kHz}} - 1 = 99 = \text{0x63}        (4)
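
    The same calculation can be expressed as a small helper so that other clock choices are
easy to check. This is a sketch only; i2c_prescale is a hypothetical name and the 16-bit
return type is an assumption about the width of the period register.

```c
#include <assert.h>
#include <stdint.h>

/* prescale = sys_clk / (5 * bus_clk) - 1, as in Equation (4). */
static uint16_t i2c_prescale(uint32_t sys_clk_hz, uint32_t bus_clk_hz)
{
    /* The system clock must be at least five times the bus clock. */
    assert(bus_clk_hz > 0 && sys_clk_hz >= 5u * bus_clk_hz);
    return (uint16_t)(sys_clk_hz / (5u * bus_clk_hz) - 1u);
}
```

    For the 50 MHz system clock and 100 kHz bus clock used here, i2c_prescale(50000000,
100000) evaluates to 99 (0x63), matching Equation (4).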
The I2C core’s control register must have its master and interrupt enable bits high, and the
reserved bit must be set low. A listing of sample C code that will perform this function on
the Nios II is found in Listing 7.
    Once the I2C master has been initialized, it can communicate with the audio DAC. There
are several registers that need to be configured in order for the DAC to properly play the
PCM audio output that the decoder will generate. These pertain to data synchronization,
sample rates and bit rates, and power management. A summary of these registers and their
required values is found in Table 8. These values are for the decoder’s nominal operating
conditions of 16 bits/sample at a sampling rate of 44.1 kHz, and were obtained from the
Wolfson datasheet [13]. These settings are the same as those for CD quality audio. Of the
registers described, the two most important registers are the Digital Audio Interface Format
Register (R7) and the Sampling Control Register (R8).
    The Digital Audio Interface Format Register controls the audio input synchronization
format, the sample precision, and the input clock phasing. It is here that we must select the
16 bits per sample setting by writing ‘00’ to bits [3:2] of the register.
    The Sampling Control Register selects the mode in which the input clock operates, as
well as the sampling rate and the clock division used to generate the correct sample clocks.
Two options exist for the clock input: the WM8731L may operate in USB Mode or Normal
Mode. USB Mode is used when it is necessary to switch sample rates without changing the
master clock frequency; the master clock must run at 12.0 MHz in this mode. Normal
Mode is used when only one sample rate is needed, and the required input clock frequency
depends on that sample rate. Since sampling rates may change between files,
we intend to have the DAC controller configure the DAC in USB Mode. A listing of C code
used to configure the WM8731L can be found in Listing 7.

Table 8: Register configuration data for the WM8731L audio CODEC (X denotes a reserved
bit).
         Register                          Address (Binary) Data (Binary)
         Left Line In                          0000000        01XX10111
         Right Line In                         0000001        01XX10111
         Left Headphone Out                    0000010        001111001
         Right Headphone Out                   0000011        001111001
         Analog Audio Path Control             0000100        X00010010
         Digital Audio Path Control            0000101        XXXX00000
         Power Down Control                    0000110        X00000000
         Digital Audio Interface Format        0000111        X00000001
         Sampling Control                      0001000        X00100011
         Active Control                        0001001        XXXXXXXX1
         Reset                                 0001111        XXXXXXXXX
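
    Each row of Table 8 is sent over the control interface as a 16-bit word: the 7-bit register
address occupies the upper bits and the 9-bit data value the lower bits. The sketch below shows
how a row maps onto the two bytes that follow the device address. The function name
wm8731_pack is hypothetical, and reserved (X) bits are assumed to be written as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one Table 8 row into the two bytes sent after the device address.
 *   out[0] = address[6:0] followed by data[8]
 *   out[1] = data[7:0]
 * wm8731_pack is an illustrative name, not part of any vendor driver. */
void wm8731_pack(uint8_t reg_addr, uint16_t data, uint8_t out[2])
{
    assert(reg_addr <= 0x7F);   /* 7-bit register address */
    assert(data <= 0x1FF);      /* 9-bit register data    */

    out[0] = (uint8_t)((reg_addr << 1) | ((data >> 8) & 0x01));
    out[1] = (uint8_t)(data & 0xFF);
}
```

For example, the Analog Audio Path Control row (address 0000100, data X00010010) packs to
the byte pair 0x08, 0x12, which is the pair written for that register in Listing 7.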



3.2.4   USB Controller Design

The system controller must interface with the Philips ISP1362 USB controller device. It
must configure the ISP1362 to function as a USB Host, and it must be designed to transfer
data to and from the ISP1362 for sending to a USB Mass Storage Device that is connected
as a USB Slave on the ISP1362’s host port.


          Listing 7: C code for configuring the I2C Core and WM8731L Audio CODEC
#include <stdio.h>

/* Program the prescale (period) register, then enable the core and its interrupt. */
void I2C_Init(unsigned int Period)
{
  I2C_Ctrl_Reg a;
  a.Value = 0;                          /* start from a known register image */
  I2C_Write_Period(Period);
  a.I2C_Ctrl_Flags.CORE_ENABLE = 1;
  a.I2C_Ctrl_Flags.INT_ENABLE = 1;
  a.I2C_Ctrl_Flags.RESERVED = 0;
  I2C_Write_Ctrl(a.Value);
}

void init_audio_codec(void)
{
  /* 10000000 = 50 MHz system clock / 5, so this evaluates Equation 4 */
  I2C_Init(10000000 / I2C_FREQ - 1);

  // Check Audio CODEC on I2C Bus, Address = 0x34
  if (I2C_Send(0x34, 1, 0))
  {
    int count = 0;
    printf("\nFind Audio CODEC on I2C Bus, Address = 0x34.\n");
    // Each register write is two bytes: (address << 1 | data bit 8), then data bits 7..0
    count += I2C_Send(0x08, 0, 0);  // Analog Audio Path Control MSB
    count += I2C_Send(0x12, 0, 1);  // Analog Audio Path Control LSB
    I2C_Send(0x34, 1, 0);
    count += I2C_Send(0x0A, 0, 0);  // Digital Audio Path Control MSB
    count += I2C_Send(0x00, 0, 1);  // Digital Audio Path Control LSB
    I2C_Send(0x34, 1, 0);
    count += I2C_Send(0x0C, 0, 0);  // Power Down Control MSB
    count += I2C_Send(0x00, 0, 1);  // Power Down Control LSB
    I2C_Send(0x34, 1, 0);
    count += I2C_Send(0x0E, 0, 0);  // Digital Audio Interface Format MSB
    count += I2C_Send(0x01, 0, 1);  // Digital Audio Interface Format LSB (value from Table 8)
    I2C_Send(0x34, 1, 0);
    count += I2C_Send(0x10, 0, 0);  // Sampling Control Register MSB
    count += I2C_Send(0x23, 0, 1);  // Sampling Control Register LSB (USB Mode, value from Table 8)
    I2C_Send(0x34, 1, 0);
    count += I2C_Send(0x12, 0, 0);  // Active Control Register MSB
    count += I2C_Send(0x01, 0, 1);  // Active Control Register LSB
  }
  else
    printf("\nCan't Find Audio CODEC on I2C Bus.\n");
}




    In order to receive data from the USB Mass Storage Device, the system controller must
issue SCSI commands requesting the data as necessary. These commands must be packaged
within Command Block Wrappers (CBW) [14], which are then placed in Philips Transfer
Descriptors (PTD) [15]. This packaging of data lends itself to a layered protocol stack
architecture.
    The PTDs are written to the ATL buffer on the ISP1362, and the ISP1362 is notified
that a payload is ready to be sent. The ISP1362 then processes the PTDs and sends the
CBW to the appropriate device [15]. When the Mass Storage device receives the data request
command, it initiates a bulk transfer containing the data [14]. When the ISP1362 receives
this bulk transfer, it notifies the system controller and buffers the data in the ATL buffer.


The system controller then reads the data packet from the ISP1362’s ATL buffer using
indirect addressing (relative to the beginning of the ATL buffer). This data is the file system
data that was requested with the SCSI command.
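
    The innermost layers of this stack can be sketched in C. The example below builds a
31-byte Command Block Wrapper that carries a SCSI READ(10) command, following the field
layout of the USB Mass Storage Bulk-Only Transport described in [14]. The function and helper
names are hypothetical, logical unit 0 is assumed, and wrapping the CBW in a Philips Transfer
Descriptor is deliberately left out because that layout is specific to the ISP1362 [15].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_LENGTH 31u

/* Store a 32-bit value least-significant byte first (CBW header fields). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Build a CBW carrying READ(10) for `blocks` logical blocks starting at `lba`.
 * build_read10_cbw is an illustrative name; LUN 0 is assumed. */
void build_read10_cbw(uint8_t cbw[CBW_LENGTH], uint32_t tag,
                      uint32_t lba, uint16_t blocks, uint32_t block_size)
{
    assert(blocks > 0 && block_size > 0);

    memset(cbw, 0, CBW_LENGTH);
    put_le32(&cbw[0], 0x43425355u);                 /* dCBWSignature, "USBC"   */
    put_le32(&cbw[4], tag);                         /* dCBWTag, echoed in CSW  */
    put_le32(&cbw[8], (uint32_t)blocks * block_size); /* bytes expected back   */
    cbw[12] = 0x80;                                 /* bmCBWFlags: device to host */
    cbw[13] = 0x00;                                 /* bCBWLUN                 */
    cbw[14] = 10;                                   /* bCBWCBLength            */

    /* SCSI READ(10) command block; multi-byte fields are big-endian. */
    cbw[15] = 0x28;                                 /* operation code          */
    cbw[17] = (uint8_t)(lba >> 24);
    cbw[18] = (uint8_t)(lba >> 16);
    cbw[19] = (uint8_t)(lba >> 8);
    cbw[20] = (uint8_t)lba;
    cbw[22] = (uint8_t)(blocks >> 8);               /* transfer length, blocks */
    cbw[23] = (uint8_t)blocks;
}
```

The system controller would then place the resulting 31 bytes in a PTD payload and write the
PTD to the ATL buffer, as described above.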
    The system controller must also include software for reading the FAT32 file system,
which is the file system used on larger (1 GB or more) USB Mass Storage devices. The
controller's software will read the file allocation table and start with the first file on the
drive. When the play button is pressed on the DE2, the first file will be read into the Nios
II's system memory, block by block, and sent to the decoder as it is needed.
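
    Reading a file block by block amounts to walking its cluster chain in the file allocation
table. The following sketch illustrates the two calculations involved, under the assumption
that the FAT has already been read into memory as an array of 32-bit entries in host byte
order. The structure and function names are hypothetical, and sector numbers are relative to
the start of the FAT32 volume.

```c
#include <assert.h>
#include <stdint.h>

#define FAT32_ENTRY_MASK 0x0FFFFFFFu  /* upper 4 bits of an entry are reserved */
#define FAT32_EOC_MIN    0x0FFFFFF8u  /* values at or above this end a chain   */

/* Volume geometry taken from the boot sector (illustrative field names). */
typedef struct {
    uint32_t reserved_sectors;
    uint32_t num_fats;
    uint32_t sectors_per_fat;
    uint32_t sectors_per_cluster;
} fat32_geometry;

/* First sector of a data cluster; cluster numbering starts at 2. */
uint32_t fat32_cluster_to_sector(const fat32_geometry *g, uint32_t cluster)
{
    assert(cluster >= 2);
    uint32_t data_start = g->reserved_sectors + g->num_fats * g->sectors_per_fat;
    return data_start + (cluster - 2u) * g->sectors_per_cluster;
}

/* Number of clusters in the chain that begins at first_cluster.
 * The count is bounded by fat_entries so a corrupt, looping chain terminates. */
uint32_t fat32_chain_length(const uint32_t *fat, uint32_t fat_entries,
                            uint32_t first_cluster)
{
    uint32_t count = 0;
    uint32_t cluster = first_cluster;

    while (cluster >= 2 && cluster < fat_entries && count < fat_entries) {
        count++;
        uint32_t next = fat[cluster] & FAT32_ENTRY_MASK;
        if (next >= FAT32_EOC_MIN) {
            break;              /* end-of-chain marker */
        }
        cluster = next;
    }
    return count;
}
```

In the player, each step of such a walk would correspond to one read request issued to the
USB Mass Storage device for the sectors of the current cluster.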
    The system controller will buffer one logical block ahead of the block being decoded,
so that it can always provide the decoder with data on request. This is acceptable
because the SDRAM is 8 MB in size and this buffering requires only 1 MB, leaving 7 MB
for system variables and file system information.
    The skip forward and backward buttons will fire interrupts that cause the decoder to
discontinue decoding the current file and start the next or previous one in the file table.




Part III

Work Breakdown
This section outlines the core responsibilities of each group member, as described in Table 9.

                              Table 9: Group Responsibilities

                     Component and Duties           Group Member(s)
                      Stream Synchronization         Colin Lancaster
                           Stream Parsing              Mark Eaves
                          Memory Control              Jason Shirtliff
                 Inverse Interchannel Decorrelation    Cole Stewart
                            Rice Decoder              Jason Shirtliff
                            LPC Decoder                Cole Stewart
                           Fixed Decoder               Mark Eaves
                         Constant Decoder             Jason Shirtliff
                         Verbatim Decoder            Colin Lancaster
                          USB File Reader             Jason Shirtliff
                         Metadata Decoder              Mark Eaves
                           LCD Controller              Cole Stewart
                           DAC Controller            Colin Lancaster




Part IV
Appendices
A          LPC Decoder

                                         Listing 8: LPC Filter RTL Implementation
-------------------------------------------------------------------------------
-- Title      : LPC
-- Project    :
-------------------------------------------------------------------------------
-- File       : lpc.vhd
-- Author     : Cole Matthew Evan CME Stewart <cmestewa@skynet.uwaterloo.ca>
-- Company    :
-- Created    : 2007-04-30
-- Last update: 2007/05/25
-- Platform   :
-- Standard   : VHDL'87
-------------------------------------------------------------------------------
-- Description: LPC behavioral description
-------------------------------------------------------------------------------
-- Copyright (c) 2007
-------------------------------------------------------------------------------
-- Revisions  :
-- Date        Version  Author    Description
-- 2007-04-30  1.0      cmestewa  Created
-------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lpc_mult is

  port (
    i_a      : in  std_logic_vector(15 downto 0);   -- data
    i_b      : in  std_logic_vector(13 downto 0);   -- coeff
    o_result : out std_logic_vector(29 downto 0));  -- output

end lpc_mult;

architecture main of lpc_mult is
  signal result : signed(29 downto 0);

begin  -- main
  result   <= signed(i_a) * signed(i_b);
  o_result <= std_logic_vector(result);
end main;




library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lpc is

  port (
    i_data, i_residual : in  signed(15 downto 0);    -- data input
    i_coeff            : in  signed(13 downto 0);
    i_clock, i_reset   : in  std_logic;              -- clock and reset signals
    i_valid            : in  std_logic;              -- control for SR's
    i_lpc_shift        : in  signed(4 downto 0);
    i_order            : in  unsigned(3 downto 0);
    o_data             : out signed(15 downto 0));
end lpc;

architecture main of lpc is
  component mult18
    port (
      i_a, i_b : in  signed(17 downto 0);
      o_result : out signed(35 downto 0));
  end component;

  type array16 is array (11 downto 0) of signed(15 downto 0);
  type array14 is array (11 downto 0) of signed(13 downto 0);
  type array30 is array (11 downto 0) of std_logic_vector(29 downto 0);

  signal x        : array16;
  signal coeff    : array14;
  signal mult_out : array30;
  signal out_data : signed(15 downto 0);
  signal r00, r01, r02, r03, r04, r05 : signed(29 downto 0);  -- first tree level
  signal r10, r11, r12 : signed(29 downto 0);                 -- 2nd tree level
  signal r20, sum : signed(29 downto 0);                      -- 3rd tree level
  signal tmp_out : signed(29 downto 0);

  signal state     : std_logic_vector(3 downto 0) := "0000";
  signal order     : unsigned(3 downto 0);
  signal lpc_shift : signed(4 downto 0);

  component mult
    port (
      dataa  : in  std_logic_vector(15 downto 0);
      datab  : in  std_logic_vector(13 downto 0);
      result : out std_logic_vector(29 downto 0));
  end component;
  for all : mult
    use entity work.mult(rtl);

-- lpc_mult does not require cyclone ii specific parts
-- It is needed for some simulation tests.
-- component lpc_mult
--   port (
--     i_a      : in  std_logic_vector(15 downto 0);
--     i_b      : in  std_logic_vector(13 downto 0);
--     o_result : out std_logic_vector(29 downto 0));
-- end component;

begin  -- main


  -- purpose: control decoder state
  -- type   : sequential (clocked on i_clock)
  -- inputs : i_clock
  -- outputs:
  state_machine : process (i_clock)
  begin  -- process state_machine
    if rising_edge(i_clock) then
      if i_reset = '1' then
        state <= "0000";
        for i in 11 downto 0 loop
          x(i)     <= "0000000000000000";
          coeff(i) <= "00000000000000";


        end loop;  -- i
      else
        -- reset takes priority: the state machine only advances while i_reset is low
        case state is
          when "0000" =>
            -- idle: latch the order, shift and the first data/coefficient pair
            if i_valid = '1' then
              state     <= "0001";
              order     <= i_order;
              x(0)      <= i_data;
              coeff(0)  <= i_coeff;
              lpc_shift <= i_lpc_shift;
              --order <= (order - 1);
              out_data  <= i_data;
            end if;
          when "0001" =>
            -- load the remaining warm-up samples and coefficients
            if (i_valid = '1') then
              if (order > 0) then
                for i in 11 downto 1 loop
                  x(i)     <= x(i-1);
                  coeff(i) <= coeff(i-1);
                end loop;  -- i
                x(0)     <= i_data;
                coeff(0) <= i_coeff;
                order    <= order - 1;
                out_data <= i_data;
                if (order = 1) then
                  state <= "0010";
                end if;
              end if;
            end if;
          when "0010" =>
            -- decode: feed each restored sample back into the shift register
            if i_valid = '1' then
              for i in 11 downto 1 loop
                x(i) <= x(i-1);
              end loop;  -- i
              x(0)     <= tmp_out(15 downto 0);
              out_data <= tmp_out(15 downto 0);
            end if;
          when others => null;
        end case;
      end if;
    end if;
  end process state_machine;


-- Multipliers implemented as LE's
-- Needed to perform some simulations
--
-- mults : for i in 0 to 11 generate
--   mult_i : lpc_mult
--     port map (
--       i_a      => std_logic_vector(x(i)),
--       i_b      => std_logic_vector(coeff(i)),
--       o_result => mult_out(i));
-- end generate mults;

  -- Multipliers using cyclone ii library
  mults : for i in 0 to 11 generate
    mult_i : mult
      port map (
        dataa  => std_logic_vector(x(i)),
        datab  => std_logic_vector(coeff(i)),
        result => mult_out(i));
  end generate mults;




  -- Adder Tree
  -- First Level
  r00 <= signed(mult_out(0))  + signed(mult_out(1));
  r01 <= signed(mult_out(2))  + signed(mult_out(3));
  r02 <= signed(mult_out(4))  + signed(mult_out(5));
  r03 <= signed(mult_out(6))  + signed(mult_out(7));
  r04 <= signed(mult_out(8))  + signed(mult_out(9));
  r05 <= signed(mult_out(10)) + signed(mult_out(11));

  -- Second level
  r10 <= r00 + r01;
  r11 <= r02 + r03;
  r12 <= r04 + r05;

  -- Third Level
  r20 <= r10 + r11;
  sum <= r12 + r20;

  -- shift_right on a signed value is an arithmetic shift, so a negative
  -- prediction keeps its sign (srl would shift in zeros)
  tmp_out <= i_residual + shift_right(sum, to_integer(lpc_shift));
  o_data  <= out_data;


end main;




B         Fixed Decoder

                                      Listing 9: Fixed Decoder RTL Implementation
--------------------------------------------------------------------------------------------
-- File        : fixed.vhd
-- Description : Part of the frame decoder that decodes the FIXED-type subframes

library ieee;
use ieee.std_logic_1164.all;

package state_pkg is
  subtype state_ty is std_logic_vector(1 downto 0);
  constant s0 : state_ty := "00";  -- idle state: load warm-ups as they come in (i_valid)
  constant s1 : state_ty := "01";  -- load residuals as they come in (i_valid)
  constant s2 : state_ty := "10";  -- compute algorithm
end state_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.state_pkg.all;

entity fixed is
  port (
    i_data, i_residual        : in  signed(15 downto 0);
    i_order                   : in  unsigned(2 downto 0);
    i_clock, i_reset, i_valid : in  std_logic;
    o_data                    : out signed(15 downto 0)
  );
end fixed;

architecture main of fixed is
  type array4 is array (3 downto 0) of signed(15 downto 0);
  signal x            : array4;
  signal bShiftEnable : std_logic;             -- if 1, shift output into shift registers
  signal output_temp  : signed(15 downto 0);
  signal state        : state_ty;
  signal counter      : unsigned(1 downto 0);  -- counter only goes from 0-3
  signal order        : unsigned(2 downto 0);  -- register for the predictor order
  signal residual     : signed(15 downto 0);   -- register for the current residual signal


begin  -- main

  -- purpose: switch between fixed decoder states
  -- inputs : i_clock
  -- outputs: state
  state_machine : process
  begin  -- process state_machine
    wait until rising_edge(i_clock);
    if i_reset = '1' then
      state <= s0;  -- reset to state s0

      -- reset shift registers
      for i in 0 to 3 loop
        x(i) <= "0000000000000000";
      end loop;  -- i

      -- reset important registers
      counter      <= "00";
      bShiftEnable <= '0';
    else
      case state is
        when s0 =>
          if i_valid = '1' then
            -- read predictor order from port
            order <= i_order;

            -- load the warm-up sample
            for i in 3 downto 1 loop
              x(i) <= x(i-1);
            end loop;  -- i
            x(0) <= i_data;

            if i_order = (counter + 1) then
              state <= s1;  -- we have loaded all warm-up samples
            end if;
            -- increment counter
            counter <= counter + 1;
          end if;
        when s1 =>
          if bShiftEnable = '1' then
            -- Shift the output into the shift registers
            for i in 3 downto 1 loop
              x(i) <= x(i-1);
            end loop;  -- i
            x(0)         <= output_temp;
            bShiftEnable <= '0';
          end if;
          if i_valid = '1' then
            state <= s2;
          end if;
        when s2 =>
          -- compute restore algorithm
          case order is
            when "000" =>  -- order 0
              output_temp <= residual;
            when "001" =>  -- order 1: x(0)
              output_temp <= residual + x(0);
            when "010" =>  -- order 2: 2*x(0) - x(1)
              output_temp <= residual + ((x(0) sll 1) - x(1));
            when "011" =>  -- order 3: 3*x(0) - 3*x(1) + x(2), so the difference is shifted by 1
              output_temp <= residual + (((x(0) - x(1)) sll 1) + (x(0) - x(1))) + x(2);
            when "100" =>  -- order 4: 4*x(0) - 6*x(1) + 4*x(2) - x(3)
              output_temp <= residual + ((x(0) + x(2)) sll 2) - ((x(1) sll 2) + (x(1) sll 1)) - x(3);
            when others =>
              null;
          end case;  -- order
          bShiftEnable <= '1';  -- shift output into shift register next cycle
          state        <= s1;
        when others =>
          null;
      end case;  -- state
    end if;
  end process state_machine;

  -- purpose: read in the residual signal
  -- inputs : i_residual
  -- outputs: residual
  read_residual : process
  begin
    wait until rising_edge(i_clock);
    if (state = s1 or state = s2) then
      if i_valid = '1' then
        residual <= i_residual;
      end if;
    end if;
  end process;

  -- purpose: feed the output back into the input
  -- inputs : output_temp
  -- outputs: o_data, data_in
  feedback : process (i_clock)
  begin  -- process feedback
    if rising_edge(i_clock) then
      o_data <= output_temp;
    end if;
  end process feedback;

end main;
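
    When verifying this block in simulation, a small software reference model is useful for
generating expected outputs. The C sketch below is one such model, written under stated
assumptions: it uses the fixed-predictor polynomials defined by the FLAC format [3], the
function name fixed_restore is hypothetical, and it works in 32-bit arithmetic rather than
reproducing the 16-bit wrap-around or the cycle timing of the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Restore one sample from a FIXED subframe residual.
 * history[0] is the most recent restored sample, history[3] the oldest;
 * only the first `order` entries are read. Illustrative name and interface. */
int32_t fixed_restore(unsigned order, int32_t residual, const int32_t history[4])
{
    assert(order <= 4);

    switch (order) {
    case 0:
        return residual;
    case 1:
        return residual + history[0];
    case 2:
        return residual + 2 * history[0] - history[1];
    case 3:
        return residual + 3 * history[0] - 3 * history[1] + history[2];
    default: /* order 4 */
        return residual + 4 * history[0] - 6 * history[1]
                        + 4 * history[2] - history[3];
    }
}
```

A testbench could apply the same warm-up samples and residuals to this function and to the
fixed entity, then compare the two output streams sample by sample.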




C          Rice Decoder

                                      Listing 10: Rice Decoder RTL Implementation
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rice_decode is

  port (
    i_bit        : in  std_logic;
    i_valid      : in  std_logic;
    i_logm       : in  unsigned(3 downto 0);
    i_logm_valid : in  std_logic;
    i_reset      : in  std_logic;
    i_clock      : in  std_logic;
    o_number     : out unsigned(15 downto 0);
    o_valid      : out std_logic);

end rice_decode;

architecture func of rice_decode is

  signal state      : unsigned(1 downto 0);
  signal count      : unsigned(3 downto 0);
  signal done_r     : std_logic;
  signal q          : unsigned(7 downto 0);
  signal q_hold     : unsigned(15 downto 0);
  signal r          : unsigned(7 downto 0);
  signal rice_param : unsigned(3 downto 0);
  signal product    : unsigned(15 downto 0);

begin  -- func

  -- latch the Rice parameter when i_logm_valid is asserted
  param : process (i_clock)
  begin  -- process param
    if rising_edge(i_clock) then
      if i_reset = '1' then
        rice_param <= "0000";
      elsif i_logm_valid = '1' then
        rice_param <= i_logm;
      end if;
    end if;
  end process param;

  state_machine : process (i_clock)
  begin  -- process state_machine
    if rising_edge(i_clock) then
      if i_reset = '1' or (count = rice_param and (i_valid = '0' or i_bit = '0')) then
        state <= "00";
      elsif i_valid = '1' and (state = "00" or (count = rice_param and i_bit = '1')) then
        state <= "01";
      elsif i_valid = '1' and state(0) = '1' and i_bit = '1' then
        state <= "10";
      end if;
    end if;
  end process state_machine;

  inputs : process (i_clock)
  begin  -- process inputs
    if rising_edge(i_clock) then
      if i_reset = '1' then
        q <= "00000000";
        r <= "00000000";
      elsif i_valid = '1' then
        if i_bit = '0' and state(1) = '0' then
          q <= q + 1;
        elsif state(1) = '1' then
          r <= r(6 downto 0) & i_bit;
        end if;
        if count = rice_param then
          q_hold <= "00000000" & q;
          q      <= "00000000";
        end if;
        if done_r = '1' then
          r <= "00000000";
        end if;
      end if;
    end if;
  end process inputs;

  counter : process (i_clock)
  begin  -- process counter
    if rising_edge(i_clock) then
      if i_reset = '1' then
        count <= "0001";
      elsif i_valid = '1' and state(1) = '1' then
        if count = rice_param then
          count <= "0001";
        else
          count <= count + 1;
        end if;
      end if;
    end if;
  end process counter;

  done : process (i_clock)
  begin  -- process done
    if rising_edge(i_clock) then
      if i_reset = '1' then
        done_r <= '0';
      elsif count = rice_param then
        done_r <= '1';
      else
        done_r <= '0';
      end if;
    end if;
  end process done;

  product  <= (q_hold sll to_integer(rice_param));
  o_number <= (product + r);
  o_valid  <= done_r;

end func;
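
    As with the other decoder blocks, a software model of the Rice code is helpful when
checking simulation results. The C sketch below decodes one codeword the way the listing
appears to: it counts '0' bits up to a terminating '1' to form the quotient q, reads k
remainder bits most significant bit first, and returns q * 2^k + r. The bit_reader type and
function name are hypothetical, bits are supplied one per array element for clarity, and the
internal state encoding and timing of the RTL are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit source for the model: one bit (0 or 1) per array element. */
typedef struct {
    const uint8_t *bits;
    size_t         len;
    size_t         pos;
} bit_reader;

/* Decode one Rice codeword with parameter k.
 * Returns 1 and stores the unsigned value on success,
 * or 0 if the input ends before the codeword is complete. */
int rice_decode(bit_reader *br, unsigned k, uint32_t *value)
{
    assert(k <= 15);
    uint32_t q = 0;
    uint32_t r = 0;

    /* Unary part: count zeros until the terminating one. */
    for (;;) {
        if (br->pos >= br->len) {
            return 0;
        }
        if (br->bits[br->pos++] != 0) {
            break;
        }
        q++;
    }

    /* Binary part: k remainder bits, most significant first. */
    for (unsigned i = 0; i < k; i++) {
        if (br->pos >= br->len) {
            return 0;
        }
        r = (r << 1) | (uint32_t)(br->bits[br->pos++] & 1u);
    }

    *value = (q << k) + r;
    return 1;
}
```

For example, with k = 3 the bit sequence 0 0 1 1 0 1 gives q = 2 and r = 5, so the decoded
value is 2 * 8 + 5 = 21.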




References
 [1] J. Coalson, “libFLAC Application Programmer Interface,” Xiph.Org Foundation.
     [Online]. Available:
     http://flac.sourceforge.net/api/group flac stream decoder.html#ga31

 [2] M. Eaves, C. Lancaster, J. Shirtliff, and C. Stewart, “H-QuAD: A lossless high-quality
     audio decoder - block verification,” Univ. of Waterloo, Tech. Rep., May 2007.

 [3] J. Coalson, FLAC format, Xiph.Org Foundation Std. [Online]. Available:
     http://flac.sourceforge.net/format.html

 [4] DE2 Development and Education Board User Manual, Altera Corporation, 2006, ver.
     1.4.

 [5] IS61LV25616 256K x 16 High Speed Asynchronous CMOS Static RAM with 3.3V Supply,
     Integrated Silicon Solution, Inc., November 2001, rev. C.

 [6] J. Coalson, FLAC format- Interchannel Decorrelation, Xiph.Org Foundation Std.
     [Online]. Available: http://flac.sourceforge.net/format.html#interchannel

 [7] ——, FLAC format- Residual Coding, Xiph.Org Foundation Std. [Online]. Available:
     http://flac.sourceforge.net/format.html#residualcoding

 [8] S. Golomb, “Run-length encodings,” IEEE Trans. Inf. Theory, vol. 12, no. 3,
     pp. 399–401, Jul. 1966.

 [9] Ogg Vorbis I format specification: comment field and header specification, Xiph.Org
     Foundation Std. [Online]. Available: http://xiph.org/vorbis/doc/v-comment.html

[10] HD44780U Dot Matrix Liquid Crystal Display Controller/Driver, Hitachi, Ltd., 1998.

[11] Character LCD Core for Altera DE2 Board, Altera Corporation, October 2006.

[12] R. Herveille, I2C-Master Core Specification, OpenCores, July 2003, rev. 0.9.

[13] WM8731/WM8731L Portable Internet Audio CODEC with Headphone Driver and
     Programmable Sample Rates, Wolfson Microelectronics, April 2004, rev. 3.4.

[14] J. Axelson, USB Mass Storage: Designing and Programming Devices and Embedded
     Hosts. Lakeview Research LLC, 2006.

[15] ISP1362 Embedded Programming Guide, Philips, July 2002, rev. 0.9.



