					                                               cs260
                                            project two
                                        pitch identification


                                  Revision Date: September 16, 2009


Your assignment is to get a start on designing and building software for turning an off-key singer into the
next singing sensation!


Introduction
Pitch modification plays an important role in the modern music industry. Subtle changes in pitch can vastly
improve the recorded human voice. For an example of this kind of pitch modification, and voice enhancement
in general, see episode 14 of season 12 of The Simpsons (Yvan eht nioj!). Another use of pitch modification
is to generate new notes (higher or lower) from existing recorded notes. This is useful in synthesizing musical
instruments. Given a small set of notes, the full range of the instrument can be generated. However, before
one can modify the pitch of a musical note, one needs to determine exactly what that pitch is.
Your first program will read in a musical note stored as an RRA file and then determine the pitch of the
note.


The RRA format
RRA stands for readily readable audio. RRA files are composed of a header section and an amplitude section.
The header section starts with the token:

    RRAUDIO

and ends with the token:

    %%

The header section contains information about the audio file such as the number of samples, the number of
bits per sample, and the number of data channels. For this exercise, we will assume default values, so you
can ignore the header.
Here’s how to get an RRA file that has a pure 440 Hz tone:

    wget beastie.cs.ua.edu/cs260/projects/a4.rra

If you look at the file, you will see some header information, then the token %%, then a series of numbers.
These numbers represent samplings of the amplitude of the audio waveform at successive points in time.
Your program will read these values into a dynamic array and then use the Zero Crossing method for
determining the pitch of the note.
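
As a rough illustration of the reading step, the sketch below skips everything up to and including the %% token and collects the numbers that follow. It assumes the amplitude values are whitespace-separated integers on a single channel, and it uses a plain Python list as a stand-in for the dynamic array that the assignment requires. The function name read_rra_samples is hypothetical.

```python
def read_rra_samples(lines):
    """Return the amplitude values that follow the %% token as a list of ints.

    `lines` is any iterable of text lines, such as an open file.
    Assumption: amplitudes are whitespace-separated integers.
    """
    samples = []          # stand-in for the dynamic array
    in_header = True
    for line in lines:
        for token in line.split():
            if in_header:
                # Everything up to and including %% is header; skip it.
                if token == "%%":
                    in_header = False
            else:
                samples.append(int(token))
    return samples


# Possible usage, once a4.rra has been downloaded:
#     with open("a4.rra") as f:
#         samples = read_rra_samples(f)
```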


1    The Zero Crossing pitch estimator
The Zero Crossing pitch estimator (ZCr) is easy to implement. Just count the number of times the audio
signal crosses zero on an upward slope (if you count both upward and downward zero crossings, your
resulting frequency will be twice as high as it actually is).
To implement ZCr, find the total number of upward zero crossings (n) in the audio signal, noting the
locations of the first (a) and last (b) of the upward zero crossings. The frequency f0 is calculated with the
following formula:

    \[ f_0 = \frac{(n - 1) \times \mathrm{sampleRate}}{b - a} \]

You may assume a CD-quality sampling rate of 44100 samples per second.
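
For a sense of scale, a 440 Hz tone sampled at 44100 samples per second repeats about every 44100 / 440 ≈ 100.2 samples, so its upward zero crossings should be roughly that far apart. The following is a minimal sketch of the formula above, not a reference solution. It assumes that an upward crossing is a negative sample followed by a non-negative one, which is only one reasonable definition, and the name zero_crossing_f0 is hypothetical.

```python
SAMPLE_RATE = 44100  # CD-quality rate assumed by the assignment


def zero_crossing_f0(samples, sample_rate=SAMPLE_RATE):
    """Estimate f0 from upward zero crossings.

    Returns None when fewer than two upward crossings are found,
    since the formula needs at least one full period.
    """
    n = 0           # number of upward zero crossings
    a = b = None    # indices of the first and last upward crossings
    for i in range(1, len(samples)):
        # Assumed definition: negative sample followed by a non-negative one.
        if samples[i - 1] < 0 <= samples[i]:
            n += 1
            if a is None:
                a = i
            b = i
    if n < 2:
        return None
    # (n - 1) full periods span (b - a) samples.
    return (n - 1) * sample_rate / (b - a)
```

Because a and b are whole sample indices, the estimate for a short recording can be off by a small fraction of a hertz; longer recordings average this error out.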


Data structures
The amplitude samples are to be read in and stored in an array. Since you won’t know how big the array
should be at the start, you will need to grow the array periodically.
A subtask is to provide a dynamic integer array class which implements the following methods:

__getitem__(self,n)
    retrieve integer at location n
__setitem__(self,n,i)
    place integer i at location n
append(i)
    place integer i at the end (grow if necessary)
grow()
    add unfilled slots to the array
shrinkToFit()
    remove unfilled slots at the end of the array

You may need other methods as well. Also, your array class will need this constructor:

__init__(self,s,g)
    construct an array with s initial empty slots and growth factor g. If g is positive, add g empty slots
    to the end of the array when more slots are needed. If g is negative, grow the array to (1 − g) ∗ k slots
    (that is, add −g ∗ k empty slots), where k is the current size of the array. For example, g = −1 would
    double the size of the array when more slots are needed.




In particular, you are to implement the append method to grow the array (based upon the growth factor)
if the array is full. Place the implementation of the array in the module darray.py. In addition, the append
method should always place the given value at an index one higher than the index of the element with the
highest index so far. Your get and set methods should raise an exception if the given index is greater than
the current size of the array.
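
One possible shape for such a class is sketched below. Several things here are assumptions rather than part of the specification: a Python list stands in for the fixed-size array class supplied with the assignment, the names capacity, size, and debug are hypothetical, a negative growth factor is interpreted by following the g = −1 doubling example (new capacity (1 − g) ∗ k), and the bounds check treats any index outside the filled region as an error.

```python
class DynamicArray:
    """Sketch of a growable integer array (a list stands in for the real store)."""

    def __init__(self, s, g, debug=False):
        self.store = [0] * s    # stand-in for the provided array class
        self.capacity = s       # total number of slots
        self.size = 0           # number of filled slots
        self.factor = g
        self.debug = debug      # hypothetical flag mirroring the -d option

    def _check(self, n):
        # Assumed interpretation: only filled slots may be read or written.
        if n < 0 or n >= self.size:
            raise IndexError("index %d out of range" % n)

    def __getitem__(self, n):
        self._check(n)
        return self.store[n]

    def __setitem__(self, n, i):
        self._check(n)
        self.store[n] = i

    def grow(self):
        k = self.capacity
        if self.factor > 0:
            extra = self.factor
        else:
            # New capacity is (1 - g) * k, so g = -1 doubles the array.
            # At least one slot is added so that growth always makes progress.
            extra = max(1, int(-self.factor * k))
        self.store.extend([0] * extra)
        self.capacity = k + extra
        if self.debug:
            print("grew from", k, "to", self.capacity, "slots")

    def append(self, i):
        if self.size == self.capacity:
            self.grow()
        self.store[self.size] = i   # one past the highest index used so far
        self.size += 1

    def shrinkToFit(self):
        old = self.capacity
        del self.store[self.size:]  # drop the unfilled slots at the end
        self.capacity = self.size
        if self.debug:
            print("shrank from", old, "to", self.capacity, "slots")
```

With a fixed-size backing array, grow and shrinkToFit would instead allocate a new array of the target size and copy the filled slots across.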
Since Python does not have arrays, your instructor has written a C program and Makefile that supplies the
needed functionality. Get this code with the following commands:

    wget beastie.cs.ua.edu/cs260/projects/array.py
    wget beastie.cs.ua.edu/cs260/projects/array.c
    wget beastie.cs.ua.edu/cs260/projects/Makefile

Your dynamic array class will use the given array class as a client, as in:

    def __init__(self,size,factor):
        self.store = array(size)
        ...

where store is the internal array where your dynamic array keeps its data.
To compile the library needed by array.py, execute the command:

    make



Input/Output
Your program should report statistics about the audio file: number of samples, max value, min value, and
an estimate of f0 , for the region of the file that was scanned (see -s and -S options).
Your program should handle a few options:

-f report only f0 (just the number, no statistics)
-s NNN.NN ignore the first NNN.NN seconds of the sample; default is zero
-S NNN.NN obtain the pitch using the first NNN.NN seconds of the sample after the skip; default is zero
     which means all of the (remaining) data
-d print out an informative message each time the array grows or shrinks; default is false

To see some code that uses an option handler, get this file:

    wget beastie.cs.ua.edu/cs260/projects/options.py

You may incorporate this code into your program.
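
The contents of options.py are not reproduced here, so the sketch below uses Python's standard argparse module as a stand-in to show how the four options might be parsed and how the -s and -S values, given in seconds, could be turned into sample indices. The names build_parser and region_bounds, and the attribute names on the parsed result, are hypothetical.

```python
import argparse

SAMPLE_RATE = 44100  # samples per second, as assumed by the assignment


def build_parser():
    """Build a parser for the -f, -s, -S, and -d options plus a filename."""
    parser = argparse.ArgumentParser(prog="pitch")
    parser.add_argument("-f", dest="f0_only", action="store_true",
                        help="report only f0")
    parser.add_argument("-s", dest="skip", type=float, default=0.0,
                        help="seconds to ignore at the start")
    parser.add_argument("-S", dest="span", type=float, default=0.0,
                        help="seconds to scan after the skip (0 means all)")
    parser.add_argument("-d", dest="debug", action="store_true",
                        help="report each time the array grows or shrinks")
    parser.add_argument("filename")
    return parser


def region_bounds(total, skip, span, sample_rate=SAMPLE_RATE):
    """Return (start, stop) sample indices for the region to scan."""
    start = min(total, int(skip * sample_rate))
    if span <= 0:
        stop = total    # default: all of the remaining data
    else:
        stop = min(total, start + int(span * sample_rate))
    return start, stop
```

For instance, skipping 0.5 seconds and scanning 1 second of a sufficiently long file would select samples 22050 up to (but not including) 66150.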


Grading
Points will be deducted for not adhering to the specifications given in this document and in the grading
rubric. Points will be deducted for bad style, especially unreasonable amounts of duplicated code, as well
as for sloppy formatting, insufficient or overly verbose documentation, compiler warnings, run-time crashes,
and other such transgressions. You will receive no credit if your program fails to run to completion on all
tests.


2     CHALLENGE: The YIN pitch estimator
This challenge is optional.
The zero crossing estimator, while easy to implement, is also easily fooled if the signal fluctuates around zero
on its upwards and downwards trajectories. An estimator that is not fooled by these fluctuations is the YIN
estimator.
Intuitively, the YIN estimator is calculated by making a copy of a portion of the waveform near the beginning
(this copy is called a window) and sliding that portion over the waveform, stopping at the first place where
it matches up well again (of course, it matches perfectly at the start). The sample rate (for this exercise,
assume the sample rate is 44100 samples per second) divided by the number of samples between the start
and the stop is an estimate of f0 , the fundamental frequency. The fundamental frequency is usually, but not
always, the perceived pitch of a note.


2.1    Stepwise refinement for the YIN estimator

A window into a waveform can be specified by a starting location, L, and a length n. The window is the set
of amplitude values:

    \[ A_{L+0}, A_{L+1}, A_{L+2}, \ldots, A_{L+n-1} \]

To find the first place (other than the beginning) where the waveform matches up well, do the following:

    • Define a function that, when given an array and a threshold, returns the index of the first value below
      the threshold. It should return -1 if no value is below the threshold.
    • Define a function that when given an array and an index, adjusts the index to the right as long as the
      value to the right is lower. This function returns the adjusted index.
    • Define a function that when given an array, two starting locations, and a length, returns a fitness
      measure of how well the two windows match up with each other. The first window is specified by
      the first starting location and the length; the second is specified by the second location and length. The
      fitness measure can simply be the sum of the absolute values of the differences between the points in
      the window. For a perfect match, the differences are all zero. Denote this function the spot correlation
      function.
    • Define a function that when given an array and a window length returns a new array containing the
      fitness measures for every offset from zero up to some suitable maximum offset. The fitness measure
      for offset zero is stored at index zero of the new array, the fitness measure for offset one is stored
      at index one, and so on. It does this by calling the spot correlation function repeatedly with the
      array, 0, the current offset, and the window length. Denote this function the array correlation
      function and the array it returns the correlation array. The correlation array at index zero should
      have a value of zero (why?).

    • Define a function that takes the correlation array, a, and returns a new array whose values are computed
      thusly: for each index i in the new array, n, the value is

          \[ n[i] = \frac{a[i] \cdot i}{\sum_{j=0}^{i} a[j]} \]


      Denote this new array the normalized correlation array.



   • Define a function that takes the normalized correlation array and a threshold and returns the first repeat
     location. The function finds the leftmost value in the array that is below the threshold value (via the
     first function you defined) and searches rightward as long as the next value to the right is lower than
     the current value (via the second function you defined). This location is returned as the first repeat
     location.
      If no value is below the threshold, the function finds the lowest value over all values in the normalized
      correlation array and returns the location of this value as the first repeat location.
   • The sample rate divided by the first repeat location is the estimate of the frequency.

The default window length should be 500 samples and the default threshold should be 0.1.
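
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference solution: plain Python lists stand in for the dynamic array, all function names are hypothetical, the maximum offset ("some good value") is assumed to default to 1000 samples, and the normalized value at index zero, where the formula gives 0/0, is assumed to be 1.

```python
def first_below(values, threshold):
    """Index of the first value below the threshold, or -1 if none."""
    for i, v in enumerate(values):
        if v < threshold:
            return i
    return -1


def slide_to_minimum(values, index):
    """Move the index right as long as the value to the right is lower."""
    while index + 1 < len(values) and values[index + 1] < values[index]:
        index += 1
    return index


def spot_correlation(samples, start1, start2, length):
    """Sum of absolute differences between two windows (0 is a perfect match)."""
    return sum(abs(samples[start1 + k] - samples[start2 + k])
               for k in range(length))


def array_correlation(samples, length, max_offset):
    """Fitness measure for every offset from 0 to max_offset inclusive."""
    return [spot_correlation(samples, 0, offset, length)
            for offset in range(max_offset + 1)]


def normalize(corr):
    """n[i] = a[i] * i / (a[0] + ... + a[i])."""
    result = []
    running = 0
    for i, value in enumerate(corr):
        running += value
        # Assumption: use 1.0 wherever the running sum is zero (e.g. i = 0).
        result.append(value * i / running if running > 0 else 1.0)
    return result


def first_repeat(norm, threshold):
    """Location where the window first matches up well again."""
    i = first_below(norm, threshold)
    if i < 0:
        # Nothing below the threshold: fall back to the overall minimum.
        return min(range(len(norm)), key=norm.__getitem__)
    return slide_to_minimum(norm, i)


def yin_f0(samples, window=500, threshold=0.1, sample_rate=44100,
           max_offset=1000):
    """Estimate f0; returns None when no usable repeat location exists."""
    # The second window must fit inside the data.
    max_offset = min(max_offset, len(samples) - window)
    if max_offset < 1:
        return None
    norm = normalize(array_correlation(samples, window, max_offset))
    location = first_repeat(norm, threshold)
    return sample_rate / location if location > 0 else None
```

Since the repeat location is a whole number of samples, the estimate is quantized; a tone whose period is not a whole number of samples will come out slightly high or low. The assumed maximum offset of 1000 samples also limits the lowest detectable pitch to about 44 Hz.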


2.2    YIN options

Add the following options to your program:

pitch -w NNN <filename> use a window size of NNN samples
pitch -t NNN.NN <filename> use a threshold value of NNN.NN


Submitting the assignment
While in your working directory, type the command:

      submit cs260 xxxxx project2

where xxxxx is replaced by your instructor name.
The submit program will bundle up all the files in your current directory and ship them to me (including
subdirectories).
You may submit as many times as you want; new submissions replace old submissions.

   • a4.rra
   • array.c
   • array.py
   • darray.py
   • libarray.so.1.0
   • Makefile
   • pitch.py
   • scanner.py

You may also submit any test cases you used.



