Laboratory 4

					060609/Thomas Munther
Halmstad University
School of Information Science, Computer and Electrical Engineering

                             Laboratory 4
Length about 2 hours with supervision.
Even if you completed the whole Lab-PM during these 2 hours you should try to
find the time to work through this Lab-PM thoroughly afterward.
The last page contains hand-in assignments. You should try to solve them within a
week to be able to follow the pace in the course.
Everything that follows >> is written in the Matlab Command Window and
indicates what you as a user should type. The % starts a comment on that
specific command.
This laboratory will in part deal with data analysis and statistics.
The content in a vector can be shown by a histogram plot or bar graph.
To create a histogram you normally perform a classification into intervals and count the
number of elements in each and every class. Then the result is presented in a rectangular
fashion, where the height gives the number of elements in each class.

Now let us create a vector in Matlab!

>> x=[1 3 4 6 7 4 4 6 7 3 5 6 2 5 5 6];
>> hist(x),grid % Creates a histogram with 10 intervals. See Figure 1.

                      Figure 1

hist(x)              Plot a histogram with 10 intervals for the vector x.
hist(x,n)            Plot a histogram with n intervals for the vector x.
hist(x,y)            Plot a histogram with freely chosen intervals, whose centres are
                     given in the vector y.

Create a histogram with 15 intervals for the vector x above.

The problem is to understand where the limits of the intervals are.
Maybe it is better to create a histogram with 6 intervals, since we know the maximum
and minimum values of the vector.

>> hist(x,6)
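
One way to see where the interval limits end up is to ask hist for output instead of a plot. The sketch below assumes that hist, called with two output arguments, returns the counts and the interval centres; the names n, c and width are only illustrative.

```matlab
x = [1 3 4 6 7 4 4 6 7 3 5 6 2 5 5 6];
[n, c] = hist(x, 6);               % n = counts per interval, c = interval centres
width  = (max(x) - min(x))/6;      % all 6 intervals have the same width
disp(c)                            % centres of the intervals
disp([c - width/2; c + width/2])   % lower and upper limit of each interval
```

The limits follow from the centres, since every interval extends half a width to each side of its centre.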

It is not very convenient to first look for the maximum and minimum values
in the vector x and then decide on appropriate intervals.
It is better to decide the intervals yourself. Assume that we are interested in integer values
between 0 and 10. Let's put them in a vector y.

>> y=0:10;
>> hist(x,y); % Histogram for vector x, where y contains the interval centres.
              % See Figure 2.

                        Figure 2
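
The counting behind such a histogram can also be printed as numbers. A minimal sketch, assuming that hist draws nothing when an output argument is requested; the loop and the text in fprintf are only illustrative.

```matlab
x = [1 3 4 6 7 4 4 6 7 3 5 6 2 5 5 6];
y = 0:10;                  % one interval centred on each integer
n = hist(x, y);            % number of elements of x in each interval
for k = 1:length(y)
    fprintf('%2d occurs %d times\n', y(k), n(k));
end
```

Since every element of x is an integer, each element falls exactly on one interval centre, so n(k) is simply the number of times y(k) occurs in x.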
Let's see what a bar graph looks like in Matlab.
The following commands are useful:

bar(x)        Plots a bar graph for values in vector x versus the index.
bar(y,x)      Plots a bar graph for values in vector x. Locations for the bars are given in
              the vector y. They must be equidistant.
bar(y,x,str)  As above, but the third argument controls the colour of the bar graph.
Here follows a bar graph for vector x:
>> bar(x),grid % See Figure 3 !
>> title('Bar graph for vector x')

                               Figure 3: Bar graph for vector x
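
When bar is given two vectors, Matlab takes the first one as the bar locations and the second one as the bar heights. The sketch below uses this to draw the number of occurrences of each integer in x; the variable names and the title are illustrative choices.

```matlab
x = [1 3 4 6 7 4 4 6 7 3 5 6 2 5 5 6];
y = 0:10;                  % bar locations (equidistant)
n = hist(x, y);            % bar heights: how often each integer occurs in x
bar(y, n), grid            % locations first, heights second
title('Number of occurrences of each value in x')
```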

For the sake of completeness we also show some other plotting possibilities.
A stem plot in Matlab is written as:

>> stem(x), grid       % Gives a stem plot.
>> title('Stem plot for vector x')

See Figure 4, but please note that this command makes a plot where the value is plotted
versus its vector index.

                      Figure 4: Stem plot for vector x
Assume you want to produce a sampled curve of a sine wave signal. You pick sample
values from a continuous sine wave with amplitude 1 at equidistant time intervals.
The sine wave looks like: y=sin(x);

>> z=0:0.2:10;
>> yy=sin(z)
>> stem(yy),grid, title('Stem plot for a sine wave')

What we are basically doing is sampling a time-continuous signal, in exactly
the same way as all computer- or microprocessor-controlled systems handle a
signal. This generates a time-discrete signal. In Figure 5 we can see what such a signal
looks like to a computer system.

                          Figure 5: Stem plot for a sine wave
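
The stem plot of the sine wave shows the samples versus their vector index. To show them versus the sampling instants instead, the vector z can be passed as a first argument. A minimal sketch; the axis label and title are illustrative.

```matlab
z  = 0:0.2:10;             % sampling instants, 0.2 apart
yy = sin(z);               % sampled sine wave
stem(z, yy), grid          % sample value versus sampling instant
xlabel('z'), title('Stem plot for a sine wave versus z')
```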

Let's take another look at our previous vector x. We have two more useful presentations to
show: the pie diagram and the stairstep plot. They are rather straightforward to
understand. The commands are given below and the results are shown in Figures 6 and 7.

>> subplot(1,2,1)
>> pie(x),grid,title('Pie diagram for vector x')
>> subplot(1,2,2)
>> stairs(x),grid,title('Stairstep plot for vector x')

          Figure 6: Pie diagram for vector x          Figure 7: Stairstep plot for vector x

Note that for a pie diagram each value in the x vector is presented as the percentage of the
sum of the vector. For the stairstep diagram it is the value versus vector index that is
plotted, but we could of course plot the value against another vector.
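
Plotting the stairstep diagram against another vector only requires that vector as a first argument. A minimal sketch, assuming a made-up time vector t with one value per element of x:

```matlab
x = [1 3 4 6 7 4 4 6 7 3 5 6 2 5 5 6];
t = 0:0.5:7.5;             % hypothetical time axis, same length as x
stairs(t, x), grid         % value of x versus t instead of versus index
xlabel('t'), title('Stairstep plot for vector x versus t')
```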

pie(x,explode)     Each slice whose element in the vector explode is nonzero will be
                   pulled out from the pie. Each element in the vector x
                   corresponds to one slice in the pie. Note that the vector explode
                   must have the same length as x.
pie(x,labels)      Each slice in the pie can be named with labels. The argument labels
                   is a cell array which has the same length as x and contains
                   strings as elements.
>> pie([2 4 3 5],{'North','South','East','West'})
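
The two variants of pie can be tried side by side. The sketch below reuses the four values from the command above; the explode vector and the variable names are illustrative choices, and p shows how the percentage of each slice is obtained.

```matlab
v = [2 4 3 5];
labels  = {'North','South','East','West'};
explode = [0 1 0 0];               % nonzero element: pull out the second slice
p = 100*v/sum(v);                  % share of each slice in percent of the sum
subplot(1,2,1), pie(v, explode)    % slices marked with percentages
subplot(1,2,2), pie(v, labels)     % slices marked with names
disp(p)
```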

Several of these commands that we use to present data vectors also work in three
dimensions, but their syntax changes slightly: bar3(x), stem3(x) and pie3(x)
Try them if you have time; otherwise, we might come back to them later on in the course.

We also need some statistical commands, for instance to evaluate measured data. There
are some simple statistical functions accessible within the figure window if we choose to
make a plot of a vector. We will investigate different commands, but first
we will repeat some statistical concepts.

Mean value: Assumed to be known.

Median value: Assume five measurements and store them in vector X
>> X=[ 1 43 52 6 78]
The median value of vector X is the value in the middle when the elements are
sorted by size. If the number of elements is even, it is the mean
value of the pair in the middle. There is always an equal number of elements above and
below the median value. In our case the median value of X is 43.
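
The rule can be checked by hand and compared with the command median. A small sketch; the vector W with an even number of elements is an illustrative addition.

```matlab
X  = [1 43 52 6 78];
Xs = sort(X);                      % 1 6 43 52 78
m_odd = Xs((length(Xs)+1)/2);      % middle element of the sorted vector
W  = [1 43 52 6];                  % even number of elements
Ws = sort(W);                      % 1 6 43 52
m_even = (Ws(2) + Ws(3))/2;        % mean value of the pair in the middle
disp([m_odd median(X); m_even median(W)])
```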

Variance: Variance measures the spread in measured data. A first idea is to
calculate the deviation from the mean value for each and every data point and then take the
average of all deviations. That average always becomes zero, if it is done correctly.
Therefore you first square the deviations, then sum them and divide by the number of data
elements. Now we have the variance of the measured data. Note that the Matlab commands
var and std by default divide by the number of elements minus one.
Assume that we have a data set containing voltage measurements. Then the variance
will have the unit Volt^2.
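
The steps can be followed numerically. The sketch below uses a made-up data vector v and compares the hand calculation with the commands var and std. By default these divide by the number of elements minus one, while var(v,1) divides by the number of elements.

```matlab
v  = [0 1 4 7 12];                 % illustrative measurements, e.g. in Volt
d  = v - mean(v);                  % deviation of each point from the mean value
mean_dev = mean(d);                % zero, apart from rounding errors
var_n  = sum(d.^2)/length(v);      % squared deviations divided by n
var_n1 = sum(d.^2)/(length(v)-1);  % squared deviations divided by n-1
disp([var_n var(v,1); var_n1 var(v)])
sd = sqrt(var_n1);                 % standard deviation, same value as std(v)
```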

Standard deviation: Take the square root of the variance. If we still investigate a
data vector containing voltage measurements, the standard deviation is
expressed in the unit Volt. It is a measure of the typical deviation from the mean
value in the data vector.

Correlation: Sometimes you measure several variables simultaneously and want to find
out whether there is some dependence between them. The correlation gives a number
between -1 and +1. If the number is zero there is no linear dependence between the
variables. A positive correlation could look like: much hay => many fat cows.
Assume sampled measurements performed in a barn. The measured variables are hay and
the fatness of the cattle. A correlation calculation would probably have given a positive
correlation.

A probable negative correlation could be: Many sold umbrellas => Few sold bottles of
suntan lotion. The measured variables are then number of sold umbrellas and bottles of
suntan lotion. If we carry out the measurements day by day we would probably find a
correlation, but in this case it is a negative correlation.

Note that even if we find a nonzero correlation number, it is not certain that there is
actually a true dependence. Variables can increase or decrease together without having any
dependence on one another.
Normally, if there is a true linear dependence, there should be a visible proportionality in the
graph of one variable versus the other.

Covariance: Covariance can become any number (positive or negative), while the
correlation is normalized between -1 and 1.
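
Both quantities can be calculated with the commands cov and corrcoef. The sketch below uses two made-up data vectors a and b, placed as columns of a matrix M, and shows that the correlation is the covariance divided by the two standard deviations.

```matlab
a = 1:5;                           % illustrative data, first variable
b = [0 1 4 7 12];                  % illustrative data, second variable
M = [a' b'];                       % one column per variable
C = cov(M);                        % 2x2 matrix, C(1,2) is the covariance
R = corrcoef(M);                   % 2x2 matrix, R(1,2) is the correlation
r = C(1,2)/(std(a)*std(b));        % covariance normalized to lie in [-1, 1]
disp([C(1,2) R(1,2) r])
```

The diagonal of C holds the variances of a and b, and the diagonal of R is always 1.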

Normal distribution: The normal distribution is described by the curve below in Figure 8. It
has a mean value (centre of the curve), and the width of the curve gives the standard
deviation. This is a typical curve with mean value 0 and standard deviation (SD) 1.
                                   Figure 8
The curve in Figure 8 was made from 10 000 000 random numbers using the commands:
>> u=randn(1,10000000); % It takes some time, wait !
>> hist(u,100)              % The argument 100 gives the number of intervals in the
                            % plot.

The curve is also called a Gaussian distribution. A typical normal distribution could be a
national test in the elementary school. The results would then probably be distributed
around a mean value with some deviation.
The mean value could be nonzero and the standard deviation any number.
Suppose we add 2 vertical lines to the plot, at mean value ± SD (0 ± 1). The area
between them is 68.3% of the total area.
If we instead put the 2 vertical lines at mean value ± 1.96 SD (0 ± 1.96), we would have
95% of the total area between them.
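
These areas can be checked by counting how large a fraction of the random numbers lies within a given distance of the mean value. A minimal sketch, using fewer random numbers than above to save time; the exact areas are obtained from the error function erf, and the variable names are illustrative.

```matlab
u  = randn(1, 1000000);            % normally distributed, mean 0 and SD 1
f1 = mean(abs(u) <= 1);            % fraction within 1 SD of the mean value
f2 = mean(abs(u) <= 1.96);         % fraction within 1.96 SD
f3 = mean(abs(u) <= 2.58);         % fraction within 2.58 SD
exact = erf([1 1.96 2.58]/sqrt(2));  % exact areas for a normal distribution
disp([f1 f2 f3; exact])
```

The counted fractions vary slightly from run to run, since the numbers are random.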

Confidence interval: There is 95% probability of finding the true mean value within the
confidence interval 0 ± 1.96 SD.
To increase the probability of finding the mean value, we must also increase the interval.
An interval 0 ± 2.58 SD gives us 99% probability.

The confidence interval can be considered to be a measure of the uncertainty contributed
by chance, when we try to find the true mean value.

Example 1: Create 2 vectors to which we will apply the statistical commands.
>> X=1:5; Z=[ 0 1 4 7 12];

Find the mean and median values of these. Use the commands mean and median.

>> mean( X), mean(Z)                   % mean of the elements in vector X and Z.

The result should be 3 and 4.8 respectively!

The median values should be 3 and 4 respectively.
Check this to make sure!

>> median(X),median(Z)

Now calculate the standard deviation for the vectors X and Z.

>> std(X),std(Z)

1.5811 and 4.8683        % a measure of the average deviation.

We can clearly see that the deviation is much larger in Z than in X.
This is exactly what the standard deviation says. Just take a look at the elements of each
vector. Suppose we would like to plot vector Z versus X.
See how the graph looks in Figure 9.

>> plot(X,Z), grid







                                      Figure 9
Enter the menu of the figure window and choose Tools-> Data Statistics.
Now another window appears that gives you the following options: min,
max, mean, median, std and range, for both the X and the Z vector. Mark mean, std and median
for the Z vector. Please note the changes in your figure window. See Figure 10 below !
Notice the three dotted lines. The one in the middle gives the mean value of the Z-vector,
and the upper line corresponds to mean value plus the standard deviation of Z. The
lower line is the mean value minus the standard deviation of Z.

                   Figure 10 (legend: data 1, y mean, y median, y std)
In this case it seems that we have a positive correlation between Z and the X vector.
When X increases so does Z.

The commands we have used so far also work for matrices, but please take
note of the result.
mean(A)              Gives a vector where each element is the mean value of the column.
median(A)            Gives a vector where each element is the median value of the
                     column.
std(A)               Gives a vector where each element is the standard deviation of the
                     column.

Form the matrix A !
>> A=[X;Z]

Try some of the commands in the table above on matrix A.
Are there any surprises ?
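
The commands work column by column, so for A=[X;Z] they do not return the values we found for X and Z separately. A minimal sketch of how to work along the rows instead, assuming the optional dimension argument of mean and std; the variable names are illustrative.

```matlab
X = 1:5;  Z = [0 1 4 7 12];
A = [X; Z];                        % 2 rows, 5 columns
col_means = mean(A);               % 1x5: one mean value per column
row_means = mean(A, 2);            % 2x1: one mean value per row
row_std   = std(A, 0, 2);          % standard deviation of each row
disp(col_means), disp(row_means'), disp(row_std')
```

An alternative is to transpose the matrix first, for instance mean(A').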

We also give some additional commands, aside from the matrix-manipulating commands, to
calculate the sums, differences and products of the elements. They can also be applied to
vectors.
prod(A)          Gives a vector where each element is the product of the elements in the
                 column.
sum(A)           Gives a vector where each element is the sum of the elements in the
                 column.
diff(A)          Gives a matrix where each element is the difference between two adjacent
                 elements in the column.
sort(A)          Gives a matrix where all the elements in the columns are sorted in
                 ascending order.

Create a matrix A that we will manipulate and also use the vector X.

>> A=[1 2 3
      4 5 6
      7 8 9]
The commands in the table above can also be used on vectors. Try them on X.
>> prod(X),sum(X),diff(X)          % I hope you understand the commands.

>> diff(X,2)                  % The same as diff(diff(X)).

Try the command on matrix A instead:

>> diff(A), prod(A), sum(A)

ans=                   ans=                             ans =
   3 3 3                       28 80 162                     12 15 18
   3 3 3
Finally create a new matrix F. We will try to modify the output of the command line.
Everything to the left of the equal sign is assumed to be output. In some cases one can
choose what output one wants from the execution of the command.

>> F=[12 15 18
       1  2  3
       4  5  6
       7  8  9]
Use the command sort on matrix F.
>> [A,index]=sort(F)  % Gives a sorted matrix A, but also indices showing how the elements
                      % have been moved.
This could be useful.
The operation is only performed within each column. Therefore it is enough with one
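
The meaning of the index matrix is easiest to see by using it. In the sketch below the sorted matrix is called S instead of A, so that the matrix A is not overwritten; this name and col1 are illustrative choices.

```matlab
F = [12 15 18; 1 2 3; 4 5 6; 7 8 9];
[S, index] = sort(F);              % each column sorted in ascending order
disp(S), disp(index)
% index(k,j) is the row of F from which the element S(k,j) was taken
col1 = F(index(:,1), 1);           % rebuilds the first sorted column
```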

Now we turn to something completely different from what we have been doing so far in
this laboratory. We often need nice tables, diagrams, graphs and plots. The outputs we
have displayed in the command window have been more or less uncontrolled. In short,
we need some commands that can help us control the output.
Therefore we introduce some Matlab commands with formatting codes that can help
us make the output more readable.
We start with the command fprintf.

Example: Assume that we would like to produce a table consisting of 3 columns. The
first one has different integer values x; the second column contains the square root
of x and the third one contains the third power of x. Look below for a suggested
m-file producing such a table.

% Alt_1.m, m-file created 050616 by Thomas Munther
% The m-file creates a table consisting of 3 columns,
% \t =tab, \n=change of row and formatting code %6.3f=6 positions with 3 decimals.
x=1:3;                  % integer values for the first column
y1=sqrt(x); y2=x.^3;
Y=[x' y1' y2'];
disp(' x        sqrt(x)        x^3')
fprintf('%4.0f \t          %6.3f \t          %6.3f \n', Y')

The output will be displayed in the command window as:

x        sqrt(x)        x^3
 1       1.000         1.000
 2       1.414         8.000
 3       1.732       27.000
Exactly the same output can be achieved by the following m-file.
% Alt_2.m, m-file created 050616 by Thomas Munther
% The m-file creates a table consisting of 3 columns,
% \t =tab, \n=change of row and formatting code %6.3f=6 positions and 3 decimals.
disp(' x        sqrt(x)        x^3 ');
for x=1:3
         y1=sqrt(x); y2=x.^3;
         Y=[ x' y1' y2'];
         fprintf('%2.0f \t          %6.3f \t          %6.3f \n',Y)
end
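
The first m-file passes the transposed matrix Y' to fprintf. The reason is that fprintf, like sprintf, walks through its data column by column and reuses the format string until all elements have been printed. The small sketch below shows the difference with a made-up 2x2 matrix M; sprintf is used only to keep the result in a string.

```matlab
M = [1 2; 3 4];
s1 = sprintf('%d %d\n', M);        % column by column: '1 3' and then '2 4'
s2 = sprintf('%d %d\n', M');       % transposed, row by row: '1 2' and then '3 4'
disp(s1), disp(s2)
```

In the second m-file no transpose is needed, since Y is a single row in each turn of the loop.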
Finally we will also use the command fprintf to write to a text-file which we will create.
We modify the m-file Alt_2.m. Replace the rows from the disp command onwards in the previous
m-file with the ones below.

disp(' x        sqrt(x)       x^3 ');
fid=fopen('alt_2.txt','w');                           % Opens a file for write access.
for x=1:3
    y1=sqrt(x); y2=x.^3;
    Y=[ x' y1' y2'];
    fprintf(fid, '%2.0f \t %6.3f \t        %6.3f \n',Y);  % fid=file identifier
end
fclose(fid);                                             % Closes the file.

Check out the content in the text-file alt_2.txt !
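
The content can also be checked from within Matlab by reading the numbers back with fscanf. The sketch below is self-contained: it writes the same three rows to a temporary file (the name in fname is made up and not the file alt_2.txt) and then reads them into a matrix T.

```matlab
fname = [tempname '.txt'];         % hypothetical temporary file name
fid = fopen(fname, 'w');
for x = 1:3
    fprintf(fid, '%2.0f \t %6.3f \t %6.3f \n', [x sqrt(x) x.^3]);
end
fclose(fid);

fid = fopen(fname, 'r');
T = fscanf(fid, '%f', [3 Inf])';   % 3 numbers per file row, transposed to a table
fclose(fid);
disp(T)
delete(fname)                      % remove the temporary file again
```

fscanf fills the matrix column by column, which is why the size is given as [3 Inf] and the result is transposed afterwards.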

There are many formatting codes to be used together with the fprintf command, and these
codes can also be used with other commands. To read text-files one can use the
fscanf command, or textscan that converts text to a cell array.
\n                  New row
\r                  Back to the beginning of the row
\b                  Backspace
\t                  Tab
\f                  New page
%s                  String
%e                  Exponential form
%f                  Presentation in fixed-point form
%u                  Integer presentation (unsigned)
%g                  More compact than %e or %f
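
Some of the codes can be tried directly with sprintf, which takes the same formatting codes as fprintf but returns a string. A minimal sketch with made-up values; the variable names are illustrative.

```matlab
v = 1234.5678;
s_f = sprintf('%8.2f', v);         % fixed point, 8 positions and 2 decimals
s_e = sprintf('%e', v);            % exponential form
s_g = sprintf('%g', v);            % the more compact of %e and %f
s_u = sprintf('%u', 42);           % unsigned integer
s_s = sprintf('%s', 'Matlab');     % string
fprintf('%s\n%s\n%s\n%s\n%s\n', s_f, s_e, s_g, s_u, s_s);
fprintf('one\ttwo\nthree\n');      % \t inserts a tab, \n starts a new row
```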

On my homepage you can find a text-file named namn.txt . Read the file by using the
command textscan.

>> fid = fopen('namn.txt','r');            % Opens the file for reading (r=read).
>> C= textscan(fid,'%u%s%u%u');            % The output becomes a cell array.
>> fclose(fid)                             % Closes the file for reading.

Find out what the content in cell array C looks like!

>> C{1,1}, C{1,2},C{1,3},C{1,4}
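
If namn.txt is not at hand, the same commands can be tried on a small file of your own. The sketch below writes two made-up rows to a temporary file and reads them with textscan. The layout of the rows (integer, name, integer, integer) is only an assumption based on the format string '%u%s%u%u', and the names and numbers are invented.

```matlab
fname = [tempname '.txt'];         % hypothetical file standing in for namn.txt
fid = fopen(fname, 'w');
fprintf(fid, '1 Johan 120 3\n2 Erik 95 7\n');   % made-up rows, layout assumed
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%u%s%u%u');     % one cell per column in the file
fclose(fid);
delete(fname)                      % remove the temporary file again

numbers = C{1};                    % first column, unsigned integers
names   = C{2};                    % second column, cell array of strings
disp(numbers'), disp(names{1})
```

Each cell in C holds one whole column of the file, so C{2}{1} is the first name.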

There is more than one command that can achieve this. Both textread and fscanf can read
files. Use the help browser for the commands and read the examples thoroughly. Then it
should be manageable to open the file and read it.

           Homework assignments for Laboratory 4

1. A vector x and a matrix A are stored in a file matrisdata.mat, which can be found
   on the homepage of the course.
   Write an m-file which reads the vector and the matrix.
   Determine the length of the vector and the size of the matrix.
   Determine the mean value, median value and standard deviation for both vector x and
   matrix A. Display the results nicely. Make a table where you have used the
   command sprintf or fprintf for the display of the output.

2. Generate an m-file which creates a vector with 5000 normally distributed random
    numbers ( randn !) with mean value 0 and standard deviation 1.
    Calculate mean value, median value and standard deviation for the elements in
    the vector. Plot the vector in figure window 1, where you can see the
    randomness of the numbers. Add three horizontal lines to your plot. These lines
    should indicate the mean value and the mean value ± standard deviation.

    Also make a similar plot, but for a vector with 1000 normally distributed random
    numbers with mean value 12 and standard deviation 3.
    This plot should be in figure window 2.
    Add three horizontal lines to your plot. These lines
    should indicate the mean value and the mean value ± standard deviation.
    Calculate mean value, median value and standard deviation for the elements in
    the vector.

3. In a table (tabell3) on the homepage one can see the Internet usage in Sweden for
   different categories such as age, gender and educational background.
   The table is stored row-wise as a vector, which has also been transposed. It can
   be found in a mat-file named tabell3, and the vector is also named tabell3.
   Create a bar diagram that has age groups on the x-axis and Internet usage on the
   y-axis.

    Plot two other diagrams (pie diagrams), one for each gender. The Internet usage in
    these age groups should be related to the total number of persons in these age
    groups (16-74). This should be done separately for each gender.
    Also introduce a group named others for those who are not Internet users in the
    group 16-74 years old. Every pie should consist of 6 slices.

4. Now let’s get a file consisting of common Swedish names for boys in 2004.
   Look in the appendix to this lab-paper to see what part of this text file looks like.
   The names can be found in a file namn.txt (on the homepage).
   The following should be done: Read all of the names within your m-file,
   find all names that start with the letter J and put them in a bar diagram.

        The y-axis should show the number of boys, and the x-axis should show the names
        under each bar. Use the command textread for the reading of the file !
        Hint: help strmatch !
The following must be done in order to pass the laboratory.
You must hand in an m-file for each problem. The m-file should begin with 2 comment
lines stating when the file was created and by whom. Mail the m-files to me.
I prefer that you send them in a compressed format such as a zip file.

