Face recognition
KH Wong
face recogntiion 1
Overview
PCA: Principal component analysis
Application to face recognition: eigenfaces
Reference: [Bebis]
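PCA: Principal component analysis [1]
A method of data compression: use less data in y to represent x, but retain the important information.
$$x = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \;\rightarrow\; y = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}, \quad \text{where } N > K$$
E.g. N = 10 dimensions reduced to K = 5 dimensions, so recognition is easier.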
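Data reduction: N data → K data (N > K). How?
$$y = T x, \quad \text{where } T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N} \\ t_{21} & t_{22} & \cdots & t_{2N} \\ \vdots & \vdots & & \vdots \\ t_{K1} & t_{K2} & \cdots & t_{KN} \end{bmatrix} \quad (K \times N)$$
or, written out,
$$b_1 = t_{11} a_1 + t_{12} a_2 + \cdots + t_{1N} a_N$$
$$b_2 = t_{21} a_1 + t_{22} a_2 + \cdots + t_{2N} a_N$$
$$\vdots$$
$$b_K = t_{K1} a_1 + t_{K2} a_2 + \cdots + t_{KN} a_N$$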
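A minimal Python sketch of y = Tx (plain lists, standard library only). The function name `reduce_dim` and the example matrix T are hypothetical, chosen only for illustration. Each output b_k is the dot product of row k of T with x, exactly as in the written-out equations.

```python
def reduce_dim(T, x):
    """Map an N-vector x to a K-vector y = T x, where T is K x N (a list of K rows)."""
    if any(len(row) != len(x) for row in T):
        raise ValueError("every row of T must have N = len(x) entries")
    # b_k = t_k1*a_1 + t_k2*a_2 + ... + t_kN*a_N
    return [sum(t * a for t, a in zip(row, x)) for row in T]

# Hypothetical example: N = 3 data values reduced to K = 2
T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]
y = reduce_dim(T, x)   # [1.0, 5.0]
```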
Dimensionality basis
The higher-dimensional space has basis v_i, i = 1, 2, ..., N:
$$x_j = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N$$
v_1, v_2, ..., v_N are the basis of the N-dimensional space.
The lower-dimensional space has basis u_i, i = 1, 2, ..., K:
$$\hat{x}_j = b_1 u_1 + b_2 u_2 + \cdots + b_K u_K$$
u_1, u_2, ..., u_K are the basis of the K-dimensional space (N > K).
We can make the vector $\hat{x}_j \approx x_j$.
Note: if N = K then $\hat{x} = x$.
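A small Python sketch of the idea, assuming the u_i are orthonormal (as PCA's eigenvectors will be), so each coefficient is simply b_i = u_i · x. The 2-D basis (a 45-degree rotation) and the point x are made-up illustration values, not data from these slides.

```python
import math

def dot(p, q):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))

def approximate(x, basis, K):
    """x_hat = b_1 u_1 + ... + b_K u_K, keeping the first K orthonormal basis vectors."""
    x_hat = [0.0] * len(x)
    for u in basis[:K]:
        b = dot(u, x)  # coefficient of x along the unit vector u
        x_hat = [xh + b * ui for xh, ui in zip(x_hat, u)]
    return x_hat

s = 1 / math.sqrt(2)
basis = [[s, s], [-s, s]]            # illustrative u_1 (along x1 = x2) and u_2 (perpendicular)
x = [2.0, 2.2]
full = approximate(x, basis, K=2)    # K = N: x_hat equals x (up to rounding)
approx = approximate(x, basis, K=1)  # K < N: about [2.1, 2.1]
error = math.dist(x, approx)         # ||x - x_hat||, about 0.1414
```

With K = 2 the point is recovered exactly; with K = 1 it lands on the u_1 axis and the error is the length of the discarded u_2 component.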
Redundancy concept in space
[Figure: two scatter plots on axes x1, x2; the right-hand plot also shows an axis u1 along the data.]
Left: the data are spread all over the 2D space, so the redundancy of using 2 axes (x1, x2) is low.
Right: the data are spread along one line, so the redundancy of using 2 axes is high. We can consider using one axis (u1) along the spread of the data to represent it, although some error may be introduced.
The concept
[Figure: data points in the 2D space with basis vectors v1, v2 and a single axis u1 along the data.]
This point is $x_j$ according to the space with basis {v1, v2}, or $\hat{x}_j$ according to the space with basis {u1}.
In this diagram the data are not entirely random, so we can use a 1-dimensional space (basis u1) to represent (approximate) the data in the 2-dimensional space (basis v1, v2).
But some information is lost, because $x_j \neq \hat{x}_j$; the difference $x_j - \hat{x}_j$ is the error.
PCA enables the information lost to be minimized.
Use the eigenvalue method to find the best solution, i.e. the one that minimizes the error:
$$\text{minimize } \lVert x_j - \hat{x}_j \rVert$$
PCA Algorithm [smith 2002]
(The proof is in the Appendix.)
Step 1: Get the data.
Step 2: Subtract the mean.
Step 3: Find the covariance matrix C.
Step 4: Find the eigenvectors and eigenvalues of C.
Step 5: Choose the large feature components (the eigenvectors with the largest eigenvalues).
Some math background
Mean
Variance/ standard deviation
Covariance
Covariance matrix
Mean, variance (var) and standard deviation (std)
$$\text{mean}(x) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\text{var}(x) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\text{std}(x) = \sqrt{\text{var}(x)}$$
%matlab code
x=[2.5 0.5 2.2 1.9 3.1 2.3 2 1 1.5 1.1]'
mean_x=mean(x) %mean_x = 1.8100
var_x=var(x)   %var_x = 0.6166 (MATLAB var divides by n-1)
std_x=std(x)   %std_x = 0.7852
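For readers without MATLAB, the same three numbers can be reproduced with Python's standard `statistics` module (a sketch, not part of the original course code). `statistics.variance` and `statistics.stdev` use the n-1 denominator, matching MATLAB's `var` and `std` defaults.

```python
import statistics

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
mean_x = statistics.mean(x)     # 1.81
var_x = statistics.variance(x)  # sample variance (n-1 denominator), about 0.6166
std_x = statistics.stdev(x)     # square root of the sample variance, about 0.7852
```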
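Covariance [see wolfram mathworld]
"Covariance is a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction."
http://stattrek.com/matrix-algebra/variance.aspx
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$$
$$\text{covariance}(X, Y) = \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{N-1}$$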
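Covariance (variance-covariance) matrix
"Variance-Covariance Matrix: Variance and covariance are often displayed together in a variance-covariance matrix. The variances appear along the diagonal and covariances appear in the off-diagonal elements", http://stattrek.com/matrix-algebra/variance.aspx
Assume you have C sets of data $X_{c=1}, X_{c=2}, \ldots, X_{c=C}$. Each has N entries:
$$X_c = \begin{bmatrix} x_{c,1} \\ \vdots \\ x_{c,N} \end{bmatrix}, \quad \bar{X}_c = \text{mean}(X_c)$$
$$\text{covariance\_matrix} = \frac{1}{N-1}\begin{bmatrix}
\sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{1,i}-\bar{X}_1) & \sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{2,i}-\bar{X}_2) & \cdots & \sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{C,i}-\bar{X}_C) \\
\sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{1,i}-\bar{X}_1) & \sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{2,i}-\bar{X}_2) & \cdots & \sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{C,i}-\bar{X}_C) \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{N}(x_{C,i}-\bar{X}_C)(x_{1,i}-\bar{X}_1) & \sum_{i=1}^{N}(x_{C,i}-\bar{X}_C)(x_{2,i}-\bar{X}_2) & \cdots & \sum_{i=1}^{N}(x_{C,i}-\bar{X}_C)(x_{C,i}-\bar{X}_C)
\end{bmatrix}$$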
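A plain-Python sketch of the C x C covariance matrix above (the function name `covariance_matrix` is hypothetical). Unlike MATLAB's `cov`, which takes observations as rows, this sketch assumes `data` is a list of the C variables X_c, each holding N entries.

```python
def covariance_matrix(data):
    """data[c] is the c-th variable X_c (N entries). Returns the C x C matrix, N-1 denominator."""
    n = len(data[0])
    means = [sum(col) / n for col in data]
    dev = [[v - m for v in col] for col, m in zip(data, means)]  # x_{c,i} - mean(X_c)
    return [[sum(p * q for p, q in zip(dev[r], dev[c])) / (n - 1)
             for c in range(len(data))]
            for r in range(len(data))]

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
cov_x = covariance_matrix([x, y])   # roughly [[0.6166, 0.6154], [0.6154, 0.7166]]
```

For a single pair of variables, Python 3.10+ also offers `statistics.covariance(x, y)`, which uses the same N-1 denominator.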
N or N-1 as the denominator?
See http://stackoverflow.com/questions/3256798/why-does-matlab-native-function-cov-covariance-matrix-computation-use-a-differe
"n-1 is the correct denominator to use in computation of variance. It is what's known as Bessel's correction" (http://en.wikipedia.org/wiki/Bessel%27s_correction). Simply put, when the mean is itself estimated from the same sample, dividing by n-1 gives an unbiased estimate of the population variance, whereas dividing by n underestimates it on average.
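A quick plain-Python sketch of the two denominators on the same x data, with the corresponding MATLAB calls noted in the comments.

```python
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
n = len(x)
mean_x = sum(x) / n
ss = sum((v - mean_x) ** 2 for v in x)   # sum of squared deviations, about 5.549

var_n_minus_1 = ss / (n - 1)   # Bessel's correction: MATLAB var(x), cov(x); about 0.6166
var_n = ss / n                 # MATLAB var(x,1), cov(x,1); about 0.5549
```

Python's `statistics.variance` and `statistics.pvariance` provide the same pair (n-1 and n respectively).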
Example
Step 2: X_data_adj = X = Xo - mean(Xo) =
[ 0.6900  0.4900
 -1.3100 -1.2100
  0.3900  0.9900
  0.0900  0.2900
  1.2900  1.0900
  0.4900  0.7900
  0.1900 -0.3100
 -0.8100 -0.8100
 -0.3100 -0.3100
 -0.7100 -1.0100]
(column 1 is $X_{c=1}$, column 2 is $X_{c=2}$; N = 10)
$$\text{covariance\_matrix} = \begin{bmatrix} \sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{1,i}-\bar{X}_1)/(N-1) & \sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{2,i}-\bar{X}_2)/(N-1) \\ \sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{1,i}-\bar{X}_1)/(N-1) & \sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{2,i}-\bar{X}_2)/(N-1) \end{bmatrix}$$
Entry (1,1):
$$\sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{1,i}-\bar{X}_1)/(N-1)$$
= (0.69*0.69+(-1.31)*(-1.31)+0.39*0.39+0.09*0.09+1.29*1.29+0.49*0.49+0.19*0.19+(-0.81)*(-0.81)+(-0.31)*(-0.31)+(-0.71)*(-0.71))/(10-1)
= 0.6166
Example
Using the same mean-adjusted data X = Xo - mean(Xo), entry (1,2) of the covariance matrix is
$$\sum_{i=1}^{N}(x_{1,i}-\bar{X}_1)(x_{2,i}-\bar{X}_2)/(N-1)$$
= (0.69*0.49+(-1.31)*(-1.21)+0.39*0.99+0.09*0.29+1.29*1.09+0.49*0.79+0.19*(-0.31)+(-0.81)*(-0.81)+(-0.31)*(-0.31)+(-0.71)*(-1.01))/(10-1)
= 0.6154
Example
Using the same mean-adjusted data X = Xo - mean(Xo), entry (2,2) of the covariance matrix is
$$\sum_{i=1}^{N}(x_{2,i}-\bar{X}_2)(x_{2,i}-\bar{X}_2)/(N-1)$$
= (0.49*0.49+(-1.21)*(-1.21)+0.99*0.99+0.29*0.29+1.09*1.09+0.79*0.79+(-0.31)*(-0.31)+(-0.81)*(-0.81)+(-0.31)*(-0.31)+(-1.01)*(-1.01))/(10-1)
= 0.7166
Hence (entry (2,1) equals entry (1,2) by symmetry), the covariance matrix of X is
cov_x =
[0.6166 0.6154
 0.6154 0.7166]
Eigenvector of a square matrix
(Skip the following 3 slides if you are familiar with eigenvalues and eigenvectors.)
$$A X = \lambda X$$
covariance_matrix of X = cov_x = A =
[0.6166 0.6154
 0.6154 0.7166]
Because A is a 2x2 symmetric matrix, it has 2 real eigenvalues and 2 orthogonal eigenvectors (A has rank 2, so both eigenvalues are nonzero).
In MATLAB:
[eigvec,eigval]=eig(A)
eigvect of cov_x =
[-0.7352 0.6779
  0.6779 0.7352]
eigval of cov_x =
[0.0492 0
 0      1.2840]
So
eigenvalue 1 = 0.0492, its eigenvector is [-0.7352 0.6779]^T
eigenvalue 2 = 1.2840, its eigenvector is [0.6779 0.7352]^T
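To find eigenvalues
$$A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
$$a x_1 + b x_2 = \lambda x_1 \;\Rightarrow\; (a - \lambda) x_1 = -b x_2 \qquad \text{(i)}$$
$$c x_1 + d x_2 = \lambda x_2 \;\Rightarrow\; c x_1 = (\lambda - d) x_2 \qquad \text{(ii)}$$
$$\text{(i)/(ii)}: \quad \frac{a - \lambda}{c} = \frac{-b}{\lambda - d} \;\Rightarrow\; a\lambda - ad - \lambda^2 + \lambda d = -bc$$
$$\lambda^2 - (d + a)\lambda + (ad - bc) = 0$$
The solution to this quadratic equation is
$$\lambda = \frac{(d + a) \pm \sqrt{(d + a)^2 - 4(ad - bc)}}{2}$$
So if
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 0.6166 & 0.6154 \\ 0.6154 & 0.7166 \end{bmatrix}$$
then
$$\lambda = \frac{(0.7166 + 0.6166) \pm \sqrt{(0.7166 + 0.6166)^2 - 4(0.6166 \times 0.7166 - 0.6154 \times 0.6154)}}{2}$$
and the eigenvalues are λ1 = 0.0492, λ2 = 1.2840.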
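The quadratic-formula route can be written as a short Python function (a sketch; `eig2x2` is a hypothetical name). It assumes real eigenvalues, which always holds for a symmetric matrix such as a covariance matrix.

```python
import math

def eig2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] as (smaller, larger), from
    lambda^2 - (d + a) lambda + (ad - bc) = 0 and the quadratic formula."""
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:  # cannot happen for a symmetric matrix such as a covariance matrix
        raise ValueError("complex eigenvalues")
    root = math.sqrt(disc)
    return (trace - root) / 2, (trace + root) / 2

# cov_x from the worked example
lam1, lam2 = eig2x2(0.6166, 0.6154, 0.6154, 0.7166)   # about 0.0492 and 1.2840
```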
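Find eigenvectors from eigenvalues
The eigenvalues are λ1 = 0.0492, λ2 = 1.2840.
The eigenvector $[x_1\ x_2]^T$ for λ1 satisfies
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda_1 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \begin{bmatrix} 0.6166 & 0.6154 \\ 0.6154 & 0.7166 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0.0492 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
Solve the above equation. Its two rows are dependent, so they fix only the direction of the vector; normalizing it to unit length gives (up to sign)
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -0.7352 \\ 0.6779 \end{bmatrix}$$
So eigenvalue 1 = 0.0492, its eigenvector is [-0.7352 0.6779]^T.
Similarly, the eigenvector for λ2 is
$$\begin{bmatrix} x'_1 \\ x'_2 \end{bmatrix} = \begin{bmatrix} 0.6779 \\ 0.7352 \end{bmatrix}$$
so eigenvalue 2 = 1.2840, its eigenvector is [0.6779 0.7352]^T.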
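Solving the first row, (a - λ)x1 + b x2 = 0, gives a direction proportional to (b, λ - a); normalizing it yields the unit eigenvector. A Python sketch, assuming a symmetric 2x2 matrix with b ≠ 0 (`unit_eigenvector` is a hypothetical helper name):

```python
import math

def unit_eigenvector(a, b, lam):
    """Unit eigenvector of a symmetric 2x2 matrix [[a, b], [b, d]] for eigenvalue lam.
    Row 1 gives (a - lam) x1 + b x2 = 0, so (x1, x2) is proportional to (b, lam - a).
    Assumes b != 0; the sign of the result is arbitrary."""
    x1, x2 = b, lam - a
    norm = math.hypot(x1, x2)
    return x1 / norm, x2 / norm

v1 = unit_eigenvector(0.6166, 0.6154, 0.0492)   # about (0.7352, -0.6779)
v2 = unit_eigenvector(0.6166, 0.6154, 1.2840)   # about (0.6779, 0.7352)
```

Here v1 comes out as the negative of MATLAB's [-0.7352 0.6779]^T; both are valid, since an eigenvector is defined only up to sign.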
PCA example
Step 1: Original data
Xo = [
2.5000 2.4000
0.5000 0.7000
2.2000 2.9000
1.9000 2.2000
3.1000 3.0000
2.3000 2.7000
2.0000 1.6000
1.0000 1.1000
1.5000 1.6000
1.1000 0.9000]
Mean = [1.81 1.91]
Step 2: X_data_adj = X = Xo - mean(Xo) =
[ 0.6900  0.4900
 -1.3100 -1.2100
  0.3900  0.9900
  0.0900  0.2900
  1.2900  1.0900
  0.4900  0.7900
  0.1900 -0.3100
 -0.8100 -0.8100
 -0.3100 -0.3100
 -0.7100 -1.0100]
The data are biased in this 2D space (not random), so PCA for data reduction will work. We will show that X can be approximated in a 1-D space with little data lost.
Step 3: Covariance_matrix of X = cov_x =
[0.6166 0.6154
 0.6154 0.7166]
Step 4: eigvect of cov_x =
[-0.7352 0.6779
  0.6779 0.7352]
eigval of cov_x =
[0.0492 0
 0      1.2840]
The first column of eigvect is the eigenvector with the small eigenvalue (0.0492); the second column is the eigenvector with the large eigenvalue (1.2840).
Step 5: Choose the eigenvector (large feature component) with the large eigenvalue for the transformation that reduces the data
Covariance matrix of X: cov_x =
[0.6166 0.6154
 0.6154 0.7166]
eigvect of cov_x =
[-0.7352 0.6779
  0.6779 0.7352]
(first column: eigenvector with the small eigenvalue; second column: eigenvector with the large eigenvalue)
eigval of cov_x =
[0.0492 0
 0      1.2840]
$$Y = P X$$
where X is the original mean-adjusted data, transposed (one data point per column), Y is the new data, and each row p_i of P is an eigenvector of cov(X). You have two choices:
Fully reconstructed case (for comparison only, no data lost):
$$P_{fully\_rec} = \begin{bmatrix} (\text{eigenvector with the biggest eigenvalue of cov}(X))^T \\ (\text{eigenvector with the second biggest eigenvalue of cov}(X))^T \end{bmatrix} = \begin{bmatrix} 0.6779 & 0.7352 \\ -0.7352 & 0.6779 \end{bmatrix}$$
Approximate transform (for data reduction; the PCA algorithm will select this):
$$P_{approx\_rec} = \begin{bmatrix} (\text{eigenvector with the biggest eigenvalue of cov}(X))^T \\ 0 \end{bmatrix} = \begin{bmatrix} 0.6779 & 0.7352 \\ 0 & 0 \end{bmatrix}$$
Applying Y = PX with P_fully_rec and P_approx_rec from Step 5 (Y is shown transposed, one row per data point):
Y_full = P_fully_rec X           Y_approx = P_approx_rec X
(uses 2 eigenvectors,            (uses 1 eigenvector,
 both columns are filled)         the second column is 0)
 0.8280 -0.1751                   0.8280  0
-1.7776  0.1429                  -1.7776  0
 0.9922  0.3844                   0.9922  0
 0.2742  0.1304                   0.2742  0
 1.6758 -0.2095                   1.6758  0
 0.9129  0.1753                   0.9129  0
-0.0991 -0.3498                  -0.0991  0
-1.1446  0.0464                  -1.1446  0
-0.4380  0.0178                  -0.4380  0
-1.2238 -0.1627                  -1.2238  0
{No data lost, for comparison only}   {Data reduction 2D → 1D, some data lost}
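The two transforms can be checked with a few lines of plain Python (a sketch; P's rows are the rounded eigenvectors from the slides, so the results agree to about 4 decimal places). The helper name `transform` is hypothetical.

```python
# Rows of P: eigenvectors from the slides, biggest eigenvalue first
p1 = (0.6779, 0.7352)     # eigenvalue 1.2840
p2 = (-0.7352, 0.6779)    # eigenvalue 0.0492

# Mean-adjusted data X, one (x, y) pair per data point
X = [(0.69, 0.49), (-1.31, -1.21), (0.39, 0.99), (0.09, 0.29), (1.29, 1.09),
     (0.49, 0.79), (0.19, -0.31), (-0.81, -0.81), (-0.31, -0.31), (-0.71, -1.01)]

def transform(P, points):
    """Y = P X applied point by point; returns one row of Y per data point (Y transposed)."""
    return [tuple(sum(p * x for p, x in zip(row, pt)) for row in P) for pt in points]

P_fully_rec = [p1, p2]
P_approx_rec = [p1, (0.0, 0.0)]        # drop the small-eigenvalue component
Y_full = transform(P_fully_rec, X)     # first row about (0.8280, -0.1751)
Y_approx = transform(P_approx_rec, X)  # first row about (0.8280, 0.0)
```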
Recovering X from Y
$$Y = P X \;\Rightarrow\; X = P^{-1} Y = P^T Y, \quad \text{because } P^{-1} = P^T$$
(P is orthogonal: its rows are orthonormal eigenvectors of the symmetric matrix cov(X). P_approx_rec is not invertible, so with it $P^T Y$ gives only an approximation of X.)
reconstructed_x_full = P_fully_rec^T * Y_full
reconstructed_x_approx = P_approx_rec^T * Y_approx
(Add the mean back to return to the original coordinates.)
[Figure legend: 'O' = original data; '+' = recovered using all eigenvectors, the same as the original, so no information is lost; green squares = recovered using only the eigenvector with the biggest eigenvalue (the principal component), so some information is lost. The eigenvectors with the large and the small eigenvalues are also drawn.]
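A sketch of the reconstruction in plain Python, using the first column of Y_approx, the first row of Y_full and the mean (1.81, 1.91) from the slides. Because P_approx_rec^T Y_approx keeps only the principal-component term, each recovered point is mean + y1 · p1.

```python
p1 = (0.6779, 0.7352)     # eigenvector with the biggest eigenvalue
p2 = (-0.7352, 0.6779)    # eigenvector with the small eigenvalue
mean = (1.81, 1.91)       # mean of the original data Xo
# First column of Y_approx from the slides: the 1-D representation of each point
y1 = [0.8280, -1.7776, 0.9922, 0.2742, 1.6758, 0.9129, -0.0991, -1.1446, -0.4380, -1.2238]

# X ~ P_approx_rec^T Y_approx = y1 * p1, then add the mean back
recovered = [(mean[0] + t * p1[0], mean[1] + t * p1[1]) for t in y1]

# Full reconstruction of the first point from both components of Y_full
full_first = (mean[0] + 0.8280 * p1[0] - 0.1751 * p2[0],
              mean[1] + 0.8280 * p1[1] - 0.1751 * p2[1])
```

The first point (2.5, 2.4) comes back as about (2.371, 2.519) with one eigenvector, and as about (2.5, 2.4) when both components are used.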
Exercise
?
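PCA algorithm
Input data $x_1, x_2, \ldots, x_M$ are N×1 vectors.
Step 1: $$\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$$
Step 2: subtract the mean: $$\Phi_{i\,(N\times1)} = x_{i\,(N\times1)} - \bar{x}_{(N\times1)}$$
Step 3: form the matrix $$A_{(N\times M)} = [\Phi_1\ \Phi_2\ \cdots\ \Phi_M]$$ and find the covariance matrix
$$C_{(N\times N)} = \frac{1}{M}\sum_{j=1}^{M}\Phi_j\Phi_j^T = \frac{1}{M}A_{(N\times M)}A^T_{(M\times N)}$$
Step 4: find the eigenvalues of C: $\lambda_1 > \lambda_2 > \cdots > \lambda_N$
Step 5: find the eigenvectors of C: $u_1, u_2, \ldots, u_N$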
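A sketch of Steps 1-5 for general N-dimensional data in plain Python. Assumptions: the data are a list of M equal-length lists; the eigen-decomposition uses the cyclic Jacobi rotation method for symmetric matrices (a standard technique swapped in here in place of MATLAB's `eig`), with an absolute stopping tolerance suited to small, well-scaled examples; and the covariance follows this slide's 1/M convention, so on the 2-D example its eigenvalues are 9/10 of the N-1 values quoted earlier, while the eigenvectors are unchanged.

```python
import math

def pca(xs):
    """PCA Steps 1-5 for M data vectors xs[i] of length N.
    Returns (mean, eigenvalues, eigenvectors), largest eigenvalue first.
    Eigenvectors are unit length; their sign is arbitrary."""
    M, N = len(xs), len(xs[0])
    # Step 1: mean vector
    mean = [sum(x[k] for x in xs) / M for k in range(N)]
    # Step 2: Phi_i = x_i - mean (the columns of A)
    phis = [[x[k] - mean[k] for k in range(N)] for x in xs]
    # Step 3: C = (1/M) sum_j Phi_j Phi_j^T = (1/M) A A^T
    C = [[sum(f[r] * f[c] for f in phis) / M for c in range(N)] for r in range(N)]
    # Steps 4-5: Jacobi rotations J make C diagonal; V = product of all J holds the eigenvectors
    V = [[1.0 if r == c else 0.0 for c in range(N)] for r in range(N)]
    for _sweep in range(50):
        off = sum(C[r][c] ** 2 for r in range(N) for c in range(N) if r != c)
        if off < 1e-22:
            break
        for p in range(N - 1):
            for q in range(p + 1, N):
                if C[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new C[p][q] becomes zero
                theta = (C[q][q] - C[p][p]) / (2.0 * C[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(N):  # C <- C J and V <- V J (columns p and q)
                    ckp, ckq = C[k][p], C[k][q]
                    C[k][p], C[k][q] = cs * ckp - sn * ckq, sn * ckp + cs * ckq
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = cs * vkp - sn * vkq, sn * vkp + cs * vkq
                for k in range(N):  # C <- J^T C (rows p and q)
                    cpk, cqk = C[p][k], C[q][k]
                    C[p][k], C[q][k] = cs * cpk - sn * cqk, sn * cpk + cs * cqk
    order = sorted(range(N), key=lambda i: C[i][i], reverse=True)
    eigenvalues = [C[i][i] for i in order]
    eigenvectors = [[V[k][i] for k in range(N)] for i in order]  # u_1, u_2, ..., u_N
    return mean, eigenvalues, eigenvectors

xs = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
      [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
mean, lams, us = pca(xs)   # us[0] is about +/-(0.6779, 0.7352)
```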
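Continue
Since C is symmetric, $u_1, u_2, \ldots, u_N$ form a basis (i.e. any vector x, or more precisely $x - \bar{x}$, can be written as a linear combination of the eigenvectors):
$$x - \bar{x} = b_1 u_1 + b_2 u_2 + \cdots + b_N u_N = \sum_{i=1}^{N} b_i u_i$$
Step 6 (dimensionality reduction step): keep only the terms corresponding to the K largest eigenvalues:
$$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i, \quad \text{where } K < N$$
The representation of $\hat{x} - \bar{x}$ in the basis $u_1, u_2, \ldots, u_K$ is thus
$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}$$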
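PCA
The space dimension N is reduced to dimension K.
The $u_i$ are normalized unit vectors.
$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}_{(K\times1)} = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_K^T \end{bmatrix}_{(K\times N)} (x - \bar{x})_{(N\times1)} = U^T (x - \bar{x})$$
where $U = [u_1\ u_2\ \cdots\ u_K]$ is N×K.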
Geometric interpretation
PCA transforms the coordinates so that the new axes lie along the spread of the data. Here the axes are u1 and u2.
The new axes are the eigenvectors of the covariance matrix; those corresponding to the largest eigenvalues are kept.
The magnitude of each eigenvalue corresponds to the variance of the data along its eigenvector direction.
[Figure: data in the x1-x2 plane around the mean $\bar{x}$, with the new axes u1 and u2.]
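Choose K
Choose the smallest K such that
$$\frac{\sum_{i=1}^{K}\lambda_i}{\sum_{i=1}^{N}\lambda_i} > \text{Threshold (e.g. 0.95)}$$
A threshold of 0.95 preserves 95% of the information. If K = N, 100% is preserved (no data reduction).
The error due to the reduction is
$$e = \frac{1}{2}\sum_{i=K+1}^{N}\lambda_i$$
Data standardization is needed before you use PCA:
$$x_i \leftarrow \frac{x_i - \text{mean}(x_i)}{\text{std}(x_i)}, \quad \text{std} = \text{standard deviation}$$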
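A sketch of the K-selection rule, the reduction error and the standardization step in plain Python. The function names are hypothetical; the strict ">" follows the slide.

```python
import statistics

def choose_k(eigenvalues, threshold=0.95):
    """Smallest K whose K largest eigenvalues keep more than `threshold` of the total."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    kept = 0.0
    for k, lam in enumerate(lams, start=1):
        kept += lam
        if kept / total > threshold:
            return k
    return len(lams)   # K = N: everything kept, no reduction

def reduction_error(eigenvalues, k):
    """e = 1/2 * (sum of the discarded eigenvalues lambda_{K+1} ... lambda_N)."""
    return 0.5 * sum(sorted(eigenvalues, reverse=True)[k:])

def standardize(values):
    """x <- (x - mean(x)) / std(x) for one variable (sample standard deviation)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

K = choose_k([0.0492, 1.2840])   # 1.2840 / 1.3332 is about 0.963 > 0.95, so K = 1
```

For the 2-D example, K = 1 already passes the 0.95 threshold, which is why one eigenvector suffices there.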
Application to face recognition
Step 1: obtain the training face images I1, I2, ..., IM (centered and of the same size).
Step 2: represent each image by a vector: each image (an N×N matrix) becomes an N²×1 vector.
E.g. for N = 100, each 100×100 image becomes a vector of size N²×1 = 10000×1.
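Continue
Note: C (size N²×N²) is too large to be calculated; if N = 100, C is 10000×10000.
Input data $\Gamma_1, \Gamma_2, \ldots, \Gamma_M$ are N²×1 vectors.
Step 3: find the average face vector
$$\Psi_{(N^2\times1)} = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i$$
Step 4: subtract the mean face:
$$\Phi_{i\,(N^2\times1)} = \Gamma_{i\,(N^2\times1)} - \Psi_{(N^2\times1)}$$
Step 5: form the matrix
$$A_{(N^2\times M)} = [\Phi_1\ \Phi_2\ \cdots\ \Phi_M]$$
and find the covariance matrix C of A:
$$C_{(N^2\times N^2)} = \frac{1}{M}\sum_{j=1}^{M}\Phi_j\Phi_j^T = \frac{1}{M}A_{(N^2\times M)}A^T_{(M\times N^2)}$$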
Continue
Step 6: find the eigenvectors $u_i$ of $AA^T$ (N²×N²):
$$AA^T u_i = \lambda_i u_i \qquad (\text{e.g. size}(A) = N^2\times M = 10000\times300)$$
$C = \frac{1}{M}AA^T$ is too large to be calculated in limited time: the size of C is N²×N² = 10000×10000 if N = 100. (C and $AA^T$ have the same eigenvectors; their eigenvalues differ only by the factor 1/M.)
---- The trick is as follows: ----
Step 6.1: find $A^TA$ (size M×M) instead of $C = AA^T$ (size N²×N²). E.g. with M = 300 training images, $A^TA$ is 300×300.
Step 6.2: find $v_i$, the eigenvectors of $A^TA$.
What is the relation between $u_i$ = eigenvector of $AA^T$ (N²×N² = 10000×10000) and $v_i$ = eigenvector of $A^TA$ (M×M = 300×300)?
Continue
Answer (relation of $u_i$ and $v_i$), an important result:
$$AA^T u_i = \lambda_i u_i \qquad \text{(i)}$$
$$A^TA v_i = \mu_i v_i \qquad \text{(ii)}$$
From (ii): $\mu_i$ is an eigenvalue (a scalar) of $A^TA$. Multiply each side of (ii) by A:
$$AA^TA v_i = A\mu_i v_i = \mu_i A v_i, \quad \text{since } \mu_i \text{ is a scalar}$$
$$AA^T (A v_i) = \mu_i (A v_i) \qquad \text{(iii)}$$
Compare (iii) with (i):
$$u_i = A v_i \quad \text{and} \quad \lambda_i = \mu_i$$
So $AA^T$ and $A^TA$ have the same (nonzero) eigenvalues, and their eigenvectors are related by $u_i = A v_i$.
Important results
(e.g. N = 100, M = 300)
$AA^T$ (size 10000×10000) has N² = 10000 eigenvectors and eigenvalues.
$A^TA$ (size 300×300) has M = 300 eigenvectors and eigenvalues.
The M eigenvalues of $A^TA$ are the same as the M largest eigenvalues of $AA^T$ (the remaining eigenvalues of $AA^T$ are zero, since its rank is at most M).
Important result:
$$\lambda_i = \mu_i, \qquad u_i = A v_i$$
Continue
Step 6.3: find the M = 300 eigenvectors $v_i$ and eigenvalues $\mu_i$ (i = 1, 2, ..., M) of $A^TA$ (M×M = 300×300), and from them the best M = 300 eigenvectors $u_i$ of $AA^T$.
The first M eigenvalues $\lambda_i$ of $AA^T$ and the eigenvalues $\mu_i$ of $A^TA$ are the same:
$$\mu_i = \lambda_i, \quad i = 1, 2, \ldots, M$$
Also
$$u_i = A v_i$$
Note: normalize $u_i$ so that $\lVert u_i \rVert = 1$.
Step 7: only the K (e.g. 5) eigenvectors with the largest eigenvalues are useful in most cases.
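A small numerical check of the trick in plain Python. The 4×2 matrix A is a made-up toy standing in for N² = 4 pixels and M = 2 images (arbitrary values, not real face data). The sketch finds the eigenvectors v of the small matrix AᵀA in closed form, maps them with u = Av, normalizes them (Step 6.3), and confirms that AAᵀu = μu.

```python
import math

def matmul(P, Q):
    """Plain-list matrix product P Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

# Hypothetical toy data: N^2 = 4 pixels (rows), M = 2 images (columns)
A = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, 1.0],
     [1.0, 1.0]]
AtA = matmul(transpose(A), A)   # M x M = 2 x 2: cheap to decompose
AAt = matmul(A, transpose(A))   # N^2 x N^2 = 4 x 4: the matrix the trick avoids decomposing

(a, b), (_, d) = AtA
root = math.sqrt((a + d) ** 2 - 4 * (a * d - b * b))
results = []
for mu in ((a + d + root) / 2, (a + d - root) / 2):   # eigenvalues of A^T A, largest first
    v = [b, mu - a]                                    # eigenvector of A^T A (b != 0 here)
    u = [row[0] * v[0] + row[1] * v[1] for row in A]   # u = A v
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]                          # Step 6.3: make ||u|| = 1
    AAt_u = [sum(AAt[i][k] * u[k] for k in range(len(u))) for i in range(len(u))]
    residual = max(abs(x - mu * y) for x, y in zip(AAt_u, u))  # ~0 when A A^T u = mu u
    results.append((mu, u, residual))
```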
Training faces
Find the eigenvectors u1, u2, ..., uK with the largest eigenvalues
Eigenfaces
Find the K (e.g. K = 5) face images, called eigenfaces, corresponding to the first K eigenvectors (those with the largest eigenvalues).
Each face (with the mean subtracted) can be represented by a combination of eigenfaces:
$$\hat{\Gamma}_i - \Psi = \hat{\Phi}_i = \sum_{j=1}^{K} w_j u_j, \qquad w_j = u_j^T \Phi_i, \qquad u_j = \text{eigenfaces}$$
http://onionesquereality.files.wordpress.com/2009/02/eigenfaces-reconstruction.jpg
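A sketch of projecting a face onto the eigenfaces and reconstructing it, in plain Python. The 4-pixel mean face, the two orthonormal eigenfaces and the input face are made-up toy values (real eigenfaces would come from Step 6.3), and the helper names are hypothetical.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def face_weights(gamma, psi, eigenfaces):
    """w_j = u_j^T Phi, where Phi = Gamma - Psi and the u_j are unit-length eigenfaces."""
    phi = [g - m for g, m in zip(gamma, psi)]
    return [dot(u, phi) for u in eigenfaces]

def reconstruct(weights, psi, eigenfaces):
    """Gamma_hat = Psi + sum_j w_j u_j."""
    out = list(psi)
    for w, u in zip(weights, eigenfaces):
        out = [o + w * x for o, x in zip(out, u)]
    return out

# Toy example: 4-pixel "images", K = 2 orthonormal eigenfaces (made-up values)
psi = [0.5, 0.5, 0.5, 0.5]                   # mean face
eigenfaces = [[0.5, 0.5, 0.5, 0.5],          # u_1
              [0.5, -0.5, 0.5, -0.5]]        # u_2
gamma = [1.0, 0.0, 1.0, 0.0]                 # a face to encode
w = face_weights(gamma, psi, eigenfaces)     # [0.0, 1.0]
gamma_hat = reconstruct(w, psi, eigenfaces)  # [1.0, 0.0, 1.0, 0.0]
```

Each face is thus stored as just K weights; a face outside the span of the K eigenfaces would only be approximated by `reconstruct`.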
Reference
[Bebis] Face Recognition Using Eigenfaces www.cse.unr.edu/~bebis/CS485/Lectures/Eigenfaces.ppt
[Turk 91] Turk and Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience 3(1), pp. 71-86, 1991.
[smith 2002] LI Smith, "A tutorial on Principal Components Analysis",
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[Shlens 2005] Jonathon Shlens, "A tutorial on Principal Component Analysis",
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[AI Access] http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_covariance_matrix.htm
Appendix: pca_test1.m
%pca_test1.m, example using data in [smith 2002] LI Smith,
%"A tutorial on Principal Components Analysis", by khwong
%www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
function pca_test1
%---------Step1--get some data------------------
x=[2.5 0.5 2.2 1.9 3.1 2.3 2 1 1.5 1.1]' %column vector
y=[2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9]' %column vector
N=length(x)
%---------Step2--subtract the mean------------------
mean_x=mean(x), mean_y=mean(y)
x_adj=x-mean_x, y_adj=y-mean_y %data adjust for x,y
%---------Step3---calculate the covariance matrix----------------
data_adj=[x_adj,y_adj]
cov_x=cov(data_adj)
%---------Step4---calculate the eigenvectors and eigenvalues of cov_x-------------
[eigvect,eigval]=eig(cov_x) %here the smaller eigenvalue comes first, as in the slides
eigval_1=eigval(1,1), eigval_2=eigval(2,2)
eigvect_1=eigvect(:,1), eigvect_2=eigvect(:,2)
%each eigenvector has length 1, i.e. it is a unit vector
eigvector1_length=sqrt(eigvect_1(1)^2+eigvect_1(2)^2)
eigvector2_length=sqrt(eigvect_2(1)^2+eigvect_2(2)^2)
%---------Step5--select feature: big eigenvector (big eigenvalue) first---------
P_full=[eigvect_2';eigvect_1'] %1st eigenvector is small, 2nd is large
P_approx=[eigvect_2';[0,0]] %keep the (2nd) big eigenvector only, small one gone
%---------plot: same diagram as in fig. 3.1 of [smith 2002]----------
figure(1), clf, hold on
plot([-1,4],[0,0],'-'), plot([0,0],[-1,4],'-') %x and y axes
axis([-1 4 -1 4])
title('PCA demo: eigenvector 2 (red) is much longer (bigger eigenvalue), so keep it')
%eigenvectors, each drawn with length proportional to its eigenvalue
plot([0,eigvect(1,1)*eigval_1],[0,eigvect(2,1)*eigval_1],'b-') %1st vector
plot([0,eigvect(1,2)*eigval_2],[0,eigvect(2,2)*eigval_2],'r-') %2nd vector
plot(x,y,'bo') %original data
%%full reconstruction %%%%%%%%%%%%%%%%%%%%%%%%
final_data_full=P_full*data_adj'
recovered_data_full=P_full'*final_data_full+repmat([mean_x;mean_y],1,N)
plot(recovered_data_full(1,:),recovered_data_full(2,:),'r+')
%%approximate reconstruction %%%%%%%%%%%%%%%%
final_data_approx=P_approx*data_adj' %second row is all zeros
recovered_data_approx=P_approx'*final_data_approx+repmat([mean_x;mean_y],1,N)
plot(recovered_data_approx(1,:),recovered_data_approx(2,:),'gs')
A short proof of PCA (principal component analysis)
This proof is not rigorous; a detailed proof can be found in [Shlens 2005].
Objective:
We have an input data set X with zero mean and would like to transform X to Y (Y = PX, where P is the transformation) in a coordinate system in which Y varies more along the principal (or major) components than along the other components.
E.g. X is in a 2-dimensional space (x1, x2); after transforming X into Y (coordinates y1, y2), the data in Y vary mainly along the y1-axis and little along the y2-axis.
That is to say, we want to find P so that the covariance of Y, cov_Y = [1/(n-1)]YY^T, is a diagonal matrix, because a diagonal matrix has nonzero elements only on its diagonal, which shows that the coordinates of Y are uncorrelated. n is used for normalization.
Continue
Given X (with zero mean), we want to show that for Y = PX, if each row $p_i$ of P is an eigenvector of $XX^T$, then the covariance of Y is a diagonal matrix.
Proof:
For the covariance matrix cov_Y of Y, with n a normalization factor (the number of data points, i.e. columns of X or Y), put Y = PX:
$$\text{cov}_Y = \frac{1}{n-1}YY^T = \frac{1}{n-1}(PX)(PX)^T = \frac{1}{n-1}PXX^TP^T = \frac{1}{n-1}P(XX^T)P^T \qquad (1)$$
From theorems 3 and 4 in the appendix of [Shlens 2005], the symmetric matrix $XX^T$ can be written as $XX^T = P^TDP$, where each row $p_i$ of P is an eigenvector of $XX^T$ and D is a diagonal matrix. Put this in (1):
$$\text{cov}_Y = \frac{1}{n-1}P(P^TDP)P^T = \frac{1}{n-1}(PP^T)D(PP^T) = \frac{1}{n-1}D$$
because the rows of P are orthonormal, so $PP^T = I$. Hence the covariance of Y is a diagonal matrix, meaning that the coordinates in Y are uncorrelated.
We have shown that if each row $p_i$ of P is an eigenvector of $XX^T$, the covariance of Y is a diagonal matrix.
(Done!)
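A numerical check of the result on the 2-D example (plain Python sketch). It uses the rounded eigenvectors from the slides as the rows of P; they are eigenvectors of cov(X) = XX^T/(n-1), which has the same eigenvectors as XX^T. [1/(n-1)]YY^T then comes out diagonal up to rounding, with about 1.2840 and 0.049 on its diagonal.

```python
# Mean-adjusted data X from the PCA example: 2 rows (x and y), n = 10 columns
X = [[0.69, -1.31, 0.39, 0.09, 1.29, 0.49, 0.19, -0.81, -0.31, -0.71],
     [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01]]
P = [[0.6779, 0.7352],     # rows p_i: unit eigenvectors from the slides
     [-0.7352, 0.6779]]
n = len(X[0])

def matmul(A, B):
    """Plain-list matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

Y = matmul(P, X)                                              # Y = P X
Yt = [list(col) for col in zip(*Y)]                           # Y^T
cov_Y = [[v / (n - 1) for v in row] for row in matmul(Y, Yt)]  # [1/(n-1)] Y Y^T
```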
Appendix
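Covariance $\Sigma_{ij}$ and the covariance matrix
(reference: http://en.wikipedia.org/wiki/Covariance_matrix)
$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}$$
$$\Sigma_{ij} = \text{cov}(X_i, X_j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right]$$
where $\mu_i = E(X_i)$ is the expected value (mean) of $X_i$.
$$\Sigma = \begin{bmatrix}
E[(X_1-\mu_1)(X_1-\mu_1)] & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_n-\mu_n)] \\
E[(X_2-\mu_2)(X_1-\mu_1)] & E[(X_2-\mu_2)(X_2-\mu_2)] & \cdots & E[(X_2-\mu_2)(X_n-\mu_n)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(X_n-\mu_n)(X_1-\mu_1)] & E[(X_n-\mu_n)(X_2-\mu_2)] & \cdots & E[(X_n-\mu_n)(X_n-\mu_n)]
\end{bmatrix}$$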
Appendix
test_cov.m
%test_cov.m: compare a hand-computed covariance matrix with MATLAB's cov
clear
x=[2.5 0.5 2.2 1.9 3.1 2.3 2 1 1.5 1.1]'
y=[2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9]'
x_adj=x-mean(x), y_adj=y-mean(y) %subtract the means
cov_xy=0, cov_xx=0, cov_yy=0, N=length(x)
for i=1:N %accumulate the sums of products of deviations
    cov_xx=x_adj(i)*x_adj(i)+cov_xx;
    cov_xy=x_adj(i)*y_adj(i)+cov_xy;
    cov_yy=y_adj(i)*y_adj(i)+cov_yy;
end
'self cal'
c_N_minus_1_as_denom=[cov_xx/(N-1) cov_xy/(N-1); cov_xy/(N-1) cov_yy/(N-1)]
c_N_as_denom=[cov_xx/N cov_xy/N; cov_xy/N cov_yy/N]
'using matlab cov function'
matlab_cov_xy=cov(x,y) %N-1 as denominator, same as c_N_minus_1_as_denom
matlab_cov_xy1=cov(x,y,1) %N as denominator, same as c_N_as_denom
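Appendix: Eigenvalue and eigenvector
$$A_{N\times N}\,X_{N\times1} = \lambda_{1\times1}\,X_{N\times1}$$
A is a square matrix, X is a vector, and λ is a scalar.
A acts on X as a transformation that does not change the direction of X but only scales its length by λ (a negative λ also reverses it).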