# LECTURE 9 CANONICAL CORRELATION ANALYSIS


```

Introduction

The concept of canonical correlation arises when we want to quantify the associations
between two sets of variables.

For example, suppose that the first set of variables, labeled 'arithmetic', records $x_1$, the speed of an individual in working problems, and $x_2$, the accuracy. The second set of variables, labeled 'reading', consists of $x_3$, reading speed, and $x_4$, comprehension. We can examine the six pairwise correlations, but in addition we may ask whether it makes sense to summarize the association between the two groups by a single number.

The answer is given by considering a linear combination of the arithmetic variables, say $u$, and a linear combination of the reading variables, say $v$, and using their correlation to represent the association between the groups. Thus we construct

$$ u = a_1 x_1 + a_2 x_2 \qquad \text{and} \qquad v = b_1 x_3 + b_2 x_4 $$

and we seek coefficients $a_1, a_2, b_1, b_2$ so that the correlation between $u$ and $v$ is maximized.
(NOTE: Every text I know of uses u and v for these variables; SAS PROC CANCORR uses v and w. That is OK, but don't get confused.)

Development

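Suppose we have a vector of variables $x$ that consists of two sets of variables, $x_1$ and $x_2$, where $x_1$ has length $p_1$ and $x_2$ has length $p_2$. Assume that $p_1 \le p_2$. To develop the notation, let

$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad E[x] = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \operatorname{Var}(x) = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} $$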

The matrix $\Sigma_{12}$ gives the covariances between the variables in set one and those in set two; in correlation form, it gives the correlations. When $p_1$ and $p_2$ are moderately large, examining the $p_1 p_2$ correlations and drawing conclusions is not an easy task. As an alternative, we consider linear combinations
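$$ u = a^T x_1 \qquad \text{and} \qquad v = b^T x_2 $$
Note that
$$ \operatorname{Var}[u] = a^T \Sigma_{11} a, \qquad \operatorname{Var}[v] = b^T \Sigma_{22} b, \qquad \operatorname{Cov}[u, v] = a^T \Sigma_{12} b $$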

We want to determine the vectors a and b so that
$$ \operatorname{Corr}[u, v] = \frac{a^T \Sigma_{12} b}{\sqrt{a^T \Sigma_{11} a}\,\sqrt{b^T \Sigma_{22} b}} $$

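is as large as possible. Since this correlation does not change when $a$ or $b$ is multiplied by a positive constant, we may fix both variances at one and determine $a$ and $b$ as the solution to the problem

$$ \text{maximize } a^T \Sigma_{12} b \quad \text{subject to} \quad a^T \Sigma_{11} a = 1, \quad b^T \Sigma_{22} b = 1 $$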

The variables so determined are called the first pair of canonical variables, $u_1$ and $v_1$. The second pair of canonical variables, $u_2$ and $v_2$, is similarly determined by linear combinations of $x_1$ and $x_2$ with unit variance and maximum correlation among all such combinations that are uncorrelated with the first pair. This reminds us of the discussion of principal components and leads to the determination of eigenvalues and eigenvectors.

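Introducing Lagrange multipliers $\lambda/2$ and $\nu/2$ for the two constraints and setting the derivatives with respect to $a$ and $b$ to zero leads to the stationary equations

$$ \Sigma_{12} b - \lambda \Sigma_{11} a = 0 $$

$$ \Sigma_{21} a - \nu \Sigma_{22} b = 0 $$

Multiplying the first equation by $a^T$ and the second by $b^T$, and using the constraints, shows that

$$ \lambda = \nu = a^T \Sigma_{12} b, $$

so the multiplier $\lambda$ is the correlation we are maximizing.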

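We thus seek $\lambda$ so that the stationary equations, viewed as a homogeneous linear system in $(a, b)$ with $\lambda = \nu$, have a nonzero solution, that is, so that

$$ \begin{vmatrix} -\lambda \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\lambda \Sigma_{22} \end{vmatrix} = 0. $$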

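The following result is useful: if the matrix $A$ is written in partitioned form as

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} $$

then (provided the inverses exist)

$$ |A| = |A_{11}|\,|A_{22} - A_{21} A_{11}^{-1} A_{12}| $$

$$ \phantom{|A|} = |A_{22}|\,|A_{11} - A_{12} A_{22}^{-1} A_{21}| $$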

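Applying the second form of this result to our matrix (with $\lambda \ne 0$), we have

$$ \begin{vmatrix} -\lambda \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\lambda \Sigma_{22} \end{vmatrix} = |-\lambda \Sigma_{22}|\,\left| -\lambda \Sigma_{11} - \Sigma_{12} (-\lambda \Sigma_{22})^{-1} \Sigma_{21} \right| $$

$$ = (-1)^{p_2} \lambda^{p_2 - p_1} |\Sigma_{22}|\,\left| \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} - \lambda^2 \Sigma_{11} \right| $$

$$ = (-1)^{p_2} \lambda^{p_2 - p_1} |\Sigma_{22}|\,|\Sigma_{11}|\,\left| \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} - \lambda^2 I \right| $$

The leading factors are non-zero, so $\lambda$ enters the condition only through the last determinant. It follows that we can determine the values of $\lambda$ by finding the eigenvalues of the matrix $\Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$ and taking square roots. The positive square root of the largest eigenvalue gives the largest correlation. Note that this $p_1 \times p_1$ matrix has at most $p_1$ non-zero eigenvalues.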

To find $a$ and $b$, we return to the stationary equations. Recalling that $\lambda = \nu$ and multiplying the second equation by $\lambda^{-1} \Sigma_{22}^{-1}$, we see that

$$ b = \lambda^{-1} \Sigma_{22}^{-1} \Sigma_{21} a $$

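Substituting this into the first equation and multiplying through by $\lambda \Sigma_{11}^{-1}$, we see that $a$ is given by the solution of the equations

$$ \left( \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} - \lambda^2 I \right) a = 0. $$

That is, the vector $a$ is the eigenvector corresponding to the eigenvalue $\lambda^2$. Similar computations show that the vector $b$ is given by the solution to the equations

$$ \left( \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} - \lambda^2 I \right) b = 0. $$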

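Thus, taking $\lambda_1^2$ to be the largest eigenvalue and $a_1$, $b_1$ the corresponding eigenvectors, scaled to satisfy the unit-variance constraints, the first pair of canonical variates is

$$ u_1 = a_1^T x_1 \qquad \text{and} \qquad v_1 = b_1^T x_2 $$

with correlation $\rho_1 = \sqrt{\lambda_1^2} = \lambda_1$.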

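To find the second canonical pair, $u_2$, $v_2$, we solve the problem

$$ \text{maximize } a_2^T \Sigma_{12} b_2 $$

subject to:

$$ a_2^T \Sigma_{11} a_2 = 1, \qquad b_2^T \Sigma_{22} b_2 = 1, $$

$$ a_2^T \Sigma_{11} a_1 = 0, \qquad b_2^T \Sigma_{22} b_1 = 0. $$

The last two constraints say that $u_2$ is uncorrelated with $u_1$ and $v_2$ is uncorrelated with $v_1$. It follows that the squared correlation between $u_2$ and $v_2$ is $\lambda_2^2$, the second largest eigenvalue of the matrix $\Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, and the vectors $a_2$ and $b_2$ are obtained by solving the above eigenvector equations with $\lambda_2^2$. Although we did not specify this in our optimization problem, it also follows that

$$ a_2^T \Sigma_{12} b_1 = 0 \qquad \text{and} \qquad b_2^T \Sigma_{21} a_1 = 0, $$

so $u_2$ is uncorrelated with $v_1$ and $v_2$ is uncorrelated with $u_1$. We can continue in this way for all non-zero eigenvalues.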

Summary

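The canonical variable pairs, $u_i = a_i^T x_1$ and $v_i = b_i^T x_2$, as determined have the following properties:

$$ \operatorname{Corr}(u_i, v_i) = \lambda_i, \qquad \operatorname{Corr}(u_i, u_j) = 0, $$

$$ \operatorname{Corr}(v_i, v_j) = 0, \qquad \operatorname{Corr}(u_i, v_j) = 0 \quad \text{for } i \ne j $$

These properties can be summarized by the correlation matrix

$$ R_{uv} = \begin{bmatrix} I_{p_1} & \operatorname{Diag}(\lambda_i) \\ \operatorname{Diag}(\lambda_i)^T & I_{p_2} \end{bmatrix} $$

where $\operatorname{Diag}(\lambda_i)$ denotes the $p_1 \times p_2$ matrix with $\lambda_i$ in position $(i, i)$ and zeros elsewhere.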

Example

Returning to the reading-arithmetic example, suppose the sample correlation matrix is
given by

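$$ R = \begin{bmatrix} 1 & .4 & .5 & .6 \\ .4 & 1 & .3 & .4 \\ .5 & .3 & 1 & .2 \\ .6 & .4 & .2 & 1 \end{bmatrix}, \qquad R_{11} = \begin{bmatrix} 1 & .4 \\ .4 & 1 \end{bmatrix}, \qquad R_{12} = \begin{bmatrix} .5 & .6 \\ .3 & .4 \end{bmatrix}, $$

$$ R_{22} = \begin{bmatrix} 1 & .2 \\ .2 & 1 \end{bmatrix}, \qquad R_{21} = \begin{bmatrix} .5 & .3 \\ .6 & .4 \end{bmatrix} $$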

Note that it is best to apply the results to standardized data, and hence we use the correlation matrix. We may then compute

$$ A = R_{11}^{-1} R_{12} R_{22}^{-1} R_{21} = \begin{bmatrix} .452 & .289 \\ .146 & .095 \end{bmatrix} $$

and

$$ B = R_{22}^{-1} R_{21} R_{11}^{-1} R_{12} = \begin{bmatrix} .206 & .251 \\ .278 & .340 \end{bmatrix} $$

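The eigenvalues of these two matrices are the same, that is, $\lambda_1^2 = .5457$ and $\lambda_2^2 = .0009$. The eigenvectors of $A$ and $B$ are the columns of the matrices

$$ \mathrm{Vec}_A = \begin{bmatrix} .951 & -.540 \\ .309 & .842 \end{bmatrix} \qquad \text{and} \qquad \mathrm{Vec}_B = \begin{bmatrix} .595 & -.774 \\ .804 & .633 \end{bmatrix} $$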

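Recall that we have specified that the variances of the $u_i$ and $v_i$ must be one. That is,

$$ a_i^T R_{11} a_i = 1 \qquad \text{and} \qquad b_i^T R_{22} b_i = 1. $$

The eigenvectors as determined are normalized to have length one but do not satisfy this condition, so they must be rescaled. For the columns of $\mathrm{Vec}_A$, the values of $a^T R_{11} a$ are 1.23 and .636; for the columns of $\mathrm{Vec}_B$, the values of $b^T R_{22} b$ are 1.19 and .804. Dividing each column by the square root of its value gives the scaled eigenvectors, whose matrices we again denote by $A$ and $B$:

$$ A = \mathrm{Vec}_A \begin{bmatrix} 1.23^{-1/2} & 0 \\ 0 & .636^{-1/2} \end{bmatrix} \qquad \text{and} \qquad B = \mathrm{Vec}_B \begin{bmatrix} 1.19^{-1/2} & 0 \\ 0 & .804^{-1/2} \end{bmatrix} $$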

Thus,

$$ A = \begin{bmatrix} .856 & -.677 \\ .278 & 1.055 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} .545 & -.863 \\ .737 & .706 \end{bmatrix} $$

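It follows that the first canonical pair is defined by

$$ u_1 = .856 z_1 + .278 z_2, \qquad v_1 = .545 z_3 + .737 z_4 $$

where $z_j$ denotes the standardized version of $x_j$, with correlation

$$ \rho_1 = \sqrt{.5457} = .74 $$

The second canonical pair is defined by

$$ u_2 = -.677 z_1 + 1.055 z_2, \qquad v_2 = -.863 z_3 + .706 z_4 $$

with correlation

$$ \rho_2 = \sqrt{.0009} = .03 $$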

We see that the first pair captures most of the relation between arithmetic and reading. The canonical variate for arithmetic, $u_1$, places over three times as much weight on speed as on accuracy, and the canonical variate for reading, $v_1$, puts more weight on comprehension than on speed, in a proportion of about 4:3. Note that this does not say, for example, that speed is three times as important as accuracy in arithmetic. It simply says that, if we are asking for a measure of the relation between arithmetic and reading, these functions provide the essential component of that relation.

Interpretation of Canonical Variables

In general, the canonical variables are artificial and may have no physical meaning. Interpretation is often aided by computing the correlations between the original variables and the canonical variables. To do this, note that the canonical variables are related to the original variables by the equations

$$ u = A^T z_1 \qquad \text{and} \qquad v = B^T z_2 $$

where $z_1$ and $z_2$ denote the standardized variables of set one and set two, from which the eigenvectors have been determined, and $A$ and $B$ are the scaled coefficient matrices. Since the canonical variables have been scaled to have variance one and the $z$'s are standardized, correlations equal covariances, and it follows that

$$ \operatorname{Corr}(u, z_1) = \operatorname{Cov}(u, z_1) = \operatorname{Cov}(A^T z_1, z_1) = A^T R_{11} $$

Similarly,

$$ \operatorname{Corr}(u, z_2) = \operatorname{Cov}(A^T z_1, z_2) = A^T R_{12} $$

$$ \operatorname{Corr}(v, z_1) = B^T R_{21} $$

$$ \operatorname{Corr}(v, z_2) = B^T R_{22} $$

Example:

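Returning to the arithmetic-reading example, we see that

$$ \operatorname{Corr}(u_1, z_1) = (.856 \;\; .278) \begin{bmatrix} 1 & .4 \\ .4 & 1 \end{bmatrix} = (.97 \;\; .62) $$

and

$$ \operatorname{Corr}(v_1, z_2) = (.545 \;\; .737) \begin{bmatrix} 1 & .2 \\ .2 & 1 \end{bmatrix} = (.69 \;\; .85) $$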

We see that, of the two variables in $z_1$, $u_1$ is most highly correlated with the first (arithmetic speed). Of the two variables in $z_2$, $v_1$ is most highly correlated with the second (comprehension).

Similarly, we obtain the correlations

$$ \operatorname{Corr}(u_1, z_2) = (.51 \;\; .62) \qquad \text{and} \qquad \operatorname{Corr}(v_1, z_1) = (.71 \;\; .46) $$

As in our study of principal components, it is more informative to look at these correlations than at the eigenvectors.

Observations

It can be shown that the first canonical correlation is at least as large as the absolute value of any of the simple correlations in $R_{12}$ (in the example, .74 versus .6). If there is one variable in set one but several in set two, the squared canonical correlation is the squared multiple correlation, $R^2$, in the regression of $z_1$ on $z_2$.

In general, it can be shown that the squared multiple correlation for the regression of $u_k$ on $z_2$ is $\rho_k^2$. This is also the squared multiple correlation for the regression of $v_k$ on $z_1$.

```
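The Development section reduces the canonical correlations to square roots of the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Below is a minimal NumPy sketch of that step, applied to the arithmetic/reading correlation matrix from the Example. It assumes NumPy is available. `canonical_correlations` is a hypothetical helper name. Using `np.linalg.solve` instead of explicit inverses is our own numerical choice and is not something the lecture prescribes.

```python
import numpy as np

# Sample correlation matrix from the arithmetic/reading example.
# Variables 1-2: arithmetic (speed, accuracy); variables 3-4: reading (speed, comprehension).
R = np.array([
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
])


def canonical_correlations(R, p1):
    """Hypothetical helper: return (A, squared canonical correlations, canonical correlations).

    A = R11^{-1} R12 R22^{-1} R21; the eigenvalues are sorted largest first.
    """
    R11, R12 = R[:p1, :p1], R[:p1, p1:]
    R21, R22 = R[p1:, :p1], R[p1:, p1:]
    # np.linalg.solve(M, X) returns M^{-1} X without forming the inverse.
    A = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))
    # In theory the eigenvalues are real and non-negative; discard any round-off
    # imaginary part and clip tiny negatives before taking square roots.
    lam2 = np.sort(np.linalg.eigvals(A).real)[::-1]
    lam2 = np.clip(lam2, 0.0, None)
    return A, lam2, np.sqrt(lam2)


A, lam2, rho = canonical_correlations(R, p1=2)
print(np.round(A, 3))     # approximately [[.452, .289], [.146, .095]]
print(np.round(lam2, 4))  # approximately [.5457, .0009]
print(np.round(rho, 2))   # approximately [.74, .03]
```

The trace of $A$ equals the sum of its eigenvalues. This gives a quick consistency check on the printed entries: .452 + .095 is about .5457 + .0009.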
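The determinant argument in the Development section can also be checked numerically. The sketch below assumes NumPy and reuses the example correlation matrix; all helper names are hypothetical. It performs three checks:

- both partitioned forms of $|A|$ agree with a direct determinant;
- the determinant of $\begin{bmatrix} -\lambda R_{11} & R_{12} \\ R_{21} & -\lambda R_{22} \end{bmatrix}$ vanishes at $\lambda = \rho_1$;
- away from a root, that determinant equals $(-1)^{p_2}\lambda^{p_2-p_1}|R_{22}|\,|R_{11}|\prod_i(\lambda_i^2-\lambda^2)$.

```python
import numpy as np

# Example correlation matrix; set one = variables 1-2, set two = variables 3-4.
R = np.array([
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
])
p1 = 2
p2 = R.shape[0] - p1
R11, R12 = R[:p1, :p1], R[:p1, p1:]
R21, R22 = R[p1:, :p1], R[p1:, p1:]


def block_det_forms(M, k):
    """|M| computed directly and by both partitioned forms (split after index k)."""
    M11, M12 = M[:k, :k], M[:k, k:]
    M21, M22 = M[k:, :k], M[k:, k:]
    direct = np.linalg.det(M)
    form1 = np.linalg.det(M11) * np.linalg.det(M22 - M21 @ np.linalg.solve(M11, M12))
    form2 = np.linalg.det(M22) * np.linalg.det(M11 - M12 @ np.linalg.solve(M22, M21))
    return direct, form1, form2


def stationary_det(lam):
    """Determinant of the block matrix [[-lam*R11, R12], [R21, -lam*R22]]."""
    return np.linalg.det(np.block([[-lam * R11, R12], [R21, -lam * R22]]))


# Eigenvalues of R11^{-1} R12 R22^{-1} R21, i.e. the squared canonical correlations.
lam2 = np.linalg.eigvals(np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))).real


def factored_det(lam):
    """(-1)^p2 * lam^(p2 - p1) * |R22| * |R11| * prod(lam_i^2 - lam^2), for lam != 0."""
    return ((-1) ** p2 * lam ** (p2 - p1) * np.linalg.det(R22) * np.linalg.det(R11)
            * np.prod(lam2 - lam ** 2))


print(block_det_forms(R, p1))                  # three values that agree up to round-off
rho1 = np.sqrt(lam2.max())
print(stationary_det(rho1))                    # essentially zero at the first canonical correlation
print(stationary_det(0.5), factored_det(0.5))  # equal, and non-zero away from a root
```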
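The Development and Example sections obtain the coefficients in three steps. First, $a$ is taken as an eigenvector. Next, $b$ follows from $b = \lambda^{-1}\Sigma_{22}^{-1}\Sigma_{21}a$. Finally, both are rescaled to unit variance. The sketch below strings those steps together; it assumes NumPy, and `canonical_coefficients` is a hypothetical name.

Two choices are ours and should be read as assumptions:

- Eigenvectors are only determined up to sign, so each $a$ is flipped to make its largest-magnitude entry positive. This happens to reproduce the signs in the lecture's example.
- The function assumes $p_1 \le p_2$ and that all $p_1$ canonical correlations are non-zero, because it divides by $\lambda$.

```python
import numpy as np

# Example correlation matrix; set one = variables 1-2, set two = variables 3-4.
R = np.array([
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
])


def canonical_coefficients(R, p1):
    """Hypothetical helper: return (a_coefs, b_coefs, rho), one canonical pair per column.

    Assumes p1 <= p2 and that all p1 canonical correlations are non-zero.
    """
    R11, R12 = R[:p1, :p1], R[:p1, p1:]
    R21, R22 = R[p1:, :p1], R[p1:, p1:]
    A = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))
    lam2, vecs = np.linalg.eig(A)
    order = np.argsort(lam2.real)[::-1]              # largest squared correlation first
    lam2, vecs = lam2.real[order], vecs.real[:, order]
    rho = np.sqrt(np.clip(lam2, 0.0, None))

    a_cols, b_cols = [], []
    for i in range(p1):
        v = vecs[:, i]
        a = v / np.sqrt(v @ R11 @ v)                 # enforce a^T R11 a = 1
        if a[np.argmax(np.abs(a))] < 0:              # sign convention (our assumption)
            a = -a
        b = np.linalg.solve(R22, R21 @ a) / rho[i]   # b = lambda^{-1} R22^{-1} R21 a
        a_cols.append(a)
        b_cols.append(b)
    return np.column_stack(a_cols), np.column_stack(b_cols), rho


a_coefs, b_coefs, rho = canonical_coefficients(R, p1=2)
print(np.round(a_coefs, 3))  # approximately [[.856, -.677], [.278, 1.055]]
print(np.round(b_coefs, 3))  # approximately [[.545, -.863], [.737, .706]]
```

Computing $b$ from $a$, rather than from a separate eigen-decomposition of $B$, keeps each $\operatorname{Corr}(u_i, v_i)$ positive. It also gives $b$ unit variance automatically:

$$ b^T R_{22} b = \lambda^{-2} a^T R_{12} R_{22}^{-1} R_{21} a = \lambda^{-2}\lambda^2 a^T R_{11} a = 1. $$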
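Two sets of quantities come from the same matrix $R$: the Summary's correlation matrix $R_{uv}$ and the correlations in the Interpretation section. The sketch stacks the coefficient matrices into a block-diagonal $W$ (our notation), so that $(u, v) = W^T z$. Then $R_{uv} = W^T R W$, and the full table of correlations between canonical and original variables is $W^T R$. The sketch uses the rounded coefficients reported in the Example, so entries match the lecture only to about two decimals; NumPy is assumed.

```python
import numpy as np

# Example correlation matrix; set one = variables 1-2, set two = variables 3-4.
R = np.array([
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
])

# Rounded coefficient matrices from the Example (column i holds pair i).
A_coef = np.array([[0.856, -0.677],
                   [0.278, 1.055]])
B_coef = np.array([[0.545, -0.863],
                   [0.737, 0.706]])

# Block-diagonal W so that (u1, u2, v1, v2) = W^T z for z = (z1, z2, z3, z4).
Z = np.zeros((2, 2))
W = np.block([[A_coef, Z],
              [Z, B_coef]])

# Var(W^T z) = W^T R W, which should be close to [[I, Diag(rho)], [Diag(rho), I]].
R_uv = W.T @ R @ W

# Cov(W^T z, z) = W^T R; these are correlations because all variances are one.
# Rows: u1, u2, v1, v2.  Columns: z1, z2, z3, z4.
structure = W.T @ R

print(np.round(R_uv, 2))
print(np.round(structure, 2))
```

In `structure`, the rows are $u_1, u_2, v_1, v_2$ and the columns are $z_1, \dots, z_4$. The first and third rows hold the correlations discussed for $u_1$ and $v_1$.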
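The Observations can be checked on the example data. The sketch below assumes NumPy, and its helper names are hypothetical. It confirms three things:

- $\rho_1$ is at least as large as the largest $|r|$ in $R_{12}$.
- With arithmetic speed as the only set-one variable, the squared canonical correlation equals $R^2 = r^T R_{22}^{-1} r$ from regressing it on the two reading variables.
- Regressing each unit-variance $u_k$ on $z_2$ gives $R^2 = \lambda_k^2$.

```python
import numpy as np

# Example correlation matrix; set one = variables 1-2, set two = variables 3-4.
R = np.array([
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
])


def squared_canonical_correlations(R, p1):
    """Hypothetical helper: eigenvalues of R11^{-1} R12 R22^{-1} R21, largest first."""
    R11, R12 = R[:p1, :p1], R[:p1, p1:]
    R21, R22 = R[p1:, :p1], R[p1:, p1:]
    M = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))
    return np.sort(np.linalg.eigvals(M).real)[::-1]


# (1) rho_1 versus the largest simple correlation between the two sets.
rho1 = np.sqrt(squared_canonical_correlations(R, 2)[0])
max_abs_r = np.abs(R[:2, 2:]).max()
print(rho1, max_abs_r)  # approximately 0.739 and 0.6

# (2) One variable in set one: arithmetic speed (variable 1) against reading (3, 4).
sub = R[np.ix_([0, 2, 3], [0, 2, 3])]
r = sub[0, 1:]                                    # correlations of z1 with z2
r_squared = r @ np.linalg.solve(sub[1:, 1:], r)   # R^2 = r^T R22^{-1} r
cca_squared = squared_canonical_correlations(sub, 1)[0]
print(cca_squared, r_squared)                     # both approximately 0.5104

# (3) Regressing each unit-variance u_k = a_k^T z1 on z2 gives R^2 = lambda_k^2.
R11, R12, R21, R22 = R[:2, :2], R[:2, 2:], R[2:, :2], R[2:, 2:]
lam2, vecs = np.linalg.eig(np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21)))
pairs = []
for k in range(2):
    a = vecs[:, k].real
    a = a / np.sqrt(a @ R11 @ a)        # Var(u_k) = 1
    c = R21 @ a                         # Cov(z2, u_k)
    pairs.append((lam2[k].real, c @ np.linalg.solve(R22, c)))
print(pairs)                            # each tuple holds two equal numbers
```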
DOCUMENT INFO
Posted: 9/27/2010, Language: English, Pages: 6