# Chapter 1 LINEAR ALGEBRA

1.1   Matrices

It is a common practice to arrange data into rows and columns. The temperature
readings in Central, Chai Wan, Shatin and Yuen Long for the past 3 hours may be
organized as
$$\begin{bmatrix} 29.5^\circ & 30^\circ & 31^\circ & 31.5^\circ \\ 30^\circ & 31^\circ & 31.5^\circ & 32^\circ \\ 30.5^\circ & 32^\circ & 32^\circ & 33^\circ \end{bmatrix}.$$
In mathematics, such an array of data is called a matrix.

A matrix is a rectangular array of numbers called scalars, and each scalar in
the array is called an entry of the matrix. Usually, capital letters A, B, C are used
to denote matrices. A matrix A is composed of a finite number of rows and columns.
If A has m rows and n columns, then A is called an m × n matrix. The entry at the
i-th row and the j-th column of A is denoted by $a_{ij}$.
                          
Notation   $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$ or $A = [a_{ij}]_{1 \le i \le m,\, 1 \le j \le n}$ or $A = [a_{ij}]$.

A 1 × n matrix is also called a row vector with n entries. The set of all such
row vectors is denoted by $\mathbb{R}_n$. Note that when n = 2 or 3, $\mathbb{R}_n$ is simply the usual
2- or 3-dimensional Euclidean space. An n × 1 matrix is called a column vector with n
entries. The set of all such column vectors is denoted by $\mathbb{R}^n$. We usually use small
letters v, u, w to denote row or column vectors. Entries of a vector v in $\mathbb{R}_n$ or $\mathbb{R}^n$
are simply denoted by $v_1, v_2, \ldots, v_n$.

A matrix A with n rows and n columns is called an n × n matrix or a square
matrix of order n. The entries $a_{11}, a_{22}, \ldots, a_{nn}$ are called the diagonal entries of A.
If $a_{ij} = 0$ whenever i ≠ j, then A is called a diagonal matrix. We use the symbol $I_n$
(or simply I) to denote the n × n diagonal matrix whose diagonal entries are equal
to 1. I is called the identity matrix of order n.

Two m × n matrices A and B are said to be equal if and only if $a_{ij} = b_{ij}$ for
1 ≤ i ≤ m and 1 ≤ j ≤ n.

If A and B are m × n matrices and if t is any scalar, we define $A + B = [a_{ij} + b_{ij}]$
and $tA = [t \cdot a_{ij}]$. A + B is called the sum of A and B, and tA is called a scalar
multiple of A. (−1)A is denoted by −A, and A + (−B) is usually written as A − B.
A matrix with all its entries equal to zero is called a zero matrix and is denoted by
the symbol 0.
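With matrices stored as lists of rows, the entrywise definitions above can be sketched in Python (the helper names `madd` and `smul` are ours, purely for illustration; the matrices are those of Example 1.1 below):

```python
# A minimal sketch of the entrywise definitions
# A + B = [a_ij + b_ij] and tA = [t * a_ij]; matrices are lists of rows.

def madd(A, B):
    # sum of two m x n matrices: add corresponding entries
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def smul(t, A):
    # scalar multiple tA: multiply every entry by t
    return [[t * a for a in row] for row in A]

A = [[4, 0, 5], [-1, 3, 2]]
B = [[1, 1, 1], [3, 5, 7]]

S = madd(A, B)            # the sum A + B
D = madd(A, smul(-2, B))  # the difference A - 2B
```

Note that `madd` silently assumes A and B have the same size; a fuller implementation would check this first.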

If A is an m × n matrix and B is an n × k matrix, then we define their product
AB to be the m × k matrix C such that $c_{ij} = \sum_{s=1}^{n} a_{is} b_{sj}$ for 1 ≤ i ≤ m and
1 ≤ j ≤ k.

In particular, if A is an m × n matrix and v is a column vector with entries
$v_1, v_2, \ldots, v_n$, then the product Av is the column vector with m entries, the i-th entry
being $a_{i1} v_1 + a_{i2} v_2 + \cdots + a_{in} v_n$.

As a result, if C = AB where A and B are m × n and n × k matrices
respectively, then C is the m × k matrix whose r-th column is equal to $Ab_r$, where
$b_r$ is the r-th column of the matrix B. Similarly, the i-th row of C is equal to $a_i B$,
where $a_i$ is the i-th row of A.

Finally, it should be noted that BA may not be well defined even if AB is.
Furthermore, if A and B are square matrices of the same order, then both AB and
BA are well defined, but in general AB ≠ BA.
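The product formula $c_{ij} = \sum_s a_{is} b_{sj}$, and the failure of commutativity, can be sketched in Python (the `matmul` helper name is ours; the matrices are those of Example 1.4 below):

```python
# A sketch of the product formula c_ij = sum over s of a_is * b_sj
# for an m x n matrix A and an n x k matrix B.

def matmul(A, B):
    m, n, k = len(A), len(B), len(B[0])
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(k)]
            for i in range(m)]

A = [[5, 1], [3, -2]]
B = [[2, 0], [4, 3]]

AB = matmul(A, B)   # AB and BA are both defined (square, same order) ...
BA = matmul(B, A)   # ... yet AB and BA turn out to be different matrices
```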
                                                                                                 
Example 1.1 Let $A = \begin{bmatrix} 4 & 0 & 5 \\ -1 & 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 5 & 7 \end{bmatrix}$ and $C = \begin{bmatrix} 2 & -3 \\ 0 & 1 \end{bmatrix}$.

Then $A + B = \begin{bmatrix} 4+1 & 0+1 & 5+1 \\ -1+3 & 3+5 & 2+7 \end{bmatrix} = \begin{bmatrix} 5 & 1 & 6 \\ 2 & 8 & 9 \end{bmatrix}$, while A + C is not defined
since A has 3 columns while C has only 2.

Furthermore, $2B = \begin{bmatrix} 2 \times 1 & 2 \times 1 & 2 \times 1 \\ 2 \times 3 & 2 \times 5 & 2 \times 7 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 \\ 6 & 10 & 14 \end{bmatrix}$ and

$A - 2B = \begin{bmatrix} 4 & 0 & 5 \\ -1 & 3 & 2 \end{bmatrix} - \begin{bmatrix} 2 & 2 & 2 \\ 6 & 10 & 14 \end{bmatrix} = \begin{bmatrix} 2 & -2 & 3 \\ -7 & -7 & -12 \end{bmatrix}$.

                                                              
Example 1.2 If $A = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}$, compute AB.

Solution If $B = [\,b_1 \;\; b_2 \;\; b_3\,]$, then $Ab_1 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ -1 \end{bmatrix}$,

$Ab_2 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ 13 \end{bmatrix}$ and $Ab_3 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} 21 \\ -9 \end{bmatrix}$.

Therefore, $AB = \begin{bmatrix} 11 & 0 & 21 \\ -1 & 13 & -9 \end{bmatrix}$.

                                                                
Example 1.3 Let $A = \begin{bmatrix} 2 & -5 & -4 \\ 1 & 3 & 0 \\ 6 & -8 & -7 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 6 \\ 7 & -1 \\ 3 & -2 \end{bmatrix}$. Find the 2nd row of
the product AB.

Solution The 2nd row of AB is $[\,1 \;\; 3 \;\; 0\,] \begin{bmatrix} 4 & 6 \\ 7 & -1 \\ 3 & -2 \end{bmatrix} = [\,25 \;\; 3\,]$.

                               
Example 1.4 Let $A = \begin{bmatrix} 5 & 1 \\ 3 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 0 \\ 4 & 3 \end{bmatrix}$. Compute AB − BA.

Solution $AB = \begin{bmatrix} 14 & 3 \\ -2 & -6 \end{bmatrix}$ and $BA = \begin{bmatrix} 10 & 2 \\ 29 & -2 \end{bmatrix}$. Therefore,

$AB - BA = \begin{bmatrix} 4 & 1 \\ -31 & -4 \end{bmatrix}$.

Rules of matrix algebra

Let A, B, C be matrices and t, s be scalars. Suppose that the sizes of the
matrices are such that the indicated operations can be performed. Then the following
rules are valid:

(a) A + B = B + A (Commutative law for addition);

(b) A + (B + C) = (A + B) + C (Associative law for addition);

(c) (AB)C = A(BC) (Associative law for multiplication);

(d) (A + B)C = AC + BC; A(B + C) = AB + AC (Distributive law);

(e) t(A + B) = tA + tB; t(AB) = (tA)B = A(tB);

(f) (ts)A = t(sA); (t + s)A = tA + sA;

(g) A + 0 = A; A − A = 0;

(h) A0 = 0, 0B = 0; AI = A, IB = B.

We end this section by stating the following definitions:

(1) Let A be a square matrix. We define $A^0 = I$ and $A^k = A(A^{k-1})$ for every
positive integer k. Thus $A^1 = A$, $A^2 = AA$, $A^3 = AA^2 = AAA$.

(2) The transpose of an m × n matrix A is defined to be the n × m matrix B such
that $b_{ij} = a_{ji}$ for 1 ≤ i ≤ n and 1 ≤ j ≤ m.

For instance, the transpose of a row vector is a column vector, and vice versa.
The transpose of A is denoted by the symbol $A^T$. It is obvious that $(A^T)^T = A$.

Example 1.5 $\begin{bmatrix} 1 & -2 \\ 3 & 0 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix}$ and $\begin{bmatrix} 4 & -6 \\ 7 & 1 \\ 3 & 2 \end{bmatrix}^T = \begin{bmatrix} 4 & 7 & 3 \\ -6 & 1 & 2 \end{bmatrix}$.

Note The student may easily prove that if A is an m × n matrix and B is an n × k
matrix, then $(AB)^T = B^T A^T$.
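As a quick numerical check (not a proof), the identity can be verified on the matrices of Example 1.3; the helper names `matmul` and `transpose` are ours:

```python
# Numerically checking (AB)^T = B^T A^T on one concrete pair of matrices.

def matmul(A, B):
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    # transpose(A)[i][j] = A[j][i]
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[2, -5, -4], [1, 3, 0], [6, -8, -7]]   # from Example 1.3
B = [[4, 6], [7, -1], [3, -2]]

AB  = matmul(A, B)
lhs = transpose(AB)                          # (AB)^T
rhs = matmul(transpose(B), transpose(A))     # B^T A^T, the same matrix
```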

1.2   Systems of linear equations

A system of m linear equations in n unknowns $x_1, x_2, \ldots, x_n$ is given by

$$\begin{cases} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\ \qquad\qquad\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m , \end{cases} \tag{1.1}$$

where the $a_{ij}$'s and the $b_k$'s are given scalars. Eqn (1.1) can be conveniently written
in matrix form as Ax = b by putting $A = [a_{ij}]$, $x = [x_1 \ldots x_n]^T$ and $b = [b_1 \ldots b_m]^T$.
The matrix A is commonly known as the coefficient matrix of the linear system.

A column vector v in $\mathbb{R}^n$ satisfying Av = b is said to be a solution of the
system of linear equations Ax = b. We say that the system of linear equations is
consistent if it has a solution. The collection of all the solutions is called the solution
set of the system. On the other hand, if Ax = b has no solution, then the system
is said to be inconsistent. It is evident that a system of linear equations is either
inconsistent (no solution) or consistent (has at least one solution).

Remark 1.1 When m = 1 and n > 1, Eqn (1.1) reduces to a single linear
equation in n unknowns, which may be written in matrix form as
$$[\,a_1 \;\; a_2 \;\; \ldots \;\; a_n\,] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = b. \quad \text{(The coefficient matrix is just a row vector.)}$$
If b ≠ 0, the linear equation $0 \cdot x_1 + 0 \cdot x_2 + \cdots + 0 \cdot x_n = b$ has no solution. On the
contrary, the linear equation $0 \cdot x_1 + 0 \cdot x_2 + \cdots + 0 \cdot x_n = 0$ is satisfied by any column
vector with n entries.

Example 1.6 The system of linear equations $\begin{cases} x_1 - 2x_2 = -1 \\ -x_1 + 3x_2 = 3 \end{cases}$ is consistent.
It has one and only one solution given by $x = [\,3 \;\; 2\,]^T$. Geometrically, each of the two
linear equations is represented by a straight line in the plane, and it is evident that
the solution of the system of linear equations corresponds to the point of intersection
of these two straight lines.

On the other hand, the system $\begin{cases} x_1 - 2x_2 = -1 \\ -3x_1 + 6x_2 = 13 \end{cases}$ has no solution (inconsistent).
The straight lines represented by these linear equations are parallel and therefore never
intersect one another.

1.2.1 Elementary row operations

Our main problem is to determine whether a given system of linear equations Ax = b
is consistent, and to find all its solutions when it is.

Definition 1.1 Two systems of linear equations are said to be equivalent if they have
identical solutions, i.e., their solution sets are equal.

A given system of linear equations may be solved by the method of elimination.
The idea is to replace a given system Ax = b by an equivalent system A′x = b′ in
such a way that the latter is easy to solve.

The elimination process consists of the following types of operations:

(i)    interchange any 2 equations of a system of linear equations;
(ii)   multiply both sides of any equation in a system by a non-zero scalar;
(iii)  add a multiple of one equation to another equation within the system.

It is obvious that if a given system Ax = b is reduced to A′x = b′ by
operations of types (i), (ii) or (iii), then these two systems are equivalent, i.e., they
have identical solutions.

Given a system of linear equations Ax = b, we define the augmented matrix
for this system to be the m × (n + 1) matrix obtained by adjoining to the right of A
the column vector b.

Notation   $[A\,|\,b] = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} & b_1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} & b_m \end{bmatrix}$

It is clear that instead of applying any one of the above three operations to
the system Ax = b, we might as well apply one of the following operations to the
augmented matrix [A |b] :

(i)    interchange any two rows of [A |b];
(ii)   multiply any row of [A |b] by a non-zero scalar;
(iii) add a scalar multiple of one row of [A |b] to another row.

These operations are called elementary row operations on matrices.
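The three operations can be sketched in Python on a matrix stored as a list of rows (the function names are ours, purely for illustration); the last line below performs the step 4 × (1) + (3) that opens the elimination in Example 1.8:

```python
# The three elementary row operations on a matrix stored as a list of rows.

def swap(M, i, j):
    # type (i): interchange rows i and j
    M[i], M[j] = M[j], M[i]

def scale(M, i, t):
    # type (ii): multiply row i by a non-zero scalar t
    M[i] = [t * x for x in M[i]]

def add_multiple(M, i, j, t):
    # type (iii): add t times row i to row j
    M[j] = [x + t * y for x, y in zip(M[j], M[i])]

# the step "4 x (1) + (3)" from Example 1.8 (rows are 0-indexed here)
M = [[1, -2, 1, 0], [0, 2, -8, 6], [-4, 5, 9, -9]]
add_multiple(M, 0, 2, 4)
```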

Definition 1.2 A matrix is said to be in reduced row-echelon form if it has the fol-
lowing properties:

(1) If a row does not consist entirely of zeros, then the 1st non-zero entry of this
row is equal to 1 (known as the leading 1 of the row);

(2) All the rows that consist entirely of zeros are grouped together at the bottom of
the matrix;

(3) If the leading 1 of the i-th row occurs at the p-th column and if the leading 1 of
the (i+1)-th row occurs at the q-th column, then p < q;

(4) Each column that contains a leading 1 has zeros elsewhere.

A matrix that satisfies (1), (2) and (3) but not necessarily (4) is said to be in row-
echelon form.
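A small checker for the four properties (our own sketch, not part of the text) makes the definition concrete:

```python
# A checker for the four properties of reduced row-echelon form.

def is_rref(M):
    last_lead = -1            # column of the previous leading 1
    seen_zero_row = False
    for i, row in enumerate(M):
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is None:
            seen_zero_row = True       # zero rows must stay at the bottom
            continue
        if seen_zero_row:              # property (2) violated
            return False
        if row[lead] != 1:             # property (1): leading entry is 1
            return False
        if lead <= last_lead:          # property (3): leading 1s move right
            return False
        if any(M[r][lead] != 0 for r in range(len(M)) if r != i):
            return False               # property (4): rest of column is zero
        last_lead = lead
    return True
```

Dropping the final column test gives a checker for row-echelon form instead.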

Example 1.7 The matrices $\begin{bmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & -9 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 & 2 & 0 & 15 \\ 0 & 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$
are in reduced row-echelon form, while

$\begin{bmatrix} 1 & -5 & 3 \\ 0 & 1 & -2 \end{bmatrix}$, $\begin{bmatrix} 1 & 4 & 3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 18 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$ are in row-echelon form.

Theorem 1.1 Every matrix A can be reduced to a matrix in reduced row-echelon
form by applying to A a sequence of elementary row operations.

We shall see how the method of elimination works in the following examples.


Example 1.8 The system of linear equations $\begin{cases} x_1 - 2x_2 + x_3 = 0 \\ \qquad 2x_2 - 8x_3 = 6 \\ -4x_1 + 5x_2 + 9x_3 = -9 \end{cases}$ has

$A = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 2 & -8 \\ -4 & 5 & 9 \end{bmatrix}$ as its coefficient matrix and $B = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 6 \\ -4 & 5 & 9 & -9 \end{bmatrix}$ as

the augmented matrix. The elimination process goes as follows:

$$\begin{cases} x_1 - 2x_2 + x_3 = 0 \\ \qquad 2x_2 - 8x_3 = 6 \\ -4x_1 + 5x_2 + 9x_3 = -9 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 6 \\ -4 & 5 & 9 & -9 \end{bmatrix}$$

$$4 \times (1) + (3) \to \quad \begin{cases} x_1 - 2x_2 + x_3 = 0 \\ 2x_2 - 8x_3 = 6 \\ -3x_2 + 13x_3 = -9 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 6 \\ 0 & -3 & 13 & -9 \end{bmatrix}$$

$$\tfrac{1}{2} \times (2) \to \quad \begin{cases} x_1 - 2x_2 + x_3 = 0 \\ x_2 - 4x_3 = 3 \\ -3x_2 + 13x_3 = -9 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 3 \\ 0 & -3 & 13 & -9 \end{bmatrix}$$

$$3 \times (2) + (3) \to \quad \begin{cases} x_1 - 2x_2 + x_3 = 0 \\ x_2 - 4x_3 = 3 \\ x_3 = 0 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 3 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

$$\begin{matrix} -1 \times (3) + (1) \to \\ 4 \times (3) + (2) \to \end{matrix} \quad \begin{cases} x_1 - 2x_2 = 0 \\ x_2 = 3 \\ x_3 = 0 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & -2 & 0 & 0 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

$$2 \times (2) + (1) \to \quad \begin{cases} x_1 = 6 \\ x_2 = 3 \\ x_3 = 0 \end{cases} \Leftrightarrow \begin{bmatrix} 1 & 0 & 0 & 6 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

Therefore, the given system of linear equations is consistent. It has one and
only one solution $x = [\,6 \;\; 3 \;\; 0\,]^T$.

Example 1.9 Determine whether the following system is consistent:
$$\begin{cases} \qquad x_2 - 4x_3 = 8 \\ 2x_1 - 3x_2 + 2x_3 = 1 \\ 5x_1 - 8x_2 + 7x_3 = 1 . \end{cases}$$

Solution The augmented matrix of the system can be reduced as follows:

$$\begin{bmatrix} 0 & 1 & -4 & 8 \\ 2 & -3 & 2 & 1 \\ 5 & -8 & 7 & 1 \end{bmatrix} \to \begin{bmatrix} 2 & -3 & 2 & 1 \\ 0 & 1 & -4 & 8 \\ 5 & -8 & 7 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & -\frac{3}{2} & 1 & \frac{1}{2} \\ 0 & 1 & -4 & 8 \\ 5 & -8 & 7 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & -\frac{3}{2} & 1 & \frac{1}{2} \\ 0 & 1 & -4 & 8 \\ 0 & -\frac{1}{2} & 2 & -\frac{3}{2} \end{bmatrix} \to \begin{bmatrix} 1 & -\frac{3}{2} & 1 & \frac{1}{2} \\ 0 & 1 & -4 & 8 \\ 0 & 0 & 0 & \frac{5}{2} \end{bmatrix}.$$

The last matrix is the augmented matrix of $\begin{cases} x_1 - \frac{3}{2} x_2 + x_3 = \frac{1}{2} \\ 0 x_1 + x_2 - 4x_3 = 8 \\ 0 x_1 + 0 x_2 + 0 x_3 = \frac{5}{2} . \end{cases}$ This system

is inconsistent, as the third equation $0 x_1 + 0 x_2 + 0 x_3 = \frac{5}{2}$ has no solution (see Remark
1.1). As such, the system of linear equations is inconsistent.


Example 1.10 The augmented matrix of $\begin{cases} x_1 + x_2 - 4x_3 = 5 \\ 2x_1 + 3x_2 - 7x_3 = 14 \\ \quad\; -x_2 - x_3 = -4 \end{cases}$

is given by $\begin{bmatrix} 1 & 1 & -4 & 5 \\ 2 & 3 & -7 & 14 \\ 0 & -1 & -1 & -4 \end{bmatrix}$. We may use elementary row operations to re-

duce the coefficient matrix to reduced row-echelon form to obtain $\begin{bmatrix} 1 & 0 & -5 & 1 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.

The corresponding equivalent system is $\begin{cases} x_1 - 5x_3 = 1 \\ x_2 + x_3 = 4 , \end{cases}$ giving $x_1 = 1 + 5x_3$

and $x_2 = 4 - x_3$. $x_1$ and $x_2$ are now the leading variables because they correspond
to the column positions of the two leading 1's in the reduced row-echelon matrix
$\begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, and $x_3$ is a free variable which may assume any value.

As such, solutions of the system are given by $x = [\,1 + 5t \;\; 4 - t \;\; t\,]^T$, where t is
an arbitrary scalar.

The method used in Examples 1.8, 1.9 and 1.10 may be applied to solve
linear systems in general. We summarize the elimination process as follows:

1.2.2 Gaussian elimination and Gauss-Jordan method

Consider a system of linear equations Ax = b and its augmented matrix [A |b]. It
follows from Theorem 1.1 that we can use a sequence of elementary row operations
to reduce the matrix A to a matrix R in row-echelon form. In this manner, [A |b]
is reduced, by the same sequence of elementary row operations, to [R |c] for some
column vector c. The system of linear equations Rx = c, which is equivalent
to Ax = b, may now be solved by a few steps of backward substitution. This
method is known as Gaussian Elimination, named after the great mathematician
Carl Friedrich Gauss (1777–1855). It remains one of the most popular methods for
handling systems of linear equations.

On the other hand, if [A |b] is reduced to [R |c] where R is in reduced row-
echelon form, then the solution of the new equivalent system Rx = c can be obtained
simply by inspection. This process is called the Gauss-Jordan Method.
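The Gauss-Jordan Method can be sketched in Python using exact fraction arithmetic so that no rounding occurs (the function name `rref` and the pivot-search details are our own choices); applied to the augmented matrix of Example 1.8, it reproduces the unique solution found there:

```python
# A sketch of the Gauss-Jordan method with exact arithmetic.
from fractions import Fraction

def rref(M):
    # reduce M to reduced row-echelon form by elementary row operations
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # find a row with a non-zero entry in column c, at or below row r
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]        # type (i): interchange rows
        M[r] = [x / M[r][c] for x in M[r]]     # type (ii): create the leading 1
        for i in range(rows):                  # type (iii): clear column c
            if i != r:
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

# augmented matrix of Example 1.8
B = [[1, -2, 1, 0], [0, 2, -8, 6], [-4, 5, 9, -9]]
R = rref(B)    # last column carries the unique solution x = [6, 3, 0]^T
```

Stopping the column-clearing loop at the rows below r would give plain Gaussian elimination (row-echelon form) instead.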

Both methods are well suited for computer computation because they are
systematic. However, these procedures sometimes introduce fractions, which might
otherwise be avoided by varying the steps in the right way. Thus, once the basic
procedure has been mastered, students may wish to vary the steps in specific problems
to avoid fractions.


Example 1.11 Use Gaussian elimination to solve $\begin{cases} x_1 + x_2 + 2x_3 = 9 \\ 2x_1 + 4x_2 - 3x_3 = 1 \\ 3x_1 + 6x_2 - 5x_3 = 0 . \end{cases}$

Solution The augmented matrix $\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}$ may be reduced as follows:

$$\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 3 & 6 & -5 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\frac{7}{2} & -\frac{17}{2} \\ 0 & 3 & -11 & -27 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\frac{7}{2} & -\frac{17}{2} \\ 0 & 0 & -\frac{1}{2} & -\frac{3}{2} \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\frac{7}{2} & -\frac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$

The last matrix corresponds to the linear system $\begin{cases} x_1 + x_2 + 2x_3 = 9 \\ \quad x_2 - \frac{7}{2} x_3 = -\frac{17}{2} \\ \qquad\quad x_3 = 3 , \end{cases}$ which

implies $x_3 = 3$, $x_2 = \frac{7}{2} \times 3 - \frac{17}{2} = 2$ and $x_1 = -2 - 2 \times 3 + 9 = 1$. Therefore,
$x = [\,1 \;\; 2 \;\; 3\,]^T$ is the only solution of the original linear system.

Example 1.12 Consider the following system of linear equations in 6 unknowns
$$\begin{cases} x_1 + 3x_2 - 2x_3 + 2x_5 = 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 = -1 \\ \qquad 5x_3 + 10x_4 + 15x_6 = 5 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 = 6 . \end{cases}$$

The augmented matrix can be row reduced to $\begin{bmatrix} 1 & 3 & 0 & 4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & \frac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$.

Therefore, we have $\begin{cases} x_1 + 3x_2 + 4x_4 + 2x_5 = 0 \\ \qquad x_3 + 2x_4 = 0 \\ \qquad\qquad x_6 = \frac{1}{3} . \end{cases}$

Taking $x_2$, $x_4$ and $x_5$ as free variables, we obtain
$$x_1 = -3x_2 - 4x_4 - 2x_5, \quad x_3 = -2x_4 \quad \text{and} \quad x_6 = \frac{1}{3}.$$

The solution set thus consists of all vectors of the form
$$x = [\,-3\alpha - 4\beta - 2\gamma \;\;\; \alpha \;\;\; -2\beta \;\;\; \beta \;\;\; \gamma \;\;\; \tfrac{1}{3}\,]^T,$$
where α, β and γ are arbitrary scalars (known as parameters).

The following theorem gives a useful necessary and sufficient condition for a
system of linear equations to be consistent.

Theorem 1.2 Suppose the augmented matrix [A|b] of the linear system Ax = b is
reduced to [R|c] by elementary row operations, where R is an m × n matrix in reduced
row-echelon form (or in row-echelon form) and $c = [c_1 \;\; c_2 \;\; \ldots \;\; c_m]^T$. If R has
r non-zero rows, then the system Ax = b is consistent if and only if $c_j = 0$ for
r < j ≤ m.

Proof Since R has (m − r) zero rows, the last (m − r) equations of the linear system
Rx = c are of the form $0 \cdot x_1 + 0 \cdot x_2 + \cdots + 0 \cdot x_n = c_j$ for r < j ≤ m. Therefore,
consistency of Rx = c forces $c_j = 0$ for r < j ≤ m (see Remark 1.1).

Conversely, when $c_j = 0$ for r < j ≤ m, the last (m − r) equations of Rx = c
are satisfied by any column vector with n entries and are therefore redundant. Thus
Rx = c is left with r linear equations, and the r unknowns corresponding to the leading
1's of the non-zero rows of the matrix R can be regarded as basic variables, while the
remaining (n − r) unknowns become free variables. Therefore, Rx = c is consistent.


Example 1.13 Consider the linear system $\begin{bmatrix} 1 & 2 & 3 & -1 \\ 8 & 10 & 12 & 1 \\ 7 & 8 & 9 & 2 \end{bmatrix} x = \begin{bmatrix} \alpha \\ \beta \\ 0 \end{bmatrix}$.

Gaussian elimination gives $\begin{bmatrix} 1 & 2 & 3 & -1 & \alpha \\ 8 & 10 & 12 & 1 & \beta \\ 7 & 8 & 9 & 2 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & -1 & \alpha \\ 0 & 1 & 2 & -\frac{3}{2} & \frac{8\alpha-\beta}{6} \\ 0 & 0 & 0 & 0 & \alpha-\beta \end{bmatrix}$.

The matrix $R = \begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 1 & 2 & -\frac{3}{2} \\ 0 & 0 & 0 & 0 \end{bmatrix}$ is in row-echelon form with 2 non-zero rows.

By Theorem 1.2, the system is consistent if and only if α − β = 0.

1.3   Systems of homogeneous equations

A linear equation with zero “right hand side” is called a homogeneous linear equation.
A system of m homogeneous linear equations in n unknowns may be written as
Ax = 0, where A is a given m × n matrix and 0 is the m × 1 zero column vector. A
homogeneous system has the n × 1 zero vector 0 as an obvious solution (called the
trivial solution). Any other solutions are known as non-trivial solutions.

It is also clear that if v and u are solutions of Ax = 0, then tv + su is also a
solution for any scalars t, s. As such, the solution set of Ax = 0 either has only the
trivial solution, or has infinitely many solutions.

Remark 1.2 To solve Ax = 0, we reduce its augmented matrix [A| 0] to [R| 0],
where R is a matrix in reduced row-echelon form. Suppose that R has r non-zero
rows, and that for 1 ≤ j ≤ r, the leading 1 of the j-th row occurs at the $k_j$-th column.
It then follows from the structure of the reduced row-echelon matrix R that $x_{k_1}, x_{k_2},
\ldots, x_{k_r}$ may be taken as basic variables of the system Rx = 0, which consists of only
r linear equations, the j-th equation being of the form $x_{k_j} + \sum(\cdots) = 0$. Here, $\sum(\cdots)$
denotes sums that involve only the remaining (n − r) variables (free variables). It is
now clear that the system Rx = 0 has non-trivial solutions whenever n > r.

A simple consequence of Remark 1.2 is the following useful theorem.

Theorem 1.3 If A is an m × n matrix where m < n, then the homogeneous system
Ax = 0 always has non-trivial solutions.

In other words, Theorem 1.3 says that if the number of equations is less than
the number of unknowns, then the homogeneous system has non-trivial solutions. For
instance, a system of 3 linear equations in 4 unknowns admits non-trivial solutions.
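For the smallest case m = 1 < n = 2, a non-trivial solution can be written down explicitly (the coefficients below are made up for illustration):

```python
# Theorem 1.3 for one homogeneous equation in two unknowns (m = 1 < n = 2):
# a1*x1 + a2*x2 = 0 has the non-trivial solution (x1, x2) = (a2, -a1)
# whenever some coefficient is non-zero (and any vector works otherwise).

a1, a2 = 3, 5               # arbitrary made-up coefficients
x1, x2 = a2, -a1            # a non-trivial solution by construction
check = a1 * x1 + a2 * x2   # equals 0
```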

Example 1.14 Solve the homogeneous system $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$.

Solution The augmented matrix of the linear system may be simplified by elemen-
tary row operations as follows:

$$\begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 3 & 4 & 0 \\ 3 & 4 & 5 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & -1 & -2 & 0 \\ 0 & -2 & -4 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

The last matrix is the augmented matrix of the system $\begin{cases} x_1 - x_3 = 0 \\ x_2 + 2x_3 = 0 . \end{cases}$

Taking $x_3 = t$ as the free variable, we conclude that solutions of the linear system
are given by $x = [\,t \;\; -2t \;\; t\,]^T$ where t is an arbitrary scalar. Geometrically, the
solution set is represented by a straight line in $\mathbb{R}^3$ passing through the origin.

1.4   Nonsingular matrices

For any non-zero real number a, there is a real number b such that ab = ba = 1. The
number b is known as the multiplicative inverse of a. The matrix analogue of this
will now be discussed. We shall begin with a few definitions and examples.

Definition 1.3 A square matrix A is said to be nonsingular (or invertible) if there
is a square matrix B such that AB = I and BA = I. The matrix B is called an
inverse of A.

                                                            
Example 1.15 Since $\begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix} \begin{bmatrix} 5 & -2 \\ -7 & 3 \end{bmatrix} = I$ and $\begin{bmatrix} 5 & -2 \\ -7 & 3 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix} = I$,

we conclude that the matrix $\begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix}$ is nonsingular, with $\begin{bmatrix} 5 & -2 \\ -7 & 3 \end{bmatrix}$ as an inverse.

              
Example 1.16 Consider the 3 × 3 matrix $A = \begin{bmatrix} 0 & a_1 & a_2 \\ 0 & a_3 & a_4 \\ 0 & a_5 & a_6 \end{bmatrix}$.

If B is any 3 × 3 matrix, then the 1st column of the product BA consists entirely of
zeros. As such, BA ≠ I and A thus has no inverse.

We include some important facts about nonsingular matrices in the following
three propositions.

Proposition 1.1 If A, B and C are n × n matrices such that AC = I and BA = I,
then B = C.

Proof    B = BI = B(AC) = (BA)C = IC = C.

Note If AC = I, then C is called a right inverse of A. If BA = I, then B is called
a left inverse of A. We have just demonstrated that if A has a right inverse and a
left inverse, then they are equal. In particular, a matrix cannot have two distinct
inverses. We shall use the symbol $A^{-1}$ to denote the inverse of A.

Proposition 1.2 If A and B are nonsingular matrices of the same order, then AB
is nonsingular and (AB)−1 = B−1 A−1 .

Proof    By the associativity of matrix multiplication, we have

(AB)(B−1 A−1 ) = A(B(B−1 A−1 )) = A((BB−1 )A−1 ) = A(IA−1 ) = AA−1 = I and
(B−1 A−1 )(AB) = B−1 (A−1 (AB)) = B−1 ((A−1 A)B) = B−1 (IB) = B−1 B = I.

Therefore, AB is nonsingular and (AB)−1 = B−1 A−1 .

Note       If A1 , A2 , . . ., Ak are nonsingular matrices, then repeated applications
of Proposition 1.2 show that their product A1 A2 · · · Ak is nonsingular, with
$(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1} A_1^{-1}$. In particular, if A is nonsingular, then Ak
is nonsingular for any positive integer k, and (Ak )−1 = (A−1 )k .

Proposition 1.3 If A is nonsingular and t is a non-zero scalar, then

(i) A−1 is nonsingular and (A−1 )−1 = A;

(ii) tA is nonsingular and (tA)−1 = (1/t)A−1 .

The following theorem gives interesting information on solutions of a system
of linear equations whose coeﬃcient matrix is nonsingular.

Theorem 1.4        Let A be an n × n nonsingular matrix. Then

(i) the homogeneous system Ax = 0 has only the trivial solution;

(ii) the system Ax = b has a unique solution for any b in Rn .

Proof       If v in Rn is a solution of Ax = 0, then Av = 0. Therefore, we have

v = Iv = (A−1 A)v = A−1 (Av) = A−1 0 = 0.
On the other hand, the identity A(A−1 b) = (AA−1 )b = Ib = b implies that A−1 b is

a solution of the linear system Ax = b. Finally, if w is also a solution of Ax = b,

then Aw = b. We therefore conclude that

w = Iw = (A−1 A)w = A−1 (Aw) = A−1 b, as required.
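Theorem 1.4(ii) can be illustrated numerically: the unique solution is A−1 b. A minimal Python sketch using the matrix and inverse from Example 1.15 (`matvec` and the right-hand side b are illustrative choices, not from the text):

```python
def matvec(A, v):
    # Multiply a matrix (list of rows) by a column vector (list).
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A     = [[3, 2], [7, 5]]      # nonsingular (Example 1.15)
A_inv = [[5, -2], [-7, 3]]    # its inverse
b     = [4, 9]                # an arbitrary right-hand side

x = matvec(A_inv, b)          # the unique solution A^{-1} b
print(x, matvec(A, x) == b)   # [2, -1] True
```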

The simplest nonsingular matrices are the so-called elementary matrices.

Deﬁnition 1.4 An n × n matrix is called an elementary matrix if it can be obtained
from the n × n identity matrix In by performing a single elementary row operation.

                                                 
Example 1.17
$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \quad\text{and}\quad G = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
are elementary because E is obtained from I by multiplying its second row by −3, F is
obtained from I by adding (−2) times its first row to the third row and G is obtained
from I by interchanging its first and second rows.

We now state some useful and interesting facts about elementary matrices in
the following

Proposition 1.4 (i) If A is an m × n matrix and E is an m × m elementary ma-
trix that results from performing a certain elementary row operation on Im , then the
product EA is the matrix that results when this same elementary row operation is
performed on A.

(ii) If an elementary row operation is applied to an identity matrix I to produce an
elementary matrix E, then there exists another elementary row operation which, when
applied to E, produces I.

(iii) Every elementary matrix is nonsingular, and the inverse of an elementary ma-
trix is also an elementary matrix.
Example 1.18 Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
and E, F and G be as given in Example 1.17. Straightforward calculations indicate that
                          
(a) EA = $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ -3a_{21} & -3a_{22} & -3a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$ is the matrix obtained from A by multiplying
the second row of A by −3;
                                      
(b) FA = $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31}-2a_{11} & a_{32}-2a_{12} & a_{33}-2a_{13} \end{bmatrix}$ is the matrix obtained from A by
adding (−2) times the first row of A to the third row;
                 
(c) GA = $\begin{bmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$ is obtained from A by interchanging its first and
second rows.
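Proposition 1.4(i), which Example 1.18 illustrates symbolically, can also be checked numerically. A small Python sketch, using E from Example 1.17 (the concrete matrix A below is an arbitrary choice for illustration):

```python
def matmul(A, B):
    # Multiply two matrices given as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

E = [[1, 0, 0], [0, -3, 0], [0, 0, 1]]   # elementary matrix from Example 1.17
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # any 3 x 3 matrix

# Left-multiplying by E performs the row operation: second row scaled by -3.
print(matmul(E, A))  # [[1, 2, 3], [-12, -15, -18], [7, 8, 9]]
```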

Example 1.19 With E, F and G as given in Example 1.17, we observe that
              
(a) E−1 is the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{3} & 0 \\ 0 & 0 & 1 \end{bmatrix}$, which is the elementary matrix obtained from
I by multiplying its second row by −1/3;

(b) F−1 is the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$, which is the elementary matrix obtained from
I by adding 2 times the first row to the third row;

(c) G−1 is the matrix $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, which is the elementary matrix obtained from
I by interchanging the first and second rows.

The following theorem gives necessary and suﬃcient conditions for a square
matrix to be nonsingular.

Theorem 1.5 If A is an n × n matrix, then the following statements are equivalent:

(i) A is nonsingular;

(ii) the homogeneous system Ax = 0 has only the trivial solution;

(iii) A can be reduced to I by a sequence of elementary row operations.

Proof “(i) ⇒ (ii)”: Follows from Theorem 1.4.

“(ii) ⇒ (iii)”: Suppose R is the reduced row-echelon form of A. If R ≠ I, then
the number of non-zero rows of R is less than n, and Remark 1.2 then shows that
Ax = 0 has non-trivial solutions, contradicting (ii). Hence R = I, i.e., A can be
reduced to I by elementary row operations.

“(iii) ⇒ (i)”: Suppose that there are elementary row operations σ1 , σ2 , . . . , σk such
that
$$A \xrightarrow{\sigma_1} A_1 \xrightarrow{\sigma_2} A_2 \xrightarrow{\sigma_3} \cdots \xrightarrow{\sigma_{k-1}} A_{k-1} \xrightarrow{\sigma_k} A_k = I.$$
If Ej is the elementary matrix obtained by applying σj to I, i.e., $I \xrightarrow{\sigma_j} E_j$, then by
Proposition 1.4, one has
$$A \xrightarrow{\sigma_1} E_1 A \xrightarrow{\sigma_2} E_2 E_1 A \xrightarrow{\sigma_3} \cdots \xrightarrow{\sigma_k} E_k \cdots E_2 E_1 A = I.$$

We thus conclude that (Ek · · · E2 E1 )A = I.
Since E1 , E2 , . . . , Ek are elementary matrices, they are also nonsingular. By Propo-

sition 1.2, the product Ek · · · E2 E1 is a nonsingular matrix. Proposition 1.1 now
implies that A = (Ek · · · E2 E1 )−1 is nonsingular, and that A−1 = Ek · · · E2 E1 .

This completes the proof of the equivalence of (i), (ii) and (iii).

Remark 1.3 We shall now describe a practical method to find A−1 . It is
evident from the proof of “(iii) ⇒ (i)” in Theorem 1.5 that if a sequence of elemen-
tary row operations reduces A to I, then performing this same sequence of
elementary row operations on I produces A−1 . Symbolically, we have
$$[A \,|\, I] \xrightarrow{\sigma_1} [A_1 \,|\, E_1 I] \xrightarrow{\sigma_2} [A_2 \,|\, E_2 E_1 I] \xrightarrow{\sigma_3} \cdots \xrightarrow{\sigma_k} [I \,|\, E_k \cdots E_2 E_1 I] = [I \,|\, A^{-1}].$$

                  
Example 1.20 Find the inverse of the matrix $A = \begin{bmatrix} 1 & -2 & 2 \\ 2 & -3 & 6 \\ 1 & 1 & 7 \end{bmatrix}$.
Solution
$$\left[\begin{array}{ccc|ccc} 1 & -2 & 2 & 1 & 0 & 0 \\ 2 & -3 & 6 & 0 & 1 & 0 \\ 1 & 1 & 7 & 0 & 0 & 1 \end{array}\right]
\rightarrow \left[\begin{array}{ccc|ccc} 1 & -2 & 2 & 1 & 0 & 0 \\ 0 & 1 & 2 & -2 & 1 & 0 \\ 0 & 3 & 5 & -1 & 0 & 1 \end{array}\right]
\rightarrow \left[\begin{array}{ccc|ccc} 1 & -2 & 2 & 1 & 0 & 0 \\ 0 & 1 & 2 & -2 & 1 & 0 \\ 0 & 0 & 1 & -5 & 3 & -1 \end{array}\right]$$
$$\rightarrow \left[\begin{array}{ccc|ccc} 1 & -2 & 0 & 11 & -6 & 2 \\ 0 & 1 & 0 & 8 & -5 & 2 \\ 0 & 0 & 1 & -5 & 3 & -1 \end{array}\right]
\rightarrow \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 27 & -16 & 6 \\ 0 & 1 & 0 & 8 & -5 & 2 \\ 0 & 0 & 1 & -5 & 3 & -1 \end{array}\right].$$
We therefore conclude that A is nonsingular, and that $A^{-1} = \begin{bmatrix} 27 & -16 & 6 \\ 8 & -5 & 2 \\ -5 & 3 & -1 \end{bmatrix}$.
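The reduction above can be automated. Below is a Python sketch of the [A | I] method of Remark 1.3, using exact rational arithmetic so no rounding occurs; the function name and structure are illustrative, not from the text:

```python
from fractions import Fraction

def inverse(A):
    # Invert a nonsingular matrix by row-reducing [A | I] to [I | A^{-1}]
    # (the method of Remark 1.3), with exact rational arithmetic.
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero pivot, interchanging rows if needed.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Add multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                t = M[r][col]
                M[r] = [x - t * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]   # right half is A^{-1}

A = [[1, -2, 2], [2, -3, 6], [1, 1, 7]]
print(inverse(A))  # rows [27, -16, 6], [8, -5, 2], [-5, 3, -1], as in Example 1.20
```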

By deﬁnition, to check that B is the inverse of A, we have to verify that
AB = I and BA = I. The following theorem tells us that this is in fact not necessary.

Theorem 1.6 Let A and B be n×n matrices. Then BA = I if and only if AB = I.

Proof Suppose that BA = I. If v is a solution of the homogeneous system Ax = 0,
then Av = 0 and therefore

v = Iv = (BA)v = B(Av) = B0 = 0.

Hence the homogeneous system Ax = 0 has only the trivial solution. By Theorem
1.5, A is nonsingular.
Proposition 1.1 now implies that B = A−1 . Hence AB = AA−1 = I.

The following theorem may be considered as an extension of Theorem 1.5.

Theorem 1.7 For any n × n matrix A, the following statements are equivalent.
(i) A is nonsingular;
(ii) The system of homogeneous equations Ax = 0 has only the trivial solution;
(iii) A can be reduced to I by a sequence of elementary row operations;
(iv) The non-homogeneous system Ax = b is consistent for every vector b in Rn .

Proof The equivalence of (i), (ii) and (iii) is a consequence of Theorem 1.5. “(i)
⇒(iv)” follows from Theorem 1.4. We only need to prove “(iv) ⇒(i)”.
For 1 ≤ k ≤ n, let ek be the k-th column of the identity matrix In , i.e., ek is the
column vector whose k-th entry is equal to 1, while all other entries are zero.
By the hypothesis of (iv), the system of linear equations Ax = ek has a solution (say
vk ) for every k.
Denote by V the n × n matrix whose k-th column is equal to vk for k = 1, 2, . . . , n.
It is then clear from the deﬁnition of matrix multiplication that AV = I. Theorem
1.6 now implies that VA = I. Hence A is nonsingular.

1.5    Determinant

1.5.1 Deﬁnition of the determinant

For any square matrix A, we deﬁne det(A), called the determinant of A, as follows:

If A is a 1 × 1 matrix, i.e., A = [a11 ], we deﬁne det(A) = a11 .      Thus the
determinant of a 1 × 1 matrix is the (only) entry in the matrix.

Assume that n > 1 and that the determinant is deﬁned for all square matrices
of order < n. (This is the induction hypothesis.) Let A be an n × n matrix, i.e.,
A = [aik ]1≤i, k≤n .

For any entry aik of A, where 1 ≤ i, k ≤ n, we deﬁne the following terms:

(1) Mik is the determinant of the (n − 1) × (n − 1) matrix obtained from A by
deleting its i th row and k th column.

(2) Cik = (−1)i+k · Mik .

Mik and Cik are respectively called the minor and the cofactor of the entry aik
of A. They are well deﬁned because of the induction hypothesis.

Deﬁnition 1.5          We now deﬁne $\det(A) = \sum_{k=1}^{n} a_{1k} C_{1k}$.

In other words, det(A) is obtained by taking “cofactor expansion” along the
ﬁrst row of the matrix A.

Notation We also use the symbol |A| or det A to denote the determinant of A.
             
Example 1.21 If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, then C11 = a22 , C12 = −a21 and thus
det(A) = a11 C11 + a12 C12 = a11 a22 − a12 a21 . On the other hand, if
$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$, then $C_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}$,
$C_{12} = -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = -(a_{21}a_{33} - a_{23}a_{31})$ and $C_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31}$.

∴ det(A) = a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a22 a31 ).

                 
Example 1.22 If $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}$, then $C_{11} = \begin{vmatrix} -1 & 1 \\ 4 & -2 \end{vmatrix} = -2$,
$C_{12} = -\begin{vmatrix} 1 & 1 \\ 1 & -2 \end{vmatrix} = 3$ and $C_{13} = \begin{vmatrix} 1 & -1 \\ 1 & 4 \end{vmatrix} = 5$. ∴ |A| = 2 × (−2) + 1 × 3 + 3 × 5 = 14.
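Definition 1.5 translates directly into a recursive procedure. A Python sketch (naive cofactor expansion along the first row; fine for small matrices, though exponential in n):

```python
def det(A):
    # Determinant by cofactor expansion along the first row (Definition 1.5):
    # det(A) = sum_k a_1k * C_1k, where C_1k = (-1)^(1+k) * M_1k.
    if len(A) == 1:
        return A[0][0]
    total = 0
    for k in range(len(A)):
        minor = [row[:k] + row[k+1:] for row in A[1:]]  # delete row 1, column k+1
        total += (-1) ** k * A[0][k] * det(minor)
    return total

print(det([[2, 1, 3], [1, -1, 1], [1, 4, -2]]))  # 14, as in Example 1.22
```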

1.5.2 Further properties of the determinant

Some interesting properties of the determinant will now be discussed. We shall ﬁrst
state without proof the following fundamental result.

Theorem 1.8         Let A be any n × n matrix where n ≥ 2. If 1 ≤ i, j ≤ n with i ≠ j, then
$$\sum_{p=1}^{n} a_{ip} C_{ip} = \sum_{p=1}^{n} a_{jp} C_{jp}. \tag{1.2}$$

In other words, the determinant of a matrix can be evaluated by taking cofactor
expansion along any row.

As a simple illustration of Theorem 1.8, we calculate the determinant of the
matrix in Example 1.22 by expanding along the third row and obtain
$$|A| = 1 \cdot \begin{vmatrix} 1 & 3 \\ -1 & 1 \end{vmatrix} + 4 \cdot (-1)\begin{vmatrix} 2 & 3 \\ 1 & 1 \end{vmatrix} + (-2) \cdot \begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} = 14.$$

The following Proposition tells us that the statement in Theorem 1.8 remains
valid when “row” is replaced by “column”.

Proposition 1.5     det(A) can be obtained by taking cofactor expansion along any
column. In other words, we have $\det(A) = \sum_{i=1}^{n} a_{ik} C_{ik}$ for any 1 ≤ k ≤ n.

Since the determinant of a square matrix is deﬁned by induction, results con-
cerning determinant are normally proved by induction.            The following corollary is a
typical example.

Corollary 1        det(A) = det(AT ) for any square matrix A.

Proof The formula is certainly correct if A is a 1 × 1 matrix.

Let n > 1 and assume that the formula holds for all matrices of order < n. For
any n × n matrix A, let B = AT . It follows from the deﬁnition of transpose that
aij = bji , and the induction hypothesis implies that the cofactor of aij in A equals
the cofactor of bji in B.

Hence the cofactor expansion of det (A) along the ﬁrst row equals the cofactor expan-
sion of det (B) along the ﬁrst column.
Therefore, det(A) = det(B) = det(AT ). This completes the proof.

                          
Corollary 2        If $D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}$, then det(D) = d11 × d22 × · · · × dnn .
In particular, det(I) = 1.

The following proposition describes, among other things, how det(A) varies
when elementary row operations are applied to the matrix A.

Proposition 1.6        Let A be a square matrix of order n.
(i) If A0 is the matrix obtained from A by interchanging any two rows of A, then
det (A0 ) = (−1) · det (A).
(ii)    If two rows of A are identical, then det (A) = 0.
(iii)   If B is the matrix obtained by multiplying the i th row of A by a scalar t while
other rows remain unchanged, then det(B) = t · det(A).
(iv)    Let b1 , b2 , . . ., bn be scalars. Then
$$\det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1}+b_1 & a_{i2}+b_2 & \cdots & a_{in}+b_n \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
= \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
+ \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ b_1 & b_2 & \cdots & b_n \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

(v)     If C is obtained from A by an elementary row operation of adding a scalar
multiple of the i-th row to its j-th row, where i ≠ j, i.e.,
$$C = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{j1}+ta_{i1} & a_{j2}+ta_{i2} & \cdots & a_{jn}+ta_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},$$
then det (C) = det (A).

Remark 1.4          Statements in Proposition 1.6 are true when “row” is replaced by
“column”. For instance, a matrix with two of its columns identical has determinant
equal to zero.

An immediate consequence of Proposition 1.6 is the following interesting
result which will be useful when we prove our next theorem concerning product of
two determinants.

Lemma 1.1 If E is an n × n elementary matrix and A is any n × n matrix, then

(i) det(E) ≠ 0;
(ii) det(EA) = det(E) × det(A).

Lemma 1.2 If A and B are n × n matrices such that A is singular, then AB is also
singular.

Proof       If (AB)−1 exists, then A[B(AB)−1 ] = (AB)(AB)−1 = I, implying that A
has an inverse (see Theorem 1.6). This contradicts the fact that A is singular.

Proposition 1.7 If A is a singular matrix, then det (A) = 0.

Proof       There are elementary row operations σ1 , σ 2 , . . . , σ k which reduce A to a ma-
trix R in reduced row-echelon form, and R has at least one zero row (see the proof of
Theorem 1.5).

Let E1 , E2 , . . . , Ek be elementary matrices associated with σ 1 , σ 2 , . . . , σ k respectively.
It follows again from the proof of Theorem 1.5 that R = (Ek · · · E2 E1 )A. Since R
has at least one zero row, cofactor expansion along that row (Theorem 1.8) gives det (R) = 0.
Therefore, we have det [(Ek · · · E2 E1 )A] = 0. Repeated applications of Lemma 1.1
show that 0 = det(Ek ) × · · · × det(E2 ) × det(E1 ) × det(A).
Since elementary matrices have non-zero determinant (Lemma 1.1 again), we con-
clude that det(A) = 0.

Putting these results together, we obtain the following important theorem.

Theorem 1.9 If A and B are n × n matrices, then det(AB) = det(A) × det(B).

Proof If A is singular, then AB is also singular (Lemma 1.2).
Proposition 1.7 now implies that det (A) and det(AB) are both equal to zero.

If A is nonsingular, then there are elementary row operations σ 1 , σ 2 , . . . , σ k which
reduce A to the identity matrix I (Theorem 1.5). If E1 , E2 , . . . , Ek are elementary
matrices associated with σ 1 , σ 2 , . . . , σ k respectively, then I = (Ek · · · E2 E1 )A.
Denoting $E_j^{-1}$ by Fj , we now conclude from Proposition 1.2 that A = F1 F2 · · · Fk .

Repeated applications of Lemma 1.1 show that

det(AB) = det[(F1 F2 · · · Fk )B] = det[F1 (F2 · · · Fk B)]
= det(F1 ) × det(F2 · · · Fk B)
= det(F1 ) × det(F2 ) × · · · × det(Fk ) × det(B)
= det(F1 F2 · · · Fk ) × det(B) = det(A)× det(B).
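Theorem 1.9 can be spot-checked numerically. A small Python sketch for 2 × 2 matrices, using the determinant formula of Example 1.21 (the particular B below is an arbitrary choice):

```python
def det2(M):
    # 2 x 2 determinant: a11*a22 - a12*a21 (Example 1.21).
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[3, 2], [7, 5]]
B = [[1, 4], [2, -1]]
print(det2(matmul(A, B)) == det2(A) * det2(B))  # True
```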

The following proposition gives a useful necessary and suﬃcient condition for
a square matrix to be nonsingular.

Proposition 1.8 A square matrix A is nonsingular if and only if det (A) ≠ 0.

Proof     If A is singular, then det (A) = 0 by Proposition 1.7.
Conversely, if A is nonsingular, then A−1 exists and AA−1 = I.
By Theorem 1.9, we obtain det(A) × det(A−1 ) = det(AA−1 ) = det(I) = 1, which
implies that det(A) ≠ 0.

As a result of Proposition 1.8, we have
$$\det(A^{-1}) = \frac{1}{\det(A)} \tag{1.3}$$
for any nonsingular matrix A.

1.5.3 Cramer’s rule

Let A be an n×n matrix and Cij be the cofactor of the entry aij . With this notation,
the adjoint of A may now be deﬁned as
$$\operatorname{adj} A = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}^{T}.$$
The concept of adjoint is important because of the following result.

Proposition 1.9 A adjA = (det A)I.
Proof    Let B = A adjA.        The deﬁnition of matrix multiplication shows that, for
any integers 1 ≤ i, j ≤ n,

bij = ai1 Cj1 + ai2 Cj2 + · · · + ain Cjn .

In particular, it follows from Theorem 1.8 that

bjj = aj1 Cj1 + aj2 Cj2 + · · · + ajn Cjn = det(A).

When i 6= j, ai1 Cj1 + ai2 Cj2 + . . . + ain Cjn represents an “expansion” along the i th
row but multiplying each of the entries ai1 , ai2 , . . ., ain by the corresponding cofactors
of j th row.
Thus bij = det (C), where C is the matrix obtained from A by replacing its j th row
by the i th row, while all other rows remain unchanged.
Therefore, bij = 0 by Proposition 1.6.

Proposition 1.10 If A is a nonsingular matrix of order n and b is a column vector
with n entries, then the linear system Ax = b has a unique solution given by
$$x = \frac{1}{\det(A)}\,(\operatorname{adj} A)\, b. \tag{1.4}$$
Proof This is just a consequence of Proposition 1.8, Theorem 1.4 and Propo-
sition 1.9.

Formula (1.4) implies that the j th entry of the solution vector is given by
$$x_j = \frac{1}{\det(A)}\,(b_1 C_{1j} + b_2 C_{2j} + \cdots + b_n C_{nj}) = \frac{\det(A_j)}{\det(A)}, \tag{1.5}$$
where Aj is the matrix obtained from A by replacing the j th column of A by the
column vector b, while all other columns of A remain unchanged.

Formula (1.5) is commonly known as Cramer's rule.

Remark 1.5       An immediate consequence of Proposition 1.9 is the formula
$$A^{-1} = \frac{1}{\det(A)} \operatorname{adj} A, \tag{1.6}$$
which is valid for any nonsingular matrix A. Formula (1.6) enables us to calculate
the inverse of a square matrix, and is commonly known as the Adjoint Method.

Example 1.23 Use Cramer's rule to solve the linear system
$$\begin{cases} 3x_1 - 2x_2 = 6 \\ -5x_1 + 4x_2 = 8. \end{cases}$$
Solution With $A = \begin{bmatrix} 3 & -2 \\ -5 & 4 \end{bmatrix}$, $A_1 = \begin{bmatrix} 6 & -2 \\ 8 & 4 \end{bmatrix}$ and $A_2 = \begin{bmatrix} 3 & 6 \\ -5 & 8 \end{bmatrix}$,
Cramer's rule yields
$$x_1 = \frac{\det A_1}{\det A} = \frac{40}{2} = 20, \qquad x_2 = \frac{\det A_2}{\det A} = \frac{54}{2} = 27.$$


Example 1.24 Solve the linear system
$$\begin{cases} 2x_1 - 3x_2 + 4x_3 = 10 \\ 3x_1 + 7x_2 - 3x_3 = -10 \\ 4x_1 - 5x_2 + 2x_3 = 10. \end{cases}$$
Solution      With $A = \begin{bmatrix} 2 & -3 & 4 \\ 3 & 7 & -3 \\ 4 & -5 & 2 \end{bmatrix}$, $A_1 = \begin{bmatrix} 10 & -3 & 4 \\ -10 & 7 & -3 \\ 10 & -5 & 2 \end{bmatrix}$,
$A_2 = \begin{bmatrix} 2 & 10 & 4 \\ 3 & -10 & -3 \\ 4 & 10 & 2 \end{bmatrix}$ and $A_3 = \begin{bmatrix} 2 & -3 & 10 \\ 3 & 7 & -10 \\ 4 & -5 & 10 \end{bmatrix}$, we have

det(A) = −120, det(A1 ) = −60, det(A2 ) = 120 and det(A3 ) = −180.

Therefore, Cramer's rule yields
$$x_1 = \frac{\det A_1}{\det A} = \frac{-60}{-120} = \frac{1}{2}, \quad x_2 = \frac{\det A_2}{\det A} = \frac{120}{-120} = -1, \quad x_3 = \frac{\det A_3}{\det A} = \frac{-180}{-120} = \frac{3}{2}.$$
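Formula (1.5) can be implemented directly. A Python sketch using exact fractions (`det` and `cramer` are illustrative helper names), checked against Example 1.24:

```python
from fractions import Fraction

def det(M):
    # Determinant by cofactor expansion along the first row (Definition 1.5).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([r[:k] + r[k+1:] for r in M[1:]])
               for k in range(len(M)))

def cramer(A, b):
    # x_j = det(A_j) / det(A), where A_j has column j replaced by b (formula (1.5)).
    d = det(A)
    return [Fraction(det([r[:j] + [b[i]] + r[j+1:] for i, r in enumerate(A)]), d)
            for j in range(len(A))]

A = [[2, -3, 4], [3, 7, -3], [4, -5, 2]]
b = [10, -10, 10]
print(cramer(A, b))  # [Fraction(1, 2), Fraction(-1, 1), Fraction(3, 2)]
```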

Example 1.25 Find the inverse of the matrix $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}$ by the Adjoint
Method. Hence solve the system of linear equations
$$\begin{cases} 2x_1 + x_2 + 3x_3 = 1 \\ x_1 - x_2 + x_3 = 0 \\ x_1 + 4x_2 - 2x_3 = 1. \end{cases}$$

Solution A simple calculation shows that the cofactors of A are given by

C11 = −2;   C12 = 3;   C13 = 5;
C21 = 14;   C22 = −7;   C23 = −7;
C31 = 4;   C32 = 1;   C33 = −3.

Therefore, $\operatorname{adj} A = \begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}$ and det A = 2 × C11 + 1 × C12 + 3 × C13 = 14.

Thus, $A^{-1} = \frac{1}{14}\begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix} = \begin{bmatrix} -1/7 & 1 & 2/7 \\ 3/14 & -1/2 & 1/14 \\ 5/14 & -1/2 & -3/14 \end{bmatrix}$ and
$$x = A^{-1}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/7 \\ 2/7 \\ 1/7 \end{bmatrix}.$$
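The Adjoint Method of Remark 1.5 is likewise easy to mechanize: build the matrix of cofactors, then transpose. A Python sketch (helper names are illustrative):

```python
def det(M):
    # Cofactor expansion along the first row (Definition 1.5).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([r[:k] + r[k+1:] for r in M[1:]])
               for k in range(len(M)))

def adj(A):
    # adj A is the transpose of the matrix of cofactors C_ij.
    n = len(A)
    C = [[(-1) ** (i + j) * det([r[:j] + r[j+1:] for k, r in enumerate(A) if k != i])
          for j in range(n)] for i in range(n)]
    return [[C[j][i] for j in range(n)] for i in range(n)]

A = [[2, 1, 3], [1, -1, 1], [1, 4, -2]]
print(adj(A))  # [[-2, 14, 4], [3, -7, 1], [5, -7, -3]], as in Example 1.25
print(det(A))  # 14
```

Dividing adj A entrywise by det A then gives A−1, per formula (1.6).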

1.6   Vector space and linear dependence

In Section 1 of this chapter, we used the symbol Rn to denote the set of all n × 1
matrices (column vectors with n entries) and the symbol Rn to denote the set of all
1 × n matrices (row vectors with n entries).           For any v and w in Rn and for any
scalar t, we may deﬁne v + w and tv as in Section 1, called the sum of v and w
and the multiple of v by the scalar t respectively. Speciﬁcally, we have
$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} v_1+w_1 \\ v_2+w_2 \\ \vdots \\ v_n+w_n \end{bmatrix} \quad\text{and}\quad t\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} tv_1 \\ tv_2 \\ \vdots \\ tv_n \end{bmatrix}.$$

Addition and scalar multiplication of row vectors can be similarly deﬁned.

Deﬁnition 1.6 A vector space is a set V of column vectors (or row vectors) satisfying
the following properties:

(1) V contains the zero vector 0 ;
(2) if V contains v and w, then V also contains the sum v + w ;
(3) if V contains v and t is any scalar, then V also contains tv.

Elements of a vector space are simply called vectors. By deﬁnition, a vector
space contains the zero vector, and is closed under addition and scalar multiplication.

Example 1.26 (a) Both Rn and Rn are vector spaces in their own right. In par-
ticular, when n = 2 or 3, they are just the usual Cartesian plane or Cartesian space
respectively.
(b) The set of all v = [v1   v2 ] satisfying the equation 2v1 + 3v2 = 0 is a vector
space in R2 . More generally, any straight line in R2 that passes through the origin
is a vector space.

(c) The upper half plane in R2 is not a vector space.
(d) Any plane in R3 which passes through the origin is a vector space.
(e) If A is an m×n matrix, then the solution set of the homogeneous system Ax = 0
is a vector space in Rn . (See the ﬁrst paragraph of Section 3.)

Deﬁnition 1.7 Let v1 , v2 , . . . , vk be vectors in a vector space V . A vector v is called
a linear combination of v1 , v2 , . . . , vk if v = t1 v1 + t2 v2 + · · · + tk vk for some scalars
t1 , t2 , . . ., tk .

Example 1.27 Since [8 7 6] = (−4) · [1 2 3] + 3 · [4 5 6], [8 7 6] is a
linear combination of the vectors [1 2 3] and [4 5 6] in R3 .

Deﬁnition 1.8 Let v1 , v2 , . . . , vk be vectors in a vector space V . Suppose that W is
the set of all linear combinations of v1 , v2 , . . . , vk . Then W is a vector space, called
the vector space spanned by the vectors v1 , v2 , . . . , vk .

Notation W = span {v1 , v2 , . . . , vk }.

Example 1.28 The vector space spanned by [1 2 3] and [4 5 6] is the plane
passing through the points [0 0 0], [1 2 3] and [4 5 6].
Example 1.29 For $A = \begin{bmatrix} 2 & 2 & -1 & 0 & 1 \\ -1 & -1 & 2 & -3 & 1 \\ 1 & 1 & -2 & 0 & -1 \\ 0 & 0 & 1 & 1 & 1 \end{bmatrix}$, the system of linear equations
Ax = 0 has solutions given by x = [−α − β   α   −β   0   β]T , where α and β
are arbitrary scalars. Each solution of the system can thus be written as a linear
combination of v1 = [−1 1 0 0 0]T and v2 = [−1 0 −1 0 1]T , i.e.,
x = αv1 + βv2 . Therefore, the solution set of Ax = 0 is a vector space spanned by
v1 and v2 .
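That v1 and v2 really solve Ax = 0 (and hence, by closure, so does every combination αv1 + βv2) can be verified directly. A Python sketch (`matvec` is an illustrative helper):

```python
def matvec(A, v):
    # Multiply a matrix (list of rows) by a column vector (list).
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 2, -1, 0, 1],
     [-1, -1, 2, -3, 1],
     [1, 1, -2, 0, -1],
     [0, 0, 1, 1, 1]]
v1 = [-1, 1, 0, 0, 0]
v2 = [-1, 0, -1, 0, 1]

print(matvec(A, v1), matvec(A, v2))  # [0, 0, 0, 0] [0, 0, 0, 0]
```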

Deﬁnition 1.9 Vectors v1 , v2 , . . . , vk are said to be linearly dependent if there are
scalars t1 , t2 , . . . , tk , not all zero, such that t1 v1 + t2 v2 + · · · + tk vk = 0.

Note It is clear that two vectors v and w are linearly dependent if and only if one
is a scalar multiple of the other, i.e., either v = tw for some scalar t or w = sv for
some scalar s. Three vectors v, w and u are linearly dependent if and only if one of
them can be expressed as a linear combination of the other two. In general, we have

Proposition 1.11 v1 , v2 , . . ., vk are linearly dependent if and only if one of these
vectors can be expressed as a linear combination of the remaining ones.

Proof Suppose t1v1 + t2v2 + · · · + tkvk = 0, where t1, t2, . . . , tk are not all equal to
zero. Without loss of generality, we may assume that t1 ≠ 0. Then

    v1 = (−t2/t1) v2 + · · · + (−tk/t1) vk.

As such, v1 is expressed as a linear combination of v2, . . . , vk.

Example 1.30 v1 = [1 2 3]^T, v2 = [4 5 6]^T and v3 = [7 8 9]^T are linearly
dependent because 1v1 − 2v2 + 1v3 = 0.
Each of these 3 vectors may be expressed as a linear combination of the remaining
two. In fact, we have v1 = 2v2 − v3, v2 = (1/2)v1 + (1/2)v3 and v3 = −v1 + 2v2.

Note Vectors which are not linearly dependent are called linearly independent. In
other words, v1 , v2 , . . . , vk are linearly independent if and only if

t1 v1 + t2 v2 + · · · + tk vk = 0 ⇒ tj = 0 for any j = 1, 2, . . . , k.

It is evident that if a vector v1 is expressed as a linear combination of v2, . . . , vk,
then the vector space spanned by v2, . . . , vk is the same as the vector space spanned
by v1, v2, . . . , vk.

For instance, if v1, v2 and v3 are the vectors in Example 1.30, then

    span {v1, v2, v3} = span {v1, v2} = span {v2, v3} = span {v1, v3}.

Remark 1.6 It turns out that linear dependence of vectors is equivalent to the
existence of non-trivial solutions for some homogeneous systems. As a matter of
fact, if v1, v2, . . . , vk are vectors in Rn, we denote by A the n × k matrix whose j-th
column equals the vector vj for j = 1, . . . , k. Clearly, t1v1 + t2v2 + · · · + tkvk = 0
if and only if At = 0, where t = [t1 t2 . . . tk]^T. Therefore v1, v2, . . . , vk are linearly
dependent if and only if the system of homogeneous equations At = 0 has non-trivial
solutions.
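Remark 1.6 turns the dependence question into a computation. As a minimal NumPy sketch (an illustration, not the text's own method), one can test the vectors of Example 1.30 through the matrix whose columns they form; asking whether At = 0 has non-trivial solutions is the same as asking whether that matrix has rank less than the number of columns:

```python
import numpy as np

# Columns are v1, v2, v3 from Example 1.30
A = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

# At = 0 has non-trivial solutions  <=>  rank(A) < number of columns
assert np.linalg.matrix_rank(A) < A.shape[1]   # the vectors are dependent

# The explicit relation 1*v1 - 2*v2 + 1*v3 = 0 gives one such t
t = np.array([1, -2, 1])
assert np.allclose(A @ t, 0)
```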

Example 1.31 [1 2 0]^T, [1 −1 1]^T and [0 0 1]^T are linearly independent
because

    [ 1   1  0 ]
    [ 2  −1  0 ] t = 0
    [ 0   1  1 ]

has only the trivial solution.

By virtue of Remark 1.6 and Theorem 1.3, we have the following useful
result on linear dependence of vectors.

Theorem 1.10 If v1 , v2 , . . . , vk are vectors in Rn where k > n, then v1 , v2 , . . . , vk
are linearly dependent.

For instance, any four vectors in R3 are linearly dependent.

1.7   Inner product

1.7.1 Deﬁnition of inner product

If v and w are vectors in Rn (or Rn), we define ⟨v, w⟩ = v1w1 + · · · + vnwn, where
v1, . . . , vn and w1, . . . , wn are the entries of the vectors v and w respectively. The scalar
⟨v, w⟩ is called the inner product of v and w.

It is obvious that the inner product satisfies the following properties:

(1) ⟨v, w⟩ = ⟨w, v⟩ for any v and w in Rn (or Rn);

(2) ⟨v + u, w⟩ = ⟨v, w⟩ + ⟨u, w⟩ for any v, u and w in Rn (or Rn);

(3) ⟨tv, w⟩ = t⟨v, w⟩ for any v, w in Rn (or Rn) and for any scalar t;

(4) ⟨v, v⟩ ≥ 0 for any v in Rn (or Rn), and ⟨v, v⟩ = 0 if and only if v = 0.
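Properties (1)–(4) can be checked numerically on sample vectors; a minimal sketch using NumPy's `dot` (our choice of tool here, not the text's):

```python
import numpy as np

rng = np.random.default_rng(0)
v, u, w = rng.standard_normal((3, 5))   # three random vectors in R^5
t = 2.5

# (1) symmetry
assert np.isclose(np.dot(v, w), np.dot(w, v))
# (2) additivity in the first argument
assert np.isclose(np.dot(v + u, w), np.dot(v, w) + np.dot(u, w))
# (3) homogeneity in the first argument
assert np.isclose(np.dot(t * v, w), t * np.dot(v, w))
# (4) positivity
assert np.dot(v, v) >= 0
assert np.dot(np.zeros(5), np.zeros(5)) == 0
```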

Definition 1.10 (1) The norm or magnitude of a vector v is defined by

    ‖v‖ = √⟨v, v⟩ = √(v1² + v2² + · · · + vn²).          (1.7)

(2) If ‖u‖ = 1, then u is said to be a unit vector.

Note It is clear that ‖v‖ = 0 if and only if v is the zero vector. On the other hand,
if w is a non-zero vector, then (1/‖w‖) w is a unit vector. This process is known as
normalization. For instance, the vector w = [1 1 3]^T may be normalized as
[1/√11  1/√11  3/√11]^T.
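The normalization above is a one-line computation; this NumPy sketch (illustrative, not from the text) reproduces it:

```python
import numpy as np

w = np.array([1.0, 1.0, 3.0])
norm_w = np.sqrt(np.dot(w, w))      # ||w|| = sqrt(1 + 1 + 9) = sqrt(11)
u = w / norm_w                      # normalization of w

assert np.isclose(norm_w, np.sqrt(11))
assert np.isclose(np.linalg.norm(u), 1.0)          # u is a unit vector
assert np.allclose(u, [1, 1, 3] / np.sqrt(11))     # matches the text's answer
```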

Theorem 1.11 (Schwarz’s Inequality) If v, w are vectors in Rn or Rn, then
|⟨v, w⟩| ≤ ‖v‖ · ‖w‖. Moreover, equality holds if and only if v and w are linearly
dependent.

Proof For fixed v and w, consider the quadratic function defined by
f(t) = ⟨tv + w, tv + w⟩ = ⟨v, v⟩t² + 2⟨v, w⟩t + ⟨w, w⟩. By (4) in the definition of
inner product, f(t) ≥ 0 for any real t. Consequently, the quadratic equation f(t) = 0
has no distinct real roots, so its discriminant 4⟨v, w⟩² − 4⟨v, v⟩⟨w, w⟩ is non-positive.
This implies

    ⟨v, w⟩² ≤ ⟨v, v⟩ × ⟨w, w⟩.

Schwarz’s Inequality now follows by taking square roots on both sides of this inequality.
When ⟨v, w⟩² = ⟨v, v⟩ × ⟨w, w⟩, the quadratic equation f(t) = 0 has a double real
root. Therefore f(t0) = 0 for some real t0. We thus conclude that t0v + w = 0, i.e.,
v and w are linearly dependent.

Note Theorem 1.11 simply says that given two sets of scalars v1, . . . , vn and
w1, . . . , wn, we have

    |v1w1 + · · · + vnwn| ≤ √(v1² + · · · + vn²) × √(w1² + · · · + wn²),

with equality if and only if v1/w1 = v2/w2 = · · · = vn/wn.

The following facts illustrate the geometric meaning of the norm.

Proposition 1.12 If v and w are vectors, then
(i) ‖v + w‖ ≤ ‖v‖ + ‖w‖, with equality if and only if one of the vectors is a
nonnegative scalar multiple of the other (triangle inequality);
(ii) ‖tv‖ = |t| ‖v‖ for any scalar t;
(iii) ‖v + w‖² + ‖v − w‖² = 2‖v‖² + 2‖w‖². (The sum of the squares of the diagonals
of a parallelogram is equal to the sum of the squares of its four sides.)
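All three identities in Proposition 1.12 are easy to exercise on random vectors; a brief NumPy sketch (the random test vectors are our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
v, w = rng.standard_normal((2, 4))   # two random vectors in R^4
nrm = np.linalg.norm

# (i) triangle inequality (small slack for floating point)
assert nrm(v + w) <= nrm(v) + nrm(w) + 1e-12
# (ii) homogeneity of the norm
assert np.isclose(nrm(-3.0 * v), 3.0 * nrm(v))
# (iii) parallelogram law
assert np.isclose(nrm(v + w)**2 + nrm(v - w)**2,
                  2 * nrm(v)**2 + 2 * nrm(w)**2)
```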

1.7.2 Angle between two vectors and orthogonality

Let v and w be non-zero vectors. We define the angle between v and w to be the
unique number θ lying between 0 and π such that

    cos θ = ⟨v, w⟩ / (‖v‖ · ‖w‖).

The definition makes sense because Theorem 1.11 implies that

    −1 ≤ ⟨v, w⟩ / (‖v‖ · ‖w‖) ≤ 1.

Example 1.32 The angle θ between [1 3] and [5 2] satisfies

    cos θ = (1 × 5 + 3 × 2) / (√(1² + 3²) × √(5² + 2²)) ≈ 0.6459.

Therefore, θ ≈ 0.8685 rad (about 49.76◦).
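The same computation in NumPy (a sketch for illustration, not part of the text):

```python
import numpy as np

v = np.array([1.0, 3.0])
w = np.array([5.0, 2.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)   # angle in radians, between 0 and pi

assert np.isclose(cos_theta, 0.6459, atol=1e-4)
assert np.isclose(theta, 0.8685, atol=1e-3)          # about 49.76 degrees
assert np.isclose(np.degrees(theta), 49.76, atol=0.01)
```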

Definition 1.11 (1) Two vectors v and w are said to be orthogonal if ⟨v, w⟩ = 0.
(2) Two non-zero vectors v and w are said to be perpendicular to each other if the
angle between them is equal to π/2. (Notation: v ⊥ w)
(3) A set of vectors {v1, . . . , vk} is said to be an orthogonal set of vectors if these
vectors are mutually orthogonal to each other, i.e., ⟨vi, vj⟩ = 0 whenever i ≠ j.
(4) An orthogonal set of vectors {v1, . . . , vk} is said to be orthonormal if all these
vi's are unit vectors.

Remark 1.7 (1) The zero vector 0 is orthogonal to every vector.
(2) If v ≠ 0 and w ≠ 0, then v and w are orthogonal if and only if v ⊥ w.
(3) ⟨v, w⟩ = 0 ⇒ ‖v‖² + ‖w‖² = ‖v + w‖². (Pythagorean Theorem)
(4) Every orthogonal set of non-zero vectors can be normalized to an orthonormal
set of vectors.

Example 1.33 A simple calculation shows that [1 −1 2] and [−1 0 1/2] are
orthogonal (and perpendicular) to each other.

Example 1.34 {[0 1 0], [1 0 1], [1 0 −1]} is a set of orthogonal vectors.
They may be normalized to the following orthonormal set of vectors:

    { [0 1 0],  [1/√2  0  1/√2],  [1/√2  0  −1/√2] }.

Proposition 1.13 Let {v1, v2, . . . , vk} be an orthogonal set of non-zero vectors.
(i) If v = c1v1 + c2v2 + · · · + ckvk, then cj = ⟨v, vj⟩ / ⟨vj, vj⟩ for any j = 1, . . . , k.
(ii) v1, v2, . . . , vk are linearly independent.

Corollary 3 If {v1, v2, . . . , vk} is an orthonormal set of vectors, and if v is a linear
combination of v1, v2, . . . , vk, then v = ⟨v, v1⟩v1 + · · · + ⟨v, vk⟩vk.

Note Let v and w be vectors with w ≠ 0. We define the projection of v onto w to
be the vector

    projw v = (⟨v, w⟩ / ‖w‖²) w.

Observe that (i) the magnitude of projw v is equal to ‖v‖ · |cos θ|; (ii)
⟨v − projw v, w⟩ = 0, i.e., v − projw v is orthogonal to w. Using this terminology,
Corollary 3 simply says that v is the sum of its projections onto each of the
orthonormal vectors v1, v2, . . . , vk.

Fig. 1 Projection of v onto w (v makes angle θ with w; projw v lies along w)

The above note can be applied to a construction of orthonormal vectors known
as the Gram-Schmidt process.

Theorem 1.12 Let v1 , v2 , . . . , vk be linearly independent vectors which span a vector
space V . Then there are orthonormal vectors w1 , w2 , . . . , wk such that wj is a linear
combination of v1 , . . . , vj . In particular, w1 , w2 , . . . , wk span V .
Proof (Sketch) Take

    w1 = v1 / ‖v1‖,    w2 = (v2 − ⟨v2, w1⟩w1) / ‖v2 − ⟨v2, w1⟩w1‖,

and, whenever j < k,

    wj+1 = (vj+1 − [⟨vj+1, w1⟩w1 + · · · + ⟨vj+1, wj⟩wj]) / ‖vj+1 − [⟨vj+1, w1⟩w1 + · · · + ⟨vj+1, wj⟩wj]‖.
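The sketch above translates directly into code. A minimal NumPy implementation of the Gram-Schmidt process (the function name and test vectors are our own; this is an illustration, not the text's algorithm verbatim):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, as in Theorem 1.12."""
    ws = []
    for v in vectors:
        # Subtract the projections of v onto the w's built so far
        u = v - sum(np.dot(v, w) * w for w in ws)
        ws.append(u / np.linalg.norm(u))   # normalize the remainder
    return ws

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
ws = gram_schmidt(vs)

# The resulting vectors form an orthonormal set: W^T W = I
W = np.column_stack(ws)
assert np.allclose(W.T @ W, np.eye(3))
```

Note that each wj is, by construction, a linear combination of v1, . . . , vj, exactly as the theorem requires.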

1.8   Eigenvalue problem

1.8.1 Eigenvectors and eigenvalues of a square matrix

Let A be an n × n matrix. A non-zero vector v in Rn is called an eigenvector of A if
Av is a scalar multiple of v, i.e., if there is a scalar λ such that Av = λv. The scalar
λ is called an eigenvalue of A and v is said to be an eigenvector of A corresponding
to the eigenvalue λ.
                           
Example 1.35 Let

    A = [ 3   0 ]
        [ 8  −1 ]

and v = [1 2]^T. Since Av = [3 6]^T = 3v, v is an eigenvector of A corresponding
to the eigenvalue λ = 3. w = [1 0]^T is not an eigenvector of A because
Aw = [3 8]^T, which is not a scalar multiple of w.
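Both claims can be checked mechanically; a brief NumPy sketch (illustrative only):

```python
import numpy as np

A = np.array([[3, 0],
              [8, -1]])
v = np.array([1, 2])
w = np.array([1, 0])

# v is an eigenvector: Av = 3v
assert np.allclose(A @ v, 3 * v)

# w is not: Aw = [3, 8] is not parallel to w = [1, 0]
# (two 2-vectors are parallel iff their "cross product" vanishes)
Aw = A @ w
assert w[0] * Aw[1] - w[1] * Aw[0] != 0
```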

Remark 1.8 It is clear that if v is an eigenvector of A corresponding to the eigen-
value λ, then tv is also an eigenvector of A corresponding to the same eigenvalue λ,
provided that t ≠ 0. More generally, we have the following simple result:

“If λ0 is an eigenvalue of an n × n matrix A, then the eigenvectors of A corresponding
to λ0, together with the zero vector 0, form a vector space in Rn.”

By re-writing Av = λv as Av − λv = 0, or (A − λI)v = 0, we observe
that ﬁnding an eigenvector of A is equivalent to ﬁnding non-trivial solutions of the
homogeneous system (A − λI)v = 0. By Theorem 1.5 and Proposition 1.8, we
conclude that (A − λI)v = 0 has non-trivial solutions if and only if det(A − λI) = 0.

Theorem 1.13              Let A be an n × n matrix with real entries. A real number λ is
an eigenvalue of A if and only if det(A − λI) = 0.

Once we obtain an eigenvalue (say λ) of A, we shall be able to use Gaussian
elimination to ﬁnd non-trivial solutions of (A − λI)v = 0 and thus obtain the corre-
sponding eigenvectors of A.
As the homogeneous system (A − λI)v = 0 has inﬁnitely many non-trivial
solutions, there are inﬁnitely many eigenvectors of A corresponding to λ. Therefore,
we only need to ﬁnd eigenvectors that are linearly independent.             All other eigen-
vectors corresponding to λ may be expressed as linear combinations of these linearly
independent eigenvectors.

                                         
Remark 1.9 Let

    f(λ) = det(A − λI) = det [ a11 − λ    a12     · · ·     a1n    ]
                             [  a21     a22 − λ   · · ·     a2n    ]
                             [   ⋮         ⋮       · · ·      ⋮     ]
                             [  an1       an2     · · ·   ann − λ  ].

It follows by induction on n that f(λ) is a polynomial of degree n with leading coeffi-
cient (−1)^n. f(λ) is called the characteristic polynomial of A.

Therefore, eigenvalues of the matrix A are real roots of the equation f (λ) = 0. As a
result, an n × n matrix has at most n eigenvalues.

         
Example 1.36 For A = [ 5  4 ]
                     [ 1  2 ], we have

    f(λ) = det [ 5 − λ    4   ]
               [   1    2 − λ ] = λ² − 7λ + 6.

Therefore, eigenvalues of A are roots of the quadratic equation λ² − 7λ + 6 = 0.
We thus obtain λ1 = 6, λ2 = 1.

Case (1) For λ1 = 6, (A − λ1I)v = 0 ⇔ { −v1 + 4v2 = 0
                                        {  v1 − 4v2 = 0.

We thus obtain v = [4 1]^T as an eigenvector corresponding to λ1 = 6.

Case (2) For λ2 = 1, (A − λ2I)v = 0 ⇔ { 4v1 + 4v2 = 0
                                        {  v1 + v2 = 0.

We thus obtain v = [1 −1]^T as an eigenvector corresponding to λ2 = 1.
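The whole computation of Example 1.36 can be reproduced with a library routine; a sketch using NumPy's `eig` (our tool of choice, not the text's):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])

# np.linalg.eig returns (eigenvalues, matrix whose columns are eigenvectors);
# it does not guarantee any particular order of the eigenvalues
eigvals, eigvecs = np.linalg.eig(A)

assert np.allclose(sorted(eigvals), [1.0, 6.0])

# Each returned column satisfies A v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

# The hand-computed eigenvectors from the two cases also check out
assert np.allclose(A @ np.array([4.0, 1.0]), 6 * np.array([4.0, 1.0]))
assert np.allclose(A @ np.array([1.0, -1.0]), 1 * np.array([1.0, -1.0]))
```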

             
Example 1.37 If

    A = [ 0    1   0 ]
        [ 0    0   1 ]
        [ 6  −11   6 ],

then

    f(λ) = det(A − λI) = det [ −λ    1     0   ]
                             [  0   −λ     1   ]
                             [  6  −11   6 − λ ] = −(λ − 1)(λ − 2)(λ − 3) = 0.

Therefore, the eigenvalues are given by λ1 = 1, λ2 = 2, λ3 = 3.

Case (1) For λ1 = 1, (A − λ1I)v = 0 becomes

    [ −1    1   0 ]
    [  0   −1   1 ] v = 0.
    [  6  −11   5 ]

We thus obtain v = [1 1 1]^T as an eigenvector corresponding to λ1 = 1.

Case (2) For λ2 = 2, (A − λ2I)v = 0 becomes

    [ −2    1   0 ]
    [  0   −2   1 ] v = 0.
    [  6  −11   4 ]

The corresponding eigenvector is given by v = [1 2 4]^T.

Case (3) For λ3 = 3, (A − λ3I)v = 0 becomes

    [ −3    1   0 ]
    [  0   −3   1 ] v = 0.
    [  6  −11   3 ]

We thus obtain v = [1 3 9]^T as an eigenvector corresponding to λ3 = 3.
             
Example 1.38 For

    A = [  3  −2  0 ]
        [ −2   3  0 ]
        [  0   0  5 ],

we obtain f(λ) = (5 − λ)²(1 − λ). Hence the eigenvalues are λ1 = 1, λ2 = λ3 = 5.

For λ1 = 1, we solve (A − I)v = 0 to obtain v1 = [1 1 0]^T as a corresponding
eigenvector. For λ2 = λ3 = 5 (double root), we solve (A − 5I)v = 0 to obtain
v2 = [1 −1 0]^T and v3 = [0 0 1]^T as two linearly independent eigenvectors.

We thus conclude that there are altogether three linearly independent eigenvectors for
the given matrix A.

                
Example 1.39 If

    A = [  1  1  −1 ]
        [ −1  3  −1 ]
        [ −1  2   0 ],

then f(λ) = (2 − λ)(1 − λ)². Therefore the eigenvalues are λ1 = 2, λ2 = λ3 = 1.

Solving the linear systems (A − λI)v = 0 for λ = 2 and λ = 1 (multiplicity 2),
we obtain respectively two linearly independent eigenvectors v1 = [0 1 1]^T and
v2 = [1 1 1]^T.

Remark 1.10 From these examples, we observe that a 3 × 3 matrix may have up to 3
linearly independent eigenvectors (Examples 1.37 and 1.38), but this is not always
the case (Example 1.39). An n × n matrix with exactly n linearly independent
eigenvectors has some nice properties, which will be dealt with in the next subsection.

1.8.2 Diagonalization

Deﬁnition 1.12 A square matrix A is said to be diagonalizable if there is a nonsin-
gular matrix P such that P−1 AP is a diagonal matrix. We also say that the matrix
P diagonalizes A.
                                                   
Example 1.40 Let A = [ 5  4 ]
                     [ 1  2 ]. If we take P = [  1  4 ]
                                              [ −1  1 ], a simple calculation
shows that

    P⁻¹AP = [ 1/5  −4/5 ] [ 5  4 ] [  1  4 ] = [ 1  0 ]
            [ 1/5   1/5 ] [ 1  2 ] [ −1  1 ]   [ 0  6 ].

Therefore, A is diagonalizable and P = [  1  4 ]
                                       [ −1  1 ] diagonalizes A.

Remark 1.11 It should also be noted that the diagonalizing matrix, if it exists, is
not unique. For instance, in Example 1.40, we may also take

    P = [ −2   1  ]
        [  2  1/4 ].

The following theorem says that whether a matrix is diagonalizable depends
on the number of its linearly independent eigenvectors.

Theorem 1.14 Let A be an n × n matrix.                        Then A is diagonalizable if and only

if A has n linearly independent eigenvectors. In fact, if v1 , v2 , . . . , vn are linearly

independent eigenvectors of A corresponding to eigenvalues λ1 , λ2 , . . ., λn (not nec-
essarily distinct), then by taking P to be the n × n matrix having v1 , v2 , . . . , vn as its

columns and D to be the diagonal matrix with djj = λj for j = 1, 2, . . . , n, we obtain
AP = PD.
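The relation AP = PD in Theorem 1.14 is easy to verify for the data of Example 1.40; a NumPy sketch (an illustration with the example's matrices):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])
# Columns of P are eigenvectors for the eigenvalues 1 and 6 (Example 1.40)
P = np.array([[ 1.0, 4.0],
              [-1.0, 1.0]])
D = np.diag([1.0, 6.0])

assert np.allclose(A @ P, P @ D)                  # AP = PD
assert np.allclose(np.linalg.inv(P) @ A @ P, D)   # P^{-1} A P = D
```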

                            
−1           2   2
                       
                       
Example 1.41 For A =                2     2 , we have f (λ) = −λ(λ + 2)(λ + 3).
2
                       
−3 −6 −6

Therefore, eigenvalues of A are λ1 = −2, λ2 = −3, λ3 = 0 and the corresponding
eigenvectors are respectively given by

v1 = [2   − 1 0]T , v2 = [1 0                    − 1]T , v3 = [0 1          − 1]T .

Observe that these eigenvectors are linearly independent in R3. It thus follows from
Theorem 1.14 that A is diagonalizable, and that by taking

    P = [  2   1   0 ]
        [ −1   0   1 ]
        [  0  −1  −1 ],

we have

    P⁻¹AP = [ −2   0  0 ]
            [  0  −3  0 ]
            [  0   0  0 ].

                 
Example 1.42 The matrix

    A = [  3  −2  0 ]
        [ −2   3  0 ]
        [  0   0  5 ]

in Example 1.38 has v1 = [1 1 0]^T, v2 = [1 −1 0]^T and v3 = [0 0 1]^T as
eigenvectors corresponding to the eigenvalues λ1 = 1, λ2 = λ3 = 5. Therefore, A is
diagonalizable. In fact, by taking

    P = [ 1   1  0 ]
        [ 1  −1  0 ]
        [ 0   0  1 ],

we have

    P⁻¹AP = [ 1  0  0 ]
            [ 0  5  0 ]
            [ 0  0  5 ].

Example 1.43 The 3×3 matrix in Example 1.39 has only two linearly independent
eigenvectors and is therefore not diagonalizable.

                       
Remark 1.12 If

    P⁻¹AP = D = [ λ1   0  · · ·   0  ]
                [  0  λ2  · · ·   0  ]
                [  ⋮   ⋮   · · ·   ⋮  ]
                [  0   0  · · ·  λn  ],

then A = PDP⁻¹. This implies A^m = PD^mP⁻¹ for any positive integer m. Since

    D^m = [ λ1^m    0   · · ·    0   ]
          [  0    λ2^m  · · ·    0   ]
          [  ⋮      ⋮    · · ·    ⋮   ]
          [  0      0   · · ·  λn^m  ],

the m-th power of A may be computed easily. Note that this method is important in
control theory.
            
Example 1.44 Compute A^2002, where A = [ 4  −3 ]
                                       [ 2  −1 ].

Solution The eigenvalues of A are λ1 = 2, λ2 = 1 with corresponding eigenvectors
given by v1 = [3 2]^T and v2 = [1 1]^T. Now

    P = [ 3  1 ]     P⁻¹ = [  1  −1 ]     D = [ 2  0 ]
        [ 2  1 ],          [ −2   3 ],        [ 0  1 ],

A = PDP⁻¹ and

    A^2002 = PD^2002P⁻¹ = [ 3  1 ] [ 2^2002  0 ] [  1  −1 ]
                          [ 2  1 ] [   0     1 ] [ −2   3 ]

                        = [ 3 × 2^2002 − 2    −3 × 2^2002 + 3 ]
                          [ 2^2003 − 2        −2^2003 + 3     ].
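The same method can be checked at a smaller, verifiable exponent; a NumPy sketch using the example's P and D (we use m = 10 instead of 2002 so the result can be compared against a direct matrix power):

```python
import numpy as np

A = np.array([[4.0, -3.0],
              [2.0, -1.0]])
P = np.array([[3.0, 1.0],
              [2.0, 1.0]])
D = np.diag([2.0, 1.0])

m = 10
# A^m = P D^m P^{-1}, where D^m is computed entrywise on the diagonal
Am = P @ np.diag(np.diag(D) ** m) @ np.linalg.inv(P)

assert np.allclose(Am, np.linalg.matrix_power(A, m))
# Matches the closed form from the example: top-left entry is 3*2^m - 2
assert np.isclose(Am[0, 0], 3 * 2**m - 2)
```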

The following result may be considered as a supplement to Theorem 1.14.

Proposition 1.14 An n × n matrix with n distinct eigenvalues is diagonalizable.

1.8.3 Diagonalization by orthogonal matrices

Deﬁnition 1.13        A square matrix A is said to be symmetric if AT = A.                   It is
clear that A is symmetric if and only if aij = aji for any 1 ≤ i, j ≤ n.

Deﬁnition 1.14        A square matrix Q is called an orthogonal matrix if Q−1 = QT ,
i.e., Q satisﬁes QT Q = QQT = I.

                  
Example 1.45 A = [ 1   4  5 ]
                 [ 4  −3  0 ]
                 [ 5   0  7 ]  is symmetric and Q = [ cos 60◦  −sin 60◦ ]
                                                    [ sin 60◦   cos 60◦ ]  is
orthogonal.

Proposition 1.15 The following statements are equivalent:
(i) Q is an orthogonal matrix, i.e., QT = Q−1 ;
(ii) the row vectors of Q form an orthonormal set of vectors in Rn ;
(iii) the column vectors of Q form an orthonormal set of vectors in Rn .
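The equivalent conditions of Proposition 1.15 can be verified for the rotation matrix of Example 1.45; a NumPy sketch (illustrative only):

```python
import numpy as np

theta = np.deg2rad(60.0)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# (i) Q^T = Q^{-1}, i.e. Q^T Q = Q Q^T = I
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q @ Q.T, np.eye(2))
assert np.allclose(np.linalg.inv(Q), Q.T)

# (ii)/(iii) rows and columns are orthonormal
assert np.isclose(np.linalg.norm(Q[0]), 1.0)
assert np.isclose(np.dot(Q[:, 0], Q[:, 1]), 0.0)
```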

The following theorem, which says that any symmetric matrix can be diago-
nalized by an orthogonal matrix, is of importance in many areas of pure and applied
mathematics, including coordinate geometry and partial diﬀerential equations.

Theorem 1.15 Let A be a symmetric matrix of order n. Then
(i) A has eigenvectors v1 , . . . , vn which form an orthonormal set of vectors in Rn ;
(ii) there is an orthogonal matrix Q such that QT AQ = D, where D is a diagonal
matrix. In other words, A is diagonalized by Q.
              
Example 1.46 Let A = [  1  −3 ]
                     [ −3   1 ].

The characteristic polynomial of A is given by

    f(λ) = det [ 1 − λ   −3   ]
               [  −3    1 − λ ] = (λ − 4)(λ + 2).

For λ = −2 and λ = 4, we may respectively choose [1 1]^T and [−1 1]^T as eigenvec-
tors. It is clear that [1 1]^T and [−1 1]^T are orthogonal, and they can be normalized
as [1/√2  1/√2]^T and [−1/√2  1/√2]^T.

By taking

    Q = [ 1/√2  −1/√2 ]
        [ 1/√2   1/√2 ],

we obtain

    QᵀAQ = [ −2  0 ]
           [  0  4 ].
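For symmetric matrices, this orthogonal diagonalization is exactly what NumPy's `eigh` routine computes; a sketch for the matrix of Example 1.46 (the library call is our choice of tool, not the text's method):

```python
import numpy as np

A = np.array([[ 1.0, -3.0],
              [-3.0,  1.0]])

# For a symmetric matrix, np.linalg.eigh returns the eigenvalues in
# ascending order and an orthogonal matrix Q of eigenvectors
eigvals, Q = np.linalg.eigh(A)

assert np.allclose(eigvals, [-2.0, 4.0])
assert np.allclose(Q.T @ Q, np.eye(2))                 # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag(eigvals))      # Q^T A Q = D
```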

For further applications of diagonalization, see Section 5 of Chapter 3.

```