Linear Algebra 2 with Maple

Denis Sevee
John Abbott College

Contents

0 Review
1 Change of Basis
 1.1 Coordinates Relative to a Basis
 1.2 Change of Basis
 1.3 Examples of Bases
2 Eigenvalues and Eigenvectors
 2.1 Eigenvectors and Eigenvalues
 2.2 Diagonalization
 2.3 Eigenvectors and Linear Transformations
 2.4 Complex Eigenvalues
3 Dynamical Systems
 3.1 The Fibonacci Sequence
 3.2 Difference Equations
 3.3 Examples of Dynamical Systems
 3.4 The Geometry of Dynamical Systems in the Plane
 3.5 Dynamical Systems with Complex Eigenvalues
 3.6 More Examples
  3.6.1 Population Growth
  3.6.2 Solving Systems of Equations
4 Inner Products and Projections
 4.1 The Standard Inner Product
 4.2 Orthogonal Sets
 4.3 Orthogonal Projections
 4.4 The Gram-Schmidt Procedure
 4.5 Least-Squares Problems
 4.6 Data Fitting
 4.7 An Experiment by Galileo
5 Inner Product Spaces
 5.1 Inner Products
 5.2 Approximations in Inner Product Spaces
 5.3 Fourier Series
 5.4 Discrete Fourier Transform
6 Symmetric Matrices
 6.1 Symmetric Matrices and Diagonalization
 6.2 Quadratic Forms
 6.3 Optimization
7 The Singular Value Decomposition
 7.1 Singular Values
 7.2 Geometry of the Singular Value Decomposition
 7.3 The Singular Value Decomposition and the Pseudoinverse
 7.4 The SVD and the Fundamental Subspaces of a Matrix
 7.5 The SVD and Statistics
 7.6 Total Least Squares
8 Calculus and Linear Algebra
 8.1 Calculus with Discrete Data
 8.2 Differential Equations and Dynamical Systems
 8.3 An Oscillating Spring
 8.4 Differential Equations and Linear Algebra
A Linear Algebra with Maple
B Complex Numbers
C Linear Transformations
D Partitioned Matrices

Preface

This book was written for students taking a second-semester linear algebra course at John Abbott College. In the first semester of linear algebra students are exposed to the theory of linear systems, which includes techniques for solving systems, basic matrix and vector operations, determinants, subspaces (of R^n), and the concept of a basis. In this book we will look at the other two main pillars of linear algebra: the theory of eigenvectors and the theory of orthogonality.

The first part of the book covers the theory of eigenvectors and eigenspaces. In condensed form, the general thread of topics is

 Eigenvectors --> Diagonalization --> uncoupling in difference equations,
                                      uncoupling in differential equations,
                                      principal axes

The next portion of the book covers the theory of orthogonality. The thread of topics here may be summarized as

 Orthogonal bases --> Projections --> least-squares lines,
                                      best approximations

The last part of the book deals with connections between these first two topics and leads from symmetric matrices to the singular value decomposition.

Along with these main topics there are several other motifs running through the text:

• The contrast between discrete and continuous representations. A large number of important applications of linear algebra involve converting from continuous models to discrete models, and most commonly this conversion is done by sampling the continuous data at evenly spaced intervals. We look at many examples of this, the main one being the contrast between linear dynamical systems and systems of linear differential equations.

• The interpretation of one-dimensional data as vectors and two-dimensional data as matrices. Vectors are usually introduced as arrows (i.e., directed line segments) in R^2 and R^3, and this geometric interpretation provides important intuitive insight into the geometry of vector operations. These insights can and should be carried over into more abstract vector spaces, but the techniques of linear algebra are frequently used in spaces of very high dimension where the entries in the vector (or matrix) are measurements of some quantity. In these cases other methods of visualizing the vectors are more appropriate.

• Matrix-vector multiplication. What happens when you multiply a vector by a matrix? It is remarkable that this simple computation (usually presented in an introductory linear algebra course via the algorithm of pairing the entries in the rows of the matrix with the entries of the vector, multiplying the pairs, and adding the results) is so versatile. Many basic procedures can be interpreted as matrix-vector multiplication: finding dot or cross products, solving a system, finding least-squares solutions of systems, finding projections, rotating vectors in space, the Gram-Schmidt procedure, and others. The singular value decomposition is particularly interesting in this regard, as it reveals a simple geometric pattern behind any matrix-vector multiplication.

• The idea of distance. It makes intuitive sense to talk about distance in R^2 or R^3, spaces which can be easily visualized. But the idea of distance can be extended to more abstract vector spaces: not only R^n for larger values of n, but also spaces of functions, matrices, or other mathematical objects. The extension of this idea to abstract vector spaces allows us to talk about the distance between two functions (for one example), and trying to minimize these distances leads us to the idea of best approximations.

Earlier we mentioned the division of linear algebra into the theories of linear systems, of eigenspaces, and of orthogonality. There is another standard way of dividing linear algebra: theoretical linear algebra, applied linear algebra, and numerical linear algebra. This book tries to maintain a balance between these. In particular, the last two are closely linked with the use of computer technology. Most important applications of linear algebra involve the use of computers, and numerical linear algebra deals with the nuts and bolts of how computations can be performed most effectively. In this book we use the computer algebra system Maple to illustrate the topics.

Chapter 0  Review

This chapter is a summary of some results from the first semester of linear algebra that will play important roles in this course. They are not listed in any particular order, and if there are any of the following topics that you don't understand, you should review them in a textbook or see your teacher.

Systems of Linear Equations

There are several basic ways of looking at a system of linear equations. The most obvious way is to view such a system as a collection of linear equations with a certain number of unknowns. For example,

 3x_1 + 2x_2 - 9x_3 + 4x_4 = 13
 2x_1 + 3x_2 + 7x_3 - 3x_4 = 22
 5x_1 - 5x_2 + 5x_3 + 2x_4 = 17

would be a system of 3 equations with 4 unknowns. This type of system can be solved by setting up the augmented matrix and reducing it using elementary row operations. The two standard approaches for solving a linear system by row reduction are called Gaussian elimination and Gauss-Jordan elimination. A key result here is that any linear system has either (a) a unique solution, (b) infinitely many solutions, or (c) no solution.

Any system of linear equations can also be interpreted as a matrix equation in the form Ax = b. So, for example, the above system could be written as

 \begin{bmatrix} 3 & 2 & -9 & 4 \\ 2 & 3 & 7 & -3 \\ 5 & -5 & 5 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 13 \\ 22 \\ 17 \end{bmatrix}

This type of equation represents a linear system in terms of a matrix-vector product. It says that if you take vector x and multiply it by matrix A, then you get vector b as a result. From this point of view, solving the system amounts to finding the vector x which satisfies the equation Ax = b.

The equation Ax = b means that vector x (in R^4 in the above example) has somehow been transformed into vector b (in R^3 in the above example) when it is multiplied by matrix A. This operation of multiplying a vector by a matrix is one example of the more general concept of a linear transformation.
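As a quick illustration of this point of view (the following lines are our own and assume Maple's LinearAlgebra package), the example system can be solved in a single command. Because the system has more unknowns than equations, LinearSolve returns a solution containing a free parameter:

>with(LinearAlgebra):
>A:=Matrix([[3,2,-9,4],[2,3,7,-3],[5,-5,5,2]]):
>b:=<13,22,17>:
>LinearSolve(A,b);      # a one-parameter family of solutions

Any choice of the parameter in the output gives a vector x satisfying Ax = b.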
If A is an invertible matrix then the solution to the system Ax = b can be written as x = A^{-1}b.

There is a third important way of looking at a linear system. The left side of the equation Ax = b can be seen as a linear combination of vectors. So the above system can be seen as

 x_1 \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 3 \\ -5 \end{bmatrix} + x_3 \begin{bmatrix} -9 \\ 7 \\ 5 \end{bmatrix} + x_4 \begin{bmatrix} 4 \\ -3 \\ 2 \end{bmatrix} = \begin{bmatrix} 13 \\ 22 \\ 17 \end{bmatrix}

This means that when you are trying to solve the system Ax = b you are trying to find out how vector b can be written as a linear combination of the columns of matrix A. The unknowns are called the weights of this combination.

Classification of Systems of Equations

There are three basic classes of linear systems, Ax = b, depending on the size of matrix A. Suppose A is an m x n matrix; then:

• The system is said to be overdetermined if m > n. In this case there are more equations than unknowns. When A is reduced there can't be a pivot in each row, although there might be a pivot in each column of A.

• The system is said to be underdetermined if m < n. This means there are fewer equations than unknowns. When A is reduced there can't be a pivot in each column, although there might be a pivot in each row.

• When m = n the system is called evendetermined. This is the one form where it is possible to get a pivot in each row and column when A is reduced.

Linear Independence

Any collection of vectors can be classified as being either linearly dependent or linearly independent. A collection of vectors is linearly dependent if one of the vectors in the collection can be written as a linear combination of the other vectors in the set. An equivalent way of defining linear independence for vectors in R^n is the following: if v_1, v_2, ..., v_n are vectors in R^n and A = [v_1 v_2 ... v_n], then the columns of A are linearly independent if and only if Ax = 0 has only the trivial solution.

Bases

Any non-zero vector space has a basis. In fact, for any such vector space there are many possible choices for a basis. If the dimension of the space is n (with n > 0), then any basis will consist of n linearly independent vectors from that space. It is also true that any set of n linearly independent vectors in the space will be a basis of that space.

The significance of a basis for a vector space is that any vector in the vector space can be expressed in a unique way as a linear combination of the basis vectors. This is one of the most important ideas in linear algebra.

The subspace consisting of only the zero vector is said to be 0-dimensional and has no basis.

The Dot Product

If you have two vectors in R^n they can be multiplied together using the dot product formula. As a rule the dot product is not difficult to compute, but you should know that the dot product can also be expressed as a matrix product. That is:

 u . v = u^T v = v^T u

A vector in R^n can be seen as an n x 1 matrix, so the above formula says that the dot product of two vectors can be written as the transpose of one vector times the other vector. This will be used a lot in this course.

There are some other important aspects of the dot product you should remember:

1. u . v = 0 means that u and v are orthogonal.
2. u . u = u^T u = ||u||^2. This equation relates the dot product to the length of a vector.
3. u . v = ||u|| ||v|| cos(theta). In this equation theta is the angle between u and v, and 0 <= theta <= pi.

Matrix Multiplication

There is an important way of looking at matrix multiplication which will be used frequently in this course. If you have two matrices of the appropriate dimensions then

 AB = A [b_1 b_2 b_3 ... b_n] = [Ab_1 Ab_2 Ab_3 ... Ab_n]

In other words, when you multiply two matrices you multiply the first matrix with each of the column vectors of the second matrix.

You can also look at matrix multiplication as a collection of dot products:

 AB = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_n \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_n \\ \vdots & & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_n \end{bmatrix}

In the above, a_i^T stands for row i of A, and b_j stands for column j of B. This might look complicated, but it is just the standard computational procedure for multiplying two matrices together. That is, you go along the rows of the first matrix and down the columns of the second matrix, multiplying the corresponding entries and adding the results. (These are the same steps used in finding the dot product of two vectors.)

Unit Vectors

A unit vector is a vector whose length is 1. You should remember that v/||v|| gives a unit vector in the same direction as v for any non-zero vector v. When this formula is used to convert a vector into a unit vector, the procedure is called normalizing the vector.

Projections

The projection of one vector, u, onto another vector, v, is given by

 Proj_v u = \frac{u^T v}{v^T v} v

The orthogonal component of the projection is given by u - Proj_v u.
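To make the projection formula concrete, here is a small Maple check (the example vectors are our own, and the commands assume the LinearAlgebra package has been loaded):

>u:=<1,2,3>: v:=<1,1,1>:
>proj:=(Transpose(u).v)/(Transpose(v).v)*v;   # Proj_v u = <2,2,2>
>u-proj;                                      # the orthogonal component <-1,0,1>
>Transpose(u-proj).v;                         # equals 0, confirming orthogonality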
Diagonal Matrices

A diagonal matrix is a matrix A where a_ij = 0 if i ≠ j. That is, for a diagonal matrix all the entries off the main diagonal are equal to zero. We will not require a diagonal matrix to be square (this is slightly different from the conventional use of the term). Sometimes non-square diagonal matrices will be referred to as rectangular diagonal matrices. In a diagonal matrix some (or all) of the entries on the main diagonal could also be zero.

Why are diagonal matrices important? They are important mainly because of their simplicity. If

 A = \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ 0 & a_{22} & 0 & \cdots & 0 \\ 0 & 0 & a_{33} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}

is a square diagonal matrix then

 A^k = \begin{bmatrix} a_{11}^k & 0 & 0 & \cdots & 0 \\ 0 & a_{22}^k & 0 & \cdots & 0 \\ 0 & 0 & a_{33}^k & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^k \end{bmatrix}

for any positive integer k. And

 A^{-1} = \begin{bmatrix} 1/a_{11} & 0 & 0 & \cdots & 0 \\ 0 & 1/a_{22} & 0 & \cdots & 0 \\ 0 & 0 & 1/a_{33} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1/a_{nn} \end{bmatrix}

assuming that all the diagonal entries are non-zero. Also, for a diagonal matrix the matrix-vector product Av is simple to compute: multiplication by a square diagonal matrix just amounts to scaling each entry of v by the corresponding diagonal entry.

Rotations

The 2 x 2 matrix

 R = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}

is called a rotation matrix or a rotator. If any vector in R^2 is multiplied by such a matrix then the vector is rotated clockwise around the origin through the angle theta. Note that the columns of this matrix are orthogonal unit vectors in R^2 for any value of theta.

In R^3 rotations are more complicated. Rotations around the x, y, or z axis would correspond to multiplying by the following rotators:

 R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, R_y = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}, R_z = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
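As a quick sanity check (this snippet is our own, not part of the original text), we can build R_z in Maple and watch it carry the first standard basis vector onto the second:

>Rz:=theta-><<cos(theta),sin(theta),0>|<-sin(theta),cos(theta),0>|<0,0,1>>:
>Rz(Pi/2).<1,0,0>;      # returns <0,1,0>: e1 is rotated onto e2

Composing rotators by matrix multiplication, as in Exercise 14 below, works the same way: Rz(a).Ry(b) is the matrix of the combined rotation.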
The Null Space

Any m x n matrix A has an associated vector space called the null space of A. This subspace is denoted by Nul A. Nul A is a subspace of R^n and consists of the solutions to Ax = 0. You should remember that any system of this type (a homogeneous system) has either only the trivial solution x = 0, or an infinite number of solutions. The last statement implies that the null space of A might consist of only the zero vector (that is, Nul A might be zero dimensional) or will contain an infinite number of vectors.

The null space is just the zero vector if the columns of A are linearly independent. If the columns of A are linearly dependent then Ax = 0 has non-trivial solutions and Nul A will contain an infinite number of vectors.

What's so special about Nul A? Well, if Ax = b is consistent then the solution set is just the null space of A translated by the addition of a constant vector.

The Column Space and Row Space

Any m x n matrix A has an associated vector space called the column space of A. This subspace is denoted by Col A. Col A is a subspace of R^m and consists of all possible linear combinations of the columns of A.

What's so special about the column space? You know that any system of equations Ax = b is either consistent or inconsistent. It is consistent if b can be written as a linear combination of the columns of A; otherwise it is inconsistent. This means the system is consistent when b is in Col A, and is inconsistent when b is not in Col A.

The dimension of the column space of A is called the rank of A. The rank is also the number of pivots in the reduced row echelon form of A.

There is another space associated with an m x n matrix A called the row space and denoted Row A. This is the set of all possible linear combinations of the rows of A. The row space and column space always have the same dimension.

There is one particular aspect of the row space that will be relevant for this course. Suppose x is a vector such that Ax = 0 (that is, x is in the null space of A). It was pointed out earlier that the product Ax can be seen as a collection of dot products, so the equation Ax = 0 means that any vector in Nul A must be orthogonal to the rows of A.
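Maple has commands for all of these subspaces. The following lines (our own illustration, using a rank-1 matrix chosen for the purpose) display bases for the null space and column space, and the rank:

>A:=Matrix([[1,2,3],[2,4,6]]):
>NullSpace(A);      # a basis for Nul A (two vectors here)
>ColumnSpace(A);    # a basis for Col A
>Rank(A);           # 1, the common dimension of Col A and Row A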
Exercises

1. Let

 u_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}, u_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix}, u_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}, u_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, v = \begin{bmatrix} 4 \\ -1 \\ 1 \\ -1 \end{bmatrix}

(a) Show that {u_1, u_2, u_3, u_4} is a basis for R^4.
(b) Write v as a linear combination of these basis vectors.

2. Are the following statements true or false?
(a) An underdetermined system cannot have a unique solution.
(b) An overdetermined system must be inconsistent.
(c) An evendetermined system must have a unique solution.
(d) A consistent, underdetermined system must have an infinite number of solutions.
(e) An underdetermined, homogeneous system must have an infinite number of solutions.
(f) If the system Ax = 0 has only the trivial solution then this system cannot be underdetermined.

3. Let A be an n x n matrix. Which of the following conditions are equivalent to A being invertible?
(a) The rank of A is n.
(b) The column space of A is {0}.
(c) The column space of A is R^n.
(d) The null space of A is {0}.
(e) The null space of A is R^n.
(f) The determinant of A is 0.

4. The dot product of u and v is given by u^T v = v^T u. What happens if it is the second vector that is transposed? That is, what is uv^T? Is uv^T = vu^T?

5. Suppose AB = I for two matrices A and B. What can you say about the orthogonality of the row vectors of A and the column vectors of B? (In this problem, why is it not true that B = A^{-1}?)

6. Normalize the vectors \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} and \begin{bmatrix} 2 \\ 1 \\ -2 \\ 1 \end{bmatrix}.

7. Let

 u = \begin{bmatrix} 3 \\ 3 \\ -1 \end{bmatrix}, v = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}

Find the projection of u onto v. Find the projection of v onto u. Find the orthogonal component of the projection in both of these cases.

8. Show that the projection and orthogonal component of the projection are the same whether you project u onto v or project u onto kv for any non-zero scalar k. What is the formula for the projection of u onto v if v is known to be a unit vector?

9. Suppose A is an m x n diagonal matrix with 1's down the diagonal and m > n. Using the notation of partitioned matrices, this means that we can write

 A = \begin{bmatrix} I \\ O \end{bmatrix}

(a) Give a description of the effect of multiplying a vector in R^n by A.
(b) Give a description of the effect of multiplying a vector in R^m by A^T.
(c) What happens to a vector if it is multiplied by AA^T?
(d) What happens to a vector if it is multiplied by A^T A?

Multiplying a vector by A is called zero padding, and multiplying by A^T is called truncation. Zero padding increases the dimensionality of a vector, and truncation decreases the dimensionality. The above comments can be summarized by the rule: zero-padding is the transpose of truncation.

10. Let

 A = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

Describe what happens to a vector in R^3 when it is multiplied by A. Describe what happens to a vector in R^4 when it is multiplied by A^T.

11. (a) Let

 A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}

Describe the effect when a vector in R^4 is multiplied by A. What is A^4?

(b) Let A be the n x n matrix

 A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}

What is A^2? What is A^n?

(c) Let

 A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}

Describe the effect when a vector in R^4 is multiplied by A. What is A^4?

(d) Let A be the n x n matrix

 A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix}

What is A^2? What is A^n?

12. Why is any unit vector in R^2 of the form \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}?

13. Let u = \begin{bmatrix} a \\ b \end{bmatrix} be any vector in R^2. Find a vector orthogonal to u.

14. Use matrix multiplication to show that a clockwise rotation (in R^3) of pi/2 around the x axis, followed by a clockwise rotation of pi/2 around the z axis, followed by a clockwise rotation of pi/2 around the y axis, is equivalent to a counter-clockwise rotation of pi/2 around the z axis.

15. Give a geometric description of the null space of \begin{bmatrix} 1 & 1 & 1 \\ 2 & 1 & 3 \end{bmatrix}. Give an equation that defines this space. Give a geometric description of the row space of this matrix. Give an equation for this space. Why is the column space of this matrix all of R^2?

16. Let A = \begin{bmatrix} 2 & 4 \\ 3 & 6 \end{bmatrix}.
(a) How can you tell by observation that this matrix has rank 1?
(b) Write this matrix in the form uv^T where u and v are vectors in R^2.
(c) Find a basis for Nul A, Col A, and Row A.

17. (a) Show that uv^T has rank 1 for any two non-zero vectors u and v.
(b) Show that if w is orthogonal to v then w is in the null space of uv^T.

18. Let A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 0 & 1 & 1 \end{bmatrix} and v = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}. Is v in Col A? Is v in Nul A?

Using MAPLE

Example 1. In this Maple lab we will illustrate an important aspect of linear transformations. In R^2 any vector of the form (cos t, sin t) lies on the unit circle. In fact, the entire unit circle is the set of all vectors of this form as t ranges from 0 to 2*pi. In Maple we can illustrate this as follows:

>with(LinearAlgebra):
>v:=<cos(t), sin(t)>:
>plot([v[1],v[2],t=0..2*Pi]);

Figure 1: The unit circle.

Here are some points you should understand about the above Maple commands:

• The first command, with(LinearAlgebra), loads the linear algebra package, which adds a number of commands dealing with linear algebra to Maple. All the Maple exercises in this book require this package, so you should enter this command at the beginning of each Maple lab. From now on this command won't be given explicitly, but it will be understood that you have entered it at the beginning of each Maple session. We ended the line with a colon; if you end the line with a semi-colon, Maple will print out the names of all the new commands that have been loaded.
• The second command defines the vector v = (cos t, sin t).

• Angle brackets, square brackets, parentheses, and curly brackets all have different meanings in Maple, so make sure you use the same type of bracket as indicated in the example.

• The plot command as used here takes a list as input. The list consists of three entries. The first entry is the x value of the point that is being plotted. This value is cos t, and this is the first entry in vector v; it can be selected using the command v[1]. The second entry in the list is the y value. This is the second entry in v and is selected by using v[2]. The third entry in the list is the range of the parameter t. In this case the range is from 0 to 2*pi.

Now let

 A = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}

If we multiply (cos t, sin t) by matrix A it is equivalent to multiplying the entire unit circle by A. In this case we say that we are applying a linear transformation to the circle. How does the circle get transformed by A? We will use Maple to find the answer:

>A:=<<1,2>|<-1,-1>>;
>u:=A.v;
>plot([u[1],u[2],t=0..2*Pi],scaling=constrained);

Figure 2: The unit circle transformed by \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}.

Here are some comments about the above:

• The first line illustrates one method of defining a matrix in Maple. Here we define the matrix column by column. There is one pair of angled brackets for the entire matrix, and each column vector is enclosed in angled brackets. The columns are separated by vertical bars, |.

• The second command illustrates how to perform matrix multiplication in Maple. Scalar multiplication uses the * symbol. Matrix multiplication uses the . symbol.

• The last line plots the transformed circle. The added parameter scaling=constrained forces the x and y axes to be scaled the same.

Now we will repeat the same procedure for other matrices. (In a Maple worksheet you can just scroll up, change the values in matrix A, and re-enter the subsequent commands.)

For example, if we let A = \begin{bmatrix} -3 & 1 \\ 2 & -1 \end{bmatrix} we would get Figure 3.

Figure 3: The unit circle transformed by \begin{bmatrix} -3 & 1 \\ 2 & -1 \end{bmatrix}.

If we let A = \begin{bmatrix} 1 & 1.5 \\ 0 & 2 \end{bmatrix} we would get Figure 4.

Figure 4: The unit circle transformed by \begin{bmatrix} 1 & 1.5 \\ 0 & 2 \end{bmatrix}.

If we let A = \begin{bmatrix} 2 & 0 \\ 0 & .3 \end{bmatrix} we would get Figure 5.

Figure 5: The unit circle transformed by \begin{bmatrix} 2 & 0 \\ 0 & .3 \end{bmatrix}.

This last example is particularly important. The matrix A in this example is diagonal, and therefore multiplication by A corresponds to scaling the unit circle by 2 in the horizontal direction and by .3 in the vertical direction. The effect is to stretch the circle horizontally and flatten the circle vertically. You should find it easy to predict the result of multiplying the unit circle by any diagonal matrix.

For our next two examples let

 R = \begin{bmatrix} \cos(.3) & -\sin(.3) \\ \sin(.3) & \cos(.3) \end{bmatrix} and S = \begin{bmatrix} .4 & 0 \\ 0 & 1.5 \end{bmatrix}

Notice that R is a rotation matrix: it rotates vectors in R^2 counter-clockwise by .3 radians. S is a diagonal matrix: it scales vectors in R^2 by different amounts along the horizontal and vertical axes. What does the unit circle look like after multiplying by R? By S? By R^2? By S^2? By R^3? By S^3? Now consider the matrices RS and SR: how do these two matrices transform the unit circle? Do they have the same effect?

>R:=<<cos(.3),sin(.3)>|<-sin(.3),cos(.3)>>;
>S:=<<.4,0>|<0,1.5>>:
>u1:=R.S.v;
>u2:=S.R.v;
>plot([u1[1],u1[2],t=0..2*Pi],scaling=constrained);
>plot([u2[1],u2[2],t=0..2*Pi],scaling=constrained);

This gives Figure 6 and Figure 7.
Figure 6: Transforming by RS.  Figure 7: Transforming by SR.

You should understand why you got these pictures. (A lot of people expect them to be reversed.) You should try a few more examples using some 2 x 2 matrices of your own choice.

In each of the above examples the transformed circle was, in fact, an ellipse. By the end of the course you will have a clear understanding of why you get an ellipse in these cases. But, in fact, do you always get an ellipse? Suppose we let A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}; then the transformed circle is shown in Figure 8.

Figure 8: The unit circle transformed by \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}, a non-invertible matrix.

In this case we get a straight line. Is this an ellipse? Well, yes, in a sense. This can be seen as what is called a degenerate ellipse: an ellipse where the width along the minor axis has shrunk to zero.

Why did we get a straight line in this last example? You should be able to understand this result on purely theoretical grounds. First, notice that A in this last example is not invertible. Therefore the columns of A must be linearly dependent. If you look at the column space you should be able to see that it is just the line y = 2x. When you multiply any vector by matrix A you are just forming a linear combination of the columns of A; that means the result of that multiplication has to be in the column space, i.e., the result must lie on the line y = 2x. Therefore every point on the unit circle ends up on this line.

Now consider the following questions.

• The above examples all applied a linear transformation to the unit circle. What would the results have been if the same transformations had been applied to a circle, centered at the origin, of radius 2?

• How would the unit circle look if it was multiplied by a matrix of the form \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}?

• How would the unit circle look if it was multiplied by \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}? (Hint: factor the scalar \sqrt{2} out of this matrix.)

• What would happen to the unit sphere in R^3 if it was multiplied by a 3 x 3 diagonal matrix with non-zero entries on the diagonal? What if one of the diagonal entries is 0? What if two of the diagonal entries are 0?

• Modify the Maple procedures from this section to see what happens when a transformation is applied to a circle that is not centered at the origin. Is the result still an ellipse? Give a mathematical explanation of how the results in this case compare with the results when the circle is centered at the origin. (A starting point for this experiment is sketched below.)
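For the last question, one possible starting point (our own sketch, following the same pattern as the code above) is to shift the parametrized circle before transforming it:

>v:=<cos(t)+1, sin(t)+1>:                      # a unit circle centered at (1,1)
>A:=<<2,0>|<0,.3>>:
>u:=A.v:
>plot([u[1],u[2],t=0..2*Pi],scaling=constrained);

Compare the result with Figure 5, and ask where the center of the transformed curve ends up.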
Example 2. In this example we will look at a particular type of problem that is closely related to much of what we will be doing later on in this book.

Suppose you have a set of n points in R^2 with distinct x coordinates; then it is possible to find a polynomial of degree n - 1 that passes through all the points. This polynomial is called an interpolating polynomial. For example, you can find a straight line that passes through any two such points, you can find a quadratic which passes through any three such points, four such points determine a cubic polynomial, and so on.

To illustrate, suppose we have the following five points: (1,3), (2,1), (3,6), (4,4), (8,0). We can then find a fourth degree polynomial (this is called a quartic)

 p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4

that passes through all five points. Furthermore, we can find this polynomial by solving a system of five equations with five unknowns. The equations making up the system are found by substituting each of the points into the polynomial. Each time we substitute a numerical value for x and p(x), we are left with an equation that is linear in the unknowns a_0, a_1, a_2, a_3, and a_4. This gives the following system:

 a_0 + a_1 + a_2 + a_3 + a_4 = 3
 a_0 + 2a_1 + 4a_2 + 8a_3 + 16a_4 = 1
 a_0 + 3a_1 + 9a_2 + 27a_3 + 81a_4 = 6
 a_0 + 4a_1 + 16a_2 + 64a_3 + 256a_4 = 4
 a_0 + 8a_1 + 64a_2 + 512a_3 + 4096a_4 = 0

which can be written as the matrix equation

 \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 1 & 3 & 9 & 27 & 81 \\ 1 & 4 & 16 & 64 & 256 \\ 1 & 8 & 64 & 512 & 4096 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 6 \\ 4 \\ 0 \end{bmatrix}

Notice that the coefficient of a_0 is always 1, and these values make up the first column of the coefficient matrix. The coefficients of a_1 are the x coordinates of the given points, and these x coordinates make up the second column of the coefficient matrix. The coefficients of a_2 are the squares of the x coordinates, and these make up the third column of the coefficient matrix. The same pattern continues for the remaining columns.

The coefficient matrix of this type of system is called a Vandermonde matrix, and there is a command in Maple for defining this type of matrix. We will solve this problem using the following Maple code, but first here are some comments about the code:

• We have ended most lines of Maple input with a colon to prevent the output from being printed. You might want to change these colons to semi-colons to see the result of each command.

• The first line defines a list which we called xvals, consisting of the x coordinates of the five points. The second line defines a list of the y coordinates. Each of these lists can be seen as a vector in R^5.

• The VandermondeMatrix command in Maple takes a vector of the x coordinates as input and returns the corresponding Vandermonde matrix. The third line below defines the 5 x 5 Vandermonde matrix for our system. Each column of the Vandermonde matrix consists of the x coordinates raised to various powers: in the first column these coordinates are raised to the power 0 (all 1's, the coefficients of a_0), in the second column they are raised to the power 1 (the coefficients of a_1), in the third column to the power 2 (the coefficients of a_2), and so on.

• The fourth line solves the system by using the inverse of the Vandermonde matrix. We called the solution sol, so sol is a vector containing the values of the 5 coefficients of our interpolating polynomial. (Another way of solving the system is by row reduction: create the augmented matrix of the system, put it in reduced row echelon form using the rref command, and select the last column of the reduced matrix; that column is the solution to the system.)

• The last line defines our interpolating polynomial, which we call p. In this line we use the add command in Maple. As the parameter i varies from 1 to 5, the expression sol[i] takes on each of the 5 values in sol and multiplies it by the corresponding power of x. The add command then adds these terms together, giving us the polynomial.

>xvals:=<1,2,3,4,8>:
>yvals:=<3,1,6,4,0>:
>V:=VandermondeMatrix(xvals):
>sol:=V^(-1).yvals:
>p:=add(sol[i]*x^(i-1),i=1..5);

The last line returns the interpolating polynomial:

 \frac{1264}{35} - \frac{1244}{21}x + \frac{129}{4}x^2 - \frac{275}{42}x^3 + \frac{59}{140}x^4

We will now use Maple to plot these results. Again we will give some comments concerning the commands.

• We will include the points in our plot.
To do this in Maple we need to make up a list consisting of the 5 points. That is, we need the list

 [[1, 3], [2, 1], [3, 6], [4, 4], [8, 0]]

A list in Maple is contained within square brackets, so this list contains 5 items. Each item in the list is itself a list of two values, the coordinates of the points. The first line defines this list of points. Note how the seq command is used to create this sequence of coordinates. The expressions xvals[i] and yvals[i] return the x and y coordinates previously defined as i ranges from 1 to 5.

• The second line defines the plot of the points and calls it p1. The plot is not shown at this stage.

• The third line defines the plot of the interpolating polynomial, calling it p2.

• The fourth line uses the display command from the plots package in Maple. This command allows us to combine several plots into one.

>pts:=[seq( [xvals[i],yvals[i]], i=1..5)]:
>p1:=plot(pts,style=point,color=black):
>p2:=plot(p,x=1..8):
>plots[display](p1,p2);

This returns Figure 9, which shows the polynomial passing through each of the five given points.

Figure 9: An interpolating polynomial passing through 5 points.

We will look at another example of the same type that involves more points. In the following Maple code, the first line illustrates another way of defining a vector in Maple. Here we generate a vector in R^9. The expression i->i-5 is a function that generates the entries in the vector: the parameter i ranges from 1 to 9, and the corresponding entry in the vector is i-5.

The second line generates the y coordinates by evaluating the signum function at each x value. The signum function returns -1 if the input is negative, returns 1 if the input is positive, and returns 0 if the input is 0. The Map command just tells Maple to apply this function to each entry in xvals. The rest of the example follows the same steps as the previous example.

>xvals:=Vector(9, i->i-5);
>yvals:=Map(signum,xvals):
>V:=VandermondeMatrix(xvals):
>sol:=V^(-1).yvals:
>p:=add(sol[i]*x^(i-1),i=1..9);
>pts:=[seq([xvals[i],yvals[i]],i=1..9)]:
>p1:=plot(pts,style=point):
>p2:=plot(p,x=-4..4):
>plots[display](p1,p2);

This returns Figure 10.

Figure 10: An interpolating polynomial passing through 9 points.

Finally, we will point out that a unique solution to this type of problem is guaranteed as long as the Vandermonde matrix is invertible, and this will happen whenever the x values are all distinct.

Chapter 1  Change of Basis

First, some comments about notation. In this book a basis will be represented by a calligraphic symbol such as B or C. In particular, the standard basis of R^n will be represented by E. If a certain letter is used to represent a basis, then the vectors of that basis will usually be represented by the same letter in lower case bold font with subscripts. For example, B = {b_1, b_2, b_3} would represent a basis of a three dimensional vector space.

In this book we will usually understand a basis to be an ordered basis. This means that the vectors in the basis are specified in a particular order. The same basis vectors in a different order would be regarded as a different basis.

Finally, given the above usage, a capital letter in regular font will indicate the matrix created by using the basis vectors (in the appropriate order) as columns. So, for example, B = [b_1 b_2 b_3] is the matrix formed by using the vectors in basis B as columns. (The matrix formed from the standard basis is an exception: it is the identity matrix and is represented by I.)
1.1 Coordinates Relative to a Basis

This section is based on three important points that you should know from a previous linear algebra course.

• Any non-zero vector space has a basis.

• Given a vector space, V, and a basis for that space, B, any vector in V can be written as a unique linear combination of those basis vectors.

• Any non-zero vector space has, in fact, infinitely many possible choices for a basis, and each basis will contain the same number of vectors. (This number is the dimension of the vector space.)

Let B = {b_1, b_2, ..., b_n} be a basis of some vector space and let v be some vector in that space. It then follows that

 v = c_1 b_1 + c_2 b_2 + ... + c_n b_n

for some weights c_1, c_2, ..., c_n. These weights are called the coordinates of v relative to basis B. These weights can be seen as a vector in R^n, and this vector is represented by the symbol [v]_B. So

 [v]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}

The vector [v]_B is also called the B-representation of v. You should look at the vector [v]_B as a recipe for how to create vector v from the vectors in basis B. It gives you the numerical information required to construct v from the basis vectors.

The following example will illustrate the points made so far in this chapter. Let

 B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}, C = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 5 \\ -1 \end{bmatrix} \right\}, D = \left\{ \begin{bmatrix} 5 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, E = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}

Each of these sets is a basis of R^2 (why?). In fact, the last of these bases is just the standard basis of R^2. You should also notice that C and D contain the same vectors in a different order.

Now let

 v = 4b_1 + 6b_2 = 4\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 6\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 10 \\ -2 \end{bmatrix}

Notice that v is defined as a linear combination of b_1 and b_2. The weights of this linear combination are 4 and 6. These are the coordinates of v relative to basis B, so we have

 [v]_B = \begin{bmatrix} 4 \\ 6 \end{bmatrix}

The entries in this vector tell you how to obtain v from the vectors in basis B.

How can v be obtained from the vectors in C? The answer here should be obvious if you look at the relevant vectors. To obtain v you have to take zero times the first basis vector and two times the second basis vector. In other words,

 [v]_C = \begin{bmatrix} 0 \\ 2 \end{bmatrix}

Similarly it should be very easy to see that

 [v]_D = \begin{bmatrix} 2 \\ 0 \end{bmatrix} and [v]_E = \begin{bmatrix} 10 \\ -2 \end{bmatrix}

To recap, the expression [v]_B stands for a vector. The entries in this vector are the weights required to create vector v from the vectors in the basis B.

Example 1.1.1

Let

 b_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, b_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, b_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}

Then B = {b_1, b_2, b_3} will be a basis of R^3. (You should verify this yourself.)

Let v = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}. What is [v]_B?

The notation is new, but you should recognize this as a variation on one of the fundamental problems of linear algebra: how to write one vector as a linear combination of some other vectors. This amounts to solving the system with augmented matrix

 [b_1 b_2 b_3 | v]

Setting up this augmented matrix and applying elementary row operations would result in

 \begin{bmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{bmatrix}

The solution to this system gives you the weights of the desired combination, so

 [v]_B = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}

Now suppose you are told that [u]_B = \begin{bmatrix} 0 \\ 3 \\ 5 \end{bmatrix}; then what is u?

Again, the computation is simple. It's just a matter of understanding the notation. The information that you are given means that

 u = 0b_1 + 3b_2 + 5b_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ 10 \\ 5 \end{bmatrix} = \begin{bmatrix} 5 \\ 13 \\ 8 \end{bmatrix}
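This computation can also be checked in Maple (our own check, using the commands introduced in Chapter 0):

>B:=<<1,1,1>|<0,1,1>|<1,2,1>>:   # the basis vectors of Example 1.1.1 as columns
>v:=<1,2,3>:
>LinearSolve(B,v);               # returns <2,2,-1>, which is [v]_B
>B.<0,3,5>;                      # rebuilds u = <5,13,8> from [u]_B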
1.2 Change of Basis

The Standard Basis

Suppose B = {b_1, b_2, ..., b_n} is a basis of R^n and [v]_B = (c_1, c_2, ..., c_n). How can you find [v]_E? Look at the following equations. (Give the justification for each step.)

 [v]_E = v
       = c_1 b_1 + c_2 b_2 + ... + c_n b_n
       = [b_1 b_2 ... b_n] \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
       = B [v]_B

The above derivation shows that to convert the B-representation of a vector to the representation in terms of the standard basis, you just have to multiply by B. This matrix is therefore called the change of basis matrix from B to E and is represented by the symbol P_{E<-B}.

Suppose, on the other hand, you knew the representation of a vector, v, in the standard basis and wanted the representation relative to B. This is the same problem as trying to write v as a linear combination of the columns of B. It is therefore the same as trying to solve the system Bx = v. The solution to this equation is given by B^{-1}v = B^{-1}[v]_E. The conclusion we can draw from this is that P_{B<-E} = B^{-1}. In other words, P_{B<-E} = (P_{E<-B})^{-1}.

Example 1.2.1

Let

 B = {b_1, b_2} = \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}

Then we have

 P_{E<-B} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}

and

 P_{B<-E} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{bmatrix}

Suppose you are given [u]_B = \begin{bmatrix} 3 \\ -1 \end{bmatrix}. What is [u]_E?

The point of this section is that this can be answered by a matrix multiplication. We have

 [u]_E = P_{E<-B} [u]_B = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}

So the coordinates of vector u relative to the standard basis are (5, 1), and the coordinates of u relative to B are (3, -1). These are illustrated in Figures 1.1 and 1.2.

Figure 1.1: The vector (5,1) relative to the standard basis.
Figure 1.2: The vector (5,1) relative to basis B is (3,-1).

Similarly, if we knew that [v]_E = \begin{bmatrix} -2 \\ 4 \end{bmatrix} then we would have

 [v]_B = P_{B<-E} [v]_E = \begin{bmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{bmatrix} \begin{bmatrix} -2 \\ 4 \end{bmatrix} = \begin{bmatrix} -8/3 \\ 10/3 \end{bmatrix}

Example 1.2.2

Let

 B = {b_1, b_2, b_3} = \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}

 [u]_B = \begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix}, [v]_E = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix}

What are [u]_E and [v]_B?

Again the main point is that both of these questions can be answered by matrix multiplication. To find [u]_E we just have to multiply P_{E<-B} [u]_B, and from the above discussion this is just B[u]_B:

 \begin{bmatrix} 2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -2 \end{bmatrix} = [u]_E

More intuitively, when you compute the above matrix multiplication you are just combining the columns of B using the weights in vector [u]_B. The result will be vector u, which is the same as [u]_E.

For the second question we again just have to multiply by the appropriate change of basis matrix. This means that we must compute P_{B<-E} [v]_E, and this is just B^{-1}[v]_E. In principle this is fine, but in practice it means that we would have to first find the inverse of B and then carry out the matrix multiplication. It would be computationally simpler to set up the appropriate augmented matrix and reduce it:

 \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{bmatrix}

So [v]_B = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}.

To confirm this result we can compute:

 -b_1 + 2b_2 + b_3 = \begin{bmatrix} -2 \\ -1 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix} = v = [v]_E

Arbitrary Bases

Suppose now that you have two bases of R^n, B and C, and you want to convert a vector from its B-representation to its C-representation. In other words, what is P_{C<-B}? The following diagram is meant to illustrate the answer:

 B --(multiply by B)--> E --(multiply by C^{-1})--> C

The idea is that if you start off with a vector in terms of basis B, then you convert into the standard basis by multiplying by B. When you have the representation in the standard basis, you convert into the basis C by multiplying by C^{-1}. These two steps can be combined into multiplying the original vector by C^{-1}B. So the change of basis matrix from B to C is

 P_{C<-B} = C^{-1}B

Notice that one implication of the above discussion is that P_{B<-C} = (C^{-1}B)^{-1} = B^{-1}C.
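Before the next example, here is a generic Maple sketch (our own, with two bases made up for the occasion) showing how the two change of basis matrices are computed and how they undo each other:

>B:=<<1,0>|<1,1>>:        # basis B: b1=(1,0), b2=(1,1)
>C:=<<1,1>|<0,1>>:        # basis C: c1=(1,1), c2=(0,1)
>P:=C^(-1).B;             # P_{C<-B}
>Q:=B^(-1).C;             # P_{B<-C}
>P.Q;                     # the identity matrix, since Q = P^(-1)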
Example 1.2.3

Suppose B = {b_1, b_2} = \left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}, C = {c_1, c_2} = \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}, and [v]_B = \begin{bmatrix} 3 \\ -1 \end{bmatrix}. What is [v]_C?

To answer this question we just have to evaluate C^{-1}B[v]_B. We will first find C^{-1}B using row reduction:

 [C | B] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 3 & 1 & 3 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & -5 & -3 & -2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 4/5 & 1/5 \\ 0 & 1 & 3/5 & 2/5 \end{bmatrix}

The right hand side of the final array is C^{-1}B. (Putting a matrix in reduced row echelon form by elementary row operations is equivalent to multiplying by a matrix. In this example the left hand side of the array was transformed from C to I, which is equivalent to multiplying by C^{-1}. Since the same row operations were done to the right hand side, the right hand side becomes C^{-1}B.) So now we multiply this matrix by the given vector:

 \begin{bmatrix} 4/5 & 1/5 \\ 3/5 & 2/5 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 11/5 \\ 7/5 \end{bmatrix}

Therefore [v]_C = \begin{bmatrix} 11/5 \\ 7/5 \end{bmatrix}. To confirm this we can compute v from its representation in the two bases. From basis B we get

 v = 3\begin{bmatrix} 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 8 \end{bmatrix}

From basis C we get

 v = \frac{11}{5}\begin{bmatrix} 1 \\ 3 \end{bmatrix} + \frac{7}{5}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 11/5 \\ 33/5 \end{bmatrix} + \begin{bmatrix} 14/5 \\ 7/5 \end{bmatrix} = \begin{bmatrix} 25/5 \\ 40/5 \end{bmatrix} = \begin{bmatrix} 5 \\ 8 \end{bmatrix}

Exercises

1. Let b_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, b_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, and v = \begin{bmatrix} 5 \\ -2 \end{bmatrix}. Let B = {b_1, b_2}. Find [v]_B.

2. Let b_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, b_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, b_3 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, and v = \begin{bmatrix} 1 \\ 5 \\ 5 \end{bmatrix}. Let B = {b_1, b_2, b_3}. Find [v]_B.

3. Let b_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, b_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, b_3 = \begin{bmatrix} 7 \\ 8 \\ 8 \end{bmatrix}, and v = \begin{bmatrix} 7 \\ 2 \\ 4 \end{bmatrix}. Let B = {b_1, b_2, b_3}.
(a) Find [v]_E. (b) Find [v]_B. (c) Find P_{B<-E}.

4. Let b_1 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, b_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. Let B = {b_1, b_2} and C = {b_2, b_1}. Suppose [v]_B = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.
(a) Find [v]_E. (b) Find [v]_C. (c) Find P_{B<-E}. (d) Find P_{B<-C}.

5. Let u = \begin{bmatrix} 3 \\ 0 \end{bmatrix}. Find a basis, B, of R^2 such that [u]_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. How many choices for B are there in this case?

6. Let b_1 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, b_2 = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}, b_3 = \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix} and let B = {b_1, b_2, b_3}.
(a) Suppose [v]_B = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}. Find v.
(b) Suppose u = \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix}. Find [u]_B.

7. Let b_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, b_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, c_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, c_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}.
(a) Find P_{B<-C} and P_{C<-B}.
(b) If [v]_B = \begin{bmatrix} 5 \\ 6 \end{bmatrix} find [v]_C.
(c) What is [c_2]_B? (d) What is [c_2]_C?

8. Let

 B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}, C = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}

Find P_{C<-B} and P_{B<-C}.

9. Let B = {b_1, b_2, b_3} be a basis for R^3. Let C = {b_3, b_1, b_2}.
(a) What is [b_1]_B? (b) What is [b_1]_C? (c) What is P_{B<-C}? (d) What is P_{C<-B}?

10. Let B = {b_1, b_2, ..., b_n} be a basis for R^n, and let C = {Ab_1, Ab_2, ..., Ab_n} for some invertible n x n matrix A.
(a) What is P_{B<-C}? (b) What is P_{C<-B}?

11. Suppose B is an invertible n x n matrix; then the columns of B form a basis for R^n. Call this basis B. It follows that B^2 is also invertible, and so the columns of B^2 also form a basis of R^n. Call this basis C. What is P_{C<-B}?

12. Let B = {b_1, b_2, ..., b_n} and C = {c_1, c_2, ..., c_n} be bases of R^n. Show that

 P_{C<-B} = [ [b_1]_C [b_2]_C ... [b_n]_C ]

13. Let B = {b_1, b_2, ..., b_n} be a basis for some n dimensional subspace. What is [b_i]_B?

14. Let B, C, and D be different bases of R^n. Show that P_{D<-B} = P_{D<-C} P_{C<-B}.

15. Are the following true or false?
(a) [u + v]_B = [u]_B + [v]_B
(b) [kv]_B = k[v]_B, where k is a scalar.
(c) [Av]_B = A[v]_B, where A is a matrix of the appropriate dimension.

1.3 Examples of Bases

The Haar Basis

There are some bases that are important enough to be singled out and given special names. One example is the standard basis. Another basis that has some important applications is called the Haar basis. We will occasionally use the Haar basis for examples later in this book. There is a Haar basis for R^n whenever n is a power of 2.
If we use the Haar basis for R^n as the columns of a matrix, H_n, we have:

 H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}

 H_4 = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix}

 H_8 = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 \\ 1 & 1 & -1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 0 & -1 & 0 & 0 \\ 1 & -1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & -1 & 0 & 1 & 0 & 0 & -1 & 0 \\ 1 & -1 & 0 & -1 & 0 & 0 & 0 & 1 \\ 1 & -1 & 0 & -1 & 0 & 0 & 0 & -1 \end{bmatrix}

The pattern behind the Haar basis should be clear from these examples. One particularly important property of the Haar basis that will become relevant later on is that the vectors in the basis are mutually orthogonal.

Example 1.3.1

Find the coordinates of v = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} relative to the Haar basis of R^4.

The simplest way to answer this question is to set up the appropriate augmented matrix and reduce it:

 \begin{bmatrix} 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & -1 & 0 & 2 \\ 1 & -1 & 0 & 1 & 3 \\ 1 & -1 & 0 & -1 & 4 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 0 & 5/2 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & -1/2 \\ 0 & 0 & 0 & 1 & -1/2 \end{bmatrix}

The last column gives the coordinates we are looking for.

Example 1.3.2

Compute H_4^T H_4. By simple matrix multiplication we have

 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}

The following questions are meant to reveal the significance of the above computation.

• What do the entries down the diagonal of this matrix product tell you about the vectors in H_4?

• What do the zeroes off the diagonal of this product tell you about the vectors in H_4?

• What is H_4^{-1}? (This inverse should be fairly easy to find given the above matrix product. Hint: is H_4^{-1} = H_4^T?)
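These questions are easy to explore in Maple (this snippet is our own; it assumes the LinearAlgebra package is loaded):

>H4:=<<1,1,1,1>|<1,1,-1,-1>|<1,-1,0,0>|<0,0,1,-1>>:
>Transpose(H4).H4;      # diagonal, so the columns are mutually orthogonal
>H4^(-1);               # compare the rows of this with the columns of H4

Comparing the two outputs shows how close H_4^{-1} is to H_4^T, and exactly what adjustment would be needed to make them equal.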
Sampling

There is another method of generating vectors, and in some cases basis vectors, that will be mentioned here. This method might seem strange at first, but it lies behind some very important applications of linear algebra.

Suppose you have a set of functions, say f_1(x) = 1, f_2(x) = x, f_3(x) = x^2, and f_4(x) = x^3. These functions can be sampled. This means they can be evaluated at a set of equally spaced x values. (It is not absolutely necessary that the input values be equally spaced, but it is more common.) The values generated by each function are then placed in a vector. For example, we can evaluate the above functions at x = 1, 2, 3, 4. Evaluating the four functions at these points gives the following corresponding vectors:

 v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, v_2 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, v_3 = \begin{bmatrix} 1 \\ 4 \\ 9 \\ 16 \end{bmatrix}, v_4 = \begin{bmatrix} 1 \\ 8 \\ 27 \\ 64 \end{bmatrix}

Figure 1.3: The polynomials 1, x, x^2, and x^3 and their sampled values.

Figure 1.3 illustrates the connection between the function f_i(x) and the vector v_i. You can easily confirm that these vectors form a basis for R^4.

Suppose we sample the same 4 functions at x = -2, -1, 1, 2 and use the resulting vectors as columns of a matrix. The resulting matrix would be

 \begin{bmatrix} 1 & -2 & 4 & -8 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \end{bmatrix}

You will again find that this matrix is invertible, so in this case the sampled vectors would be a basis of R^4.

If we had the functions g_1(x) = 1, g_2(x) = x, g_3(x) = 2x, and g_4(x) = 3x and sampled them at x = 1, 2, 3, 4, and used the sampled vectors as columns of a matrix, we would get

 \begin{bmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & 4 & 6 \\ 1 & 3 & 6 & 9 \\ 1 & 4 & 8 & 12 \end{bmatrix}

This matrix would not be invertible, so in this case the sampled vectors do not form a basis of R^4.

As another example we will sample the functions g_n(x) = cos(nx/2) for n = 0, 1, ..., 5 at the values x = 0, pi/3, 2pi/3, pi, 4pi/3, 5pi/3. If we use the resulting vectors as the columns of a matrix we get

 \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & \sqrt{3}/2 & 1/2 & 0 & -1/2 & -\sqrt{3}/2 \\ 1 & 1/2 & -1/2 & -1 & -1/2 & 1/2 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ 1 & -1/2 & -1/2 & 1 & -1/2 & -1/2 \\ 1 & -\sqrt{3}/2 & 1/2 & 0 & -1/2 & \sqrt{3}/2 \end{bmatrix}

This matrix would have a non-zero determinant, so the vectors are linearly independent and are therefore a basis for R^6.

Figure 1.4: A basis for R^6 generated by sampling a set of cosine functions.

Figure 1.4 is a plot of the cosine functions used in this example and the sampled values. The figure shows the 6 cosine functions; on each of the curves there are 6 points marked, and the y values at these points are the entries of the sampled vector.

Sampling a set of functions in this way will not always give independent vectors, but there are many important bases that are generated in this way.

You can look at sampling as a technique of approximating a continuous function by a finite, discrete set of points. In practice, sampling is not restricted to this procedure of finding a basis. More commonly it is used as a means of simply collecting data. If any physical parameter is measured at regular intervals, the parameter is said to be sampled, and the data collected can be seen as a vector. For example, a record was kept of the number of lynx trapped yearly in the Mackenzie River District of Western Canada from 1821 to 1934. This data corresponds to a vector with 114 entries. How can this vector in R^114 be plotted? The most natural way is to plot the entries as the y coordinates of a series of points where the x coordinates run from 1 to 114 (or from 1821 to 1934). This would result in Figure 1.5.

Figure 1.5: Plot of the number of lynx captured against time.

This data can be interpreted as representative of the overall lynx population. Plotting the vector in this way reveals that the lynx population seems to oscillate over time. Why would anyone possibly want to convert this vector to a different basis? At this point there doesn't seem to be any possible reason for changing to a new basis; but, as we will see later, there is a very important reason for converting data sets to a new basis.

Two-dimensional Bases

One of the key ideas in this book is that you can interpret a list of data values as a vector, but data can also be arranged in a two-dimensional format as rows and columns -- that is, as a matrix. One common example of this is a digitized image. A monochrome digital image, such as a picture on a computer screen, can be seen as a two-dimensional array of pixel values, where each value in the matrix indicates the gray level intensity of the corresponding pixel. A color image can be seen as a combination of three such matrices, one each for the intensities of the red, blue, and green components of each pixel.

Let {h_1, h_2, ..., h_8} be the Haar basis for R^8. Then each product h_i h_j^T will give an 8 x 8 matrix. There will be 64 such matrices and, although we will not prove this, these matrices will in fact be a basis for the set of all possible 8 x 8 matrices. In other words, any 8 x 8 image is just some linear combination of these 64 basis images. Figure 1.6 is a picture of what these basis images look like.

Figure 1.6: A 2 dimensional Haar basis.
So if you look, for example, at the image in row 3 column 2, it is generated by

 h_3 h_2^T = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}

and the black, white, and gray regions of the image correspond to entries of -1, 0, and 1 respectively.

The third image in the bottom row would be h_8 h_3^T. This would be

 \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \\ -1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}

Figure 1.6 consists of 64 8 x 8 pictures. Each picture by itself is not very interesting, but as mentioned before, any 8 x 8 image can be obtained by some linear combination of these.

How can you find the coordinates of a two-dimensional array relative to a basis? Suppose A is a square matrix and we want to convert it to the two-dimensional Haar basis as described above. Let P be the change of basis matrix from the standard basis to the Haar basis. To convert matrix A we first convert the columns of A to the Haar basis. This is done by computing the product PA. We then convert the rows of the resulting matrix to the Haar basis. We can do this by converting the rows to columns by taking the transpose and then multiplying by P. This gives P(PA)^T = PA^T P^T. We then take the transpose again to put rows and columns back into their original positions. This gives (PA^T P^T)^T = PAP^T.

Example 1.3.3

To illustrate the above, let

 A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}

and

 H = [h_1 h_2 h_3 h_4] = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix}

Let P = H^{-1}. Then to convert to the two-dimensional Haar basis we compute

 PAP^T = \begin{bmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & -1/2 \\ 0 & 0 & -1/2 & 1/2 \end{bmatrix}

The entries in this matrix are the weights given to h_i h_j^T to construct matrix A. In this case most of those weights are 0. If we just look at the non-zero weights we have

 A = \frac{1}{2}\left(h_1 h_1^T + h_3 h_3^T + h_4 h_4^T\right) - \frac{1}{2}\left(h_3 h_4^T + h_4 h_3^T\right)

The first three terms correspond to the diagonal entries of PAP^T; the last two correspond to the off-diagonal entries. If we compute this linear combination by first evaluating the diagonal and off-diagonal components separately we get

 \frac{1}{2}\left(h_1 h_1^T + h_3 h_3^T + h_4 h_4^T\right) - \frac{1}{2}\left(h_3 h_4^T + h_4 h_3^T\right)
 = \begin{bmatrix} 1 & 0 & 1/2 & 1/2 \\ 0 & 1 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1 & 0 \\ 1/2 & 1/2 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 & -1/2 & 1/2 \\ 0 & 0 & 1/2 & -1/2 \\ -1/2 & 1/2 & 0 & 0 \\ 1/2 & -1/2 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}
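The computation in Example 1.3.3 can be reproduced in Maple (our own check, with H defined column by column as above):

>H:=<<1,1,1,1>|<1,1,-1,-1>|<1,-1,0,0>|<0,0,1,-1>>:
>A:=Matrix([[1,0,0,1],[0,1,1,0],[0,1,1,0],[1,0,0,1]]):
>P:=H^(-1):
>P.A.Transpose(P);      # the weights of A relative to the 2-D Haar basis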
Exercises

1. Let H be the Haar basis for R^4. Let v = (1, 0, 0, 1)^T. Find [v]_H.

2. Let B = {e_1, e_1 + e_2, e_1 + e_2 + e_3, e_1 + e_2 + e_3 + e_4}. Let H be the Haar basis for R^4.
(a) What is P_{H←B}?
(b) What is P_{B←H}?

3. Let f_1(x) = 1^x, f_2(x) = 2^x, f_3(x) = 3^x, f_4(x) = 4^x. If these functions are sampled at x = 0, 1, 2, 3 do the resulting vectors form a basis of R^4?

4. Let f_1(x) = cos x, f_2(x) = cos 2x, f_3(x) = cos 3x, f_4(x) = cos 4x. If these functions are sampled at x = 0, π/4, π/2, 3π/4 do the resulting vectors form a basis of R^4?

5. Show that if you sample a line in R^3 at three different points the resulting vectors will not be a basis for R^3.

6. Let E = {e_1, e_2} be the standard basis of R^2. Write down the basis for the vector space of all 2 × 2 matrices generated by products of the form e_i e_j^T. Write the matrix $\begin{bmatrix}a&b\\c&d\end{bmatrix}$ as a linear combination of these basis vectors.

7. Let $B = \begin{bmatrix}b_1&b_2\end{bmatrix} = \begin{bmatrix}1&1\\1&-1\end{bmatrix}$. Write down the basis for the vector space of all 2 × 2 matrices generated by products of the form b_i b_j^T. Let $A = \begin{bmatrix}4&0\\2&1\end{bmatrix}$ and let P = B^{-1}. Show that PAP^T gives the weights required to write A as a linear combination of the basis you found.

8. Repeat the previous problem for $B = \begin{bmatrix}1&0\\2&1\end{bmatrix}$ and $A = \begin{bmatrix}2&4\\4&2\end{bmatrix}$.

Using MAPLE

Example 1. In this example we will begin by using Maple to generate a basis for R^5 by sampling the functions 1, x, x^2, x^3, x^4 at the values x = .2, .4, .6, .8, 1.0. This is precisely what happens with the Vandermonde matrix that we saw in the previous chapter. So we can proceed as follows:

>with(LinearAlgebra):
>xv:=<.2,.4,.6,.8,1.0>;
>B:=VandermondeMatrix(xv);
>Determinant(B);
                             .00002949

If you look at matrix B the connection between each of the 5 columns of B and each of the 5 functions being sampled should be clear.

Now we will define two new vectors, u and w, by sampling f(x) = x^5 + 1 and g(x) = √x + √(1 − x) at the same values. We now have two vectors in R^5 and a basis, B, for R^5. We will find the coordinates of these vectors relative to basis B. This just involves multiplying the vectors by the change of basis matrix P_{B←E} = B^{-1}.

The following commands illustrate one method for sampling a function over a given set of values. First we define the functions to be sampled using the "arrow" notation. Next we need a list of values where the functions will be sampled. We already have these values in xv. Finally we can use the map command in Maple to evaluate the functions at the given values.

>f:=x -> x^5+1;
>g:=x -> sqrt(x)+sqrt(1-x);
>u:=map(f, xv);
>w:=map(g,xv);
>ub:=B^(-1).u;
               [1.0384, -.4384, 1.800, -3.400, 3.000]
>wb:=B^(-1).w;
               [1.0000, 3.0137, -8.5037, 10.9801, -5.4901]

This tells us that [u]_B = (1.0384, −.4384, 1.8000, −3.4000, 3.0000)^T and [w]_B = (1.0000, 3.0137, −8.5037, 10.9801, −5.4901)^T.

Now our basis vectors, v_1, ..., v_5, in this example can be seen as discrete approximations of the functions 1, x, x^2, x^3, x^4, and u was a discrete approximation of x^5 + 1. Notice:

• The discrete vector u is a linear combination of the discrete basis v_1, ..., v_5.
• The continuous function x^5 + 1 is not a linear combination of the continuous functions 1, x, x^2, x^3, x^4.

We will combine 1, x, x^2, x^3, x^4 using the weights of [u]_B and [w]_B, and then plot the results. There is another way of looking at what we are doing that might make things clearer. We started with two functions f(x) and g(x). On each of these functions we chose five points. We then found the interpolating polynomials of degree four passing through these points. In the following Maple commands we will call these interpolating polynomials f1 and g1.
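Because f1 and g1 are just the degree-four interpolating polynomials through the sampled points, they can be cross-checked against Maple's interp command. A small sketch, assuming xv, u, and w are as defined above:

>xl:=convert(xv,list):
>expand(interp(xl,convert(u,list),x));    # should match the polynomial f1 below
>expand(interp(xl,convert(w,list),x));    # should match the polynomial g1 below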
There is one aspect of the following commands that sometimes causes confusion: we defined f and g as functions in Maple, but f1 and g1 are being defined as expressions. The plot command in Maple can be used with either functions or expressions, with a slight difference in the syntax. We have opted for using expressions with the plot command. When we enter f(x) and g(x) as inputs to the plot commands below we obtain the expressions x^5 + 1 and √x + √(1 − x).

>f1:=add(ub[i]*x^(i-1),i=1..5);
>g1:=add(wb[i]*x^(i-1),i=1..5);
>plot([f(x), f1],x=-.2..1.2,color=black,thickness=[1,2]);
>plot([g(x), g1],x=0..1, color=black, thickness=[1,2]);

This gives the plots shown in Figure 1.8 and Figure 1.9.

Figure 1.8: Plot of f(x) and f1.    Figure 1.9: Plot of g(x) and g1.

We see that if we apply the weights obtained from the discrete case to the continuous case we get a linear combination of 1, x, x^2, x^3, x^4 that gives a fairly good approximation to x^5 + 1 on the interval [0, 1]. The plot seems to imply the functions are very close except near x = 0, where we can see the plots begin to diverge. The following plots give a better idea of how close these two functions are. We will plot the difference between the two functions:

>plot(f(x)-f1,x=0..1,color=black);
>plot(g(x)-g1,x=0..1,color=black);

The resulting plots are shown in Figure 1.10 and Figure 1.11.

Figure 1.10: Plot of f(x) − f1.    Figure 1.11: Plot of g(x) − g1.

These plots show the vertical distance between the functions. We see that the linear combination we computed oscillates around x^5 + 1, with the largest distance (as was clear from the previous plot) occurring at x = 0 but with a comparatively small distance over most of the interval. The points where the two functions agree (where the distance is 0) are precisely the points where we sampled the functions: x = 0.2, 0.4, 0.6, 0.8, 1.0.

Now let's create another basis for R^5. This time we will sample the same functions, but at the points .4, .6, .8, 1, 1.2. Here we will use an initializer function in the Vector constructor to define our vector of x values.

>xv2:=Vector(5, i->.2+.2*i);
>C:=VandermondeMatrix(xv2);

Let's call our new basis C. What is the change of basis matrix from B to C? From results in this chapter it should be clear that this would be the matrix C^{-1}B. In Maple:

>C^(-1).B;

This gives (after rounding) the change of basis matrix

$$\begin{bmatrix}
1 & -.2000 & .0400 & -.0080 & .0016\\
0 & 1 & -.4000 & .1200 & -.0320\\
0 & 0 & 1 & -.6000 & .2400\\
0 & 0 & 0 & 1 & -.8000\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

Hard Question: What is the connection between the entries in this change of basis matrix and the binomial theorem?

Example 2. In this example we will generate a basis for R^6 by sampling the functions cos kt for k = 0, 1, ..., 5 at the points 0, π/5, 2π/5, 3π/5, 4π/5, π. We will illustrate a slightly different approach from the previous example. The command unapply that is used in the first line converts Maple expressions to Maple functions. So, for example, the command unapply(cos(2*t), t) would return the function t -> cos(2*t).

>for i from 1 to 6 do f[i]:=unapply(cos((i-1)*t),t) od;
>tvals:=Vector( 6, i-> evalf((i-1)*Pi/5));
>A:=Matrix(6,6, (i,j)->f[j](tvals[i]));
>Determinant(A);
                               -62.5

The last line confirms that the sampled vectors form a basis for R^6. Now we will sample the function

$$g(t) = \frac{t^2}{6} + \frac{1}{t+1}$$

at the same points and generate another vector in R^6. Then we will express this vector as a linear combination of the basis vectors.
>g:=t->t^2/6+1/(t+1);
>u:=map(g,tvals);
>x:=A^(-1).u;    # calculate the weights

Finally, we will combine the continuous functions that we used to generate our basis using the weights that we found, and plot the result along with g(t).

>p:=add(x[i]*f[i](t),i=1..6);
>data:=[seq( [tvals[i],u[i]],i=1..6)];    # the sampled points of g(t)
>p1:=plot([p,g(t)],t=0..Pi):
>p2:=plot(data,style=point,symbol=box):
>plots[display]([p1,p2]);

This plots p and g(t) together on [0, π] along with the six sampled points. We again see that by using the weights we found we get a linear combination that agrees exactly at the sampled points (where we computed our discrete basis).

Example 3. In this example we will illustrate the idea of a two-dimensional basis. We will first generate a 20 × 20 matrix and display it as an image.

>with(plots):
>A:=Matrix(20,20,(i,j)-> if abs(i-10)+abs(j-10)<8 then 1 else 0 fi);
>matrixplot(A,orientation=[0,0],heights=histogram,style=patchnogrid, shading=zgrayscale,scaling=constrained);

This defines a 20 × 20 matrix all of whose entries are 0 or 1. If we interpret the entries in the matrix as a grayscale color where 0 is black and 1 is white, then the matrix gives the image shown in Figure 1.12.

Figure 1.12: Matrix A as an image.

We will now convert the image to a different basis. Any invertible matrix corresponds to a change of basis matrix. We will define a band matrix B.

>B:=BandMatrix([.05,.1,.2,.5,.2,.1,.05],3,20):
>P:=B^(-1):
>A1:=P.A.P^%T:
>A2:=B.A.B^%T:
>matrixplot(A1,orientation=[0,0],heights=histogram,style=patchnogrid, shading=zgrayscale,scaling=constrained);
>matrixplot(A2,orientation=[0,0],heights=histogram,style=patchnogrid, shading=zgrayscale,scaling=constrained);

This gives two other 20 × 20 matrices which correspond to the images in Figures 1.13 and 1.14.

Figure 1.13: Matrix A in a new basis.    Figure 1.14: Matrix A in a new basis.

Chapter 2

Eigenvalues and Eigenvectors

What happens when you multiply a vector by a square matrix? One simple answer to this question is that you get another vector of the same size. Typically both the length and the direction of the matrix-vector product will be different from those of the original vector, but in some cases it is possible that the direction will stay the same.¹ If the vectors do have the same direction it means that multiplying the original vector by the matrix is equivalent to multiplying that vector by a scalar. At first the fact that this can occur might seem to be nothing more than a curiosity or a coincidence, but in fact it will lead to some of the most important applications of linear algebra. This chapter and the next will look at the significance of this idea.

2.1 Eigenvectors and Eigenvalues

We start with the main definition of the chapter.

Definition 1 Let A be an n × n matrix. If v is a non-zero vector such that Av = λv for some scalar λ, then v is called an eigenvector of A and λ is called an eigenvalue of A.

For example, let

$$A = \begin{bmatrix}1&-1\\2&4\end{bmatrix},\qquad u = \begin{bmatrix}2\\1\end{bmatrix},\qquad v = \begin{bmatrix}1\\-2\end{bmatrix}$$

Then we have

$$Au = \begin{bmatrix}1\\8\end{bmatrix} \qquad\text{and}\qquad Av = \begin{bmatrix}3\\-6\end{bmatrix}$$

Notice in Figure 2.1 that Au and u are not parallel: Au is not a scalar multiple of u. On the other hand Av is parallel to v: Av is a scalar multiple of v. In fact, Av = 3v, so v is an eigenvector of A with corresponding eigenvalue 3. So multiplying v by A has the effect of scaling v by a factor of 3.

¹In this book when we say two non-zero vectors have the same direction we will mean that one vector is a scalar multiple of the other, including the possibility that this scalar is negative or zero.
In other words, both vectors lie on the same line through the origin. If one vector is a negative multiple of the other they are said to have the same direction but a different sense.

Figure 2.1: Plot of u, v, Au, and Av.

Theorem 2.1 Let A be an n × n matrix. Then A is invertible if and only if the number 0 is not an eigenvalue of A.

Proof. Suppose 0 is an eigenvalue of A. Then Av = 0v = 0 for some non-zero vector v. This means that the vector v is a non-trivial solution of the system Ax = 0. Since this system has a non-trivial solution it follows that A is not invertible. On the other hand, if A is not invertible then Ax = 0 has non-trivial solutions, and so there is a non-zero vector v such that Av = 0 = 0v. Therefore 0 is an eigenvalue of A with v a corresponding eigenvector.

Theorem 2.2 If v_1, v_2, ..., v_p are eigenvectors of matrix A that correspond to distinct eigenvalues λ_1, λ_2, ..., λ_p, then the set {v_1, v_2, ..., v_p} is linearly independent.

Proof. The set {v_1, v_2, ..., v_p} is either linearly dependent or linearly independent. We will begin by assuming this set is linearly dependent and show that this assumption leads to a contradiction. We can then conclude that the set must be linearly independent.

If {v_1, v_2, ..., v_p} is linearly dependent then there is some number k < p such that {v_1, v_2, ..., v_k} is linearly independent² and

$$v_{k+1} = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k \qquad (\text{Equation 1})$$

for some scalars c_1, c_2, ..., c_k. Multiplying both sides of Equation 1 by A we have

$$Av_{k+1} = c_1 Av_1 + c_2 Av_2 + \cdots + c_k Av_k \qquad (\text{Equation 2})$$

Since the v_i are eigenvectors of A we have

$$\lambda_{k+1} v_{k+1} = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_k \lambda_k v_k \qquad (\text{Equation 3})$$

Multiplying Equation 1 by λ_{k+1} we get

$$\lambda_{k+1} v_{k+1} = c_1 \lambda_{k+1} v_1 + c_2 \lambda_{k+1} v_2 + \cdots + c_k \lambda_{k+1} v_k \qquad (\text{Equation 4})$$

Subtracting Equation 4 from Equation 3 we get

$$0 = c_1(\lambda_1 - \lambda_{k+1})v_1 + c_2(\lambda_2 - \lambda_{k+1})v_2 + \cdots + c_k(\lambda_k - \lambda_{k+1})v_k$$

Because {v_1, v_2, ..., v_k} is linearly independent the weights in this last equation must all be 0. But λ_i − λ_{k+1} ≠ 0 for i = 1, ..., k since the eigenvalues are distinct. Therefore c_i = 0 for i = 1, ..., k. But this implies that v_{k+1} = 0, which is impossible since this vector is an eigenvector of A. We can then conclude that {v_1, v_2, ..., v_p} is linearly independent.

²This is basically saying that if the set of vectors is linearly dependent then one of the vectors is a linear combination of the others. In fact, it must be some linear combination of a linearly independent subset of the other vectors. So just suppose that the set of vectors is ordered so that these independent vectors come first.

Computing Eigenvalues and Eigenvectors

Given a square matrix A, an eigenvector is by definition a non-zero vector v that satisfies the equation Av = λv for some scalar λ. This equation can be rewritten as Av − λv = 0, and then by factoring we get

$$(A - \lambda I)v = 0$$

It then follows that the eigenvector v is a non-trivial solution of the system (A − λI)x = 0, since an eigenvector is a non-zero vector. This argument also works in reverse. That is, any non-trivial solution of (A − λI)x = 0 is an eigenvector of A corresponding to eigenvalue λ. So we have the important result that the eigenvectors of A are exactly the non-trivial solutions of (A − λI)x = 0. But this system has non-trivial solutions if and only if A − λI is not invertible, and this happens if and only if det(A − λI) = 0.
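This chain of reasoning can be checked on the matrix A from the opening example of this chapter. A minimal Maple sketch (assuming the LinearAlgebra package has been loaded); the characteristic determinant works out to λ² − 5λ + 6, giving the eigenvalue 3 found earlier together with a second eigenvalue, 2:

>A:=<<1,2>|<-1,4>>:                         # the matrix from the opening example
>p:=Determinant(A-lambda*IdentityMatrix(2));
                        p := λ² − 5λ + 6
>solve(p=0,lambda);                         # returns the eigenvalues 2 and 3
>NullSpace(A-3*IdentityMatrix(2));          # a basis for the eigenvectors with lambda = 3

The last command returns a basis for the null space of A − 3I; any non-zero multiple of the basis vector (the vector v = (1, −2) from the example is one such multiple) is an eigenvector for λ = 3.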
This is the idea that we will use to find the eigenvalues of a matrix: λ is an eigenvalue of matrix A if and only if det(A − λI) = 0.

Definition 2 Given an n × n matrix A, the polynomial det(A − λI) is called the characteristic polynomial of A. The equation det(A − λI) = 0 is called the characteristic equation of A.

One of the implications of the above discussion is that if A is an n × n matrix then the characteristic polynomial of A has degree n. A must therefore have n eigenvalues (counting multiplicity), though these eigenvalues don't have to be distinct.³

³The roots of a polynomial can be found by factoring. If one factor is repeated k times the corresponding root is said to have multiplicity k. For example, the sixth degree polynomial p(t) = (t − 4)(t − 5)²(t − 7)³ has six roots. The six roots are not distinct. The roots are 4, 5 (multiplicity 2), and 7 (multiplicity 3).

If λ is an eigenvalue of matrix A, then the corresponding eigenvectors are the non-trivial solutions to (A − λI)x = 0. But the solutions to this equation constitute the null space of A − λI, so the eigenvectors corresponding to a specific eigenvalue (along with the zero vector) must form a subspace of R^n. This leads to the following definition.

Definition 3 Let A be an n × n matrix and let λ be an eigenvalue of A. The set of eigenvectors corresponding to λ along with the zero vector is a subspace of R^n. This subspace is called an eigenspace. We will use the notation E_λ to refer to the eigenspace corresponding to eigenvalue λ.

Note that the zero vector itself is not an eigenvector for any eigenvalue, but the zero vector is in the eigenspace of every eigenvalue.

Example 2.1.1
Suppose we want to find the eigenvalues and eigenvectors of $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$. The first step is to compute the characteristic polynomial det(A − λI).

$$|A - \lambda I| = \begin{vmatrix}2-\lambda & 1\\ 1 & 2-\lambda\end{vmatrix} = \lambda^2 - 4\lambda + 3 = (\lambda-3)(\lambda-1)$$

This gives eigenvalues of 3 and 1. For λ = 3 we find a basis for the eigenspace by solving (A − 3I)x = 0 as follows:

$$\begin{bmatrix}-1&1&0\\1&-1&0\end{bmatrix} \sim \begin{bmatrix}1&-1&0\\0&0&0\end{bmatrix}$$

The solution here is $x = t\begin{bmatrix}1\\1\end{bmatrix}$. So E_3 is a one dimensional subspace of R^2. Any non-zero vector from this subspace is an eigenvector with an eigenvalue of 3. If, for example, we take $u = \begin{bmatrix}4\\4\end{bmatrix}$ we get

$$Au = \begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}4\\4\end{bmatrix} = \begin{bmatrix}8+4\\4+8\end{bmatrix} = \begin{bmatrix}12\\12\end{bmatrix} = 3\begin{bmatrix}4\\4\end{bmatrix}$$

Similarly for λ = 1 we solve (A − I)x = 0:

$$\begin{bmatrix}1&1&0\\1&1&0\end{bmatrix} \sim \begin{bmatrix}1&1&0\\0&0&0\end{bmatrix}$$

Here we have the solution $x = t\begin{bmatrix}1\\-1\end{bmatrix}$. So E_1 is a line through the origin in R^2. Every point on this line except the origin is an eigenvector with λ = 1. So, for example, if we take $v = \begin{bmatrix}5\\-5\end{bmatrix}$ we get

$$Av = \begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}5\\-5\end{bmatrix} = \begin{bmatrix}10-5\\5-10\end{bmatrix} = \begin{bmatrix}5\\-5\end{bmatrix}$$

Now let $w = 2\begin{bmatrix}1\\1\end{bmatrix} + 1\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}3\\1\end{bmatrix}$. The vector w is a linear combination of eigenvectors but is not an eigenvector itself, since

$$Aw = \begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}3\\1\end{bmatrix} = \begin{bmatrix}7\\5\end{bmatrix}$$

and this result is not a scalar multiple of w. Figure 2.2 gives an indication of what happens when w is multiplied by A.

Figure 2.2: Vector w is multiplied by A. The component of w in E_3 is scaled by 3; the component in E_1 is scaled by 1.

w has components in each eigenspace which are stretched by different amounts when multiplied by A.
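Example 2.1.1 can be verified directly in Maple. A two-line sketch (the Eigenvectors command is discussed more fully in the Maple section at the end of this chapter):

>A:=<<2,1>|<1,2>>:
>Eigenvectors(A);     # returns the eigenvalues 3 and 1 with basis eigenvectors

The command returns the eigenvalues in a vector together with a matrix whose columns are corresponding basis eigenvectors — scalar multiples of the vectors (1, 1) and (1, −1) found above.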
REMARK: There is a simple method for finding eigenvectors of a 2 × 2 matrix. Suppose you have a 2 × 2 matrix and you want to find the eigenvectors – that is, you want to find a basis for the eigenspaces. You first subtract the eigenvalue down the main diagonal. At this point you would have a matrix of the form $\begin{bmatrix}a&b\\c&d\end{bmatrix}$. When you reduce this matrix you have to get a row of zeroes. In other words, you can make one of the rows equal to 0 right away; it usually won't matter which row. So at this point you would have a matrix of the form $\begin{bmatrix}a&b\\0&0\end{bmatrix}$. Now the eigenspace is just the null space of this matrix, but the null space consists of vectors orthogonal to the rows of the matrix. To find a vector orthogonal to $\begin{bmatrix}a\\b\end{bmatrix}$ you just have to switch the entries and change the sign of one of them. Doing this you get $\begin{bmatrix}-b\\a\end{bmatrix}$ as a basis of the eigenspace.

For example, the matrix $\begin{bmatrix}5&3\\4&9\end{bmatrix}$ has an eigenvalue of λ = 3. What is a basis of the corresponding eigenspace? Subtracting 3 down the diagonal (actually just from the 5 in the first row) and making one of the rows a row of zeroes we get $\begin{bmatrix}2&3\\0&0\end{bmatrix}$. A basis for the eigenspace will be any vector orthogonal to the first row. So a basis for this eigenspace would be $\begin{bmatrix}3\\-2\end{bmatrix}$.

Remember, this method is only valid for 2 × 2 matrices.

Example 2.1.2
This next example will involve a 3 × 3 matrix. Let

$$A = \begin{bmatrix}0&1&0\\8&0&8\\0&1&0\end{bmatrix}$$

The characteristic polynomial of A would be

$$\begin{vmatrix}-\lambda&1&0\\8&-\lambda&8\\0&1&-\lambda\end{vmatrix} = 16\lambda - \lambda^3 = -\lambda(\lambda - 4)(\lambda + 4)$$

We then see that A has 3 eigenvalues: λ_1 = 0, λ_2 = 4, and λ_3 = −4. If you look at the columns of A it is clear that they are not linearly independent, and so A is not invertible. For this reason we should have expected 0 to be an eigenvalue.

For λ_1 = 0 we set up and reduce the following augmented matrix:

$$\begin{bmatrix}0&1&0&0\\8&0&8&0\\0&1&0&0\end{bmatrix} \sim \begin{bmatrix}1&0&1&0\\0&1&0&0\\0&0&0&0\end{bmatrix}$$

It is then easy to obtain $\begin{bmatrix}1\\0\\-1\end{bmatrix}$ as a basis for E_0.

For λ_2 = 4 the same procedure gives

$$\begin{bmatrix}-4&1&0&0\\8&-4&8&0\\0&1&-4&0\end{bmatrix} \sim \begin{bmatrix}1&0&-1&0\\0&1&-4&0\\0&0&0&0\end{bmatrix}$$

This gives a basis of $\begin{bmatrix}1\\4\\1\end{bmatrix}$ for E_4.

We'll skip the details, but a similar procedure gives a basis of $\begin{bmatrix}1\\-4\\1\end{bmatrix}$ for E_{-4}.

The preceding examples illustrate a 3-step method for finding the eigenvalues and eigenvectors of a square matrix. The steps are:

• Write down A − λI and calculate its determinant to get the characteristic polynomial.
• Find the roots of the characteristic polynomial. These are the eigenvalues of A.
• For each eigenvalue, set up the augmented matrix of the system (A − λI)x = 0 and solve using elementary row operations. If you haven't made any calculation errors this system must have non-trivial solutions. These non-trivial solutions are exactly the eigenvectors for the eigenvalue used.

For large matrices there are more sophisticated numerical procedures that are used to find eigenvalues and eigenvectors, but for small matrices the above steps will often work easily. In some cases there are also shortcuts that can be used. For example, look at the following theorem.

Theorem 2.3 The eigenvalues of a triangular matrix are the entries on the diagonal.

This theorem is a simple consequence of the fact that the determinant of a triangular matrix is the product of the entries on the diagonal.

Example 2.1.3
Find the eigenvalues and a basis for each eigenspace of

$$A = \begin{bmatrix}3&1&1\\0&3&1\\0&0&1\end{bmatrix}$$

In this case we get

$$\det(A - \lambda I) = \begin{vmatrix}3-\lambda&1&1\\0&3-\lambda&1\\0&0&1-\lambda\end{vmatrix} = (3-\lambda)^2(1-\lambda)$$

So this matrix has two eigenvalues: λ = 3 is an eigenvalue of multiplicity 2, and λ = 1 is an eigenvalue of multiplicity 1. To find a basis for E_3 we have the following:

$$\begin{bmatrix}0&1&1&0\\0&0&1&0\\0&0&-2&0\end{bmatrix} \sim \begin{bmatrix}0&1&0&0\\0&0&1&0\\0&0&0&0\end{bmatrix}$$

This gives a basis of $\begin{bmatrix}1\\0\\0\end{bmatrix}$, so E_3 is in fact the x_1 axis.

For λ = 1 we get

$$\begin{bmatrix}2&1&1&0\\0&2&1&0\\0&0&0&0\end{bmatrix} \sim \begin{bmatrix}1&0&1/4&0\\0&1&1/2&0\\0&0&0&0\end{bmatrix}$$

This gives a basis of $\begin{bmatrix}-1/4\\-1/2\\1\end{bmatrix}$ for E_1, or if you prefer, this vector can be scaled by −4, resulting in the basis vector $\begin{bmatrix}1\\2\\-4\end{bmatrix}$.

We end this section with one more definition.
Deﬁnition 4 An eigenbasis of Rn is a basis composed entirely out of eigenvectors for some n × n matrix A. 46 2. Eigenvalues and Eigenvectors Exercises 1. Is λ = 2 an eigenvalue of the following matrices? 4 1 3 3 0 2 (a) (b) (c) 2 3 2 8 1 0 2. Is λ = 4 an eigenvalue of the following matrices? 2 0 1 1 1 1 4 4 4 (a) 4 4 −2 (b) 3 3 −1 (c) 4 4 4 −2 0 5 0 1 2 4 4 4 1 3. Is −1 an eigenvector of the following matrices? 1 2 3 1 0 −1 2 0 0 0 (a) 1 3 2 (b) 2 2 −3 (c) 0 0 0 3 4 1 4 1 0 0 0 0 1 a b c 4. Show that 1 is an eigenvector of c a b . What is the corresponding eigenvalue? 1 b c a −3 1 −3 5. The matrix 20 3 10 has eigenvalues 3 and -2. Find a basis for E3 and E−2 . 2 −2 4 4 4 6. Let A = . For what value(s) of k (if any) is k 1 (a) λ = 2 an eigenvalue of A? 2 (b) is an eigenvector of A? 3 (c) is A not diagonalizable? (d) is A not invertible? 7. Find the eigenvalues of the following matrices. Find a basis of each eigenspace. 0 1 1 3 (a) (b) 2 1 −1 −3 2 3 7 −2 (c) (d) 1 4 2 3 8. Find the eigenvalues of the following matrices. Find a basis for each eigenspace. 2.1. Eigenvectors and Eigenvalues 47 0 0 1 1 0 0 (a) 0 5 0 (b) 0 5 0 −2 0 3 3 0 −2 0 1 1 1 1 0 (c) 1 0 1 (d) 1 0 1 1 1 0 0 1 1 9. Find the eigenvalues of the following matrices. Find a basis for each eigenspace. 2 1 1 1 1 1 (a) 0 2 1 (b) 1 1 1 0 0 1 1 1 1 2 0 0 0 0 2 (c) 0 2 0 (d) 0 2 0 0 0 2 2 0 0 10. Find a basis for each eigenspace of 2 0 0 0 2 1 0 0 2 1 2 0 2 1 1 1 11. Find a basis for each eigenspace of 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 12. Construct speciﬁc numerical 2 × 2 matrices, A and B, to illustrate the following statements: (a) If λ1 is an eigenvalue of A and λ2 is an eigenvalue of B, then you cannot conclude that λ1 + λ2 is an eigenvalue of A + B. (b) If λ1 is an eigenvalue of A and λ2 is an eigenvalue of B, then you cannot conclude that λ1 λ2 is an eigenvalue of AB. 0 0 1 13. Let A = 0 1 0. Notice that when you multiply a vector in R3 by A the order of the entries 1 0 0 gets reversed. What do eigenvectors of A look like? What happens to these eigenvectors when you reverse the order of the entries? a b 14. Let A = . Show that the characteristic polynomial is λ2 − tr(A)λ + det A. Here tr(A) c d stands for the trace of A, that is the sum of the entries on the main diagonal of A. This gives a simple procedure for ﬁnding the characteristic polynomial of a 2 × 2 matrix. 15. Show that the trace of a matrix is the sum of the eigenvalues. This means that similar matrices have the same same trace. 48 2. Eigenvalues and Eigenvectors 16. Let A be a square matrix. Show that A and AT have the same eigenvalues. (Hint: show that they have the same characteristic polynomial.) 17. Give an example to show that A and AT do not necessarily have the same eigenspaces. 18. Let A and B be n×n matrices. If B is invertible, show that AB and BA have the same eigenvalues by showing that they have the same characteristic equation. 19. Suppose that A is an m × n matrix and B is an n × m matrix. (a) Show that AB and BA have the same non-zero eigenvalues. (b) Show that if m > n then λ = 0 must be an eigenvalue of AB. (c) What are the eigenvalues of the matrices vT v and vvT where v is in Rn . 0 1 1 1 0 (d) Let A = and B = 1 0. What are the eigenvalues of AB and BA? 0 0 2 0 1 20. Let A be an invertible matrix. Show that if v is an eigenvector of A with corresponding eigenvalue λ, then v is also an eigenvector of A−1 with corresponding eigenvalue 1/λ. 21. Let A be a square matrix. 
Show that if v is an eigenvector of A with corresponding eigenvalue λ, then v is also an eigenvector of A2 with corresponding eigenvalue λ2 . 22. Show that if v is an eigenvector of both A and B, then v is an eigenvector of AB. 23. A matrix, P , is called idempotent if P 2 = P . Show that if a matrix P is idempotent then the only possible eigenvalues of P are 0 and 1. 24. The Cayley-Hamilton Theorem says that any square matrix satisﬁes it’s own characteristic equation (interpreted as a matrix equation). Verify the Cayley-Hamilton Theorem for the following speciﬁc matrices: 1 1 1 2 (a) (b) 1 1 3 4 0 1 0 0 0 2 0 0 0 1 0 (c) 0 0 .5 (d) 0 0 0 1 1 0 0 0 0 0 0 2.1. Eigenvectors and Eigenvalues 49 Using MAPLE Example 1 The LinearAlgebra package in Maple contains several procedures related to eigenvalues and eigen- 1 2 vectors which we will illustrate in this section. We begin by deﬁning a matrix A = . The 3 15 command CharacteristicPolynomial computes the characteristic polynomial of a matrix. This com- mand requires two inputs: the matrix and the variable that will be used in the characteristic polynomial. As illustrated below if you want to use the variable λ for the characteristic polynomial you must spell it out. You can use any variable you wish, so in the third example below we use t for the variable in the characteristic polynomial. If we set the characteristic polynomial equal to 0 and solve we will get the eigenvalues. >A:=<<1,3>|<2,15>>; >cp:=CharacteristicPolynomial(A,lambda); cp := λ2 − 5λ − 2 >ev:=solve(cp=0,lambda); ## This returns the eigenvalues of A 5 1√ 5 1√ + 33, − 33 2 2 2 2 >cp2:=CharacteristicPolynomial(A^(-1),lambda); ## The charpoly of the inverse of A 5 1 cp := λ2 + λ − 2 2 >solve(cp2=0,lambda); ## the eigenvalues of the inverse of A 5 1√ 5 1√ − + 33, − − 33 4 4 4 4 >rationalize(1/ev[1]); ## the reciprocals of the eigenvalues of A are the eigenvalues of the inverse of A 5 1√ − + 33 4 4 >cp3:=CharacteristicPolynomial(A^%T,t); >solve(cp3=0,t); ## A and the transpose of A have the same eigenvalues 5 1√ 5 1√ + 33, − 33 2 2 2 2 There are more direct ways of ﬁnding the eigenvalues: >Eigenvalues(A); >Eigenvectors(A); 50 2. Eigenvalues and Eigenvectors The ﬁrst command is pretty straightforward. It returns a vector whose entries are the eigenvalues of A. The second command is more complicated it returns √ √ −1 √ −1 8 + 55 2 7 + 55 2 7 − 55 √ 8 − 55 1 1 This returns a vector containing the eigenvalues and a matrix containing the corresponding eigenvec- tors. In the above example Maple found the exact values for the eigenvalues (as opposed to ﬂoating point decimal approximations. For large matrices the exact values might be extremely complicated expressions, or they might even be impossible for Maple to compute. In these cases it is better to have Maple compute ﬂoating point approximations for the eigenvalues. You can force Maple to do this if you enter the values in the matrix as decimals (actually if only one entry in the matrix is entered as a decimal thatis enough to force Maple to compute the eigenvalues in ﬂoating point format). In the next example we will use the 6 × 6 matrix 2 1 0 0 0 0 1 2 1 0 0 0 0 1 2 1 0 0 A= 0 0 1 2 1 0 0 0 0 1 2 1 0 0 0 0 1 2 >A:=BandMatrix([1, 2.0, 1], 1, 6); >Eigenvaluess(A); .1980623, .7530204, 1.554958, 2.445042, 3.246980, 3.801938 The ﬁrst command deﬁnes a 6 × 6 band matrix. Notice that one of the entries is given as a decimal. In this case Maple returns ﬂoating point approximations for the eigenvalues. 
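If every entry is exact, Maple instead returns the eigenvalues in exact form, typically as RootOf expressions or complicated radicals. A small sketch of the alternative route (an assumption of this illustration is that you want decimals at the end but exact arithmetic along the way):

>A:=BandMatrix([1,2,1],1,6):     # all entries exact this time
>evalf(Eigenvalues(A));          # decimal approximations of the exact eigenvalues

This should reproduce the six decimal values shown above; the difference is that Maple first computes the eigenvalues exactly (which can be slow or even impossible for larger matrices) and only then converts them.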
Try changing the 2.0 in the deﬁnition of A to the exact value of 2 and see what Maple returns. This should convince you that ﬂoating point notation can sometimes be clearer than the exact values. Finally,in the exercises for the previous section we mentioned the Cayley-Hamilton Theorem. We will illustrate this theorem using Maple . >A:=<<1,2,3>|<2,1,2>|<3,2,1>>; >cp:=charpoly(A,t); >subs(t=A,cp); >simplify(%); 0 The ﬁrst line deﬁnes a matrix in Maple . The second line computes the characteristic polynomial of that matrix. The third line substitutes the matrix itself into the characteristic polynomial. This returns the expression A3 − 3A2 − 14A − 8 but doesn’t evaluate it. (Note: Maple is smart enough to understand that the last term is now interpreted as −8I.) Finally, the fourth line evaluates the substitution and returns the zero matrix. If you deﬁne A to be any square matrix this last result will always be the zero matrix. That is the Cayley-Hamilton Theorem: any matrix satisﬁes it’s own characteristic polynomial. One consequence of this is that any positive integer power of an n×n matrix can always be expressed in terms of a linear combination of that matrix raised to non-negative integer powers lower than n (including I). For example in the above case we can write: A4 = A(A3 ) = A(3A2 + 14A + 8I) = 3A3 + 14A2 + 8A = 3(3A2 + 14A + 8I) + 14A2 + 8A = 23A2 + 50A + 24I 2.1. Eigenvectors and Eigenvalues 51 If we wanted to reduce A10 to such a linear combination we could use Maple >A:=’A’: ### we want to use A purely symbolically >algsubs(A^3-3*A^2-14*A-8=0, A^10); This returns our solution: 736255A2 + 1988250A + 1032504I Example 2 1 3 Let A = . We know that when a vector,v, in R2 is multiplied by A then the direction of Av 3 1 will typically be diﬀerent from the direction of v. What is the angle between v and Av? cos(t) In this example we will plot the angle between Av and v for unit vectors v = . We will use sin(t) the VectorAngle command which computes the angle between two vectors. >A:=< <1,3>|<3,1>>: >v:=<cos(t),sin(t)>: >plot( VectorAngle(v, A.v), t=0..2*Pi); 3 2.5 2 angle 1.5 1 0.5 0 1 2 3 4 5 6 t Figure 2.3: The angle between v and Av. This plot is shown in Figure 2.3. What is the signiﬁcance of the maximum and minimum values of this plot? Why are there 4 of them? We also know that the length of a vector will typically change after it is multiplied by a matrix. We will plot the ratio Av v This ratio represents the stretching factor of vectors when multiplied by A. Notice that since we are plot- ting this where v is a unit vector so the denominator will be 1. There is a command in the LinearAlgebra package which will ﬁnd the length of a vector. This command is Norm and must be used with the syntax Norm(v, 2). We will plot this ratio along with the previous plot. 52 2. Eigenvalues and Eigenvectors >A:=<<1,3>|<3,1>>: >v:=<cos(t),sin(t)>: >plot( [VectorAngle(v, A.v), Norm(A.v,2)], t=0..2*Pi); 4 3 2 1 0 1 2 3 4 5 6 t Figure 2.4: The eﬀects of A on length and direction. The result in show in Figure 2.4. Explain the relation between this plot and the output of the following Maple command: >Eigenvectors(A); [-2,4] , [[-1, 1],[1, 1]] We will redo the above with the matrix 2 1 A= 3 4 >A:=<<1,3>|<3,1>>: >v:=<cos(t),sin(t)>: >Eigenvectors{A); [1.00, 5.00], [ [-.707, .707], [-.35, -1.06]] >plot( [VectorAngle(v, A.v), Norm(A.v,2)], t=0..2*Pi); The output is shown in Figure 2.5. Why does this plot have four minima while the previous example had two? 
We have also drawn vertical lines at these minima. What is the signiﬁcance of the intersection of these vertical lines and the upper curve (the norm of Av)? Redo this type of plot for the matrices 2 1 .8 .6 0 2 −.6 .8 2.2. Diagonalization 53 5 4 3 2 1 0 1 2 3 4 5 6 t Figure 2.5: The eﬀects of A on length and direction. 2.2 Diagonalization Deﬁnition 5 Two square matrices, A and B, of the same size are said to be similar if there is an invertible matrix P such that B = P −1 AP . Note that the condition in the above deﬁnition is the same as saying A = P BP −1 ; or if we let Q = P −1 it is equivalent to saying A = Q−1 BQ. The signiﬁcance of similarity between matrices will become clearer as the course goes on, but for now there is one particular consequence of similarity that will be pointed out. Theorem 2.4 If n × n matrices A and B are similar then they have the same characteristic poly- nomial and hence the same eigenvalues. Proof. Suppose B = P −1 AP . Then B − λI = P −1 AP − λP −1 P = P −1 (AP − λP ) = P −1 (A − λI) P It then follows that: det(B − λI) = det P −1 (A − λI) P = det(P −1 ) · det(A − λI) · det(P ) 1 = · det(A − λI) · det(P ) det(P ) = det(A − λI) 54 2. Eigenvalues and Eigenvectors We then see that A and B have the same characteristic polynomial and from this it follows that they have the same eigenvalues (with the same multiplicities). Example 2.2.1 2 1 Let A = . The trace of A is 4 and the determinant is 3, so A has the characteristic 1 2 2 polynomial λ − 4λ + 3. In Example 2.1.1 we saw that A has eigenvalues of 3 and 1 with 1 1 corresponding eigenvectors and . 1 −1 We will now deﬁne a new matrix, B, that will be similar to A. First, we deﬁne an 1 2 arbitrary invertible 2 × 2 matrix, P . We will let P = and then deﬁne 1 1 −1 2 2 1 1 2 3 3 B = P −1 AP = = 1 −1 1 2 1 1 0 1 Then A and B are similar matrices. Since B is triangular it is easy to see that the eigenvalues of B are 3 and 1. These eigenvalues have corresponding eigenvectors of 1 3 and so even though similar matrices have the same eigenvalues they have, in 0 −2 general, diﬀerent eigenspaces. Deﬁnition 6 A square matrix A is said to be diagonalizable if it is similar to a diagonal matrix. That is, if A = P DP −1 for some invertible matrix P and some diagonal matrix D. The matrix P is said to diagonalize A. Theorem 2.5 An n × n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors. Proof. Suppose A has n linearly independent eigenvectors. Let v1 , v2 , . . . , vn be these linearly independent eigenvectors and let λ1 , λ2 , . . . , λn be the corresponding eigenvalues. λ1 0 ··· 0 0 λ2 · · · 0 Let P = [v1 v2 . . . vn ] and D = . . . then . . . . . . 0 0 ··· λn AP = A [v1 v2 . . . vn ] = [Av1 Av2 . . . Avn ] = [λ1 v1 λ2 v2 . . . λn vn ] The last line follows because the vectors vi are eigenvectors of A. We also have: 2.2. Diagonalization 55 λ1 0 ··· 0 0 λ2 · · · 0 PD = [v1 v2 . . . vn ] . . . . . . . . . 0 0 ··· λn = [λ1 v1 λ2 v2 · · · λn vn ] We then have AP = P D and multiplying both sides on the right by P −1 gives A = P DP −1 . (How do we know that P is invertible?) So A is similar to D and is therefore diagonalizable. The proof is half ﬁnished. We still have to show that if A is diagonalizable then A has n linearly independent eigenvectors. Suppose A is diagonalizable then A = P DP −1 for some diagonal matrix D and invertible matrix P . It then follows that AP = P D and using the same notation as above this equation can be written Av1 Av2 ... Avn = λ1 v1 λ2 v2 ... 
λn vn From this equation it follows that the columns of P will be eigenvectors with the eigenvalues the entries on the diagonal of D. The independence of the eigenvectors follows from the invertibility of P. Example 2.2.2 2 2 1 Let A = 1 3 1 . Can this matrix be diagonalized? 1 2 2 Step 1. The characteristic polynomial is det(A − λI) = (5 − λ)(1 − λ)2 . The eigenvalues are therefore 5 and 1 (multiplicity 2). Step 2. Find a basis for each eigenspace. 1 For λ = 5 we get v1 = 1 . 1 1 For λ = 1 we get a two dimensional eigenspace with basis vectors v2 = 0 and −1 2 v3 = −1 . 0 We now have three linearly independent eigenvectors. Step 3. The matrix P that will diagonalize A will be constructed from the eigenvectors. The eigenvectors can be used as the columns of P in any order. The eigenvalues will then turn up as the diagonal entries of D in the corresponding order. So, for example, we can let 1 1 2 P = 1 0 −1 1 −1 0 56 2. Eigenvalues and Eigenvectors Step 4. We now know that A can be diagonalized and if we use the above P we get 5 0 0 D= 0 1 0 0 0 1 There was no computation involved in ﬁnding D but if you want to check your answer you can use the fact that D = P −1 AP so: 1/4 1/2 1/4 2 2 1 1 1 2 P −1 AP = 1/4 1/2 −3/4 1 3 1 1 0 −1 1/4 −1/2 1/4 1 2 2 1 −1 0 5 0 0 = 0 1 0 0 0 1 1 1 Not every square matrix can be diagonalized. A simple example is A = . This matrix 0 1 has only one eigenvalue, λ = 1, so in order for it to be diagonalizable the corresponding eigenspace would have to be 2 dimensional. But subtracting 1 down the diagonal gives a matrix of rank 1 so the corresponding eigenspace is only 1 dimensional. The conclusion is that this matrix is not diagonalizable. The general idea is that a matrix is not diagonalizable if you can’t ﬁnd enough linearly independent eigenvectors to make up matrix P . Square matrices that cannot be diagonalized are said to be deﬁcient. The following theorem, which will be stated without proof, gives some important insight into diagonalizable matrices. Theorem 2.6 Let λ be an eigenvalue of matrix A. If λ has multiplicity n then the dimension of the corresponding eigenspace is at most n. For example, suppose A is a 4 × 4 matrix with characteristic polynomial (λ − 3)2 (λ − 5)2 . Then this matrix has two eigenvalues, both of multiplicity 2. That means there are two eigenspaces and the dimension of each eigenspace is at most 2. Taking this a bit further, it means that each eigenspace has a dimension of either 1 or 2. (The dimension can’t be 0 because an eigenspace has to contain non-zero eigenvectors.) In order for A to be diagonalizable both eigenspaces would have to have dimension 2. If at least one of these eigenspaces has dimension 1 then matrix A is deﬁcient and cannot be diagonalized. In general, for a square matrix A to be diagonalizable each eigenspace must be of the maximum possible dimension – this maximum possible dimension is the multiplicity of the corresponding eigenvalue. So a deﬁcient matrix has an eigenspace whose dimension is less than the multiplicity of the corresponding eigenvalue. Example 2.2.3 C onsider the following matrices 4 0 0 0 0 4 1 0 0 0 0 4 0 0 0 0 4 1 0 0 A = 0 0 4 0 0 B = 0 0 4 0 0 0 0 0 5 0 0 0 0 5 0 0 0 0 0 5 0 0 0 0 5 2.2. Diagonalization 57 It should be clear that 4 is an eigenvalue of multiplicity 3 for both A and B, and that 5 is an eigenvalue of multiplicity 2 for both matrices. 
When you subtract 5 from each diagonal entry in either matrix you get three pivot columns and two columns of zeroes; that is, the resulting matrices have rank 3. The corresponding eigenspace is therefore 2 dimensional for each matrix. The dimension of the eigenspace is equal to the multiplicity of the eigenvalue. But notice what happens when you subtract 4 from the diagonal entries of these matrices. With matrix A you get three columns of zeroes resulting in a matrix of rank 2 with a 3 dimensional eigenspace. Matrix B, on the other hand, will still contain 4 pivots (the rank will be 4) so the corresponding eigenspace is only 1 dimensional. Matrix B is deﬁcient and cannot be diagonalized. Theorem 2.7 If A = P DP −1 where D is diagonal. Then An = P Dn P −1 for any positive integer n. Proof. Suppose A = P DP −1 then An = P DP −1 · P DP −1 · P DP −1 . . . P DP −1 · P DP −1 = P D(P −1 P )D(P −1 P ) . . . (P −1 P )D(P −1 P )DP −1 = P DIDID . . . IDP −1 = P Dn P −1 The signiﬁcance of this last theorem is connected with the fact that if D is a diagonal matrix then Dn is easy to evaluate. To evaluate Dn you just have to raise each entry on the main diagonal to the nth power. So if A is diagonalizable, the most eﬃcient way of computing An is by evaluating P Dn P −1 . The importance of this result will be investigated in the next chapter. 58 2. Eigenvalues and Eigenvectors Exercises 1. Find a basis for the eigenspaces of the following matrices. Are any of these matrices diagonalizable? 2 0 0 2 1 0 2 1 0 (a) 0 2 0 (b) 0 2 0 (c) 0 2 1 0 0 2 0 0 2 0 0 2 2. Diagonalize the following matrices if possible. That is, give matrices P and D such that P −1 AP = D where D is diagonal. 2 0 1 0 0 1 (a) A = (b) A = (c) A = 1 1 2 1 1 2 3. Diagonalize the following matrices if possible. 3 1 1 3 5 5 (a) A = (b) A = (c) A = −1 5 4 2 5 5 8 −12 4. (a) Can the matrix be diagonalized? 6 −10 8 −12 (b) Find any values of α for which the matrix cannot be diagonalized. 6 α 4 2 2 5. Diagonalize A = 2 4 2 if possible. The eigenvalues of this matrix are λ = 2, 8. 2 2 4 4 0 −2 6. Diagonalize A = 2 5 4 if possible. The eigenvalues of this matrix are λ = 5, 4. 0 0 5 2 2 −1 7. Diagonalize A = 1 3 −1 if possible. The eigenvalues of this matrix are λ = 5, 1 −1 −2 2 2 0 8. Find a matrix which has eigenvalues of 2 and 4 with corresponding eigenvectors and . 1 2 a b 9. Find a matrix which has eigenvalues of 1 and 0 with corresponding eigenvectors and . b a 10. Find a matrix which has eigenvalues of -1 and 0 where -1 is an eigenvalue of multiplicity 2 with 1 1 corresponding eigenvectors 1 and −1 and the eigenvalue 0 has corresponding eigenvector 0 1 0 0. 1 2.2. Diagonalization 59 5 −3 11. Suppose A = . Matrix A can be diagonalized as 6 −4 1 1 2 0 2 −1 A= 1 2 0 −1 −1 1 Find A7 . 1 −1 4 12. Let A = 3 2 −1. The eigenvalues of A are 3, -2, and 1. 2 1 −1 (a) Find a basis for each eigenspace of A. (b) Find a matrix P that diagonalizes A. (c) Calculate A9 . 1 1 13. It was pointed out in this section that the matrix is not diagonalizable. Show that 0 1 1+ǫ 1 is diagonalizable for ǫ > 0. What are the eigenspaces of this matrix? What hap- 0 1 pens to these eigenspaces as ǫ → 0? 1 a 1 14. For what value of a, if any, is 0 1 a not diagonalizable? 0 0 2 2 15. Let A be a 2 × 2 matrix. Suppose A has eigenvalues 3 and 0 with corresponding eigenvectors 1 1 0 and . Let x = . −1 6 (a) Find Ax. (b) Find A. Let be 3 16. A a 3 × matrix. Suppose A has eigenvalues 1, 2, −4 with corresponding eigenvectors 1 1 4 1 1, 0, and 2. Let x = 2. 1 2 3 3 (a) Find Ax. 
(b) Find A. 17. Show that the only matrix similar to the zero matrix is the zero matrix. 18. Show that the only matrix similar to the identity matrix is the identity matrix. 19. Suppose A is an n × n matrix such that A is similar to −A. (a) Show that if n is odd then det A = 0. (b) Find an invertible 2 × 2 matrix A such that A is similar to −A. 20. Show that if A is similar to B, and B is similar to C, then A is similar to C. 21. Suppose A and B are similar matrices. Show that 60 2. Eigenvalues and Eigenvectors (a) if A is invertible then B is also invertible. (b) if A is idempotent then B is also idempotent. (Remember A is idempotent if A2 = A.) (c) if A is nilpotent then B is also nilpotent. (A is said to be nilpotent if An = 0 for some positive integer n.) 1 0 0 1 22. Show that is not similar to . 0 2 2 0 23. Suppose that A is a 4 × 4 matrix with two eigenvalues. Each eigenspace is one dimensional. Is A diagonalizable? 24. Suppose that A is a 5 × 5 matrix with 5 eigenvalues. Is A diagonalizable? 25. Suppose that A is a 5 × 5 matrix with 4 eigenvalues. Is A diagonalizable? 26. Suppose B = P −1 AP . Let v be an eigenvector of A with eigenvalue λ. Show that P −1 v is an eigenvector of B with eigenvalue λ. 27. Suppose u and v are non-zero vectors in Rn . Show that the n × n matrix uvT is diagonalizable if and only if uT v = 0. (Hint: use problem 18 from section 2.1) 2.2. Diagonalization 61 Using MAPLE Example 1 In this example we will use Maple to diagonalize a matrix. We will begin with the following 2 × 2 matrix: 8/3 4/3 A= 2/3 10/3 First we need the eigenvectors of A. In Maple we could do the following: >A:=<< 8/3,2/3>|<4/3,10/3>>; >ev:=Eigenvectors(A); [2, 4] [[−2, 1], [1, 1]] The last command tells us that the eigenvalues are 2 and 4 and that each eigenvalue has multiplicity 1. This command also returns a set of basis eigenvectors for each eigenspace. In this case each eigenspace is 1 dimensional and so each set contains one vector. We next deﬁne these eigenvectors in Maple and diagonalize A. >v1:=<-2,1>: >v2:=<1,1>: >P:=<v1|v2>; >evalm(P^(-1).A.P); 2 0 As expected, when A is diagonalized we get with the eigenvalues turning up on the diagonal. 0 4 How is the unit circle transformed by matrix A? We have seen before that the unit circle can be seen cos(t) as vectors of the form as t ranges from 0 to 2π, and we will now multiply these vectors by A. sin(t) We will plot the results along with the eigenspaces of A. >v:=<cos(t),sin(t)>: >tv:=A.v; >p1:=plot([v[1],v[2],t=0..2*Pi]): ## the unit circle >p2:=plot([tv[1],tv[2],t=0..2*Pi]): ## the circle multiplied by A >p3:=plot([-2*t,t,t=-1.5..1.5]): ## one eigenspace >p4:=plot([t,t,t=-3..3]): ## the other eigenspace >plots[display]([p1,p2,p3,p4],scaling=constrained,color=black); This gives us the plot in Figure 2.6 If you look at the points on the unit circle that intersect the eigenspaces, you should understand that these points are stretched out by factors of 2 and 4 along the eigenspaces to the points on the ellipse that intersect the eigespaces. Notice also that the eigenspaces are not the axes of the ellipse. We will learn later how to ﬁnd these axes. Try repeating the above example using the following matrices: 1 1 3 1 1/2 3/3 1 3 Example 2 62 2. Eigenvalues and Eigenvectors 3 lambda=4 2 lambda=2 1 –3 –2 –1 1 2 3 –1 –2 –3 Figure 2.6: We will look at another example involving a larger matrix. Suppose we want to diagonalize 1 4 3 5 2 1 2 3 B= 3 2 1 2 4 3 4 1 We will ﬁrst enter this into Maple . 
>B:=< <1,2,3,4>|<4,1,2,3>|<3,2,1,4><5,3,2,1>>: >evb:=Eigenvectors(B); Notice when B was entered into Maple one of the entries was entered as 1.0. By entering at least one entry in ﬂoating point format you force Maple into using ﬂoating point approximations for its computations rather than exact arithmetic. To see the advantage of this try changing the 1.0 to 1 and see what happens. We now want to pick out the four eigenvectors. We could just retype them as we did in the ﬁrst example but in this case the following approach is simpler: >P:=evb[2]; >P^(-1).B.P; This gives −2.0 −0.000000015 0.0000000005 0.0000000004 0.00000000823064292 10.28582791 0.000000014 −0.000000012 0.00000000712841625 0.000000013 −2.856210504 −0.000000014 −0.0000000003959861145 0.0000000009 −0.00000000100 −1.429617398 Notice here that P −1 BP didn’t quite give a diagonal matrix. The entries oﬀ the diagonal weren’t zeroes. But they were all small (around the order of 10−8 ). This is the result of using ﬂoating point approximations. When using ﬂoating point calculations numbers that should be zero turn out to be very small nonzero values as a result of rounding errors. Example 3 In this example we will illustrate another way of diagonalizing a matrix in Maple . There is a command in Maple which will diagonalize a matrix in one step and place the eigenvectors in a matrix. 2.2. Diagonalization 63 >A:=<<1,-2>|<-2,6>>: >ev,P:=Eigenvectors(A); >P^(-1).A.P; 2 0 0 5 In this example the columns of matrix P are the eigenvectors of A. The command JordanForm converts any matrix to what is called Jordan canonical form. If the matrix is diagonalizable then the Jordan form is the similar diagonal matrix. If the matrix is not diagonalzable then the Jordan form is, in a sense, the matrix that is as close to diagonal as possible. It is an almost diagonal matrix with eigenvalues down the diagonal but with some non-zero entries (in facr 1s) above the diagonal. For example, the matrix −4 3 −8 18 −3 18 13 −5 17 is not diagonalizable. The JordanForm command returns the following >A:=<<-4,18,13>|<3,-3,-5>|<-8,18,17>>; >ev,P:=Eigenvectors(A); ### note that P has only 2 non-zero columns, A ios defective >P:=JordanForm(A,output=’Q’); ### compare this with the previous output >P^(-1).A.P; ### this is almost diagonal 4 0 0 0 3 1 0 0 3 64 2. Eigenvalues and Eigenvectors 2.3 Eigenvectors and Linear Transformations You should recall that if A is an m × n matrix and x is a vector in Rn , then Ax is a vector in m R . In this context multiplication of a vector by A is said to be a linear transformation from Rn to Rm . The idea is simple: you have a vector in Rn , you multiply the vector by A, and as a result you get a vector in Rm . This transformation can also be written using the notation x → Ax, and the vector x is said to be mapped to the vector Ax.4 A linear transformation is sometimes represented using functional notation. That is, the linear transformation might be called T and then for any input vector, v, the result of applying the transformation is called T (v). In other words, T (v) = Av. Now suppose we have: 1 2 2 A= , x= 2 1 −1 with the following bases of R2 : 1 1 1 1 B= , , C= , 1 −2 1 −1 In this case multiplication by A would correspond to a linear transformation from R2 to R2 . What does x get mapped into by this transformation? A simple computation gives 1 2 2 0 Ax = = 2 1 −1 3 But now suppose that we were representing our vectors relative to a diﬀerent basis other than the standard basis. 
Suppose, for example, we were representing vectors relative to basis B. In that 1 case our starting vector, before the transformation, would be represented by [x]B = and this 1 1 vector would be mapped into [Ax]B = . The point we want to illustrate in this section is −1 that even if we represent our vectors relative to some other basis the linear transformation will still correspond to multiplying by a certain matrix. This matrix will not be A. If we are representing vectors relative to basis B, then what matrix would we have to multiply by in order to apply the same linear transformation? If we start with [x]B how can we ﬁnd the result of applying the linear transformation? We will break the problem up into smaller steps. First we multiply by PE←B . This will give us our starting vector in the standard basis, and we know that in the standard basis we multiply by matrix A to apply the linear transformation. Finally, we want our transformed vector represented in basis B, not the standard basis, so we multiply by PB←E and we end up with [Ax]B . A x −→ Ax B ↑ ↓ B −1 [x]B −→ [Ax]B B −1 AB If we put all of that together, what did [x]B get multiplied by? If you arrange things properly you should see that our starting vector was multiplied by PB←E APE←B = B −1 AB. This matrix is 4 Rn is also called the domain of the transformation. Rm is called the co-domain. The range of the transfor- mation is Col A. 2.3. Eigenvectors and Linear Transformations 65 called the B-matrix of the transformation. If we evaluate this matrix we get −1 1 1 1 2 1 1 3 −2 = 1 −2 2 1 1 −2 0 −1 If we use this to check our earlier computation we get 3 −2 1 1 = 0 −1 1 −1 B −1 AB [x]B = [Ax]B Now repeat the above procedure and ﬁnd the C-matrix of this linear transformation. Recall that two n × n matrices, A and B, are similar if A = P −1 BP . Since P is invertible, the columns of P form a basis for Rn . This means that the transformation x → Bx represents the same transformation as x → Ax if you interpret the ﬁrst as relative to the columns of P and the second as relative to the standard basis. The following points summarize this section • A linear transformation from Rn to Rm corresponds to multiplying a vector in Rn by an m × n matrix. • A vector has an inﬁnite number of diﬀerent representations corresponding to diﬀerent bases. As a consequence the matrix representation of a linear transformation varies with the choice of a basis. • If two matrices are similar then they correspond to the same linear transformation expressed relative to diﬀerent bases. • If a matrix A is diagonalizable then the diagonal matrix is, in a sense, the simplest way of representing the linear transformation x → Ax. This happens when we represent the transformation relative to a basis of eigenvectors of A. Example 2.3.1 L et 5 0 −4 A = 2 1 −2 3 0 −2 and let T be the linear transformation x → Ax. Let 1 1 1 B = 1 , 0 , 1 1 1 0 and 1 0 4 C = 0 , 1 , 2 1 0 3 If we deﬁne matrices B and C as usual we get 66 2. Eigenvalues and Eigenvectors 1 0 1 B −1 AB = 0 1 2 0 0 2 1 0 0 C −1 AC = 0 1 0 0 0 2 Notice that although B and C are both bases of R3 , C is composed of eigenvectors of A and so the C-matrix of T (x) = Ax is diagonal. The important thing to understand here is: In what sense do A, B −1 AB, and C −1 AC represent the same linear transformation? 4 Choose any vector in R3 . Say, v = 3. If we apply the linear transformation T to v 3 we get 5 0 −4 4 8 T (v) = Av = 2 1 −2 3 = 5 3 0 −2 3 6 8 So transformation T maps v into 5. 
6 But suppose we look at things from the point of view of basis B. If we express v in terms 2 of basis B we have [v]B = 1. If we multiply this vector by the matrix B −1 AB we get 1 1 0 1 2 3 B −1 AB [v]B = 0 1 2 1 = 3 0 0 2 1 2 We want to show that this last vector is, in fact, the same vector as Av. In order to see this we should look at this result as the coordinates of a vector relative to basis B. In the standard basis that vector would then be 1 1 1 8 3b1 + 3b2 + 2b3 = 3 1 + 3 0 + 2 1 = 5 = Av 1 1 0 6 In other words, the transformation represented by matrix A relative to the standard basis would be represented by B −1 AB relative to basis B. Similarly, if we look at things from point of view of the eigenbasis, C, our initial vector 0 would be [v]C = 1. Applying the transformation to this coordinate vector we get 1 1 0 0 0 0 C −1 AC [v]C = 0 1 0 1 = 1 0 0 2 1 2 This should be the C-representation of Av. To see this we put this last result in terms of the standard basis 2.3. Eigenvectors and Linear Transformations 67 1 0 4 8 0c1 + 1c2 + 2c3 = 0 0 + 1 1 + 2 2 = 5 = Av 1 0 3 6 Every diﬀerent basis would give a a diﬀerent representation of vector v and a diﬀer- ent representation of transformation T . The matrices corresponding to these diﬀerent representations of T are exactly the matrices that are similar to A. Example 2.3.2 0 −1 1 Let A = and v = . Matrix A corresponds to a counter-clockwise rotation 1 0 0 0 of 90◦ . So we get Av = . That is, a unit vector on the x axis is transformed (rotated) 1 to a unit vector along the y axis. Now we’ll look at the same transformation from the point of view of a diﬀerent basis. 1/2 1 Let B = and let the columns of B be our new basis, B. The matrix of the 0 1 transformation x → Ax relative to basis B would be 2 −2 0 −1 1/2 1 −1 −4 B −1 AB = = 0 1 1 0 0 1 1/2 1 2 What does the transformation v → Av look like in terms of basis B? We have [v]B = , 0 and transforming this we get −1 −4 2 −2 = 1/2 1 0 1 So we have 1 0 → in the standard basis 0 1 2 −2 → in basis B 0 1 In Figure 2.7 you see vector v and the result of transforming v by multiplying by A (i.e., by rotating 90◦ ). In Figure 2.8 you see exactly the same situation but now it is overlaid with the grid corresponding to basis B. You should be able to see how the starting vector corresponds 2 to in this new coordinate system, and how the transformed vector corresponds to 0 −2 . 1 68 2. Eigenvalues and Eigenvectors Figure 2.7: Counter-clockwise rotation by 90◦ (-2,1) 1 -2 (2,0) Figure 2.8: Counter-clockwise rotation by 90◦ and basis B Exercises 1. Let 1 1 2 1 2 A= ,v = , b1 = , b2 = 1 1 −1 1 1 and let B = {b1 , b2 }. (a) Find Av, [v]B , and [Av]B . (b) Find the B-matrix of the linear transformation x → Ax. 2. Let 0 2 3 1 1 A= ,v = , b1 = , b2 = −1 1 1 1 −1 and let B = {b1 , b2 }. (a) Find Av, [v]B , and [Av]B . 2.3. Eigenvectors and Linear Transformations 69 (b) Find the B-matrix of the linear transformation x → Ax. 1 1 2 1 0 1 3. Let A = 2 1 3 and B = 1 , 1 , 0 . 1 0 1 0 1 1 (a) Find the B-matrix of the transformation T (x) = Ax. 1 (b) Let [v]B = 1. Find [T (v)]B and T (v). 1 4. Let 1 0 0 −1 −1 1 0 0 A= 0 −1 1 0 0 0 −1 1 Let H be the Haar basis of R4 . What is the H-matrix of the linear transformation x → Ax? 5. Let 2 2 1 A= ,v = 0 1 −1 and let B be a basis of eigenvectors of A. (a) Find [v]B , [Av]B , and A10 v B (b) Find A10 v. 6. Let A be a 2 × 2 rotation matrix. Let B = {e2 , e1 }. Show that the B-matrix of the linear transformation x → Ax is A−1 . 7. 
Show that if A is similar to B, then A2 is similar to B 2 . 8. Suppose that A is similar to A2 . What can you say about the eigenvalues of A? 9. Suppose that A and B are both invertible and that A is similar to B, show that A−1 is similar to B −1 . 10. Show that if A and B are similar then they have the same rank. 11. Suppose v is an eigenvector of A with corresponding eigenvalue λ. Let B = P −1 AP for some invertible matrix P . Show that P −1 v is an eigenvector of B. 12. Suppose B and C are similar to A. Is B + C similar to A? 13. Let T be the linear transformation from Rn to Rn deﬁned by T (x) = Ix. This is called the identity transformation. Let B be a basis of Rn . What is the B-matrix of this transformation? 14. Let T be the linear transformation from Rn to Rn deﬁned by T (x) = Ox. This is called the zero transformation. Let B be a basis of Rn . What is the B-matrix of this transformation? 70 2. Eigenvalues and Eigenvectors Using MAPLE Example 1 Similar matrices correspond to the same linear transformation relative to diﬀerent bases. In this example we will begin by deﬁnining two matrices A and P . We will then compute B = P AP −1 . It follows that A and B are similar. >A:=BandMatrix([1,2,1],1,4): >P:=Matrix(4,4, (i,j)->min(i,j)): >B:=P^(-1).A.P: We now want to illustrate the fact that A and B correspond to the same transformation from the point of view of diﬀerent bases. We will deﬁne an arbitrary vector, v, and multiply it by A. We will call the transformed vector tv. >v:=<1,4,5,2>: >tv:=A.v; tv := [6, 14, 16, 9] We now convert the vectors v and tv into the basis formed by the columns of P and call the results vp and tvp. >vp:=P^(-1).v; vp := [-2, 2, 4, -3] >tvp:=P^(-1).tv; tvp := [-2, 6 , 9 -7] Finally, to show the connection between A and B we just have to do one more computation. >B.vp; [-2, 6 , 9 -7] This is the same result as tvp. a b Now try redoing the above steps but deﬁne v to be . c d Example 2 −2 1 0 0 1 1 −2 1 0 2 Let A = and v = . Let T : R4 → R4 be deﬁned by T (x) = Ax. 0 1 −2 1 3 0 0 1 −2 4 We will ﬁnd the representation of T relative to the Haar basis, H, for R4 and apply both versions of transformation T to vector v. First we enter the relevant information into Maple . There are many ways of entering a matrix into Maple . Here we use the augment command. This command takes a sequence of vectors or matrices (having the same number of rows) and combines them into one matrix. 2.3. Eigenvectors and Linear Transformations 71 >A:=<<-2,1,0,0>|<1,-2,1,0>|<0,1,-2,1>|<0,0,1,-2>>: >H:=<<1,1,1,1>|<1,1,-1,-1>|<1,-1,0,0>|<0,0,1,-1>>: >v:=<1,2,3,4>: >tv:=A.v; tv:=[0, 0, 0, -5] The last command computed T (v) and we called the result tv in Maple . Now we want to do the same computation relative to the Haar basis. The H-representation of T will be given by H −1 AH. >B:=H^(-1).A.H; This gives the H-matrix of the transformation as: −1/2 0 −1/4 1/4 0 −3/2 1/4 1/4 −1/2 1/2 −3 −1/2 1/2 1/2 −1/2 −3 Continuing with Maple : >vh:=H^(-1).v; vh:=[5/2, -1, -1/2, -1/2] >tvh:=B.vh; tvh:=[-5/4, 5/4, 0, 5/2] >H.tvh; [0, 0, 0, -5] You should ﬁnd the above pretty straightforward. It shows that relative to the standard basis we had 1 0 2 0 → 3 0 4 −5 and relative to the Haar basis we had the same vector transformation represented as 5/2 −5/4 −1 → 5/4 −1/2 0 −1/2 5/2 72 2. Eigenvalues and Eigenvectors 2.4 Complex Eigenvalues The eigenvalues of an n × n matrix are the roots of a polynomial of degree n but not every such polynomial will have n real roots. 
2.4 Complex Eigenvalues

The eigenvalues of an $n \times n$ matrix are the roots of a polynomial of degree $n$, but not every such polynomial will have $n$ real roots: it is possible for the characteristic polynomial to have complex roots. We have seen that real eigenvalues have a geometric interpretation as scaling factors in certain directions. In this section we will look at matrices with complex eigenvalues and show that these also have a straightforward geometric interpretation.

Let $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$. This matrix has the characteristic equation $\lambda^2 + 1 = 0$, so we see that $A$ has no real eigenvalues. The characteristic equation can, however, be solved in terms of complex numbers (see Appendix B for a review of complex numbers), and so matrix $A$ does have complex eigenvalues. In particular, if we solve the characteristic equation we get the complex eigenvalues $\lambda_1 = i$ and $\lambda_2 = -i$.

Matrix $A$ is a rotator: it rotates vectors in $\mathbb{R}^2$ clockwise by $\pi/2$ radians. It should therefore be clear that no non-zero vector is transformed into a scalar multiple of itself when multiplied by $A$, and so there are no eigenvectors in $\mathbb{R}^2$. The fact that $A$ has complex eigenvalues tells us that there will also be complex eigenvectors in $\mathbb{C}^2$. (A vector with $n$ complex entries is said to be a vector in $\mathbb{C}^n$. The symbol $\mathbb{C}$ stands for the set of complex numbers just as $\mathbb{R}$ stands for the set of real numbers.) These complex eigenvectors can be easily computed in the usual way. For $\lambda_1 = i$ we will get an eigenvector $v_1 = \begin{bmatrix}1\\i\end{bmatrix}$. For $\lambda_2 = -i$ we get an eigenvector $v_2 = \begin{bmatrix}1\\-i\end{bmatrix}$. If we multiply these eigenvectors by $A$ we have the following:
$$Av_1 = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1\\i\end{bmatrix} = \begin{bmatrix}i\\-1\end{bmatrix} = i\begin{bmatrix}1\\i\end{bmatrix} = \lambda_1 v_1$$
and
$$Av_2 = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1\\-i\end{bmatrix} = \begin{bmatrix}-i\\-1\end{bmatrix} = -i\begin{bmatrix}1\\-i\end{bmatrix} = \lambda_2 v_2$$
Can matrix $A$ be diagonalized? The answer is no if you restrict yourself to real numbers, but the answer is yes if you use complex numbers. If we proceed as before by using our two eigenvectors as columns of $P$ we have
$$P^{-1}AP = \begin{bmatrix}1&1\\i&-i\end{bmatrix}^{-1}\begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1&1\\i&-i\end{bmatrix} = \begin{bmatrix}1/2&-i/2\\1/2&i/2\end{bmatrix}\begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1&1\\i&-i\end{bmatrix} = \begin{bmatrix}i&0\\0&-i\end{bmatrix}$$
So, just as with real eigenvalues, when the matrix is diagonalized the complex eigenvalues turn up as the diagonal entries.

As a more general example consider the matrix $\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}$. This matrix has characteristic equation $\lambda^2 - 2\cos\theta\,\lambda + 1 = 0$. We can now find the eigenvalues by the quadratic formula:
$$\lambda = \frac{2\cos\theta \pm \sqrt{4\cos^2\theta - 4}}{2} = \cos\theta \pm \sqrt{-\sin^2\theta} = \cos\theta \pm i\sin\theta = e^{\pm i\theta}$$
So again we get two complex eigenvalues that are conjugates, with corresponding complex eigenvectors $\begin{bmatrix}1\\\pm i\end{bmatrix}$. (This is not quite true: there are some values of $\theta$ which will give real eigenvalues. What are these values? What are the eigenvalues in these cases?) We have
$$\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}1\\i\end{bmatrix} = \begin{bmatrix}\cos\theta + i\sin\theta\\-\sin\theta + i\cos\theta\end{bmatrix} = (\cos\theta + i\sin\theta)\begin{bmatrix}1\\i\end{bmatrix}$$
and
$$\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}1\\-i\end{bmatrix} = \begin{bmatrix}\cos\theta - i\sin\theta\\-\sin\theta - i\cos\theta\end{bmatrix} = (\cos\theta - i\sin\theta)\begin{bmatrix}1\\-i\end{bmatrix}$$
The last two examples involved rotation matrices, but not only rotation matrices have complex eigenvalues. For example, the matrix $\begin{bmatrix}1&1\\-2&3\end{bmatrix}$ has the characteristic equation $\lambda^2 - 4\lambda + 5 = 0$, giving eigenvalues
$$\lambda = \frac{4 \pm \sqrt{16-20}}{2} = \frac{4 \pm 2i}{2} = 2 \pm i$$
For $\lambda = 2+i$ we get an eigenvector of $\begin{bmatrix}1\\1+i\end{bmatrix}$, and for $\lambda = 2-i$ we get an eigenvector of $\begin{bmatrix}1\\1-i\end{bmatrix}$. Once again, as explained in Appendix B, the eigenvalues are conjugates.

Scaling and Rotation in $\mathbb{R}^2$

Let $A = \begin{bmatrix}a&-b\\b&a\end{bmatrix}$. What happens if we multiply a vector in $\mathbb{R}^2$ by a matrix of this type? First notice that this matrix can be rewritten
$$\begin{bmatrix}a&-b\\b&a\end{bmatrix} = \sqrt{a^2+b^2}\begin{bmatrix}\frac{a}{\sqrt{a^2+b^2}}&-\frac{b}{\sqrt{a^2+b^2}}\\[4pt]\frac{b}{\sqrt{a^2+b^2}}&\frac{a}{\sqrt{a^2+b^2}}\end{bmatrix} = \sqrt{a^2+b^2}\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$$
where $\theta = \arccos\dfrac{a}{\sqrt{a^2+b^2}}$. Written this way it is easy to see that multiplying by $A$ corresponds to rotating the vector through angle $\theta$ and scaling by a factor of $\sqrt{a^2+b^2}$.
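This decomposition is easy to check numerically. The following sketch is ours, with the arbitrary choice $a = 3$, $b = 4$; it confirms that the eigenvalues of such a matrix are $a \pm bi$ and that the polar form of the eigenvalue recovers the scaling factor and the angle:

>with(LinearAlgebra):
>a:=3: b:=4:
>A:=<<a,b>|<-b,a>>;        ## the matrix [[a,-b],[b,a]]
>r:=sqrt(a^2+b^2);         ## the scaling factor, here 5
>theta:=arccos(a/r);       ## the rotation angle
>Eigenvalues(A);           ## gives 3+4I and 3-4I, that is a+bi and a-bi
>evalf(r*exp(I*theta));    ## polar form: returns 3. + 4. I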
Now suppose we have any (real) $2 \times 2$ matrix $A$ with complex eigenvalues. Then $A$ will be similar to a diagonal matrix with complex entries on the diagonal, but to diagonalize $A$ it is necessary to use complex eigenvectors. We will now show that it is also possible to find a basis of real vectors such that matrix $A$, expressed in terms of this basis, has the form $\begin{bmatrix}a&-b\\b&a\end{bmatrix}$. This means that any $2 \times 2$ matrix with complex eigenvalues is similar to a matrix that involves just scaling and rotation.

Theorem 2.8 Let $A$ be a (real) $2 \times 2$ matrix with a complex eigenvalue $\lambda = a + bi$. Let $v$ be a complex eigenvector corresponding to $\lambda$ and let $P = \begin{bmatrix}\operatorname{Re}(v)&\operatorname{Im}(v)\end{bmatrix}$. Then
$$A = PCP^{-1} \quad\text{where}\quad C = \begin{bmatrix}a&b\\-b&a\end{bmatrix}$$

Proof. Suppose $A$ is a real $2 \times 2$ matrix with complex eigenvalues and eigenvectors. Let $Av = \lambda v$ where $\lambda = a + bi$, so we also have $A\bar{v} = \bar{\lambda}\bar{v}$. Let $v_R = \operatorname{Re}(v)$ and $v_I = \operatorname{Im}(v)$. We then have
$$Av_R = A(v + \bar{v})/2 = \left((a+bi)v + (a-bi)\bar{v}\right)/2 = av_R - bv_I$$
and
$$Av_I = A(v - \bar{v})/(2i) = \left((a+bi)v - (a-bi)\bar{v}\right)/(2i) = bv_R + av_I$$
Let $P = \begin{bmatrix}v_R&v_I\end{bmatrix}$ and $C = \begin{bmatrix}a&b\\-b&a\end{bmatrix}$. Using the above results we can write
$$PC = \begin{bmatrix}v_R&v_I\end{bmatrix}\begin{bmatrix}a&b\\-b&a\end{bmatrix} = \begin{bmatrix}av_R - bv_I & bv_R + av_I\end{bmatrix} = \begin{bmatrix}Av_R&Av_I\end{bmatrix} = A\begin{bmatrix}v_R&v_I\end{bmatrix} = AP$$
So $AP = PC$ and $A = PCP^{-1}$.

There is a gap in this proof: why must $P$ be invertible? We leave it as an exercise to fill in this part of the proof.

Example 2.4.3

Let $A = \begin{bmatrix}1&4\\-.8&-2.2\end{bmatrix}$. This matrix has characteristic polynomial
$$\lambda^2 + 1.2\lambda + 1$$
Using the quadratic formula we get eigenvalues
$$\lambda = \frac{-1.2 \pm \sqrt{1.44 - 4}}{2} = -.6 \pm .8i$$
For $\lambda = -.6 + .8i$ we get a corresponding complex eigenvector $v = \begin{bmatrix}1\\-.4+.2i\end{bmatrix}$. We now want to express $A$ in terms of a new basis. The new basis will be $b_1 = \operatorname{Re}(v) = \begin{bmatrix}1\\-.4\end{bmatrix}$ and $b_2 = \operatorname{Im}(v) = \begin{bmatrix}0\\.2\end{bmatrix}$. So we let $P = \begin{bmatrix}1&0\\-.4&.2\end{bmatrix}$ and we get
$$P^{-1}AP = \begin{bmatrix}1&0\\2&5\end{bmatrix}\begin{bmatrix}1&4\\-.8&-2.2\end{bmatrix}\begin{bmatrix}1&0\\-.4&.2\end{bmatrix} = \begin{bmatrix}-.6&.8\\-.8&-.6\end{bmatrix}$$
The resulting matrix has the structure $\begin{bmatrix}a&-b\\b&a\end{bmatrix}$ of a scaling combined with a rotation. The scaling factor would be $\sqrt{(-.6)^2 + .8^2} = 1$ and the rotation would be clockwise by $\arccos(-.6)$.

Example 2.4.4

In this example we will take the reverse approach. Let
$$A = .95\begin{bmatrix}\cos 10^\circ&-\sin 10^\circ\\\sin 10^\circ&\cos 10^\circ\end{bmatrix}$$
This matrix corresponds to a counter-clockwise rotation of 10 degrees combined with a scaling by a factor of .95. If we start with the vector $\begin{bmatrix}1\\0\end{bmatrix}$ and multiply repeatedly by $A$ eighteen times we get the set of points shown in Figure 2.9.

[Figure 2.9: Rotation and Scaling using A]

We will now convert matrix $A$ to a new basis. We will arbitrarily choose $P = \begin{bmatrix}2&-1\\1&1\end{bmatrix}$ and let our new basis be the columns of $P$. The matrix of the transformation relative to this basis is then
$$B = P^{-1}AP = \begin{bmatrix}.991&-.110\\.275&.881\end{bmatrix}$$
In this new basis the vector $\begin{bmatrix}1\\0\end{bmatrix}$ would correspond to $\begin{bmatrix}1/3\\-1/3\end{bmatrix}$. If we start with this point and transform it 18 times we would get Figure 2.10. The axes of the new coordinate system (determined by the columns of $P$) are also plotted there. It still takes 9 iterations to go through one quadrant, but now the quadrants are not all the same size.

[Figure 2.10: Rotation and Scaling using B]

Figures 2.11 and 2.12 illustrate the result of applying 72 iterations of matrix $A$ and of matrix $B$. These plots correspond to two complete cycles plus scaling. Since the scaling factor of .95 is applied at each step, it follows that after 72 steps the total scaling would be $(.95)^{72} \approx .025$.

[Figure 2.11: 72 iterations using A.]

[Figure 2.12: 72 iterations using B.]

You should interpret these plots as two representations of the same thing. With matrix $A$ there is a straightforward rotation by 10 degrees at each step combined with a scaling by .95. Using matrix $B$ does the same thing in a sense, except now the axes are skewed and stretched. But in both cases the points spiral in to the origin and take 36 iterations to complete one cycle. If the scaling factor were larger than 1, the points would spiral away from the origin. Without a scaling factor (or, rather, if the scaling factor were 1) the points would form a closed cycle; with matrix $A$ the points would lie on a circle, with matrix $B$ they would lie on an ellipse.
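The construction of Theorem 2.8 can be carried out directly in Maple. The commands below are our own sketch, applied to the matrix of Example 2.4.3; keep in mind the caveat (noted again later in this section) that Maple may return the eigenvectors in either order, so the sign of the off-diagonal entries in the final matrix may differ:

>with(LinearAlgebra):
>A:=<<1,-.8>|<4,-2.2>>:       ## the matrix of Example 2.4.3, entered by columns
>ev,V:=Eigenvectors(A):       ## eigenvalues and a matrix of eigenvectors
>v:=Column(V,1):              ## one complex eigenvector
>P:=<map(Re,v)|map(Im,v)>;    ## real and imaginary parts as columns
>P^(-1).A.P;                  ## a scaling-rotation matrix, here [[-.6,.8],[-.8,-.6]] up to order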
Example 2.4.5

In this example we will illustrate how these ideas can be extended to larger matrices. We will let
$$A = \begin{bmatrix}1&1/2&1/2\\2&2&-2\\0&1&1\end{bmatrix}$$
The characteristic polynomial of $A$ would be
$$-\lambda^3 + 4\lambda^2 - 6\lambda + 4 = (2-\lambda)(\lambda^2 - 2\lambda + 2)$$
The first factor gives an eigenvalue of $\lambda_1 = 2$, and a corresponding eigenvector would be $\begin{bmatrix}1\\1\\1\end{bmatrix}$. The quadratic factor gives eigenvalues
$$\lambda = \frac{2 \pm \sqrt{4-8}}{2} = 1 \pm i = \sqrt{2}\,e^{\pm i\pi/4}$$
If we take $\lambda_2 = 1+i$ we get a corresponding eigenvector $\begin{bmatrix}1-i\\2i\\2\end{bmatrix}$. So we also have $\lambda_3 = 1-i$ with a corresponding eigenvector $\begin{bmatrix}1+i\\-2i\\2\end{bmatrix}$.

Now by changing basis we can find matrices similar to $A$ which might have a simpler structure. There is no real basis that will result in a similar diagonal matrix, but if we take our real eigenvector $\begin{bmatrix}1\\1\\1\end{bmatrix}$ together with the real and imaginary parts of one of the conjugate complex eigenvectors, namely $\begin{bmatrix}1\\0\\2\end{bmatrix}$ and $\begin{bmatrix}-1\\2\\0\end{bmatrix}$, and define
$$P = \begin{bmatrix}1&1&-1\\1&0&2\\1&2&0\end{bmatrix}$$
then we will have
$$P^{-1}AP = \begin{bmatrix}2&0&0\\0&1&1\\0&-1&1\end{bmatrix}$$
Notice that the real eigenvalue has turned up on the diagonal and the complex eigenvalues have turned up as a block of the form $\begin{bmatrix}a&b\\-b&a\end{bmatrix}$. In particular, they have turned up as the block $\begin{bmatrix}1&1\\-1&1\end{bmatrix}$, which corresponds to a scaling by $\sqrt{2}$ and a rotation of $\pi/4$.
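Here is a short Maple check of ours for this example. One point to watch: Maple's CharacteristicPolynomial uses the convention $\det(\lambda I - A)$, so the polynomial appears with the opposite sign from the one written above.

>with(LinearAlgebra):
>A:=<<1,2,0>|<1/2,2,1>|<1/2,-2,1>>:             ## the matrix of Example 2.4.5, by columns
>factor(CharacteristicPolynomial(A,lambda));    ## (lambda-2)*(lambda^2-2*lambda+2)
>P:=<<1,1,1>|<1,0,2>|<-1,2,0>>:                 ## real eigenvector, then Re and Im parts
>P^(-1).A.P;                                    ## the block form [[2,0,0],[0,1,1],[0,-1,1]]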
Exercises

1. Find the eigenvalues of the following matrices, and a basis for each eigenspace.
(a) $\begin{bmatrix}1&1\\-5&5\end{bmatrix}$  (b) $\begin{bmatrix}1&4\\-.8&-2.2\end{bmatrix}$  (c) $\begin{bmatrix}5&-2\\1&3\end{bmatrix}$  (d) $\begin{bmatrix}.5&-.6\\.75&1.1\end{bmatrix}$  (e) $\begin{bmatrix}0&-1\\1&1\end{bmatrix}$  (f) $\begin{bmatrix}0&b\\-b&0\end{bmatrix}$

2. Let $A = \begin{bmatrix}a&b\\-b&0\end{bmatrix}$.
(a) What are the eigenvalues of $A$?
(b) Under what conditions does $A$ have complex eigenvalues?
(c) What can you say about $A$ when $a = \pm 2b$?

3. Find an invertible matrix $P$ and a matrix $C$ of the form $\begin{bmatrix}a&b\\-b&a\end{bmatrix}$ such that $A = PCP^{-1}$ for the following matrices:
(a) $\begin{bmatrix}1&1\\-2&3\end{bmatrix}$  (b) $\begin{bmatrix}5&-1\\3&2\end{bmatrix}$  (c) $\begin{bmatrix}0&1\\-4&0\end{bmatrix}$  (d) $\begin{bmatrix}1.08&.32\\-2.72&.12\end{bmatrix}$

4. Let $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$ and $x_0 = \begin{bmatrix}1\\1\end{bmatrix}$. Consider the dynamical system $x_{t+1} = Ax_t$.
(a) Plot the trajectory of this system with the given $x_0$.
(b) Recompute and plot the trajectory if the system is expressed relative to the basis $\left\{\begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}2\\0\end{bmatrix}\right\}$.
(c) Recompute and plot the trajectory if the system is expressed relative to the basis $\left\{\begin{bmatrix}2\\0\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right\}$.
(d) Recompute and plot the trajectory if the system is expressed relative to the basis $\left\{\begin{bmatrix}-2\\2\end{bmatrix}, \begin{bmatrix}1\\3\end{bmatrix}\right\}$.

5. Let $C = \begin{bmatrix}0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix}$.
(a) What is the characteristic polynomial of $C$?
(b) Evaluate $C^2$, $C^3$, $C^4$, $C^5$.
(c) Find the eigenvalues of $C$ and a basis for each eigenspace.

6. The matrix $R_1 = \begin{bmatrix}1&0&0\\0&0&1\\0&-1&0\end{bmatrix}$ is a rotation of 90 degrees around the $x$ axis in $\mathbb{R}^3$. The matrix $R_2 = \begin{bmatrix}0&0&1\\0&1&0\\-1&0&0\end{bmatrix}$ is a rotation of 90 degrees around the $y$ axis in $\mathbb{R}^3$. Find the eigenvalues and corresponding eigenvectors of $R_1R_2$.

7. Show that the matrix $P = \begin{bmatrix}v_R&v_I\end{bmatrix}$ in Theorem 2.8 must be invertible. (Hint: show that if the columns of $P$ are not linearly independent then the eigenvalues of $A$ would not be complex.)

Using MAPLE

Example 1.

There is a command in Maple for plotting points in the complex plane. The command is complexplot and is part of the plots package. In this example we will define a 32 × 32 matrix with complex eigenvalues and plot those eigenvalues. The first Maple command given below defines a 32 × 32 matrix $A$ where each entry in the matrix is given by $a_{ij} = \cos(.6\,i(j-1))$. Notice the symmetry of the plot around the real axis, which is a consequence of the fact that the complex eigenvalues come in conjugate pairs.

>A:=Matrix(32,32,(i,j)->cos(i*(j-1)*.6)):
>ev:=Eigenvalues(A):
>plots[complexplot](convert(ev,list),style=point,symbol=circle);

[Figure 2.13: The 32 complex eigenvalues of A]

Here's a slightly more complicated example. In this case we will look at the eigenvalues of matrices of the form
$$\begin{bmatrix}0&1&0&0&\cdots&0&0\\k&0&1&0&\cdots&0&0\\0&k&0&1&\cdots&0&0\\&&&&\ddots&&\\0&0&0&0&\cdots&0&1\\1&0&0&0&\cdots&k&0\end{bmatrix}$$
where the parameter $k$ ranges from $-1$ to $1$.

>f:=k->BandMatrix([k,0,1],1,24):
>for i to 21 do
   B:=f(.1*(i-1)-1):
   B[24,1]:=1:
   ev:=Eigenvalues(B):
   p[i]:=plots[complexplot](convert(ev,list),style=point,color=black,symbol=circle):
 od:
>plots[display]([seq(p[i],i=1..21)]);

[Figure 2.14: The complex eigenvalues of B]

The resulting plot shows the eigenvalues of 21 different matrices of the above form, where $k = -1.0, -0.9, \ldots, 0.9, 1.0$.

We'll look at one more example. Here we'll let
$$S = \begin{bmatrix}0&1&0&0&\cdots&0\\0&0&1&0&\cdots&0\\0&0&0&1&\cdots&0\\&&&&\ddots&\\1&0&0&0&\cdots&0\end{bmatrix}$$
We will plot the eigenvalues of $M = S^2 + S - I$ and of $M^{-1}$.

>S:=BandMatrix([0,0,1.0],1,64): S[64,1]:=1.0:
>M:=evalm(S^2+S-1):   ### remember Maple interprets 1 as the identity matrix here
>ev1:=Eigenvalues(M):
>ev2:=Eigenvalues(M^(-1)):
>plots[complexplot](convert(ev1,list),style=point);
>plots[complexplot](convert(ev2,list),style=point);

[Figure 2.15: The eigenvalues of M and M^(-1).]

The plots are shown in Figure 2.15. Try to figure out the relation between these two plots.
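As one more experiment along these lines (our suggestion, not in the text), you might plot the eigenvalues of a random real matrix. The same symmetry about the real axis appears, since non-real eigenvalues of a real matrix always come in conjugate pairs:

>with(LinearAlgebra):
>R:=RandomMatrix(24,24,generator=-1.0..1.0):   ## random real entries between -1 and 1
>plots[complexplot](convert(Eigenvalues(R),list),style=point,symbol=circle);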
Example 2.

We will illustrate how to use Maple to generate some of the images used in this section. Suppose we want a rotation matrix that will take 32 steps to rotate a complete circle; then each step will rotate by $\frac{2\pi}{32}$. Suppose we also want a starting point to spiral outward so that after one complete rotation of $2\pi$ radians its distance from the origin will have doubled; then at each step the scaling would be $\sqrt[32]{2}$. The following commands in Maple will define the appropriate matrix and compute and plot two complete cycles. The first three lines below define the values that will determine our matrix. The fifth line defines our starting point, $x_0$. The sixth line computes 64 points of the trajectory in a loop. The last two lines plot the points which we have computed.

>c:=evalf(cos(2*Pi/32));
>s:=evalf(sin(2*Pi/32));
>r:=evalf(2^(1/32));                 ### the scaling factor
>A:=r*<<c,-s>|<s,c>>;                ## the rotation matrix with scaling
>x[0]:=<1,0>;                        ## the starting point
>for i to 64 do x[i]:=A.x[i-1] od:   ## this finds x1 to x64
>pdata:=[seq([x[i][1],x[i][2]],i=0..64)]:
>plot(pdata,style=point,symbol=box); ### phase plot of our system

We get Figure 2.16.

[Figure 2.16: Multiplying by A]

Notice that it takes 8 steps to pass through each quadrant and that after each complete cycle the intercept on the horizontal axis is doubled.

Now suppose we look at the same transformation relative to a different basis. Let's choose the basis
$$\mathcal{B} = \left\{\begin{bmatrix}3\\1\end{bmatrix}, \begin{bmatrix}1\\-1/3\end{bmatrix}\right\}$$
Here are some comments concerning the following Maple commands:

• The plot p2 gives the axes of the new coordinate system. The axes can be represented as the lines $t\,v_1$ and $t\,v_2$; this command plots these parametric representations of the axes. The definition of p2 might look confusing at first, but remember v1[1] and v1[2] just give the first and second components of v1.

• It is strongly recommended that you try the same example with several other choices for a basis. If you change the values of v1 and v2 you can just re-enter all the other commands to see the corresponding plot. You might have to change the size of the viewing window as given by the view option in the display command.

• Notice that matrix B doesn't fall into the pattern $\begin{bmatrix}a&b\\-b&a\end{bmatrix}$, but $P^{-1}BP = A$ and so a change of basis will put it in that form.

>v1:=<3,1>:
>v2:=<1,-1/3>:
>P:=<v1|v2>;
>B:=P.A.P^(-1);                      ## B and A are similar
>x[0]:=<1,0>;                        ## the starting point
>for i to 64 do x[i]:=B.x[i-1] od:
>pdata:=[seq([x[i][1],x[i][2]],i=0..64)]:
>p1:=plot(pdata,style=point,symbol=box):
>p2:=plot({[v1[1]*t,v1[2]*t,t=-10..10],[v2[1]*t,v2[2]*t,t=-10..10]}):
>plots[display]([p1,p2],view=[-6..6,-6..6]);

This gives Figure 2.17.

[Figure 2.17: Multiplying by B.]

Notice in this case that if we look at the new axes as dividing the plane into 4 unequal "quadrants", it still takes 8 steps to pass through each "quadrant". The distance of any point on the path is also still doubling after a complete cycle.

Example 3.

For the next example look at the matrix
$$A = \begin{bmatrix}0&1&0&0\\0&0&1&0\\0&0&0&1\\-1&0&0&0\end{bmatrix}$$
This matrix has characteristic polynomial $\lambda^4 + 1$, which has no real roots, and so $A$ has 4 complex eigenvalues. In Maple:

>A:=<<0,0,0,-1>|<1,0,0,0>|<0,1,0,0>|<0,0,1,0>>;
>ev,V:=Eigenvectors(A);

We now see that the four complex eigenvalues are $\dfrac{\sqrt2 \pm i\sqrt2}{2}$ and $\dfrac{-\sqrt2 \pm i\sqrt2}{2}$. We will take the real and imaginary parts of two of the eigenvectors that are NOT conjugates of each other (otherwise the resulting vectors would not be linearly independent) and place them as columns in a matrix. (Note that Maple does not always return the eigenvectors in the same order, so you will have to look and see which two columns to use.)

>for i to 4 do v[i]:=simplify(ev[i]) od;
>v1:=map(Re,Column(V,1)): v2:=map(Im,Column(V,1)):  ### one conjugate pair
>v3:=map(Re,Column(V,2)): v4:=map(Im,Column(V,2)):  ### another conjugate pair
>P:=<v1|v2|v3|v4>;

This gives
$$P = \begin{bmatrix}-\sqrt2/2&-\sqrt2/2&\sqrt2/2&\sqrt2/2\\0&-1&0&-1\\\sqrt2/2&-\sqrt2/2&-\sqrt2/2&\sqrt2/2\\1&0&1&0\end{bmatrix}$$
Now we will evaluate $P^{-1}AP$:

>P^(-1).A.P;

This gives us the matrix
$$P^{-1}AP = \begin{bmatrix}\sqrt2/2&\sqrt2/2&0&0\\-\sqrt2/2&\sqrt2/2&0&0\\0&0&-\sqrt2/2&-\sqrt2/2\\0&0&\sqrt2/2&-\sqrt2/2\end{bmatrix}$$
This reveals the rotations hidden inside matrix $A$. It shows that multiplying by $A$ corresponds to a rotation of $\pi/4$ in the $x_1x_2$ plane and a rotation of $3\pi/4$ in the $x_3x_4$ plane. Note that in this case the Cayley-Hamilton theorem tells us that $A^8 = I$, so any trajectory would be periodic of length 8. An arbitrary trajectory would start with
$$\begin{bmatrix}a\\b\\c\\d\end{bmatrix} \to \begin{bmatrix}b\\c\\d\\-a\end{bmatrix} \to \begin{bmatrix}c\\d\\-a\\-b\end{bmatrix} \to \cdots$$
What are the next 6 steps of this trajectory? What is the trajectory in the $x_1x_2$ plane? In the $x_2x_4$ plane?
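A quick way to check the claim that $A^8 = I$, and to answer the trajectory question, is to let Maple do the arithmetic. These commands are ours, and they assume matrix A is still defined from the session above:

>A^4, A^8;                            ## gives -I and I, confirming A^8 = I
>x:=<a,b,c,d>:                        ## a symbolic starting point
>for i to 8 do x:=A.x; print(x) od:   ## lists the 8 steps; the last is (a,b,c,d) again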