jordan

Categories
Tags
-
Stats
views:
26
posted:
6/14/2012
language:
English
pages:
18
Document Sample
scope of work template
							Math. H110                          Jordan’s Normal Form                                December 7, 2000 10:38 am


Jordan’s Normal Form
Our objective is to demonstrate that for any given complex n-by-n matrix B there exists at
least one invertible matrix C that transforms B by Similarity into a diagonal sum
                                  β1 I1 + J1       O                O           …       O
                                      O        β2 I2 + J2           O           …       O
                      C–1BC =         O            O           β3 I3 + J3       …       O
                                     …            …                 …           …       …
                                     O            O                 O           …   βL IL + JL

of Jordan Blocks each of the form ßI + J , where ß is an eigenvalue of B and J is obtained
from the identity matrix I either by deleting its first row and appending a last row of zeros, or
equivalently by deleting its last column and prepending a first column of zeros. For example,
here is a 4-by-4 Jordan Block:
                                                            β   1   0   0
                                          ßI + J =          0   β   1   0   .
                                                            0   0   β   1
                                                            0   0   0   β
Such a block has one repeated eigenvalue and only one eigenvector regardless of its dimension.
Every eigenvalue ßj of B appears in at least one Jordan Block, and these blocks can appear
in any order, and their various dimensions add up to the dimension n of B . We’ll see that B
determines its Jordan blocks completely except for the order in which they appear. Since every
matrix Z–1BZ Similar to B has the same blocks, they tell us all that can be known about the
geometrical effect of a linear operator whose matrix, in an unknown coordinate system, is B .
For instance they show how B decomposes the vector space into an Irreducible sum of Nested
Invariant Subspaces, as will be explained later.

An important application of Jordan’s Normal Form is the extension of the definitions of scalar
functions ƒ(ß) of a scalar argument ß to define matrices ƒ(B) . However, we shall find that
ƒ(B) is easier to find from a Pennants form of B , or from a triangular Schur form.

Jordan’s canonical form under similarity is hard to discover because it can be a discontinuous
function of its data B . For example, no matter how tiny the nonzero number µ may be,
Jordan’s Normal Form of
                                                   β   1   0    0
                                                   0   β   1    0
                                                   0   0   β    1
                                                   µ   0   0    β
must be diagonal with four 1-by-1 Jordan blocks; do you see why? And do you see why
Jordan’s Normal Form of
                                                   β   1   0    0
                                                   0   β   µ    0
                                                   0   0   β    1
                                                   0   0   0    β
is the same for all µ ≠ 0 ? Irreducible invariant subspaces are not determined uniquely if µ = 0 .

Discovering the Jordan blocks takes several steps each intended to simplify the problem. The
first step identifies the eigenvalues ßj of B as the zeros ( generally complex numbers ) of its
Characteristic Polynomial det(λI – B) = λn – Trace(B)λn–1 + … + (–1)ndet(B) = ∏j (λ – ßj) .


Prof. W. Kahan                                                                                          Page 1


                           This document was created with FrameMaker 4 0 4
Math. H110                              Jordan’s Normal Form               December 7, 2000 10:38 am


The Cayley-Hamilton Theorem:
               Every square matrix satisfies its own Characteristic Equation;
i.e., ƒ(B) = O when ƒ(λ) := det(λI–B) = ∑0≤j≤n ƒjλj is the characteristic polynomial of B .
This theorem is stated with an incorrect proof or none in many texts on linear algebra, which is
reason enough to present a correct proof here:

Let the Classical Adjoint or Adjugate of λI–B be A(λ) := Adj(λI–B) . It is known to satisfy
A(λ)(λI–B) = (λI–B)A(λ) = ƒ(λ)I . At first sight, we could replace the scalar λ by the matrix
B in the last equation to get ƒ(B) = (BI–B)A(B) = O , which is what the theorem claims. But
this is not a proof. How do we know that a matrix identity, valid for all scalar values of a
variable λ , remains valid after λ is replaced by a matrix ? It’s not so in general, as the next
examples show: Set
                 P :=   11   ,   Q :=     1 1    ,   R :=   10   , and   S := –   00   ;
                        11               –1 –1              00                    44

then PλQ = O for all scalars λ but PRQ = P ≠ O , and (Q–λI)(Q+λI) = -λ2I for all scalars
λ but (Q–PI)(Q+PI) = S ≠ –P2I . These counter-examples reveal a flaw in the simple-minded
substitution of B for λ above. A correct proof must be more complicated:

Each element of adjugate A(λ) is a polynomial in λ of degree at most n–1 ; it must have the
form A(λ) = ∑0≤j<n Ajλj in which every coefficient Aj is an n-by-n matrix. In fact every
Aj is a polynomial in B computable from the identity A(λ)(λI–B) = (λI–B)A(λ) = ƒ(λ)I ,
                        i.e. (λI – B) ∑0≤j<n Aj λj = ∑0≤j≤n ƒj λj I ,
by matching the coefficients of successive powers of λ . Begin with the coefficient of λn ;
An–1 = ƒnI = I . Then for λn–1 find that An–2 – BAn–1 = ƒn–1I , so An–2 = ƒn–1I + B . And in
general Aj–1 = ƒjI + BAj for j = n, n-1, …, 3, 2, 1, 0 in turn, starting from An := O and
ending at A–1 := O to meet end-conditions in the sums. This confirms by reverse induction that
every Aj is a polynomial in B with coefficients drawn from the numbers ƒj , and therefore
BAj = AjB just as λAj = Ajλ , justifying simple-minded substitution. Alternatively observe that
                O = A–1 = ƒ0I + B(ƒ1I + B(ƒ2I + … + B(ƒn–1I + B)…)) = ƒ(B) ,
which is what the Cayley-Hamilton theorem claims. End of proof.


Triangular Forms Similar to B
Two other forms are almost as useful as Jordan’s and far easier to exhibit. First is Schur’s
decomposition B = QUQ* in which Q* = Q–1 and U is upper-triangular with the eigenvalues
of B on its diagonal. This unitary similarity has many uses and is relatively easy to compute
with fair accuracy ( QUQ* is almost exactly B ); its existence will be demonstrated below.

The second form to which every square matrix B can be reduced by complex similarity is a
diagonal sum of triangular matrices of which each has only one eigenvalue, and this eigenvalue
is distinct from the eigenvalue of every other triangle in that sum. Though still a continuous
function of B , this similarity is more difficult to compute than Schur’s, as we shall see later.



Prof. W. Kahan                                                                             Page 2
Math. H110                                   Jordan’s Normal Form                   December 7, 2000 10:38 am


Schur’s triangularization will be shown to exist through a process of deflation ; as each
eigenvalue of B is chosen its eigenvector will be used to reduce by 1 the dimension of the
matrix from which the next eigenvalue of B will be chosen. Here is how deflation works:

Choose any eigenvalue ß1 of B and find eigenvector v1 as a nonzero solution of the singular
homogeneous linear system (ß1I – B)v1 = o . Then embed v1 in a new basis V := [v1, v2, …]
of the vector space as its first basis vector. B is the matrix of a linear operator whose matrix in
                                  β1 b
                                         T
                                                                                        β1
the new basis is V–1BV =                     because Bv1 = ß1v1 so V–1Bv1 =                  . Here B is a
                                   o B                                                  o

square matrix whose dimension is 1 less than B ’s . The eigenvalues of B are ß1 and the
eigenvalues of B because det(λI – B) = det(λI – V–1BV) = (λ – ß1)det(λI – B) . What was
just done to B can now be done to B : Choose any eigenvalue ß2 of B ( and of B ) and
solve B v2 = ß2v2 for a nonzero eigenvector v2 of B ( not of B ) and then form a new basis
V := [v2, v3, …] for the space upon which B ( not B ) acts; the first column of V–1B V is
                                                                                       T           β1 … …
  –1B v         β2                       T                                     β1     b V
V       2   =        . Set W :=    1 o       to find   (VW)–1B(VW)      =                     =    0 β2 …      .
                o                  o V                                                –1
                                                                                o    V BV          o o …

Repeating the process ultimately delivers an upper-triangular U = Q–1BQ with its eigenvalues
on its diagonal in the order in which they were chosen as eigenvalues of B .

Exercise: Use this U to deduce the Cayley-Hamilton Theorem from the factorization det(λI – B) = ∏j (λ – ßj) .
( Because the theorem’s proof given earlier required no knowledge of eigenvalues, it works also for a scalar field,
like the Rational field, whose matrices B may “lack” eigenvalues because the field is not algebraically closed.)

Schur’s triangularization is a special case of deflation performed by Unitary Similarities. The
given matrix B is regarded as a linear operator that maps a Unitary Space to itself; the space
is endowed with a length ||v|| := √(v*v) defined as the root-sum-squares of the magnitudes of
the elements of vector v . Only orthonormal bases are used for this space; every change from
one such basis to another is represented by a Unitary Matrix whose inverse equals its complex
conjugate transpose. When eigenvector v1 is found it is divided by its length to normalize it
so that ||v1|| = 1 , and then it is embedded in an orthonormal basis V := [v1, v2, v3, …] so that
V–1 = V* . There are many ways to construct such a V . One computes subsequent columns
v2, v3, … by applying Gram-Schmidt orthogonalization to the columns of [v1, I] and
discarding a resulting column of zeros. Another computes the elementary orthogonal reflector
V = V* = V–1 = I – 2uu*/u*u that swaps v1 with the first column of the identity I . Likewise
for V , so W above is unitary too, and so is the product Q of unitary matrices. Thus we
obtain Schur’s triangularization U = Q*BQ with Q* = Q–1 .

Below we’ll need a special Schur triangularization in which repeated eigenvalues of B appear
in adjacent locations on the diagonal of U . If ß1 is a repeated eigenvalue of B it is also an
eigenvalue of B and can be chosen for ß2 above. Thus can the needed ordering of B ’s
eigenvalues be imposed upon all diagonal elements of upper triangle U , at least in principle.


Prof. W. Kahan                                                                                        Page 3
Math. H110                               Jordan’s Normal Form                    December 7, 2000 10:38 am


Exercise: Use Schur’s triangularization to deduce that every Hermitian matrix A = A* is unitarily similar to a
real diagonal matrix, so the eigenvectors of A can be chosen to form a complex orthonormal basis.

A real Schur decomposition of a real square matrix B = QUQT exists in which QT = Q–1 is
real orthogonal and U is real block-upper-triangular with 1-by-1 and 2-by-2 blocks on its
diagonal; each 1-by-1 block is a real eigenvalue of B , and each 2-by-2 block is a real
matrix with two complex-conjugate eigenvalues of B . The existence is proved by choosing,
for any complex eigenvalue ß1 , a pair of orthogonal vectors that span an invariant subspace of
B belonging to this complex eigenvalue and its conjugate, and embedding the pair as the first
two vectors of a new basis for the space. This change of basis deflates the eigenproblem of B
to a real matrix B of dimension 2 less than B ’s. The deflation and subsequent real block-
triangularization is otherwise very much like the foregoing complex triangularization.

Exercise: Use Schur’s real triangularization to deduce that every real symmetric matrix A = AT is orthogonally
similar to a real diagonal matrix, so the eigenvectors of A can be chosen to form a real orthonormal basis.

Next we shall show how Schur’s U = Q*BQ can be reduced by further similarities ( changes
of basis ) to a diagonal sum of Pennants, each an upper-triangular matrix with one eigenvalue
of B repeated as often as the triangle’s dimension. First we need a …

Lemma £ : Suppose square matrices F and P of perhaps different dimensions have no
eigenvalue in common†. Define a linear operator £(Z) := ZF – PZ mapping the vector space of
matrices Z of appropriate dimensions to itself; then £(Z) = O only when Z = O , so this
linear operator £ is invertible.

Proof: Suppose £(Z) = O . Then PZ = ZF and hence P2Z = PZF = ZF2 , and similarly
PkZ = ZFk for k = 0, 1, 2, 3, ... . Now consider the characteristic polynomial of F , namely
Φ(λ) := det(λI – F) = ∏1≤j≤n (λ–ϕj) where ϕ1, ϕ2, …, ϕn are all the eigenvalues ( perhaps not
all distinct ) of F . The Cayley-Hamilton theorem says Φ(F) = O ; but all the factors of
Φ(P) = ∏1≤j≤n (P – ϕjI) are nonsingular so it is too. Expand Φ(…) in powers of its argument
to see term-by-term that Φ(P)Z = ZΦ(F) = O , so Z = O , as claimed. End of proof.

Therefore the equation £(X) = XF – PX = Y can be solved for X = £–1(Y) in terms of F, P
and Y of the right dimensions so long as F and P have no eigenvalue in common†, i.e. so
long as GCD(det(λI–F), det(λI–P)) = 1 . Entirely rational closed-form formulas for X do exist:
Let Yk := Pk–1Y + Pk–2YF + ... + PYFk–2 + YFk–1 for every integer k ≥ 0 . Evidently Y0 = O and Y1 = Y and
Yk = XFk – PkX . For each k substitute Yk for λk in Φ(λ) to get ¥ = XΦ(F) – Φ(P)X = –Φ(P)X , whereupon
X := –Φ(P)–1¥ . This is a rational formula requiring no knowledge of eigenvalues, but rarely useful for numerical
computation. Another way begins by reducing F and P by similarity each to one of the upper-triangular forms
above, say U := Q–1FQ and R := D–1PD . Then solve ZU – RZ = D–1YQ for Z element by element starting in
its lower left corner and working up to the right by diagonals. Finally set X := DZQ–1 .
——————————————————————————————————————————————
† Sometimes the phrase “ disjoint spectra ” is said instead of “ no eigenvalue in common.” First Physicists and
Chemists said “spectrum” for the set of eigenvalues of a linear operator that Quantum Mechanics associates with
an atom or molecule radiating light in colors characterized by its spectrum. Now Mathematicians say it too.



Prof. W. Kahan                                                                                     Page 4
Math. H110                                Jordan’s Normal Form                             December 7, 2000 10:38 am


Block-Diagonalizing a Block-Triangular Matrix
Lemma £ above helps us construct a triangular similarity that block-diagonalizes a given block-
                                                       –1
triangular matrix:      P Y    is similar to     I X        P Y I X           =    I –X P Y I X    =   P O        whenever
                        O F                      O I        O F O I                O I O F O I         O F
matrix X satisfies XF – PX = Y , and such an X exists when square matrices P and F have
disjoint spectra ( no common eigenvalue ). Repeat this process upon P and F so long as they
too are block-triangular and have on their diagonals square blocks whose spectra are disjoint.
Such configurations come from the special Schur triangularizations mentioned above.

Therefore, after any square matrix B has been triangularized by a similarity Q–1BQ = U in
such a way that equal eigenvalues of B are consecutive on the diagonal of U , similarities can
further reduce the triangle U ultimately to a diagonal sum of Pennants like this:
                                                 β1 I1 + N1       O                O       …       O
                                                     O        β2 I2 + N2           O       …       O
               (QK)–1B(QK) = K–1UK =                 O            O           β3 I3 + N3   …       O         ;
                                                    …             …                …       …       …
                                                    O             O                O       …   βk Ik + Nk

here ß1, ß2, ß3, …, ßk are the distinct eigenvalues of B . Each pennant ßI + N is a triangle
with only one eigenvalue on its diagonal, like this 4-by-4 example:
                                                              β   ?   ?   ?
                                               ßI + N =       0   β   ?   ?    .
                                                              0   0   β   ?
                                                              0   0   0   β
Each such pennant is copied from corresponding elements of U . Each strictly upper-triangular
N has zeros on its diagonal and is therefore Nilpotent, meaning Nm = O for some positive
integer m ≤ dim(N) . Upper-triangular matrix K is a pennant too, with 1’s on its diagonal
and nonzero superdiagonal blocks only where K–1UK puts zero blocks above the diagonal.

The foregoing diagonal sum of pennants can be shown to be a continuous function of B in the following sense: As
the elements of B all vary continuously but not too far, the elements of QK and of each pennant in the diagonal
sum also vary continuously; however the eigenvalues on each pennant’s diagonal can become no longer all equal,
though different pennants’ spectra remain disjoint. Eigenvalues can vary abruptly; an eigenvalue of multiplicity
m can split into m eigenvalues that spread apart as fast as the mth root of the perturbations in B , as in the first
example above with a tiny perturbation µ . Worse, the elements of QK or (QK)–1 can be so gargantuan that
roundoff committed during the pennants’ numerical computation gets amplified enough to obliterate some of the
data in B . A few computer programs ( not MATLAB ) try to avoid this obliteration by locating tight clusters of
B ’s eigenvalues, choosing one cluster per approximated pennant, in such a way that the elements of QK and
(QK)–1 never become intolerably big. This is a tough task. Several years ago Prof. Ming Gu proved a conjecture
of Prof. J.W. Demmel (they are both here at UCB) to the effect that attempts to avoid obliterating the data must
occasionally consume a lot of time trying to choose suitable clusters. Specifically, for all dimensions sufficiently
large there are rare matrices B for which choosing clusters consumes time that grows like an exponential function
of B ’s dimension though the time required to compute all approximate pennants would grow like the cube of
dimension ( comparable to several matrix multiplications or inversions ) if a good choice of clusters were known
in advance. Therefore our discussion of pennants and of Jordan’s Normal Form is entirely theoretical, not a
recipe for infallible numerical computation with arithmetic operations rounded to finite precision.




Prof. W. Kahan                                                                                                   Page 5
Math. H110                                  Jordan’s Normal Form                  December 7, 2000 10:38 am


Functions of Matrices and Sylvester’s Interpolation Formula
The diagonal sum of pennants helps us understand the extension of a scalar function ƒ(λ) of a
scalar argument λ to matrix arguments. We have already seen how a polynomial ƒ(…) can be
extended to a matrix argument; it happened in the Cayley-Hamilton theorem. Analogously,
exp(λ) = ∑n≥0 λn/n! can be extended to define exp(B) := ∑n≥0 Bn/n! for all square matrices
B since the infinite series converges absolutely and, ultimately, very quickly.

Exercise: The economist M. Keynes said “Ultimately we are all dead.” Roughly how many terms of the series
for exp(–1000) must be added up until every subsequent term is tinier than the sum of the series? How much
bigger than its sum is the biggest term in the series? One way to answer these questions is to use a computer to
generate those terms and the value of exp(–1000) . A better way uses Stirling’s asymptotic approximation
                            n! ≈ √(2πn) exp( n·log(n) – n + 1/(12n) +… ) for big n .
Answering these questions will help you appreciate why computers don’t compute exp(…) just from its series .

Exercise: Why bother to compute exp(…) for matrices? Confirm from the series that, for any scalar variable τ
and constant square matrix B , the derivative d exp(τB)/dτ = B·exp(τB) = exp(τB)B . Then show that the
solution y(τ) of dy/dτ = By + c(τ) is the vector function y(τ) = exp(τB)(y(0) + ∫oτ exp(–θB)c(θ)dθ ) .

                                                                                            001
Exercise: We say that matrix Y is a square root of X when Y2 = X . One of 0 1         and 0 0 0     has square
                                                                                 00
                                                                                            000
roots and the other doesn’t; say which and why. Explain why every n-by-n Hermitian Positive Definite X = X*
has at least 2n square roots Y , and if more than 2n then infinitely many. Not every such Y = √X since …

If an extension of ƒ(…) to square matrix arguments exists, it is expected to have certain
properties, among them Bƒ(B) = ƒ(B)B , C–1ƒ(B)C = ƒ(C–1BC) for all invertible C , and
     B1 O         ƒ(B 1)     O
ƒ(          ) =                     whenever ƒ(Bj) is defined for both square submatrices Bj , since
     O B2           O      ƒ(B 2)

these equations are certainly satisfied when ƒ(…) is a polynomial or an absolutely convergent
power series, as you should verify. P.A.M. Dirac summed up these expectations in one line:
                   Zƒ(B) = ƒ(B)Z for all matrices Z that satisfy ZB = BZ .
These expectations are met by Sylvester’s Interpolation Formula which expresses ƒ(B) as a
polynomial Φ(B) with coefficients that depend upon B and ƒ(…) as follows:

Suppose polynomials Φ(λ) and Ψ(λ) have these properties …
• Ψ(B) = O , and
• ƒ(λ) = Φ(λ) + Ψ(λ)·µ(λ) for a function µ(λ) sufficiently differentiable at all zeros of Ψ(…) .
Then Sylvester’s Interpolation Formula defines ƒ(B) := Φ(B) . How does this work?

First Ψ(…) has to be chosen, and then it has to be used to determine Φ(…) . One candidate
is Ψ(λ) = det(λI–B) because the Cayley-Hamilton theorem says this Ψ(B) = O , but a better
one may be the monic polynomial Ψ(λ) of minimum degree that satisfies Ψ(B) = O ; here a
monic polynomial of degree d has 1 as the coefficient of λd . This Ψ(λ) of minimum degree
divides evenly into every polynomial p(λ) satisfying p(B) = O , as would p(λ) = det(λI–B) ,
because p(λ) divided by Ψ(λ) yields a quotient polynomial q(λ) and a remainder
polynomial r(λ) = p(λ) – Ψ(λ)q(λ) of degree less than Ψ(λ) ’s ; but then r(B) = O and so
r(λ) = 0 since no nonzero polynomial r(…) of degree less than Ψ(…) ’s can satisfy r(B) = O .


Prof. W. Kahan                                                                                       Page 6
Math. H110                                Jordan’s Normal Form                      December 7, 2000 10:38 am


This Ψ(…) , called “ the Minimum Polynomial of B ” , is determined uniquely by B ; …
Exercise: Explain why. Then use the pennants form of B to show that its every eigenvalue ß is a zero of
Ψ(…) with the same multiplicity m as the maximum index m at which Nm–1 ≠ O for the pennant ßI + N . Use
Jordan’s Normal Form to show that the distinct zeros, if any, of the polynomial det(λI–B)/Ψ(λ) are those distinct
eigenvalues of B each with two or more eigenvectors; if they exist Sylvester would call B “ Derogatory.”

For Sylvester’s formula to work Ψ(…) does not have to be the minimum polynomial of B ;
any polynomial multiple of that minimum polynomial is acceptable. After Ψ(…) has been
chosen it determines uniquely a polynomial Φ(…) of minimum degree ( less than Ψ(…) ’s )
that Interpolates ( matches ) ƒ(…) and perhaps its first few derivatives ƒ'(…), ƒ"(…), …
( if they all have finite values ) at every zero ß of Ψ(…) in the following sense:
         if ß is a zero of Ψ(…) of multiplicity m ,
                 which means Ψ(ß) = Ψ'(ß) = Ψ"(ß) = … = Ψ(m–1)(ß) = 0 ≠ Ψ(m)(ß) ,
         then Φ(ß) = ƒ(ß) , Φ'(ß) = ƒ'(ß) , Φ"(ß) = ƒ"(ß) , …, and Φ(m–1)(ß) = ƒ(m–1)(ß) .

Why must Φ(…) satisfy these equations? How do they determine Φ(…) ? Why uniquely ?
Φ(…) must satisfy the last m equations at every m-tuple zero ß of Ψ(…) because ƒ(λ) = Φ(λ) + Ψ(λ)·µ(λ)
was assumed; it implies that the Taylor Series expansion of ƒ(λ)–Φ(λ) in powers of (λ–ß) begins at (λ–ß)m . If
the aggregate of those equations for all zeros of Ψ(…) determines Φ(λ) , it is determined uniquely because the
difference between two such polynomials is a polynomial of degree less than Ψ(…) ’s and yet, having all the
zeros of Ψ(…) with at least the same multiplicities, this difference would have degree no less than Ψ(…) ’s if it
did not vanish. Φ(…) is now determined by the aggregate of those equations because of their linearity in the
desired coefficients of Φ(…) ; the number of linear equations is the same as the number of coefficients, namely
the degree of Ψ(…) , and the equations’ solution is unique if it exists, so it exists and Sylvester’s ƒ(B) := Φ(B) .

Exercise: Prove that if all zeros ßj of Ψ(…) are distinct Φ(λ) = ∑j ƒ(ßj)Ψ(λ)/((λ–ßj)Φ'(ßj)) ; this is Lagrange’s
Interpolating Polynomial. Repeated zeros of Ψ(…) lead to a different Φ(…) named after Hermite.

In short, when ƒ(…) and enough of its derivatives take finite values on the spectrum of B , so
that Sylvester’s Interpolation Formula ƒ(B) := Φ(B) does exist, it provides a polynomial that
defines ƒ(B) but does not provide an always good recipe for computing it.

Analogous situations arise for functions like exp(…), sin(…), cos(…), … which are defined well by infinite
series or by differential equations both of which are impractical ways to compute those functions numerically at big
arguments; recall the Exercise above about exp(–1000) . For a matrix B of big dimension n , the n–1 matrix
multiplications required for an explicit computation of the polynomial Φ(B) would generally consume time
roughly proportional to n4 , far greater than the time ( roughly proportional to n3 ) that will be needed to
compute ƒ(B) from its pennants form when it can be computed, which is almost always. Worse, the coefficients
of a polynomial Φ(…) can be huge compared with its value, so the explicit computation of Φ(B) can turn into a
mess of rounding errors left behind after cancellation. For instance consider the Tchebyshev polynomials
Tk(λ) := cos(k·arccos(λ)) . They are polynomials because they satisfy the recurrence T0(λ) = 1 , T1(λ) = λ and
Tk+1(λ) = 2λTk(λ) – Tk–1(λ) for k = 1, 2, 3, … in turn. Although –1 ≤ Tk(λ) ≤ 1 when –1 ≤ λ ≤ 1 , coefficients
of Tk(λ) grow exponentially with k ; confirm for k ≥ 2 that Tk(λ) = 2k–1λk – 2k–3kλk–2 + … . This formula is a
numerically unstable way to compute values of Tk(λ) when λ is almost ±1 ; use the recurrence instead.

There’s a faster way than Sylvester’s Interpolation Formula to compute ƒ(B) when a pennants
form of B like (QK)–1B(QK) above is available: ƒ(B) = (QK) ƒ((QK)–1B(QK)) (QK)–1 . It is
derived from Sylvester’s polynomial thus: ƒ(B) = Φ(B) = (QK) Φ((QK)–1B(QK)) (QK)–1 , in


Prof. W. Kahan                                                                                        Page 7
Math. H110                                       Jordan’s Normal Form                         December 7, 2000 10:38 am


which inner factor Φ((QK)–1B(QK)) could be obtained from the pennants form by replacing its
every pennant ßI+N by Φ(ßI+N) . However, Φ(…) need not be computed at all. Since
Ψ((QK)–1B(QK)) = (QK)–1Ψ(B)(QK) = O , every pennant’s Ψ(ßI+N) = O ; consequently
Φ(ßI+N) = ƒ(ßI+N) , which can be computed faster than Φ(ßI+N) can as follows:

Exercise: Show that ƒ(ßI + N) = ƒ(ß)I + f'(ß)N + ƒ"(ß)N2/2 + … + ƒ(m–1)(ß)Nm–1/(m–1)! when Nm = O . Use
this formula and the pennants form of B to confirm that exp(B) = eτ exp(B–τI) for every scalar τ . ( Warning:
exp(B+C) = exp(B)·exp(C) only if BC = CB ; otherwise the Campbell-Baker-Hausdorff series must be used.)

Fortunately, Sylvester’s definition of ƒ(B) := Φ(B) as a polynomial does not compel us to
compute the polynomial, which could have high degree and huge coefficients causing loss of
accuracy to roundoff. Instead, the pennants form permits a more direct computation:
                              ƒ(β 1 I 1 + N 1)         O                  O            …         O
                                    O            ƒ(β 2 I 2 + N 2)         O            …         O
             ƒ(B) = (QK)            O                  O            ƒ(β 3 I 3 + N 3)   …         O            (QK)–1
                                    …                  …                  …            …          …
                                    O                  O                  O            …   ƒ(β k I k + N k)

in which each pennant ƒ(ßI+N) = ƒ(ß)I + f'(ß)N + ƒ"(ß)N2/2 + … + ƒ(m–1)(ß)Nm–1/(m–1)!
when Nm = O . Most often m = 1 or 2 .

Actually, we needn’t compute B ’s pennants form to compute ƒ(B) . The pennants form’s
purpose is to help us understand ƒ(B) , after which we can compute it in better but unobvious
ways developed here at UCB by Prof. B.N. Parlett and his students. An outline follows.

Consider first an approximate pennant P ≈ ßI + N ; here P is upper-triangular with all diagonal elements nearly
equal, so P–ßI ≈ N is nearly nilpotent. We can’t be sure the Taylor series
                 ƒ(P) = ƒ(ß)I + f'(ß)(P–ßI) + ƒ"(ß)(P–ßI)2/2 + … + ƒ(m–1)(ß)(P–ßI)m–1/(m–1)! + …
will converge rapidly when P–ßI ≈ N is nearly nilpotent since its elements may be huge. Still, if the dimension
m of P is small, a polynomial close to the first m terms of this series can be used in Sylvester’s Interpolation
Formula to compute ƒ(P) as follows: Let Ψ(λ) := det(λI–P) = (λ–ß1)(λ–ß2)(λ–ß3)(…)(λ–ßm) and express
       ƒ(λ) = ƒ(ß1) + (∆ƒ(ß1, ß2) + (∆2ƒ(ß1, ß2, ß3) + … + ∆m–1ƒ(ß1, ß2, …, ßm)(λ–ßm–1)...)(λ–ß2))(λ–ß1) +
                                                 + ∆mƒ(ß1, ß2, …, ßm, λ) Ψ(λ)
in terms of its Divided Differences ∆k–1ƒ(ß1, ß2, …, ßk) using Newton’s Divided Difference Formula ; these are
explained ( usually in a different notation ) in texts like Conte & de Boor Elementary Numerical Analysis 3rd ed.
(1980) ch. 2. Like derivatives, divided differences can be computed from the function ƒ(…) and scalar values of
its argument. For instance ∆ƒ(ß1, ß2) = (ƒ(ß1)–ƒ(ß2))/(ß1–ß2) if ß1 ≠ ß2 ; otherwise ∆ƒ(ß, ß) = ƒ'(ß) . Then
      ƒ(P) = ƒ(ß1)I + (∆ƒ(ß1, ß2) + (∆2ƒ(ß1, ß2, ß3) + … + ∆m–1ƒ(ß1, ß2, …, ßm)(P–ßm–1I)...)(P–ß2I))(P–ß1I)
because Ψ(P) = O . Thus ƒ(P) can be computed quickly from a polynomial for small approximate pennants P .

Next consider that ƒ( P Y ) = ƒ(P) X              has to satisfy         P Y ƒ( P Y ) = ƒ( P Y ) P Y , so X must satisfy
                      O F         O ƒ(F)                                 O F    O F        O F   O F
an equation XF–PX = Yƒ(F)–ƒ(P)Y whose solution exists when the spectra of P and F are disjoint, and X can
be computed quickly when P and F are upper-triangular; see after Lemma £ . Thus Schur’s decomposition
U = Q–1BQ provides an upper triangle from which ƒ(B) = Qƒ(U)Q–1 can be computed directly block by block,
starting at near-pennant diagonal blocks and working to the upper right, without computing U ’s pennants form.




Prof. W. Kahan                                                                                                     Page 8
Math. H110                                Jordan’s Normal Form                    December 7, 2000 10:38 am


So, ƒ(B) can be computed without reducing B to a diagonal sum of pennants by similarities.
Can we compute ƒ(B) without knowing Schur’s triangle U = Q–1BQ nor B ’s eigenvalues?

One might surmise so at first. Certainly ƒ(B) can be computed whenever ƒ(λ) = p(λ)/q(λ) is
a Rational function, a ratio of polynomials, whose denominator polynomial q(λ) has no zero
coincident with an eigenvalue of B . ( Otherwise q(B)–1 would not exist; do you see why?)
Moreover ƒ(B) = p(B)q(B)–1 = Φ(B) wherein all the coefficients of Sylvester’s polynomial
Φ(…) can be determined by finitely many rational arithmetic operations upon the elements of
B without first computing its eigenvalues. An inelegant way to do so is suggested by the …

Exercise: Use the Cayley-Hamilton theorem to express q(B)–1 , if it exists, as a polynomial in B .

Exercise: Show that, in principle, B ’s minimum polynomial can be determined by finitely many comparisons
and rational arithmetic operations upon B ’s elements. Hint: I, B, B2, B3, … can’t all be linearly independent.

The situation is different for non-rational functions. Any analytic function ƒ(…) can be
extended to ƒ(B) provided no eigenvalue of B falls upon a singularity of ƒ(…) . A function
ƒ(…) is analytic if its domain in the complex λ-plane consists of a region at every interior
point ß of which the Taylor series
                  ƒ(ß) + f'(ß)(λ–ß) + ƒ"(ß)(λ–ß)2/2 + … + ƒ(m)(ß)(λ–ß)m/m! + …
converges to ƒ(λ) for all sufficiently small |λ–ß| > 0 . The singularities of ƒ(…) are the
boundary-points of its domain. For example, polynomials and exp(…) and sin(…) are all
analytic with no finite singularity; rational functions are analytic with their singularities at their
poles ( where they take infinite values ); cotan(…) is analytic with singularities at all integer
multiples of π ; analytic functions ln(…) and √(…) are singular at 0 and along the negative
real axis across which they jump discontinuously. Every textbook about analytic functions of a
complex variable explains Cauchy’s Integral Formula
                                  ƒ(ß) = ∫Ç ( ƒ(λ)/(λ–ß) ) dλ /(2πı)
in which ı = √(–1) and the path of integration runs counter-clockwise around a closed curve Ç
lying in the domain of ƒ(…) and surrounding ß but no singularity of ƒ(…) . ( The integral is
zero if ß lies outside Ç .) This formula can be used to extend ƒ(…) to a matrix argument:
                                ƒ(B) := ∫Ç ( ƒ(λ)(λI–B)–1 ) dλ /(2πı)
provided Ç surrounds the spectrum of B but no singularity of ƒ(…) . Although no eigenvalue
of B appears explicitly in this integral formula, it delivers the same result as did Sylvester’s
Interpolation Formula. To see why, let Ψ(…) be either B ’s minimum polynomial or its
characteristic polynomial and consider Θ(ß, λ) := ( 1 – Ψ(ß)/Ψ(λ) )/(λ–ß) . Despite first
appearances, Θ(ß, λ) is a polynomial in ß of degree one less than Ψ ’s with coefficients that
are rational functions of λ . ( Do you see why?) Then we find that Sylvester’s Polynomial
         Φ(ß) := ∫Ç ƒ(λ)Θ(ß, λ) dλ /(2πı) = ƒ(ß) – Ψ(ß) ∫Ç ( ƒ(λ)/((λ–ß)Ψ(λ)) ) dλ /(2πı)
has coefficients determined by ƒ(…) , by Ψ(…) and by integration around any closed curve
Ç inside which lies the spectrum of B and no singularity of ƒ(…) . Then ƒ(B) = Φ(B) since
Ψ(B) = O , and yet no eigenvalue of B appears explicitly in the integral that defined Φ(…) .

But it’s a swindle. Attempts to simplify the integral that defined Φ(…) by expressing it in a
“closed form” in terms of elementary transcendental functions like ln(…) and arctan(…) etc.
bring the eigenvalues of B back. They persist unless ƒ(…) is rational or sometimes algebraic.


Prof. W. Kahan                                                                                       Page 9
Math. H110                             Jordan’s Normal Form               December 7, 2000 10:38 am


Example: exp(Bτ)
Solutions η(τ) of a 2nd-order linear homogeneous differential equation η" – 2αη' – γη = 0
( here η' = dη/dτ and η"(τ) = d2η/dτ2 ) with constant coefficients α and γ can be obtained
from an equivalent first-order homogeneous system of linear differential equations y' = By
with y =     η'   and a constant coefficient matrix B =   2α γ   . Solutions are y(τ) = exp(Bτ)yo
             η                                             1 0

for any constant yo = y(0) . To compute exp(Bτ) we get the eigenvalues ßj = α ± √(α2 + γ)
of B as the zeros of its characteristic polynomial det(λI – B) = λ2 – 2αλ – γ = (λ – ß1)(λ – ß2) .

ß1 ≠ ß2 if α2 + γ ≠ 0 , and then the Lagrange interpolating polynomial that matches exp(λτ)
at λ = ß1 and at λ = ß2 is Φ(λ) := ( (λ–ß2)exp(ß1τ) – (λ–ß1)exp(ß2τ) )/(ß1 – ß2) , and then
exp(Bτ) = Φ(B) . Note how this degenerates towards 0/0 as ß1 and ß2 approach equality.

ß1 = ß2 = α if α2 + γ = 0 , and then the Hermite interpolating polynomial that matches
exp(λτ) and its derivative τ·exp(λτ) at λ = α is Φ(λ) := ( 1 + τ(λ–α) )exp(ατ) , and then
exp(Bτ) = Φ(B) again. Many texts provide this and the former formula for Φ(…) .
Although exp(Bτ) is a continuous function of B , the formula for Φ(B) jumps from one form
to another as α2 + γ passes through zero. This jump, reflecting the discontinuity of Jordan’s
Normal Form of B rather than any aspect of the differential equation, is an artifact of algebra
removable by using the pennants form of B instead of Jordan’s, and Newton’s interpolating
polynomial ( twice ) instead of Lagrange’s. Set δ := (ß1–ß2)/2 = √(α2 + γ) and find
     Φ(λ) = exp(ατ)( cosh(δτ) + (λ–α)sinh(δτ)/δ ) = exp(ατ)( cos(ıδτ) – (λ–α)sin(ıδτ)ı/δ )
wherein it is understood that sinh(δτ)/δ —› τ and –sin(ıδτ)ı/δ —› τ as δ —› 0 . Despite
appearances the two forms of Φ(λ) are really the same because of identities cos(ıµ) = cosh(µ)
and sin(ıµ) = sinh(µ)ı ; use whichever is more convenient in the light of the sign of α2 + γ .
                                            ………

Example: Nonexistent and Nonunique √(…)
Nothing like Sylvester’s Polynomial Interpolation Formula can cope with extensions of some
functions ƒ(…) to some square matrices B . The difficulties will be illustrated by using √(…)
as an example. We may say that Y is a square root of B when Y2 = B , but not every such
square root Y , if any exist, is a polynomial in B as Sylvester’s formula requires of √B .
Existence is problematical when 0 is an eigenvalue of B because 0 is also a singularity of
√(…) ; its derivative is infinite there. Therefore it is not surprising that   01     has no square
                                                                               00
                                       010                                                    0        ξ  η
roots; but it is surprising that B =   000    has infinitely many square roots Y =            0        0  0   ,
                                       000                                                    0       1⁄η 0
as η and ξ run through all scalar values except η = 0 , yet √B does not exist. A derogatory
B , with more than one independent eigenvector for some eigenvalue, may have infinitely
many square roots; for instance, ±√I = ±I but the 2-by-2 square roots Y of I include also

all 2-by-2 matrices with eigenvalues +1 and –1 , namely Y = ±             1 – ξη      η           .
                                                                           ξ       – 1 – ξη



Prof. W. Kahan                                                                                    Page 10
Math. H110                                Jordan’s Normal Form                           December 7, 2000 10:38 am


Jordan’s Normal Form of Nilpotent Matrices
These notes began by displaying Jordan’s Normal Form of B ,
                                       β1 I1 + J1       O             O       …         O
                                           O        β2 I2 + J2        O       …         O
                         C–1BC =           O            O        β3 I3 + J3   …         O        ,
                                          …            …              …       …         …
                                          O            O              O       …     βL IL + JL

and now its existence will be proved. This form, the canonical form under Similarity, is
important because all matrices Similar to B have the same Jordan Normal Form except that
its Jordan blocks ßI + J may appear ordered differently along the diagonal. The proof is
complicated because the number of blocks and their dimensions depend discontinuously upon
B whenever any block is 2-by-2 or bigger, or whenever any two blocks share an eigenvalue
ß ; in short, the Jordan Normal Form is discontinuous whenever it is interesting.

These interesting cases are rare because they require that some eigenvalue be repeated, which
occurs just when the characteristic polynomial det(λI–B) and its derivative Trace(Adj(λI–B))
have at least one zero in common, and therefore have a nontrivial Greatest Common Divisor,
which occurs just when these polynomials’ coefficients satisfy a polynomial equation illustrated
by the following example:
   λ4 + ωλ3 + ξλ2 + ηλ + ζ and its derivative 4λ3 + 3ωλ2 + 2ξλ + η vanish together for some λ just when
                                           1 ω         ξ     η    ζ   0    0
                                           0 1        ω      ξ    η    ζ   0
                                           0 0         1     ω    ξ   η    ζ
                                      det( 0 0         0     4   3ω   2ξ   η )=0.
                                           0 0         4    3ω   2ξ   η    0
                                           0 4        3ω    2ξ    η   0    0
                                           4 3ω       2ξ     η    0   0    0
Thus the elements of interesting matrices B , the ones with repeated eigenvalues, satisfy one
complicated polynomial equation which places B on a convoluted hypersurface in the space of
all matrices of B ’s dimensions. This hypersurface intersects itself in many ways. For example,
in the 4-dimensional space of 2-by-2 matrices α β           the interesting ones, with a double eigenvalue, lie on a 3-
                                                    γ δ
                                                                 2
dimensional circular-conical cylinder whose equation is ( α – δ ) + 4βγ = 0 ; its self-intersections form two 2-
dimensional planes whereon α = δ and either β = 0 or γ = 0 , and these two intersect along a line whereon
α = δ and both β = 0 and γ = 0 . Only on that line do matrices have two eigenvectors for one eigenvalue.
The self-intersections fall into many families of shapes some of which correspond to different
special cases in the proof of the Jordan Normal Form’s existence, complicating it.

Our proof will be computational to minimize abstractness and to indicate how Jordan’s Normal
Form can be obtained, at least in principle, from arithmetic performed exactly. ( Rounded
arithmetic operations are problematical for reasons discussed after the pennants form above.)
The first step reduces B by Similarity to a pennants form, a diagonal sum of blocks each like
ßI + N where ß is an eigenvalue of B and N is strictly upper-triangular and therefore
nilpotent ; Nm = O for some m . Recall the pennants form (QK)–1B(QK) displayed above.


Prof. W. Kahan                                                                                            Page 11
Math. H110                                   Jordan’s Normal Form                     December 7, 2000 10:38 am


The next step seeks for each pennant ßI + N an invertible matrix A such that A–1(ßI + N)A is
a Similarity exhibiting Jordan’s Normal Form of the chosen ßI + N . If all searches succeed,
the diagonal sum  of the A ’s will provide a Similarity –1(QK)–1B(QK) that exhibits
Jordan’s Normal Form of B ’s pennants form and hence of B ( with C = QKÂ above ).

Two more steps will simplify each search. First observe that A–1(ßI + N)A = ßI + A–1NA is
Jordan’s Normal Form of ßI + N just when A–1NA is Jordan’s Normal Form of N , so no
generality is lost by seeking Jordan’s Normal Form of only nilpotent matrices. Second, the
search will exploit a proof by induction starting with the hypothesis that, for each nilpotent
matrix Ñ of dimensions smaller than N ’s , a Similarity can be found that transforms Ñ into
its Jordan Normal Form. This hypothesis is valid for all 1-by-1 nilpotent matrices since there
is only one of them, namely [0] , and the 1-by-1 identity [1] effects the desired Similarity.

Exercise: Although nilpotent Ñ need not be in its pennants form, this form is strictly upper-triangular; why?

Since N is nilpotent, Nm–1 ≠ O = Nm for some positive integer m . It is important to our
search. If m = 1 then N = O is already in Jordan’s Normal Form and our search is over. Let
us suppose m ≥ 2 and continue searching. Since Nm–1 ≠ O we can find a column vector v
such that Nm–1v ≠ o and then construct matrix V := [Nm–1v, Nm–2v, … Nv, v] . Its columns
must be linearly independent for the following reason: If ∑k≤j<m µjNjv = o for some k ≥ 0
then Nm–k–1∑k≤j<m µjNjv = µkNkv = o and so µk = 0 ; setting k = 0, 1, 2, …, m–1 in turn
implies every µj = 0 . Therefore V can be embedded in a square matrix [V, V] all of whose
columns are linearly independent; V is empty if m = dimension(N) . Anyway [V, V]–1 exists.

V was designed to satisfy NV = VJ where J is the m-by-m nilpotent Jordan block; if m = 4
       0   1   0   0
J =    0   0   1   0   , for example. Therefore [V, V]–1N[V, V] =             J ?
                                                                                    in which Ñ is empty if
       0   0   0   1                                                          ON˜
       0   0   0   0
m = dimension(N) . If Ñ is empty we have found Jordan’s Normal Form of N so our search
is over. Let us suppose 2 ≤ m < dimension(N) and continue searching.
                                             m           m
Now Ñ is not empty but still           J ?
                                                 =   J       ?
                                                                 = [V, V]–1Nm[V, V] = O . This tells us Jm = O
                                       ON˜             ˜     m
                                                     O N

( which we knew already ) and Ñm = O . Our induction hypothesis says that this nilpotent Ñ
can be transformed by a Similarity to its Jordan Normal Form A–1ÑA , say. Substituting
V A for V and renaming it V replaces Ñ by its Jordan Normal Form in a Similarity

                                             [V, V]–1N[V, V] =          J R
                                                                        ON˜

that defines R and exhibits Ñ already as a diagonal sum of Jordan blocks J1, J2, J3, … each
like J but of perhaps diverse dimensions. Later we shall see how very special R is.

To complete our search we must find one more Similarity, one that gets rid of R .


Prof. W. Kahan                                                                                        Page 12
Math. H110                                     Jordan’s Normal Form                        December 7, 2000 10:38 am


                                                          –1                                             –1
                                                    I Z        J R   I Z       J O                 I Z            I –Z
The desired Similarity takes the form                                      =          . Since                 =          ,
                                                    O I        ON˜   O I       ON˜                 O I            O I
this Similarity gets rid of R if and only if ZÑ – JZ = R . We have to compute a solution Z of
this linear equation in order to compute the desired Similarity. The equation cannot be solved
with the aid of Lemma £ above because the spectra of Ñ and J are not disjoint; they are the
same. Consequently the linear operator that maps Z to ZÑ – JZ is singular; ZÑ – JZ = R can
be solved for Z ( nonuniquely ) if and only if this equation is consistent. To demonstrate its
consistency we shall apply one of Fredholm’s Alternatives :
                   The equation Fz = r has at least one solution z if and only if
                                   wTr = 0 whenever wTF = oT .
                   ( See class notes titled “ The Reduced Row-Echelon Form is Unique”.)
              T
Instead of w r we must use Trace(WR) because this runs through all linear combinations of
the elements of R as W runs through all matrices of the same shape as RT . ( Do you agree?)
Instead of wTFz we must use Trace(W(ZÑ – JZ)) = Trace((ÑW – WJ)Z) ; the last equation
follows from Trace((WZ)Ñ) = Trace(Ñ(WZ)) . As wTFz = 0 for all z just when wTF = oT ,
so does Trace(W(ZÑ – JZ)) = 0 for all Z just when ÑW – WJ = O . Therefore, instead of
wTF = oT we must use ÑW – WJ = O . In short, …
       To find a Similarity that gets rid of R and exhibits Jordan’s Normal Form of N ,
              we must solve ZÑ – JZ = R for Z , which can be done if and only if
                             Trace(WR) = 0 whenever ÑW = WJ .
This last line is all that remains for us to prove; we must deduce that Trace(WR) = 0 for all
solutions W of ÑW = WJ from a property of R . What property of R do we need? Set
              S := Jm–1R + Jm–2RÑ + Jm–3RÑ2 + … + J2RÑm–3 + JRÑm–2 + RÑm–1 .
S = O is the property we need. To prove S = O confirm first ( by induction for m = 1, 2, 3,
                           m           m                                             m
… in turn ) that     J R
                               =   J       S
                                                , and then recall that         J R
                                                                                          = [V, V]–1Nm[V, V] = O .
                     ON˜             ˜m                                        ON˜
                                   O N
From S = O we shall deduce that Trace(WR) = 0 whenever ÑW = WJ via a sequence of
steps that simplify the problem ultimately to a nilpotent N with only two Jordan blocks.

Ñ is a diagonal sum of nilpotent Jordan blocks J1, J2, J3, … that induce partitions of R, S
and W into blocks of corresponding sizes as follows: split R = [R1, R2, R3, …] so that
           J R1 R2 R3 …
                                                                                     W1
           O J1 O O …
 J R                                                                                 W2
       =   O O J2 O …      and then S = [S1, S2, S3, …] and W =                             compatibly. Every
 ON˜                                                                                 W3
           O O O J3 …
                                                                                     …
           … … … … …

Sk := Jm–1Rk + Jm–2RkJk + Jm–3RkJk2 + … + J2RkJkm–3 + JRkJkm–2 + RkJkm–1 = O , and every
Wk satisfies JkWk = WkJ just when ÑW = WJ , so that Trace(WR) = ∑k Trace(WkRk) = 0
for all such W just when every Trace(WkRk) = 0 for every k = 1, 2, 3, … . ( You agree?)




Prof. W. Kahan                                                                                                Page 13
Math. H110                               Jordan’s Normal Form                     December 7, 2000 10:38 am


Whatever is done to cope with each Jordan block Jk and its associated Rk, Sk and Wk can
be done independently of the others; nothing is lost by pretending that Ñ is just one Jordan
block thereby dropping the subscripts k to simplify the process. In short, …
          There is a Similarity that exhibits Jordan’s Normal Form of N if and only if
                               Trace(WR) = 0 whenever ÑW = WJ
              where J and Ñ are Jordan blocks, Jm = O ≠ Jm–1 , Ñm = O , and
                                  S := ∑1≤j≤m Jm–jRÑj–1 = O .

Next let n be the dimension of Ñ ; from Ñm = O follows n ≤ m . ( Recall that m is J ’s
dimension.) Let n-by-m matrix Wo := [O, I] have m–n columns of zeros followed by the n-
by-n identity matrix. Observe that ÑWo = [O, Ñ] = WoJ . Moreover, …
                   As π(…) runs through all polynomials of degree less than n ,
                  W := π(Ñ)Wo = Woπ(J) runs through all solutions of ÑW = WJ .
To see why this is so, define function urc(X) to be the element in the Upper-Right Corner of
its matrix argument X . Thus urc(W) = ω1m . More generally the element of W in row i and
column j is ωij = urc(Ñi–1WJm–j) because pre-multiplication by Ñi–1 shifts row i up to row
1 , and post-multiplication by Jm–j shifts column j right to column m . If ÑW = WJ then
ωij = urc(Ñi–1WJm–j) = urc(Ñm–1+i–jW) =: πj–i–m+n , say. Here πk = 0 if k < 0 ( since then
Ñm–1+i–j = Ñn–1–k = O ), and k = j–i–m+n < m–0–m+n = n . Set π(λ) := ∑0≤k<n πkλk and
observe that the element in row i and column j of π(Ñ)Wo = ∑0≤k<n πkÑkWo is
        urc(Ñi–1∑0≤k<n πkÑkWoJm–j) = urc(∑0≤k<n πkÑk+i–1+m–jWo)                         … since WoJ = ÑWo
                                            = urc(∑0≤k<n πk   Ñk+i–1+m–j)               … since Wo := [O, I]
                                 = urc(πj–i–m+n    = πj–i–m+n = ωij ;
                                                            Ñn-1)
therefore every solution W of ÑW = WJ has the form W = π(Ñ)Wo as asserted above.

Next we shall demonstrate how every such W satisfies Trace(WR) = 0 , completing the proof
that there is a Similarity that exhibits the Jordan Normal Form. Trace(WR) = Trace(RW) ,
and W = ∑0≤k<n πkÑkWo , so we need only demonstrate Trace(RÑkWo) = O for all k ≥ 0 :
      Trace(RÑkWo) = ∑1≤j≤m urc(Jj–1 RÑkWo Jm–j)                      … since RÑkWo is m-by-m
                          = ∑1≤j≤m urc(Jj–1 RÑm–j+kWo)                … since WoJ = ÑWo
                          = urc( ∑1≤j≤m Jj–1 RÑm–j+kWo) = urc(SÑkWo) = O .                    End of proof.

The foregoing proof was devoted mostly to proving that the linear system ZÑ – JZ = R has at least one solution
Z , not to computing it. Elements of Z can be computed in order starting at the lower left and working up to the
right by diagonals. Computation is tricky because the system ZÑ – JZ = R is both over- and under-determined. It
is over-determined because no solution Z exists unless R is constrained by S = O , which means each diagonal
of R that ends in its last row must sum to zero. The system is under-determined because it does not determine its
solution Z uniquely; another solution is Z + Zoπ(Ñ) where Zo is obtained from the n-by-n unit matrix by
appending m–n rows of zeros, and π(…) is any polynomial of degree less than n . Jordan’s Normal Form of B
can vary discontinuously, and the Similarity C–1…C that exhibits it violently discontinuously, as B varies.



Prof. W. Kahan                                                                                      Page 14
Math. H110                                     Jordan’s Normal Form                              December 7, 2000 10:38 am


Nested Irreducible Invariant Subspaces
We know every linear operator B that maps a vector space to itself is represented by a matrix
                                              β1 I1 + J1       O            O        …           O
                                                   O       β2 I2 + J2       O        …           O
                                 C–1BC =           O           O        β3 I3 + J3   …           O
                                                   …          …            …         …           …
                                                   O          O            O         …       βL IL + JL

in Jordan’s Normal Form if an appropriate basis C is chosen. Among the basis vectors must
be eigenvectors cj ≠ o that satisfy Bcj = ßjcj , but if the Jordan Normal Form is not purely
diagonal these eigenvectors are too few to span the space; then extra vectors have to be found to
fill out the basis. These extra vectors needed for “ an appropriate basis ” are called “ principal
vectors ” or “ generalized eigenvectors.” For every Jj of dimension greater than 1 there is a
principal vector dj that satisfies Bdj = ßjdj + cj . For every Jj of dimension greater than 2
there is a principal vector ej that satisfies Bej = ßjej + dj . And so on. For example, if
                 β   1   0   0                 1              0              0           0
B = ßI + J =     0   β   1   0   , then c =    0   , d=       1    , e=      0   , f=    0   , and Bf = ßf + e ,
                 0   0   β   1                 0              0              1           0
                 0   0   0   β                 0              0              0           1
Be = ße + d , Bd = ßd + c , and finally Bc = ßc .

Unlike eigenvectors, whose directions are determined uniquely except when different Jordan
blocks have the same eigenvalue, principal vectors are intrinsically nonunique; for example,
d+c is a principal vector as well as d . In fact, for any polynomial π(…) with π(0) = 1 , the
columns of P := π(J) after the first provide another set of principal vectors for ßI + J ; do you
see why? Their intrinsic nonuniqueness makes principal vectors sometimes more difficult to
handle theoretically than the subspaces they span.

Partition the basis C into blocks of basis vectors corresponding to the Jordan blocks of
C–1BC thus: C = [C1, C2, C3, …, CL] , so that BCj = Cj(ßjI + Jj) . This shows that the range
of Cj is a subspace mapped to itself by B ; it is called an Invariant Subspace of B although
strictly speaking it is a subspace of the vector space. B decomposes this vector space into an
Irreducible sum of Invariant Subspaces : Each such subspace, spanned by those columns Cj
of C that correspond to a Jordan block (ßjIj + Jj) , is mapped to itself by B , has zero
intersection with all those other invariant subspaces, and is irreducible ( cannot itself be
decomposed into a sum of two invariant subspaces ). B ’s effect upon each invariant subspace
is revealed completely by its corresponding Jordan block, which reveals how this invariant
subspace may contain a nested sequence of invariant sub-subspaces as follows:

For simplicity, suppose ß appears in just one Jordan block ßI + J and its dimension is m ,
so Jm = O ≠ Jm–1 . Then the invariant subspace corresponding to this block is determined
uniquely as the m-dimensional null-space of (B – ßI)m . Within this invariant subspace, if
m > 1 , is another invariant sub-subspace, the (m–1)-dimensional nullspace of (B – ßI)m–1 .
And so on; the innermost nonzero invariant subspace in the nest is the nullspace of B – ßI .


Prof. W. Kahan                                                                                                     Page 15
Math. H110                                Jordan’s Normal Form                      December 7, 2000 10:38 am


Thus, when no eigenvalue of B appears in more than one Jordan block, all of B ’s invariant
subspaces, including the nested ones, are determined uniquely; and this proves that Jordan’s
Normal Form is unique except for the ordering of Jordan blocks along the diagonal. Agreed?

But the Derogatory case, when some eigenvalue ß of B appears in more than one Jordan
block, is not so simple. In this case B determines uniquely the invariant subspace associated
with ß ; it is the nullspace of (B – ßI)k for all sufficiently large k . Its further decomposition
into a sum of irreducible invariant subspaces is an accident of a computational process not
determined uniquely by B . However, B does determine the dimensions of its Jordan blocks
uniquely as follows: ( Check this by working with the Jordan Normal Form instead of B .)

For m = 1, 2, 3, … let nm(ß) be the number of Jordan blocks of dimension m with ß on
their diagonals. Finitely many nm(ß) ≠ 0 . And n1(ß) + n2(ß) + n3(ß) + … = Nullity(B – ßI) .
Also n1(ß) + 2n2(ß) + 2n3(ß) + … = Nullity((B – ßI)2) , from which follows that
n1(ß) = 2 Nullity(B – ßI) – Nullity((B – ßI)2) . In a similar fashion for k = 1, 2, 3, …,
     n1(ß) + 2n2(ß) + 3n3(ß) + … + (k–1)nk–1(ß) + knk(ß) + knk+1(ß) + … = Nullity((B – ßI)k) ,
which implies nm(ß) = –Nullity((B – ßI)m–1) + 2·Nullity((B – ßI)m) – Nullity((B – ßI)m+1) for
all m > 0 . Thus B determines the numbers of its Jordan blocks of all dimensions.

Exercise: For any nontrivial projector P , satisfying O ≠ P = P2 ≠ I , show that its Jordan blocks are all 1-by-1 .

Exercise: First show that every complex square matrix B is Similar to its transpose by showing that they have
the same Jordan Normal Form. Then show that B must be a product of two complex symmetric matrices; i.e.,
B = QR–1 where complex Q = QT and R = RT . Hint: Reverse the order of invariant subspaces’ basis vectors.



The Real Jordan Normal Form
For any given real n-by-n matrix B there exists at least one real invertible matrix C that
transforms B by Similarity into a diagonal sum
                                           E1 + K1     O         O       …      O
                                              O      E2 + K2     O       …      O
                             C–1BC =          O        O       E3 + K3   …      O
                                             …         …         …       …     …
                                             O         O         O       …   EL + KL

of Real Jordan Blocks each of the form E + K where either                                        β   1   0   0   0   0
                                                                                                 0   β   1   0   0   0
     E + K = ßI + J with a real eigenvalue ß of B , like this 6-by-6 example:                    0   0   β   1   0   0   ,
                                                                                                 0   0   0   β   1   0
                                                                                                 0   0   0   0   β   1
or                                                                                               0   0   0   0   0   0

     E + K = (ßI + µS) + J2 for a pair of complex conjugate eigenvalues ß ± ıµ of B , µ ≠ 0 ,
     and S = –ST is a diagonal sum of 2-by-2 skew-symmetric matrices that satisfies S2 = –I .

Here is a 6-by-6 example:


Prof. W. Kahan                                                                                           Page 16
Math. H110                              Jordan’s Normal Form                            December 7, 2000 10:38 am



                                                              β   µ    1   0    0   0
                                                             –µ   β    0   1    0   0
                            E + K = (ßI + µS) + J2 =          0   0    β   µ    1   0    .
                                                              0   0   –µ   β    0   1
                                                              0   0    0   0    β   µ
                                                              0   0    0   0   –µ   β
Such a block has two equally repeated complex conjugate eigenvalues ß ± ıµ and only two
complex conjugate eigenvectors regardless of its dimension, which is always even.

Every eigenvalue of B appears in at least one Jordan Block, and these blocks can appear in
any order, and their various dimensions add up to the dimension of B , which determines its
Jordan blocks completely except for the order in which they appear and the signs of their
imaginary parts µ . The proof that such a real Jordan Normal Form exists is more complicated
than the proof for the complex case but no more illuminating, so it is not presented here.

Exercise: First show that every real square matrix B is Similar to its transpose by showing that they have the
same Real Jordan Normal Form. Then show that B must be a product of two real symmetric matrices of which at
least one is invertible; i.e., B = HY–1 where real H = HT and Y = YT . Hint: See the complex case.



A Rational Canonical Form
Any monic polynomial Ψ(λ) = λn – µ1λn–1 – µ2λn–2 – … – µn–1λ – µn is the Characteristic
                                                      0    1      0      0      …        0    0
                                                      0    0      1      0      …        0    0
                                                      0    0      0      1      …        0    0
Polynomial of its Companion Matrix Y =                0    0      0      0      …        0    0     . In fact,
                                                      …   …      …      …       …        …    …
                                                      0    0      0      0      …        0    1
                                                      µn µn – 1 µn – 2 µn – 3   …        µ2   µ1

Ψ(…) is the Minimum polynomial of Y because Nullity(λI – Y) ≤ 1 . ( Do you see why?)

Conversely, for any given square matrix B there exists at least one invertible matrix C that
                                                                           Y1 O O … O
                                                                           O Y2 O … O
transforms B by Similarity into a diagonal sum C–1BC =                     O O Y3 … O              of companion
                                                                           … … … … …
                                                                           O O O … YL

matrices; Y1 is the companion of the minimum polynomial of B , and every Yj+1 is the
companion of a divisor of the minimum polynomial of Yj . Although C may be computed
from the elements of B in finitely many rational arithmetic operations, the diagonal sum and
C are discontinuous functions of the elements of B ; worse, C is violently discontinuous.

In the 1940s an algorithm called Danilewski’s Method used to be recommended as a fast way to compute C and
the diagonal sum C–1BC , but the method is unreliable for any but a small-dimensioned matrix B whenever it is
nearly derogatory; even when Y1 is not bad, subsequent companion matrices Yj+1 need not be companions of



Prof. W. Kahan                                                                                          Page 17
Math. H110                                Jordan’s Normal Form                     December 7, 2000 10:38 am


divisors of minimum polynomials of previous Yj s . Hardly any current texts describe the method, so a succinct
description is presented here. The method consists of a finite sequence of elementary rational Similarities; each is
an elementary column operation followed by its inverse operation applied to rows. For j = 1, 2, 3, … in turn, …
         In row j find the after-diagonal element of largest magnitude, and swap that element’s column with
                   column j+1 ; then swap the correspondingly numbered rows.
         Divide column j+1 by its element in row j and then multiply row j+1 by that number, unless it is zero
                   in which case stop before the division. Otherwise the element in column j+1 of row j is now 1 .
         Subtract multiples of column j+1 from all other columns to annihilate their elements in row j ; then add
                   the same multiples of all other rows to row j+1 .
The process stops either because the Similarities have reduced B to the form of a companion matrix Y , or

because they have reduced B to the form Y O       in which Y is a companion matrix but B is probably not, and
                                            R B
R is probably nonzero. Skipping the last row of [Y, O] and resuming the process from the first row of [R, B]
may ( or may not ) reduce B to companion form, but will probably not annihilate R . Lemma £ et seq. offers a
way in principle to get rid of R when Y and B turn out to have disjoint spectra; otherwise deem Danilewski’s
method to have failed because of an unlucky initial ordering of the rows and columns of B . The method may
succeed if restarted after a shuffle of B ’s rows and corresponding columns has put a different diagonal element
into the upper left-hand corner of B . Even if the method would succeed in exact rational arithmetic, it may give
poor results in rounded arithmetic if the multiples of column j+1 subtracted from previous columns have to be
huge multiples that amplify the effect of roundoff. Nobody knows a good way to compute the rational canonical
form in rounded arithmetic without computing Jordan’s Canonical Form first.



The Souriau/Frame/Faddeev Method
For a square matrix B of integers, can its characteristic polynomial
                                  ƒ(λ) := det(λI–B) = ∑0≤j≤n ƒjλj
be computed from the elements of B by exclusively integer arithmetic? If by “ exclusively
integer arithmetic ” is meant only adds, subtracts and multiplies, no divides, then all known
methods require work that grows exponentially with the dimension of B . But if “ exclusively
integer arithmetic ” lets divisions by small integers produce integer quotients, there is a faster
method traceable to U.J.J. Leverrier in the mid-nineteenth century and subsequently improved
independently by J.M. Souriau, J.S. Frame and D.K. Faddeev a century later. This method
requires a number of arithmetic operations of order dimension(B)4. Approximate methods that
compute eigenvalues first go faster for strictly numerical matrices of large dimensions, taking a
number of arithmetic operations of order dimension(B)3. Therefore this method serves mainly
for computations ( perhaps symbolic ) that cannot tolerate roundoff. Here is how it goes:

Set B1 := B and for j = 1, 2, 3, …, n := dimension(B) in turn compute
                     ƒj := Trace(Bj)/j ;           … the division goes evenly.
                     Bj+1 := B·(Bj – ƒjI) .        ( Don’t bother to compute Bn+1 .)
As a check on the computation, expect Bn = ƒnI . This vanishes when B is singular, and then
its Adjugate Adj(B) = (–1)n–1 (Bn–1 – ƒn–1I) .

Exercise: Validate the foregoing claims by observing the form Bj takes if B is the companion matrix of ƒ(…) .

I doubt the existence of an analogous method to compute B ’s minimum polynomial.



Prof. W. Kahan                                                                                       Page 18

						
Related docs
Other docs by راميالاحمد
jordan
Views: 35  |  Downloads: 0