					Chapter 3

Quadratic Functions in
Linear System Analysis

The modern algorithms for analysis and design of LTI systems are best explained in terms
of quadratic forms, serving as storage functions and supply rates for the underlying state
space models. Though alternative interpretations of the basic theory are possible, this
approach leads to the simplest derivations of the main results, and is most helpful for
a researcher seeking to generalize the classical techniques to a variety of time-varying,
nonlinear, or large-scale applications.
    This chapter presents the fundamental tools of the “quadratic” theory of LTI systems:
the classical Lyapunov theorem, the so-called Kalman-Yakubovich-Popov (KYP) lemma
(sometimes referred to as the (generalized) positive real lemma), and the generalized
Parrott’s theorem, as well as the framework of semidefinite programming, or linear matrix
inequality (LMI) optimization.


3.1     Storage Functions and Supply Rates
The notions of storage functions and supply rates apply to very general classes of systems,
generalizing the definitions of energy from physics models and Lyapunov functions from
autonomous systems.


3.1.1    General Dissipativity
Let S be a discrete time system with input w = w(t) ∈ IR^m and output x = x(t) ∈ IR^n.
Let V : IR^n → IR and σ : IR^n × IR^m → IR be two functions. The system S is said to be


dissipative with storage function V and supply rate σ when the inequality

                            σ(x(t), w(t)) ≥ V (x(t + 1)) − V (x(t))                       (3.1.1)

is satisfied for every input/output pair (x(·), w(·)) and for every time instance t. The
definition in the continuous time case is similar: instead of (3.1.1), it requires the integral
inequality
                        ∫_{t0}^{t1} σ(x(t), w(t)) dt ≥ V (x(t1 )) − V (x(t0 ))            (3.1.2)

to be satisfied for all input/output pairs (x(·), w(·)) and time instances t1 > t0 .

Example 3.1.1 The system describing a passive two-port electric circuit (possibly a nonlinear
one) is dissipative with supply rate σ = vi and storage function V = E, where E is the total
circuit energy, v is the voltage between the ports, and i is the current flowing through the ports.

    Naturally, dissipativity with non-negative supply rates is not interesting: every σ ≥ 0
satisfies (3.1.1) (or (3.1.2) in the continuous time case) with V (x) ≡ 0.
    In most applications, dissipativity is applied with x being the state of the system. For
example, if S is defined by the differential equation

                                       ẋ(t) = F (x(t), w(t)),                             (3.1.3)

where F : IR^n × IR^m → IR^n is a given continuous function (for example, it could be that
F (x, w) = Ax + Bw when (3.1.3) is an LTI model), V : IR^n → IR is differentiable, and
σ : IRn × IRm → IR is continuous, the resulting system is dissipative with storage function
V and supply rate σ if and only if the inequality

                                      σ(x, w) ≥ ∇V (x)F (x, w)                             (3.1.4)

holds for all x, w, where ∇V (x) denotes the gradient of V , written as a row vector.
Similarly, a discrete time system

                                      x(t + 1) = F (x(t), w(t))                           (3.1.5)

is dissipative with storage function V and supply rate σ if and only if

                                  σ(x, w) ≥ V (F (x, w)) − V (x)                          (3.1.6)

for all x, w.

Example 3.1.2 The nonlinear system
                                               ẋ = −x + w^3

is dissipative with storage function V (x) = 0.75|x|^{4/3} and supply rate σ(x, w) = w^4 − xw, because
the inequality
                                   w(−x + w^3 ) ≥ x^{1/3} (−x + w^3 )
holds for all x, w ∈ IR: the factors w − x^{1/3} and w^3 − x always have the same sign, since
u → u^3 is monotonically increasing.

    A very important special case of dissipativity is represented by the notion of L2 gain:
for a system with input w = w(t), output e = e(t), and state x, its L2 gain is defined as
the greatest lower bound (infimum) of those γ > 0 for which the system is dissipative with
supply rate
                                    σ = γ^2 |w|^2 − |e|^2
and some non-negative storage function V = V (x). The definition of the L2 gain is a
special case of an important class of dissipativity conditions defined in terms of quadratic
supply rates. Such conditions (sometimes referred to as Integral Quadratic Constraints,
or IQC) facilitate robustness analysis of uncertain and nonlinear systems by reducing
complex signal relations to a set of IQC, which allows the user to model everything with
linear equations and quadratic inequalities.

3.1.2     Linear-Quadratic Dissipativity
Extensive use of quadratic supply rates and quadratic storage functions for systems de-
scribed by linear equations is typical for modern robust control. An application of the
technique usually involves a linear state space model

                                         ẋ(t) = Ax(t) + Bw(t)                            (3.1.7)

(or, equivalently,
                                     x(t + 1) = Ax(t) + Bw(t)                            (3.1.8)
in the discrete time case), defined by real matrices A, B (of dimensions n-by-n and n-by-m
respectively), a quadratic form

                              V = V (x) = x′ P x               (P = P ′)                 (3.1.9)

defined by a symmetric n-by-n real matrix P , and a quadratic supply rate
          σ = σ(x, w) = [x; w]′ Σ [x; w] = x′ Σ11 x + 2x′ Σ12 w + w′ Σ22 w,              (3.1.10)

defined by a symmetric (n + m)-by-(n + m) matrix Σ.
    It is easy to check that system (3.1.7) is dissipative with supply rate (3.1.10) and
storage function (3.1.9) if and only if

                              σ(x, w) − 2x′ P (Ax + Bw) ≥ 0,                       (3.1.11)

where the “≥ 0” means positive semidefiniteness of the quadratic form on the left side
of the inequality. Similarly, system (3.1.8) is dissipative with supply rate (3.1.10) and
storage function (3.1.9) if and only if

                     σ(x, w) − (Ax + Bw)′ P (Ax + Bw) + x′ P x ≥ 0.                (3.1.12)

The quadratic form inequalities can also be written as matrix inequalities: (3.1.11) is
equivalent to
                   Σ − [I; 0] P [A B] − [A′ ; B′ ] P [I 0] ≥ 0,                          (3.1.13)
while (3.1.12) is equivalent to

                   Σ − [A′ ; B′ ] P [A B] + [I; 0] P [I 0] ≥ 0.                          (3.1.14)

Note that the inequalities (3.1.13) and (3.1.14) are linear with respect to Σ and P : this
will be used later in the application of semidefinite programming to robustness analysis
and optimization.
    Since positive definiteness of quadratic forms is relatively easy to verify numerically,
quadratic dissipativity is frequently used to certify certain properties of linear systems.
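
As an illustration, the following MATLAB sketch (a hypothetical helper function, not part
of any toolbox) checks the continuous time test (3.1.13) numerically for given A, B, Σ and
a candidate P :

function ok=checkdissip(A,B,Sigma,P)
% numerical check of the continuous time dissipativity test (3.1.13)
n=size(A,1); m=size(B,2);
EI0=[eye(n) zeros(n,m)];       % x = EI0*[x;w]
EAB=[A B];                     % Ax+Bw = EAB*[x;w]
M=Sigma-EI0'*P*EAB-EAB'*P*EI0;
ok=min(eig((M+M')/2))>=-1e-9;  % positive semidefinite up to tolerance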


3.1.3     Example: Lyapunov Equations and Inequalities
Let A be a Hurwitz matrix (i.e. all eigenvalues of A have negative real part), which means
that every solution x = x(t) of the differential equation

                                       ẋ(t) = Ax(t)                                (3.1.15)

converges to zero as t → +∞. Assume that our task is to calculate, or to estimate, the
integral
                                  J = ∫_0^∞ x(t)′ Rx(t) dt,

given a symmetric matrix R = R′ and the initial value x0 = x(0). In particular, when
x0 = B and R = C ′ C, the value of J equals the square of the H2 norm of the transfer
matrix
                               G(s) = C(sI − A)⁻¹ B.
   Trying to solve the problem by “brute force”, i.e. by finding an explicit expression
for x(t) as a function of t and integrating the resulting expression, is unlikely to work
well in all but the simplest examples. After all, calculating x(t) analytically requires
one to find all eigenvalues of A, and for dimensions higher than 4, there is no explicit
formula for those. Translating the problem into the frequency domain by using Fourier
(or Laplace) transforms and the Parseval formulae also leads to rather complicated-looking
integrals. In contrast, the idea of quadratic storage functions produces a real
breakthrough for the task.
   Consider (3.1.15) as a special case of (3.1.7) (with B = 0), and define the quadratic
supply rate σ(x, w) = −x′ Rx. A quadratic function V (x) = x′ P x is a valid storage
function for system (3.1.15) with supply rate σ(x, w) = −x′ Rx if and only if

                                        P A + A′ P + R ≤ 0.                         (3.1.16)

Every solution P = P ′ of (3.1.16) (as a rule, (3.1.16) is referred to as the Lyapunov
inequality) provides an upper bound for J: indeed, since

                              dV (x(t))/dt ≤ −x(t)′ Rx(t),

we have
                J ≤ − ∫_0^∞ [dV (x(t))/dt] dt = V (x(0)) − V (x(+∞)) = x0′ P x0 .
Moreover, replacing (3.1.16) with the Lyapunov equation

                                        P A + A′ P + R = 0                          (3.1.17)

will result in the exact expression J = x0′ P x0 . Since (3.1.17) is a linear equation, it is
usually rather straightforward to solve, either analytically or numerically. In particular,
we can see now that, as long as (3.1.17) is feasible and A, R, and x0 have rational entries,
the value of J will be rational as well.
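
For moderate dimensions, (3.1.17) can be solved in MATLAB with the lyap function from
the Control System Toolbox (a sketch with arbitrarily chosen data; lyap(A,Q) solves
AX + XA′ + Q = 0, which is why A′ is passed in place of A):

A=[0 1;-2 -3];        % an example Hurwitz matrix
R=eye(2); x0=[1;0];   % example weight matrix and initial state
P=lyap(A',R);         % solves P*A+A'*P+R=0
J=x0'*P*x0            % exact value of the integral J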
    The framework presented here demonstrates a typical application of quadratic dissipa-
tivity in the linear systems theory. However, it leaves one important theoretical question
unanswered: how do we know that the equation (3.1.17) is feasible? After all, if (3.1.17)
has no solution P = P ′ , the presented approach will fail!

   It turns out that feasibility of equation (3.1.17) is guaranteed by a classical theorem
of Lyapunov. As that theorem is a special case of the much more powerful KYP Lemma,
presented in the next section, which serves a similar role in the case of general quadratic
dissipativity, the issue is not discussed further in this subsection.


3.2     The KYP Lemma
The so-called Kalman-Yakubovich-Popov (KYP) Lemma, which exists in many equivalent
forms and is also known as the Positive Real Lemma, is an eminently important theoretical
statement of modern control theory, providing insight into the connection between frequency
domain, time domain, and quadratic dissipativity properties of LTI systems. The KYP
Lemma is used in justifying a large number of analysis and optimization algorithms employed
in robust control, feedback optimization, and model reduction.

3.2.1    The statement of the KYP Lemma
In this subsection, a special version of the KYP Lemma is presented, covering what can
be referred to as the non-singular continuous time case.

Theorem 3.2.1 Let A, B be a pair of real matrices of dimensions n-by-n and n-by-m
respectively, associated, naturally, with the LTI state space model

                                    ẋ(t) = Ax(t) + Bw(t).                          (3.2.18)

Assume that the pair (A, B) is (continuous time) stabilizable, in the sense that A + BK0
is a Hurwitz matrix for some real matrix K0 . Let
                σ(x, w) = [x; w]′ Σ [x; w] = x′ Σ11 x + 2x′ Σ12 w + w′ Σ22 w,       (3.2.19)

be a Hermitian form σ : C^n × C^m → IR, defined by a real symmetric matrix Σ = Σ′ .
Then the following statements (a)-(e) are equivalent:

 (a) there exists ǫ > 0 such that system (3.2.18) is dissipative with supply rate σ(x, w) −
     ǫ|x|2 − ǫ|w|2;

 (b) there exists a symmetric matrix P = P ′ such that

                                σ(x, w) − 2x′ P (Ax + Bw) ≫ 0;                     (3.2.20)

 (c) σ(0, w) ≫ 0, and σ is strictly positive definite on the subspace

                      UA,B (ω) = {(x, w) ∈ C^n × C^m : jωx = Ax + Bw}              (3.2.21)

     for every ω ∈ IR;

 (d) σ(0, w) ≫ 0, and the Hamiltonian system

                 dx/dt = −(∂H/∂ψ)′ ,    dψ/dt = (∂H/∂x)′ ,    ∂H/∂w = 0,            (3.2.22)

      where
                              H(ψ, x, w) = σ(x, w) − ψ ′ (Ax + Bw),                 (3.2.23)
      has no eigenvalues on the imaginary axis;
 (e) σ(0, w) ≫ 0, and there exist real matrices P0 = P0′ and K such that A + BK is a
      Hurwitz matrix, and

                           σ(x, w) − 2x′ P0 (Ax + Bw) = σ(0, w − Kx).               (3.2.24)

Moreover, when statements (a)-(e) are true, the matrix P0 in (e) is such that P0 ≤ P for
every P = P ′ satisfying
                           σ(x, w) − 2x′ P (Ax + Bw) ≥ 0,                           (3.2.25)
and can be computed in the form P0 = −Z2 Z1⁻¹ , where the columns of Z = [Z1 ; Z2 ] form
a basis in the stable invariant subspace of the matrix of the Hamiltonian system from
(3.2.22).

    In Theorem 3.2.1, statement (a) means strict dissipativity of system (3.2.18) with the
quadratic supply rate σ, but with an arbitrary storage function V = V (x). Statement
(b) is the same dissipativity as in (a), but with a quadratic storage function. Statement
(c) links the dissipativity issue to the frequency domain calculation (the equality between
the H-Infinity norm and L2 gain of stable LTI systems is a special case of this relation).
    Statement (d) is related to an efficient way of verifying quadratic dissipativity of LTI
systems. The Hamiltonian equations are written in the form (3.2.22) as a reference to
classical mechanics: they actually result in the autonomous linear system of differential
equations

       d/dt [x(t); ψ(t)] = H [x(t); ψ(t)],                                          (3.2.26)

where
              H = [ A − BΣ22⁻¹ Σ21          BΣ22⁻¹ B′
                    Σ11 − Σ12 Σ22⁻¹ Σ21     A′ − Σ12 Σ22⁻¹ B′ ].

The “eigenvalues of (3.2.22)” should be understood as the eigenvalues of H. Similarly,
the “stable invariant subspace” of (3.2.22) refers to the set of 2n-dimensional vectors z0
such that the solution of the linear ODE dz/dt = Hz converges to zero as t → ∞ when
z(0) = z0 .
    Statement (e), used in the design of optimal controllers, relates to “accuracy” of the
dissipation-based analysis: P0 represents the “tightest possible” solution of the dissipation
inequality. There are obvious parallels with the relation between Lyapunov equalities and
inequalities here. When w has the meaning of the control vector, the resulting matrix K
hints at using the full state feedback control w(t) = Kx(t).
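
For a given Σ (partitioned as in (3.2.19), with Σ22 invertible, which is consistent with
the non-singular case assumed here), statement (d) translates into a simple MATLAB
sketch:

% build the Hamiltonian matrix from (3.2.26) and test statement (d)
n=size(A,1);
S11=Sigma(1:n,1:n);    S12=Sigma(1:n,n+1:end);
S21=S12';              S22=Sigma(n+1:end,n+1:end);
H=[A-B/S22*S21,      B/S22*B'; ...
   S11-S12/S22*S21,  A'-S12/S22*B'];
ok=all(abs(real(eig(H)))>1e-9); % true when no eigenvalues lie on the imaginary axis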

3.2.2     KYP Lemma and L2 Gain
An instructive example of application of the KYP Lemma is the following statement about
L2 gains of stable LTI models.

Theorem 3.2.2 Assume that the coefficient A of the state space model
                 ẋ(t) = Ax(t) + Bw(t),   e(t) = Cx(t) + Dw(t)                      (3.2.27)
is a Hurwitz matrix. Then the following conditions are equivalent:
 (a) the L2 gain of system (3.2.27) is less than 1, i.e. there exists γ < 1 such that
                               inf_{T >0} ∫_0^T {γ|w(t)|^2 − |e(t)|^2 } dt > −∞     (3.2.28)

      for every solution of (3.2.27);
 (b) there exists a real matrix P = P ′ ≥ 0 such that
                           |w|2 − |Cx + Dw|2 − 2x′ P (Ax + Bw) ≫ 0;                 (3.2.29)

 (c) the H-Infinity norm ‖G‖∞ of the transfer matrix
                                      G(s) = D + C(sI − A)⁻¹ B                      (3.2.30)
      is less than 1.

    The equivalence between (a) and (c) is the familiar association between L2 gains and
H-Infinity norms for LTI systems. The equivalence of (a) and (c) to (b) provides a useful
link to semidefinite programming, and also serves as a key observation in the derivation
of H-Infinity optimization algorithms.
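
For a concrete model, the equivalence of (b) and (c) is easy to probe numerically; the
sketch below uses norm(sys,Inf) from the Control System Toolbox on an arbitrarily chosen
first order example:

A=-1; B=1; C=0.5; D=0;       % a simple stable example
gain=norm(ss(A,B,C,D),Inf)   % H-Infinity norm, here 0.5 < 1
% by Theorem 3.2.2, a matrix P=P'>=0 satisfying (3.2.29) must exist;
% for this scalar model P=p is a number, and p=1/3 can be checked by hand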

(a)⇒(c)
Assume, to the contrary, that (c) is not valid. This means that for every γ < 1 there exist
ω ∈ IR and w0 ∈ C^m such that |w0 | = 1 and |G(jω)w0 |^2 > γ. Equivalently, this means
that |Cx0 + Dw0 |^2 > γ, where

                                       x0 = (jωI − A)⁻¹ Bw0

satisfies the equation
                                         jωx0 = Ax0 + Bw0 .                         (3.2.31)
Then
                                   xc (t) = e^{jωt} x0 ,   wc (t) = e^{jωt} w0
are complex solutions of the differential equation in (3.2.27). Hence

                               x(t) = Re{xc (t)},     w(t) = Re{wc (t)}

are real solutions of the differential equation in (3.2.27). On the other hand, if T > 0 is
such that ωT = 2πk for some integer k ≥ 0 then

          ∫_0^T {γ|w(t)|^2 − |Cx(t) + Dw(t)|^2 } dt = (T /2) {γ|w0 |^2 − |Cx0 + Dw0 |^2 },

which converges to −∞ as T → +∞, certifying that (a) is not valid.
   Hence (a) implies (c).

(c)⇒(b)
Since A is a Hurwitz matrix, the pair (A, B) is stabilizable, hence the KYP Lemma
applies to A, B, σ, where
                           σ(x, w) = |w|^2 − |Cx + Dw|^2 .                          (3.2.32)
   By assumption, there exists δ1 > 0 such that

                                         σ(x0 , w0 ) ≥ δ1 |w0 |2                          (3.2.33)

whenever x0 , w0 satisfy (3.2.31) for some ω ∈ IR ∪ {∞}. Also, since A has no eigenvalues
on the imaginary axis, there exists δ2 > 0 such that

                                            |w0 |2 ≥ δ2 |x0 |2                            (3.2.34)

whenever x0 , w0 satisfy (3.2.31) for some ω ∈ IR ∪ {∞}. Combined together, (3.2.33) and
(3.2.34) certify that condition (c) of the KYP Lemma is satisfied. Therefore, condition (b)
of the KYP Lemma is true as well, which means existence of a matrix P = P ′ satisfying the
inequality in (3.2.29). It remains to show that the matrix P must be positive semidefinite.
Indeed, substituting w = 0 into (3.2.29) yields the inequality x′ P Ax ≤ 0, which means
that the quadratic form V (x) = x′ P x is monotonically non-increasing along the solutions
x = x(t) of the autonomous system

                                             ẋ(t) = Ax(t).                         (3.2.35)

Since A is a Hurwitz matrix, every solution of (3.2.35) converges to zero as t → ∞. Hence

                                 V (x(0)) ≥ lim_{t→+∞} V (x(t)) = 0

for every x(0) ∈ IR^n , i.e. P ≥ 0.

(b)⇒(a)
Note that (3.2.29) implies existence of γ < 1 such that

                             γ|w|2 − |Cx + Dw|2 − 2x′ P (Ax + Bw) ≥ 0.

Integrating the inequality along the solutions of (3.2.27) yields
          ∫_0^T {γ|w(t)|^2 − |e(t)|^2 } dt ≥ x(T )′ P x(T ) − x(0)′ P x(0) ≥ −x(0)′ P x(0),

which implies (3.2.28).

Examples
The CT transfer function G(s) = (s + 1)⁻¹ has H-Infinity norm of 1. For the state space
model
                          ẋ(t) = −x(t) + w(t),   e(t) = x(t),
a non-negative quadratic storage function V (x) = px^2 , p ≥ 0 certifying the L2 gain bound
via the inequality
                              γ|w|^2 − |x|^2 ≥ 2xp(−x + w)
exists if and only if γ ≥ 1 (one can take p = 1). Note that the KYP Lemma, as stated in
this section, does not guarantee the existence of V for γ = 1, only for γ > 1.

   The KYP Lemma, as stated in this section, means, in particular, that a symmetric matrix
P = P ′ (not necessarily a positive semidefinite one) satisfying (3.2.29) exists if and
only if the L-Infinity norm of G is less than 1. For example, the CT transfer function
G(s) = (s − 1)⁻¹ has L-Infinity norm of 1, though its H-Infinity norm equals +∞. For
the state space model
                             ẋ(t) = x(t) + w(t),   e(t) = x(t),
a quadratic storage function V (x) = px^2 satisfying the dissipation inequality

                                   γ|w|^2 − |x|^2 ≥ 2xp(x + w)

exists if and only if γ ≥ 1 (one can take p = −1).
    Finally, consider the state space model

                             ẋ(t) = −x(t),   e(t) = x(t) + w(t),

with transfer function G(s) ≡ 1. While the H-Infinity norm equals 1, a quadratic storage
function V (x) = px^2 certifying the L2 gain bound via the inequality

                                  γ|w|^2 − |x + w|^2 ≥ 2xp(−x)

exists for γ > 1, but does not exist for the “boundary value” γ = 1 (the solutions p = pγ
which are available for γ > 1 converge to +∞ as γ → 1).
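
For the last example, the dependence of pγ on γ is easy to work out explicitly (a short
calculation supporting the claim above): the dissipation inequality is equivalent to

                      (γ − 1)|w|^2 − 2xw + (2p − 1)|x|^2 ≥ 0,

which holds for all x, w if and only if γ > 1, 2p − 1 ≥ 0, and (γ − 1)(2p − 1) ≥ 1. The
smallest valid choice is therefore pγ = γ/(2(γ − 1)), which indeed converges to +∞ as
γ → 1.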

3.2.3    KYP Lemma and Quadratic Optimal Control
A rather abstract optimal control problem, with significant applications in linear feedback
optimization, can be formulated as the task of finding, given matrices A, B and the initial
value x0 , a control signal u = u(t) which drives the state x = x(t) of the model

                               ẋ(t) = Ax(t) + Bu(t),        x(0) = x0 ,            (3.2.36)

to zero as t → +∞, while minimizing the quadratic cost

                          J = ∫_0^∞ {|Cx(t)|^2 + |u(t)|^2 } dt → min_{u(·)} .       (3.2.37)

   Consider, to begin with, a simple example of problem (3.2.36),(3.2.37), where x = x(t)
and u = u(t) are scalar signals, A = 0, and B = C = 1:
         ∫_0^∞ {x(t)^2 + u(t)^2 } dt → min   subject to ẋ(t) = u(t),  x(0) = 1.    (3.2.38)

The optimal control in (3.2.38) can be found using quadratic storage functions and a
completion of squares argument. Leaving some preliminary thought aside (for now), let
V (x) = −x2 . Then, subject to the differential equation in (3.2.38),

                      x^2 + u^2 − dV (x)/dt = x^2 + u^2 + 2xu = (x + u)^2 ,

which, after integration from 0 to +∞, yields

                               J = 1 + ∫_0^∞ |x(t) + u(t)|^2 dt,                    (3.2.39)

as long as x(t) → 0 as t → ∞. While (3.2.39) clearly certifies that J ≥ 1 for every control
action, it also suggests that the “feedback rule” u(t) = −x(t) can be used to determine a
control action for which J = 1, which would be optimal: the only thing to check is that
the relation u(t) = −x(t) is compatible with x(t) → 0 as t → ∞. This yields u(t) = −e−t ,
x(t) = e−t as the optimal solution in (3.2.38). Note that taking V (x) = x2 instead of
V (x) = −x2 would yield
                              J = −1 + ∫_0^∞ |x(t) − u(t)|^2 dt

which, while providing a (rather weak!) lower bound for J, cannot be used for finding
the optimal u(·), as the relation u(t) = x(t), together with the differential equation in
(3.2.38), results in a solution x(t) = et which does not converge to zero.
    For the general case of problem (3.2.36),(3.2.37), the approach described above calls
for finding a symmetric matrix P = P ′ such that

                      |Cx|2 + |u|2 − 2x′ P (Ax + Bu) = |u − Kx|2 ,                (3.2.40)

and A + BK is a Hurwitz matrix. Then the identity
                          J = −x0′ P x0 + ∫_0^∞ |u(t) − Kx(t)|^2 dt,

which is valid whenever x(t) → 0 as t → ∞, certifies that −x′0 P x0 is a lower bound of J,
achieved by the control satisfying the relation u(t) = Kx(t). While the approach would
work whenever a matrix P = P ′ with the required properties exists, the key theoretical
question is whether such P can always be found.
   It turns out that the KYP Lemma can answer the question completely, in the form of
the following statement.

Theorem 3.2.3 Let A, B, C be real matrices of dimensions n-by-n, n-by-m, and k-by-n
respectively. The following conditions are equivalent:
 (a) there exist matrices P = P ′ ≤ 0 and K such that identity (3.2.40) holds, and
     A + BK is a Hurwitz matrix;
 (b) the pair (A, B) is stabilizable, and Cv ≠ 0 whenever Av = jωv for some ω ∈ IR and
      v ≠ 0.
Moreover, if condition (a) is satisfied then the matrices P = P ′ and K are uniquely defined
by condition (a).

Proof. Assume that condition (a) is true. Then, since A + BK is a Hurwitz matrix, the
pair (A, B) is stabilizable. In addition, since the quadratic form identity (3.2.40) has all real
coefficients, it can be extended to a Hermitian form identity
                       |Cx|^2 + |u|^2 − 2Re{x′ P (Ax + Bu)} = |u − Kx|^2 ,               (3.2.41)
valid for all x ∈ C^n , u ∈ C^m . When Av = jωv with v ≠ 0 and Cv = 0, substituting u = 0,
x = v into (3.2.41) yields |Kv|^2 = |Cv|^2 = 0, and hence (A + BK)v = Av = jωv, which
contradicts the assumption that A + BK is a Hurwitz matrix.
    Now assume that (b) is true. A careful examination of (a) shows that it is equivalent to
the item (e) from the KYP Lemma, which will be true if the corresponding condition (c) is true.
Since, according to (b), the only solution of equations
                              jωx = Ax + Bw, Cx = 0, w = 0
is x = 0, w = 0, condition (c) of the KYP Lemma is satisfied.
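
In MATLAB, the matrices of Theorem 3.2.3 can be produced by the standard LQR solver
(a sketch, assuming the lqr function of the Control System Toolbox: it returns Klqr and
S = S′ ≥ 0 such that u = −Klqr x minimizes (3.2.37), which corresponds to K = −Klqr and
P = −S in the notation used here):

A=[0 1;0 0]; B=[0;1]; C=[1 0];           % an example satisfying condition (b)
[Klqr,S]=lqr(A,B,C'*C,eye(size(B,2)));   % minimizes integral of |Cx|^2+|u|^2
K=-Klqr; P=-S;                           % matrices of Theorem 3.2.3
max(real(eig(A+B*K)))                    % negative: A+BK is a Hurwitz matrix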


3.3      Semidefinite Programming
This section introduces a new tool, semidefinite programming, also called linear matrix
inequality (LMI) optimization, which, among many possible applications, can be used to
optimize weights (multipliers, or D-scales) associated with standard H-Infinity feedback
design setup.

3.3.1     A Generic LMI Setup
A set of real symmetric k-by-k matrices H0 , H1 , . . . , Hm , and a 1-by-m row vector C ∈
IR^{1×m} define an instance of a semidefinite program as the task
                             Cx → min      subject to H(x) ≥ 0                          (3.3.42)

of finding a real vector x ∈ IR^m which minimizes Cx subject to the constraint H(x) ≥ 0,
where
                         H(x) def= H0 + Σ_{k=1}^m Hk xk ,      x = [x1 ; . . . ; xm ],    (3.3.43)

and the inequality α ≥ 0 for a k-by-k Hermitian matrix α = α′ means that v ′ αv ≥ 0
for every vector v ∈ C k . Since the matrix-valued function x → H(x) in (3.3.43) is linear
with respect to x (a more appropriate term would be affine), the inequality H(x) ≥ 0 is
sometimes called linear matrix inequality (LMI), and the optimization task can also be
referred to as LMI optimization.
    Semidefinite programming has a special place in optimization: it is more general
(and hence more flexible) than linear programming (which is a special case of LMI op-
timization), but still can be solved relatively efficiently (under mild assumptions, by a
polynomial time algorithm). Due to the persistent presence of matrix positivity constraints
in system analysis (Lyapunov inequalities, quadratic storage functions, and other ap-
plications), semidefinite programming is well-suited for robustness analysis and feedback
optimization.
    Several MATLAB toolboxes offer LMI optimization tools. The native Robust Control
Toolbox offers an LMI optimization engine. The SeDuMi package, also well tested, can be
downloaded for free on the Web. The languages used to define a system of LMI in these
packages are not very user-friendly, so I advise using an interface to them, which, as part
of the package IQCbeta, is freely available from the class Web site, and is already installed
on Athena. I believe (I could be biased, since I supervised its creation) that IQCbeta helps
to cut the coding effort significantly when solving semidefinite programs. To use IQCbeta
on Athena, put the content of

http://web.mit.edu/6.245/www/images/startup6245.m

into your startup.m file (it should be in your ~/matlab/ directory). To install IQCbeta on
a personal computer, unpack iqcb030.tar.gz and set the MATLAB search path to include
iqc_beta and its subdirectories.

Example 3.3.1 According to a classical theorem, an n-by-n real matrix A is a Hurwitz matrix
if P A + A′ P < 0 for some real symmetric matrix P = P ′ ≥ 0. If P is normalized by the extra
condition P ≤ I, the degree of stability of A can be characterized as the maximum of r such
that P A + A′ P + rI ≤ 0. The task of maximizing r subject to these constraints is a special case
of semidefinite programming, in which the components of the decision vector x are the entries of P

and the scalar r (i.e. n(n + 1)/2 scalar variables to define P and one more to define r); matrix
H(x) is defined in such a way that

                         H(x) = [ −P A − A′ P − rI      0        0
                                          0            I − P     0
                                          0              0       P ],

which ensures that the single inequality H(x) ≥ 0 is equivalent to the three inequalities

                              P A + A′ P + rI ≤ 0,   P ≤ I,   P ≥ 0;

and C is defined in such a way that Cx = −r.
    It is quite clear that setting the LMI optimization in terms of matrices H0 , H1 , . . . , Hm ,
where m = n(n + 1)/2 + 1, would be time consuming. Also, the non-zero entries of Hi will be
the same coefficients of A repeated over and over again. To simplify the setup, the possibility
of using matrix decision parameters (for example, to work with P as a single variable), and
defining multiple matrix inequalities (to treat P A + A′ P + rI ≤ 0, P ≤ I, and P ≥ 0 separately)
is usually provided within the semidefinite programming packages.
    In IQCbeta, the semidefinite program can be defined and solved by the following code.

function [p,r]=lmiexlyap(A)
n=size(A,1);         % problem dimension
abst_init_lmi;       % initialize LMI environment of IQCbeta
p=symmetric(n);      % p is n-by-n symmetric matrix decision variable
r=symmetric;         % r is a scalar decision variable
p>0;                 % define the matrix inequalities
p<eye(n);
p*A+A’*p+r*eye(n)<0;
lmitbx_options([0 0 0 0 1]); % turn off diagnostics/trace
lmi_mincx_tbx(-r); % call the SDP optimization engine
p=value(p);          % get numerical value of optimal p
r=value(r);          % get numerical value of optimal r

    While every computational procedure utilizing fixed precision arithmetic must be used
with caution, semidefinite programming appears to be especially vulnerable to numerical
errors. Here is a short list of reasons to be on guard while using LMI optimization.

 (a) Currently used algorithms of LMI optimization begin by assuming that the optimal
     solution x0 of (3.3.42), if it exists, satisfies a known length bound |x0 | ≤ R. The
     number of operations required by the algorithms grows polynomially with log(R).

 (b) The minimal length |x| of a vector satisfying H(x) ≥ 0 can have a double exponential
     growth with respect to k and m. For example1 , the system of LMI’s

                     x1 ≥ 4,   [ x_{i+1}  x_i ;  x_i  1 ] ≥ 0   (i = 1, 2, ..., m − 1)

     with respect to scalar decision variables x1 , . . . , xm is equivalent to

                     x1 ≥ 4,   x_{i+1} ≥ x_i^2   (i = 1, 2, ..., m − 1),

     and hence its minimal solution is x_i = 2^{2^i} . Therefore one cannot expect that a
     solution of a semidefinite program will be small when the coefficients of the matrices
     Hi are small.

 (c) Currently used algorithms of LMI optimization rely on the assumption that the set
     of all x satisfying condition H(x) ≥ 0 contains a ball of radius ǫ > 0. In fact,
     the optimization actually looks at those decision vectors x which satisfy the strict
     inequality H(x) > 0. The number of operations required by the algorithms grows
     polynomially with log(ǫ).

 (d) It is possible for an optimal solution of (3.3.42) not to exist even when the minimized
     quantity Cx has a finite lower bound subject to the feasibility constraint H(x) ≥ 0.
     For example, when
                            H(x) = [ x1  1 ;  1  x2 ],   Cx = x1 ,
     Cx has a greatest lower bound 0 which is not achievable, as the sketch following this
     list illustrates.
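
A small numerical sketch of the last phenomenon: with x1 = 1/t and x2 = t the constraint
H(x) ≥ 0 holds for every t > 0, so Cx = x1 comes arbitrarily close to 0 without ever
reaching it.

for t=[1 10 100 1000]
  Hx=[1/t 1;1 t];       % H(x) for x1=1/t, x2=t
  fprintf('x1=%8.5f, min eig=%8.1e\n',1/t,min(eig(Hx)));
end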

3.3.2     D-Scales and Robust L2 Gain Bounds
Recall that a version of the KYP Lemma claims that system SCT [A, B, C, D] is stable and
has H-Infinity norm strictly less than γ > 0 if and only if there exists a positive definite
quadratic form V (x) = x′ P x such that the quadratic form

                      γ^2 |w|^2 − |Cx + Dw|^2 − 2x′ P (Ax + Bw)                     (3.3.44)

is positive definite as well, i.e. if and only if the system of strict matrix inequalities

        P > 0,    γ^2 E0I′ E0I − ECD′ ECD − EI0′ P EAB − EAB′ P EI0 > 0             (3.3.45)
1 I have learned it from Prof. Pablo Parrilo.

has a solution P = P ′ , where

        EI0 = [ I  0 ],   E0I = [ 0  I ],   ECD = [ C  D ],   EAB = [ A  B ],

so that the entries in (3.3.45) are the matrices of the quadratic forms

        [x; w]′ {E0I′ E0I } [x; w] = |w|^2 ,      [x; w]′ {ECD′ ECD } [x; w] = |Cx + Dw|^2 ,

        [x; w]′ {EI0′ P EAB + EAB′ P EI0 } [x; w] = 2x′ P (Ax + Bw).
   Consider now the situation when a system of linear state space equations

                                   ẋ(t) = Ax(t) + Bw(t)                            (3.3.46)

with input w has 2N + 2 outputs wi , ei (i ∈ {0, 1, . . . , N }) defined by

                          wi = Li [x; w],   ei = Fi [x; w],

and, for all i ≠ 0, the signals wi and ei satisfy the unit L2 gain bound condition

                          inf_{T >0} ∫_0^T {|ei (t)|^2 − |wi (t)|^2 } dt > −∞.      (3.3.47)

One possible interpretation of this setup, in which wi are different components of w, each
wi with i ≠ 0 being the output of a system ∆i (possibly nonlinear or time-varying) with
input ei , where the L2 gain of ∆i is smaller than 1, is shown on Figure 3.3.1. Consider the
task of establishing, under the assumptions made, an upper bound for the closed loop L2
gain from w0 to e0 .
     By the definition of L2 gain, if r ≥ 0 is such that

                          inf_{T >0} ∫_0^T {r|w0 (t)|^2 − |e0 (t)|^2 } dt > −∞      (3.3.48)

for all signals e, w satisfying the equations of system G and conditions (3.3.47) for i ≠ 0,
the square root γ = √r is an upper bound for the closed loop L2 gain from w0 to e0 .
It is easy to see that (3.3.47) (subject to (3.3.46)) implies (3.3.48) (subject to (3.3.46))
whenever there exist non-negative constants di ≥ 0 such that

     inf_{T >0} ∫_0^T { r|w0 (t)|^2 − |e0 (t)|^2 + Σ_{i=1}^N di [|wi (t)|^2 − |ei (t)|^2 ] } dt > −∞,   (3.3.49)
[Figure 3.3.1: Robust L2 gain Bound. The uncertainty block ∆ maps e1 , . . . , eN to
w1 , . . . , wN , which are fed back into the LTI system G; w0 is the external input and e0
is the external output.]

also subject to (3.3.46). Since P = P ′ > 0 and (3.3.46) imply

     ∫_0^T 2x(t)′ P (Ax(t) + Bw(t)) dt = x(T )′ P x(T ) − x(0)′ P x(0) ≥ −x(0)′ P x(0),

the existence of P = P ′ > 0 such that the quadratic form

           r|L0 z|^2 − |F0 z|^2 + Σ_{i=1}^N di {|Li z|^2 − |Fi z|^2 } − 2x′ P (Ax + Bw)

(with respect to z = [x; w]) is positive definite implies (3.3.49). Equivalently, if r, di , and
P = P ′ satisfy the inequalities di ≥ 0, P ≥ 0, and

     rL0′ L0 − F0′ F0 + Σ_{i=1}^N di {Li′ Li − Fi′ Fi } − EI0′ P EAB − EAB′ P EI0 > 0,   (3.3.50)

then γ = √r is an upper bound for the closed loop L2 gain from w0 to e0 . Thus, solving the
semidefinite program with decision parameters r, di and P = P ′ which seeks to minimize
r subject to LMI’s P > 0, di > 0, and (3.3.50), yields an upper bound for the closed loop
L2 gain from w0 to e0 . The coefficients di are frequently called multipliers or D-scales, and
the whole approach to bounding L2 gain is sometimes referred to as D-scaling analysis of
robustness.

Example 3.3.2 Consider the task of finding an upper bound of closed loop L2 gain from f
to q in the unity feedback system shown on Figure 3.3.2, where G = G(s) is a given SISO LTI

[Figure 3.3.2: Uncertain Gain Feedback for Example 3.3.2. The input f enters a summing
junction (with negative feedback from q), passes through the uncertain gain M and then
through G(s), producing the output q.]

system defined by a state space model SCT [A, B, C, D], and M is an uncertain constant gain
ranging in the given interval [a, b].
    To apply semidefinite programming, define

                                   2M   b+a
                w0 = f, w1 =          −                  (f − q), e0 = q, e1 = f − q.
                                   b−a b−a

Then the nominal part of system equation can be represented by the SIMULINK diagram on
Figure 3.3.2. The normalized uncertainty is relating signals w1 and e1 according to w1 = δe1 ,
where δ ∈ [−1, 1] is an uncertain constant, and hence L2 gain “from e1 to w1 ” is not larger than
1.

[Figure 3.3.3: Nominal Model for Example 3.3.2. A SIMULINK diagram with inputs w0
and w1 , outputs e0 and e1 , an uncertainty scale block (b − a)/2, a nominal gain block
(b + a)/2, and the plant G.]

   MATLAB code using the SIMULINK diagram is shown below.

function g=exlmid1(G,a,b)
assignin(’base’,’G’,G);                     % export coefficients
assignin(’base’,’a’,a);
assignin(’base’,’b’,b);
[A,B,C,D]=linmod(’exlmid1mod’);             % extract plant model
n=size(A,1);                                % define LMI coefficient matrices

EI0=[eye(n) zeros(n,2)];EAB=[A B];
F0=[C(1,:) D(1,:)];F1=[C(2,:) D(2,:)];
L0=[zeros(1,n) 1 0];L1=[zeros(1,n) 0 1];
abst_init_lmi;                   % initialize LMI environment of IQCbeta
p=symmetric(n);                  % define matrix decision variables
r=symmetric; d=symmetric;
p>0; d>0;                        % define the matrix inequalities
L0’*r*L0-F0’*F0+L1’*d*L1-F1’*d*F1-EI0’*p*EAB-EAB’*p*EI0>0;
lmitbx_options([0 0 0 0 1]);     % turn off diagnostics/trace
R=lmi_mincx_tbx(r);              % call the SDP optimization engine
g=sqrt(R);

For simple (low order) systems, the code usually produces accurate upper bounds. For more
complicated examples, the bound can be overly conservative. Analysis via Integral Quadratic
Constraints (IQC), to be studied later in the course, allows one to improve the quality of
robustness analysis dramatically within the framework of LMI optimization.

3.4     Model Reduction
This section introduces basic elements of the robust control approach to model reduction
for LTI systems: numerical measures of controllability and observability, Hankel singular
numbers, balanced truncation, and Hankel optimal model reduction.

3.4.1    Hankel Singular Numbers
An attractive approach to reducing the number of states in a state space model is to begin
by figuring out the order of relative importance of the system’s states, and then remove
all states of lesser importance. The approach leads to the introduction of numerical
measures of controllability and observability, which in turn lead to Hankel singular values
as measures of a state’s importance.

Motivation: removal of (almost) uncontrollable/unobservable modes
Removing an unobservable or an uncontrollable mode is an easy way of reducing the
dimension of the state vector in a state space model. For example, the system

                                  ẋ1 = −x1 + x2 + f,
                                  ẋ2 = −2x2 ,
                                   y = x1 + x2 ,

can be replaced by
                                    ẋ = −x + f,   y = x
without changing its transfer function. In this state space model, with

               A = [ −1  1 ; 0  −2 ],   B = [ 1 ; 0 ],   C = [ 1  1 ],   D = 0,

the controllability matrix

                             Mc = [ B  AB ] = [ 1  −1 ; 0  0 ]

satisfies pMc = 0 for p = [ 0  1 ], and hence the variable px = x2 represents an uncontrollable
mode. The removal of such a mode can be viewed as a canonical projection model reduction

                      A → Â = U AV,   B → B̂ = U B,   C → Ĉ = CV,

where the columns of V form a basis in the column range of Mc (which is the same as the
null space of p), and U can be selected quite arbitrarily subject to the usual constraint
UV = I.
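
For this example, the projection takes a few lines of MATLAB (a sketch; the particular
choice U = [1 0] is just one of many matrices satisfying U V = I):

A=[-1 1;0 -2]; B=[1;0]; C=[1 1];
V=[1;0];             % basis of the column range of Mc
U=[1 0];             % any U with U*V=I will do
Ahat=U*A*V           % = -1
Bhat=U*B             % =  1
Chat=C*V             % =  1, recovering dx/dt=-x+f, y=x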
    Strictly speaking, the example above cannot even be considered as “model reduction”,
as the orders of the original and the projected systems are both equal to 1. A more
interesting situation is represented by the perturbed model

                                 ẋ1 = −x1 + x2 + f,
                                 ẋ2 = −2x2 + ǫf,
                                  y = x1 + x2 ,

(same A, C, D but a modified B), where ǫ > 0 is a parameter. Intuitively, one can expect
that, when ǫ > 0 is small enough,

                                  ẋ = −x + f,   y = x

is still a good reduced model. This expectation can be related to the fact that x2 is
difficult to control by f when ǫ > 0 is small. One can say that the mode x2 = px, which
corresponds to a left (row) eigenvector of the A matrix (pA = −2p in this case), is
almost uncontrollable, which can be seen directly from the transfer function

              G(s) = (s + 2 + ǫ)/((s + 2)(s + 1)) = (1 + ǫ)/(s + 1) − ǫ/(s + 2),

which has a small coefficient at (s + 2)⁻¹ in its partial fraction expansion.
    One can attempt to introduce a measure of importance of an LTI system’s mode
(understood as a pole a of the system’s transfer function G(s)) as something related to the
absolute value of the coefficient with which (s − a)⁻¹ enters the partial fraction expansion
of G. However, it is rarely a good idea to base a model reduction algorithm solely on
removal of “unimportant” system modes. For example, both modes a = −1 + ǫ and
a = −1 − ǫ of the LTI system with transfer function

                      H(s) = 1/(s + 1 − ǫ) + 1/(s + 1 + ǫ),

where ǫ > 0 is small, are equally important, and neither can be removed without introducing
a huge model reduction error, despite the fact that a very good reduced model Ĥ is given
by
                                  Ĥ(s) = 2/(s + 1).

   The objective of this subsection is to introduce a special joint measure of controllability
and observability for every vector in the state space of an LTI system. Then, the reduced
model is obtained by removing those components of the state vector which have the lowest
importance factor in terms of this measure.

Observability measures
Consider a state space model
                         ẋ(t) = Ax(t) + Bf (t),   y(t) = Cx(t) + Df (t),           (3.4.51)
where A is an n-by-n Hurwitz matrix (all eigenvalues have negative real part). When
f (t) ≡ 0 for t ≥ 0, the value of the output y(t) at a given moment t is uniquely defined
by x(0), and converges to zero exponentially as t → +∞. Hence the integral

                                 Eo = ∫_0^∞ |y(t)|^2 dt,

measuring the “observable output energy” accumulated in the initial state, is a function
of x(0), i.e. Eo = Eo (x(0)). Moreover, since y(t) is a linear function of x(0), Eo will be a
quadratic form with respect to x(0), i.e.
                                      Eo (x(0)) = x(0)′ Wo x(0)
for some symmetric matrix Wo .
    Note that Eo = Eo (x) is a storage function for the autonomous system dx/dt = Ax
with quadratic supply rate σo (x) = −|Cx|2 . Since A is a Hurwitz matrix, Eo is the
smallest storage function for this supply rate.
    The quadratic form Eo = Eo (x(0)) can be used as a measure of observability defined
on the state space of system (3.4.51). In particular, Eo (x(0)) = 0 whenever
                             Mo x(0) = [C; CA; . . . ; CA^{n−1} ]x(0) = 0.
The positive semidefinite matrix Wo of the quadratic form Eo is called the observability
Gramian2 . Since, by definition, Eo (x(0)) ≥ 0 for all x(0), the matrix Wo is always
positive semidefinite. Moreover, Wo > 0 whenever the pair (C, A) is observable.
   It is important to notice that the word “system state” actually includes two different
meanings. One meaning is that of a primal state, which, for a general model (3.4.51), is
a column n-vector. For example, for a state space model
                          ẋ1 = −x1 + f,   ẋ2 = −x2 + f,   y = x1 + x2 ,           (3.4.52)

2 Whether this should be spelled as “Grammian” or “Gramian” is unclear.

a primal state is a particular column vector value of x(t) ∈ R2 , such as x(−1.2) = [1; −7].
Such column vector values are more precisely referred to as the primal states of (3.4.52).
   On the other hand, one frequently refers to (3.4.52) as a “system with two states”
(despite the fact that the set R2 has an infinite number of elements), and, in this context,
x1 = x1 (t) and x2 = x2 (t) can be referred to as the two states of the system. Let us call
such states the dual states. For a general model (3.4.51), a dual state is a particular linear
combination xp = px(t) of the scalar components of x = x(t), defined by a row n-vector
p. For example, in (3.4.52), the dual states x1 = x1 (t) and x2 = x2 (t) are defined by row
vectors p = [1 0] and p = [0 1] respectively.
   Therefore, it is natural to ask for a definition of an observability measure of a given
dual state of (3.4.51). It is defined as

                             E^o (p) =  inf_{x0 : px0 =1}  Eo (x0 )    for p ≠ 0,

i.e. as the minimal output energy which can be observed for t ≥ 0 when px(0) = 1. Note
that the infimum over an empty set equals plus infinity, hence E^o (0) = ∞. When the pair
(C, A) is observable, and hence Wo > 0 is invertible, the dual observability measure is
given by
                              E^o (p) = 1/(pWo⁻¹ p′ )    for p ≠ 0.
In this case the inverse
                                  1/E^o (p) = pWo⁻¹ p′
is a quadratic storage function for system dp/dt = pA + qC with quadratic supply rate
σ^o (p, q) = −|q|^2 . According to the KYP Lemma, E^o (p)⁻¹ is the smallest storage function
for this supply rate.
     The following theorem is frequently utilized for computing Wo numerically.

Theorem 3.4.1 Wo is the unique solution of the Lyapunov equation

                                  Wo A + A′ Wo + C ′ C = 0.                          (3.4.53)

Proof. Since system (3.4.51) is time invariant, the identity

                                x(0)′ Wo x(0) = ∫_0^∞ |Cx(τ )|^2 dτ

implies
                                x(t)′ Wo x(t) = ∫_t^∞ |Cx(τ )|^2 dτ.

Differentiating the second identity with respect to t at t = 0 yields

                                  2x(0)′ Wo Ax(0) = −|Cx(0)|2

for all x(0) ∈ Rn . Comparing the coefficients on both sides of the quadratic identity yields
(3.4.53).
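
For moderate state dimensions, (3.4.53) is solved in MATLAB by the lyap function (a
sketch; lyap(A,Q) solves AX + XA′ + Q = 0, hence A′ is passed):

A=[-1 1;0 -2]; C=[1 1];   % the example used earlier in this section
Wo=lyap(A',C'*C);         % solves Wo*A+A'*Wo+C'*C=0
Eo=@(x0) x0'*Wo*x0;       % observable output energy of an initial state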

    Finding a numerical solution of (3.4.53) is not easy when n is about 10^4 or larger. In
such situations, Theorem 3.4.1 can be used as a basis for finding an approximation of Wo .
    It is important to understand that the observability measure alone should not be used
as the test for choosing which states to eliminate in a model reduction procedure.
Instead, a combination of the observability and controllability measures, to be introduced
in the next subsection, should be used.

Controllability measures
Since A is a Hurwitz matrix, every input signal f = f (t) of finite energy, i.e. such that

                                ‖f ‖^2 = ∫_{−∞}^∞ |f (t)|^2 dt < ∞,

corresponds to a unique initial condition x(0) in (3.4.51) for which the corresponding
solution x = x(t) satisfies x(t) → 0 as t → −∞. This solution is given by

                                x(t) = ∫_0^∞ e^{Aτ} Bf (t − τ ) dτ,

where e^M denotes the matrix exponential of a square matrix M . One can say that the input
f = f (t) drives the system state from x(−∞) = 0 to x(0) = X(f (·)).
    Let p be a 1-by-n row vector, so that the product px(t) is a dual state of (3.4.51) –
a linear combination of components of the state space vector. The (dual) controllability
measure E^c = E^c (p) is defined as the maximal value of |px(0)|^2 which can be achieved by
using an input f = f (t) of unit energy:

                           E^c (p) = max{|pX(f (·))|^2 : ‖f ‖ ≤ 1}.

   Accordingly, the primal controllability measure Ec = Ec (x0 ) is defined for a column
vector x0 ∈ IR^n as
                              Ec (x0 ) = inf_{p: px0 =1} E^c (p).

   The following statement describes some basic properties of these controllability mea-
sures.

Theorem 3.4.2 Assume that A is an n-by-n Hurwitz matrix. Then:
 (a) E^c (p) = pWc p′ is a quadratic form with the coefficient matrix

                                Wc = ∫_0^∞ e^{At} BB′ e^{A′t} dt.

 (b) Wc is the unique solution of the Lyapunov equation
                                         AWc + Wc A′ = −BB ′ .                          (3.4.54)

 (c) A given state x0 ∈ IR^n is reachable from zero if and only if Ec (x0 ) > 0 or, equiva-
     lently, the equation Wc p′ = x0 has a solution p′ . In this case Ec (x0 )⁻¹ = px0 is the
     minimum of ‖f (·)‖^2 subject to X(f (·)) = x0 .

Proof. To prove (a), note that

                                max_{‖f ‖≤1} ∫_{−∞}^∞ g(t)′ f (t) dt = ‖g‖,

hence
                                E^c (p) = ∫_0^∞ |pe^{At} B|^2 dt = pWc p′ .
     Statement (b) is actually a re-wording of Theorem 3.4.1, with C replaced by B ′ , A replaced
by A′ , Wo replaced by Wc , and x(0) replaced by p.
     To prove (c), consider first the case when the equation Wc p′ = x0 has no solution p. Then
there exists a row vector p0 such that p0 Wc = 0 but p0 x0 ≠ 0. Since E c (p0 ) = p0 Wc p′0 = 0,
the quantity |p0 X(f (·))|² equals zero for every finite energy signal f = f (t), while |p0 x0 |² > 0;
hence the state x0 is not reachable from zero.
     Now assume that x0 = Wc p′ for some p. Then ‖f‖² ≥ pWc p′ = px0 whenever x0 = X(f (·)).
On the other hand, for
                     f (t) = B′ e−A′t p′ for t ≤ 0,    f (t) = 0 for t > 0,

we have ‖f‖² = px0 and x0 = X(f (·)).

  When the pair (A, B) is controllable, and, hence, Wc > 0, the primal controllability
measure Ec = Ec (x0 ) can be expressed as
                      Ec (x0 ) = 1/(x′0 Wc−1 x0 )   for x0 ≠ 0.
In this case E c (p) is the minimal storage function with supply rate σ c (p) = −|pB|² for the
system dp/dt = pA, and Ec (x)−1 is the maximal storage function with supply rate
σc (x, f ) = |f |² for the system dx/dt = Ax + Bf (it equals the minimal input energy needed
to reach x from zero).
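    To make these quantities concrete, here is a minimal numpy/scipy sketch (the test system
is a hypothetical placeholder) that solves the Lyapunov equation (3.4.54) for Wc and evaluates
the two controllability measures:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    # A hypothetical stable test system (A Hurwitz, one input).
    A = np.array([[-1.0, 2.0], [0.0, -3.0]])
    B = np.array([[1.0], [1.0]])

    # Controllability Gramian: A Wc + Wc A' = -B B'  (equation (3.4.54)).
    Wc = solve_continuous_lyapunov(A, -B @ B.T)

    # Dual measure E^c(p) = p Wc p' for a row vector p.
    p = np.array([[1.0, 0.0]])
    Ec_dual = (p @ Wc @ p.T).item()

    # Primal measure Ec(x0) = 1/(x0' Wc^{-1} x0), assuming (A, B) is
    # controllable so that Wc is invertible.
    x0 = np.array([[1.0], [1.0]])
    Ec_primal = 1.0 / (x0.T @ np.linalg.solve(Wc, x0)).item()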

Joint controllability and observability measures
The joint controllability and observability measures are defined as products of the corre-
sponding controllability and observability measures:

                     Eoc (x0 ) = Eo (x0 )Ec (x0 ), E oc (p) = E o (p)E c (p).

For controllable and observable systems Wc and Wo are positive definite, and the formulae
can be simplified to
         Eoc (x0 ) = (x′0 Wo x0 )/(x′0 Wc−1 x0 )   (x0 ≠ 0),     E oc (p) = (pWc p′ )/(pWo−1 p′ )   (p ≠ 0).
   Trying to find the extremal values of Eoc (x) or E oc (p) leads to the extremum equations

                     Wo x − hWc−1 x = 0,    where h = Eoc (x),
                     Wc p′ − hWo−1 p′ = 0,  where h = E oc (p),

which shows that the extremal vectors and the extremal values can be expressed in terms of
the eigenvalues and eigenvectors of the matrix Wc Wo .
    Since Wc and Wo are positive semidefinite, all eigenvalues of Wc Wo are real and non-
negative. The square roots of those eigenvalues are called Hankel singular numbers of
system (3.4.51), and play a central role in understanding its model reduction.
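    Numerically, the Hankel singular numbers are best computed without forming the product
Wc Wo explicitly; a short sketch (matrices are again hypothetical placeholders) follows:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[-1.0, 2.0], [0.0, -3.0]])   # hypothetical Hurwitz A
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 1.0]])

    Wc = solve_continuous_lyapunov(A, -B @ B.T)
    Wo = solve_continuous_lyapunov(A.T, -C.T @ C)

    # Direct, but numerically fragile: eigenvalues of Wc Wo.
    hsv_direct = np.sort(np.sqrt(np.linalg.eigvals(Wc @ Wo).real))[::-1]

    # Preferred: with Wc = Lc Lc' and Wo = Lo' Lo, the eigenvalues of Wc Wo
    # equal the squared singular values of Lo Lc.
    Lc = np.linalg.cholesky(Wc)        # lower triangular
    Lo = np.linalg.cholesky(Wo).T      # upper triangular
    hsv = np.linalg.svd(Lo @ Lc, compute_uv=False)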

3.4.2     Balanced Truncation
Balanced truncation is a model reduction algorithm based on straightforward elimination
of those states of system (3.4.51) which have lesser joint controllability and observability
measure.

Dominant Controllability and Observability Subspaces
For model reduction purposes, we are interested in finding a subspace of primal state
vectors for which the minimum of the joint controllability and observability measure
over all non-zero elements is maximal. A basis in this subspace will yield columns of
a projection matrix V . Similarly, we are interested in finding a subspace of dual state
vectors for which the minimum of the joint controllability and observability measure over
all non-zero elements is maximal. A basis in this subspace will yield rows of a projection
matrix V .
    The following theorem can be used in finding such V and U.

Theorem 3.4.3 Let Wc = Lc L′c and Wo = L′o Lo be the Cholesky decompositions of the
controllability and observability Gramians. Let

                             ρ1 ≥ · · · ≥ ρr > ρr+1 ≥ · · · ≥ ρn ≥ 0

be the ordered eigenvalues of L′c Wo Lc (possibly repeated). Let ψ1 , . . . , ψr be the corre-
sponding first r normalized eigenvectors of L′c Wo Lc , i.e.

                  L′c Wo Lc ψi = ρi ψi ,   |ψi |² = 1,   ψi′ ψk = 0 for i ≠ k.

Let σi = ρi^{1/2} .
 (a) ρ1 ≥ · · · ≥ ρr are also the eigenvalues of Lo Wc L′o , and the corresponding normalized
     row eigenvectors can be defined by
                          φi = σi−1 ψi′ L′c L′o    (i = 1, . . . , r).

 (b) The set of all linear combinations of vectors Lc ψi is the only r-dimensional linear
     subspace V in Rn such that Eoc (v) ≥ ρr for every v ∈ V.

 (c) The set of all linear combinations of row vectors φi Lo is the only r-dimensional
     linear subspace U of row n-vectors such that E oc (u) ≥ ρr for every u ∈ U.

 (d) UV = Ir , where

         V = Lc [ ψ1 σ1^{−1/2}  · · ·  ψr σr^{−1/2} ],    U = [ σ1^{−1/2} φ1 ; · · · ; σr^{−1/2} φr ] Lo ,

     i.e. the i-th column of V is σi^{−1/2} Lc ψi and the i-th row of U is σi^{−1/2} φi Lo .

     The proof of the theorem is obtained by inspection.

Balanced Truncation Algorithm
The projection model reduction algorithm which uses the projection matrices U, V de-
scribed in the previous subsection is called balanced truncation. The “balancing” termi-
nology comes from the following trivial observation.

Theorem 3.4.4 Assume that system (3.4.51) is both controllable and observable. Then
Wc = Lc L′c > 0 and Wo = L′o Lo > 0 are positive definite. Moreover, if Ψ is the orthogonal
matrix whose columns are the ordered eigenvectors of L′c Wo Lc , and Σ is the diagonal
matrix with the numbers σi on the diagonal, then the linear state transformation x = Sz,
with S = Lc ΨΣ−1/2 , yields an equivalent state space model with the coefficient matrices

                     Ā = S −1 AS,    B̄ = S −1 B,    C̄ = CS,

for which both the controllability and observability Gramians equal Σ.
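    A sketch of this balancing transformation, reusing Wc , Wo from the previous snippet (and
assuming controllability and observability, so that both Gramians are positive definite):

    import numpy as np

    # Cholesky factors: Wc = Lc Lc', Wo = Lo' Lo, as in Theorem 3.4.3.
    Lc = np.linalg.cholesky(Wc)
    Lo = np.linalg.cholesky(Wo).T

    # Eigenvectors of Lc' Wo Lc via the SVD of Lo Lc: the right singular
    # vectors are the psi_i, and the singular values are the sigma_i.
    Phi, s, PsiT = np.linalg.svd(Lo @ Lc)
    S = Lc @ PsiT.T @ np.diag(s ** -0.5)    # balancing transformation x = S z
    Si = np.linalg.inv(S)

    # Both transformed Gramians should equal Sigma = diag(s).
    Wc_bal = Si @ Wc @ Si.T     # Gramian of (S^{-1} A S, S^{-1} B)
    Wo_bal = S.T @ Wo @ S       # Gramian of (S^{-1} A S, C S)
    assert np.allclose(Wc_bal, np.diag(s)) and np.allclose(Wo_bal, np.diag(s))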

    State space models for which Wo = Wc are equal diagonal matrices with ordered
positive diagonal entries σk are called balanced. The numbers σk are called the (Han-
kel) singular numbers of the corresponding system. The (canonical) method of balanced
truncation is based on removing the states of a balanced realization which correspond to
singular numbers below a certain threshold. Note that a practical implementation does not
have to involve a calculation of a complete balanced model: only the projection matrices
U, V are necessary.
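    The following sketch computes the projections of Theorem 3.4.3 and the reduced model
directly, without forming a balanced realization (it reuses the naming of the earlier snippets):

    import numpy as np

    def balanced_truncation(A, B, C, Wc, Wo, r):
        """Order-r reduced model via the projections of Theorem 3.4.3."""
        Lc = np.linalg.cholesky(Wc)          # Wc = Lc Lc'
        Lo = np.linalg.cholesky(Wo).T        # Wo = Lo' Lo
        Phi, s, PsiT = np.linalg.svd(Lo @ Lc)
        d = np.diag(s[:r] ** -0.5)
        V = Lc @ PsiT[:r].T @ d              # columns sigma_i^{-1/2} Lc psi_i
        U = d @ Phi[:, :r].T @ Lo            # rows    sigma_i^{-1/2} phi_i Lo
        # U V = I_r by Theorem 3.4.3(d); project the state space model:
        return U @ A @ V, U @ B, C @ V

    Ar, Br, Cr = balanced_truncation(A, B, C, Wc, Wo, r=1)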

Example
Let
                     G(s) = 1/(s + 1 − ǫ) + 1/(s + 1 + ǫ),
where ǫ > 0 is a small parameter. A state space model is

          A = [ −1+ǫ  0 ; 0  −1−ǫ ],    B = [ 1 ; 1 ],    C = [ 1  1 ],    D = 0.

The controllability and observability Gramians coincide and are given by

             Wo = Wc = (1/2) [ 1/(1−ǫ)  1 ;  1  1/(1+ǫ) ].

The Hankel singular numbers are the eigenvalues of Wo = Wc , and equal
             σ1,2 = (1 ± √(1 − ǫ² + ǫ⁴)) / (2(1 − ǫ²)).
The corresponding eigenvectors are

                     v1,2 = [ 1 ;  2σ1,2 − 1/(1 − ǫ) ].

The dominant eigenvector defines

                 V ≈ [ 1/√2 ; 1/√2 ],     U ≈ [ 1/√2   1/√2 ].


The resulting reduced system is approximately given by

                      Â ≈ −1,    B̂ ≈ √2,    Ĉ ≈ √2.
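    As a sanity check, one can run the truncation sketch from the previous subsection on this
example; for small ǫ the reduced model should approach (−1, √2, √2), possibly up to a common
sign flip in B̂ and Ĉ:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    eps = 0.01
    A = np.array([[-1.0 + eps, 0.0], [0.0, -1.0 - eps]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 1.0]])

    Wc = solve_continuous_lyapunov(A, -B @ B.T)
    Wo = solve_continuous_lyapunov(A.T, -C.T @ C)

    # Hankel singular numbers: closed form from the text vs. computed.
    s12 = (1 + np.array([1.0, -1.0]) * np.sqrt(1 - eps**2 + eps**4)) \
          / (2 * (1 - eps**2))
    Lc, Lo = np.linalg.cholesky(Wc), np.linalg.cholesky(Wo).T
    assert np.allclose(np.linalg.svd(Lo @ Lc, compute_uv=False), s12)

    Ar, Br, Cr = balanced_truncation(A, B, C, Wc, Wo, r=1)
    print(Ar, Br, Cr)    # approximately -1, ±sqrt(2), ±sqrt(2)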

Approximate Gramians
The two most expensive phases of numerical calculations associated with the canonical
method of balanced truncation are finding the Gramians Wo , Wc , and calculating the
dominant eigenvectors ψi of L′l Wo Lc . for systems of large dimensions (more than 104
states) finding the Gramians exactly becomes difficult.
    As a viable alternative, lower and upper bounds of the Gramians can be used to
provide provably reliable results. Here by lower bounds of the Gramians Wo , Wc we mean
positive semidefinite matrices Wo− , Wc− for which the inequalities

                       Wo ≥ Wo− ,    Wc ≥ Wc−

are guaranteed. The definition of upper bounds will be more strict: by upper bounds of
the Gramians Wo , Wc defined by the Lyapunov equalities

                  Wo A + A′ Wo = −C ′ C,    AWc + Wc A′ = −BB ′ ,

where A is a Hurwitz matrix, we mean solutions Wo+ , Wc+ of the corresponding Lyapunov
inequalities

                  Wo+ A + A′ Wo+ ≤ −C ′ C,    AWc+ + Wc+ A′ ≤ −BB ′ .

These inequalities imply that Wo+ ≥ Wo and Wc+ ≥ Wc , but the converse implication is not
always true.
   The following simple observation can be used to produce lower bounds of the Gramians.

Theorem 3.4.5 Let A be an n-by-n Hurwitz matrix. Let F be an n-by-m matrix. For
s ∈ C with Re(s) > 0 define

         a = a(s) = (sIn − A)−1 (s̄In + A),    b = b(s) = √(2Re(s)) (sIn − A)−1 B.

Then

 (a) a is a Schur matrix (all eigenvalues strictly inside the unit disc);

 (b) an n-by-n matrix P is a solution of the “continuous time” Lyapunov equation

                                        AP + P A′ = −BB ′

       if and only if it is a solution of the “discrete time” Lyapunov equation

                                          P = aP a′ + bb′ ;

 (c) the matrix P from (b) is the limit

                                  P = lim_{k→∞} Pk ,

       where the symmetric matrices P0 ≤ P1 ≤ P2 ≤ · · · ≤ P are defined by

                           P0 = 0, Pk+1 = a(sk )Pk a(sk )′ + b(sk )b(sk )′ ,

       and {sk } is a sequence of complex numbers contained in a compact subset of the
       open right half plane.

    The theorem reduces finding a lower bound of a solution of a Lyapunov equation to
solving systems of linear equations (used to produce b(sk ) and the products a(sk )Pk a(sk )′ ).
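    A minimal sketch of the resulting iteration, reusing A, B from the earlier snippets, is
below. Shift selection is a separate topic; the real shifts used here are arbitrary placeholders,
and complex shifts would have to appear in conjugate pairs to keep Pk real:

    import numpy as np

    def gramian_lower_bound(A, B, shifts):
        """Monotone lower bounds P0 <= P1 <= ... <= Wc via Theorem 3.4.5(c)."""
        n = A.shape[0]
        P = np.zeros((n, n))
        for s in shifts:                       # each shift satisfies Re(s) > 0
            E = s * np.eye(n) - A
            a = np.linalg.solve(E, np.conj(s) * np.eye(n) + A)
            b = np.linalg.solve(E, np.sqrt(2 * s.real) * B)
            P = a @ P @ a.conj().T + b @ b.conj().T
        return P.real

    Wc_lo = gramian_lower_bound(A, B, shifts=[0.5, 1.0, 2.0, 4.0])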
    Calculation of an upper bound of a Gramian can be trickier. One approach to
finding such upper bounds relies on having a valid energy function for A available.
    Indeed, assume that Q = Q′ satisfies

                                       AQ + QA′ < 0.

If Wc− is a good lower bound of the controllability Gramian Wc , defined by

                                   AWc + Wc A′ = −BB ′ ,

then
                                  AWc− + Wc− A′ ≈ −BB ′ ,
and hence
                          A(Wc− + ǫQ) + (Wc− + ǫQ)A′ ≤ −BB ′
for some ǫ > 0 (which will, hopefully, be small enough). Then Wc− and Wc− + ǫQ are a
lower and an upper bound for the controllability Gramian Wc .
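    The smallest such ǫ can be computed from a generalized symmetric eigenvalue problem.
A sketch, reusing A, B and Wc_lo from the previous snippets, and taking Q as the solution of
AQ + QA′ = −I (any Q with AQ + QA′ < 0 would do):

    import numpy as np
    from scipy.linalg import eigh, solve_continuous_lyapunov

    n = A.shape[0]
    Q = solve_continuous_lyapunov(A, -np.eye(n))    # A Q + Q A' = -I < 0

    R0 = A @ Wc_lo + Wc_lo @ A.T + B @ B.T          # residual of the lower bound
    S = -(A @ Q + Q @ A.T)                          # positive definite

    # Smallest eps with R0 - eps*S <= 0 is the top eigenvalue of the
    # symmetric pencil (R0, S):
    eps = max(eigh(R0, S, eigvals_only=True).max(), 0.0)
    Wc_hi = Wc_lo + eps * Q                         # upper bound: Wc <= Wc_hi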

Lower bounds for model reduction error
A major benefit of the Gramian-based framework is a lower bound on the error of
arbitrary reduced models (not only those produced via balanced truncation).

Theorem 3.4.6 Let Wo− = Fo′ Fo and Wc− = Fc Fc′ be lower bounds of the observability and
controllability Gramians Wo , Wc of a stable LTI model G = G(s). Let

                          σ1− ≥ σ2− ≥ · · · ≥ 0

be the ordered singular numbers of Fo Fc . Then σk− is a lower bound for the k-th Hankel
singular number σk = σk (G) of the system, and

                          ‖G − Ĝ‖∞ ≥ σk−

for every system Ĝ of order less than k.

Proof. Let Zk denote the subspace spanned by the k dominant eigenvectors of Fc′ Wo− Fc , i.e.

                     |Fo Fc z| ≥ σk− |z|   ∀ z ∈ Zk .

Since Wc ≥ Fc Fc′ , every vector Fc z lies in the range of Wc , and q ′ Fc z ≤ |z|² whenever Fc z = Wc q.
Hence every state x(0) = Fc z can be reached from x(−∞) = 0 using a minimal energy input
f = fz (t) (depending linearly on z) of energy not exceeding |z|² . On the other hand, every state
x(0) = Fc z with z ∈ Zk will produce at least |Fo Fc z|² ≥ (σk− )² |z|² of output energy. Since Ĝ
is a linear system of order less than k, there exists at least one non-zero z ∈ Zk for which the
input f = fz (t) produces a zero state x̂(0) = 0 of Ĝ at time zero. Then, assuming the input is
zero for t > 0, the error output energy is at least (σk− )² |z|² . Since the testing input energy is
not larger than |z|² > 0, this yields an energy gain of at least (σk− )² , which means that
‖G − Ĝ‖∞ ≥ σk− .
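    Numerically, the certified lower bounds σk− are just the singular values of Fo Fc . A brief
sketch, assuming factors of the Gramian lower bounds are available (e.g. Wc_lo from the
iteration of Theorem 3.4.5 applied to (A, B), and Wo_lo from the same iteration applied to
(A′ , C ′ )):

    import numpy as np

    # Factor the bounds: Wc_lo = Fc Fc', Wo_lo = Fo' Fo.  Cholesky works when
    # the bounds are positive definite; otherwise factor via eigenvalues.
    Fc = np.linalg.cholesky(Wc_lo)
    Fo = np.linalg.cholesky(Wo_lo).T

    sigma_lo = np.linalg.svd(Fo @ Fc, compute_uv=False)
    # sigma_lo[k-1] bounds the k-th Hankel singular number from below, and
    # hence bounds the H-Infinity error of any model of order < k.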




Upper bounds for balanced truncation errors
The result from the previous subsection states that, for a stable LTI system G, no method
can produce a reduced model Ĝ of order less than k such that the H-Infinity error ‖G − Ĝ‖∞
is less than the k-th singular number σk = σk (G). The statement is easy to apply, since
lower bounds σk− of σk can be computed by using lower bounds of system Gramians.
    The following theorem gives an upper bound of model reduction error for the exact
implementation of the balanced truncation method.

Theorem 3.4.7 Let σ1 > σ2 > · · · > σh be the ordered set of distinct Hankel singular
numbers of a stable LTI system G. Let Ĝ be the reduced model obtained by removing the
states corresponding to singular numbers not larger than σk from a balanced realization of
G. Then Ĝ is stable, and satisfies

                      ‖G − Ĝ‖∞ ≤ 2 Σ_{i=k}^{h} σi .


    The utility of Theorem 3.4.7 in practical calculations of H-Infinity norms of model
reduction errors is questionable: an exact calculation of the H-Infinity norm is possible
at about the same cost, and the upper bound itself can be quite conservative. Neverthe-
less, the theorem provides an important reassuring insight into the potential of balanced
truncation: since the singular numbers of exponentially stable LTI systems decay expo-
nentially, the upper bound of Theorem 3.4.7 is not expected to be much larger than the
lower bound.
    For example, for a system with singular numbers σi = 2^{−i} , a k-th order reduced model
cannot have quality better than 2^{−k−1} , and exact balanced truncation is guaranteed to
provide quality of at least 2^{−k+1} .
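    A two-line check of this arithmetic (geometrically decaying singular numbers, reduction
to order k):

    import numpy as np

    k, n = 3, 30
    hsv = 2.0 ** -np.arange(1, n + 1)    # sigma_i = 2^{-i}
    lower = hsv[k]                       # sigma_{k+1} = 2^{-k-1}
    upper = 2 * hsv[k:].sum()            # close to 2^{-k+1}
    print(lower, upper)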
    The proof of Theorem 3.4.7 is based on estimating the quality of balanced truncation
in the case when only the states of a balanced realization corresponding to the smallest
Hankel singular number are removed, which is done in the following technical lemma.

Lemma 3.4.1 Let W = W ′ > 0 be a positive definite symmetric n-by-n matrix satisfying
the Lyapunov equalities

                      W A + A′ W = −C ′ C, AW + W A′ = −BB ′ .                         (3.4.55)

Assume that W, A, B, C can be partitioned as

     W = [ P  0 ; 0  σIr ],   A = [ A11  A12 ; A21  A22 ],   B = [ B1 ; B2 ],   C ′ = [ C1′ ; C2′ ],

where A22 is an r-by-r matrix, and the matrices B2 , C2′ have r rows. Then

 (a) transfer matrices

                    G(s) = C(sIn − A)−1 B,     G1 (s) = C1 (sIn−r − A11 )−1 B1

     are stable;

 (b) the Lyapunov equalities
                 P A11 + A′11 P = −C1′ C1 ,    A11 P + P A′11 = −B1 B1′

      are satisfied;

 (c) ‖G − G1 ‖∞ ≤ 2σ.

Proof. It is sufficient to consider the case when the dimension m of f = f (t) equals the
dimension k of y = y(t). (If m < k, add zero columns to B; if m > k, add zero rows to C.) First
note that re-writing (3.4.55) in terms of the blocks Aik , Bi , Ci yields

                  P A11 + A′11 P = −C1′ C1 ,                        (3.4.56)
                  P A12 + σA′21 = −C1′ C2 ,                         (3.4.57)
                  σ(A22 + A′22 ) = −C2′ C2 ,                        (3.4.58)
                  A11 P + P A′11 = −B1 B1′ ,                        (3.4.59)
                  σA12 + P A′21 = −B1 B2′ ,                         (3.4.60)
                  σ(A22 + A′22 ) = −B2 B2′ .                        (3.4.61)
Note that (3.4.58) together with (3.4.61) implies that C2′ = B2 θ for some unitary matrix θ. Also,
(3.4.56) and (3.4.59) prove (b).
    To prove (a), note that for every complex eigenvector v ≠ 0 of A, Av = sv for some s ∈ C,
multiplication of the first equation in (3.4.55) by v ′ on the left and by v on the right yields

                       2Re(s)v ′ W v = −|Cv|² .

Hence either Re(s) < 0, or Re(s) = 0 and Cv = 0. Thus all unstable modes of A are unobserv-
able, and G = G(s) has no unstable poles. The same argument applies to G1 , since A11 satisfies
similar Lyapunov equations.
    To prove (c), consider the following state space model of the error system G − G1 :

                          ẋ1 = A11 x1 + A12 x2 + B1 f,
                          ẋ2 = A21 x1 + A22 x2 + B2 f,
                          ẋ3 = A11 x3 + B1 f,
                           e = C1 x1 + C2 x2 − C1 x3 .

It would be sufficient to find a positive definite quadratic form V (x) = x′ Hx such that

                 ψ(t) = 4σ²|f (t)|² − |e(t)|² − dV (x(t))/dt ≥ 0

for all solutions of the system equations. Indeed, such a Lyapunov function V can be readily presented,
though there is no easy way to describe the intuitive meaning of its format:

              V (x) = σ²(x1 + x3 )′ P −1 (x1 + x3 ) + (x1 − x3 )′ P (x1 − x3 ) + 2σ|x2 |² .

To streamline the derivation, introduce the shortcut notation
           z = x1 + x3 ,   ∆ = x1 − x3 ,   δ = C1 ∆,   u = σ −1 B2′ x2 ,   q = B1′ P −1 z.

The equations now take the form

                          ∆̇ = A11 ∆ + A12 x2 ,
                           ż = A11 z + A12 x2 + 2B1 f,
                          ẋ2 = A22 x2 + 0.5A21 (z + ∆) + B2 f,
                           e = C1 ∆ + σθ ′ u.

We have

       ψ = 4σ²|f |² − |C1 ∆ + σθ ′ u|² − 2σ²z ′ P −1 (A11 z + A12 x2 + 2B1 f )
               − 2∆′ P (A11 ∆ + A12 x2 ) − 4σx′2 [A22 x2 + 0.5A21 (z + ∆) + B2 f ]
         = 4σ²|f |² − |C1 ∆ + σθ ′ u|² + σ²|q|² − 4σ²q ′ f + |δ|² + 2σ²|u|² − 4σ²u′ f
               − 2z ′ [σ²P −1 A12 + σA′21 ]x2 − 2∆′ [P A12 + σA′21 ]x2
         = 4σ²|f |² − |C1 ∆ + σθ ′ u|² + σ²|q|² − 4σ²q ′ f + |δ|²
               + 2σ²|u|² − 4σ²u′ f + 2σ²q ′ u + 2σδ′ θ ′ u
         ≥ 4σ²|f |² − |C1 ∆ + σθ ′ u|² + |δ|² + 2σ²|u|² − 4σ²u′ f + 2σδ′ θ ′ u − σ²|u − 2f |²
         = 0

(the first identity transformation utilized (3.4.56), (3.4.59), and (3.4.58); the second identity used
(3.4.57) and (3.4.60); the third step (an inequality) applied minimization with respect to q; the
last identity depended on θ being a unitary matrix).

    It is important to realize that the lemma remains valid when the original Lyapunov
equalities are replaced by the corresponding Lyapunov inequalities (of course, the equal-
ities in (b) will get replaced by inequalities as well). Indeed, Lyapunov inequalities

                          W A + A′ W ≤ −C ′ C, AW + W A′ ≤ −BB ′

are equivalent to Lyapunov equalities
                  W A + A′ W = −C̃ ′ C̃,    AW + W A′ = −B̃ B̃ ′ ,

where
                  B̃ = [ B   Bδ ],    C̃ = [ C ; Cδ ],
and Bδ , Cδ are chosen appropriately. Note that G(s) is the left upper corner block of the
transfer matrix
                      G̃ = C̃(sIn − A)−1 B̃.

Since the H-Infinity norm of a transfer matrix is not smaller than the H-Infinity norm of any
of its blocks, applying Lemma 3.4.1 to the system defined by A, B̃, C̃ yields the stated
generalization.
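    For completeness, a sketch of how the slack factors Bδ , Cδ can be produced numerically
from a W satisfying the strict inequalities (an eigenvalue factorization of each residual; W, A,
B, C are assumed to be given as numpy arrays):

    import numpy as np

    def slack_factor(R):
        # Return F with F' F = R, for symmetric positive semidefinite R.
        lam, V = np.linalg.eigh(R)
        return (V * np.sqrt(lam.clip(min=0))).T

    C_delta = slack_factor(-(W @ A + A.T @ W + C.T @ C))
    B_delta = slack_factor(-(A @ W + W @ A.T + B @ B.T)).T

    B_tilde = np.hstack([B, B_delta])    # A W + W A' = -B_tilde B_tilde'
    C_tilde = np.vstack([C, C_delta])    # W A + A' W = -C_tilde' C_tilde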


3.5     Hankel Model Reduction
This section covers the basic principles of using Hankel operators in model reduction of
LTI systems.

Hankel operators
Let L²r denote the set of all square integrable functions e : R → Rr , i.e. such that

                     ∫_{−∞}^∞ |e(t)|² dt < ∞.

Let L²r (−∞, 0) denote the subset of L²r which consists of functions e such that e(t) = 0 for
t ≥ 0. The elements of L²r (−∞, 0) will be called anti-causal in this section. Similarly, let
L²r (0, ∞) be the subset of functions e ∈ L²r such that e(t) = 0 for t < 0. The elements of
L²r (0, ∞) will be called causal.
  r
    Let G = G(s) be a k-by-m matrix-valued function (not necessarily a rational one),
bounded on the jω-axis. The corresponding Hankel operator H = HG is the linear
transformation which maps anti-causal square integrable functions f ∈ L²m (−∞, 0) to
causal square integrable functions h = HG f ∈ L²k (0, ∞) according to the following rule:
h(t) = y(t)u(t), where y(t) is the inverse Fourier transform of Y (jω) = G(jω)F (jω),
F (jω) is the Fourier transform of f (t), and u(t) is the unit step function

                  u(t) = 1 for t ≥ 0,    u(t) = 0 for t < 0.

    In terms of the (stable, but not necessarily causal) LTI system defined by G, the
Hankel operator maps anti-causal inputs f = f (t) to the causal parts h = h(t) = y(t)u(t)
of the complete system response y(t). In particular, when G is anti-stable, i.e. is a proper
rational transfer matrix without poles s with Re(s) ≤ 0, the associated LTI system is
anti-causal, and hence the resulting Hankel operator HG is zero. More generally, adding
an anti-stable component to G does not affect the resulting HG .


Hankel matrices

Let a > 0 be a fixed positive number. Then functions
              Θk (jω) = (√(2a)/(jω + a)) ((a − jω)/(a + jω))^k ,    k = 0, 1, 2, . . .

form an orthonormal basis in the space of stable strictly proper transfer functions, in the
sense that for every such function H = H(s) there exists a square summable sequence of
real numbers h0 , h1 , h2 , . . . satisfying
                     H(jω) = Σ_{k=0}^∞ hk Θk (jω),

i.e.

     (1/2π) ∫_{−∞}^∞ |H(jω) − Σ_{k=0}^N hk Θk (jω)|² dω = Σ_{k=N+1}^∞ |hk |² → 0 as N → ∞.

In a similar sense, the inverse Fourier transforms θk = θk (t) of Θk = Θk (jω) form an
orthonormal basis in L²1 (0, ∞), and the functions θk (−t) (the inverse Fourier transforms of
Θk (−jω)) form an orthonormal basis in L²1 (−∞, 0).
    The following lemma allows one to establish a matrix representation of a Hankel
operator with respect to the input basis {θk (−t)}_{k=0}^∞ and the output basis {θk (t)}_{k=0}^∞ .



Lemma 3.5.1 Let

         gk = (1/π) ∫_{−∞}^∞ G(jω) ((a + jω)/(a − jω))^k · a dω/(a² + ω²).

Then the result h = h(t) of applying HG to f (t) = θr (−t) is given by

                     h(t) = Σ_{k=0}^∞ g_{r+k+1} θk (t).

   An important implication of the lemma is that the matrix of HG with respect to the
input/output bases {θk (−t)}_{k=0}^∞ , {θk (t)}_{k=0}^∞ is the Hankel matrix

                            g1   g2   g3   g4   · · ·
                            g2   g3   g4   · · ·
                     ΓG =   g3   g4   · · ·
                            g4   · · ·
                            · · ·

In general, gk are matrices with real coefficients, in which case ΓG is called the block
Hankel matrix.
Proof. Consider the decomposition

                     h(t) = Σ_{k=0}^∞ hk θk (t).

By orthonormality of θk (·),

              hk = ∫_0^∞ h(t)θk (t)dt = ∫_{−∞}^∞ y(t)θk (t)dt,

where y is the response of the stable LTI system associated with G to f (t) = θr (−t). By the
Parseval formula,

       hk = (1/2π) ∫_{−∞}^∞ Θk (−jω)G(jω)Θr (−jω)dω
          = (1/2π) ∫_{−∞}^∞ G(jω) ((a + jω)/(a − jω))^{k+r+1} · 2a dω/((a + jω)(a − jω)) = g_{k+r+1} .


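    Under the stated assumptions, the construction can be tested numerically: approximate
the coefficients gk by quadrature (the grid sizes below are arbitrary placeholders), assemble a
finite section of ΓG with scipy.linalg.hankel, and compare its singular values with the Hankel
singular numbers obtained from the Gramians (Theorem 3.5.1 below). A sketch for a scalar
G, reusing a Hurwitz triple A, B, C from the earlier snippets and assuming A diagonalizable:

    import numpy as np
    from scipy.linalg import hankel
    from scipy.integrate import trapezoid

    # Partial-fraction form of G(s) = C (sI - A)^{-1} B (A diagonalizable).
    lam, V = np.linalg.eig(A)
    res = (C @ V).ravel() * np.linalg.solve(V, B).ravel()

    a, N = 1.0, 12
    w = np.linspace(-5000.0, 5000.0, 2_000_001)     # placeholder grid
    Gw = sum(r / (1j * w - l) for r, l in zip(res, lam))

    def g(k):
        # g_k from Lemma 3.5.1, by trapezoidal quadrature.
        f = Gw * ((a + 1j * w) / (a - 1j * w)) ** k * a / (a**2 + w**2)
        return trapezoid(f, w).real / np.pi

    coeff = [g(k) for k in range(1, 2 * N)]
    Gamma = hankel(coeff[:N], coeff[N - 1:])        # Gamma[i, j] = g_{i+j+1}
    print(np.linalg.svd(Gamma, compute_uv=False)[:3])  # ~ Hankel singular numbers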

Singular values of a Hankel operator
Let M be an a-by-b matrix representing a linear transformation from Rb to Ra . Recall
that the operator norm of M is defined as the least upper bound of the ratios |M v|/|v|,
where v ranges over the set of all non-zero vectors in Rb . In addition, the r-th singular
number of M can be defined as the minimal operator norm of the difference ∆ = M − M̂ ,
where M̂ ranges over the set of all matrices of rank less than r.
    These definitions extend naturally to linear transformations of other normed vector
spaces (possibly infinite dimensional). In particular, for a linear transformation M from
L²m (−∞, 0) to L²k (0, ∞), its operator norm is defined as the square root of the least upper
bound of the ratio

                  ∫_0^∞ |(M f )(t)|² dt / ∫_{−∞}^0 |f (t)|² dt,

where ∫_{−∞}^0 |f (t)|² dt > 0. Such a transformation M is said to have rank less than r if
for every family of r functions f1 , . . . , fr ∈ L²m (−∞, 0) there exist constants c1 , . . . , cr ,
not all equal to zero, such that

                  c1 (M f1 ) + · · · + cr (M fr ) ≡ 0.

Finally, the r-th singular number of M can be defined as the minimal operator norm of
the difference ∆ = M − M̂ , where M̂ ranges over the set of all linear transformations of
rank less than r.
    This allows us to talk about the k-th singular number of the Hankel operator HG
associated with a given matrix-valued function G = G(jω), bounded on the imaginary
axis. The largest singular number is called the Hankel norm ‖G‖H of G, while the k-th
singular number is called the k-th Hankel singular number of G.
    For rational transfer matrices G, calculation of singular numbers of the correspond-
ing Hankel operator can be done using observability and controllability Gramians. The
following theorem was, essentially, proven in the section on balanced truncation.

Theorem 3.5.1 Let A be an n-by-n Hurwitz matrix, B, C be matrices of dimensions n-
by-m and k-by-n respectively, such that the pair (A, B) is controllable, and the pair (C, A)
is observable. Let Wc , Wo be the corresponding controllability and observability Gramians.
Then, for G(s) = C(sI − A)−1 B, the Hankel operator HG has exactly n positive singular
numbers, which are the square roots of the eigenvalues of Wc Wo .

    It is also true that a Hankel operator with a finite number of positive singular numbers
is defined by a rational transfer matrix.

Theorem 3.5.2 Let G = G(jω) be a bounded matrix-valued function defined on the
imaginary axis. Let a > 0 be a positive number. If the Hankel operator HG has less than
r positive singular numbers then the coefficients
          gk = (1/π) ∫_{−∞}^∞ G(jω) ((a + jω)/(a − jω))^k · a dω/(a² + ω²)

coincide for k > 0 with the corresponding coefficients of a stable strictly proper transfer
matrix G1 of order less than r.
    For some non-rational transfer matrices, analytical calculation of σi may be possible.
For example, the i-th largest singular number of HG , where G(s) = exp(−s), equals 1 for
all positive i.
    In general, singular numbers of HG will converge to zero if G = G(jω) is continuous
on the extended imaginary axis (note that G(s) = exp(−s) is not continuous at ω = ∞).
The converse statement is not true.

The Hankel optimal model reduction setup
Let G = G(s) be a matrix-valued function bounded on the jω-axis. The task of Hankel
optimal model reduction of G calls for finding a stable LTI system Ĝ of order less than a
given positive integer m, such that the Hankel norm ‖∆‖H of the difference ∆ = G − Ĝ
is minimal.
    Since the Hankel operator HG represents a “part” of the total LTI system with transfer
matrix G, the Hankel norm is never larger than the H-Infinity norm. Hence the Hankel
optimal model reduction setup can be viewed as a relaxation of the “original” (H-Infinity
optimal) model reduction formulation. While no comparably tractable solution is available
for the H-Infinity case, Hankel optimal model reduction has an elegant and algorithmically
efficient solution.

3.5.1    The AAK theorem
The solution of the Hankel optimal model reduction problem is based on the famous
Adamyan-Arov-Krein (AAK) theorem, presented in this section.

The AAK Theorem
The famous Adamyan-Arov-Krein theorem provides both a theoretical insight and (taking
a constructive proof into account) an explicit algorithm for finding Hankel optimal reduced
models.
Theorem 3.5.3 Let G = G(s) be a matrix-valued function bounded on the jω-axis. Let
σ1 ≥ σ2 ≥ · · · ≥ σm ≥ 0 be the m largest singular values of HG . Then σm is the minimum
of ‖G − Ĝ‖H over the set of all stable systems Ĝ of order less than m.
   In other words, approximating a Hankel operator by general linear transformations of
rank less than m cannot be done better (in terms of the minimal L2 gain of the error)
than approximating it by Hankel operators of rank less than m.

    The proof of the theorem, to be given in this section for the case of a rational trans-
fer matrix G = G(s), is constructive, and provides a simple state space algorithm for
calculating the Hankel optimal reduced model.

H-Infinity quality of Hankel optimal reduced models
It is well established by numerical experiments that Hankel optimal reduced models usu-
ally offer very high H-Infinity quality of model reduction. A somewhat conservative de-
scription of this effect is given by the following extension of the AAK theorem.

Theorem 3.5.4 Let G = G(s) be a stable rational function. Assume that the Hankel
singular numbers σk = σk (G) of G satisfy

                  σm−1 > σm = σm+1 = · · · = σm+r−1 > σm+r ,

i.e. σk (G) = σm (G) for m ≤ k < m + r, and σm+r (G) < σm (G). Let σ̃m > σ̃m+1 > σ̃m+2 > . . .
be the ordered sequence of distinct Hankel singular values of G, starting with σ̃m = σm
and σ̃m+1 = σm+r . Then
 (a) there exists a Hankel optimal reduced model Ĝ^H_m of order less than m such that

                  ‖G − Ĝ^H_m ‖∞ ≤ σm + Σ_{k>m} σ̃k ;

 (b) there exists a model Ĝ^*_m of order less than m such that

                  ‖G − Ĝ^*_m ‖∞ ≤ Σ_{k≥m} σ̃k .


    Just as in the case of the basic AAK theorem, the proof of Theorem 3.5.4 is construc-
tive, and hence provides an explicit algorithm for calculation of reduced models with the
described properties. In practice, the actual H-Infinity norm of the model reduction error
is usually much smaller.
    It is important to remember that the Hankel optimal reduced model is never unique
(at least, the “D” terms do not have any effect on the Hankel norm, and hence can be
modified arbitrarily). The proven H-Infinity model reduction error bound is guaranteed
only for a specially selected Hankel optimal reduced model. Also, the reduced model from
(b) is not necessarily a Hankel optimal reduced model.