ACCURACY CERTIFICATES FOR COMPUTATIONAL PROBLEMS WITH CONVEX STRUCTURE
by Arkadi Nemirovski 1), Shmuel Onn 2) and Uriel G. Rothblum 3)
April 1, 2007
Abstract
The goal of the current paper is to introduce the notion of certificates which verify the accuracy of solutions of computational problems with convex structure; such problems include minimizing convex functions, variational inequalities with monotone operators, computing saddle points of convex-concave functions and solving convex Nash equilibrium problems. We demonstrate how the implementation of the Ellipsoid method and other cutting plane algorithms can be augmented with the computation of such certificates without an essential increase in computational effort. Further, we show that (computable) certificates exist whenever an algorithm is capable of producing solutions of guaranteed accuracy.
Key words: convexity, certificates, computation in convex structures, convex minimization, variational inequalities, saddle points, convex Nash equilibrium
1) Supported in part by NSF grant DMI 0619977.
2) Supported in part by a grant from ISF - the Israel Science Foundation, by a VPR grant at the Technion and by the Fund for the Promotion of Research at the Technion.
3) Supported in part by a grant from ISF - the Israel Science Foundation, by a VPR grant at the Technion and by the Fund for the Promotion of Research at the Technion.
1 Introduction
Convexity properties of functions and domains have long been known to provide useful structure that enhances the solution of computational problems (see, e.g., [11, 13, 18, 21, 22, 25, 1] and references therein). Still, for problems with convex structure that are continuous and nonlinear (see [6] for a formal definition of linear problems), exact arithmetic, even if possible, is typically not sufficient for the availability of computational methods that determine precise solutions. Consequently, one has to rely on iterative algorithms that generate approximate solutions.
An important issue when executing computation to achieve a particular goal is the generation of a certificate that can be used to efficiently validate that the goal has been accomplished. In the current paper we introduce the notion of certificates in the context of computational problems “with convex structure”. Problems that we consider include minimizing convex functions, solving variational inequalities corresponding to monotone operators, computing saddle points of convex-concave functions and solving convex Nash equilibrium problems; all of the above are defined over solids (convex compact sets with nonempty interiors) in a Euclidean space. For each of these problems we define an accuracy measure that reflects the proximity of a given feasible solution to being a precise solution; for example, for the minimization problem the accuracy measure is the difference between the value of the objective function at the computed feasible solution and the optimal value. The certificates we introduce provide easily computable bounds on the accuracy of approximate solutions.
The idea of certificates that verify properties of outcomes of computations is not new. In particular, the classic open problem of “P=NP?” focuses on the difference between verifying that a feasible point satisfies a prescribed target condition vs. the generation of such a point.
An extreme exploration of the difference between generating answers and certificates that verify that the answers are correct occurs in the study of “Zero Knowledge Proofs” in Theoretical Computer Science. The focus in this area is on generating certificates that verify the ability to solve a problem without yielding information about how it is done and what the computed solution is (e.g., [10, 9]).
We next provide two (closely related) simple examples of algorithms whose implementation is typically accompanied with the generation of certificates. The problems that are solved have linear structure and the algorithms solve the problems precisely (assuming exact arithmetic).
Example 1.1 Consider the problem of determining whether or not a Linear Programming problem is feasible. The first phase of the Simplex method is known to be able to produce a (correct) answer to this problem. Further, standard application of the method provides one with a feasible solution, certifying feasibility. In addition, the method can be used to produce Farkas multipliers for problems that are infeasible; the Farkas Lemma assures that such multipliers certify infeasibility. In either case, the produced certificate can be used to efficiently verify the reached conclusion.
Example 1.2 Consider the problem of identifying an optimal solution for a Linear Programming problem which is known to be feasible and bounded. The goal is then to produce a feasible solution that maximizes the objective function over the feasible domain. The Simplex Method is known to be able to accomplish this task and produce such a solution. Once a solution is provided, it is easy to certify its feasibility, but the verification of optimality is more involved. For that goal, Duality theory tells us that the production of a dual feasible solution that has the same objective as a given (primal) feasible solution certifies optimality of that given solution.
Of course, once such a pair is produced, primal and dual feasibility are easy to check, as is the verification that the corresponding primal and dual objective values coincide. As it turns out, the execution of the Simplex method can be augmented with computation that generates such a certificate for the generated solution.
To illustrate the notion of an accuracy certificate and its role, consider the simplest problem with convex structure, specifically, minimizing a convex function $F$ over a solid $X \subseteq \mathbf{R}^n$, $\mathrm{int}\,X \subset \mathrm{Dom}(F)$. We equip the problem with the natural accuracy measure $\epsilon_{\rm opt}(x) = F(x) - \inf_X F$, and the goal of the computation is to build $x \in X$ with $\epsilon_{\rm opt}(x) \le \epsilon$, where $\epsilon > 0$ is a given tolerance. We consider an oracle-based algorithm, that is, an algorithm which has no a priori knowledge of $F$ and $X$, except for the fact that $X$ is a solid contained in a given solid $B$ and $F$ is a convex function with $\mathrm{Dom}(F) \supseteq \mathrm{int}\,X$. Additional information on $F$, $X$ can be obtained solely from the Separation and First Order oracles as follows: given on input a point $x \in \mathbf{R}^n$, the Separation oracle reports whether $x \in \mathrm{int}\,X$, and if it is not the case, returns a separator, that is, a nonzero vector $e$ such that $\langle e, y - x\rangle \le 0$ for all $y \in X$. The First Order oracle, given on input a point $x$ known to belong to $\mathrm{int}\,X$, returns the value $F(x)$ and a subgradient $F'(x)$ of $F$ at $x$.
A generic deterministic algorithm capable of solving the problem in this “decision environment” can be described as follows: it generates subsequent search points $x_t \in \mathbf{R}^n$, $t = 1, 2, \dots$. At step $\tau$, the algorithm calls the Separation oracle to check whether $x_\tau \in \mathrm{int}\,X$, and if it is not the case (“nonproductive step”), gets from the oracle a vector $e_\tau \ne 0$ separating $x_\tau$ and $X$. In the case of $x_\tau \in \mathrm{int}\,X$ (“productive step”), the algorithm further calls the First Order oracle to get the value $F(x_\tau)$ and a subgradient $e_\tau = F'(x_\tau)$ of $F$ at $x_\tau$.
The information on $F$, $X$ accumulated in the course of $\tau$ steps can be identified with the execution protocol $P_\tau = \{(x_t, e_t)\}_{t=1}^{\tau}$ augmented by the partition of the index set $\{1, \dots, \tau\}$ into the sets $I_\tau$, $J_\tau$ of productive, resp., nonproductive steps and by the values $F(x_t)$ of $F$ at the points $x_t$, $t \in I_\tau$. Based solely on this protocol, the method either generates a new search point $x_{\tau+1}$, or terminates and builds an approximate solution $x^\tau$ which should belong to $\mathrm{int}\,X$ and satisfy the accuracy requirement $\epsilon_{\rm opt}(x^\tau) \le \epsilon$.
The accuracy certificate as defined below for a convex minimization problem offers a natural way to determine whether it is time to terminate and, if it is the case, how to generate an approximate solution. Specifically, a certificate for an execution protocol $P_\tau$ is a collection $\xi = \{\xi_t\}_{t=1}^{\tau}$ of nonnegative weights such that $\sum_{t \in I_\tau} \xi_t = 1$. The $B$-residual of a certificate $\xi$ is defined as $\epsilon_{\rm cert}(\xi) = \max_{x \in B} \sum_{t=1}^{\tau} \xi_t \langle e_t, x_t - x\rangle$. A simple result (Proposition 2.1 below) states that whenever $P_\tau$ admits an accuracy certificate $\xi$, both the best (i.e., with the smallest value of $F$) among the points $x_t$, $t \in I_\tau$, denoted by $x^\tau_{\rm bst}$, and the point $x^\tau[\xi] = \sum_{t \in I_\tau} \xi_t x_t$ belong to $\mathrm{int}\,X$ and satisfy the relation $\epsilon_{\rm opt}(x) \le \epsilon_{\rm cert}(\xi)$. Thus, whenever an algorithm is able to equip the current execution protocol $P_\tau$ with a certificate $\xi$ such that $\epsilon_{\rm cert}(\xi) \le \epsilon$, it can terminate and output either $x^\tau_{\rm bst}$ or $x^\tau[\xi]$ as a feasible approximate solution to the problem with guaranteed accuracy $\epsilon$. Note that the fact that a candidate $\xi = \{\xi_t\}_{t=1}^{\tau}$ is an accuracy certificate for $P_\tau$ with $\epsilon_{\rm cert}(\xi) \le \epsilon$ is easy to verify, provided that $B$ is “simple” (Euclidean ball, simplex, box), so that a certificate indeed can be viewed as an easy-to-verify proof of $\epsilon$-optimality.
Moreover, a certificate $\xi$ with $\epsilon_{\rm cert}(\xi) \le \epsilon$ allows one to build a strictly feasible $\epsilon$-optimal solution $x^\tau[\xi]$ to the problem, and this solution is defined solely in terms of the certificate, without reference to the values of $F$, a property which becomes indispensable when passing from problems of convex minimization to other problems with convex structure.
Note that while some “intelligent” oracle-based algorithms for nonsmooth convex minimization, that is, algorithms capable of guaranteeing a prescribed accuracy upon termination (e.g., subgradient descent and bundle methods), explicitly produce accuracy certificates, the most attractive, at least from the theoretical standpoint, polynomial time oracle-based methods, like the Ellipsoid algorithm, do not produce certificates of this type. E.g., the Ellipsoid algorithm becomes intelligent, provided that the objective is Lipschitz continuous on $X$ and that we know in advance not only an “upper bound” $B$ on $X$, but also a “lower bound” on $X$, namely a positive $r$ such that $X$ is known to contain a Euclidean ball of radius $r$. Under these assumptions, one can equip the Ellipsoid algorithm with a theoretically valid stopping criterion based on a certain relation between $\epsilon$, the volume of the current ellipsoid and the on-line lower bound on the Lipschitz constant of $F$ built by the algorithm. One of the goals of this paper is to demonstrate that the Ellipsoid algorithm, like other known polynomial time oracle-based algorithms for convex optimization, can be augmented with computationally cheap techniques for building the accuracy certificates just defined. Moreover, these certificates are fully compatible with theoretical efficiency estimates of the methods, meaning that the number of steps until a prescribed accuracy is certified is as stated in the theoretical efficiency estimate (see footnote 4).
This illustrates the first way to utilize accuracy certificates: they allow one to equip convex optimization algorithms with theoretically valid stopping criteria or to improve the existing criteria. For example, in the case of the Ellipsoid algorithm and other polynomial time cutting plane methods, the techniques for building certificates proposed in this paper require no additional a priori information on $F$ and $X$, in contrast to the usual stopping criteria which require Lipschitz continuity of $F$ and a priori knowledge of a “lower bound” on $X$.
The advantages of certificates become much more significant when passing from convex minimization problems to other problems with convex structure, like variational inequalities with monotone operators or convex-concave saddle point problems.
In the latter case, the known oracle-based polynomial time algorithms “as they are” just do not produce solutions. At their best, these methods reduce generating an $\epsilon$-solution to solving an auxiliary piecewise linear convex minimization problem completely similar to problems (17), (31) below, with all the unpleasant drawbacks of such a reduction; see the comments following (17). We shall see, however, that when equipped with certificates (which can be built in exactly the same way as in the case of convex minimization), methods like the Ellipsoid algorithm solve all problems with convex structure equally easily, not only the minimization ones.
It should be added that when a certificate accompanies an execution protocol, it allows a recipient to verify efficiently the accuracy guarantees, as reported by an algorithm, without any information about how the algorithm works.
Finally, certificates turn out to be crucial for “dual algorithms”, where a computational algorithm is applied to the problem dual to the problem of interest, and certificates are used to convert the resulting $\epsilon$-optimal dual solution into a feasible $\epsilon$-optimal solution to the problem of interest. “Dual algorithms” form the subject of a forthcoming companion to the current paper.
The rest of this paper is organized as follows.
In Section 2 we focus on accuracy certificates in oracle-based convex minimization. Aside from their definition and basic results on their role in validating the quality of approximate solutions (the issues we have already touched upon in the above discussion), we demonstrate that these certificates are, in a sense, necessary: whatever the oracle-based algorithm for convex minimization capable of minimizing, say, every convex function that is Lipschitz continuous with a given constant over any $n$-dimensional solid within a prescribed accuracy $\epsilon > 0$, the associated execution protocol upon termination always admits an accuracy certificate $\xi$ with $\epsilon_{\rm cert}(\xi) \le \epsilon$.
In Section 3, we extend the notion of certificates and the associated basic results (including those on the necessity of certificates for validating accuracy of approximate solutions) from problems of convex minimization to a wider family of problems with convex structure, including solving variational inequalities with monotone operators and approximating saddle points of convex-concave functions or, more generally, approximating solutions to convex Nash equilibrium problems.
In Section 4 we demonstrate that all known polynomial time oracle-based algorithms for problems with convex structure (the family comprised of the Ellipsoid algorithm and a few other polynomial time implementations of the Cutting Plane scheme, see Section 4.1) can be augmented by a technique for building accuracy certificates. This technique is, first, computationally cheap (the computational effort to equip an execution protocol with a certificate is in all cases a small fraction of the effort required to build this protocol), and, second, fully compatible with the known polynomial time efficiency estimates.
The concluding Section 5 illustrates the potential of certificates on the simplest example of recovering a nearly optimal solution to the dual of an LP problem when solving the primal problem by a cutting plane algorithm with certificates. This example can be seen as a simple precursor to the aforementioned forthcoming companion of the current paper.
4) To the best of our knowledge, the only result in this direction known from the literature deals with an LP program solved by the Ellipsoid algorithm; specifically, it is shown in [4] that in this case the method can be augmented with a technique for producing feasible solutions to the dual problem with the duality gap certifying the standard polynomial time efficiency estimate of the method. The technique used in [4] seems to be quite different from the one we propose below.
2 Certificates for Convex Minimization problems
2.1 Convex Minimization problems
In what follows, a Convex Minimization problem (CMP) is of the form
$$\mathrm{Opt} = \inf_{x \in X} F(x), \tag{1}$$
where
• $X \subset \mathbf{R}^n$ is a solid (convex compact set with a nonempty interior) represented by a Separation oracle, a black box which, given on input a point $x \in \mathbf{R}^n$, reports whether or not $x \in \mathrm{int}\,X$, and in the case of $x \notin \mathrm{int}\,X$, returns a separator, that is, a vector $e \ne 0$ such that $\langle e, y - x\rangle \le 0$ for all $y \in X$.
• $F : X \to \mathbf{R} \cup \{+\infty\}$ is a convex function with $\mathrm{Dom}(F) = \{x : F(x) < \infty\} \supseteq \mathrm{int}\,X$; this function is represented by a First Order oracle, a black box which, given on input a point $x \in \mathrm{int}\,X$, returns the value $F(x)$ and a subgradient $F'(x)$ of $F$ at $x$.
A point $x \in \mathrm{int}\,X$ is called a strictly feasible solution to (1). A proximity measure for such a point $x$ to optimality is defined by
$$\epsilon_{\rm opt}(x) = F(x) - \inf_{y \in X} F(y) = F(x) - \mathrm{Opt}. \tag{2}$$
A point $x$ is called $\epsilon$-optimal for (1) if $\epsilon_{\rm opt}(x) \le \epsilon$, that is, if $F(x) \le \mathrm{Opt} + \epsilon$.
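The two oracles can be pictured as plain functions. The following is a minimal sketch for a hypothetical instance chosen here for illustration only: $X$ is a Euclidean ball and $F(x) = \max_i |x_i|$. The function names and the instance are assumptions, not part of the paper.

```python
import math

def separation_oracle(x, radius=1.0):
    """Separation oracle for the ball X = {y : ||y|| <= radius} (assumed instance).

    Returns (True, None) if x is in int X; otherwise (False, e) with a nonzero
    vector e such that <e, y - x> <= 0 for all y in X.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm < radius:
        return True, None
    # For ||x|| >= radius, e = x separates:
    # <x, y> <= ||x|| * ||y|| <= ||x|| * radius <= <x, x> for every y in X.
    return False, list(x)

def first_order_oracle(x):
    """First Order oracle for F(x) = max_i |x_i| (assumed convex objective).

    Returns the value F(x) and one subgradient of F at x.
    """
    i = max(range(len(x)), key=lambda k: abs(x[k]))
    g = [0.0] * len(x)
    g[i] = 1.0 if x[i] >= 0 else -1.0
    return abs(x[i]), g

# A productive query (point in int X) and a nonproductive one (point outside X).
inside, _ = separation_oracle([0.3, 0.4])
value, subgradient = first_order_oracle([0.3, 0.4])
outside_ok, separator = separation_oracle([3.0, 4.0])
```

An algorithm in the sense of this section sees the problem only through such calls; it never inspects the radius or the formula for $F$.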
2.2 Certificates for convex minimization problems
An oracle-based algorithm for convex optimization is a computational method for solving problems (1) within a prescribed accuracy $\epsilon > 0$ in the just outlined “computational environment”. As explained in the Introduction, such a method, as applied to a particular problem (1) and a given value of $\epsilon$, produces execution protocols $P_\tau = \{(x_t, e_t)\}_{t=1}^{\tau}$, where $\tau = 1, 2, \dots$ is the current number of steps, $x_t \in \mathbf{R}^n$ are the search points generated so far, and $e_t$ is either a nonzero vector, reported by the Separation oracle and separating $x_t$ and $X$ (this is the case at nonproductive steps $t$, those with $x_t \notin \mathrm{int}\,X$), or is a subgradient $F'(x_t)$ of $F$ at $x_t$ reported by the First Order oracle (this is the case at productive steps $t$, those with $x_t \in \mathrm{int}\,X$). The range $1 \le t \le \tau$ of the values of $t$ associated with an execution protocol $P_\tau$ is split into the sets $I_\tau$, $J_\tau$ of indices of productive, resp., nonproductive steps, and the protocol is augmented by the values $F(x_t)$ of the objective at productive search points $x_t$, those with $t \in I_\tau$.
The execution protocol $P_\tau$ accumulates all information on the problem (1) collected by the algorithm in the course of the first $\tau$ steps and, in particular, determines whether or not the execution terminates at step $\tau$. Before termination, $P_\tau$ determines the next search point $x_{\tau+1}$; upon termination, it determines the resulting approximate solution $x^\tau$, which, in our framework, should be a strictly feasible and $\epsilon$-optimal solution to (1). The question we are interested in is how to certify (in particular, how to convince an external observer) that $x^\tau$ is indeed strictly feasible and $\epsilon$-optimal. The inclusion $x^\tau \in \mathrm{int}\,X$ can be immediately certified by the Separation oracle. We are about to demonstrate that a natural way to certify $\epsilon$-optimality of $x^\tau$ is offered by certificates defined as follows.
Let $P_\tau$ be an execution protocol. A certificate for this protocol is, by definition, a collection $\xi = \{\xi_t\}_{t=1}^{\tau}$ of weights such that
$$\text{(a)}\ \ \xi_t \ge 0 \ \text{for each}\ t = 1, \dots, \tau; \qquad \text{(b)}\ \ \sum_{t \in I_\tau} \xi_t = 1. \tag{3}$$
Note that certificates exist only for protocols with nonempty sets $I_\tau$. Given a solid $B$ known to contain $X$, an execution protocol $P_\tau$ and a certificate $\xi$ for this protocol, we can define the quantity
$$\epsilon_{\rm cert}(\xi|P_\tau, B) \equiv \max_{x \in B} \sum_{t=1}^{\tau} \xi_t \langle e_t, x_t - x\rangle \tag{4}$$
which we call the residual of the certificate $\xi$ on $B$, and the approximate solution induced by $\xi$
$$x^\tau[\xi] \equiv \sum_{t \in I_\tau} \xi_t x_t \tag{5}$$
which clearly is a strictly feasible solution to (1). Given, in addition to $B$, $P_\tau$ and $\xi$, the values of $F$ at the points $x_t$, $t \in I_\tau$, we can also define the quantity
$$F_*(\xi|P_\tau, B) \equiv \min_{x \in B} \Big[ \sum_{t \in I_\tau} \xi_t \big[F(x_t) + \langle e_t, x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t\rangle \Big] = \sum_{t \in I_\tau} \xi_t F(x_t) - \epsilon_{\rm cert}(\xi|P_\tau, B). \tag{6}$$
Note that the quantities defined by (4) and (6) are easy to compute, provided that $B$ is simple, in the sense that it is easy to minimize a linear form over $B$. The role of the just defined quantities in certifying accuracy of approximate solutions to (1) stems from the following simple observation:
Proposition 2.1 Let $P_\tau$ be a $\tau$-point execution protocol associated with the CMP (1), $\xi$ be a certificate for $P_\tau$ and $B \supset X$ be a solid.
(i) One has $F_*(\xi|P_\tau, B) \le \mathrm{Opt}$; consequently, for every feasible solution $x$ of the given CMP it holds:
$$\epsilon_{\rm opt}(x) \le F(x) - F_*(\xi|P_\tau, B). \tag{7}$$
(ii) Let $x^\tau_{\rm bst}$ be the best (with the smallest value of $F$) of the search points $x_t$ generated at the productive steps $t \in I_\tau$. Then both $x^\tau_{\rm bst}$ and $\hat{x}^\tau = x^\tau[\xi]$ are strictly feasible solutions of the given CMP, with
$$\epsilon_{\rm opt}(x^\tau_{\rm bst}) \le \epsilon_{\rm cert}(\xi|P_\tau, B) \quad \text{and} \quad \epsilon_{\rm opt}(\hat{x}^\tau) \le \epsilon_{\rm cert}(\xi|P_\tau, B). \tag{8}$$
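Proof. (i): Let $x \in X$. Then, due to the origin of the vectors $e_t$ and since $F$ is convex, we have $\langle e_t, x - x_t\rangle \le 0$ for $t \in J_\tau$ and $F(x_t) + \langle e_t, x - x_t\rangle \le F(x)$ for $t \in I_\tau$. Taking the weighted sum of these inequalities with the weights determined by a certificate $\xi$, we get (invoking (3))
$$\sum_{t \in I_\tau} \xi_t \big[F(x_t) + \langle e_t, x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t\rangle \le F(x),$$
whence, taking the infimum of both sides over $x \in X \cap \mathrm{Dom}\,F$,
$$\min_{x \in X} \Big[ \sum_{t \in I_\tau} \xi_t \big[F(x_t) + \langle e_t, x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t\rangle \Big] \le \mathrm{Opt}.$$
It remains to note that the left hand side in this inequality is $\ge F_*(\xi|P_\tau, B)$ due to $X \subseteq B$. So, (i) has been proved.
(ii): Since the points $x_t$ with $t \in I_\tau$ belong to $\mathrm{int}\,X$ and $X$ is convex, both $x^\tau_{\rm bst}$ (which is one of these points) and $\hat{x}^\tau$ (which, by (3), is a convex combination of these points) belong to $\mathrm{int}\,X$ and thus are strictly feasible solutions to (1). Next, from (6),
$$F_*(\xi|P_\tau, B) = \underbrace{\sum_{t \in I_\tau} \xi_t F(x_t)}_{\ge_{(*)}\, F(\hat{x}^\tau)} - \epsilon_{\rm cert}(\xi|P_\tau, B) \ \ge\ \underbrace{\min_{t \in I_\tau} F(x_t)}_{= F(x^\tau_{\rm bst})} - \epsilon_{\rm cert}(\xi|P_\tau, B) \tag{9}$$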
(the inequality $(*)$ is given by (3) and the convexity of $F$). Thus $F(x) - F_*(\xi|P_\tau, B) \le \epsilon_{\rm cert}(\xi|P_\tau, B)$ for both $x = x^\tau_{\rm bst}$ and $x = x^\tau[\xi]$, and (8) follows from (7).
Proposition 2.1 says that in order to demonstrate that a strictly feasible solution $x$ to (1) produced by an algorithm in the course of $\tau$ steps solves (1) within a given accuracy $\epsilon$, it suffices to point out a certificate $\xi$ for the associated execution protocol $P_\tau$ such that
$$F(x) - F_*(\xi|P_\tau, B) \le \epsilon, \tag{10}$$
$B$ being a solid containing $X$. Note that given, along with $x$ and $P_\tau$ (the entities which the algorithm in question produces in any case), a certificate $\xi$, the relation (10) is easy to verify (recall that $F_*(\cdot|P_\tau, B)$ is easy to compute). So, a certificate $\xi$ satisfying (10) indeed can be considered as a simple proof of the claim “$x$ is an $\epsilon$-optimal solution to (1)”. We are about to demonstrate that this sufficient certificate-based condition for demonstrating $\epsilon$-optimality of a feasible solution is necessary as well.
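The quantities (4), (5), (6) can be evaluated in a few lines when $B$ is a box, since a linear form is minimized over a box coordinate by coordinate. The sketch below is illustrative only: the function name, the data layout and the toy instance ($F(x) = |x_1| + |x_2|$ on $X = [-1,1]^2$ with $B = [-2,2]^2$, so that Opt $= 0$) are assumptions made here, not taken from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def certificate_quantities(points, vectors, values, weights, lo, hi):
    """Evaluate (4), (5), (6) for the box B = {x : lo <= x <= hi}.

    points[t], vectors[t]: search point x_t and vector e_t of the protocol;
    values[t]: F(x_t) at productive steps, None at nonproductive steps;
    weights[t]: certificate weight xi_t.
    """
    n = len(lo)
    productive = [t for t, v in enumerate(values) if v is not None]
    # Conditions (3): nonnegative weights, productive weights summing to 1.
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights[t] for t in productive) - 1.0) < 1e-12
    # g = sum_t xi_t e_t and c = sum_t xi_t <e_t, x_t>.
    g = [sum(w * e[i] for w, e in zip(weights, vectors)) for i in range(n)]
    c = sum(w * dot(e, x) for w, e, x in zip(weights, vectors, points))
    # min over the box of <g, x>, attained coordinate-wise at an endpoint.
    min_gx = sum(min(g[i] * lo[i], g[i] * hi[i]) for i in range(n))
    residual = c - min_gx                                            # (4)
    x_xi = [sum(weights[t] * points[t][i] for t in productive)
            for i in range(n)]                                       # (5)
    f_star = sum(weights[t] * values[t] for t in productive) - residual  # (6)
    return residual, x_xi, f_star

# Toy protocol: one nonproductive step followed by two productive steps.
points = [[1.5, 0.0], [0.5, 0.5], [-0.5, -0.5]]
vectors = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]   # separator, then subgradients
values = [None, 1.0, 1.0]
weights = [0.0, 0.5, 0.5]
residual, x_xi, f_star = certificate_quantities(
    points, vectors, values, weights, lo=[-2.0, -2.0], hi=[2.0, 2.0])
```

On this instance the residual is 1, which matches $\epsilon_{\rm opt}(x^\tau_{\rm bst}) = 1$, while the induced point $x^\tau[\xi]$ is the origin, the exact minimizer; both are consistent with (8), and $F_* = 0 \le \mathrm{Opt}$ is consistent with (7).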
2.3 Necessity of accuracy certificates for convex minimization problems
Consider the situation as follows. We intend to solve the CMP (1), and our a priori information on the problem is that its feasible domain $X$ is contained in a given solid $B \subset \mathbf{R}^n$ and that the objective $F$ belongs to a given “wide enough” family $\mathcal{F}$ of convex functions, e.g., is convex and continuous on $X$, or convex and piecewise linear, or convex and Lipschitz continuous, with a given constant $L$, on $X$. All remaining information on the problem can be obtained solely from a Separation oracle representing $X$ and a First Order oracle representing $F$. Now consider a solution algorithm which, as applied to a problem (1) compatible with our a priori information, terminates in a number $\tau < \infty$ of steps and returns a strictly feasible solution $x$ to the problem along with a valid upper bound $\epsilon \ge 0$ on the corresponding accuracy $\epsilon_{\rm opt}(x)$; here both $\tau$ and $\epsilon$ can depend on the particular problem the algorithm is applied to. Adding, if necessary, one more step, we can assume w.l.o.g. that the approximate solution returned by the algorithm upon termination always is one of the search points generated by the algorithm in the course of the solution process. We are about to demonstrate that
(!) If our hypothetical algorithm, as applied to a (whatever) CMP that is compatible with our a priori information, terminates after finitely many steps $\tau$ and returns a strictly feasible solution $x$ along with a valid upper bound $\epsilon$ on $\epsilon_{\rm opt}(x)$, then the associated execution protocol $P_\tau$ admits a certificate $\xi^*$ that satisfies (10).
The reason for (!) stems from the following observation. Assume that the algorithm in question, as applied to a problem $\min_{x \in X} F(x)$ compatible with our a priori information, terminates in $\tau$ steps, generating an execution protocol $P_\tau = \{(x_t, e_t)\}_{t=1}^{\tau}$, a strictly feasible solution $x \in \{x_t : t \in I_\tau\}$ (note that this implies $I_\tau \ne \emptyset$) and a valid upper bound $\epsilon$ on $\epsilon_{\rm opt}(x)$, and let us set
$$\hat{X} = \{x \in B : \langle e_t, x - x_t\rangle \le 0,\ t \in J_\tau\}, \qquad \hat{F}(x) = \max_{t \in I_\tau} \big[F(x_t) + \langle F'(x_t), x - x_t\rangle\big]. \tag{11}$$
We further observe that if $\mathcal{F}$ is the family of convex and continuous, or convex piecewise linear, or convex Lipschitz continuous, with a given constant $L$, functions on $X$, then
$$F \in \mathcal{F} \ \Rightarrow\ \hat{F} \in \mathcal{F}. \tag{12}$$
Observe that both our a priori information on $X$ and $F$ and the information collected in the course of solving the problem do not contradict the assumption that $X$ is exactly $\hat{X}$, and $F$ is exactly $\hat{F}$ (see footnote 5). Since the accuracy bound $\epsilon$ is guaranteed, $x$ should therefore be an $\epsilon$-solution to the CMP
$$\widehat{\mathrm{Opt}} = \min_{x \in \hat{X}} \hat{F}(x), \tag{13}$$
that is,
$$[F(x) =]\ \hat{F}(x) \le \widehat{\mathrm{Opt}} + \epsilon. \tag{14}$$
The validity of (!) is now given by the following simple fact:
Proposition 2.2 Assume that (12) holds. The quantity $\widehat{\mathrm{Opt}}$ equals the maximum, over all certificates $\xi$ for $P_\tau$, of $F_*(\xi|P_\tau, B)$, so that there exists a certificate $\xi^*$ for $P_\tau$ such that
$$F_*(\xi^*|P_\tau, B) = \widehat{\mathrm{Opt}} = \min_{x \in \hat{X}} \hat{F}(x); \tag{15}$$
in particular, a strictly feasible solution $x \in \{x_t : t \in I_\tau\}$ satisfies (14) if and only if $\xi^*$ and $x$ satisfy (10) (the sufficient condition for $\epsilon$-optimality of $x$).
5) The latter is true if $\mathcal{F}$ is any family of functions that guarantees (12); this is what “wide enough family” means in our context.
Proof. Let $\xi$ be a certificate for $P_\tau$. As $\hat{F}(x_t) = F(x_t)$ and $\hat{F}'(x_t) = F'(x_t)$ for each $t \in I_\tau$, we have that $\hat{F}_*(\xi|P_\tau, B) = F_*(\xi|P_\tau, B)$, where $\hat{F}_*$ denotes the quantity (6) computed for the objective $\hat{F}$. Now, as we have observed that $P_\tau$ can be considered as an execution protocol associated with the feasible set $\hat{X}$ and the objective $\hat{F}$, Proposition 2.1.(i) assures that $F_*(\xi|P_\tau, B) = \hat{F}_*(\xi|P_\tau, B) \le \widehat{\mathrm{Opt}}$. To prove (15) all we need to verify is that the latter inequality becomes equality for a properly chosen certificate $\xi$. We next observe that
$$\widehat{\mathrm{Opt}} = \min_{x, s} \big\{ s : s \ge F(x_t) + \langle F'(x_t), x - x_t\rangle,\ t \in I_\tau;\ \langle e_t, x - x_t\rangle \le 0,\ t \in J_\tau;\ x \in B \big\}.$$
The right hand side optimization problem is clearly convex, bounded below and satisfies the Slater condition (see footnote 6). Invoking the standard Lagrange Duality Theorem (see, e.g., [2, Chapter 5]), we conclude that the Lagrange dual of this problem
$$\max_{\xi \ge 0}\ \inf_{x \in B,\, s} \Big[ s + \sum_{t \in I_\tau} \xi_t \big[-s + F(x_t) + \langle F'(x_t), x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t\rangle \Big]$$
is solvable with the optimal value $\widehat{\mathrm{Opt}}$. Denoting by $\xi^*$ the optimal solution to this dual, we get $\xi^* \ge 0$ and
$$\widehat{\mathrm{Opt}} = \inf_{x \in B,\, s} \Big[ s\Big(1 - \sum_{t \in I_\tau} \xi^*_t\Big) + \sum_{t \in I_\tau} \xi^*_t \big[F(x_t) + \langle F'(x_t), x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi^*_t \langle e_t, x - x_t\rangle \Big]. \tag{16}$$
In particular, the inf in the right hand side is $> -\infty$, which is possible only when $\sum_{t \in I_\tau} \xi^*_t = 1$, which combines with $\xi^* \ge 0$ to imply that $\xi^*$ is a certificate for $P_\tau$. It remains to note that with $\sum_{t \in I_\tau} \xi^*_t = 1$, (16) reads $\widehat{\mathrm{Opt}} = F_*(\xi^*|P_\tau, B)$. The remaining conclusions of the proposition are now immediate.
6) That is, there exists a feasible solution $(\bar{s}, \bar{x})$ where all the inequality constraints are satisfied as strict inequalities.
We have seen that a Convex Optimization algorithm which is “intelligent enough” to guarantee a given accuracy when minimizing convex and, say, Lipschitz continuous with a given constant, functions in the decision environment specified in Section 2.1 definitely accumulates enough data to build a certificate which justifies the accuracy guarantees. We further observe that given an execution protocol $P_\tau$ with $I_\tau \ne \emptyset$, Proposition 2.2 shows that we can find such a certificate, if it exists. This can be done by building the best associated certificate $\xi$, the one with as large $F_*(\xi|P_\tau, B)$ as possible (the larger the lower bound $F_*(\xi|P_\tau, B)$ on the optimal value, the better the accuracy of a feasible solution as certified by $\xi$). So, to find the desired certificate, we have to solve the auxiliary convex-concave saddle point problem
$$\max_{\xi \in \Omega}\ \min_{x \in B} \Big[ \sum_{t \in I_\tau} \xi_t \big[F(x_t) + \langle F'(x_t), x - x_t\rangle\big] + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t\rangle \Big], \tag{17}$$
with $\Omega \equiv \{\xi \in \mathbf{R}^\tau_+ : \sum_{t \in I_\tau} \xi_t = 1\}$. Moreover, as the bracketed expression in (17) is bilinear in $\xi$ and $x$, specifying $B$ as a simplex, the saddle point problem (17) can be reduced to a linear program. But, in either case, this approach has the following shortcomings:
1. The number of variables in (17) is the number of steps carried out so far by the algorithm, and therefore augmenting a solution method with the computation of a certificate by solving (17) will increase the computational complexity.
2. $\Omega$ is unbounded.
3. One has to account for the tolerance in the solution of (17) in getting the tolerance of the certificate.
Recall that by Proposition 2.1, all we need in order to demonstrate that a particular convex minimization algorithm as applied to a particular program (1) admits a certain efficiency estimate (that is, a certain upper bound $\epsilon(\tau)$ on the accuracy of approximate solutions generated after $\tau$ steps) is to point out certificates $\xi^\tau$ for the associated protocols $P_\tau$, $\tau = 1, 2, \dots$, which certify this efficiency estimate via (10). The situation becomes especially nice when we can demonstrate that $\epsilon_{\rm cert}(\xi^\tau) \le \epsilon(\tau)$; in this case, the certificates allow for both building strictly feasible approximate solutions $\hat{x}^\tau = x^\tau[\xi^\tau]$ and certifying their $\epsilon(\tau)$-optimality. Now, some of the “intelligent” Convex Optimization algorithms, those capable of guaranteeing a prescribed accuracy upon termination (like Subgradient Descent and bundle methods), explicitly produce certificates justifying the theoretical efficiency estimates of the algorithms. However, the theoretically most powerful polynomial-time general-purpose convex optimization algorithms, like the Ellipsoid method, while being intelligent in the aforementioned sense, “as they are” do not explicitly produce certificates justifying the theoretical efficiency estimates. In Section 4 we will show that one can augment these algorithms with certificates that are computable online at a low cost and are compatible with the efficiency estimates of the algorithms, and that this can be achieved for a wider family of problems than convex minimization. Our immediate goal is to describe this “wider family of problems” and to extend to it the notion of a certificate.
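Proposition 2.2 and the saddle point problem (17) can be checked numerically on a tiny one-dimensional instance. The sketch below is a brute-force illustration under assumptions made here (the toy protocol for $F(x) = |x|$, the interval $B = [-2, 2]$, the grids, and all function names are hypothetical); it is not the certificate-building technique of Section 4, and it restricts the weight of the nonproductive step to $[0, 1]$ even though $\Omega$ is unbounded.

```python
def f_star(xi, steps, lo, hi):
    """F_*(xi | P, B) from (6) for a 1-D protocol and B = [lo, hi].

    steps[t] = (x_t, e_t, F(x_t)) with F(x_t) = None at nonproductive steps.
    The function of x under the min is affine, so an endpoint attains the min.
    """
    def inner(x):
        total = 0.0
        for w, (xt, et, ft) in zip(xi, steps):
            total += w * ((ft if ft is not None else 0.0) + et * (x - xt))
        return total
    return min(inner(lo), inner(hi))

def model_opt(steps, lo, hi, grid=400):
    """Grid approximation of the minimum of the model F_hat over X_hat, see (11)."""
    best = float("inf")
    for k in range(grid + 1):
        x = lo + (hi - lo) * k / grid
        # x belongs to X_hat if it satisfies every cut from a nonproductive step.
        if all(et * (x - xt) <= 0 for xt, et, ft in steps if ft is None):
            model = max(ft + et * (x - xt) for xt, et, ft in steps if ft is not None)
            best = min(best, model)
    return best

# Toy protocol for F(x) = |x|: a cut at x = 1.5, then two productive steps.
steps = [(1.5, 1.0, None), (0.5, 1.0, 0.5), (-1.0, -1.0, 1.0)]
lo, hi = -2.0, 2.0
opt_hat = model_opt(steps, lo, hi)

best_val, best_xi = float("-inf"), None
for j in range(11):            # weight of the nonproductive step: 0, 0.1, ..., 1
    for k in range(21):        # weight of the first productive step: 0, 0.05, ..., 1
        xi = (j / 10, k / 20, 1 - k / 20)
        val = f_star(xi, steps, lo, hi)
        if val > best_val:
            best_val, best_xi = val, xi
```

Here the model is $\hat{F}(x) = |x|$ on $\hat{X} = [-2, 1.5]$, and the best certificate found on the grid puts equal weight on the two productive steps, with $F_*$ equal to the model optimum, as (15) predicts. The enumeration also makes shortcoming 1 tangible: the search space grows with the number of steps.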
3 Certificates for Variational Inequalities corresponding to Monotone Operators and Convex Nash Equilibrium Problems
3.1 Monotone and Nash operators
Monotone operators and variational inequalities. Let $\Phi(x) : \mathrm{Dom}(\Phi) \to \mathbf{R}^n$ be a monotone operator, meaning that $\mathrm{Dom}(\Phi) \subset \mathbf{R}^n$ is a convex set and
$$\langle \Phi(x') - \Phi(x''), x' - x''\rangle \ge 0 \quad \forall x', x'' \in \mathrm{Dom}(\Phi).$$
Let $X \subset \mathbf{R}^n$ be a solid, and $\Phi$ be a monotone operator with $\mathrm{Dom}(\Phi) \supseteq \mathrm{int}\,X$. The pair $(X, \Phi)$ defines a variational inequality problem (VIP)
$$\text{find } x_* \in X : \ \langle \Phi(x), x - x_*\rangle \ge 0 \quad \forall x \in \mathrm{Dom}(\Phi) \cap X.$$
Note that in the literature the just defined $x_*$ are called weak solutions to the VIP in question, as opposed to strong solutions $x_* \in X \cap \mathrm{Dom}(\Phi)$ such that $\langle \Phi(x_*), x - x_*\rangle \ge 0$ for all $x \in X$. It is immediately seen that when $\Phi$ is monotone, strong solutions are weak ones; under mild regularity assumptions (e.g., continuity of $\Phi$), the converse is true as well (see [15] or Proposition 3.1 in the survey [12]). It is well known that under our assumptions ($X$ is a solid, $\mathrm{Dom}(\Phi) \supseteq \mathrm{int}\,X$, $\Phi$ is monotone) weak solutions exist and form a closed convex subset $X_* \subseteq X$.
To make the paper self-contained, here is a demonstration of the latter claim. By definition, $X_*$ is the solution set of a system of nonstrict linear inequalities and as such is convex and closed. All we need is to prove that $X_* \ne \emptyset$. Assume that there exists no $x \in X$ satisfying the infinite system of linear inequalities “$\langle \Phi(y), y - x\rangle \ge 0$ for all $y \in \mathrm{Dom}(\Phi) \cap X$”; we will establish a contradiction.
Since $X$ is compact and the solution set of every single inequality from this system is closed, there exists a finite subsystem of the system that has no solution in $X$; let this subsystem be $\langle \Phi(y_i), y_i - x\rangle \ge 0$ for $i = 1, \dots, q$. Consider the function $f$ on $X$ with $f(x) = \min_i \langle \Phi(y_i), y_i - x\rangle$. As $f$ is negative everywhere on $X$ and (clearly) continuous, its maximum over $X$ is negative. With $\Lambda$ as the standard simplex $\{\lambda \in \mathbf{R}^q : \lambda \ge 0, \sum_i \lambda_i = 1\}$, it follows that
$$\max_{x \in X}\ \min_{\lambda \in \Lambda} \Big[ \sum_i \lambda_i \langle \Phi(y_i), y_i - x\rangle \Big] < 0.$$
As the bracketed term is bilinear and both $X$ and $\Lambda$ are convex and compact, one can interchange “min” and “max”. In particular, we conclude that for some $\bar{\lambda} \in \Lambda$ one has $\max_{x \in X} \sum_i \bar{\lambda}_i \langle \Phi(y_i), y_i - x\rangle < 0$. Let $\bar{x} = \sum_i \bar{\lambda}_i y_i$. Then $\bar{x} \in \mathrm{Dom}(\Phi) \cap X$ and, by the monotonicity of $\Phi$,
$$0 > \sum_i \bar{\lambda}_i \langle \Phi(y_i), y_i - \bar{x}\rangle \ge \sum_i \bar{\lambda}_i \langle \Phi(\bar{x}), y_i - \bar{x}\rangle = 0,$$
the desired contradiction.
One of our coming goals is to approximate a weak solution of a VIP defined by a pair $(X, \Phi)$ where $X$ is a solid and $\Phi$ is a monotone operator. We introduce a convenient relevant proximity measure $\epsilon_{\mathrm{vi}}(x|X, \Phi)$ given for points $x \in X$ by
\[
\epsilon_{\mathrm{vi}}(x|X, \Phi) = \sup_{y \in \mathrm{Dom}(\Phi) \cap X} \langle \Phi(y), x - y \rangle; \tag{18}
\]
when $X$ and $\Phi$ are evident from the context we shall use the abbreviated notation $\epsilon_{\mathrm{vi}}(x)$. Observe that $\epsilon_{\mathrm{vi}}(x)$ is a convex function on $X$ which is finite on $\mathrm{Dom}(\Phi) \cap X \supset \mathrm{int}\,X$ due to $\langle \Phi(y), x - y \rangle \le \langle \Phi(x), x - y \rangle$ for all $x, y \in \mathrm{Dom}(\Phi)$; besides this, $\epsilon_{\mathrm{vi}}(\cdot)$ clearly is nonnegative on $\mathrm{Dom}(\Phi) \cap X \supset \mathrm{int}\,X$. Being convex on $X$ and finite and nonnegative on $\mathrm{int}\,X$, $\epsilon_{\mathrm{vi}}(\cdot)$ is nonnegative everywhere on $X$. This nonnegative function vanishes exactly on $X_*$, so that the quantity $\epsilon_{\mathrm{vi}}(x)$ can be treated as the inaccuracy of a point $x \in X$ when viewed as an approximate solution to the VIP in question.
Convex Nash equilibrium problems and Nash operators. Let $X_i \subset \mathbf{R}^{n_i}$, $i = 1, ..., m$, be solids, and let $X = X_1 \times X_2 \times ... \times X_m \subset E = \mathbf{R}^{n_1} \times ... \times \mathbf{R}^{n_m} = \mathbf{R}^{n_1 + ... + n_m}$. A point $x \in E$ is an ordered tuple $(x_1, ..., x_m)$ with $x_i \in \mathbf{R}^{n_i}$; for such a point and for $i \in \{1, ..., m\}$, we denote by $x^i$ the projection of $x$ onto the orthogonal complement of $\mathbf{R}^{n_i}$ in $E$, and write $x = (x^i, x_i)$. The Nash equilibrium problem on $X$ is specified by a collection of $m$ real-valued functions $F_i(x) = F_i(x^i, x_i)$ with common domain $D = D_1 \times ... \times D_m$, where for every $i = 1, ..., m$, $\mathrm{int}\,X_i \subseteq D_i \subseteq X_i$. A Nash equilibrium associated with these data is a point $x_* \in D$ such that for every $i = 1, ..., m$ the function $F_i([x_*]^i, x_i)$ of $x_i$ attains its minimum over $x_i \in D_i$ at $[x_*]_i$, and the Nash equilibrium problem is to find (or to approximate) such an equilibrium.
The standard interpretation of Nash equilibrium is as follows: there are $m$ players, the $i$-th choosing a point $x_i \in D_i$ and incurring cost $F_i(x)$, where $x = (x_1, ..., x_m)$ is comprised of the choices of all $m$ players. Every player wants to minimize his cost, and Nash equilibria are exactly the tuples $x = (x_1, ..., x_m) \in D$ of choices of the players where no player has an incentive to deviate from his choice $x_i$.
There is a natural way to quantify the inaccuracy of a point $x \in D$ as an approximate Nash equilibrium; the corresponding proximity measure is
\[
\epsilon_N(x) = \sum_{i=1}^m \Big[ F_i(x^i, x_i) - \inf_{y_i \in D_i} F_i(x^i, y_i) \Big]. \tag{19}
\]
This is nothing but the sum, over the players, of the maximal gain a player $i$ can get by deviating from his choice $x_i$ when the remaining players stick to their choices.
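The measure (19) can be evaluated numerically once each player's best response is approximated. The following is a minimal sketch under stated assumptions: strategies are scalars, each infimum is taken over a finite grid rather than over all of $D_i$, and the two-player zero-sum game used for illustration (cost `f` for player 1 and `-f` for player 2 on $[-1, 1]^2$) is hypothetical, as are the function names.

```python
import numpy as np

def nash_inaccuracy(costs, grids, x):
    """Approximate eps_N(x): the sum over players of
    F_i(x) - min over grid points y_i of F_i(x with x_i replaced by y_i).
    Scalar strategies only; each infimum is approximated on a finite grid."""
    total = 0.0
    for i, (F_i, grid_i) in enumerate(zip(costs, grids)):
        best = np.inf
        for y_i in grid_i:
            z = list(x)
            z[i] = y_i  # player i deviates, the other players keep their choices
            best = min(best, F_i(z))
        total += F_i(list(x)) - best
    return total

# Hypothetical example: F_1 = f is convex in x_1 and concave in x_2,
# F_2 = -f is convex in x_2 and concave in x_1, and F_1 + F_2 = 0 is convex.
def f(z):
    return z[0] ** 2 - z[1] ** 2 + z[0] * z[1]

costs = [f, lambda z: -f(z)]
grids = [np.linspace(-1.0, 1.0, 401)] * 2

print(nash_inaccuracy(costs, grids, (0.0, 0.0)))  # the origin is the equilibrium here
print(nash_inaccuracy(costs, grids, (1.0, 1.0)))
```

For this particular game the best responses are available in closed form, which gives an independent check on the grid approximation.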
We intend to consider convex Nash equilibrium problems, meaning that $D$ is convex and for every $i = 1, ..., m$ the function $F_i(x) = F_i(x^i, x_i)$ is convex in $x_i \in D_i$, is concave in $x^i \in D_1 \times ... \times D_{i-1} \times D_{i+1} \times ... \times D_m$ and, in addition, the function $F(x) = \sum_{i=1}^m F_i(x)$ is convex on $D$. Given a convex Nash equilibrium problem, we associate with it the Nash operator $\Phi(x) = (\Phi_1(x), \Phi_2(x), ..., \Phi_m(x)) : \mathrm{int}\,X \to E$, where for every $x \in \mathrm{int}\,X$ and $i = 1, ..., m$, $\Phi_i(x)$ is a subgradient of the convex function $F_i(x^i, \cdot)$ at the point $x_i$. It is well known that the Nash operator of a convex Nash equilibrium problem is monotone.
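To make the paper self-contained, here is the demonstration. Let $x, y \in \mathrm{int}\,X$, and let us prove that $\langle \Phi(x) - \Phi(y), x - y \rangle \ge 0$, or, which is the same, that $\langle \Phi(\bar x + \Delta) - \Phi(\bar x - \Delta), \Delta \rangle \ge 0$, where $\bar x = \frac{1}{2}(x + y)$ and $\Delta = \frac{1}{2}(x - y)$. We have
\[
\begin{aligned}
\langle \Phi(\bar x + \Delta) - \Phi(\bar x - \Delta), \Delta \rangle
&= \sum_i \Big[ \underbrace{\langle \Phi_i(\bar x + \Delta), \Delta_i \rangle}_{\ge F_i(\bar x + \Delta) - F_i(\bar x^i + \Delta^i, \bar x_i) \ (a)} + \underbrace{\langle \Phi_i(\bar x - \Delta), -\Delta_i \rangle}_{\ge F_i(\bar x - \Delta) - F_i(\bar x^i - \Delta^i, \bar x_i) \ (a)} \Big] \\
&\ge \sum_i \Big[ F_i(\bar x + \Delta) - F_i(\bar x^i + \Delta^i, \bar x_i) + F_i(\bar x - \Delta) - F_i(\bar x^i - \Delta^i, \bar x_i) \Big] \\
&= F(\bar x + \Delta) + F(\bar x - \Delta) - \sum_i \underbrace{\Big[ F_i(\bar x^i + \Delta^i, \bar x_i) + F_i(\bar x^i - \Delta^i, \bar x_i) \Big]}_{\le 2F_i(\bar x) \ (b)} \\
&\ge F(\bar x + \Delta) + F(\bar x - \Delta) - 2F(\bar x) \ \underbrace{\ge}_{(c)} \ 0,
\end{aligned}
\]
where (a), (b) are due to the fact that $F_i(u^i, u_i)$ are convex in $u_i$ and concave in $u^i$, and (c) is due to the convexity of $F = \sum_i F_i$.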
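The monotonicity just proved is easy to probe numerically on a concrete instance. The sketch below is only an illustration under assumptions: it uses a hypothetical two-player game with $F_1 = f$, $F_2 = -f$, $f(x_1, x_2) = x_1^2 - x_2^2 + x_1 x_2$ on $[-1, 1]^2$ (which satisfies the convexity/concavity requirements above), writes its Nash operator out by hand, and checks the inequality only on a random sample of point pairs.

```python
import numpy as np

def nash_operator(x):
    """Nash operator of the illustrative game F_1 = f, F_2 = -f with
    f(x1, x2) = x1^2 - x2^2 + x1*x2: component i is the derivative of F_i
    with respect to player i's own variable."""
    x1, x2 = x
    return np.array([2.0 * x1 + x2, 2.0 * x2 - x1])

def min_monotonicity_gap(operator, points):
    """Smallest value of <Phi(x) - Phi(y), x - y> over all pairs of sample points."""
    smallest = np.inf
    for x in points:
        for y in points:
            gap = float(np.dot(operator(x) - operator(y), x - y))
            smallest = min(smallest, gap)
    return smallest

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(50, 2))
print(min_monotonicity_gap(nash_operator, samples))
```

For this game the inner product equals $2\|x - y\|_2^2$, so the smallest sampled gap is attained at the pairs with $x = y$.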
It is easily seen that when the functions $F_i$ are continuous on $X$, the weak solutions to the VIP associated with the Nash operator and $X$ are exactly the Nash equilibria (see the Appendix). However, the proximity measure $\epsilon_N(x)$ of a point $x \in X$ for the Nash equilibrium problem is generally different from its proximity measure $\epsilon_{\mathrm{vi}}(x)$ with respect to the corresponding VIP. We note that Nash equilibria are known to exist under weaker conditions that do not require concavity of the functions $F_i(x) = F_i(x^i, x_i)$ in $x^i$ and convexity of $F(x)$ in $x$. However, the weaker conditions assure neither convexity of the set of Nash equilibria nor the availability of efficient computation of (approximate) solutions.
We make the following observations about instances of convex Nash equilibrium problems:
A. In the case of $m = 1$ the convex Nash equilibrium problem is just to minimize a convex function $F(x) = F_1(x)$ over $\mathrm{Dom}(F)$, where $\mathrm{int}\,X \subseteq \mathrm{Dom}(F) \subseteq X$. Thus, CMP's are instances of convex Nash equilibrium problems. Further, the Nash operator $\Phi = \Phi_1$ is (a section of) the subgradient field of $F$ over $\mathrm{int}\,X$. The (weak) solutions to the VIP defined by $(X, \Phi)$ are just the minimizers of the lower semicontinuous extension of $F$ from $\mathrm{int}\,X$ onto $X$. Further, for $x \in \mathrm{Dom}(F)$ one has
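\[
\epsilon_N(x) = F(x) - \inf_{y \in \mathrm{Dom}(F)} F(y) = \epsilon_{\mathrm{opt}}(x),
\]
where $\epsilon_{\mathrm{opt}}(x)$ is the accuracy measure defined in Subsection 2.1 for the corresponding CMP. Further, if we view a convex Nash equilibrium problem with $m = 1$ as a VIP with $\mathrm{Dom}(\Phi) = \mathrm{int}\,X$, the convexity of $F$ assures that $F(x) - F(y) \ge \langle \Phi(y), x - y \rangle$ for all $x \in \mathrm{Dom}(F)$ and $y \in \mathrm{int}\,X$, and therefore
\[
\epsilon_N(x) = \epsilon_{\mathrm{opt}}(x) = F(x) - \inf_{y \in \mathrm{Dom}(F)} F(y) \ge \sup_{y \in \mathrm{int}\,X} \langle \Phi(y), x - y \rangle = \epsilon_{\mathrm{vi}}(x). \tag{20}
\]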
Note that the “gap” between $\epsilon_{\mathrm{vi}}(x)$ and $\epsilon_N(x)$ can be large, as is seen in the case of $X = [0, D] \subset \mathbf{R}$, $F(x) = \int_0^x \min\big[L, \frac{\epsilon}{D - s}\big]\,ds$, where $\epsilon \le LD$. Indeed, in this case $\epsilon_{\mathrm{vi}}(D) = \max_{0\,\ldots}$ [...] $\epsilon > 0$ is a given in advance required accuracy; here again both $\tau$ and $x^\tau$ are specified by the algorithm itself according to the information accumulated so far. Adding, if necessary, one extra step, we can assume w.l.o.g. that $x^\tau \in \{x_1, ..., x_\tau\}$, and, in particular, that upon termination, the set $I_\tau = \{t \le \tau : x_t \in \mathrm{int}\,X\}$ is nonempty.
Assume that an oracle-based algorithm is indeed capable of solving within a given accuracy all VIP's compatible with our a priori information that $X \subseteq B$ and $\Phi : \mathrm{int}\,X \to \mathbf{R}^n$ is monotone and bounded by a given constant $L$, and let $P_\tau = \{(x_t, e_t)\}_{t=1}^\tau$ be the execution protocol upon the termination of the algorithm as applied to the VIP defined by $(X, \Phi)$, and $x^\tau \in \{x_1, ..., x_\tau\} \cap \mathrm{int}\,X$ be the corresponding approximate solution. The question we are interested in is whether $P_\tau$ admits a certificate $\xi$ with “small” $\epsilon_{\mathrm{cert}}(\xi|P_\tau, B)$, or, equivalently, to what extent the existence of a “good” certificate, one with small $\epsilon_{\mathrm{cert}}$ (which, as we have seen, suffices to guarantee accuracy of a properly defined approximate solution), is also necessary to guarantee good accuracy. The following is an answer to this question:
Proposition 3.3 Consider an oracle-based algorithm capable of solving within $\epsilon_{\mathrm{vi}}$-accuracy $\epsilon > 0$ any VIP corresponding to a pair $(X, \Phi)$ where $X$ is a solid included in a given solid $B$ and $\Phi$ is a monotone and bounded by $L$ operator with $\mathrm{Dom}(\Phi) = \mathrm{int}\,X$. Let $P_\tau = \{(x_t, e_t)\}_{t=1}^\tau$ be the execution protocol upon termination of the algorithm as applied to the VIP defined by $(X, \Phi)$
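from the just described family, and $x^\tau \in \{x_t : t \in I_\tau\}$ be the resulting approximate solution of $\epsilon_{\mathrm{vi}}$-accuracy $\epsilon$. Then there exists a certificate $\xi$ for the protocol such that
\[
\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \sqrt{LD\epsilon}, \tag{28}
\]
where $D$ is the Euclidean diameter of $B$.
Remark 3.2 Proposition 3.2 states that when solving the VIP defined by $(X, \Phi)$, a certificate $\xi$ with $\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \epsilon$ allows one to build an approximate solution $x^\tau \in \mathrm{int}\,X$ with $\epsilon_{\mathrm{vi}}(x^\tau|X, \Phi) \le \epsilon$. Proposition 3.3, in turn, says that whenever the execution protocol $P_\tau$ allows one to build an approximate solution $x^\tau$ with $\epsilon_{\mathrm{vi}}(x^\tau|X, \Phi) \le \epsilon$, the protocol admits a certificate $\xi$ with $\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \sqrt{LD\epsilon}$. Observing that $\epsilon_{\mathrm{vi}}(x|X, \Phi) = \sup_{y \in \mathrm{int}\,X} \langle \Phi(y), x - y \rangle \le LD$ whenever $x \in \mathrm{int}\,X$, the only interesting case is the one when $\epsilon \ll LD$, and in this case $\sqrt{LD\epsilon} \gg \epsilon$, so there is a “gap” between the sufficiency and the necessity results stated by Propositions 3.2 and 3.3. We do not know whether this gap comes from the essence of the matter or reflects weakness of Proposition 3.3 (see the forthcoming Remark 3.4).
Proof of Proposition 3.3. 1°. We start with the following result, important in its own right, on the existence of monotone extensions of bounded monotone operators.
Lemma 3.1 Let $Y = \{x_1, ..., x_N\} \subset \mathbf{R}^n$, let $\Phi : Y \to \mathbf{R}^n$ be monotone (i.e., $\langle \Phi(x_i) - \Phi(x_j), x_i - x_j \rangle \ge 0$ for all $i, j$) and bounded by $L$ (i.e., $\|\Phi(x_i)\|_2 \le L$ for all $i$), and let $x \in \mathbf{R}^n \setminus Y$. Then $\Phi$ can be extended to a monotone and bounded by $L$ mapping of $Y \cup \{x\}$.
Proof. For $j = 1, ..., N$, let $F_j = \Phi(x_j)$, let $\Omega = \{\xi \in \mathbf{R}^N : \xi \ge 0, \sum_{j=1}^N \xi_j = 1\}$, and for $\xi \in \Omega$, let $F_\xi = \sum_{j=1}^N \xi_j F_j$ and $x_\xi = \sum_{j=1}^N \xi_j x_j$. For each $j$, $\|F_j\|_2 \le L$, and for each $\xi \in \Omega$, $\|F_\xi\|_2 \le L$ and, from the monotonicity of $\Phi$ on $Y$,
\[
0 \le \sum_j \sum_s \xi_j \xi_s \langle F_j - F_s, x_j - x_s \rangle = \sum_j \xi_j \langle F_j, x_j \rangle + \sum_s \xi_s \langle F_s, x_s \rangle - 2 \langle F_\xi, x_\xi \rangle,
\]
assuring that $\sum_j \xi_j \langle F_j, x_j \rangle \ge \langle F_\xi, x_\xi \rangle$; consequently, for each $F \in \mathbf{R}^n$,
\[
\langle F, x \rangle - \langle F, x_\xi \rangle - \langle F_\xi, x \rangle + \sum_{j=1}^N \xi_j \langle F_j, x_j \rangle \ge \langle F - F_\xi, x - x_\xi \rangle. \tag{29}
\]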
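The assertion of our lemma is equivalent to the statement that the quantity $A \equiv \max_{\|F\|_2 \le L} \min_j \langle F - F_j, x - x_j \rangle$ is nonnegative; given this fact, the required monotone extension of $\Phi$ from $Y$ onto $Y \cup \{x\}$ is given by $\Phi(x) = F_*$, where $F_*$ is the maximizer in the maximization yielding $A$. We have
\[
\begin{aligned}
A &= \max_{\|F\|_2 \le L} \min_j \langle F - F_j, x - x_j \rangle \\
&= \max_{\|F\|_2 \le L} \min_{\xi \in \Omega} \Big[ \sum_{j=1}^N \xi_j \langle F - F_j, x - x_j \rangle \Big] \\
&= \min_{\xi \in \Omega} \max_{\|F\|_2 \le L} \Big[ \sum_{j=1}^N \xi_j \langle F - F_j, x - x_j \rangle \Big] \\
&= \min_{\xi \in \Omega} \max_{\|F\|_2 \le L} \Big[ \langle F, x \rangle - \langle F, x_\xi \rangle - \langle F_\xi, x \rangle + \sum_{j=1}^N \xi_j \langle F_j, x_j \rangle \Big] \\
&\ge \min_{\xi \in \Omega} \max_{\|F\|_2 \le L} \langle F - F_\xi, x - x_\xi \rangle \\
&\ge \min_{\xi \in \Omega} \langle F_\xi - F_\xi, x - x_\xi \rangle = 0.
\end{aligned}
\]
Here “max” and “min” can be exchanged since the bracketed term is bilinear in $\xi$ and $F$ and the sets $\{F : \|F\|_2 \le L\}$ and $\Omega$ are convex and compact; the first inequality is given by (29), and the second one holds since $\|F_\xi\|_2 \le L$ for each $\xi \in \Omega$.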
Lemma 3.2 Let $L$ be a positive real, $X \subset \mathbf{R}^n$ be a solid, $Y \subset X$ be a nonempty set, and $\Phi : Y \to \mathbf{R}^n$ be monotone and bounded by $L$. Then there exists a bounded by $L$ monotone extension of $\Phi$ from $Y$ onto the entire $X$.
Proof. Given a monotone and bounded by $L$ mapping $\Phi : Y \to \mathbf{R}^n$ with $Y \subset X$, let us look at the set $\mathcal{Y}$ of all pairs $(Y', \Phi')$ with $Y \subset Y' \subset X$ and $\Phi'$ being a monotone and bounded by $L$ mapping on $Y'$ which coincides with $\Phi$ on $Y$. Introducing the partial order on these pairs
\[
[(Y', \Phi') \succeq (Y'', \Phi'')] \Leftrightarrow [Y' \supset Y'' \ \&\ \Phi'|_{Y''} = \Phi'']
\]
and applying Zorn's lemma, we conclude that there exists a $\succeq$-maximal pair $(\tilde Y, \tilde\Phi)$. All we need to prove is that $\tilde Y = X$. Indeed, assume otherwise and let $x \in X \setminus \tilde Y$. For every point $y \in \tilde Y$, the set $\mathcal{F}_y = \{F : \|F\|_2 \le L, \langle F - \tilde\Phi(y), x - y \rangle \ge 0\}$ is compact. Lemma 3.1 shows that each finite family of sets of this type has a nonempty intersection, whence by standard compactness arguments the intersection of all these sets is nonempty. Let $F$ be a point in the intersection. Extending $\tilde\Phi$ from $\tilde Y$ to $Y^+ = \tilde Y \cup \{x\}$ by setting $\tilde\Phi(x) = F$, we get a pair in $\mathcal{Y}$ which is strictly $\succ (\tilde Y, \tilde\Phi)$, a contradiction which proves that $\tilde Y = X$.
2°. In the situation described in Proposition 3.3, let us set $\hat X = \{x \in B : \langle e_t, x - x_t \rangle \le 0, t \in J_\tau\}$, where, as always, $J_\tau = \{t \le \tau : x_t \notin \mathrm{int}\,X\}$. As in the proof of Proposition 2.2, setting
\[
\delta = \max_{x \in \hat X} \min_{t \in I_\tau} \langle \Phi(x_t), x_t - x \rangle, \tag{30}
\]
there exists a certificate $\xi$ for $P_\tau$ such that $\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) = \delta$; in fact, this certificate is given by a solution to the problem
\[
\max_{\xi \ge 0,\ \sum_{t \in I_\tau} \xi_t = 1} \ \min_{x \in B} \Big[ \sum_{t \in I_\tau} \xi_t \langle e_t, x - x_t \rangle + \sum_{t \in J_\tau} \xi_t \langle e_t, x - x_t \rangle \Big]. \tag{31}
\]
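The conclusion in Proposition 3.3 is trivially true when $\delta \le 0$, so let us assume that $\delta > 0$. Let $Y = \{x_t : t \in I_\tau\}$. The function $x \mapsto \min_{y \in Y} \langle \Phi(y), y - x \rangle$ is continuous and thus attains its maximum over $\hat X$, say at $x_*$. Evidently, $x_* \notin Y$ (since otherwise $\delta = \min_{y \in Y} \langle \Phi(y), y - x_* \rangle \le 0$), and since $x^\tau \in \{x_1, ..., x_\tau\} \cap \mathrm{int}\,X$, we have
\[
\delta \le \langle \Phi(x^\tau), x^\tau - x_* \rangle \le \|\Phi(x^\tau)\|_2 \|x^\tau - x_*\|_2 \le L \|x^\tau - x_*\|_2.
\]
In particular,
\[
D \ge \|x^\tau - x_*\|_2 \ge \frac{\delta}{L}. \tag{32}
\]
Next, set
\[
F \equiv \delta \, \frac{x^\tau - x_*}{D \|x^\tau - x_*\|_2}
\]
and note that $\|F\|_2 \le \frac{\delta}{D} \le L$ and for each $y \in Y$,
\[
\langle F, y - x_* \rangle \le \|F\|_2 \|y - x_*\|_2 \le \frac{\delta}{D} D = \delta \le \langle \Phi(y), y - x_* \rangle
\]
(the last inequality follows from the definition of $\delta$). We see that extending $\Phi$ from the set $Y$ onto the set $Y \cup \{x_*\}$ by mapping $x_*$ to $F$ preserves monotonicity and boundedness by $L$. Lemma 3.2 assures that we can further extend the resulting mapping from $Y \cup \{x_*\}$ onto the entire $\hat X$ to get a monotone and bounded by $L$ operator $\hat\Phi$ on the entire $\hat X$. As
\[
\langle \hat\Phi(x_*), x^\tau - x_* \rangle = \langle F, x^\tau - x_* \rangle = \frac{\delta \|x^\tau - x_*\|_2^2}{D \|x^\tau - x_*\|_2} = \frac{\delta \|x^\tau - x_*\|_2}{D} \ge \frac{\delta^2}{LD} \tag{33}
\]
(we have used (32)), we have that $\epsilon_{\mathrm{vi}}(x^\tau|\hat X, \hat\Phi) \ge \frac{\delta^2}{LD}$. Now, the a priori information on $(X, \Phi)$ and the information accumulated by the algorithm in question upon termination do not contradict the assumption that $X = \hat X$ and $\Phi = \hat\Phi$, whence $\frac{\delta^2}{LD} \le \epsilon_{\mathrm{vi}}(x^\tau|\hat X, \hat\Phi) \le \epsilon$. Thus, $\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) = \delta \le \sqrt{LD\epsilon}$.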
Remark 3.3 1. The proof of Proposition 3.3 shows that a sufficient condition for a certificate $\xi$ to satisfy (28) is for it to solve the concave maximization problem (31). It follows that given an execution protocol, a certificate satisfying (28) can actually be computed.
2. Proposition 3.3 shows that if the search points of an execution protocol include an approximate solution to the VIP of guaranteed VIP-accuracy $\epsilon$, the protocol admits a certificate $\xi$ that satisfies (28). Of course, Proposition 3.2 assures that $x^\tau[\xi]$ is then an approximate solution with $\epsilon_{\mathrm{vi}}(x^\tau[\xi]) \le \epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \sqrt{LD\epsilon}$.
Remark 3.4 Lemmas 3.1 and 3.2 imply that a protocol is guaranteed to certify $\epsilon$-accuracy for $x^\tau$ if and only if the following first order sentence is true:
\[
\forall y, F \ \Big\{ \Big[ (y \in B) \wedge (\|F\|_2 \le L) \wedge \bigwedge_{t \in J_\tau} (\langle e_t, y - x_t \rangle \le 0) \wedge \bigwedge_{t \in I_\tau} (\langle F - F_t, y - x_t \rangle \ge 0) \Big] \Rightarrow \big[ \langle F, y - x^\tau \rangle \ge -\epsilon \big] \Big\}, \tag{34}
\]
a sentence in which all the quantifiers are universal. Proposition 3.2 shows that (34) is implied by the sentence $\exists \xi \{\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \epsilon\}$ and Proposition 3.3 shows that (34) implies the sentence $\exists \xi \{\epsilon_{\mathrm{cert}}(\xi|P_\tau, B) \le \sqrt{LD\epsilon}\}$. The question about the availability of a certificate that is both necessary and sufficient for (34) reduces to the question of whether (34) has a logically equivalent sentence of the form $\exists \xi \{\phi(\xi, ...)\}$ (where $\phi$ is a formula that is easy to verify). Such an equivalent statement always exists for sentences that are linear in their quantified variables (see [5]); however, (34) is not linear in its quantified variables. We pose the following question as an open problem: determine if there exists an “existential” sentence that is equivalent to (34) and identify one if the answer is positive.
4 Ellipsoid Algorithm with Certificates
As was already mentioned, the Ellipsoid algorithm^7 is “an intelligent” method which, however, does not produce accuracy certificates explicitly. Our local goal is to equip the method with a “computationally cheap” mechanism for producing certificates with residuals converging to 0, as $\tau \to \infty$, at the rate justifying the usual polynomial time theoretical efficiency estimate of the Ellipsoid algorithm. In fact, the technique to be described is applicable to a general Cutting Plane scheme for solving problems with convex structure, and it makes sense to present the technique in question in this general context, thus making the approach more transparent and extending its scope.
^7 This method was developed in [25] and, independently and slightly later, in [22]; for its role in the theory of Convex Optimization see, e.g., [13, 11, 17, 1] and references therein.
4.1 Generic Cutting Plane algorithm
A generic Cutting Plane algorithm works with a vector field $\Phi : \mathrm{Dom}\,\Phi \to \mathbf{R}^n$ defined on a convex set $\mathrm{Dom}\,\Phi \subset \mathbf{R}^n$ and with a solid $X$, $\mathrm{int}\,X \subset \mathrm{Dom}\,\Phi$. The algorithm, as applied to $(X, \Phi)$, builds a sequence of search points $x_t \in \mathbf{R}^n$ along with a sequence of localizers $Q_t$, which are solids such that $x_t \in \mathrm{int}\,Q_t$, $t = 1, 2, ...$. The algorithm is as follows:
Algorithm 4.1
Initialization: Choose a solid $Q_1 \supset X$ and a point $x_1 \in \mathrm{int}\,Q_1$.
Step $t$, $t = 1, 2, ...$: Given $x_t$, $Q_t$,
1. Call the Separation oracle, $x_t$ being the input. If the oracle reports that $x_t \in \mathrm{int}\,X$ (productive step), go to 2. Otherwise (nonproductive step) the oracle reports a separator $e_t \ne 0$ such that $\langle e_t, x - x_t \rangle \le 0$ for all $x \in X$. Go to 3.
2. Call the $\Phi$-oracle to compute $e_t = \Phi(x_t)$. If $e_t = 0$, terminate, otherwise go to 3.
3. Set $\hat Q_{t+1} = \{x \in Q_t : \langle e_t, x - x_t \rangle \le 0\}$. Choose, as $Q_{t+1}$, a solid which contains the solid $\hat Q_{t+1}$. Choose $x_{t+1} \in \mathrm{int}\,Q_{t+1}$ and loop to step $t + 1$.
For a solid $B \subset \mathbf{R}^n$, let $\rho(B)$ be the radius of the Euclidean ball in $\mathbf{R}^n$ with the same $n$-dimensional volume as that of $B$. A Cutting Plane algorithm is called converging on $(X, \Phi)$ if for the associated localizers $Q_t$ one has $\rho(Q_t) \to 0$, $t \to \infty$.
4.1.1 An implementation: the Ellipsoid algorithm
The Ellipsoid algorithm is, historically, the first “polynomial time” implementation of the Cutting Plane scheme. In this algorithm,
1. The initial localizer $Q_1$ is the Euclidean ball $B$ of radius $R$, centered at the origin, known to contain $X$;
2. All localizers $Q_t$ are ellipsoids represented as the images of the unit Euclidean ball under affine mappings:
\[
Q_t = \{x = B_t u + x_t : u^T u \le 1\} \qquad [B_t : n \times n \text{ nonsingular}], \tag{35}
\]
so that the search points $x_t$ are the centers of the ellipsoids;
3. $Q_{t+1}$ is the ellipsoid of the smallest volume containing the half-ellipsoid $\hat Q_{t+1}$. The corresponding updating $(B_t, x_t) \to (B_{t+1}, x_{t+1})$ is given by
\[
\begin{aligned}
q_t &= B_t^T e_t, \qquad p_t = \frac{q_t}{\sqrt{q_t^T q_t}}, \\
B_{t+1} &= \alpha(n) B_t + (\gamma(n) - \alpha(n)) B_t p_t p_t^T, \\
x_{t+1} &= x_t - \frac{1}{n+1} B_t p_t,
\end{aligned} \tag{36}
\]
where $n > 1$ is the dimension of $x$ and
\[
\alpha(n) = \frac{n}{\sqrt{n^2 - 1}}, \qquad \gamma(n) = \frac{n}{n+1}.
\]
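The ellipsoid method is converging, with
\[
\rho(Q_{t+1}) = \kappa(n) \rho(Q_t), \qquad \kappa(n) = \alpha^{\frac{n-1}{n}}(n) \gamma^{\frac{1}{n}}(n) = \frac{n}{(n+1)^{\frac{n+1}{2n}} (n-1)^{\frac{n-1}{2n}}} \le \exp\Big\{-\frac{1}{2n(n-1)}\Big\}. \tag{37}
\]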
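The updating rule of the Ellipsoid algorithm can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes the normalized cut direction $p_t = B_t^T e_t / \|B_t^T e_t\|_2$, and the function names `ellipsoid_step`, `rho` and `kappa` are introduced here for illustration only. The usage part compares the observed ratio $\rho(Q_{t+1})/\rho(Q_t)$ with the closed form for $\kappa(n)$.

```python
import numpy as np

def ellipsoid_step(B, x, e):
    """One update (B_t, x_t) -> (B_{t+1}, x_{t+1}) for the cut <e, z - x> <= 0,
    where the current ellipsoid is {B u + x : ||u||_2 <= 1} and n > 1."""
    n = x.size
    alpha = n / np.sqrt(n * n - 1.0)
    gamma = n / (n + 1.0)
    q = B.T @ e
    p = q / np.linalg.norm(q)          # unit vector along the cut in u-coordinates
    Bp = B @ p
    B_next = alpha * B + (gamma - alpha) * np.outer(Bp, p)
    x_next = x - Bp / (n + 1.0)
    return B_next, x_next

def rho(B):
    """Radius of the Euclidean ball with the same volume as {B u + x : ||u||_2 <= 1}."""
    return abs(np.linalg.det(B)) ** (1.0 / B.shape[0])

def kappa(n):
    """Per-step shrinkage factor alpha^((n-1)/n) * gamma^(1/n) of rho."""
    alpha = n / np.sqrt(n * n - 1.0)
    gamma = n / (n + 1.0)
    return alpha ** ((n - 1.0) / n) * gamma ** (1.0 / n)

n = 4
rng = np.random.default_rng(1)
B = 3.0 * np.eye(n)        # Q_1: ball of radius 3 centered at the origin
x = np.zeros(n)
e = rng.normal(size=n)     # an arbitrary nonzero cut, for illustration
B1, x1 = ellipsoid_step(B, x, e)
print(rho(B1) / rho(B), kappa(n))
```

Storing the factor $B_t$ rather than the matrix $B_t B_t^T$ keeps each step at $O(n^2)$ arithmetic operations and gives direct access to the quantities used later when building certificates.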
Other implementations. Aside from the Ellipsoid algorithm, there are several other implementations of the Cutting Plane scheme with “rapid convergence”: $\rho(Q_t) \le p(n) \rho(Q_1) \exp\{-t/p(n)\}$ with a fixed polynomial $p(n) > 0$. The list includes
• the Center of Gravity method [14, 19]: $Q_1 = X$, $x_t$ is the center of gravity of $Q_t$, $Q_{t+1} = \hat Q_{t+1}$; here $p(n) = \mathrm{const}$. This method, however, is of academic interest only, since it requires finding centers of gravity of general-type polytopes, which is NP-hard;
• the Inscribed Ellipsoid algorithm [23], where $Q_1$ is a box containing $X$, $x_t$ is the center of the ellipsoid of the (nearly) largest volume contained in $Q_t$ and $Q_{t+1} = \hat Q_{t+1}$; here again $p(n) = \mathrm{const}$;
• the Circumscribed Simplex algorithm [3, 24], where $Q_t$ are simplexes and $x_t$ are the barycenters of $Q_t$; here $p(n) = O(1) n^3$.
The Ellipsoid and the Circumscribed Simplex algorithms are examples of the stationary Cutting Plane scheme, one where $Q_t = x_t + B_t C$ for a fixed solid $C$, $0 \in \mathrm{int}\,C$. In order for such a scheme to be converging, $C$ should possess the following specific and rare property: for every $e \ne 0$, the set $C_e = \{x \in C : \langle e, x \rangle \le 0\}$ can be covered by an affine image of $C$ under an efficiently computable affine mapping which reduces volumes by a factor $\exp\{-1/p(n)\}$ for an appropriate polynomial $p(n) > 0$. For a long time, the only known solids with this property were ellipsoids and simplexes. Recently it was discovered [7] that the required property is shared by all compact cross-sections of symmetric cones by hyperplanes; here a symmetric cone is, by definition, a finite direct product of irreducible factors which are the Lorentz cones and the cones of positive semidefinite symmetric real/Hermitian/quaternion matrices.
Among the associated converging stationary Cutting Plane algorithms, the Ellipsoid method, associated with the Lorentz cone, is the fastest in terms of the guaranteed rate of convergence of $\rho(Q_t)$ to 0, and the Circumscribed Simplex method, associated with the direct product of nonnegative rays (which are nothing but cones of positive semidefinite real $1 \times 1$ matrices), is the slowest one^8.
4.2 Building the certificates: preliminaries
To equip a generic Cutting Plane algorithm with certificates, let us treat our original “universe” $\mathbf{R}^n$ as the hyperplane $E = \{(x, t) \in \mathbf{R}^{n+1} : t = 1\}$ in $E^+ = \mathbf{R}^{n+1}$, and let us associate with the localizers $Q_t$ the sets $Q_t^+$ which are convex hulls of the sets $Q_t$ (treated as subsets in $E$) and the origin in $E^+$:^9
\[
Q_t^+ = \{[sx; s] : 0 \le s \le 1, x \in Q_t\}.
\]
Let us further associate with vectors $e_t \in \mathbf{R}^n$ the cuts $e_t^+ = [e_t; -\langle e_t, x_t \rangle] \in \mathbf{R}^{n+1}$. Observe that the convex hulls $\hat Q_{t+1}^+ = \{[sx; s] : 0 \le s \le 1, x \in \hat Q_{t+1}\}$ of the origin in $E^+$ and the sets $\hat Q_{t+1} \subset E$, the sets $Q_t^+$ and the cuts are linked by the relation
\[
\hat Q_{t+1}^+ = \{z \in Q_t^+ : \langle e_t^+, z \rangle \le 0\} \subset Q_{t+1}^+. \tag{38}
\]
Recall that the polar of a closed convex set $P \subset E^+$, $0 \in P$, is the set $\mathrm{Polar}(P) = \{z \in E^+ : \langle z, p \rangle \le 1 \ \forall p \in P\}$. An immediate observation is as follows:
^8 It should be stressed that we are speaking about the theoretical worst-case-oriented complexity bounds; “in reality” the Circumscribed Simplex algorithm seems to be faster than the Ellipsoid one.
^9 Here and in what follows we use “MATLAB notation”: for matrices $v^1, ..., v^k$ with a common number of columns (e.g., for column vectors), $[v^1; ...; v^k]$ stands for the matrix obtained by writing $v^2$ beneath $v^1$, $v^3$ beneath $v^2$, and so on.
Proposition 4.1 Assume that $e_t \ne 0$, so that $Q_{t+1}$ is well defined. Then
\[
\mathrm{Polar}(Q_{t+1}^+) \subset \mathrm{Polar}(\hat Q_{t+1}^+) = \mathrm{Polar}(Q_t^+) + \mathbf{R}_+ e_t^+. \tag{39}
\]
Proof. The inclusion in (39) is evident due to the inclusion in (38) (“the larger the set, the smaller its polar”). The equality in (39) is given by the following simple statement:
Lemma 4.1 Let $P, Q$ be two closed convex sets in $E^+$ containing the origin and such that $P$ is a cone, and let $\mathrm{int}\,P \cap \mathrm{int}\,Q \ne \emptyset$. Then $\mathrm{Polar}(P \cap Q) = \mathrm{Polar}(Q) + P_*$, where $P_* = \{z \in E^+ : \langle z, u \rangle \le 0 \ \forall u \in P\}$.
Proof of Lemma: By standard facts on polars, $\mathrm{Polar}(P \cap Q) = \mathrm{cl}(\mathrm{Conv}(\mathrm{Polar}(P) \cup \mathrm{Polar}(Q)))$. In our case, $\mathrm{Polar}(P) = P_*$; since $P_*$ is a cone and $\mathrm{Polar}(Q)$ contains the origin, the convex hull of the union of $P_*$ and $\mathrm{Polar}(Q)$ is dense in the arithmetic sum of these two sets, that is, $\mathrm{Polar}(P \cap Q) = \mathrm{cl}(P_* + \mathrm{Polar}(Q))$. It follows that all we should verify in order to prove the Lemma is that the set $P_* + \mathrm{Polar}(Q)$ is closed. But this is immediate: let $u_i \in P_*$ and $v_i \in \mathrm{Polar}(Q)$ be such that $u_i + v_i \to w$, $i \to \infty$; we should prove that $w \in P_* + \mathrm{Polar}(Q)$. To this end it is clearly enough to verify that the sequences $\{u_i\}$ and $\{v_i\}$ are bounded. The latter is nearly evident: by assumption, there exists a ball $B$ of a positive radius which is contained in both $P$ and $Q$. The quantities $s_i = \min_{x \in B} \langle u_i + v_i, x \rangle$ form a bounded sequence due to $u_i + v_i \to w$, and $\langle u_i, x \rangle \le 1$, $\langle v_i, x \rangle \le 1$ for all $x \in B$ due to $B \subset P \cap Q$. Therefore we have
\[
\min_{x \in B} \langle u_i, x \rangle = \min_{x \in B} [\langle u_i + v_i, x \rangle - \langle v_i, x \rangle] \ge s_i - 1,
\]
that is, the sequence $\{\min_{x \in B} \langle u_i, x \rangle\}_{i=1}^\infty$ is bounded below. Since the sequence $\{\max_{x \in B} \langle u_i, x \rangle\}_{i=1}^\infty$ is bounded above by 1, we conclude that the sequence $\{\max_{x \in B} \langle u_i, x \rangle - \min_{x \in B} \langle u_i, x \rangle\}_{i=1}^\infty$ is bounded. Recalling that $B$ is a ball of positive radius, we see that the sequence $\{u_i\}_{i=1}^\infty$ is bounded; via the boundedness of the sequence $\{u_i + v_i\}_{i=1}^\infty$, this implies that $\{v_i\}_{i=1}^\infty$ is bounded as well.
From Lemma to equality in (39): By (38), $\hat Q_{t+1}^+ = Q_t^+ \cap P$, where $P$ is the cone $P = \{z \in E^+ : \langle e_t^+, z \rangle \le 0\}$. Since $x_t \in \mathrm{int}\,Q_t$ and $e_t \ne 0$, the interiors of $P$ and $Q_t^+$ have a nonempty intersection, so that by Lemma 4.1
\[
\mathrm{Polar}(\hat Q_{t+1}^+) = \mathrm{Polar}(Q_t^+) + P_* = \mathrm{Polar}(Q_t^+) + \mathbf{R}_+ e_t^+.
\]
4.3 Building the certificates: the algorithm
Equipped with Proposition 4.1, we can build certificates in a “backward fashion”, namely, as follows.
Algorithm 4.2 Given an iteration number $\tau$, we build a certificate for the corresponding protocol $P_\tau = \{(x_t, e_t)\}_{t=1}^\tau$ as follows:
At a terminal (i.e., with $e_\tau = 0$) step $\tau$: Since $e_\tau = 0$ can happen at a productive step only, we set here $\xi_t = 0$, $t < \tau$, $\xi_\tau = 1$, which, due to $e_\tau = 0$, results in a certificate for $P_\tau$ with $\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) = 0$.
At a nonterminal (i.e., with $e_\tau \ne 0$) step $\tau$:
1. We choose a “nearly most narrow stripe” containing $Q_{\tau+1}$, namely, find a vector $h \in \mathbf{R}^n$ such that
\[
\max_{x \in Q_{\tau+1}} \langle h, x \rangle - \min_{x \in Q_{\tau+1}} \langle h, x \rangle \le 1 \tag{40}
\]
and
\[
\|h\|_2 \ge \frac{1}{2\chi\rho(Q_{\tau+1})}, \qquad \chi \le 4n. \tag{41}
\]
Note that such an $h$ always exists, and, for all known converging cutting plane algorithms, can be easily found.
Indeed, when $Q_{\tau+1}$ is an ellipsoid, one can easily find $h$ satisfying (40) with $\|h\|_2 \ge \frac{1}{2\rho(Q_{\tau+1})}$ (see below). In the general case, we can apply the “ellipsoidal construction” to the Fritz John ellipsoid of $Q_{\tau+1}$, that is, an ellipsoid $\tilde Q \supset Q_{\tau+1}$ with $\rho(\tilde Q) \le n\rho(Q_{\tau+1})$, see [8]. Note that for all aforementioned “rapidly converging” implementations of the Cutting Plane scheme, except for the center-of-gravity one (which in any case is not implementable), the Fritz John ellipsoids of the localizers are readily available.
2. Observe that both the vectors $h^+ = [h; -\langle h, x_{\tau+1} \rangle] \in E^+$ and $h^- = -h^+$ clearly belong to $\mathrm{Polar}(Q_{\tau+1}^+)$. Applying Proposition 4.1 recursively, we build representations
\[
\begin{aligned}
(a) \quad & h^+ = \sum_{t=1}^\tau \lambda_t e_t^+ + \phi^+ && [\lambda_t \ge 0, \ \phi^+ \in \mathrm{Polar}(Q_1^+)] \\
(b) \quad & h^- = -h^+ = \sum_{t=1}^\tau \mu_t e_t^+ + \psi^+ && [\mu_t \ge 0, \ \psi^+ \in \mathrm{Polar}(Q_1^+)]
\end{aligned} \tag{42}
\]
The certificate for the protocol $P_\tau$ is well defined only for those $\tau$ for which the set $I_\tau = \{t \le \tau : x_t \in \mathrm{int}\,X\}$ is nonempty and, moreover, the quantity $d_\tau \equiv \sum_{t \in I_\tau} [\lambda_t + \mu_t]$ is positive. In this case, the certificate is given by
\[
\xi_t = \frac{\lambda_t + \mu_t}{d_\tau}, \qquad 1 \le t \le \tau,
\]
and we also set
\[
D_\tau = \sum_{t \in I_\tau} [\lambda_t + \mu_t] \|e_t\|_2 = d_\tau \sum_{t \in I_\tau} \xi_t \|e_t\|_2, \qquad W_\tau = \max_{t \in I_\tau} \max_{x \in X} \langle e_t, x - x_t \rangle.
\]
Note: The quantities $D_\tau$, $W_\tau$ are used solely in the convergence analysis, and Algorithm 4.2 is not required to compute these quantities.
Implementation of Algorithm 4.2 in the case of the Ellipsoid method in the role of Algorithm 4.1 is really easy. Indeed,
• to find $h$ in rule 1, it suffices to build the singular value decomposition $B_{\tau+1} = UDV$ ($U, V$ are orthogonal, $D$ is diagonal with positive diagonal entries) and to set $h = \frac{1}{2\sigma_{i_*}} U e_{i_*}$, where $e_i$ are the standard basic orths and $i_*$ is the index of the smallest diagonal entry $\sigma_{i_*}$ in $D$. We clearly have $\sigma_{i_*} \le |\mathrm{Det}(B_{\tau+1})|^{1/n} = \rho(Q_{\tau+1})$, so that $\|h\|_2 \ge \frac{1}{2\rho(Q_{\tau+1})}$ (i.e., we ensure (41) with $\chi = 1$). Besides this,
\[
\max_{Q_{\tau+1}} \langle h, x \rangle - \min_{Q_{\tau+1}} \langle h, x \rangle = \max_{\|u\|_2 \le 1, \|v\|_2 \le 1} \langle B_{\tau+1}^T h, u - v \rangle = \frac{1}{2} \max_{\|u\|_2 \le 1, \|v\|_2 \le 1} \langle V^T e_{i_*}, u - v \rangle = 1,
\]
as required in rule 1;
• we clearly have $\mathrm{Polar}(Q_t^+) = \{[e; s] : \|B_t^T e\|_2 + \langle x_t, e \rangle + s \le 1\}$, so that given $[g; a] \in \mathrm{Polar}(Q_{t+1}^+)$, finding a representation $[g; a] = [f; b] + r e_t^+$ with $[f; b] \in \mathrm{Polar}(Q_t^+)$ and $r \ge 0$ reduces to solving the problem (known in advance to be solvable) of finding $r \ge 0$ such that $\|B_t^T (g - r e_t)\|_2 + \langle x_t, g \rangle + a \le 1$. A solution $r_*$ can be found at the cost of just $O(n^2)$ arithmetic operations. Specifically, if the vectors $p = B_t^T g$ and $q = B_t^T e_t$ have a nonpositive inner product, $r_* = 0$ is a solution; otherwise a solution is given by $r_* = \frac{p^T q}{q^T q}$.
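The two ingredients just described can be sketched as follows. This is a hedged illustration, not the authors' code: the helper names (`narrow_stripe_direction`, `ellipsoid_width`, `polar_value`, `peel_cut`) are introduced here, NumPy's `svd` returns factors in the form $B = U\,\mathrm{diag}(\sigma)\,V^h$, and the small example (unit ball in $\mathbf{R}^3$ cut by $e_t = (1, 0, 0)$, with the next ellipsoid hard-coded) is an assumption made for illustration.

```python
import numpy as np

def narrow_stripe_direction(B):
    """h = U e_{i*} / (2 sigma_{i*}), with sigma_{i*} the smallest singular value of B."""
    U, sigma, _ = np.linalg.svd(B)
    i_star = int(np.argmin(sigma))
    return U[:, i_star] / (2.0 * sigma[i_star])

def ellipsoid_width(B, h):
    """max <h, z> - min <h, z> over the ellipsoid {B u + c : ||u||_2 <= 1}."""
    return 2.0 * np.linalg.norm(B.T @ h)

def polar_value(B, x, z):
    """Left-hand side of ||B^T e||_2 + <x, e> + s <= 1 for z = [e; s]."""
    e, s = z[:-1], z[-1]
    return np.linalg.norm(B.T @ e) + x @ e + s

def peel_cut(B_t, x_t, e_t, z):
    """Split z = [g; a] as [f; b] + r * e_t^+ with r >= 0 chosen by the rule
    r = 0 if <p, q> <= 0, else r = <p, q> / <q, q>, where p = B_t^T g, q = B_t^T e_t."""
    g = z[:-1]
    p = B_t.T @ g
    q = B_t.T @ e_t
    r = max(0.0, (p @ q) / (q @ q))
    e_plus = np.append(e_t, -(e_t @ x_t))
    return r, z - r * e_plus

# Illustrative data: Q_t is the unit ball in R^3, the cut is e_t = (1, 0, 0),
# and (B_next, x_next) is the smallest-volume ellipsoid containing the half-ball.
B_t, x_t, e_t = np.eye(3), np.zeros(3), np.array([1.0, 0.0, 0.0])
B_next = np.diag([0.75, 3.0 / np.sqrt(8.0), 3.0 / np.sqrt(8.0)])
x_next = np.array([-0.25, 0.0, 0.0])

h = narrow_stripe_direction(B_next)
h_plus = np.append(h, -(h @ x_next))
r, rest = peel_cut(B_t, x_t, e_t, h_plus)
print(ellipsoid_width(B_next, h), r, polar_value(B_t, x_t, rest))
```

Applying `peel_cut` for $t = \tau, \tau - 1, ..., 1$ to $h^+$ and to $h^-$, and recording the multipliers $r$, would yield the weights $\lambda_t$ and $\mu_t$ of the representations (42); the sketch shows a single backward step only.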
From the just presented remarks it follows that with the Ellipsoid method in the role of Algorithm 4.1, the computational cost of building a certificate for $P_\tau$ is $O(n^3) + O(\tau n^2)$ arithmetic operations (a.o.), provided that the subsequently generated data (search points $x_t$, matrices $B_t$ and vectors $e_t$) are stored in the memory. Note that the cost of carrying out $\tau$ steps of the Ellipsoid algorithm is at least $O(\tau n^2)$ (this is the total complexity of the updatings $(x_t, B_t) \to (x_{t+1}, B_{t+1})$, $t = 1, ..., \tau$, with computational expenses of the oracles excluded). It follows that when certificates are built along a “reasonably dense” subsequence of steps, e.g., at steps 2, 4, 8, ..., the associated computational expenses basically do not affect the complexity of the Ellipsoid algorithm, see Discussion below.
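4.4 The main result
Theorem 4.1 Let Algorithms 4.1 – 4.2 be applied to $(X, \Phi)$, where $X \subset \mathbf{R}^n$ is a solid and $\Phi : \mathrm{int}\,X \to \mathbf{R}^n$ is a vector field, and let $r = r(X)$ be the largest of the radii of Euclidean balls contained in $X$.
(i) Let $\tau$ be an iteration number such that $\xi = \{\xi_t\}_{t=1}^\tau$ is well defined. Then $\xi$ is a certificate for the corresponding execution protocol $P_\tau = \{(x_t, e_t)\}_{t=1}^\tau$. If $\tau$ is the terminal iteration number (i.e., $e_\tau = 0$), then $\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) = 0$, otherwise
\[
\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) \le \frac{2}{d_\tau} \tag{43}
\]
and, besides this,
\[
\epsilon_\tau \equiv \frac{2}{D_\tau} < r \ \Rightarrow \ \epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) \le \frac{\epsilon_\tau}{r - \epsilon_\tau} W_\tau. \tag{44}
\]
(ii) Let $D(Q_1)$ be the Euclidean diameter of $Q_1$. Then for every nonterminal iteration number $\tau$ one has
\[
D_\tau \ge D^{-1}(Q_1) \Big[ \frac{r}{2\chi\rho(Q_{\tau+1})} - 1 \Big]. \tag{45}
\]
(iii) Whenever $\tau$ is a nonterminal iteration number such that
\[
\rho(Q_{\tau+1}) \le \frac{r^2}{16\chi D(Q_1)}, \tag{46}
\]
the certificate $\xi$ is well defined, and
\[
\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) \le \frac{16\chi D(Q_1) W_\tau}{r^2} \rho(Q_{\tau+1}), \tag{47}
\]
so that when $\Phi$ is semi-bounded on $\mathrm{int}\,X$: $\mathrm{Var}_X(\Phi) \equiv \sup_{x \in \mathrm{int}\,X, y \in X} \langle \Phi(x), y - x \rangle < \infty$, we have
\[
\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) \le \frac{16\chi D(Q_1) \mathrm{Var}_X(\Phi)}{r^2} \rho(Q_{\tau+1}). \tag{48}
\]
In particular, in the case of the Ellipsoid algorithm (where $\chi = 1$ and $\rho(Q_{t+1}) \le R \exp\{-\frac{t}{2n(n-1)}\}$, $R$ being the radius of the ball $B = Q_1$), the certificate $\xi$ is well defined, provided that
\[
\exp\Big\{\frac{\tau}{2n(n-1)}\Big\} \ge \frac{32R^2}{r^2}, \tag{49}
\]
and in this case
\[
\epsilon_{\mathrm{cert}}(\xi|P_\tau, Q_1) \le \frac{32R^2 \mathrm{Var}_X(\Phi)}{r^2} \exp\Big\{-\frac{\tau}{2n(n-1)}\Big\}, \tag{50}
\]
provided that $\Phi$ is semi-bounded on $X$.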
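Proof. (i): When $\tau$ is terminal, the validity of (i) is evident. Now let $\tau$ be nonterminal with well defined $\xi$. The fact that $\xi$ is a certificate for $P_\tau$ is evident from the construction; all we need is to verify (43) and (44). Let $x \in Q_1$ and let $x^+ = [x; 1]$, so that $x^+ \in Q_1^+$. Summing up relations (42.a-b), and dividing both sides of the resulting equality by $d_\tau$, we get
\[
\sum_{t=1}^\tau \xi_t e_t^+ = -d_\tau^{-1} (\phi^+ + \psi^+);
\]
multiplying both sides by $x^+$, we get
\[
\sum_{t=1}^\tau \xi_t \langle e_t, x - x_t \rangle = \sum_{t=1}^\tau \xi_t \langle e_t^+, x^+ \rangle = -d_\tau^{-1} \langle \phi^+ + \psi^+, x^+ \rangle \ge -2 d_\tau^{-1},
\]
where the concluding inequality is given by the inclusions $\phi^+, \psi^+ \in \mathrm{Polar}(Q_1^+)$ and $x^+ \in Q_1^+$. The resulting inequality holds true for every $x \in Q_1$, and (43) follows.
To prove (44), let $\tau$ be such that $\epsilon \equiv \epsilon_\tau < r$, and let $\bar x$ be the center of a Euclidean ball $B$ of radius $r$ which is contained in $X$. Observe that $t \in I_\tau \Rightarrow \langle e_t, x - x_t \rangle \le W_\tau \ \forall x \in B$, whence $\langle e_t, \bar x - x_t \rangle \le W_\tau - r\|e_t\|_2$. Recalling what $e_t$ is for $t \in J_\tau \equiv \{1, ..., \tau\} \setminus I_\tau$, we get the relations
\[
(P_t) : \quad \langle e_t, \bar x - x_t \rangle \le \begin{cases} W_\tau - r\|e_t\|_2, & t \in I_\tau \\ 0, & t \in J_\tau \end{cases}
\]
Now let $x \in Q_1$, and let $y = \frac{(r - \epsilon)x + \epsilon \bar x}{r}$, so that $y \in Q_1$. By (43) we have
\[
\sum_{t=1}^\tau [\lambda_t + \mu_t] \langle e_t, x_t - y \rangle \le 2;
\]
multiplying this inequality by $r$ and adding the weighted sum of inequalities $(P_t)$, the weights being $\epsilon[\lambda_t + \mu_t]$, we get
\[
\sum_{t=1}^\tau [\lambda_t + \mu_t] \langle e_t, \underbrace{r x_t - r y + \epsilon(\bar x - x_t)}_{(r - \epsilon)(x_t - x)} \rangle \le 2r + \epsilon W_\tau d_\tau - \epsilon r D_\tau.
\]
The right hand side in this inequality, by the definition of $\epsilon$, is $\epsilon W_\tau d_\tau$, and we arrive at the relation
\[
(r - \epsilon) \sum_{t=1}^\tau [\lambda_t + \mu_t] \langle e_t, x_t - x \rangle \le \epsilon W_\tau d_\tau \ \Leftrightarrow \ \sum_{t=1}^\tau \xi_t \langle e_t, x_t - x \rangle \le \frac{\epsilon W_\tau}{r - \epsilon}.
\]
This relation holds true for every $x \in Q_1$, and (44) follows. (i) is proved.
(ii): As above, let $\bar x$ be the center of a Euclidean ball $B$ of radius $r$ which is contained in $X$. Consider first the case when $\langle h, \bar x - x_{\tau+1} \rangle \ge 0$. Multiplying both sides in (42.a) by $x^+ = [\bar x + re; 1]$ with $e \in \mathbf{R}^n$, $\|e\|_2 \le 1$, we get
\[
\begin{aligned}
\langle h, re \rangle &\le \langle h, re \rangle + \langle h, \bar x - x_{\tau+1} \rangle = \langle h^+, x^+ \rangle = \sum_{t=1}^\tau \lambda_t \langle e_t, \bar x + re - x_t \rangle + \langle \phi^+, x^+ \rangle \\
&\le \sum_{t=1}^\tau \lambda_t \langle e_t, \bar x + re - x_t \rangle + 1 && [\text{since } x^+ \in Q_1^+, \ \phi^+ \in \mathrm{Polar}(Q_1^+)] \\
&\le 1 + \sum_{t \in I_\tau} \lambda_t \langle e_t, \bar x + re - x_t \rangle && [\text{since } \bar x + re \in X \text{ and } e_t \text{ separates } x_t \text{ and } X \text{ for } t \in J_\tau] \\
&\le 1 + \sum_{t \in I_\tau} \lambda_t \|e_t\|_2 D(Q_1) \le 1 + D_\tau D(Q_1).
\end{aligned}
\]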
The resulting inequality holds true for all unit vectors e; maximizing the left hand side over these h 2 −1 1 e, we get Dτ ≥ rD(Q1 ) . Recalling that h 2 ≥ 2χρ(Qτ +1 ) , we arrive at (45). We have established (ii) in the case of h, x − xτ +1 ≥ 0; in the opposite case we can use the same reasoning with −h ¯ in the role of h and with (42.b) in the role of (42.a). (ii) is proved. (iii): This is an immediate consequence of (i), (ii) and the evident fact that whenever Φ is semi-bounded on int X, we have Wτ ≤ VarX (Φ) for all τ . Combining Proposition 3.2 and Theorem 4.1, we arrive at the following result: Corollary 4.1 Let X ⊂ Rn be a solid which is contained in the centered at the origin Euclidean ball B = Q1 of radius R, and let r = r(X) be the largest of the radii of Euclidean balls contained in X. Further, let Φ : Dom Φ → Rn , Dom Φ ⊃ int X, be a semi-bounded on int X operator. For appropriately chosen absolute constant O(1) and every ∈ (0, VarX (Φ)], with τ ( ) = O(1)n2 ln RVarX (Φ) r ,
the Ellipsoid algorithm, augmented with Algorithm 4.2 for building certificates, as applied to $(X,\Phi)$, in $\tau\le\tau(\epsilon)$ steps generates a protocol $P_\tau$ and a certificate $\xi$ for this protocol such that
$$\epsilon_{\rm cert}(\xi|P_\tau,B)\le\epsilon,$$
along with the corresponding solution $x^\tau=\sum_{t\in I_\tau}\xi_t x_t\in{\rm int}\,X$. As a result,
• when $\Phi$ is monotone, we have $\epsilon_{\rm vi}(x^\tau|X,\Phi)\le\epsilon$;
• when $\Phi$ is the Nash operator of a convex Nash equilibrium problem on $X$ (in particular, in the case of convex minimization), we have $\epsilon_{\rm N}(x^\tau)\le\epsilon$.
Discussion. Corollary 4.1 states that solving a convex Nash equilibrium problem (or a variational inequality) associated with a semi-bounded, monotone operator $\Phi$ within accuracy $\epsilon$ “costs” $O(1)n^2\ln\left(\frac{R\,{\rm Var}_X(\Phi)}{\epsilon r}\right)$ steps of the Ellipsoid algorithm, with (at most) one call to the Separation and the $\Phi$-oracles and $O(1)n^2$ additional a.o. per step; note that this is nothing but the standard theoretical complexity bound for the Ellipsoid algorithm. The advantage offered by the certificates is that whenever a certificate $\xi$ with $\epsilon_{\rm cert}(\xi|P_\tau,B)\le\epsilon$ is built, we know that we already have at our disposal a strictly feasible approximate solution of accuracy $\epsilon$. Thus, in order to get a strictly feasible solution of a prescribed accuracy $\epsilon>0$, we can run the Ellipsoid algorithm and apply from time to time Algorithm 4.2 in order to generate a strictly feasible approximate solution and to bound its nonoptimality from above, terminating the solution process when this bound becomes $\le\epsilon$. Note that this implementation, while ensuring a desired quality of the resulting approximate solution, requires no a priori information on the problem (aside from its convexity and the knowledge of a ball $B$ containing the feasible domain $X$). Moreover, if Algorithm 4.2 is invoked along a “reasonably dense, but not too dense” sequence of time instants, e.g., at steps 2, 4, 8, ..., the number of steps upon termination will be at most the one given by the theoretical complexity bound of the Ellipsoid algorithm, and the computational effort to run Algorithm 4.2 will be a small fraction of the computational effort required to run the Ellipsoid algorithm itself.
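The termination scheme described in the Discussion can be sketched as follows. This is only an illustration under stated assumptions: `step` and `build_certificate` are hypothetical callables standing in for one Ellipsoid step and for Algorithm 4.2, respectively; neither name comes from the paper. The certificate builder is assumed to return either `None` (no certificate available yet) or a pair (approximate solution, residual bound).

```python
def run_with_checkpoints(step, build_certificate, eps, max_steps):
    """Run a cutting plane method, trying to build a certificate at steps 2, 4, 8, ...

    step(t)              -- hypothetical: performs step t of the Ellipsoid algorithm
    build_certificate(t) -- hypothetical stand-in for Algorithm 4.2; returns None
                            or a pair (x, residual) for the protocol after t steps
    Returns (t, x, residual) at the first checkpoint with residual <= eps,
    or None if no such checkpoint occurs within max_steps steps.
    """
    next_check = 2
    for t in range(1, max_steps + 1):
        step(t)
        if t == next_check:
            next_check *= 2  # doubling keeps the number of certificate calls logarithmic
            result = build_certificate(t)
            if result is not None:
                x, residual = result
                if residual <= eps:
                    return t, x, residual
    return None
```

With doubling checkpoints, if a certificate of residual at most `eps` is available from some step T onward, the loop stops at the smallest power of two not below T, that is, after fewer than 2T steps, while invoking the certificate builder only about log2(2T) times.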
4.5
Incorporating “deep cuts”
The certificates we use can be sharpened by allowing for “deep cuts”. Specifically, assume that the separation oracle provides more information than we have postulated, namely, that in the case of $x\notin{\rm int}\,X$ this oracle reports a vector $e\ne 0$ and a nonnegative number $a$ such that $X\subseteq\{y:\langle e,y-x\rangle\le -a\}$. In this situation, all our results remain valid when the residual of a certificate on a solid $B\supset X$ is redefined as
$$\epsilon_{\rm cert}(\xi|P_\tau,B)=\max_{x\in B}\sum_{t=1}^{\tau}\xi_t\left[\langle e_t,x_t-x\rangle-a_t\right], \tag{51}$$
where $[e_t;a_t]$ is the output of the Separation oracle at a nonproductive step $t$ and $a_t=0$ for productive steps $t$, and the sets $\widehat{Q}_{t+1}$ in Algorithm 4.1 are defined as
$$\widehat{Q}_{t+1}=\{x\in Q_t:\langle e_t,x-x_t\rangle\le-\alpha_t\}, \tag{52}$$
where $\alpha_t=a_t$ for all $t$. With the Ellipsoid method in use, the latter modification allows one to replace the updating formulae (35), (36) with
T qt = Bt et , pt = √
T B t qt T B BT q qt t t t 2
, µt = √
n Bt+1 = (1 − µt )2 n2 −1
1/2
αt , T T qt Bt Bt qt (1−µt )(n−1) (1+µt )(n+1)
Bt + 1 −
Bt pt pT , ; t
(53)
xt+1 = xt −
1+nµt n+1 Bt pt
new updating formulae, while still ensuring that the ellipsoid $Q_{t+1}$ contains $\widehat{Q}_{t+1}$, result in larger reduction in the volumes of subsequent ellipsoids. In the case of convex minimization, we can further allow for “deep cuts” also at productive steps by setting
$$a_t=F(x_t)-\min_s\{F(x_s):s\le t,\ x_s\in{\rm int}\,X\},\qquad \alpha_t=a_t/2$$
for a productive step $t$ ($a_t$ is used in (51), $\alpha_t$ in (52) and (53)). With this modification, all our CMP-related results remain valid (modulo changes in absolute constant factors in Theorem 4.1), provided that the approximate solutions we are considering are the best found so far solutions $x^\tau_{\rm bst}$. Moreover, in this situation the relation $\epsilon_{\rm opt}(x^\tau_{\rm bst})\le\epsilon_{\rm cert}(\xi|P_\tau,B)$ (cf. (8)) remains valid whenever $B$ is a solid containing the “leftover” part $X_\tau=\{x\in X:\langle e_t,x-x_t\rangle\le-a_t\ \forall t\in I_\tau\}$ of the feasible domain. Thus, using “deep cuts” in the cutting plane scheme and when defining the residual allows one to accelerate the algorithms somewhat and to obtain somewhat improved accuracy bounds. The proofs of the just outlined “deep cut” modifications of our results are obtained from the proofs presented in this paper by minor and straightforward modifications.
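To make the effect of a deep cut concrete, here is a minimal sketch of one deep-cut ellipsoid update in plain Python. It is an illustration rather than a transcription of (53): it assumes the ellipsoid is stored as $\{c+Bu:\|u\|_2\le1\}$ and uses the textbook minimum-volume update for the cut $\langle e,x-c\rangle\le-\alpha$, with the cut direction normalized as $p=B^Te/\|B^Te\|_2$ and the relative depth $\mu=\alpha/\|B^Te\|_2$. The function name and argument layout are hypothetical.

```python
import math

def deep_cut_update(B, c, e, alpha):
    """One deep-cut step for the ellipsoid {c + B u : ||u||_2 <= 1} in R^n, n >= 2.

    Returns (B_new, c_new) describing the minimum-volume ellipsoid that contains
    the part of the old ellipsoid where <e, x - c> <= -alpha.
    """
    n = len(c)
    if n < 2:
        raise ValueError("the update is defined for n >= 2")
    # q = B^T e is the cut normal expressed in the coordinates of the unit ball
    q = [sum(B[i][j] * e[i] for i in range(n)) for j in range(n)]
    norm_q = math.sqrt(sum(v * v for v in q))
    p = [v / norm_q for v in q]          # unit direction of the cut
    mu = alpha / norm_q                  # relative depth of the cut
    if not 0.0 <= mu < 1.0:
        raise ValueError("cut depth must satisfy 0 <= mu < 1")
    Bp = [sum(B[i][j] * p[j] for j in range(n)) for i in range(n)]
    # scaling of the directions orthogonal to p
    scale = math.sqrt((1.0 - mu * mu) * n * n / (n * n - 1.0))
    # extra shrinkage along p, relative to the orthogonal directions
    shrink = math.sqrt((1.0 - mu) * (n - 1.0) / ((1.0 + mu) * (n + 1.0)))
    B_new = [[scale * (B[i][j] + (shrink - 1.0) * Bp[i] * p[j]) for j in range(n)]
             for i in range(n)]
    c_new = [c[i] - (1.0 + n * mu) / (n + 1.0) * Bp[i] for i in range(n)]
    return B_new, c_new
```

For the unit disk in the plane and the cut $x_1\le-1/2$, the sketch returns the ellipse centered at $(-2/3,0)$ with semi-axes $1/3$ and $1$, which passes through $(-1,0)$ and $(-1/2,\pm\sqrt{3}/2)$; with $\alpha=0$ it reduces to the usual central-cut step.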
5
Application Example: Recovering Dual Solutions to Large LP Problems
Here we present a simple and instructive application example for the certificate machinery; more examples will be given in the forthcoming follow-up paper [16]. Consider a Linear Programming problem with box constraints
$$\min_{x\in X}F(x)\equiv\langle c,x\rangle,\qquad X=\{x\in\mathbf{R}^n:\widehat{A}x\equiv[A;I;-I]x\le\widehat{b}=[b;\mathbf{1};\mathbf{1}]\}, \tag{54}$$
where $\mathbf{1}$ is the all-one vector. Note that $X$ is contained in the box $B=\{x\in\mathbf{R}^n:-\mathbf{1}\le x\le\mathbf{1}\}$. Assuming that $X$ has a nonempty interior and $\widehat{A}$ has no all-zero rows, we have ${\rm int}\,X=\{x\in\mathbf{R}^n:\widehat{A}x<\widehat{b}\}$. Assume that we have at our disposal an oracle which, given on input a point $u\in\mathbf{R}^n$, reports whether $u\in{\rm int}\,X$, and if it is not the case, returns a constraint $a_{j(u)}^Tx\le\widehat{b}_{j(u)}$ from the system $\widehat{A}x\le\widehat{b}$ which is not strictly satisfied at $x=u$. This oracle can be considered as a Separation oracle for $X$, the separator in the case of $x\notin{\rm int}\,X$ being given by the vector $a_{j(x)}$.

Assume that we are solving (54) by an iterative oracle-based method and after a number $\tau$ of steps have at our disposal the corresponding execution protocol $P_\tau=\{(x_t,e_t)\}_{t=1}^{\tau}$ along with a certificate $\xi$ with small residual on the box $B$. Let us look at what kind of “LP information” we can extract from this certificate. For $t\in J_\tau$, the vectors $e_t$ are (the transposes of) certain rows $a_{j(x_t)}^T$ of the matrix $\widehat{A}$ such that $a_{j(x_t)}^Tx_t\ge\widehat{b}_{j(x_t)}$. Setting
$$\lambda^\tau_j=\sum_{t\in J_\tau:\,j(x_t)=j}\xi_t,\qquad j=1,...,m,$$
where $m$ is the number of rows in $\widehat{A}$, we get a nonnegative vector $\lambda^\tau$ such that
$$\sum_{t\in J_\tau}\xi_te_t=\widehat{A}^T\lambda^\tau,\qquad \sum_{t\in J_\tau}\xi_t\langle e_t,x_t\rangle\ge\widehat{b}^T\lambda^\tau.$$
Now, when $t\in I_\tau$, we have $e_t=c$. Setting $x^\tau=\sum_{t\in I_\tau}\xi_tx_t$ and taking into account the outlined remarks, the relation
$$\epsilon_{\rm cert}(\xi|P_\tau,B)=\max_{x\in B}\sum_{t=1}^{\tau}\xi_t\langle e_t,x_t-x\rangle$$
implies that
$$\epsilon_{\rm cert}(\xi|P_\tau,B)\ge[c^Tx^\tau+\widehat{b}^T\lambda^\tau]+\max_{x\in B}\langle-x,c+\widehat{A}^T\lambda^\tau\rangle=\underbrace{[c^Tx^\tau+\widehat{b}^T\lambda^\tau]}_{\text{“duality gap”}}+\underbrace{\|c+\widehat{A}^T\lambda^\tau\|_1}_{\text{“dual infeasibility”}} \tag{55}$$
(note that $B$ is the unit box). Now, $\widehat{A}^T=[A^T,I,-I]$; decomposing $\lambda^\tau$ accordingly: $\lambda^\tau=[\lambda^\tau_A;\lambda^\tau_+;\lambda^\tau_-]$, we can easily build nonnegative $\Delta\lambda_+,\Delta\lambda_-\in\mathbf{R}^n$ such that the vector $\bar\lambda^\tau=[\lambda^\tau_A;\lambda^\tau_++\Delta\lambda_+;\lambda^\tau_-+\Delta\lambda_-]\ge0$ satisfies the relations $c+\widehat{A}^T\bar\lambda^\tau=0$ and $\|\Delta\lambda_+\|_1+\|\Delta\lambda_-\|_1=\|c+\widehat{A}^T\lambda^\tau\|_1$, whence $\widehat{b}^T\bar\lambda^\tau\le\|c+\widehat{A}^T\lambda^\tau\|_1+\widehat{b}^T\lambda^\tau$ (note that the entries in $\widehat{b}$ corresponding to nonzero entries in $\bar\lambda^\tau-\lambda^\tau$ are $\pm1$). Invoking (55), we arrive at
$$x^\tau\in X,\quad\bar\lambda^\tau\ge0,\quad c+\widehat{A}^T\bar\lambda^\tau=0,\quad c^Tx^\tau+\widehat{b}^T\bar\lambda^\tau\le\epsilon_{\rm cert}(\xi|P_\tau,B).$$
In other words, in the case in question a certificate for $P_\tau$ straightforwardly induces a pair $(x^\tau,\bar\lambda^\tau)$ of feasible solutions to the problem of interest and its LP dual such that the duality gap, evaluated at this pair, does not exceed $\epsilon_{\rm cert}(\xi|P_\tau,B)$.

This example is of significant interest when the LP problem (54) has a huge number of constraints which are “well organized” in the sense that it is relatively easy to find a constraint, if any, which is not strictly satisfied at a given point (for examples, see, e.g., [11]). By Corollary 4.1, the latter property allows, given $\epsilon>0$, to build in $O\left(n^2\ln\left(\frac{n\|c\|_2}{\epsilon\,r(X)}\right)\right)$ steps, with a single call to the Separation oracle for $X$ and $O(n^2)$ additional operations per step, a protocol $P_\tau$ and an associated certificate $\xi$ such that $\epsilon_{\rm cert}(\xi|P_\tau,B)\le\epsilon$. With the just presented construction, we can easily convert $(P_\tau,\xi)$ into a pair of feasible solutions to the primal and the dual problems with duality gap $\le\epsilon$. Note that in our situation the dual problem has a huge number of variables, which makes it impossible to solve the dual by the standard optimization techniques. In fact, with huge $m$ it is even impossible to write down efficiently a candidate solution to the dual as a vector; with the outlined approach, this difficulty is circumvented by allowing ourselves to indicate the indices and values of only the nonzero entries in the “sparse” dual solution we get.
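The conversion of a certificate into a sparse dual solution can be sketched as follows. The data layout is an assumption made for illustration only: each step is a pair (point, tag), where the tag is `None` on a productive step (so that $e_t=c$) and otherwise names the violated constraint as `('A', j)` for row $j$ of $A$, `('+', k)` for $x_k\le1$, or `('-', k)` for $-x_k\le1$; `rows[j]` holds the pair (row $a_j$, right hand side $b_j$). The function names are hypothetical as well.

```python
def _normal_and_rhs(tag, rows, n):
    """Row of [A; I; -I] and matching right hand side for a constraint tag."""
    kind, idx = tag
    if kind == 'A':
        return rows[idx]
    e = [0.0] * n
    e[idx] = 1.0 if kind == '+' else -1.0
    return e, 1.0

def certificate_to_dual(c, steps, xi, rows):
    """Return (x_tau, lam, gap): primal point, sparse dual vector, duality gap."""
    n = len(c)
    lam = {}                    # sparse dual solution: constraint tag -> multiplier
    x_tau = [0.0] * n
    for (x_t, tag), w in zip(steps, xi):
        if tag is None:         # productive step: contributes to x^tau
            x_tau = [s + w * v for s, v in zip(x_tau, x_t)]
        elif w > 0:             # nonproductive step: aggregate weight per constraint
            lam[tag] = lam.get(tag, 0.0) + w
    # v = c + Ahat^T lam measures dual infeasibility
    v = list(c)
    for tag, w in lam.items():
        e, _ = _normal_and_rhs(tag, rows, n)
        v = [vk + w * ek for vk, ek in zip(v, e)]
    # repair dual feasibility with multipliers of the box constraints
    for k, vk in enumerate(v):
        if vk > 0:
            lam[('-', k)] = lam.get(('-', k), 0.0) + vk
        elif vk < 0:
            lam[('+', k)] = lam.get(('+', k), 0.0) - vk
    dual_value = sum(w * _normal_and_rhs(tag, rows, n)[1] for tag, w in lam.items())
    gap = sum(ck * xk for ck, xk in zip(c, x_tau)) + dual_value
    return x_tau, lam, gap

def residual_on_unit_box(c, steps, xi, rows):
    """max over the unit box of sum_t xi_t <e_t, x_t - x>, in closed form."""
    n = len(c)
    const, g = 0.0, [0.0] * n
    for (x_t, tag), w in zip(steps, xi):
        e = c if tag is None else _normal_and_rhs(tag, rows, n)[0]
        const += w * sum(ek * xk for ek, xk in zip(e, x_t))
        g = [gk + w * ek for gk, ek in zip(g, e)]
    return const + sum(abs(gk) for gk in g)
```

As a toy check, take $n=2$, $c=(1,1)$, no rows in $A$, one productive step at $(-0.9,-0.9)$ with weight 1, and two nonproductive steps at $(-1.2,0)$ and $(0,-1.5)$ violating $-x_1\le1$ and $-x_2\le1$ with weights 1 and 0.5. The repaired dual vector puts weight 1 on each of these two box constraints, and the resulting duality gap does not exceed the residual of the certificate on the box, in line with (55).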
Appendix
The purpose of this appendix is to verify that when considering a convex Nash equilibrium problem for which the underlying functions $F_i$ are continuous on the corresponding set $X$, the weak solutions to the VIP defined by the Nash operator and $X$ are exactly the Nash equilibria. Here is the proof.

Assume that the functions $F_i$ in a convex Nash equilibrium problem are continuous on $X$. Then, of course, $F_i(x^i,x_i)$ are convex in $x_i\in X_i$ for every $x^i\in X^i=X_1\times...\times X_{i-1}\times X_{i+1}\times...\times X_m$ and are concave in $x^i\in X^i$ for every $x_i\in X_i$, and $F(x)=\sum_iF_i(x)$ is convex on $X$.

a) Let $x_*$ be a Nash equilibrium, and let us prove that $x_*$ is a weak solution to the Nash VIP. Let $y\in{\rm int}\,X$, $\Delta=\frac12(y-x_*)$ and $\bar{x}=\frac12(y+x_*)$, so that $\bar{x}\in{\rm int}\,X$. We have
$$\begin{array}{rcl}
\frac12\langle\Phi(y),y-x_*\rangle&=&\langle\Phi(\bar{x}+\Delta),\Delta\rangle=\sum_i\underbrace{\langle\Phi_i(\bar{x}+\Delta),\Delta_i\rangle}_{\ge_{(a)}\,F_i(\bar{x}+\Delta)-F_i(\bar{x}^i+\Delta^i,\bar{x}_i)}\\
&\ge&\sum_i\Big[F_i(\bar{x}+\Delta)-F_i(\bar{x}^i+\Delta^i,\bar{x}_i)+\underbrace{F_i(\bar{x}-\Delta)-F_i(\bar{x}^i-\Delta^i,\bar{x}_i)}_{=F_i(x_*^i,(x_*)_i)-F_i(x_*^i,\bar{x}_i)\,\le_{(b)}\,0}\Big]\\
&=&F(\bar{x}+\Delta)+F(\bar{x}-\Delta)-\sum_i\underbrace{\big[F_i(\bar{x}^i+\Delta^i,\bar{x}_i)+F_i(\bar{x}^i-\Delta^i,\bar{x}_i)\big]}_{\le_{(c)}\,2F_i(\bar{x})}\\
&\ge&F(\bar{x}+\Delta)+F(\bar{x}-\Delta)-2F(\bar{x})\ \ge_{(d)}\ 0,
\end{array}$$
where (a) is due to the convexity of $F_i(y^i,\cdot)$, (b) is due to the fact that $F_i(x_*^i,x_i)$ attains its minimum in $x_i\in X_i$ at the point $(x_*)_i$, (c) is due to the concavity of $F_i(x^i,\bar{x}_i)$ in $x^i\in X^i$ and (d) is due to the convexity of $F$. We see that $\langle\Phi(y),y-x_*\rangle\ge0$ for all $y\in{\rm int}\,X$, so that $x_*$ is a weak solution to the VIP in question.

b) Now let $x_*$ be a weak solution to the VIP, and let us prove that $x_*$ is a Nash equilibrium. Assume, on the contrary, that for certain $i$ the function $F_i(x_*^i,x_i)$ does not attain its minimum over $x_i\in X_i$ at the point $(x_*)_i$; w.l.o.g., let it be the case for $i=m$. Let $v$ be a minimizer of the convex continuous function $F_m(x_*^m,\cdot)$ on $X_m$; then the function $f(s)=F_m(x_*^m,(x_*)_m+s[v-(x_*)_m])$ is nonincreasing in $s\in[0,1]$ and there exists $\bar{s}\in(0,1)$ such that $f(\bar{s})-f(1)>0$, or, which is the same, $F_m(x_*^m,(x_*)_m+\bar{s}[v-(x_*)_m])>F_m(x_*^m,v)$. Since $F_m(x)$ is continuous in $x\in X$, we can find, first, $\bar{v}\in{\rm int}\,X_m$ close enough to $v$ and, second, a small enough neighbourhood $U$ (in $X^m$) of the point $x_*^m$ such that
$$\forall(u\in U):\ F_m(u,(x_*)_m+\bar{s}[\bar{v}-(x_*)_m])-F_m(u,\bar{v})\ge\epsilon>0.$$
Let us choose $\bar{u}\in U\cap{\rm int}\,X^m$ and let
$$x[\rho,\delta]=\big(\underbrace{x_*^m+\rho[\bar{u}-x_*^m]}_{\bar{u}_\rho},\ \underbrace{(x_*)_m+\delta[\bar{v}-(x_*)_m]}_{\bar{v}_\delta}\big), \tag{56}$$
so that $x[\rho,\delta]\in{\rm int}\,X$ for $0<\rho,\delta\le1$. For $1\le i<m$ and $0<\rho<1$, $0<\delta\le1$ we have
$$\langle\Phi_i(x[\rho,\delta]),x_i[\rho,\delta]-(x_*)_i\rangle=\frac{\rho}{1-\rho}\langle\Phi_i(x[\rho,\delta]),\bar{u}_i-x_i[\rho,\delta]\rangle\ \le_{(a)}\ \frac{\rho}{1-\rho}\big[F_i(x^i[\rho,\delta],\bar{u}_i)-F_i(x[\rho,\delta])\big]\le\frac{\rho}{1-\rho}M, \tag{57}$$
where $M/2$ is an upper bound on $|F_j(x)|$ over $j=1,...,m$ and $x\in X$; here (a) is given by the convexity of $F_i$ in $x_i\in X_i$. We further have $\bar{u}_\rho\in U$, whence, by (56),
$$-\epsilon\ge F_m(\bar{u}_\rho,\bar{v})-F_m(\bar{u}_\rho,(x_*)_m+\bar{s}[\bar{v}-(x_*)_m])=F_m(x[\rho,1])-F_m(x[\rho,\bar{s}])=\int_{\bar{s}}^{1}\langle\Phi_m(x[\rho,s]),\bar{v}-(x_*)_m\rangle\,ds.$$
We see that there exists $\delta=\delta_\rho\in[\bar{s},1]$ such that $\langle\Phi_m(x[\rho,\delta_\rho]),\bar{v}-(x_*)_m\rangle\le-\epsilon$, or, which is the same,
$$\langle\Phi_m(x[\rho,\delta_\rho]),x_m[\rho,\delta_\rho]-(x_*)_m\rangle=\delta_\rho\langle\Phi_m(x[\rho,\delta_\rho]),\bar{v}-(x_*)_m\rangle\le-\delta_\rho\epsilon\le-(1-\bar{s})\epsilon.$$
Combining this relation with (57), we get
$$\langle\Phi(x[\rho,\delta_\rho]),x[\rho,\delta_\rho]-x_*\rangle\le(m-1)\frac{\rho}{1-\rho}M-(1-\bar{s})\epsilon.$$
For small $\rho>0$, the right hand side in this inequality is $<0$, while the left hand side is nonnegative for all $\rho\in(0,1]$ since $x_*$ is a weak solution to the VIP. We got the desired contradiction.
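The statement of the appendix can be illustrated numerically on a toy problem. The sketch below is an assumption-laden example, not part of the proof: it takes the two-player zero-sum game on $X=[-1,1]^2$ with $F_1(x)=\phi(x)=x_1^2+x_1x_2-x_2^2$ and $F_2=-\phi$, for which the Nash operator is $\Phi(y)=(2y_1+y_2,\,-y_1+2y_2)$ and the unique Nash equilibrium is the origin. The check samples only finitely many interior points $y$, so it can refute but never fully certify the weak-solution property.

```python
def nash_operator(y):
    """Nash operator of the toy game: (d/dx1 of phi, d/dx2 of -phi)."""
    y1, y2 = y
    return (2.0 * y1 + y2, -y1 + 2.0 * y2)

def passes_weak_solution_check(x_star, grid_size=21, tol=1e-12):
    """Test <Phi(y), y - x_star> >= 0 on a grid of points y strictly inside [-1, 1]^2."""
    pts = [-1.0 + 2.0 * (k + 1) / (grid_size + 1) for k in range(grid_size)]
    for y1 in pts:
        for y2 in pts:
            g1, g2 = nash_operator((y1, y2))
            if g1 * (y1 - x_star[0]) + g2 * (y2 - x_star[1]) < -tol:
                return False   # found y in int X violating the inequality
    return True
```

At the origin the inner product equals $2\|y\|_2^2\ge0$ for every $y$, so the check passes; at a non-equilibrium point such as $(0.5,0)$ it fails already for $y$ on the segment between the origin and that point.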
References
[1] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization, SIAM, Philadelphia, 2001.
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[3] V.P. Bulatov and L.O. Shepot’ko, Method of centers of orthogonal simplexes for solving convex programming problems (in Russian), In: Methods of Optimization and Their Application, Nauka, Novosibirsk, 1982.
[4] B.P. Burrell and M.J. Todd, Ellipsoid method generates dual variables, Mathematics of Operations Research 10:4 (1985), 688–700.
[5] B.C. Eaves and U.G. Rothblum, Dines-Fourier-Motzkin quantifier elimination and an application of corresponding transfer principles over ordered fields, Mathematical Programming 53:307–321, 1992.
[6] B.C. Eaves and U.G. Rothblum, Formulation of linear problems and solution by a universal machine, Mathematical Programming 65:263–310, 1994.
[7] D. Gabelev, Polynomial time cutting plane algorithms associated with symmetric cones, M.Sc. Thesis in OR and Systems Analysis, Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Technion City, Haifa 32000, Israel, 2003. E-print: http://www2.isye.gatech.edu/~nemirovs/Dima.pdf
[8] F. John, Extremum problems with inequalities as subsidiary conditions, in: Studies and Essays Presented to R. Courant on his 60th Birthday, January 8, 1948, Interscience Publishers, Inc., New York, N.Y., 1948, pp. 187–204.
[9] O. Goldreich, S. Micali and A. Wigderson, Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems, J. ACM 38:691–729, 1991.
[10] S. Goldwasser, S. Micali and C. Rackoff, The Knowledge Complexity of Interactive Proof Systems, SIAM J. Comput. 18:186–208, 1989.
[11] M. Grötschel, L. Lovász and A. Schrijver, The Ellipsoid Method and Combinatorial Optimization, Springer-Verlag, 1986.
[12] P.T. Harker and J.-S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications, Mathematical Programming 48:161–220, 1990.
[13] L.G. Khachiyan, A polynomial algorithm in linear programming, Soviet Mathematics Doklady 20:191–194, 1979.
[14] A.Yu. Levin, On an algorithm for the minimization of convex functions (in Russian), Doklady Akad. Nauk SSSR 160:6 (1965), 1244–1247. (English translation: Soviet Mathematics Doklady 6:286–290, 1965.)
[15] G.J. Minty, Monotone non-linear operators in Hilbert space, Duke Mathematical Journal 29:341–346, 1962.
[16] A. Nemirovski, S. Onn and U.G. Rothblum, Dual algorithms and accuracy certificates for convex minimization problems, forthcoming.
[17] A. Nemirovski, Polynomial time methods in Convex Programming, in: J. Renegar, M. Shub and S. Smale, Eds., The Mathematics of Numerical Analysis, AMS-SIAM Summer Seminar on Mathematics in Applied Mathematics, July 17 – August 11, 1995, Park City, Utah. Lectures in Applied Mathematics, AMS, Providence, 32:543–589, 1996.
[18] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, 1994.
[19] D.J. Newman, Location of maximum on unimodal surfaces, Journ. of the Assoc. for Computing Machinery 12:11–23, 1965.
[20] T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[21] T. Rockafellar, Network Flows and Monotropic Optimization, Wiley Interscience, 1984.
[22] N.Z. Shor, Cutting plane method with space dilation for the solution of convex programming problems (in Russian), Kibernetika 1:94–95, 1977.
[23] S.P. Tarasov, L.G. Khachiyan and I.I. Erlikh, The method of inscribed ellipsoids, Soviet Mathematics Doklady 37:226–230, 1988.
[24] B. Yamnitsky and L. Levin, An Old Linear Programming Algorithm Runs in Polynomial Time, In: 23rd Annual Symposium on Foundations of Computer Science, IEEE, New York, 327–328, 1982.
[25] D. Yudin and A. Nemirovskii, Informational complexity and effective methods of solution for convex extremal problems (in Russian), Ekonomika i Matematicheskie Metody 12:2 (1976), 357–369 (translated into English as Matekon: Transl. Russian and East European Math. Economics 13, 25–45, Spring ’77).
Shmuel Onn
Technion - Israel Institute of Technology, 32000 Haifa, Israel email: onn@ie.technion.ac.il http://ie.technion.ac.il/∼onn
Arkadi Nemirovski
Georgia Institute of Technology, Atlanta, Georgia xxxxx, USA email: nemirovs@isye.gatech.edu http://www2.isye.gatech.edu/∼nemirovs/
Uriel G. Rothblum
Technion - Israel Institute of Technology, 32000 Haifa, Israel email: rothblum@ie.technion.ac.il http://ie.technion.ac.il/rothblum.phtml