A Full-Newton Step O(n) Infeasible Interior-Point Algorithm for Linear Optimization

C. Roos

February 5, 2005 (revised August 9, 2005)

Faculty of Electrical Engineering, Computer Science and Mathematics, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands. E-mail: [email protected]

Abstract. We present a primal-dual infeasible interior-point algorithm. As usual, the algorithm decreases the duality gap and the feasibility residuals at the same rate. Assuming that an optimal solution exists, it is shown that at most O(n) iterations suffice to reduce the duality gap and the residuals by the factor 1/e. This implies an O(n log(n/ε)) iteration bound for getting an ε-solution of the problem at hand, which coincides with the best known bound for infeasible interior-point algorithms. The algorithm constructs strictly feasible iterates for a sequence of perturbations of the given problem and its dual problem. A special feature of the algorithm is that it uses only full Newton steps. Two types of full Newton steps are used, so-called feasibility steps and usual (centering) steps. Starting at strictly feasible iterates of a perturbed pair, (very) close to its central path, feasibility steps serve to generate strictly feasible iterates for the next perturbed pair. By performing a few centering steps for the new perturbed pair we obtain strictly feasible iterates close enough to the central path of the new perturbed pair. The algorithm finds an optimal solution or detects infeasibility or unboundedness of the given problem.

Keywords: Linear optimization, infeasible interior-point method, primal-dual method, polynomial complexity.

AMS Subject Classification: 90C05, 90C51

1 Introduction

Interior-Point Methods (IPMs) for solving Linear Optimization (LO) problems were initiated by Karmarkar [8]. They not only have polynomial complexity, but are also highly efficient in practice. One may distinguish between feasible IPMs and infeasible IPMs (IIPMs). Feasible IPMs start with a strictly feasible interior point and maintain feasibility during the solution process. An elegant and theoretically sound method to find a strictly feasible starting point is to use a self-dual embedding model, by introducing artificial variables. This technique was

presented first by Ye et al. [33]. Subsequent references are [1, 20, 31]. Well-known commercial software packages are based on this approach; for example, MOSEK [2] (http://www.mosek.com) and SeDuMi [23] (http://sedumi.mcmaster.ca) use the self-dual model, and the leading commercial linear optimization package CPLEX (http://cplex.com) includes the self-dual embedding model as a possible option. Most of the existing software packages use an IIPM. IIPMs start with an arbitrary positive point, and feasibility is reached as optimality is approached. The first IIPMs were proposed by Lustig [11] and Tanabe [24]. Global convergence was shown by Kojima et al. [9], whereas Zhang [34] proved an O(n^2 L) iteration bound for IIPMs under certain conditions. Mizuno [12] introduced a primal-dual IIPM and proved global convergence of the algorithm. Other relevant references are [4, 5, 6, 10, 13, 15, 16, 18, 19, 22, 25, 28, 29]. A detailed discussion and analysis of IIPMs can be found in the book by Wright [30] and, with less detail, in the books by Ye [32] and Vanderbei [26]. The performance of existing IIPMs depends highly on the choice of the starting point, which makes these methods less robust than methods using the self-dual embedding technique.

As usual, we consider the linear optimization (LO) problem in the standard form

  (P)  min { c^T x : Ax = b, x ≥ 0 },

with its dual problem

  (D)  max { b^T y : A^T y + s = c, s ≥ 0 }.

Here A ∈ R^{m×n}, b, y ∈ R^m and c, x, s ∈ R^n. Without loss of generality we assume that rank(A) = m. The vectors x, y and s are the vectors of variables.

The best known iteration bound for IIPMs, as given in [32, Theorem 5.14], is

  O( n log ( max{ (x^0)^T s^0, ‖b − Ax^0‖, ‖c − A^T y^0 − s^0‖ } / ε ) ).    (1)

Here x^0 > 0, y^0 and s^0 > 0 denote the starting points, b − Ax^0 and c − A^T y^0 − s^0 are the initial primal and dual residual vectors, respectively, and ε is an upper bound for the duality gap and the norms of the residual vectors upon termination of the algorithm. It is assumed in this result that there exists an optimal solution (x^*, y^*, s^*) such that ‖(x^*; s^*)‖∞ ≤ ζ, and that the initial iterates are (x^0, y^0, s^0) = ζ(e, 0, e). Until 2003, the search directions used in all primal-dual IIPMs were computed from the linear system

  A∆x = b − Ax,
  A^T∆y + ∆s = c − A^T y − s,        (2)
  s∆x + x∆s = µe − xs,

which yields the so-called primal-dual Newton search directions ∆x, ∆y and ∆s. Here xs denotes the componentwise (or Hadamard) product of the vectors x and s. Recently, Salahi et al. [21] used a so-called 'self-regular' proximity function to define a new search direction for IIPMs. Their modification only involves the third equation in the above system. The iteration bound of their method does not improve the bound in (1).

To introduce the idea underlying the algorithm presented in this paper we make some remarks with a historical flavor. In feasible IPMs the iterates are feasible and the ultimate goal is to get iterates that are optimal. There is a well-known IPM that aims to reach optimality in one step, namely the affine-scaling method. Although variants of this method have been shown to be polynomial in [14] and [7], with complexity bounds O(nL^2) and O(nL), respectively, where L denotes the binary input size, these bounds are much worse than the best known bounds for interior-point methods, and the question whether the affine-scaling method is polynomial or not is still unsettled. The last two decades have made it very clear that to get a more efficient method one should be less greedy, and work with a search direction that moves the iterates only slowly in the direction of optimality. The reason is that only then one can take full advantage of the efficiency of Newton's method, which is the workhorse in all IPMs.

In IIPMs, the iterates are not feasible, and apart from reaching optimality one needs to strive for feasibility. This is reflected in the choice of the search direction, as defined by (2): when moving from x to x^+ := x + ∆x, the new iterate x^+ satisfies the primal feasibility constraints, except possibly the nonnegativity constraint. In fact, in general x^+ will have negative components, and, to keep the iterate positive, one is forced to take a damped step of the form x^+ := x + α∆x, where α < 1 denotes the step size. This recalls, however, the phenomenon that occurred with the affine-scaling method in feasible IPMs. There it has become clear that the best complexity results hold for methods that are much less greedy and that use full Newton steps (with α = 1). Striving to reach feasibility in one step might be too optimistic, and may deteriorate the overall behavior of a method. One should rather exercise a little patience and move more slowly in the direction of feasibility. Therefore, in our approach the search directions are designed in such a way that a full Newton step reduces the sizes of the residual vectors at the same speed as the duality gap. The outcome of the analysis in this paper shows that this is a good strategy.

The paper is organized as follows. As a preparation for the rest of the paper, in Section 2 we first recall some basic tools in the analysis of a feasible IPM. These tools will also be used in the analysis of the IIPM proposed in this paper. Section 3 describes our algorithm in more detail. One characteristic of the algorithm is that it uses intermediate problems. The intermediate problems are suitable perturbations of the given problems (P) and (D) such that at any stage the iterates are strictly feasible for the current perturbed problems; the size of the perturbation decreases at the same speed as the barrier parameter µ. When µ changes to a smaller value, the perturbed problem corresponding to µ changes, and hence also the current central path. The algorithm keeps the iterates close to the µ-center on the central path of the current perturbed problem. To get the iterates feasible for the new perturbed problem and close to its central path, we use a so-called 'feasibility step'. The largest, and hardest, part of the analysis, which is presented in Section 4, concerns this step. It turns out that to keep control over this step, the iterates need to be very well centered before the step is taken. Some concluding remarks can be found in Section 5.

Some notations used throughout the paper are as follows. The 2-norm and the infinity norm are denoted by ‖·‖ and ‖·‖∞, respectively. If x, s ∈ R^n, then xs denotes the componentwise (or Hadamard) product of the vectors x and s. Furthermore, e denotes the all-one vector of length n. If z ∈ R^n_+ and f : R_+ → R_+, then f(z) denotes the vector in R^n_+ whose i-th component is f(z_i), with 1 ≤ i ≤ n. We write f(x) = O(g(x)) if f(x) ≤ γ g(x) for some positive constant γ.

2 Feasible full-Newton step IPMs

In preparation of dealing with infeasible IPMs, in this section we briefly recall the classical way to obtain a polynomial-time path-following feasible IPM for solving (P) and (D). To solve these problems one needs to find a solution of the following system of equations:

  Ax = b,            x ≥ 0,
  A^T y + s = c,     s ≥ 0,
  xs = 0.

In these so-called optimality conditions the first two constraints represent primal and dual feasibility, whereas the last equation is the so-called complementarity condition. The nonnegativity constraints in the feasibility conditions already make the problem nontrivial: only iterative methods can find solutions of linear systems involving inequality constraints. The complementarity condition is nonlinear, which makes it extra hard to solve this system.

2.1 The central path

IPMs replace the complementarity condition by the so-called centering condition xs = µe, where µ may be any positive number. This yields the system

  Ax = b,            x ≥ 0,
  A^T y + s = c,     s ≥ 0,     (3)
  xs = µe.

Surprisingly enough, if this system has a solution for some µ > 0, then a solution exists for every µ > 0, and this solution is unique. This happens if and only if (P) and (D) satisfy the interior-point condition (IPC), i.e., if (P) has a feasible solution x > 0 and (D) has a solution (y, s) with s > 0 (see, e.g., [20]). If the IPC is satisfied, then the solution of (3) is denoted by (x(µ), y(µ), s(µ)) and called the µ-center of (P) and (D). The set of all µ-centers is called the central path of (P) and (D). As µ goes to zero, x(µ), y(µ) and s(µ) converge to an optimal solution of (P) and (D). Of course, system (3) is still hard to solve, but by applying Newton's method one can easily find approximate solutions.

2.2 Definition and properties of the Newton step

We proceed by describing Newton's method for solving (3), with µ fixed. Given a primal feasible x > 0 and dual feasible y and s > 0, we want to find displacements ∆x, ∆y and ∆s such that

  A(x + ∆x) = b,
  A^T(y + ∆y) + (s + ∆s) = c,
  (x + ∆x)(s + ∆s) = µe.

Neglecting the quadratic term ∆x∆s in the third equation, we obtain the following linear system of equations in the search directions ∆x, ∆y and ∆s:

  A∆x = b − Ax,                      (4)
  A^T∆y + ∆s = c − A^T y − s,        (5)
  s∆x + x∆s = µe − xs.               (6)

Since A has full rank, and the vectors x and s are positive, one may easily verify that the coefficient matrix in the linear system (4)–(6) is nonsingular. Hence this system uniquely defines the search directions ∆x, ∆y and ∆s. These search directions are used in all existing primal-dual (feasible and infeasible) IPMs and are equivalent to Newton's method for solving the equations in system (3). If x is primal feasible and (y, s) dual feasible, then b − Ax = 0 and c − A^T y − s = 0, whence the above system reduces to

  A∆x = 0,                     (7)
  A^T∆y + ∆s = 0,              (8)
  s∆x + x∆s = µe − xs,         (9)

which gives the usual search directions for feasible primal-dual IPMs. The new iterates are given by

  x^+ = x + ∆x,   y^+ = y + ∆y,   s^+ = s + ∆s.

An important observation is that ∆x lies in the null space of A, whereas ∆s belongs to the row space of A. This implies that ∆x and ∆s are orthogonal, i.e., (∆x)^T ∆s = 0. As a consequence we have the important property that after a full Newton step the duality gap assumes the same value as at the µ-centers, namely nµ.
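To make the construction concrete, here is a minimal numerical sketch of one standard way to solve the Newton system (7)–(9): eliminate ∆s and ∆x and form the normal equations A diag(x/s) A^T ∆y = −A(µ/s − x). The helper name newton_step, and the use of NumPy and dense normal equations, are illustrative assumptions only, not a description of any particular implementation.

```python
import numpy as np

def newton_step(A, x, s, mu):
    """Full Newton step for the feasible system (7)-(9).

    Eliminates ds = -A^T dy (from (8)) and dx = (mu*e - x*s - x*ds)/s
    (from (9)), and determines dy from the normal equations obtained by
    imposing A dx = 0 (equation (7)).
    """
    d2 = x / s                           # diagonal of D^2 = diag(x/s)
    M = A @ (d2[:, None] * A.T)          # A D^2 A^T
    rhs = -A @ (mu / s - x)              # right-hand side of the normal equations
    dy = np.linalg.solve(M, rhs)
    ds = -A.T @ dy                       # from (8)
    dx = (mu - x * s - x * ds) / s       # from (9)
    return dx, dy, ds
```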

Lemma 2.1 (Lemma II.46 in [20]) After a primal-dual Newton step one has (x^+)^T s^+ = nµ.

If x is primal feasible and (y, s) is dual feasible, then (x, y, s) is called an ε-solution of (P) and (D) if c^T x − b^T y = x^T s ≤ ε. Assume that a primal feasible x^0 > 0 and a dual feasible pair (y^0, s^0) with s^0 > 0 are given that are 'close to' x(µ) and (y(µ), s(µ)), respectively, for some µ = µ^0. Then one can find an ε-solution in O(√n log(n/ε)) iterations of the algorithm in Figure 1. In this algorithm δ(x, s; µ) is a quantity that measures the proximity of the feasible triple (x, y, s) to the µ-center (x(µ), y(µ), s(µ)). Following [20], this quantity is defined as

  δ(x, s; µ) := δ(v) := ½ ‖v − v^{-1}‖,   where v := √(xs/µ).    (10)
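As a small illustration, the proximity measure (10) is straightforward to evaluate numerically; the following sketch (the helper name delta is ours, and x, s are assumed to be positive NumPy vectors) simply mirrors the definition.

```python
import numpy as np

def delta(x, s, mu):
    """Proximity measure (10): delta(x, s; mu) = 0.5 * ||v - v^{-1}||, v = sqrt(x*s/mu)."""
    v = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(v - 1.0 / v)
```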

The following two lemmas are crucial in the analysis of the algorithm. We recall them without proof. They describe the effect on δ(x, s; µ) of a µ-update and of a Newton (or centering) step, respectively.

Primal-Dual Feasible IPM

Input:
  accuracy parameter ε > 0;
  barrier update parameter θ, 0 < θ < 1;
  a strictly feasible (x^0, y^0, s^0) with (x^0)^T s^0 = nµ^0 and δ(x^0, s^0; µ^0) ≤ 1/2.
begin
  x := x^0; y := y^0; s := s^0; µ := µ^0;
  while x^T s ≥ ε do
  begin
    µ-update: µ := (1 − θ)µ;
    centering step: (x, y, s) := (x, y, s) + (∆x, ∆y, ∆s);
  end
end

Figure 1: Feasible full-Newton-step algorithm

Lemma 2.2 (Lemma II.53 in [20]) Let (x, s) be a positive primal-dual pair and µ > 0 such that x^T s = nµ. Moreover, let δ := δ(x, s; µ) and let µ^+ = (1 − θ)µ. Then

  δ(x, s; µ^+)^2 = (1 − θ)δ^2 + θ^2 n / (4(1 − θ)).

Lemma 2.3 (Theorem II.49 in [20]) If δ := δ(x, s; µ) ≤ 1, then the primal-dual Newton step is feasible, i.e., x^+ and s^+ are nonnegative. Moreover, if δ < 1, then x^+ and s^+ are positive and

  δ(x^+, s^+; µ) ≤ δ^2 / √(2(1 − δ^2)).

Corollary 2.4 If δ := δ(x, s; µ) ≤ 1/√2, then δ(x^+, s^+; µ) ≤ δ^2.

2.3 Complexity analysis

We have the following theorem, whose simple proof we include because it slightly improves the complexity result in [20, Theorem II.52].

Theorem 2.5 If θ = 1/√(2n), then the algorithm requires at most

  √(2n) log (nµ^0 / ε)

iterations. The output is a primal-dual pair (x, s) such that x^T s ≤ ε.

Proof: At the start of the algorithm we have δ(x, s; µ) ≤ 1/2. After the update of the barrier parameter to µ^+ = (1 − θ)µ, with θ = 1/√(2n), we have, by Lemma 2.2, the following upper bound for δ(x, s; µ^+):

  δ(x, s; µ^+)^2 ≤ (1 − θ)/4 + 1/(8(1 − θ)) ≤ 3/8.

Assuming n ≥ 2, the last inequality follows since the expression on its left-hand side is a convex function of θ whose value is 3/8 both at θ = 0 and at θ = 1/2; since θ ∈ [0, 1/2], the left-hand side does not exceed 3/8. Since 3/8 < 1/2, we obtain δ(x, s; µ^+) ≤ 1/√2. After the primal-dual Newton step to the µ^+-center we have, by Corollary 2.4, δ(x^+, s^+; µ^+) ≤ 1/2. Also, from Lemma 2.1, (x^+)^T s^+ = nµ^+. Thus, after each iteration of the algorithm the properties

  x^T s = nµ,   δ(x, s; µ) ≤ 1/2

are maintained, and hence the algorithm is well defined. The iteration bound in the theorem now easily follows from the fact that in each iteration the value of x^T s is reduced by the factor 1 − θ (see, for example, the proof of Theorem 3.2 in [30] for such a deduction). This proves the theorem. □
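For illustration, the feasible algorithm of Figure 1 with the parameter choice of Theorem 2.5 can be sketched as follows. It reuses the newton_step and delta helpers sketched above; it is only a schematic rendering under the stated assumptions on the starting point, not a robust implementation.

```python
import numpy as np

def feasible_ipm(A, x, y, s, mu, eps=1e-8):
    """Feasible full-Newton-step IPM of Figure 1 with theta = 1/sqrt(2n).

    Assumes (x, y, s) is strictly feasible with x*s = mu*e and
    delta(x, s, mu) <= 1/2, as required by the algorithm.
    """
    n = len(x)
    theta = 1.0 / np.sqrt(2.0 * n)
    while x @ s >= eps:
        mu = (1.0 - theta) * mu                 # mu-update
        dx, dy, ds = newton_step(A, x, s, mu)   # centering step
        x, y, s = x + dx, y + dy, s + ds
    return x, y, s
```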

3 Infeasible full-Newton step IPM

In the case of an infeasible method we call the triple (x, y, s) an ε-solution of (P ) and (D) if the norms of the residual vectors b − Ax and c − AT y − s do not exceed ε, and also xT s ≤ ε. We are now ready to present an infeasible-start algorithm that generates an ε-solution of (P ) and (D), if it exists, or establishes that no such solution exists.

3.1 The perturbed problems

We start by choosing arbitrary x^0 > 0, y^0 and s^0 > 0 such that x^0 s^0 = µ^0 e for some (positive) number µ^0. For any ν with 0 < ν ≤ 1 we consider the perturbed problem (Pν), defined by

  (Pν)  min { (c − ν(c − A^T y^0 − s^0))^T x : Ax = b − ν(b − Ax^0), x ≥ 0 },

and its dual problem (Dν), which is given by

  (Dν)  max { (b − ν(b − Ax^0))^T y : A^T y + s = c − ν(c − A^T y^0 − s^0), s ≥ 0 }.
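In matrix-vector terms the perturbed pair only shifts the data b and c toward the initial residuals. A minimal sketch follows (the helper name perturbed_data is ours; A, b, c and the starting point are assumed to be NumPy arrays).

```python
def perturbed_data(A, b, c, x0, y0, s0, nu):
    """Data of (P_nu) and (D_nu): replace b by b - nu*rb0 and c by c - nu*rc0."""
    rb0 = b - A @ x0             # initial primal residual
    rc0 = c - A.T @ y0 - s0      # initial dual residual
    return b - nu * rb0, c - nu * rc0
```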

Note that if ν = 1 then x = x^0 yields a strictly feasible solution of (Pν), and (y, s) = (y^0, s^0) a strictly feasible solution of (Dν). We conclude that if ν = 1 then (Pν) and (Dν) satisfy the IPC.

Lemma 3.1 (cf. Theorem 5.13 in [32]) The original problems, (P) and (D), are feasible if and only if for each ν satisfying 0 < ν ≤ 1 the perturbed problems (Pν) and (Dν) satisfy the IPC.

Proof: Suppose that (P) and (D) are feasible. Let x̄ be a feasible solution of (P) and (ȳ, s̄) a feasible solution of (D). Then Ax̄ = b and A^T ȳ + s̄ = c, with x̄ ≥ 0 and s̄ ≥ 0. Now let 0 < ν ≤ 1, and consider

  x = (1 − ν)x̄ + νx^0,   y = (1 − ν)ȳ + νy^0,   s = (1 − ν)s̄ + νs^0.

One has

  Ax = A((1 − ν)x̄ + νx^0) = (1 − ν)Ax̄ + νAx^0 = (1 − ν)b + νAx^0 = b − ν(b − Ax^0),

showing that x is feasible for (Pν). Similarly,

  A^T y + s = (1 − ν)(A^T ȳ + s̄) + ν(A^T y^0 + s^0) = (1 − ν)c + ν(A^T y^0 + s^0) = c − ν(c − A^T y^0 − s^0),

showing that (y, s) is feasible for (Dν). Since ν > 0, x and s are positive, thus proving that (Pν) and (Dν) satisfy the IPC.

To prove the converse implication, suppose that (Pν) and (Dν) satisfy the IPC for each ν satisfying 0 < ν ≤ 1. Obviously, then (Pν) and (Dν) are feasible for these values of ν. Letting ν go to zero it follows that (P) and (D) are feasible. □

In the sections to follow we assume that (P) and (D) are feasible. Only in Section 4.7 will we discuss how our algorithm can be used to detect infeasibility or unboundedness of (P) and (D).

3.2 The central path of the perturbed problems

Let (P) and (D) be feasible and 0 < ν ≤ 1. Then Lemma 3.1 implies that the problems (Pν) and (Dν) satisfy the IPC, and hence their central paths exist. This means that the system

  b − Ax = ν(b − Ax^0),                    x ≥ 0,   (11)
  c − A^T y − s = ν(c − A^T y^0 − s^0),    s ≥ 0,   (12)
  xs = µe                                            (13)

has a unique solution, for every µ > 0. In the sequel this unique solution is denoted by (x(µ, ν), y(µ, ν), s(µ, ν)). These are the µ-centers of the perturbed problems (Pν ) and (Dν ). Note that since x0 s0 = µ0 e, x0 is the µ0 -center of the perturbed problem (P1 ) and (y 0 , s0 ) the µ0 -center of (D1 ). In other words, (x(µ0 , 1), y(µ0 , 1), s(µ0 , 1)) = (x0 , y 0 , s0 ). In the sequel the parameters µ and ν always satisfy the relation µ = ν µ0 .

3.3 An iteration of our algorithm

We just established that if ν = 1 and µ = µ^0, then x = x^0 is the µ-center of the perturbed problem (Pν) and (y, s) = (y^0, s^0) is the µ-center of (Dν). These are our initial iterates. We measure proximity to the µ-center of the perturbed problems by the quantity δ(x, s; µ) as defined in (10). Initially we thus have δ(x, s; µ) = 0. In the sequel we assume that at the start of each iteration, just before the µ-update, δ(x, s; µ) is smaller than or equal to a (small) threshold value τ > 0. So this is certainly true at the start of the first iteration.

Now we describe one (main) iteration of our algorithm. Suppose that for some µ ∈ (0, µ^0] we have x, y and s satisfying the feasibility conditions (11) and (12) for ν = µ/µ^0, and such that x^T s = nµ and δ(x, s; µ) ≤ τ. We reduce µ to µ^+ = (1 − θ)µ, with θ ∈ (0, 1), and find new iterates x^+, y^+ and s^+ that satisfy (11) and (12), with µ replaced by µ^+ and ν by ν^+ = µ^+/µ^0, and such that (x^+)^T s^+ = nµ^+ and δ(x^+, s^+; µ^+) ≤ τ. Note that ν^+ = (1 − θ)ν.

To be more precise, this is achieved as follows. Each main iteration consists of a feasibility step and a few centering steps. The feasibility step serves to get iterates (x^f, y^f, s^f) that are strictly feasible for (Pν^+) and (Dν^+), and close to their µ-centers (x(µ^+, ν^+), y(µ^+, ν^+), s(µ^+, ν^+)). In fact, the feasibility step is designed in such a way that δ(x^f, s^f; µ^+) ≤ 1/√2. Since the triple (x^f, y^f, s^f) is strictly feasible for (Pν^+) and (Dν^+), we can easily get iterates (x^+, y^+, s^+) that are strictly feasible for (Pν^+) and (Dν^+) and such that δ(x^+, s^+; µ^+) ≤ τ, just by performing a few centering steps starting at (x^f, y^f, s^f) and targeting the µ^+-centers of (Pν^+) and (Dν^+).

Before describing the feasibility step it is convenient to introduce some new notation. We denote the initial primal and dual residuals by

  r_b^0 = b − Ax^0,   r_c^0 = c − A^T y^0 − s^0.

Then the feasibility equations for (Pν) and (Dν) are given by

  Ax = b − νr_b^0,            x ≥ 0,   (14)
  A^T y + s = c − νr_c^0,     s ≥ 0,   (15)

and those of (Pν^+) and (Dν^+) by

  Ax = b − ν^+ r_b^0,           x ≥ 0,   (16)
  A^T y + s = c − ν^+ r_c^0,    s ≥ 0.   (17)

To get iterates that are feasible for (Pν^+) and (Dν^+) we need search directions ∆^f x, ∆^f y and ∆^f s such that

  A(x + ∆^f x) = b − ν^+ r_b^0,
  A^T(y + ∆^f y) + (s + ∆^f s) = c − ν^+ r_c^0.

Since x is feasible for (Pν) and (y, s) is feasible for (Dν), it follows that ∆^f x, ∆^f y and ∆^f s should satisfy

  A∆^f x = (b − Ax) − ν^+ r_b^0 = νr_b^0 − ν^+ r_b^0 = θνr_b^0,
  A^T∆^f y + ∆^f s = (c − A^T y − s) − ν^+ r_c^0 = νr_c^0 − ν^+ r_c^0 = θνr_c^0.

Therefore, the following system is used to define ∆^f x, ∆^f y and ∆^f s:

  A∆^f x = θνr_b^0,                    (18)
  A^T∆^f y + ∆^f s = θνr_c^0,          (19)
  s∆^f x + x∆^f s = µe − xs,           (20)

and after the feasibility step the iterates are given by

  x^f = x + ∆^f x,    (21)
  y^f = y + ∆^f y,    (22)
  s^f = s + ∆^f s.    (23)

We conclude that after the feasibility step the iterates satisfy the affine equations in (11) and (12), with ν = ν^+. The hard part in the analysis will be to guarantee that x^f and s^f are positive and satisfy δ(x^f, s^f; µ^+) ≤ 1/√2.
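Computationally the feasibility step amounts to one linear solve of the same structure as a centering step, only with nonzero right-hand sides in (18) and (19). A minimal sketch follows (the helper name feasibility_step and the normal-equations elimination are illustrative assumptions); it uses the fact that, since the current iterate is feasible for (Pν) and (Dν), the right-hand sides θνr_b^0 and θνr_c^0 equal θ times the current residuals.

```python
import numpy as np

def feasibility_step(A, b, c, x, y, s, mu, theta):
    """Solve system (18)-(20) for the feasibility-step directions.

    Eliminates dfs and dfx and solves normal equations for dfy,
    analogously to the centering step but with residual right-hand sides.
    """
    rb = theta * (b - A @ x)             # = theta*nu*rb0, cf. (18)
    rc = theta * (c - A.T @ y - s)       # = theta*nu*rc0, cf. (19)
    d2 = x / s                           # diagonal of D^2 = diag(x/s)
    M = A @ (d2[:, None] * A.T)          # A D^2 A^T
    rhs = rb - A @ ((mu - x * s - x * rc) / s)
    dfy = np.linalg.solve(M, rhs)
    dfs = rc - A.T @ dfy                 # from (19)
    dfx = (mu - x * s - x * dfs) / s     # from (20)
    return dfx, dfy, dfs
```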

After the feasibility step we perform centering steps in order to get iterates (x^+, y^+, s^+) that satisfy (x^+)^T s^+ = nµ^+ and δ(x^+, s^+; µ^+) ≤ τ. By using Corollary 2.4, the required number of centering steps can easily be obtained. Indeed, assuming δ = δ(x^f, s^f; µ^+) ≤ 1/√2, after k centering steps we will have iterates (x^+, y^+, s^+) that are still feasible for (Pν^+) and (Dν^+) and that satisfy

  δ(x^+, s^+; µ^+) ≤ (1/√2)^(2^k).

Thus, δ(x^+, s^+; µ^+) ≤ τ if k satisfies

  (1/√2)^(2^k) ≤ τ,

which gives

  k ≥ log_2 ( log_2 (1/τ^2) ).    (24)
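For instance, with the threshold value τ = 1/8 that will be chosen in (39), inequality (24) gives k ≥ log_2 log_2 64 ≈ 2.6, so at most three centering steps per main iteration are needed.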

3.4 The algorithm

A more formal description of the algorithm is given in Figure 2. Note that after each iteration the residuals and the duality gap are reduced by a factor 1 − θ. The algorithm stops if the norms of the residuals and the duality gap are less than the accuracy parameter ε.
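To see how the pieces fit together, here is a schematic rendering of the main loop of Figure 2, reusing the feasibility_step, newton_step and delta sketches from earlier sections. The barrier update parameter θ is left as an input, since its admissible values are only derived in the analysis of Section 4, and the starting point x^0 = s^0 = ζe, y^0 = 0, µ^0 = ζ^2 anticipates the choice made in Section 4.4; this is an illustrative sketch, not the paper's reference implementation.

```python
import numpy as np

def infeasible_ipm(A, b, c, zeta, theta, eps=1e-8, tau=0.125):
    """Schematic main loop of the infeasible full-Newton-step IPM (Figure 2)."""
    m, n = A.shape
    x = zeta * np.ones(n)
    s = zeta * np.ones(n)
    y = np.zeros(m)
    mu = zeta ** 2
    while max(x @ s,
              np.linalg.norm(b - A @ x),
              np.linalg.norm(c - A.T @ y - s)) >= eps:
        dfx, dfy, dfs = feasibility_step(A, b, c, x, y, s, mu, theta)
        x, y, s = x + dfx, y + dfy, s + dfs      # feasibility step
        mu = (1.0 - theta) * mu                  # mu-update
        while delta(x, s, mu) >= tau:            # centering steps
            dx, dy, ds = newton_step(A, x, s, mu)
            x, y, s = x + dx, y + dy, s + ds
    return x, y, s
```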

4 An analysis of the algorithm

Let x, y and s denote the iterates at the start of an iteration, and assume δ(x, s; µ) ≤ τ . Recall that at the start of the first iteration this is certainly true, because then δ(x, s; µ) = 0.

4.1 The effect of the feasibility step and the choice of θ

As we established in Section 3.3, the feasibility step generates new iterates x^f, y^f and s^f that satisfy the feasibility conditions for (Pν^+) and (Dν^+). A crucial element in the analysis is to show that after the feasibility step δ(x^f, s^f; µ^+) ≤ 1/√2, i.e., that the new iterates are within the region where the Newton process targeting the µ^+-centers of (Pν^+) and (Dν^+) is quadratically convergent. Defining

  v = √(xs/µ),   d_x := v∆^f x / x,   d_s := v∆^f s / s,    (25)

we have, using (20) and (25),

  x^f s^f = xs + (s∆^f x + x∆^f s) + ∆^f x∆^f s = µe + ∆^f x∆^f s = µe + (xs/v^2) d_x d_s = µ(e + d_x d_s).    (26)

Lemma 4.1 (cf. Lemma II.45 in [20]) The iterates (x^f, y^f, s^f) are strictly feasible if and only if e + d_x d_s > 0.

Primal-Dual Infeasible IPM

Input:
  accuracy parameter ε > 0;
  barrier update parameter θ, 0 < θ < 1;
  threshold parameter τ > 0.
begin
  x := x^0 > 0; y := y^0; s := s^0 > 0; x^0 s^0 = µ^0 e; µ := µ^0;
  while max( x^T s, ‖b − Ax‖, ‖c − A^T y − s‖ ) ≥ ε do
  begin
    feasibility step: (x, y, s) := (x, y, s) + (∆^f x, ∆^f y, ∆^f s);
    µ-update: µ := (1 − θ)µ;
    centering steps:
      while δ(x, s; µ) ≥ τ do
        (x, y, s) := (x, y, s) + (∆x, ∆y, ∆s);
      endwhile
  end
end

Figure 2: The algorithm

Proof: Note that if x^f and s^f are positive then (26) makes clear that e + d_x d_s > 0, proving the 'only if' part of the statement in the lemma. For the proof of the converse implication we introduce a step length α ∈ [0, 1], and we define

  x_α = x + α∆^f x,   y_α = y + α∆^f y,   s_α = s + α∆^f s.

We then have x_0 = x, x_1 = x^f and similar relations for y and s. Hence we have x_0 s_0 = xs > 0. We write

  x_α s_α = (x + α∆^f x)(s + α∆^f s) = xs + α(s∆^f x + x∆^f s) + α^2 ∆^f x∆^f s.

Using s∆^f x + x∆^f s = µe − xs gives

  x_α s_α = xs + α(µe − xs) + α^2 ∆^f x∆^f s.

Now suppose e + d_x d_s > 0. From the definitions of d_x and d_s in (25) we deduce µ d_x d_s = ∆^f x∆^f s. Hence µe + ∆^f x∆^f s > 0, or, equivalently, ∆^f x∆^f s > −µe. Substitution gives

  x_α s_α > xs + α(µe − xs) − α^2 µe = (1 − α)(xs + αµe),   α ∈ [0, 1].

Since (1 − α)(xs + αµe) ≥ 0 it follows that x_α s_α > 0 for 0 ≤ α ≤ 1. Hence, none of the entries of x_α and s_α vanishes for 0 ≤ α ≤ 1. Since x_0 and s_0 are positive, and x_α and s_α depend linearly on α, this implies that x_α > 0 and s_α > 0 for 0 ≤ α ≤ 1. Hence, x_1 and s_1 must be positive, proving the 'if' part of the statement in the lemma. □

Corollary 4.2 The iterates (x^f, y^f, s^f) are certainly strictly feasible if ‖d_x d_s‖∞ < 1.

Proof: By Lemma 4.1, x^f and s^f are strictly feasible if and only if e + d_x d_s > 0. Since the latter inequality holds if ‖d_x d_s‖∞ < 1, the corollary follows. □

In the sequel we denote

  ω(v) := ½ √( ‖d_x‖^2 + ‖d_s‖^2 ).    (27)

This implies ‖d_x‖ ≤ 2ω(v) and ‖d_s‖ ≤ 2ω(v), and moreover,

  d_x^T d_s ≤ ‖d_x‖ ‖d_s‖ ≤ ½ ( ‖d_x‖^2 + ‖d_s‖^2 ) ≤ 2ω(v)^2,    (28)
  ‖d_x d_s‖∞ ≤ ‖d_x‖ ‖d_s‖ ≤ 2ω(v)^2.    (29)

Lemma 4.3 If ω(v) < 1/√2 then the iterates (x^f, y^f, s^f) are strictly feasible.

Proof: Let ω(v) < 1/√2. Then (29) implies that ‖d_x d_s‖∞ < 1. By Corollary 4.2 this implies the statement in the lemma. □

Assuming ω(v) < 1/√2, which guarantees strict feasibility of the iterates (x^f, y^f, s^f), we proceed by deriving an upper bound for δ(x^f, s^f; µ^+). Recall from definition (10) that

  δ(x^f, s^f; µ^+) = ½ ‖v^f − (v^f)^{-1}‖,   where v^f = √( x^f s^f / µ^+ ).

In the sequel we also denote δ(x^f, s^f; µ^+) shortly by δ(v^f).

. δ(xf , sf ; µ+ ) = v f − f , where v f = 2 v µ+ In the sequel we denote δ(xf , sf ; µ+ ) also shortly by δ(v f ).

√ Lemma 4.4 Let ω(v) < 1/ 2. Then one has 4δ(v f )2 ≤ Proof:

2ω(v)2 2ω(v)2 θ2 n + + (1 − θ) . 1−θ 1−θ 1 − 2ω(v)2

Using (26), after division of both sides by µ+ = (1 − θ)µ, we get 

vf

2

=

µ(e + dx ds ) e + d x ds = . + µ 1−θ

Due to Lemma 4.3 and x ds k∞ < 1, whence e + dx ds > 0. by Defining √ Corollary 4.2 we have kd√ for the moment u = e + dx ds , we have v f = u/ 1 − θ. Hence we may write

√ √

u

θu  f −1 −1

− 1 − θ u = √ + 1 − θ u − u . 2δ(v ) = √ 1−θ 1−θ 12

Therefore we have 4δ(v f )2 = = = = =

2  θ2 kuk2 + (1 − θ) u − u−1 + 2θuT u − u−1 1−θ  2 

2 θ + 2θ kuk2 + (1 − θ) u − u−1 − 2θuT u−1 1−θ  2 

2 θ + 2θ eT (e + dx ds ) + (1 − θ) u − u−1 − 2θn 1−θ   2

2  θ + 2θ n + dTx ds + (1 − θ) u − u−1 − 2θn 1−θ  2 

2 θ θ2 n + + 2θ dTx ds + (1 − θ) u−1 − u . 1−θ 1−θ

The last term can be reduced as follows. 

−1

u − u 2 = eT e + dx ds +

 n X 1 e T − 2e = dx ds + −n e + dx ds 1 + dxi dsi i=1  n  n X X 1 dxi dsi T = dx ds + − 1 = dTx ds − . 1 + dxi dsi 1 + dxi dsi i=1

i=1

Substitution gives θ2 n 4δ(v f )2 = + 1−θ =

θ2 n

+

1−θ



θ2 + 2θ 1−θ

dTx ds 1−θ



dTx ds + (1 − θ) dTx ds −

− (1 − θ)

n X i=1

dxi dsi . 1 + dxi dsi

n X i=1

dx i ds i 1 + dxi dsi

!

Hence, using (28) and (29), we arrive at n

4δ(v f )2 ≤

X |dxi dsi | 2ω(v)2 θ2 n + + (1 − θ) 1−θ 1−θ 1 − 2ω(v)2 i=1

n 1 − θ X dx 2i + ds 2i ≤ + + 1−θ 1−θ 2 1 − 2ω(v)2

θ2 n

2ω(v)2

i=1

2ω(v)2 2ω(v)2 θ2 n + + (1 − θ) , = 1−θ 1−θ 1 − 2ω(v)2

which completes the proof.



√ Because we need to have δ(v f ) ≤ 1/ 2, it follows from Lemma 4.4 that it suffices to have 2ω(v)2 2ω(v)2 θ2 n + + (1 − θ) ≤ 2. 1−θ 1−θ 1 − 2ω(v)2

(30)

We conclude this section by presenting a value that we do not allow ω(v) to exceed and by choosing a value for θ so that inequality (30) is satisfied.


Lemma 4.5 Let ω(v) ≤ ½ and

  θ = α/√(2n),   0 ≤ α ≤ √( n/(n + 1) ).    (31)

Then the iterates (x^f, y^f, s^f) are strictly feasible and δ(v^f) ≤ 1/√2.

Proof: Since ω(v) ≤ ½, the iterates (x^f, y^f, s^f) are certainly strictly feasible, due to Lemma 4.3. As we just established, δ(v^f) ≤ 1/√2 holds if inequality (30) is satisfied. The left-hand side of this inequality is monotonically increasing in ω(v). By substituting ω(v) = ½, the inequality reduces to

  θ^2 n/(1 − θ) + 1/(2(1 − θ)) + (1 − θ) ≤ 2.

Using that θ ≥ 0, one easily verifies that this is equivalent to

  θ ≤ 1/√(2n + 2),

which is in agreement with (31). Thus the lemma has been proved. □



We proceed by considering the vectors d_x and d_s in more detail in order to obtain an upper bound for ω(v).

4.2 The scaled search directions d_x and d_s

One may easily check that the system (18)–(20), which defines the search directions ∆^f x, ∆^f y and ∆^f s, can be expressed in terms of the scaled search directions d_x and d_s as follows:

  Ā d_x = θνr_b^0,                              (32)
  Ā^T (∆^f y / µ) + d_s = θν v s^{-1} r_c^0,    (33)
  d_x + d_s = v^{-1} − v,                       (34)

where

  Ā = AV^{-1}X,   V = diag(v),   X = diag(x).    (35)

If r_b^0 and r_c^0 are zero, i.e., if the initial iterates are feasible, then d_x and d_s are orthogonal vectors, since then the vector d_x belongs to the null space and d_s to the row space of the matrix Ā. It follows that d_x and d_s form an orthogonal decomposition of the vector v^{-1} − v. As a consequence we then have obvious upper bounds for the norms of d_x and d_s, namely ‖d_x‖ ≤ 2δ(v) and ‖d_s‖ ≤ 2δ(v), and moreover, ω(v) = δ(v), with ω(v) as defined in (27).

In the infeasible case orthogonality of d_x and d_s may not be assumed, however, and the situation may be quite different. This is illustrated in Figure 3. As a consequence, it becomes much harder to get upper bounds for ‖d_x‖ and ‖d_s‖, thus complicating the analysis of the algorithm in comparison with feasible IPMs. Obtaining an upper bound for ω(v) is the subject of the next several subsections.

Figure 3: Geometric interpretation of ω(v). [Figure omitted: it depicts d_x in the affine space {ξ : Āξ = θνr_b^0}, d_s in the affine space θνvs^{-1}r_c^0 + {Ā^T η : η ∈ R^m}, the vector v^{-1} − v, the point q, and the quantities 2δ(v) and 2ω(v).]

4.3 An upper bound for ω(v)

Let us denote the null space of the matrix Ā as L. So,

  L := { ξ ∈ R^n : Āξ = 0 }.

Then the affine space { ξ ∈ R^n : Āξ = θνr_b^0 } equals d_x + L. Note that, due to a well-known result from linear algebra, the row space of Ā equals the orthogonal complement L^⊥ of L. Obviously d_s ∈ θνvs^{-1}r_c^0 + L^⊥. Also note that L ∩ L^⊥ = {0}, and as a consequence the affine spaces d_x + L and d_s + L^⊥ meet in a unique point. This point is denoted by q.

Lemma 4.6 Let q be the (unique) point in the intersection of the affine spaces d_x + L and d_s + L^⊥. Then

  2ω(v) ≤ √( ‖q‖^2 + (‖q‖ + 2δ(v))^2 ).

Proof: To simplify the notation in this proof we denote r = v^{-1} − v. Since L + L^⊥ = R^n, there exist q_1, r_1 ∈ L and q_2, r_2 ∈ L^⊥ such that

  q = q_1 + q_2,   r = r_1 + r_2.

On the other hand, since d_x − q ∈ L and d_s − q ∈ L^⊥ there must exist ℓ_1 ∈ L and ℓ_2 ∈ L^⊥ such that

  d_x = q + ℓ_1,   d_s = q + ℓ_2.

Due to (34) it follows that r = 2q + ℓ_1 + ℓ_2, which implies (2q_1 + ℓ_1) + (2q_2 + ℓ_2) = r_1 + r_2. Since the decomposition L + L^⊥ = R^n is unique, we conclude that

  ℓ_1 = r_1 − 2q_1,   ℓ_2 = r_2 − 2q_2.

Hence we obtain

  d_x = q + r_1 − 2q_1 = (r_1 − q_1) + q_2,
  d_s = q + r_2 − 2q_2 = q_1 + (r_2 − q_2).

Since the spaces L and L^⊥ are orthogonal we conclude from this that

  4ω(v)^2 = ‖d_x‖^2 + ‖d_s‖^2 = ‖r_1 − q_1‖^2 + ‖q_2‖^2 + ‖q_1‖^2 + ‖r_2 − q_2‖^2 = ‖q − r‖^2 + ‖q‖^2.

Assuming q ≠ 0, since ‖r‖ = 2δ(v) the right-hand side is maximal if r = −2δ(v) q/‖q‖, and thus we obtain

  4ω(v)^2 ≤ ‖ (1 + 2δ(v)/‖q‖) q ‖^2 + ‖q‖^2 = ‖q‖^2 + (‖q‖ + 2δ(v))^2,

which implies the inequality in the lemma if q ≠ 0. Since the inequality in the lemma holds with equality if q = 0, this completes the proof. □

In the sequel we denote δ(v) simply as δ. Recall from Lemma 4.5 that in order to guarantee δ(v^f) ≤ 1/√2 we want to have ω(v) ≤ ½. Due to Lemma 4.6 this will certainly hold if ‖q‖ satisfies

  ‖q‖^2 + (‖q‖ + 2δ)^2 ≤ 1.    (36)
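For a numerical illustration (for instance to check condition (36) along the iterates, in the spirit of the computational experiments mentioned in Section 5), q can be computed with orthogonal projections onto the row space and the null space of Ā. The following sketch (the helper name q_point is ours) uses the identity q = P_{L^⊥} d_x + P_L d_s, which follows directly from q ∈ d_x + L and q ∈ d_s + L^⊥.

```python
import numpy as np

def q_point(A, x, s, mu, dfx, dfs):
    """Point q of Lemma 4.6 for given feasibility-step directions.

    d_x = v*dfx/x and d_s = v*dfs/s as in (25); L is the null space of
    A_bar = A diag(x/v), and q = P_{L_perp} d_x + P_L d_s.
    """
    v = np.sqrt(x * s / mu)
    dx_, ds_ = v * dfx / x, v * dfs / s
    Abar = A * (x / v)                                     # A diag(x/v)
    P_row = Abar.T @ np.linalg.solve(Abar @ Abar.T, Abar)  # projector onto row space (L_perp)
    return P_row @ dx_ + (ds_ - P_row @ ds_)
```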

4.4 An upper bound for ‖q‖

Recall from Lemma 4.6 that q is the (unique) solution of the system

  Ā q = θνr_b^0,
  Ā^T ξ + q = θν v s^{-1} r_c^0.

We proceed by deriving an upper bound for ‖q‖. Before doing this we have to specify our initial iterates (x^0, y^0, s^0). These are chosen in the usual way: we assume that ζ > 0 is such that ‖x^* + s^*‖∞ ≤ ζ for some optimal solutions x^* of (P) and (y^*, s^*) of (D), and we start the algorithm with

  x^0 = s^0 = ζe,   y^0 = 0,   µ^0 = ζ^2.

We have the following result.

Lemma 4.7 One has

  √µ ‖q‖ ≤ θν ζ √( e^T ( x/s + s/x ) ).    (37)

Proof: From the definition (35) of Ā we deduce that Ā = √µ AD, where

  D = diag( x v^{-1}/√µ ) = diag( √µ v s^{-1} ) = diag( √(x/s) ).

For the moment, let us write

  r_b = θνr_b^0,   r_c = θνr_c^0.

Then the system defining q is equivalent to

  √µ AD q = r_b,
  √µ DA^T ξ + q = (1/√µ) D r_c.

This implies µ AD^2 A^T ξ = AD^2 r_c − r_b, whence

  ξ = (1/µ) (AD^2 A^T)^{-1} (AD^2 r_c − r_b).

Substitution gives

  q = (1/√µ) ( D r_c − DA^T (AD^2 A^T)^{-1} (AD^2 r_c − r_b) ).

Observe that

  q_1 := D r_c − DA^T (AD^2 A^T)^{-1} AD^2 r_c = ( I − DA^T (AD^2 A^T)^{-1} AD ) D r_c

is the orthogonal projection of D r_c onto the null space of AD. Let (ȳ, s̄) be such that A^T ȳ + s̄ = c. Then we may write

  r_c = θνr_c^0 = θν( c − A^T y^0 − s^0 ) = θν( A^T(ȳ − y^0) + s̄ − s^0 ).

Since DA^T(ȳ − y^0) belongs to the row space of AD, which is orthogonal to the null space of AD, we obtain

  ‖q_1‖ ≤ θν ‖D(s̄ − s^0)‖.

On the other hand, let x̄ be such that Ax̄ = b. Then

  r_b = θνr_b^0 = θν(b − Ax^0) = θνA(x̄ − x^0),

and the vector

  q_2 := DA^T (AD^2 A^T)^{-1} r_b = θν DA^T (AD^2 A^T)^{-1} AD ( D^{-1}(x̄ − x^0) )

is the orthogonal projection of θνD^{-1}(x̄ − x^0) onto the row space of AD. Hence it follows that

  ‖q_2‖ ≤ θν ‖D^{-1}(x̄ − x^0)‖.

Since √µ q = q_1 + q_2 and q_1 and q_2 are orthogonal, we may conclude that

  √µ ‖q‖ = √( ‖q_1‖^2 + ‖q_2‖^2 ) ≤ θν √( ‖D(s̄ − s^0)‖^2 + ‖D^{-1}(x̄ − x^0)‖^2 ),    (38)

where, as always, µ = µ^0 ν. We are still free to choose x̄ and s̄, subject to the constraints Ax̄ = b and A^T ȳ + s̄ = c. Taking x̄ = x^* and s̄ = s^*, the entries of the vectors x^0 − x̄ and s^0 − s̄ satisfy

  0 ≤ x^0 − x̄ ≤ ζe,   0 ≤ s^0 − s̄ ≤ ζe.

Thus it follows that

  √( ‖D(s̄ − s^0)‖^2 + ‖D^{-1}(x̄ − x^0)‖^2 ) ≤ ζ √( ‖De‖^2 + ‖D^{-1}e‖^2 ) = ζ √( e^T( x/s + s/x ) ).

Substitution into (38) gives

  √µ ‖q‖ ≤ θν ζ √( e^T( x/s + s/x ) ),

proving the lemma. □

To proceed we need upper bounds for the elements of the vectors x/s and s/x.

4.5

Some bounds for x and s. The choice of τ and α

Recall that x is feasible for (Pν ) and (y, s) for (Dν ) and, moreover δ(x, s; µ) ≤ τ , i.e., these iterates are close to the µ-centers of (Pν ) and (Dν ). Based on this information we need to estimate the sizes of the entries of the vectors x/s and s/x. Since the concerning analysis does not belong to the mainstream of the paper we have moved this analysis to the Appendix. Based on this analysis we choose 1 (39) τ= . 8 Then, by Corollary A.10, we have r r x √ x(µ, ν) s √ s(µ, ν) ≤ 2 √ , ≤ 2 √ . µ s x µ Substitution into (37) gives v u u √ µ kqk ≤ θν ζ t2eT

This implies

! x(µ, ν)2 s(µ, ν)2 + . µ µ

√ q µ kqk ≤ θνζ 2 kx(µ, ν)k2 + ks(µ, ν)k2 .

Therefore, also using µ = µ0 ν = ζ 2 ν and θ = √α2n , we obtain the following upper bound for the norm of q: q α kx(µ, ν)k2 + ks(µ, ν)k2 . kqk ≤ √ ζ n

At this stage we define

κ(ζ, ν) =

q

kx(µ, ν)k2 + ks(µ, ν)k2 √ , ζ 2n 18

0 < ν ≤ 1, µ = µ0 ν

(40)

and κ ¯ (ζ) = max κ(ζ, ν).

(41)

0 0 is such that kx∗ + s∗ k∞ ≤ ζ for some optimal solutions x∗ of (P ) and (y ∗ , s∗ ) of (D) then after at most

 √ max nζ 2 , rb0 , rc0 16 2 n log ε inner iterations the algorithm finds an ε-solution of (P ) and (D).

A basic question is of course how to choose the number ζ, which determines the initial iterates and has to be fixed before starting the algorithm. A related question that we did not yet deal with is wether or not our algorithm can detect infeasibility (or unboundedness) of (P ) and (D). These issues can be dealt with as follows. It is well known that if (P ) and (D) are feasible then there exist optimal solutions x∗ and (y ∗ , s∗ ) of (P ) and (D) such that kx∗ + s∗ k∞ ≤ 2L , where L denotes the binary input size of (P ) and (D). The number L can be computed straightforwardly from the input data A, b and c. Thus, when starting the algorithm with ζ = 2L , after at most



 √ max n4L , b − 2L Ae , c − 2L e 16 2 n log ε iterations the algorithm finds an ε-solution if it exists. Otherwise we must decide that (P ) and (D) are infeasible or unbounded.

Working with the number L may not be possible in practice, however, since this number can be very large. For such cases it may be worth noting that if (P ) and (D) are infeasible or unbounded then Lemma 3.1 implies that (Pν ) and (Dν ) do not satisfy the IPC for all√ν ∈ (0, ν¯] for some ν¯ ∈ (0, 1). If the iterates after the feasibility step satisfy δ(xf , sf ; µ+ ) ≤ 1/ 2 we are sure that the perturbed problems corresponding to ν = µ+ /µ0 satisfy the IPC, and hence are strictly feasible. So we √ then have ν ≥ ν¯. On the other hand, if ν < ν¯ the algorithm will find that δ(xf , sf ; µ+ ) > 1/ 2, which implies that there are no optimal solutions x∗ and (y ∗ , s∗ ) such that ζ ≥ kx∗ + s∗ k∞ . We can then run the algorithm again with ζ := 2ζ. If necessary this can be repeated. When starting with ζ = 1, after doubling the value of ζ at most L times the algorithm must have found optimal solutions of (P ) and (D) if these exist. Otherwise (P ) and (D) are infeasible or unbounded.

5

Concluding remarks

The current paper shows that the techniques that have been developed in the field of feasible full-Newton step IPMs, and which have now been known for almost twenty years, are sufficient to get a full-Newton step IIPM whose complexity is at least as good as the currently best known performance of IIPMs. Following a well-known metaphor of Isaac Newton4 , it looks as if a 4 “ I do not know what I may appear to the world; but to myself I seem to have been only like a boy playing on the sea-shore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me” [27, p. 863].

21

“smooth pebble or pretty shell on the sea-shore of IPMs” has been overlooked for a surprisingly long time. It is worth mentioning that we found computational evidence for the following conjecture. Conjecture 5.1 If (P ) and (D) are feasible and ζ ≥ kx∗ + s∗ k∞ for some pair of optimal solutions x∗ and (y ∗ , s∗ ), then κ ¯ (ζ) = 1. The evidence was provided by a simple Matlab implementation of our algorithm. As input we used a primal-dual pair of randomly generated feasible problems with known optimal solutions x∗ and (y ∗ , s∗ ), and ran the algorithm with ζ = kx∗ + s∗ k∞ . This was done for various sizes of the problems and for at least 105 instances. No counterexample for the conjecture was found. Typically the graph of κ(ζ, ν), as a function of ν is as depicted in Figure 4 below. The importance κ(ζ, ν) 1.0 0.8 0.6 0.4 0.2 ν

0

0 0.2 0.4 0.6 0.8 1.0 Figure 4: Typical behavior of κ(ζ, ν) as a function of ν. of the conjecture is√evident. Its trueness would reduce the currently best iteration bound for IIPMs by a factor 2n. It may be clear that the full-Newton step method presented in this paper may not be efficient from a practical point of view, just as the feasible IPMs with the best theoretical performance are far from practical. But just as in the case of feasible IPMs one might expect that computationally efficient large-update methods for IIPMs can be designed whose theoretical complexity is not √ worse than n times the iteration bound in this paper. Even better results for large-update methods might be obtained by changing the search direction, by using methods that are based on kernel functions, as presented in [3, 17]. This requires further investigation. Also extensions to second-order cone optimization, semidefinite optimization, linear complementarity problems, etc. seem to be within reach.

Acknowledgements

Hossein Mansouri, PhD student in Delft, forced the author to become more familiar with IIPMs when, in September 2004, he showed him a draft of a paper on large-update IIPMs based on kernel functions. Thank you, Hossein! I am also much indebted to Jean-Philippe Vial. He found a serious flaw in an earlier version of this paper. Thanks are also due to Kurt Anstreicher, Florian Potra, Shinji Mizuno and other colleagues for some useful discussions during the Oberwolfach meeting Optimization and Applications in January 2005. I also thank Yanqin Bai for pointing out some typos in an earlier version of the paper. Finally, I kindly acknowledge the useful comments of two anonymous referees, which were very helpful in improving the readability of the paper.

References

[1] E. D. Andersen and Y. Ye. A computational study of the homogeneous algorithm for large-scale convex optimization. Computational Optimization and Applications, 10:243–269, 1998.
[2] E.D. Andersen and K.D. Andersen. The MOSEK interior-point optimizer for linear programming: an implementation of the homogeneous algorithm. In S. Zhang, H. Frenk, C. Roos, and T. Terlaky, editors, High Performance Optimization Techniques, pages 197–232. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
[3] Y.Q. Bai, M. El Ghami, and C. Roos. A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM Journal on Optimization, 15(1):101–128, 2004.
[4] S.C. Billups and M.C. Ferris. Convergence of an infeasible interior-point algorithm from arbitrary positive starting points. SIAM Journal on Optimization, 6(2):316–325, 1996.
[5] J.F. Bonnans and F.A. Potra. On the convergence of the iteration sequence of infeasible path following algorithms for linear complementarity problems. Mathematics of Operations Research, 22(2):378–407, 1997.
[6] R.M. Freund. A potential-function reduction algorithm for solving a linear program directly from an infeasible 'warm start'. Mathematical Programming, 52:441–466, 1991.
[7] B. Jansen, C. Roos, and T. Terlaky. A polynomial Dikin-type primal-dual algorithm for linear programming. Mathematics of Operations Research, 21:341–353, 1996.
[8] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.
[9] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasible-interior-point algorithm for linear programming. Mathematical Programming, 61:263–280, 1993.
[10] M. Kojima, T. Noma, and A. Yoshise. Global convergence in infeasible-interior-point algorithms. Mathematical Programming, Series A, 65:43–72, 1994.
[11] I. J. Lustig. Feasibility issues in a primal-dual interior-point method. Mathematical Programming, 67:145–162, 1990/91.
[12] S. Mizuno. Polynomiality of infeasible-interior-point algorithms for linear programming. Mathematical Programming, 67:109–119, 1994.
[13] S. Mizuno, M.J. Todd, and Y. Ye. A surface of analytic centers and infeasible-interior-point algorithms for linear programming. Mathematics of Operations Research, 20:135–162, 1995.
[14] R.D.C. Monteiro, I. Adler, and M.G.C. Resende. A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension. Mathematics of Operations Research, 15:191–214, 1990.
[15] Yu. Nesterov, M. J. Todd, and Y. Ye. Infeasible-start primal-dual methods and infeasibility detectors for nonlinear programming problems. Mathematical Programming, 84(2, Ser. A):227–267, 1999.
[16] Yu. Nesterov and M.J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics, 2(4):333–361, 2002.
[17] J. Peng, C. Roos, and T. Terlaky. Self-Regularity. A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, 2002.
[18] F. A. Potra. A quadratically convergent predictor-corrector method for solving linear programs from infeasible starting points. Mathematical Programming, 67(3, Ser. A):383–406, 1994.
[19] F. A. Potra. An infeasible-interior-point predictor-corrector algorithm for linear programming. SIAM Journal on Optimization, 6(1):19–32, 1996.
[20] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization. An Interior-Point Approach. John Wiley & Sons, Chichester, UK, 1997.
[21] M. Salahi, T. Terlaky, and G. Zhang. The complexity of self-regular proximity based infeasible IPMs. Technical Report 2003/3, Advanced Optimization Laboratory, McMaster University, Hamilton, Ontario, Canada, 2003.
[22] R. Sheng and F. A. Potra. A quadratically convergent infeasible-interior-point algorithm for LCP with polynomial complexity. SIAM Journal on Optimization, 7(2):304–317, 1997.
[23] J.F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods & Software, 11:625–653, 1999.
[24] K. Tanabe. Centered Newton method for linear programming: interior and 'exterior' point method (in Japanese). In K. Tone, editor, New Methods for Linear Programming 3, pages 98–100, 1990.
[25] M. J. Todd and Y. Ye. Approximate Farkas lemmas and stopping rules for iterative infeasible-point algorithms for linear programming. Mathematical Programming, 81(1, Ser. A):1–21, 1998.
[26] R.J. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer Academic Publishers, Boston, USA, 1996.
[27] R. S. Westfall. Never at Rest. A Biography of Isaac Newton. Cambridge University Press, Cambridge, UK, 1964.
[28] S. J. Wright. An infeasible-interior-point algorithm for linear complementarity problems. Mathematical Programming, 67(1):29–52, 1994.
[29] S.J. Wright. A path-following infeasible-interior-point algorithm for linear complementarity problems. Optimization Methods & Software, 2:79–106, 1993.
[30] S.J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1996.
[31] F. Wu, S. Wu, and Y. Ye. On quadratic convergence of the O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Annals of Operations Research, 87:393–406, 1999.
[32] Y. Ye. Interior Point Algorithms, Theory and Analysis. John Wiley & Sons, Chichester, UK, 1997.
[33] Y. Ye, M.J. Todd, and S. Mizuno. An O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Mathematics of Operations Research, 19:53–67, 1994.
[34] Y. Zhang. On the convergence of a class of infeasible-interior-point methods for the horizontal linear complementarity problem. SIAM Journal on Optimization, 4:208–227, 1994.

A Some technical lemmas

Given a strictly primal feasible point x of (P) and a strictly dual feasible point (y, s) of (D), and µ > 0, let

  Φ(xs; µ) := Ψ(v) := Σ_{i=1}^n ψ(v_i),   v_i := √( x_i s_i / µ ),   ψ(t) := ½ ( t^2 − 1 − log t^2 ).

It is well known that ψ(t) is the kernel function of the primal-dual logarithmic barrier function, which, up to some constant, is the function Φ (xs; µ) (see, e.g., [3]).

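For reference, the kernel function and the barrier Φ are easy to evaluate numerically; a small sketch follows (the helper names are ours, and x, s are assumed to be positive NumPy vectors).

```python
import numpy as np

def psi(t):
    """Kernel function psi(t) = (t^2 - 1 - log t^2)/2 of the logarithmic barrier."""
    return 0.5 * (t**2 - 1.0 - np.log(t**2))

def barrier(x, s, mu):
    """Phi(xs; mu) = Psi(v) = sum_i psi(v_i) with v = sqrt(x*s/mu)."""
    v = np.sqrt(x * s / mu)
    return float(np.sum(psi(v)))
```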

Lemma A.1 One has Φ(xs; µ) = Φ(xs(µ); µ) + Φ(x(µ)s; µ).

Proof: The equality in the lemma is equivalent to

  Σ_{i=1}^n ( x_i s_i/µ − 1 − log(x_i s_i/µ) ) = Σ_{i=1}^n ( x_i s_i(µ)/µ − 1 − log(x_i s_i(µ)/µ) ) + Σ_{i=1}^n ( x_i(µ) s_i/µ − 1 − log(x_i(µ) s_i/µ) ).

Since

  log( x_i s_i/µ ) = log( x_i s_i/(x_i(µ) s_i(µ)) ) = log( x_i/x_i(µ) ) + log( s_i/s_i(µ) ) = log( x_i s_i(µ)/µ ) + log( x_i(µ) s_i/µ ),

the lemma holds if and only if

  x^T s − nµ = ( x^T s(µ) − nµ ) + ( s^T x(µ) − nµ ).

Using that x(µ)s(µ) = µe, whence x(µ)^T s(µ) = nµ, this can be written as (x − x(µ))^T (s − s(µ)) = 0, which holds if the vectors x − x(µ) and s − s(µ) are orthogonal. This is indeed the case, because x − x(µ) belongs to the null space and s − s(µ) to the row space of A. This proves the lemma. □

Lemma A.2 One has

  ψ( √( x_i/x_i(µ) ) ) ≤ Ψ(v),   ψ( √( s_i/s_i(µ) ) ) ≤ Ψ(v),   i = 1, . . . , n.

Proof: By Lemma A.1 we have Φ(xs; µ) = Φ(xs(µ); µ) + Φ(x(µ)s; µ). Since Φ(xs; µ) is always nonnegative, also Φ(xs(µ); µ) and Φ(x(µ)s; µ) are nonnegative. Thus it follows that Φ(xs(µ); µ) ≤ Ψ(v) and Φ(x(µ)s; µ) ≤ Ψ(v). The first of these two inequalities gives

  Φ(xs(µ); µ) = Σ_{i=1}^n ψ( √( x_i s_i(µ)/µ ) ) ≤ Ψ(v).

Since ψ(t) ≥ 0, for every t > 0, it follows that

  ψ( √( x_i s_i(µ)/µ ) ) ≤ Ψ(v),   i = 1, . . . , n.

Due to x(µ)s(µ) = µe, we have

  x_i s_i(µ)/µ = x_i s_i(µ)/( x_i(µ) s_i(µ) ) = x_i/x_i(µ),

whence we obtain the first inequality in the lemma. The second inequality follows in the same way. □

Note that ψ(t) is monotonically decreasing for t ≤ 1 and monotonically increasing for t ≥ 1. In the sequel we denote by ̺ : [0, ∞) → [1, ∞) the inverse function of ψ(t) for t ≥ 1 and by χ : [0, ∞) → (0, 1] the inverse function of ψ(t) for t ≤ 1. So we have

  ̺(s) = t  ⇔  s = ψ(t),   s ≥ 0, t ≥ 1,    (44)

and

  χ(s) = t  ⇔  s = ψ(t),   s ≥ 0, 0 < t ≤ 1.    (45)

Note that χ(s) is monotonically decreasing and ̺(s) is monotonically increasing in s ≥ 0.

Lemma A.3 Let t > 0 and ψ(t) ≤ s. Then χ(s) ≤ t ≤ ̺(s).

Proof: This is almost obvious. Since ψ(t) is strictly convex and minimal at t = 1, with ψ(1) = 0, ψ(t) ≤ s implies that t belongs to a closed interval whose extremal points are χ(s) and ̺(s). □

Corollary A.4 One has

  χ(Ψ(v)) ≤ √( x_i/x_i(µ) ) ≤ ̺(Ψ(v)),   χ(Ψ(v)) ≤ √( s_i/s_i(µ) ) ≤ ̺(Ψ(v)).

Proof: This is immediate from Lemma A.3 and Lemma A.2. □

Lemma A.5 One has

  √(x/s) ≤ ( ̺(Ψ(v))/χ(Ψ(v)) ) · x(µ)/√µ,   √(s/x) ≤ ( ̺(Ψ(v))/χ(Ψ(v)) ) · s(µ)/√µ.

Proof: Using that x_i(µ) s_i(µ) = µ, for each i, Corollary A.4 implies

  √( x_i/s_i ) ≤ ( ̺(Ψ(v))/χ(Ψ(v)) ) √( x_i(µ)/s_i(µ) ) = ( ̺(Ψ(v))/χ(Ψ(v)) ) √( x_i(µ)^2/( x_i(µ) s_i(µ) ) ) = ( ̺(Ψ(v))/χ(Ψ(v)) ) √( x_i(µ)^2/µ ) = ( ̺(Ψ(v))/χ(Ψ(v)) ) · x_i(µ)/√µ,

which implies the first inequality. The second inequality is obtained in the same way. □

We proceed by deriving an upper bound for Ψ(v) in terms of δ(v). First we deal with a simple lemma.

Lemma A.6 Let t ≥ 1. Then ψ(t) − ψ(1/t) ≥ 0.

Proof: Define f(t) := ψ(t) − ψ(1/t) for t > 0. Then

  f′(t) = ( t − 1/t ) − ( 1/t − t )( −1/t^2 ) = ( t − 1/t )( 1 − 1/t^2 ) = (t^2 − 1)^2 / t^3 ≥ 0.

It follows that f(t) is monotonically increasing for t > 0. Since f(1) = 0 this proves that f(t) ≥ 0 for t ≥ 1, and hence the lemma follows. □

Theorem A.7 Let δ(v) be as defined in (10) and ρ(δ) as in (46). Then Ψ(v) ≤ ψ(ρ(δ(v))).

Proof: The statement in the lemma is obvious if v = e, since then δ(v) = Ψ(v) = 0, ρ(0) = 1 and ψ(1) = 0. Otherwise we have δ(v) > 0 and Ψ(v) > 0. To deal with the nontrivial case we consider, for τ > 0, the problem

  z_τ = max_v { Ψ(v) = Σ_{i=1}^n ψ(v_i) : δ(v)^2 = ¼ Σ_{i=1}^n ψ′(v_i)^2 = τ^2 }.

The first-order optimality conditions are

  ψ′(v_i) = λ ψ′(v_i) ψ″(v_i),   i = 1, . . . , n,

where λ ∈ R. From this we conclude that we have either ψ′(v_i) = 0 or λψ″(v_i) = 1, for each i. Since ψ″(t) is monotonically decreasing, this implies that all v_i's for which λψ″(v_i) = 1 have the same value. Denoting this value as t, and observing that all other coordinates have value 1 (since ψ′(v_i) = 0 for these coordinates), we conclude that for some k, and after reordering the coordinates, v has the form

  v = (t, . . . , t, 1, . . . , 1)   (t appearing k times, 1 appearing n − k times).

Since ψ′(1) = 0, δ(v) = τ implies kψ′(t)^2 = 4τ^2. Since ψ′(t) = t − 1/t, it follows that

  t − 1/t = ±2τ/√k,

which gives t = ρ(τ/√k) or t = 1/ρ(τ/√k), where ρ : R_+ → [1, ∞) is defined by

  ρ(δ) := δ + √(1 + δ^2).    (46)

By Lemma A.6, the first value, which is greater than 1, gives the largest value of ψ(t). Since we are maximizing Ψ(v), we conclude that t = ρ(τ/√k), whence we have

  Ψ(v) = k ψ( ρ(τ/√k) ).

The question remains which value of k maximizes Ψ(v). To investigate this we take the derivative of Ψ(v) with respect to k. To simplify the notation we write

  Ψ(v) = k ψ(t),   t = ρ(s),   s = τ/√k.

The definition of t implies t = s + √(1 + s^2). This gives (t − s)^2 = 1 + s^2, or t^2 − 1 = 2st, whence we have

  2s = t − 1/t = ψ′(t).

Some straightforward computations now yield

  dΨ(v)/dk = ψ(t) − s^2 ρ(s)/√(1 + s^2) =: f(τ).

We consider this derivative as a function of τ, as indicated. One may easily verify that f(0) = 0. We proceed by computing the derivative with respect to τ. This gives

  f′(τ) = −(1/√k) · s^2 / ( (1 + s^2) √(1 + s^2) ).

This proves that f′(τ) ≤ 0. Since f(0) = 0, it follows that f(τ) ≤ 0, for each τ ≥ 0. Hence we conclude that Ψ(v) is decreasing in k. So Ψ(v) is maximal when k = 1, which gives the result in the theorem. □

Corollary A.8 Let τ ≥ 0, δ(v) ≤ τ and ρ(δ) be as defined in (46). Then Ψ(v) ≤ τ′, where

  τ′ := ψ(ρ(τ)).    (47)

Proof: Since ρ(δ) is monotonically increasing in δ, and ρ(δ) ≥ 1 for all δ ≥ 0, and, moreover, ψ(t) is monotonically increasing for t ≥ 1, the function ψ(ρ(δ)) is increasing in δ, for δ ≥ 0. Thus the result is immediate from Theorem A.7. □

Theorem A.9 Let τ ≥ 0, δ(v) ≤ τ and τ′ as defined in (47). Then

  √(x/s) ≤ ( ̺(τ′)/χ(τ′) ) · x(µ)/√µ,   √(s/x) ≤ ( ̺(τ′)/χ(τ′) ) · s(µ)/√µ.

Proof: Since ̺(t) is monotonically increasing and χ(t) monotonically decreasing, this is an immediate consequence of Lemma A.5 and Corollary A.8. □

1 8

and δ(v) ≤ τ . Then r

Proof:

r

x √ x(µ) ≤ 2 √ , s µ

√ s(µ) s ≤ 2 √ . µ x

If τ = 18 , then τ ′ ≈ 0.016921, ̺ (τ ′ ) ≈ 1.13278 and χ (τ ′ ) ≈ 0.872865, whence √ ̺ (τ ′ ) 2. ≈ 1.29777 < χ (τ ′ )

Thus the result follows.



28
