The convex algebraic geometry of linear inverse problems


Citation

Chandrasekaran, Venkat et al. “The Convex Algebraic Geometry of Linear Inverse Problems.” 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2010. 699–703. © Copyright 2010 IEEE

As Published

http://dx.doi.org/10.1109/ALLERTON.2010.5706975

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Version

Final published version


Citable Link

http://hdl.handle.net/1721.1/72963

Terms of Use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.


Forty-Eighth Annual Allerton Conference
Allerton House, UIUC, Illinois, USA
September 29 - October 1, 2010

The Convex Algebraic Geometry of Linear Inverse Problems

Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky

Abstract: We study a class of ill-posed linear inverse problems in which the underlying model of interest has simple algebraic structure. We consider the setting in which we have access to a limited number of linear measurements of the underlying model, and we propose a general framework based on convex optimization in order to recover this model. This formulation generalizes previous methods based on $\ell_1$-norm minimization and nuclear norm minimization for recovering sparse vectors and low-rank matrices from a small number of linear measurements. Examples of problems to which our framework applies include (1) recovering an orthogonal matrix from limited linear measurements, (2) recovering a measure given random linear combinations of its moments, and (3) recovering a low-rank tensor from limited linear observations.

I. INTRODUCTION

A fundamental principle in the study of ill-posed inverse problems is the incorporation of prior information. Such prior information typically specifies some sort of simplicity present in the model. This simplicity could concern the number of genes present in a signature for disease, correlation structures in time series data, or geometric constraints on molecular configurations. Unfortunately, naive mathematical characterizations of model simplicity often result in intractable optimization problems. This paper formalizes a methodology for transforming notions of simplicity into convex penalty functions. These penalty functions are induced by the convex hulls of simple models. We show that algorithms resulting from these formulations allow for information extraction with the number of measurements scaling as a function of the number of latent degrees of freedom of the simple model. We provide a methodology for estimating these scaling laws and discuss several examples. We focus on situations in which the basic models are semialgebraic, in which case the regularized inverse problems can be solved or approximated by semidefinite programming. Specifically, suppose that there is an object of interest $x \in \mathbb{R}^p$ that is formed as a linear combination of a small number of elements of a basic semialgebraic set $\mathcal{A}$:

$$ x = \sum_{i=1}^{k} \alpha_i A_i, \qquad A_i \in \mathcal{A}. $$

Here we typically assume that $k$ is relatively small. For example, the set $\mathcal{A}$ may be the algebraic variety of unit-norm 1-sparse vectors or of unit-norm rank-1 matrices, in which case $x$ is either a $k$-sparse vector or a rank-$k$ matrix. Further, we do not have access to $x$ directly but only to a set of linear measurements of $x$ given by a matrix $M \in \mathbb{R}^{n \times p}$: $b = Mx$, with $n \ll p$.

Main Question: Is it possible to recover $x$ from the limited information $b$?

Of course, this problem is in general computationally intractable to solve. Therefore we propose a natural convex relaxation framework, akin to $\ell_1$-norm minimization for recovering sparse vectors [6], [7], [3] or nuclear-norm minimization for recovering low-rank matrices [10], [16], [4]. This convex relaxation is obtained by convexifying the algebraic set $\mathcal{A}$. We also describe a general framework for analyzing when this relaxation succeeds in recovering the underlying object $x$. A few examples of questions that can be answered by our broad formulation include:

• Given the eigenvalues of a matrix and additionally a few linear measurements of the matrix, when can we recover the matrix?
• When does a system of linear equations have a unique integer solution? This question is related to the integrality gap between a class of integer linear programs and their linear-programming relaxations, and has previously been studied in [8], [14].
• When can we recover a measure given a few linear measurements of its moments?

VC, PP, and AW are with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139. BR is with the Computer Sciences Department, University of Wisconsin, Madison, WI 53706.

In some cases the convex program obtained by convexifying the base algebraic set $\mathcal{A}$ may itself be intractable to solve. For such problems we briefly outline how to obtain a hierarchy of further semidefinite relaxations of increasing complexity based on the Positivstellensatz [15].

Outline: We describe the problem setup and the generic construction of a convex relaxation in Section II. In Section III we give conditions under which these convex relaxations succeed in recovering the underlying low-dimensional model. Section IV briefly outlines the Positivstellensatz-based technique for further relaxing the convex program described in Section II. Section V concludes the paper. This paper provides an outline of our framework and a subset of the results; additional detailed analysis and example applications are deferred to a longer report.


II. PROBLEM SETUP AND CONVEX RELAXATION

A. Setup

A basic semialgebraic set is the solution set of a finite system of polynomial equations and polynomial inequalities [1]. We will generally consider only algebraic sets that are bounded, such as finite collections of points, or the sets of unit-norm 1-sparse vectors or of unit-norm rank-1 matrices. Given such a set $\mathcal{A} \subseteq \mathbb{R}^p$, let $x \in \mathbb{R}^p$ be a vector formed as a nonnegative linear combination of elements of $\mathcal{A}$:

$$ x = \sum_{i=1}^{k} \alpha_i A_i, \qquad A_i \in \mathcal{A}, \quad \alpha_i \ge 0. $$

Here $k$ is assumed to be relatively small. While $x$ is not known directly, we have access to linear measurements of $x$: $b = Mx$, where $M \in \mathbb{R}^{n \times p}$ with $n \ll p$. The questions that we address include: When can we recover $x$ from the linear measurements $b$? Is it possible to do so tractably using convex optimization? Under what conditions does this convex program exactly recover the underlying $x$? In particular, how many random measurements $n$ are required for exact recovery using the convex program? In Section II-C we give several examples in which answering such questions is of interest.

B. Convex relaxation

Before constructing a general convex relaxation, we define the convex hull of an algebraic set. When a basic semialgebraic set $\mathcal{A} \subset \mathbb{R}^p$ is bounded, we denote its convex hull by $C(\mathcal{A}) = \mathrm{conv}(\mathcal{A})$, where conv denotes the convex hull of a set. In many cases of interest the set $\mathcal{A}$ is centrally symmetric, i.e., there exists a "center" $c \in \mathbb{R}^p$ such that $c + y \in \mathcal{A}$ implies $c - y \in \mathcal{A}$. In such a situation the set $C(\mathcal{A})$ is also centrally symmetric and induces a norm (perhaps recentered and restricted to a subspace):

$$ \|y\|_{\mathcal{A}} = \inf\{\, t \ge 0 : y \in t\, C(\mathcal{A}) \,\}. $$

We implicitly assume here that $\|y\|_{\mathcal{A}}$ is only evaluated for those $y$ that lie in the affine hull of $C(\mathcal{A})$. Consequently, we consider the following convex program to recover $x$ given the limited linear measurements $b$:

$$ \hat{x} = \arg\min_{\tilde{x}} \|\tilde{x}\|_{\mathcal{A}} \quad \text{s.t.} \quad M\tilde{x} = b. \qquad (1) $$

In Section II-C we present a number of examples that fit into the above setting, and in Section III we provide a simple unified analysis of conditions for exact recovery using the convex program (1).

C. Examples

A number of applications fit into our framework for recovering low-dimensional objects from a few linear measurements.

Recovering a vector from a list: Suppose that there is a vector $x \in \mathbb{R}^p$, and that we are given the entries of this vector without their locations. For example, if $x = [3\ 1\ 2\ 2\ 4]^T$, then we are given only the list of numbers $[1\ 2\ 2\ 3\ 4]$ without their positions in $x$. Further suppose that we have access to a few linear measurements of $x$. In such a scenario, can we recover $x$ by solving a convex program? Such a problem is of interest in recovering rankings of elements of a set. For this problem the algebraic set $\mathcal{A}$ is the set of all permutations of $x$ (which we know since we have the list of numbers that compose $x$), and the convex hull $C(\mathcal{A})$ is the permutahedron [19], [18].

Recovering a matrix given eigenvalues: This problem is, in a sense, the non-commutative analog of the one above. Suppose that we have the eigenvalues $\lambda$ of a symmetric matrix but no information about its eigenvectors. Can we recover such a matrix given some additional linear measurements? In this case the set $\mathcal{A}$ is the set of all symmetric matrices with eigenvalues $\lambda$, and the convex hull $C(\mathcal{A})$ is the Schur-Horn orbitope [18].

Recovering permutation matrices: Another problem of interest in a ranking context is that of recovering permutation matrices from partial information. Suppose that a small number $k$ of rankings of $p$ candidates is preferred by a population. By conducting surveys of this population one can obtain partial linear information about these preferred rankings. The set $\mathcal{A}$ here is the set of permutation matrices, and the convex hull $C(\mathcal{A})$ is the Birkhoff polytope, i.e., the set of doubly stochastic matrices [19].
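When $\mathcal{A} = \{a_1, \dots, a_m\}$ is finite, $y \in t\,C(\mathcal{A})$ holds exactly when $y = \sum_j c_j a_j$ with $c_j \ge 0$ and $\sum_j c_j = t$, so $\|y\|_{\mathcal{A}} = \min\{\sum_j c_j : \sum_j c_j a_j = y,\ c_j \ge 0\}$ and program (1) becomes a linear program in the weights $c$. The sketch below is our illustration of that reformulation, not code from the paper, and assumes NumPy and SciPy's `linprog`; the helper name `recover_finite_atoms` and the problem sizes are hypothetical. It uses the atoms $\pm e_i$, whose convex hull is the $\ell_1$ ball, so in this instance (1) is ordinary $\ell_1$ minimization.

```python
import numpy as np
from scipy.optimize import linprog


def recover_finite_atoms(M, b, atoms):
    """Solve program (1) when the atom set is finite (hypothetical helper).

    `atoms` has shape (p, m), one atom per column. Since the atomic norm of x
    is min sum(c) subject to atoms @ c = x and c >= 0, program (1) becomes
        minimize sum(c)  s.t.  (M @ atoms) @ c = b,  c >= 0.
    """
    m = atoms.shape[1]
    res = linprog(
        c=np.ones(m),
        A_eq=M @ atoms,
        b_eq=b,
        bounds=[(0, None)] * m,
        method="highs",
    )
    if res.status != 0:
        raise RuntimeError(res.message)
    return atoms @ res.x  # x_hat = sum_j c_j a_j


# Atoms +/- e_i: their convex hull is the l1 ball in R^p.
rng = np.random.default_rng(0)
p, n, k = 40, 25, 2
atoms = np.hstack([np.eye(p), -np.eye(p)])

# A k-sparse vector, i.e. a combination of k atoms.
x = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
x[support] = rng.standard_normal(k)

M = rng.standard_normal((n, p))  # i.i.d. standard Gaussian measurements
b = M @ x
x_hat = recover_finite_atoms(M, b, atoms)
print("recovery error:", np.linalg.norm(x_hat - x))
```

Listing the atoms explicitly is only practical for small finite sets; the $p!$ permutation matrices of the Birkhoff-polytope example, for instance, grow far too quickly to enumerate.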
The problem of recovering rankings was recently studied by Jagabathula and Shah [12] in the setting in which partial Fourier information is provided about a sparse function over the symmetric group; however, their algorithmic approach for solving this problem was not based on convex optimization.

Recovering integer solutions: In integer programming one is often interested in recovering vectors in which the entries take on values of $\pm 1$. Suppose that there is such a sign-vector, and we wish to recover this vector given linear measurements. This corresponds to a version of the multi-knapsack problem [14]. In this case the algebraic set $\mathcal{A}$ is the set of all sign-vectors, and the convex hull $C(\mathcal{A})$ is the hypercube. The image of this hypercube under a linear map is also referred to as a zonotope [19].

Recovering orthogonal matrices: In several applications matrix variables are constrained to be orthogonal, which is a non-convex constraint and may lead to computational difficulties. We consider one such simple setting in which we wish to recover an orthogonal matrix given limited information in the form of linear measurements. In this example the set $\mathcal{A}$ is the set of $p \times p$ orthogonal matrices, and the set $C(\mathcal{A})$ is the spectral norm ball. Therefore the convex program (1) reduces to solving a spectral norm minimization problem subject to linear constraints.

Recovering measures: Recovering a measure given its moments is another question of interest. Typically one is given some set of moments up to order $d$, and the goal is to recover an atomic measure that is consistent with these moments. In the limited information setting, we wish to recover a measure given a small set of linear combinations of the moments. Such a question is also of interest in system identification. The set $\mathcal{A}$ in this setting is the moment curve, and its convex hull goes by several names including the Carathéodory orbitope [18]. Discretized versions of this problem correspond to the set $\mathcal{A}$ being a finite number of points on the moment curve; the convex hull $C(\mathcal{A})$ is then a cyclic polytope [19].

Low-rank tensor decomposition: Low-rank tensor decompositions play an important role in numerous applications throughout signal processing and machine learning. Developing computational tools to recover low-rank tensors is therefore of interest. However, a number of technical issues arise with low-rank tensors, including the non-existence of an SVD analogous to that for matrices, and the difference between the rank of a tensor and its border rank [5]. A further computational challenge is that the tensor nuclear norm defined by convexifying the set of rank-1 tensors is in general intractable to compute. We discuss further relaxations of this tensor nuclear norm based on the Positivstellensatz in Section IV.

III. EXACT RECOVERY VIA CONVEX RELAXATION

A. General recovery result

In this section, we give a general set of sufficient conditions for exact recovery via convex relaxation. We then apply these conditions to some specific examples (in the next subsection) to give bounds on the number of measurements required for exact recovery in each case. Our results hold for measurement matrices $M \in \mathbb{R}^{n \times p}$ whose entries are independent standard Gaussian variables.
This is equivalent to assuming that the nullspace of the measurement operator is chosen uniformly at random from the set of $(p - n)$-dimensional subspaces. While such a measurement ensemble may not be suitable in some applications, our results suggest that a generic set of linear measurements suffices for recovering $x$ via convex relaxation. Assume without loss of generality that the unknown object $x$ to be recovered lies on the boundary of the set $C(\mathcal{A})$. It is then easily seen that exact recovery $\hat{x} = x$ using the convex program (1) holds if and only if the tangent cone¹ at $x$ with respect to $C(\mathcal{A})$ has a transverse intersection with the nullspace of the operator $M$, that is, the two intersect only at the origin. More concretely, let $T(x)$ be the tangent cone at $x$ with respect to the convex body $C(\mathcal{A})$ and let $\mathcal{N}(M)$ be the nullspace of the operator $M$. Then we have exact recovery if and only if

$$ \mathcal{N}(M) \cap T(x) = \{0\}. $$

¹ Recall that the tangent cone at a point $x$ with respect to a convex set $C$ is the set of directions from $x$ to points in $C$ (more precisely, the closure of the cone generated by these directions).
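For the sign-vector example of Section II-C, where $C(\mathcal{A})$ is the hypercube $[-1,1]^p$, the condition $\mathcal{N}(M) \cap T(x) = \{0\}$ can be tested exactly. At a vertex $x \in \{\pm 1\}^p$ the tangent cone is the orthant $T(x) = \{h : x_i h_i \le 0 \ \forall i\}$, so it suffices to maximize $\sum_i (-x_i h_i)$ over nullspace directions with $-1 \le x_i h_i \le 0$: every term is nonnegative, so the optimum is zero exactly when $h = 0$ is the only such direction. The sketch below assumes NumPy and SciPy's `linprog`; the function name `hypercube_recovery_holds` is hypothetical, and this LP test is our illustration rather than a procedure from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def hypercube_recovery_holds(M, x, tol=1e-9):
    """Test N(M) ∩ T(x) = {0} for a sign vector x and the hypercube (hypothetical helper).

    T(x) = {h : x_i * h_i <= 0 for all i}. We maximize sum(-x_i * h_i) over
    h with M h = 0 and -1 <= x_i * h_i <= 0 (the box keeps the LP bounded).
    Each term is >= 0 and x_i = +/-1, so the optimum is 0 iff h = 0 is the
    only feasible direction, i.e. iff program (1) recovers x exactly.
    """
    M = np.atleast_2d(np.asarray(M, dtype=float))
    x = np.asarray(x, dtype=float)
    if not np.all(np.abs(x) == 1):
        raise ValueError("x must be a sign vector")
    # x_i * h_i in [-1, 0]  <=>  h_i in [-1, 0] if x_i = 1, h_i in [0, 1] if x_i = -1
    bounds = [(-1.0, 0.0) if xi > 0 else (0.0, 1.0) for xi in x]
    res = linprog(
        c=x,  # minimizing x @ h is the same as maximizing sum(-x_i * h_i)
        A_eq=M,
        b_eq=np.zeros(M.shape[0]),
        bounds=bounds,
        method="highs",
    )
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun > -tol


# (1, 1) is the only point of [-1, 1]^2 with h1 + h2 = 2, so recovery succeeds.
print(hypercube_recovery_holds([[1.0, 1.0]], [1, 1]))   # True
# With M = [1, -1], every (t, t) with |t| <= 1 matches b = 0, so recovery fails.
print(hypercube_recovery_holds([[1.0, -1.0]], [1, 1]))  # False
```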

Thus we describe here a recovery result based purely on analyzing the tangent cone at a point with respect to $C(\mathcal{A})$. Following the analysis of Rudelson and Vershynin [17] of $\ell_1$ minimization for recovering sparse vectors, our arguments are based on studying the "Gaussian width" of a tangent cone. The Gaussian width of a set $D$ is given by

$$ w(D) = \mathbb{E}\left[\sup_{y \in D} y^T z\right], $$

where $z$ is a random vector with i.i.d. zero-mean, unit-variance Gaussian entries (the expectation is with respect to $z$). Let $S = B_{\ell_2}^p \cap T(x)$ be the "spherical" part of the tangent cone $T(x)$, where $B_{\ell_2}^p$ denotes the unit $\ell_2$ ball in $\mathbb{R}^p$. The width of the tangent cone is given by

$$ w(S) = \mathbb{E}\left[\sup_{y \in S} y^T z\right]. $$
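Because the width is an expectation, it can be estimated by Monte Carlo whenever the supremum over $S$ can be computed for each sample $z$. A simple case where this is possible is when $S$ is the unit $\ell_2$ ball of a $k$-dimensional subspace $U$: then $\sup_{y \in S} y^T z = \|P_U z\|_2$, so $w(S) = \mathbb{E}\|g\|_2$ for $g \sim \mathcal{N}(0, I_k)$, which equals $\sqrt{2}\,\Gamma(\tfrac{k+1}{2})/\Gamma(\tfrac{k}{2})$ and lies between $k/\sqrt{k+1}$ and $\sqrt{k}$. The following NumPy sketch compares the estimate with that closed form; the function names, sample count, and dimensions are illustrative assumptions.

```python
import math

import numpy as np


def width_of_subspace_ball(basis, num_samples=20000, seed=0):
    """Monte Carlo estimate of w(S) = E sup_{y in S} y^T z, where S is the
    unit l2 ball of the subspace spanned by the columns of `basis`."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(basis)  # orthonormal basis of the subspace
    Z = rng.standard_normal((num_samples, basis.shape[0]))
    # The supremum is attained at y = P z / ||P z||, giving ||P z|| = ||Q^T z||.
    sups = np.linalg.norm(Z @ Q, axis=1)
    return sups.mean()


def expected_gaussian_norm(k):
    """Exact E||g||_2 for g ~ N(0, I_k)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))


p, k = 50, 5
basis = np.random.default_rng(1).standard_normal((p, k))
w_hat = width_of_subspace_ball(basis)
print(f"Monte Carlo: {w_hat:.3f}, exact: {expected_gaussian_norm(k):.3f}")
print(f"bounds: [{k / math.sqrt(k + 1):.3f}, {math.sqrt(k):.3f}]")
```

For tangent cones such as those of the $\ell_1$ ball, the inner supremum is itself an optimization problem that would have to be solved for every sample, which is one reason closed-form width bounds are valuable.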

If $S$ corresponds to the intersection of a $k$-dimensional subspace with $B_{\ell_2}^p$, then we have that $\frac{k}{\sqrt{k+1}} \le w(S) \le \sqrt{k}$. The condition that the nullspace $\mathcal{N}(M)$ has a transverse intersection with $T(x)$ is equivalent to requiring that $\mathcal{N}(M) \cap S = \{0\}$. The following theorem due to Gordon [11] gives the precise scaling required for such a trivial intersection to occur with overwhelming probability (for Gaussian random matrices):

Theorem 3.1 (Gordon's Escape Through a Mesh): Let $M \in \mathbb{R}^{n \times p}$ be a Gaussian random matrix. Assuming that $w(S) < \sqrt{n}$, we have that $\mathcal{N}(M) \cap S = \{0\}$ with probability greater than

$$ 1 - 3.5 \exp\left\{ -\frac{\left(\frac{n}{\sqrt{n+1}} - w(S)\right)^2}{18} \right\}. $$

Remark: The quantity $w(S)$ is the key factor that determines the number of measurements required. Notice that $\sqrt{n/2} < \frac{n}{\sqrt{n+1}} < \sqrt{n}$ for $n \ge 2$, so that whenever $w(S) < \sqrt{n/2}$ the probability of success is lower bounded by

$$ 1 - 3.5 \exp\left\{ -\frac{\left(\sqrt{n} - \sqrt{2}\, w(S)\right)^2}{36} \right\}. $$

Hence $n \gtrsim w(S)^2$ measurements suffice for exact recovery.

B. Examples

While Gordon's theorem provides a simple characterization of the number of measurements required, it is in general not easy to compute the Gaussian width of a cone. Rudelson and Vershynin [17] have worked out Gaussian widths for the special case of tangent cones at sparse vectors on the boundary of the $\ell_1$ ball, and derived results for sparse vector recovery using $\ell_1$ minimization that improve upon previous results. In order to obtain more general results, the key property that we exploit is symmetry. In particular, many of our examples in Section II-C lead to symmetric convex bodies $C(\mathcal{A})$ such as the orbitopes described in [18].

Proposition 3.2: Suppose that the set $\mathcal{A}$ is a finite collection of $m$ points such that the convex hull $C(\mathcal{A})$ is a vertex-transitive polytope [19] that contains as its vertices all the points in $\mathcal{A}$. Using the convex program (1), we have that $O(\log m)$ random Gaussian measurements suffice for exact recovery of a point in $\mathcal{A}$, i.e., a vertex of $C(\mathcal{A})$.

The proof will be given in a longer report. One can also bound the Gaussian width of a cone by using Dudley's inequality [9], [13] in terms of the covering number at all scales. Using these results and other properties of the Gaussian width, we obtain specific bounds for the following examples:

1) For the problem of recovering sparse vectors from linear measurements using $\ell_1$ minimization, Rudelson and Vershynin [17] have worked out the width for tangent cones at boundary points with respect to the $\ell_1$ ball. These results state that $O(k \log p)$ measurements suffice to recover a $k$-sparse vector in $\mathbb{R}^p$ via $\ell_1$ minimization.
2) For the nuclear norm, we obtain via width-based arguments that $O(kp)$ measurements suffice to recover a rank-$k$ matrix in $\mathbb{R}^{p \times p}$ using nuclear norm minimization. This agrees with previous results [16].
3) For a unique sign-vector solution in $\mathbb{R}^p$ to a linear system of equations, we require that the linear system contain at least $p/2$ random equations. This agrees with results obtained in [8], [14].
4) In order to recover a $p \times p$ permutation matrix via a convex formulation involving the Birkhoff polytope, we appeal to Proposition 3.2 and the fact that there are $p!$ permutation matrices of size $p \times p$ to conclude that $n = O(p \log p)$ measurements suffice for exact recovery.
5) Again using symmetry, one can show that $n \approx 3p^2/4$ measurements are sufficient to recover a $p \times p$ orthogonal matrix via spectral norm minimization.

We present width computations for other examples in more detail in a longer report.

IV. COMPUTATIONAL ISSUES

In this section we consider the computational problem that arises if the convex set $C(\mathcal{A})$ is not tractable to describe. In particular, we focus on the case when the convex set $C(\mathcal{A})$ is centrally symmetric and thus defines a norm (appropriately recentered and perhaps restricted to a subspace).
Therefore the question of approximating the convex set $C(\mathcal{A})$ becomes one of approximating the atomic norm $\|\cdot\|_{\mathcal{A}}$. We note that the dual norm $\|\cdot\|_{\mathcal{A}}^*$ is given by

$$ \|y\|_{\mathcal{A}}^* = \sup\{\, z^T y : z \in \mathcal{A} \,\}. $$

This characterization is obtained from Bonsall's atomic decomposition theorem [2]. The hierarchy of relaxations that we describe (based on the Positivstellensatz [1]) is more naturally specified with respect to the dual of the convex program (1):

$$ \max_{q}\ b^T q \quad \text{s.t.} \quad \|M^{\dagger} q\|_{\mathcal{A}}^* \le 1, \qquad (2) $$

where $M^{\dagger}$ refers to the adjoint (transpose) of the operator $M$. Thus the dual convex program (2) can be rewritten as

$$ \max_{q}\ b^T q \quad \text{s.t.} \quad z^T M^{\dagger} q \le 1 \quad \forall z \in \mathcal{A}. $$

Notice that $z^T M^{\dagger} q \le 1$ for all $z \in \mathcal{A}$ if and only if the set $\{z : z^T M^{\dagger} q > 1,\ z \in \mathcal{A}\}$ is empty. In order to certify the emptiness of this set, one can use a sequence of increasingly larger semidefinite programs based on the Positivstellensatz [1], [15]. This hierarchy is obtained from SDP representations of sum-of-squares polynomials [15]. We provide more details in a longer report. These Positivstellensatz-based relaxations provide inner approximations to the set $\{q : \|M^{\dagger} q\|_{\mathcal{A}}^* \le 1\}$, and therefore a lower bound on the optimal value of the dual convex program (2). Consequently, they can be viewed as outer approximations to the atomic norm ball $C(\mathcal{A})$. This interpretation leads to the following conclusion: if a point $\tilde{x}$ is an extreme point of the convex set $C(\mathcal{A})$, then the tangent cone at $\tilde{x}$ with respect to $C(\mathcal{A})$ is contained in the tangent cone at (an appropriately scaled) $\tilde{x}$ with respect to the outer approximation. Thus exact recovery via the relaxation imposes a stronger requirement, namely a larger number of measurements, which yields a trade-off between the complexity of the convex relaxation procedure and the amount of information required for exact recovery.

V. CONCLUSION

We developed a framework for solving ill-posed linear inverse problems in which the underlying object to be recovered has algebraic structure. Our algorithmic approach for solving this problem is based on computing convex hulls of algebraic sets and using these to provide tractable convex relaxations. We also provided a simple and general recovery condition, and a set of semidefinite-programming-based methods to approximate the convex hulls of these algebraic sets when they are intractable to characterize exactly. We give more details and additional recovery results (with more example applications), as well as an algebraic-geometric perspective, in a longer report.

REFERENCES

[1] BOCHNAK, J., COSTE, M., AND ROY, M. (1988). Real Algebraic Geometry. Springer.
[2] BONSALL, F. F. (1991). A General Atomic Decomposition Theorem and Banach's Closed Range Theorem. The Quarterly Journal of Mathematics, 42(1):9–14.
[3] CANDÈS, E. J., ROMBERG, J., AND TAO, T. (2006). Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Info. Theory. 52 489–509.
[4] CANDÈS, E. J. AND RECHT, B. (2009). Exact matrix completion via convex optimization. Found. of Comput. Math. 9 717–772.
[5] DE SILVA, V. AND LIM, L. (2008). Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications: Special Issue on Tensor Decompositions and Applications, 30(3), pp. 1084–1127.
[6] DONOHO, D. L. (2006). For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution. Comm. on Pure and Applied Math. 59 797–829.
[7] DONOHO, D. L. (2006). Compressed sensing. IEEE Trans. Info. Theory. 52 1289–1306.
[8] DONOHO, D. L. AND TANNER, J. (2010). Counting the faces of randomly-projected hypercubes and orthants with applications. Discrete and Computational Geometry. 43 522–541.
[9] DUDLEY, R. M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis 1: 290–330.
[10] FAZEL, M. (2002). Matrix rank minimization with applications. PhD thesis, Dep. Elec. Eng., Stanford University.
[11] GORDON, Y. (1988). On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. Geometric aspects of functional analysis, Isr. Semin. 1986-87, Lect. Notes Math. 1317, 84–106.
[12] JAGABATHULA, S. AND SHAH, D. (2010). Inferring Rankings Using Constrained Sensing. Preprint.
[13] LEDOUX, M. AND TALAGRAND, M. (1991). Probability in Banach Spaces. Springer.
[14] MANGASARIAN, O. AND RECHT, B. (2009). Probability of Unique Integer Solution to a System of Linear Equations. Preprint.
[15] PARRILO, P. A. (2003). Semidefinite Programming Relaxations for Semialgebraic Problems. Mathematical Programming Ser. B, Vol. 96, No. 2, pp. 293–320.
[16] RECHT, B., FAZEL, M., AND PARRILO, P. A. (2009). Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, to appear.
[17] RUDELSON, M. AND VERSHYNIN, R. (2006). Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. CISS 2006 (40th Annual Conference on Information Sciences and Systems).
[18] SANYAL, R., SOTTILE, F., AND STURMFELS, B. (2009). Orbitopes. Preprint.
[19] ZIEGLER, G. (1995). Lectures on Polytopes. Springer.