Stanford January 2009

History and Introduction Main idea Constructions Approximation theory

Compressed Sensing: History Compressed Sensing (CS)

People involved

(Right to left: J. Claerbout, B. Logan, D. Donoho, E. Candès, T. Tao and R. DeVore)

Compressed Sensing: Introduction

Old-fashioned Thinking
Collect data at grid points.
For $n$ pixels, take $n$ observations.

Compressed Sensing (CS)
Takes only $O(n^{1/4} \log^5(n))$ random measurements instead of $n$.

(CS camera at Rice)

Traditional signal processing

Model signals as band-limited functions $x(t)$.
Support of $\hat{x}$ is contained in $[-\Omega\pi, \Omega\pi]$.
Shannon-Nyquist: uniform time sampling with spacing $h \le 1/\Omega$ gives exact reconstruction.
A/D converters: sample and quantize.
Problem: if $\Omega$ is very large, one cannot build circuits to sample at the desired rate.

Signal processing using CS

Compressive sensing seeks a way out of this dilemma.
Two new components:
  New model classes for signals: signals are sparse in some representation system (basis/frame).
  New meaning of samples: a sample is a linear functional applied to the signal.

Given $x \in \mathbb{R}^n$ with $n$ large, ask $m$ non-adaptive questions about $x$.
A question (that is, a sample) is an inner product $v \cdot x$ with $v \in \mathbb{R}^n$.
Such sampling is described by an $m \times n$ linear system $\Phi x = y$.
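As a small illustration of sampling as a linear system, the sketch below applies a random measurement matrix to a sparse vector. The sizes, the seed and the Gaussian choice of Φ are assumptions made for the example; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n, m, k = 100, 20, 3             # illustrative sizes (assumed)

# A k-sparse signal: k nonzero entries at random positions.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Each row of Phi is one non-adaptive "question" v; y collects the answers v . x.
Phi = rng.standard_normal((m, n))
y = Phi @ x

print(y.shape)
```

The vector `y` holds m numbers, each one an inner product of a row of `Phi` with `x`.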

Structure of signals

With no additional information on $x$, we cannot say anything.
But we are interested in those $x$ that have structure.
Typically $x$ can be represented by sparse linear combinations of certain building blocks (e.g., a basis).
Issue: in many problems, we do not know the basis.
Here we assume the basis is known (for now).
Ansatz: look for $k$-sparse solutions: $x \in \Sigma_k$, that is, $\#\,\mathrm{supp}(x) \le k$.

Sparsest Solutions of Linear Equations

Find a sparsest solution of the linear system:

$$(P_0) \qquad \min\{\|x\|_0 : \Phi x = b,\ x \in \mathbb{R}^n\}$$

where $\|x\|_0$ is the number of nonzeros of $x$ and $\Phi \in \mathbb{R}^{m \times n}$ with $m < n$.

The solution is in general not unique. Moreover, this problem is NP-hard.
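To make the combinatorial nature of (P0) concrete, here is a brute-force sketch that tries supports of growing size until the system becomes solvable. The function name `sparsest_solution`, the tolerance and the 3 x 5 test matrix are hypothetical choices for illustration only.

```python
from itertools import combinations
import numpy as np

def sparsest_solution(Phi, b, tol=1e-9):
    """Brute-force (P0): try supports of growing size until Phi x = b is solvable."""
    m, n = Phi.shape
    if np.linalg.norm(b) <= tol:
        return np.zeros(n)
    for size in range(1, n + 1):
        for T in combinations(range(n), size):
            cols = list(T)
            # Least-squares fit of b using only the columns in T.
            coef, *_ = np.linalg.lstsq(Phi[:, cols], b, rcond=None)
            if np.linalg.norm(Phi[:, cols] @ coef - b) <= tol:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None  # b is not in the range of Phi

# Hypothetical 3 x 5 example: columns (1, t, t^2) for t = 1..5.
t = np.arange(1.0, 6.0)
Phi = np.vstack([np.ones(5), t, t**2])
b = 2.0 * Phi[:, 2]
x = sparsest_solution(Phi, b)
print(x)
```

The number of supports of size k is the binomial coefficient C(n, k), so this search is only usable for very small n; that growth is the practical face of the NP-hardness statement.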

Basis Pursuit

Main idea: use the convex relaxation

$$(P_1) \qquad \min\{\|x\|_1 : \Phi x = b,\ x \in \mathbb{R}^n\}$$

Basis Pursuit [Chen, Donoho, and Saunders (1999)]

Solving $(P_1)$ in polynomial time: it can be solved by linear programming,

$$\min\ \mathbf{1}^T y \quad \text{s.t.} \quad \Phi x = b, \quad -y \le x \le y.$$
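The linear program can be handed to an off-the-shelf solver. The sketch below uses `scipy.optimize.linprog` with the stacked variable (x, y); the helper name `basis_pursuit` and the identity-plus-Hadamard test matrix are assumptions chosen for illustration, not part of the slides.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, b):
    """Solve min ||x||_1 s.t. Phi x = b via the LP in the stacked variable (x, y)."""
    m, n = Phi.shape
    eye = np.eye(n)
    c = np.concatenate([np.zeros(n), np.ones(n)])      # objective 1^T y
    A_eq = np.hstack([Phi, np.zeros((m, n))])          # Phi x = b
    A_ub = np.vstack([np.hstack([eye, -eye]),          #  x - y <= 0
                      np.hstack([-eye, -eye])])        # -x - y <= 0
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n      # x free, y >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# Assumed test matrix: identity basis next to a normalized 16 x 16 Hadamard basis.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
Phi = np.hstack([np.eye(16), H / 4.0])

x_true = np.zeros(32)
x_true[[2, 17, 30]] = [1.0, -2.0, 0.5]
x_hat = basis_pursuit(Phi, Phi @ x_true)
print(np.max(np.abs(x_hat - x_true)))
```

At the optimum each y_i equals |x_i|, which is why minimizing the sum of y reproduces the l1 norm.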

Sparse Recovery and Mutual Incoherence

Mutual incoherence:

$$M(\Phi) = \max_{i \ne j} |\phi_i^T \phi_j|$$

where $\Phi = [\phi_1 \dots \phi_n] \in \mathbb{R}^{m \times n}$ and $\|\phi_i\|_2 = 1$.

Theorem (Elad and Bruckstein (2002)). Suppose that for the sparsest solution $x^\star$ we have

$$\|x^\star\|_0 < \frac{\sqrt{2} - \frac{1}{2}}{M(\Phi)}.$$

Then the solution of $(P_1)$ is equal to the solution of $(P_0)$.
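The mutual incoherence is cheap to compute from the Gram matrix of the normalized columns. The following sketch evaluates it, together with the sparsity threshold from the theorem, for an assumed example dictionary (an identity basis concatenated with a normalized Hadamard basis); the function name is hypothetical.

```python
import numpy as np

def mutual_coherence(Phi):
    """M(Phi) = max_{i != j} |phi_i^T phi_j| after normalizing columns to unit norm."""
    cols = Phi / np.linalg.norm(Phi, axis=0)
    G = np.abs(cols.T @ cols)       # absolute Gram matrix
    np.fill_diagonal(G, 0.0)        # ignore the i = j terms
    return G.max()

# Assumed example: identity basis next to a normalized 16 x 16 Hadamard basis.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
Phi = np.hstack([np.eye(16), H / 4.0])

M = mutual_coherence(Phi)
bound = (np.sqrt(2.0) - 0.5) / M
print(M, bound)
```

For this dictionary M = 1/4, so the threshold is about 3.66 and the theorem covers solutions with up to 3 nonzeros.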

Sparse Recovery and RIP

Restricted Isometry Property of order $k$ [Candès, Romberg, Tao (2006)]: let $\delta_k$ be the smallest number such that

$$(1 - \delta_k)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_k)\|x\|_2^2$$

for all $k$-sparse vectors $x \in \mathbb{R}^n$, where $\Phi = [\phi_1 \dots \phi_n] \in \mathbb{R}^{m \times n}$.

Theorem (E. J. Candès (2008)). If $\delta_{2k} < \sqrt{2} - 1$, then for all $k$-sparse vectors $x$ such that $\Phi x = b$, the solution of $(P_1)$ is equal to the solution of $(P_0)$.
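For very small matrices the restricted isometry constant can be computed directly: for each support T of size k, the extreme eigenvalues of the Gram matrix of the columns in T bound the ratio of squared norms. The sketch below is such a brute-force check; the function name and the 2 x 3 example are assumptions, and the approach does not scale beyond toy sizes.

```python
from itertools import combinations
import numpy as np

def rip_constant(Phi, k):
    """Smallest delta with (1-delta)||x||^2 <= ||Phi x||^2 <= (1+delta)||x||^2
    for all k-sparse x, by brute force over all supports of size k."""
    n = Phi.shape[1]
    delta = 0.0
    for T in combinations(range(n), k):
        sub = Phi[:, list(T)]
        eig = np.linalg.eigvalsh(sub.T @ sub)   # eigenvalues in ascending order
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

# Hypothetical 2 x 3 matrix with unit-norm columns.
s = 1.0 / np.sqrt(2.0)
Phi = np.array([[1.0, 0.0, s],
                [0.0, 1.0, s]])
d1 = rip_constant(Phi, 1)
d2 = rip_constant(Phi, 2)
print(d1, d2)
```

Here the order-1 constant vanishes because the columns have unit norm, while the order-2 constant equals 1/sqrt(2), which exceeds sqrt(2) - 1; the theorem's hypothesis therefore fails for this matrix already at k = 1.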

Approximate Recovery and RIP

Basis Pursuit De-Noising (BPDN) [Chen, Donoho, and Saunders (1999)]:

$$(P_1) \qquad \min\{\|x\|_1 : \|\Phi x - b\|_2 \le \varepsilon\}$$

Theorem (E. J. Candès (2008)). Suppose that the matrix $\Phi$ is given and $b = \Phi\hat{x} + e$ where $\|e\|_2 \le \varepsilon$. If $\delta_{2k} < \sqrt{2} - 1$, then

$$\|x^\star - \hat{x}\|_2 \le C_0 k^{-1/2} \sigma_k(\hat{x})_1 + C_1 \varepsilon,$$

where $x^\star$ is the solution of $(P_1)$ and $\sigma_k(\hat{x})_1 = \min_{z \in \Sigma_k} \|\hat{x} - z\|_1$.

Other heuristics: Orthogonal Matching Pursuit, Mangasarian's approach, bilinear formulation, etc.
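Of the heuristics named above, Orthogonal Matching Pursuit is the simplest to write down. The sketch below is a textbook-style version, not taken from the slides: it assumes unit-norm columns, and the function name `omp`, the tolerance and the identity-plus-Hadamard test matrix are illustrative assumptions.

```python
import numpy as np

def omp(Phi, b, k, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily pick up to k columns, refit by least squares."""
    n = Phi.shape[1]
    support = []
    coef = np.zeros(0)
    residual = np.array(b, dtype=float)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # Refit b on all selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], b, rcond=None)
        residual = b - Phi[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# Assumed test matrix: identity basis next to a normalized 16 x 16 Hadamard basis.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
Phi = np.hstack([np.eye(16), H / 4.0])

x_true = np.zeros(32)
x_true[3], x_true[20] = 1.5, -2.0
x_hat = omp(Phi, Phi @ x_true, k=2)
print(np.nonzero(x_hat)[0])
```

Unlike basis pursuit, the greedy method needs the target sparsity k (or a stopping tolerance) as an input.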

End of Part I.

Construction of CS Matrices

Good compressive sensing (CS) matrices: known result for random matrices
Known reconstruction bounds for matrices with entries drawn at random from various probability distributions: $k \le Cm/\log(n/m)$.
Specific recipes include Gaussian, Bernoulli and other classical matrix ensembles.
Particular case: there is a probabilistic construction of matrices $\Phi$ of size $m \times n$ with entries $\pm 1/\sqrt{m}$ satisfying RIP of order $k$ with the above bound.

Probabilistic Construction of CS Matrices

Introduce the Concentration of Measure Inequality (CMI) property on a probability space $(\Omega, \rho)$.
Suppose $\Phi = \Phi(\omega)$ is a collection of random $m \times n$ matrices.
Property PO($\delta$): the collection is said to have CMI if, for each $x \in \mathbb{R}^n$, there is a set $\Omega(x, \delta) \subset \Omega$ s.t.

$$(1 - \delta)\|x\|_2 \le \|\Phi(\omega)x\|_2 \le (1 + \delta)\|x\|_2, \qquad \omega \in \Omega(x, \delta),$$

and

$$\rho(\Omega(x, \delta)^c) \le C_0 e^{-c_0 m \delta^2}.$$

Gaussian, Bernoulli and many other families have this property.
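The concentration property can be observed empirically. The sketch below fixes one vector x, draws many Bernoulli matrices with entries plus or minus 1/sqrt(m), and records how often the norm ratio leaves the interval [1 - delta, 1 + delta]. All sizes, the seed and the value of delta are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)               # fixed seed, chosen arbitrarily
m, n, delta, trials = 128, 200, 0.3, 200     # illustrative values (assumed)

x = rng.standard_normal(n)                   # one fixed vector
ratios = np.empty(trials)
for t in range(trials):
    # Fresh Bernoulli matrix with entries +-1/sqrt(m) for every trial.
    Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    ratios[t] = np.linalg.norm(Phi @ x) / np.linalg.norm(x)

# Empirical estimate of the measure of the exceptional set.
outside = np.mean(np.abs(ratios - 1.0) > delta)
print(ratios.mean(), outside)
```

Increasing m tightens the spread of the ratios around 1, in line with the exponential bound in m and delta squared.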

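Property PO and the JL Lemma

Johnson-Lindenstrauss Lemma. Given $\varepsilon \in (0, 1)$ and a set $X$ of points in $\mathbb{R}^n$ with $\#X =: \sigma$, if $m > m_0 = O(\ln \sigma/\varepsilon^2)$, there is a Lipschitz function $\Phi : \mathbb{R}^n \to \mathbb{R}^m$ s.t.

$$(1 - \varepsilon)\|u - v\|_2 \le \|\Phi(u) - \Phi(v)\|_2 \le (1 + \varepsilon)\|u - v\|_2, \qquad u, v \in X.$$

If $X$ is a set of points and $m > c \ln(\#X)\,\varepsilon^{-2}$ with $c$ sufficiently large, then the set $\Omega_0 := \bigcap_{x, x' \in X} \Omega(x - x', \varepsilon)$ satisfies

$$\rho(\Omega_0^c) \le C_0 (\#X)^2 e^{-c_0 m \varepsilon^2} = e^{2\ln(\#X) - c_0 m \varepsilon^2 + \ln C_0}.$$

If $m > (2\ln(\#X) + \ln C_0)/(c_0 \varepsilon^2)$, then this measure is $< 1$, so $\Omega_0$ is nonempty and we get the JL lemma.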
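A quick numerical check of the JL statement: project a finite point set with one random Gaussian matrix and measure the worst relative distortion over all pairs. The dimensions, the number of points and the seed are assumptions for the example; the constants c_0 and C_0 of the argument are not estimated here.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(2)               # fixed seed, chosen arbitrarily
n, m, num_points = 1000, 400, 20             # illustrative sizes (assumed)

X = rng.standard_normal((num_points, n))             # rows are the points of X
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # scaled Gaussian matrix
Y = X @ Phi.T                                        # rows are the images Phi(u)

# Largest relative change in pairwise distance, i.e. an empirical epsilon.
worst = 0.0
for i, j in combinations(range(num_points), 2):
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    worst = max(worst, abs(ratio - 1.0))
print(worst)
```

Only the differences x - x' enter the check, mirroring the intersection over the sets Omega(x - x', epsilon) used in the argument.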

The JL Lemma and RIP

If $k \le cm/\ln(n/m)$ and $\Phi$ satisfies JL, then we have RIP of order $k$.
For $c$ sufficiently small, the probability that a Gaussian ensemble satisfies RIP is at least $1 - Ce^{-cm}$.
If we have a collection of $O(e^{cm})$ bases, then a random draw of a Gaussian matrix will satisfy RIP with respect to all of these bases simultaneously.
Basic reason why this works:

$$\Pr\Big[\,\big|\|\Phi(\omega)x\|_{\ell_2}^2 - \|x\|_{\ell_2}^2\big| \ge \varepsilon\|x\|_{\ell_2}^2\Big] \le 2e^{-m c_0(\varepsilon)}, \qquad 0 < \varepsilon < 1.$$

Deterministic Construction of CS Matrices

Difficulties in the deterministic case
[DeVore]: proposes a deterministic construction of order $k \le C\sqrt{m}\,\log n/\log(n/m)$, which is still far from the probabilistic results.

Very recent ideas
Find deterministic constructions with better bounds:
  using bipartite expander graphs [Indyk, Hassibi, Xu];
  using structured matrices such as Toeplitz, cyclic, generalized Vandermonde matrices.
Subgoal: a deterministic polynomial time algorithm for constructing good CS matrices.
Holy grail: $k \le Cm/\log(n/m)$, achievable for random matrices but not yet in deterministic constructions.

Connections with approximation theory

Cohen, Dahmen, DeVore [2009]: Compressed sensing and best k-term approximation, JAMS.

Best $k$-term approximation error:

$$\sigma_k(x)_X := \inf_{z \in \Sigma_k} \|x - z\|_X.$$
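When X is an l_p norm on R^n, the infimum is attained by keeping the k entries of largest magnitude, so the error is the l_p norm of the remaining entries. A minimal sketch under that assumption (the function name is hypothetical):

```python
import numpy as np

def best_k_term_error(x, k, p=1):
    """sigma_k(x) in the l_p norm: norm of x with its k largest-magnitude entries removed."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x))      # indices sorted by decreasing magnitude
    tail = x[order[k:]]                 # entries not kept by the best k-term approximation
    return np.linalg.norm(tail, ord=p)

x = [3.0, -1.0, 0.5, 0.0, 2.0]
print(best_k_term_error(x, 2, p=1), best_k_term_error(x, 2, p=2))
```

For this vector the two largest entries are 3 and 2, so the l_1 error is 1 + 0.5 = 1.5 and the l_2 error is sqrt(1.25).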

Encoder-decoder viewpoint
The matrix $\Phi$ serves as an encoder producing $y = \Phi x$.
To extract $x$, or an approximation to $x$, use a decoder $\Delta$ (not necessarily linear).
Thus $\Delta(y) = \Delta(\Phi x)$ approximates $x$.

Performance of encoder-decoder pairs

Ask for the largest value of $k$ s.t. $x \in \Sigma_k \implies \Delta(\Phi x) = x$.
Ask for the largest value of $k$ s.t., for a given class $K$,

$$E_m(K)_X \le C\sigma_k(K)_X,$$

where

$$E_m(K)_X := \inf_{(\Phi, \Delta)} \sup_{x \in K} \|x - \Delta(\Phi x)\|_X, \qquad \sigma_k(K)_X := \sup_{x \in K} \sigma_k(x)_X.$$

A pair $(\Phi, \Delta)$ is called instance-optimal of order $k$ with constant $C$ for the space $X$ if

$$\|x - \Delta(\Phi x)\|_X \le C\sigma_k(x)_X$$

for all $x \in X$, with a constant $C$ independent of $k$ and $n$.

Connection with Gelfand widths

Gelfand widths
For $K$ a compact set in $X$ and $m \in \mathbb{N}$, the Gelfand width of $K$ of order $m$ is

$$d^m(K)_X := \inf_{\operatorname{codim} Y \le m} \sup\{\|x\|_X : x \in K \cap Y\}.$$

Basic result
Lemma. Let $K \subset \mathbb{R}^n$ be symmetric, i.e., $K = -K$, and satisfy $K + K \subset C_0 K$ for some $C_0$. If $X \subseteq \mathbb{R}^n$ is any normed space, then

$$d^m(K)_X \le E_m(K)_X \le C_0\, d^m(K)_X, \qquad 1 \le m \le n.$$

Orders of Gelfand widths of $\ell_q$ balls

Theorem [Gluskin, Garnaev, Kashin (1977, 1984)].

$$C_1 \Psi(m, n, q, p) \le d^m(U(\ell_q^n))_{\ell_p^n} \le C_2 \Psi(m, n, q, p)$$

where

$$\Psi(m, n, q, p) := \Big(\min\{1,\ n^{1 - 1/q} m^{-1/2}\}\Big)^{\frac{1/q - 1/p}{1/q - 1/2}}, \qquad \Psi(m, n, 1, 2) := \min\Big\{1,\ \sqrt{\frac{\log(n/m)}{m}}\Big\}.$$

Corollary. With $m$ measurements, the attainable sparsity level $k$ satisfies $k \le c_0 m/\log(n/m)$.

Instance optimality and the null space of Φ

Denote $N := N(\Phi) := \{x : \Phi x = 0\}$.

Uniqueness of recovery
Lemma. For an $m \times n$ matrix $\Phi$ and for $2k \le m$, the following are equivalent:
  There is a decoder $\Delta$ s.t. $\Delta(\Phi x) = x$ for all $x \in \Sigma_k$.
  $\Sigma_{2k} \cap N = \{0\}$.
  For any set $T$ with $\#T = 2k$, the matrix $\Phi_T$ has rank $2k$.
  For any $T$ as above, the matrix $\Phi_T^* \Phi_T$ is positive definite.
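The rank condition of the lemma can be tested directly on small matrices by checking every set of 2k columns. The sketch below does this with a numerical rank; the function name and the Vandermonde example are assumptions for illustration, and the check is exponential in cost.

```python
from itertools import combinations
import numpy as np

def recovers_all_k_sparse(Phi, k):
    """True if every set of 2k columns of Phi is linearly independent."""
    m, n = Phi.shape
    if 2 * k > min(m, n):
        return False                     # 2k columns cannot have rank 2k
    return all(np.linalg.matrix_rank(Phi[:, list(T)]) == 2 * k
               for T in combinations(range(n), 2 * k))

# Hypothetical example: a 4 x 6 Vandermonde matrix with distinct nodes 1..6.
nodes = np.arange(1.0, 7.0)
V = np.vstack([nodes**p for p in range(4)])
ok_vandermonde = recovers_all_k_sparse(V, 2)

# Duplicating a column puts a 2-sparse vector into the null space.
D = V.copy()
D[:, 5] = D[:, 0]
ok_duplicate = recovers_all_k_sparse(D, 1)
print(ok_vandermonde, ok_duplicate)
```

Any 4 columns of a Vandermonde matrix with distinct nodes are independent, so the first check succeeds with k = 2 and m = 2k = 4, while the duplicated column makes the second check fail already for k = 1.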

Approximate recovery

Approximation to accuracy $\sigma_k$
Theorem [Cohen, Dahmen, DeVore (2009)]. Given an $m \times n$ matrix $\Phi$, a norm $\|\cdot\|_X$ and a value of $k$, a sufficient condition that there exists a decoder $\Delta$ s.t.

$$\|x - \Delta(\Phi x)\|_X \le C\sigma_k(x)_X$$

is that

$$\|\eta\|_X \le \frac{C}{2}\,\sigma_{2k}(\eta)_X, \qquad \eta \in N.$$

A necessary condition is that

$$\|\eta\|_X \le C\,\sigma_{2k}(\eta)_X, \qquad \eta \in N.$$

This gives rise to the null space property (in $X$ of order $2k$):

$$\|\eta\|_X \le C\,\sigma_{2k}(\eta)_X, \qquad \eta \in N.$$

The null space property

Approximation and the null space property
Corollary [Cohen, Dahmen, DeVore (2009)]. Suppose that $X$ is an $\ell_p^n$ space, $k \in \mathbb{N}$ and $\Phi$ is an encoding matrix. If $\Phi$ has the null space property in $X$ of order $2k$ with constant $C/2$, then there exists a decoder $\Delta$ so that

$$\|x - \Delta(\Phi x)\|_X \le C\sigma_k(x)_X.$$

Conversely, the validity of the above condition for some decoder $\Delta$ implies that $\Phi$ has the null space property in $X$ of order $2k$ with constant $C$.

One of the main results, revisited

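RIP and good encoder-decoder pairs
Theorem [Candès-Romberg-Tao (2006)]. Let $\Phi$ be any matrix which satisfies the RIP of order $3k$ with $\delta_{3k} \le \delta < (\sqrt{2} - 1)^2/3$. Define the decoder $\Delta$ by

$$\Delta(y) := \operatorname{argmin}_{\Phi z = y} \|z\|_{\ell_1}.$$

Then $(\Phi, \Delta)$ satisfies $\|x - \Delta(\Phi x)\|_X \le C\sigma_k(x)_X$ in $X = \ell_1$ with

$$C = \frac{2\sqrt{2} + 2 - (2\sqrt{2} - 2)\delta}{\sqrt{2} - 1 - (\sqrt{2} + 1)\delta}.$$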

The End.
