Algorithms for sparse analysis
Lecture III: Dictionary geometry, greedy algorithms, and convex relaxation
Anna C. Gilbert
Department of Mathematics, University of Michigan
Convergence of OMP

Theorem. Suppose Φ is a complete dictionary for ℝ^d. For any vector x, the residual after t steps of OMP satisfies
\[ \|r_t\|_2 \le \frac{c}{\sqrt{t}}. \]
[DeVore-Temlyakov]
• Even if x can be expressed sparsely, OMP may take d steps before the residual is zero.
• But sometimes OMP correctly identifies sparse representations.
Sparse representation with OMP

• Suppose x has a k-sparse representation
\[ x = \sum_{\ell \in \Lambda} c_\ell \varphi_\ell, \qquad |\Lambda| = k, \]
i.e., c_opt is non-zero on Λ.
• Sufficient to find Λ: when can OMP do so?
• Define
\[ \Phi_\Lambda = \begin{bmatrix} \varphi_{\ell_1} & \varphi_{\ell_2} & \cdots & \varphi_{\ell_k} \end{bmatrix}, \ \ell_s \in \Lambda,
\quad\text{and}\quad
\Psi_\Lambda = \begin{bmatrix} \varphi_{\ell_1} & \varphi_{\ell_2} & \cdots & \varphi_{\ell_{N-k}} \end{bmatrix}, \ \ell_s \notin \Lambda. \]
• Define the greedy selection ratio
\[ \rho(r) = \frac{\|\Psi_\Lambda^T r\|_\infty}{\|\Phi_\Lambda^T r\|_\infty}
= \frac{\max_{\ell \notin \Lambda} |\langle r, \varphi_\ell \rangle|}{\max_{\ell \in \Lambda} |\langle r, \varphi_\ell \rangle|}
= \frac{\text{max i.p. with bad atoms}}{\text{max i.p. with good atoms}}. \]
• OMP chooses a good atom iff ρ(r) < 1 (a numpy sketch of OMP follows below).
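To make the selection step concrete, here is a minimal numpy sketch of OMP; it is not from the lecture, and the names Phi, x, k are illustrative. Each iteration picks the atom with the largest inner product against the residual, which is exactly the step the ratio ρ(r) analyzes.

```python
import numpy as np

def omp(Phi, x, k):
    """Minimal OMP sketch: Phi is d x N with unit-norm columns,
    x the input signal, k the number of iterations/atoms."""
    d, N = Phi.shape
    support = []                        # indices chosen so far
    r = x.copy()                        # residual r_0 = x
    for _ in range(k):
        # greedy selection: atom maximizing |<r, phi_l>|
        idx = int(np.argmax(np.abs(Phi.T @ r)))
        support.append(idx)
        # re-solve least squares on chosen atoms (the "orthogonal" step)
        c, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        r = x - Phi[:, support] @ c     # residual orthogonal to chosen atoms
    coeffs = np.zeros(N)
    coeffs[support] = c
    return coeffs, support
```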
Exact Recovery Condition

Theorem (ERC). A sufficient condition for OMP to identify Λ after k steps is that
\[ \max_{\ell \notin \Lambda} \|\Phi_\Lambda^+ \varphi_\ell\|_1 < 1, \]
where A⁺ = (AᵀA)⁻¹Aᵀ. [Tropp '04]
• A⁺x is the coefficient vector that synthesizes the best approximation of x using the atoms in A.
• P = AA⁺ is the orthogonal projector that produces this best approximation (see the numerical ERC check below).
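The ERC quantity is easy to evaluate numerically. A small sketch (assuming Phi has unit-norm columns and Lambda is a list of column indices; the function name is illustrative):

```python
import numpy as np

def erc_constant(Phi, Lambda):
    """Return max over l not in Lambda of ||pinv(Phi_Lambda) phi_l||_1.
    The ERC holds (OMP identifies Lambda in k steps) when this is < 1."""
    N = Phi.shape[1]
    pinv = np.linalg.pinv(Phi[:, Lambda])   # (Phi_L^T Phi_L)^{-1} Phi_L^T
    outside = [l for l in range(N) if l not in set(Lambda)]
    return max(np.linalg.norm(pinv @ Phi[:, l], 1) for l in outside)
```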
Proof.
r_0 = x ∈ range(Φ_Λ).

At iteration t + 1, assume ℓ_1, ℓ_2, ..., ℓ_t ∈ Λ; thus a_t ∈ range(Φ_Λ) and r_t = x − a_t ∈ range(Φ_Λ).

Express the orthogonal projector onto range(Φ_Λ) as (Φ_Λ⁺)ᵀΦ_Λᵀ; therefore (Φ_Λ⁺)ᵀΦ_Λᵀ r_t = r_t.

Bound
\[ \rho(r_t) = \frac{\|\Psi_\Lambda^T r_t\|_\infty}{\|\Phi_\Lambda^T r_t\|_\infty}
= \frac{\|\Psi_\Lambda^T (\Phi_\Lambda^+)^T \Phi_\Lambda^T r_t\|_\infty}{\|\Phi_\Lambda^T r_t\|_\infty}
\le \|\Psi_\Lambda^T (\Phi_\Lambda^+)^T\|_\infty
= \|\Phi_\Lambda^+ \Psi_\Lambda\|_1
= \max_{\ell \notin \Lambda} \|\Phi_\Lambda^+ \varphi_\ell\|_1 < 1. \]

Then OMP selects an atom from Λ at iteration t + 1, and since it chooses a new atom at each iteration, after k iterations it has chosen all the atoms from Λ. ∎
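The chain of norm inequalities can be sanity-checked numerically: for any residual in range(Φ_Λ), ρ(r) never exceeds the ERC constant. A small illustrative sketch (all names and dimensions are made up for the demonstration):

```python
import numpy as np

d, N, k = 20, 50, 5
rng = np.random.default_rng(0)
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm atoms
Lambda = list(range(k))

# random residual lying in range(Phi_Lambda)
r = Phi[:, Lambda] @ rng.standard_normal(k)

inside = np.max(np.abs(Phi[:, Lambda].T @ r))    # max i.p. with good atoms
outside_idx = [l for l in range(N) if l >= k]
outside = np.max(np.abs(Phi[:, outside_idx].T @ r))
rho = outside / inside

pinv = np.linalg.pinv(Phi[:, Lambda])
erc = max(np.linalg.norm(pinv @ Phi[:, l], 1) for l in outside_idx)
assert rho <= erc + 1e-10                        # rho(r) <= max ||Phi_L^+ phi_l||_1
```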
Coherence Bounds

Theorem. The ERC holds whenever k < (µ⁻¹ + 1)/2, where µ = max_{j ≠ ℓ} |⟨φ_j, φ_ℓ⟩| is the coherence of the dictionary. Therefore, OMP can recover any sufficiently sparse signal. [Tropp '04]

For most redundant dictionaries µ ≈ 1/√d, so the bound reads k < (√d + 1)/2 (a sketch computing µ follows below).
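Coherence is a one-line computation. A sketch that evaluates µ and the resulting sparsity guarantee for a random dictionary (the setup here is illustrative, not from the lecture):

```python
import numpy as np

def coherence(Phi):
    """mu = max over i != j of |<phi_i, phi_j>| for unit-norm columns."""
    G = np.abs(Phi.T @ Phi)        # Gram matrix of inner products
    np.fill_diagonal(G, 0)         # ignore the diagonal <phi_i, phi_i> = 1
    return G.max()

rng = np.random.default_rng(0)
d, N = 64, 128
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)
mu = coherence(Phi)
print(f"mu = {mu:.3f}, ERC-guaranteed sparsity k < {(1/mu + 1) / 2:.2f}")
```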
Sparse

Theorem. Assume k ≤ µ⁻¹/3. For any vector x, the approximation Φĉ after k steps of OMP satisfies ‖ĉ‖₀ ≤ k and
\[ \|x - \Phi\hat{c}\|_2 \le \sqrt{1 + 6k}\,\|x - \Phi c_{\mathrm{opt}}\|_2, \]
where c_opt is the best k-term approximation to x over Φ. [Tropp '04]
Theorem. Assume 4 ≤ k ≤ 1/√µ. Two-phase greedy pursuit produces x̂ = Φĉ s.t.
\[ \|x - \hat{x}\|_2 \le 3\,\|x - \Phi c_{\mathrm{opt}}\|_2. \]

Theorem. Assume k ≤ 1/µ. Two-phase greedy pursuit produces x̂ = Φĉ s.t.
\[ \|x - \hat{x}\|_2 \le \Bigl(1 + \frac{2\mu k^2}{(1 - 2\mu k)^2}\Bigr)\,\|x - \Phi c_{\mathrm{opt}}\|_2. \]
[Gilbert, Strauss, Muthukrishnan, Tropp '03]
Convex relaxation: BP

• Exact: non-convex optimization
\[ \arg\min \|c\|_0 \quad \text{s.t.} \quad x = \Phi c \]
• Convex relaxation of the non-convex problem
\[ \arg\min \|c\|_1 \quad \text{s.t.} \quad x = \Phi c \]
Convex relaxation

[Figure: contribution of a coefficient to the norm vs. coefficient value, comparing the ℓ₀ and ℓ₁ penalties.]
Convex relaxation: algorithmic formulation

• Well-studied algorithmic formulation [Donoho, Donoho-Elad-Temlyakov, Tropp, and many others]
• Optimization problem = linear program: linear objective function (with variables c⁺, c⁻) and linear constraints (a linprog sketch follows below)
• Still need an algorithm for solving the optimization problem
• Hard part of the analysis: showing that the solution to the convex problem = the solution to the original problem
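Here is a minimal sketch of that LP, using the split c = c⁺ − c⁻ with both parts nonnegative, solved with scipy.optimize.linprog; the function name basis_pursuit and the test setup are illustrative, not from the lecture.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, x):
    """Solve min ||c||_1 s.t. Phi c = x as an LP in z = [c+; c-], z >= 0."""
    d, N = Phi.shape
    cost = np.ones(2 * N)                 # sum(c+) + sum(c-) = ||c||_1
    A_eq = np.hstack([Phi, -Phi])         # Phi (c+ - c-) = x
    res = linprog(cost, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

# sanity check: recover a sparse c from x = Phi c
rng = np.random.default_rng(1)
d, N, k = 30, 60, 4
Phi = rng.standard_normal((d, N)) / np.sqrt(d)
c_true = np.zeros(N)
c_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
c_hat = basis_pursuit(Phi, Phi @ c_true)
print(np.linalg.norm(c_hat - c_true))     # small when BP succeeds
```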
LP

• Feasible region is a convex polytope. Example:
\[ \min x_1 + x_2 \quad \text{s.t.} \quad x_1 \ge 0,\ x_2 \ge 0,\ x_1 + 2x_2 = 4 \]
[Figure: the feasible region is the segment of the line x₁ + 2x₂ = 4 in the positive quadrant; the level sets x₁ + x₂ = 2 and x₁ + x₂ = 4 mark the minimum and maximum.]
• Linear objective function: convex and concave ⟹ local minima/maxima are global
• If a feasible solution exists and the objective function is bounded, then the optimum is achieved on the boundary (possibly at many points)
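The toy LP above can be handed to any solver; a few lines with scipy.optimize.linprog confirm that the optimum sits at a vertex of the feasible segment.

```python
from scipy.optimize import linprog

# min x1 + x2  s.t.  x1 + 2*x2 = 4,  x1, x2 >= 0
res = linprog(c=[1, 1], A_eq=[[1, 2]], b_eq=[4],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # optimum (0, 2) with value 2, on the boundary
```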
Exact Recovery Condition

Theorem (ERC). A sufficient condition for BP to recover the sparsest representation of x is that
\[ \max_{\ell \notin \Lambda} \|\Phi_\Lambda^+ \varphi_\ell\|_1 < 1, \]
where A⁺ = (AᵀA)⁻¹Aᵀ. [Tropp '04]
Theorem. The ERC holds whenever k < (µ⁻¹ + 1)/2. Therefore, BP can recover any sufficiently sparse signal. [Tropp '04]
Convex relaxation: BP-denoising

• Error: non-convex optimization
\[ \arg\min \|c\|_0 \quad \text{s.t.} \quad \|x - \Phi c\|_2 \le \varepsilon \]
• Convex relaxation of the non-convex problem
\[ \arg\min \|c\|_1 \quad \text{s.t.} \quad \|x - \Phi c\|_2 \le \delta \]
• Convex objective function over a convex set (a CVXPY sketch follows below).
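Because the relaxed problem is a convex objective over a convex (second-order-cone) constraint set, off-the-shelf solvers apply directly. A sketch using the CVXPY package (assuming it is installed; Phi, x, and delta here are illustrative):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
d, N, delta = 30, 60, 0.1
Phi = rng.standard_normal((d, N)) / np.sqrt(d)
x = rng.standard_normal(d)

c = cp.Variable(N)
problem = cp.Problem(cp.Minimize(cp.norm(c, 1)),           # min ||c||_1
                     [cp.norm(x - Phi @ c, 2) <= delta])   # ||x - Phi c||_2 <= delta
problem.solve()
print(problem.status, np.count_nonzero(np.abs(c.value) > 1e-6))
```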
Optimization formulations

• Constrained minimization
\[ \arg\min \|c\|_1 \quad \text{s.t.} \quad \|x - \Phi c\|_2 \le \delta. \]
• Unconstrained minimization (ℓ₁-regularization): minimize
\[ L(c; \gamma, x) = \tfrac{1}{2}\|x - \Phi c\|_2^2 + \gamma\|c\|_1. \]
• Constrained minimization

Theorem. Suppose that k ≤ µ⁻¹/3 and that c_opt is k-sparse and solves the original optimization problem. Then the solution ĉ to the constrained minimization problem has the same sparsity and satisfies
\[ \|x - \Phi\hat{c}\|_2 \le \sqrt{1 + 6k}\,\delta. \]
[Tropp '04]
• Unconstrained minimization: many algorithms for ℓ₁-regularization (e.g., Bregman iteration, interior point methods, LASSO and LARS; an ISTA sketch follows below)
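As one concrete ℓ₁-regularization solver, here is a minimal sketch of iterative soft-thresholding (ISTA, proximal gradient); it is a standard method, though not one of the algorithms named on the slide, and the names below are illustrative.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, x, gamma, n_iter=500):
    """Minimize 0.5 * ||x - Phi c||_2^2 + gamma * ||c||_1 by proximal gradient."""
    L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the smooth gradient
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ c - x)  # gradient of the least-squares term
        c = soft_threshold(c - grad / L, gamma / L)
    return c
```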
Optimization vs. Greedy

• Exact and Error are amenable to convex relaxation and convex optimization.
• Sparse,
\[ \arg\min \|\Phi c - x\|_2 \quad \text{s.t.} \quad \|c\|_0 \le k, \]
is not amenable to convex relaxation but is appropriate for greedy algorithms.
Hardness depends on instance

| Redundant dictionary Φ | Input signal x         | Regime                  |
|------------------------|------------------------|-------------------------|
| arbitrary              | arbitrary              | NP-hard                 |
| fixed                  | arbitrary              | depends on choice of Φ  |
| random (distribution?) | fixed                  | compressive sensing     |
| fixed                  | random (distribution?) | random signal model     |
Random signal model

Theorem. If Φ has coherence µ = 1/√d and we choose k ∼ d/log d atoms for x at random from Φ, then the sparse representation is unique and, given x and Φ, convex relaxation finds it. [Tropp '07]
Summary

• The geometry of the dictionary is important, but we obtain only sufficient conditions on that geometry for solving Sparse problems efficiently.
• The algorithms are approximation algorithms (with respect to error).
• Greedy pursuit and convex relaxation.
• Next lecture: sublinear algorithms for sparse approximation and compressive sensing.