Information Theory: Principles and Applications


Tiago T. V. Vinhoza

April 9, 2010


1. AEP and Source Coding
2. Markov Sources and Entropy Rate
3. Other Source Codes
   Shannon-Fano-Elias codes
   Arithmetic codes
   Lempel-Ziv codes
4. Channel Coding
   Types of Channel
   Channel Capacity


AEP and Source Coding

Asymptotic Equipartition Property: Summary

Definition of the typical set $A_\epsilon^{(n)}$: the sequences $x^n$ with

$2^{-n(H(X)+\epsilon)} \le p_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$

Size of the typical set:

$(1-\delta)\, 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$
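As a concrete illustration, here is a minimal Python sketch (the Bernoulli parameter and tolerance are illustrative choices, not from the slides) that tests whether a binary sequence from a memoryless Bernoulli($p$) source falls in the typical set:

```python
import math

def is_typical(seq, p, eps):
    """Check whether a binary sequence lies in A_eps^(n) for a Bernoulli(p) source."""
    n = len(seq)
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)        # source entropy H(X)
    k = sum(seq)                                               # number of ones
    log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)   # log2 of p_{X^n}(x^n)
    # Typicality: 2^{-n(H+eps)} <= p(x^n) <= 2^{-n(H-eps)}
    return -n * (H + eps) <= log_prob <= -n * (H - eps)

# A length-20 sequence with 30% ones is typical for p = 0.3
print(is_typical([1] * 6 + [0] * 14, p=0.3, eps=0.1))   # True
```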


Source coding in the light of the AEP

A source coder operating on strings of $n$ source symbols need only provide a codeword for each string $x^n$ in the typical set $A_\epsilon^{(n)}$.

If a sequence $x^n$ occurs that is not in the typical set $A_\epsilon^{(n)}$, a source coding failure is declared. The probability of failure can be made arbitrarily small by choosing $n$ large enough.

Since $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$, the number of source codewords that need to be provided is at most $2^{n(H(X)+\epsilon)}$. So fixed-length codewords of length $\lceil n(H(X) + \epsilon) \rceil$ bits are enough, giving a rate of

$L \le H(X) + \epsilon + 1/n$ bits per source symbol.
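The scheme above can be sketched directly in code. The following is a minimal, illustrative implementation for a small Bernoulli($p$) source (the parameters are arbitrary): enumerate the typical sequences, index each one with a fixed-length binary codeword of $\lceil n(H(X)+\epsilon) \rceil$ bits, and treat everything else as a failure.

```python
import itertools, math

def aep_fixed_length_code(p=0.3, n=10, eps=0.15):
    """Fixed-length code for the typical set of a memoryless Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def log_prob(seq):                       # log2 of p_{X^n}(x^n)
        k = sum(seq)
        return k * math.log2(p) + (n - k) * math.log2(1 - p)

    typical = [s for s in itertools.product((0, 1), repeat=n)
               if -n * (H + eps) <= log_prob(s) <= -n * (H - eps)]
    codeword_len = math.ceil(n * (H + eps))                    # bits per block
    codebook = {s: format(i, f'0{codeword_len}b')              # index typical sequences
                for i, s in enumerate(typical)}
    p_failure = 1.0 - sum(2 ** log_prob(s) for s in typical)   # prob. of an atypical block
    return codebook, codeword_len, p_failure

codebook, bits, p_fail = aep_fixed_length_code()
print(len(codebook), bits, round(p_fail, 3))
```

For small $n$ the failure probability is still noticeable; it only becomes negligible as $n$ grows, which is exactly the regime the AEP describes.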


Source coding theorem

For any discrete memoryless source with entropy $H(X)$, any $\epsilon > 0$, any $\delta > 0$, and any sufficiently large $n$, there is a fixed-to-fixed-length source code with $P(\text{failure}) \le \delta$ that maps blocks of $n$ source symbols into fixed-length codewords at a rate $L \le H(X) + \epsilon + 1/n$ bits per source symbol.

Compare this result with the $\log M$ bits per symbol required by fixed-length source codes without failures, where $M$ is the source alphabet size.
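A small numerical comparison (the alphabet and probabilities below are purely illustrative) makes the gap concrete: an 8-symbol source needs $\log M = 3$ bits per symbol if no failures are allowed, but only slightly more than its entropy once a vanishing failure probability is tolerated.

```python
import math

# Hypothetical 8-symbol source with a skewed distribution
probs = [0.5, 0.2, 0.1, 0.08, 0.05, 0.04, 0.02, 0.01]
H = -sum(p * math.log2(p) for p in probs)        # entropy H(X), about 2.17 bits
M = len(probs)

eps, n = 0.05, 1000
print(f"No failures allowed:         {math.log2(M):.2f} bits/symbol")
print(f"AEP code (eps={eps}, n={n}): {H + eps + 1/n:.2f} bits/symbol")
```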


Source coding theorem: converse

Let $X^n$ be a string of $n$ discrete random variables $X_i$, $i = 1, \dots, n$, each with entropy $H(X)$. For any $\nu > 0$, let $X^n$ be encoded into fixed-length codewords of length $\lfloor n(H(X) - \nu) \rfloor$ bits. For any $\delta > 0$ and for all sufficiently large $n$,

$P(\text{failure}) > 1 - \delta - 2^{-\nu n / 2}$

Going from a fixed-length code with codeword lengths slightly larger than the entropy to one with codeword lengths slightly smaller than the entropy makes the probability of failure jump from almost 0 to almost 1.


Markov Sources and Entropy Rate

Sources with dependent symbols

The AEP established that $nH(X)$ bits are enough, on average, to describe $n$ independent and identically distributed random variables. What happens when the variables are dependent? What if the sequence of random variables forms a stationary stochastic process?


Stochastic Processes

A stochastic process is an indexed sequence of random variables, characterized by the joint probability distributions $p_{X_1,\dots,X_n}(x_1, \dots, x_n)$, where $(x_1, \dots, x_n) \in \mathcal{X}^n$.


Stochastic Processes

Stationarity: the joint probability distribution does not change with time shifts,

$p_{X_{1+d},\dots,X_{n+d}}(x_1, \dots, x_n) = p_{X_1,\dots,X_n}(x_1, \dots, x_n)$

for every shift $d$ and all $x_1, \dots, x_n \in \mathcal{X}$.


Markov Process or Markov Chain

Each random variable depends on the one preceding it and is conditionally independent of all other preceding random variables:

$P(X_{n+1} = x_{n+1} \mid X_n = x_n, \dots, X_1 = x_1) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$

for all $x_1, \dots, x_{n+1} \in \mathcal{X}$.

Joint probability distribution:

$p_{X_1,\dots,X_n}(x_1, \dots, x_n) = p_{X_1}(x_1)\, p_{X_2|X_1}(x_2|x_1)\, p_{X_3|X_2}(x_3|x_2) \cdots p_{X_n|X_{n-1}}(x_n|x_{n-1})$
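A minimal sketch of this factorization (the two-state transition matrix and initial distribution are illustrative): the probability of a state sequence is the initial probability times a product of one-step transition probabilities.

```python
# Hypothetical two-state chain: states 0 and 1
initial = [0.6, 0.4]              # p_{X_1}
P = [[0.9, 0.1],                  # P[i][j] = P(X_{n+1} = j | X_n = i)
     [0.3, 0.7]]

def path_probability(path):
    """Joint probability p(x_1, ..., x_n) of a state sequence under the chain."""
    prob = initial[path[0]]
    for prev, nxt in zip(path, path[1:]):
        prob *= P[prev][nxt]      # multiply one-step transition probabilities
    return prob

print(path_probability([0, 0, 1, 1, 0]))   # 0.6 * 0.9 * 0.1 * 0.7 * 0.3
```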


Markov Process or Markov Chain

A Markov chain is irreducible if it is possible to go from any state to any other state in a finite number of steps.

A Markov chain is time invariant if the conditional probability does not depend on the time index $n$:

$P(X_{n+1} = a \mid X_n = b) = P(X_2 = a \mid X_1 = b)$ for all $a, b \in \mathcal{X}$.

$X_n$ is the state of the Markov chain at time $n$.
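Irreducibility can be checked mechanically. A minimal sketch (the matrices are illustrative): test whether every state can reach every other state along transitions with positive probability.

```python
def is_irreducible(P):
    """True if every state can reach every other state of the chain."""
    n = len(P)
    for start in range(n):
        reached, frontier = {start}, [start]
        while frontier:                           # search over positive-probability edges
            i = frontier.pop()
            for j in range(n):
                if P[i][j] > 0 and j not in reached:
                    reached.add(j)
                    frontier.append(j)
        if len(reached) < n:
            return False
    return True

print(is_irreducible([[0.9, 0.1], [0.3, 0.7]]))   # True
print(is_irreducible([[1.0, 0.0], [0.5, 0.5]]))   # False: state 1 is unreachable from state 0
```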


Markov Process or Markov Chain

A time invariant Markov chain is characterized by its initial state and a probability transition matrix $P$, whose element $(i, j)$ is given by $P(X_{n+1} = j \mid X_n = i)$.

Stationary distributions: a distribution $\mu$ over the states is stationary if $\mu P = \mu$, so that starting the chain from $\mu$ keeps the state distribution equal to $\mu$ at every time.
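A minimal sketch (same illustrative two-state matrix as above) that approximates the stationary distribution by repeatedly applying the transition matrix until the state distribution settles:

```python
P = [[0.9, 0.1],
     [0.3, 0.7]]

def stationary_distribution(P, iters=1000):
    """Approximate the distribution mu satisfying mu = mu P by power iteration."""
    mu = [1.0 / len(P)] * len(P)                  # start from the uniform distribution
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(len(P)))   # (mu P)_j
              for j in range(len(P))]
    return mu

print(stationary_distribution(P))   # approximately [0.75, 0.25] for this matrix
```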


Entropy Rate

Given a sequence of random variables $X_1, X_2, \dots, X_n$, how does the entropy of the sequence grow with $n$? The entropy rate is defined as this rate of growth:

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)$

when the limit exists.


Entropy Rate: Examples

Typewriter with $m$ equally likely output letters: after $n$ keystrokes there are $m^n$ possible sequences and $H(X_1, \dots, X_n) = \log m^n$, so

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n) = \lim_{n \to \infty} \frac{1}{n} \log m^n = \log m$

$X_1, X_2, \dots$ independent and identically distributed random variables: $H(X_1, \dots, X_n) = n H(X_1)$, so

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n) = H(X_1)$


Entropy Rate

Another definition of entropy rate:

$H'(\mathcal{X}) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \dots, X_1)$

when the limit exists.

For stationary stochastic processes, $H(\mathcal{X}) = H'(\mathcal{X})$.

For a stationary Markov chain, $H(\mathcal{X}) = H(X_2 \mid X_1)$.
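For a stationary Markov chain this gives a simple recipe, sketched below with the same illustrative two-state matrix: weight the entropy of each row of the transition matrix by the stationary probability of that state.

```python
import math

P = [[0.9, 0.1],
     [0.3, 0.7]]
mu = [0.75, 0.25]     # stationary distribution of P (see the sketch above)

def row_entropy(row):
    return -sum(p * math.log2(p) for p in row if p > 0)

# H(X) = H(X_2 | X_1) = sum_i mu_i * H(P[i])
entropy_rate = sum(mu_i * row_entropy(row) for mu_i, row in zip(mu, P))
print(round(entropy_rate, 4))    # about 0.57 bits per symbol
```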


Why is the entropy rate important?

There is a version of the AEP for stationary ergodic sources:

$-\frac{1}{n} \log p_{X^n}(x^n) \to H(\mathcal{X})$

Like the AEP presented last class: there are about $2^{nH(\mathcal{X})}$ typical sequences, each with probability about $2^{-nH(\mathcal{X})}$.

We can represent typical sequences of length $n$ using $nH(\mathcal{X})$ bits.
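A minimal simulation sketch (same illustrative chain as before): generate a long path, accumulate $-\log_2$ of its probability, and check that the per-symbol value approaches the entropy rate computed above.

```python
import math, random

random.seed(0)
P = [[0.9, 0.1],
     [0.3, 0.7]]
mu = [0.75, 0.25]     # stationary distribution of P

n = 100_000
state = 0 if random.random() < mu[0] else 1
neg_log_prob = -math.log2(mu[state])
for _ in range(n - 1):
    nxt = 0 if random.random() < P[state][0] else 1
    neg_log_prob += -math.log2(P[state][nxt])     # accumulate -log2 p(x_{k+1} | x_k)
    state = nxt

print(round(neg_log_prob / n, 4))   # close to the entropy rate, about 0.57 bits/symbol
```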


Other Source Codes


Shannon-Fano-Elias codes
Arithmetic codes
Lempel-Ziv codes


Shannon-Fano-Elias Codes

Simple encoding procedure that uses the cumulative distribution function (CDF) to allot codewords:

$F_X(x) = \sum_{a \le x} p_X(a)$

Modified CDF:

$\bar{F}_X(x) = \sum_{a < x} p_X(a) + \frac{1}{2}\, p_X(x)$
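A minimal sketch of the encoding step (the four-symbol distribution is illustrative): each symbol's codeword is the binary expansion of its modified CDF value $\bar{F}_X(x)$, truncated to $\lceil \log_2(1/p_X(x)) \rceil + 1$ bits.

```python
import math

def sfe_code(pmf):
    """Shannon-Fano-Elias codewords from a pmf given as {symbol: probability}."""
    codewords = {}
    F = 0.0                                    # running sum of p_X(a) over a < x
    for symbol, p in pmf.items():
        F_bar = F + p / 2                      # modified CDF \bar{F}_X(x)
        length = math.ceil(math.log2(1 / p)) + 1
        bits, frac = [], F_bar                 # binary expansion of F_bar, truncated
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codewords[symbol] = ''.join(bits)
        F += p
    return codewords

print(sfe_code({'a': 0.25, 'b': 0.5, 'c': 0.125, 'd': 0.125}))
# {'a': '001', 'b': '10', 'c': '1101', 'd': '1111'}
```

The resulting code is prefix-free, and its expected length is within two bits per symbol of the entropy.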