Information Theory: Principles and Applications


Tiago T. V. Vinhoza

April 9, 2010


1. AEP and Source Coding
2. Markov Sources and Entropy Rate
3. Other Source Codes
   Shannon-Fano-Elias codes
   Arithmetic codes
   Lempel-Ziv codes
4. Channel Coding
   Types of Channel
   Channel Capacity


AEP and Source Coding

Asymptotic Equipartition Property: Summary

Definition of the typical set $A_\epsilon^{(n)}$: the sequences $x^n$ with

$2^{-n(H(X)+\epsilon)} \le p_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$

Size of the typical set:

$(1-\delta)\, 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$
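As a concrete illustration, here is a minimal Python sketch (the Bernoulli parameter and tolerance are illustrative choices, not from the slides) that tests whether a binary sequence from a memoryless Bernoulli($p$) source falls in the typical set:

```python
import math

def is_typical(seq, p, eps):
    """Check whether a binary sequence lies in A_eps^(n) for a Bernoulli(p) source."""
    n = len(seq)
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)        # source entropy H(X)
    k = sum(seq)                                               # number of ones
    log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)   # log2 of p_{X^n}(x^n)
    # Typicality: 2^{-n(H+eps)} <= p(x^n) <= 2^{-n(H-eps)}
    return -n * (H + eps) <= log_prob <= -n * (H - eps)

# A length-20 sequence with 30% ones is typical for p = 0.3
print(is_typical([1] * 6 + [0] * 14, p=0.3, eps=0.1))   # True
```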


Source coding in the light of the AEP

A source coder operating on strings of $n$ source symbols need only provide a codeword for each string $x^n$ in the typical set $A_\epsilon^{(n)}$.

If a sequence $x^n$ occurs that is not in the typical set $A_\epsilon^{(n)}$, a source coding failure is declared. The probability of failure can be made arbitrarily small by choosing $n$ large enough.

Since $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$, the number of source codewords that need to be provided is at most $2^{n(H(X)+\epsilon)}$. So fixed-length codewords of length $\lceil n(H(X) + \epsilon) \rceil$ bits are enough, giving a rate of

$L \le H(X) + \epsilon + 1/n$ bits per source symbol.
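The scheme above can be sketched directly in code. The following is a minimal, illustrative implementation for a small Bernoulli($p$) source (the parameters are arbitrary): enumerate the typical sequences, index each one with a fixed-length binary codeword of $\lceil n(H(X)+\epsilon) \rceil$ bits, and treat everything else as a failure.

```python
import itertools, math

def aep_fixed_length_code(p=0.3, n=10, eps=0.15):
    """Fixed-length code for the typical set of a memoryless Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def log_prob(seq):                       # log2 of p_{X^n}(x^n)
        k = sum(seq)
        return k * math.log2(p) + (n - k) * math.log2(1 - p)

    typical = [s for s in itertools.product((0, 1), repeat=n)
               if -n * (H + eps) <= log_prob(s) <= -n * (H - eps)]
    codeword_len = math.ceil(n * (H + eps))                    # bits per block
    codebook = {s: format(i, f'0{codeword_len}b')              # index typical sequences
                for i, s in enumerate(typical)}
    p_failure = 1.0 - sum(2 ** log_prob(s) for s in typical)   # prob. of an atypical block
    return codebook, codeword_len, p_failure

codebook, bits, p_fail = aep_fixed_length_code()
print(len(codebook), bits, round(p_fail, 3))
```

For small $n$ the failure probability is still noticeable; it only becomes negligible as $n$ grows, which is exactly the regime the AEP describes.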


Source coding theorem

For any discrete memoryless source with entropy $H(X)$, any $\epsilon > 0$, any $\delta > 0$, and any sufficiently large $n$, there is a fixed-to-fixed-length source code with $P(\text{failure}) \le \delta$ that maps blocks of $n$ source symbols into fixed-length codewords at a rate $L \le H(X) + \epsilon + 1/n$ bits per source symbol.

Compare this result with the $\log M$ bits per symbol required by fixed-length source codes without failures, where $M$ is the source alphabet size.
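A small numerical comparison (the alphabet and probabilities below are purely illustrative) makes the gap concrete: an 8-symbol source needs $\log M = 3$ bits per symbol if no failures are allowed, but only slightly more than its entropy once a vanishing failure probability is tolerated.

```python
import math

# Hypothetical 8-symbol source with a skewed distribution
probs = [0.5, 0.2, 0.1, 0.08, 0.05, 0.04, 0.02, 0.01]
H = -sum(p * math.log2(p) for p in probs)        # entropy H(X), about 2.17 bits
M = len(probs)

eps, n = 0.05, 1000
print(f"No failures allowed:         {math.log2(M):.2f} bits/symbol")
print(f"AEP code (eps={eps}, n={n}): {H + eps + 1/n:.2f} bits/symbol")
```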


Source coding theorem: converse

Let $X^n$ be a string of $n$ discrete random variables $X_i$, $i = 1, \dots, n$, each with entropy $H(X)$. For any $\nu > 0$, let $X^n$ be encoded into fixed-length codewords of length $\lfloor n(H(X) - \nu) \rfloor$ bits. For any $\delta > 0$ and for all sufficiently large $n$,

$P(\text{failure}) > 1 - \delta - 2^{-\nu n / 2}$

Going from a fixed-length code with codeword lengths slightly larger than the entropy to one with codeword lengths slightly smaller than the entropy makes the probability of failure jump from almost 0 to almost 1.


Markov Sources and Entropy Rate

Sources with dependent symbols

The AEP established that $nH(X)$ bits are enough, on average, to describe $n$ independent and identically distributed random variables. What happens when the variables are dependent? What if the sequence of random variables forms a stationary stochastic process?


Stochastic Processes

A stochastic process is an indexed sequence of random variables, characterized by the joint probability distributions $p_{X_1,\dots,X_n}(x_1, \dots, x_n)$, where $(x_1, \dots, x_n) \in \mathcal{X}^n$.


Stochastic Processes

Stationarity: the joint probability distribution does not change with time shifts,

$p_{X_{1+d},\dots,X_{n+d}}(x_1, \dots, x_n) = p_{X_1,\dots,X_n}(x_1, \dots, x_n)$

for every shift $d$ and all $x_1, \dots, x_n \in \mathcal{X}$.


Markov Process or Markov Chain

Each random variable depends on the one preceding it and is conditionally independent of all other preceding random variables:

$P(X_{n+1} = x_{n+1} \mid X_n = x_n, \dots, X_1 = x_1) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$

for all $x_1, \dots, x_{n+1} \in \mathcal{X}$.

Joint probability distribution:

$p_{X_1,\dots,X_n}(x_1, \dots, x_n) = p_{X_1}(x_1)\, p_{X_2|X_1}(x_2|x_1)\, p_{X_3|X_2}(x_3|x_2) \cdots p_{X_n|X_{n-1}}(x_n|x_{n-1})$
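A minimal sketch of this factorization (the two-state transition matrix and initial distribution are illustrative): the probability of a state sequence is the initial probability times a product of one-step transition probabilities.

```python
# Hypothetical two-state chain: states 0 and 1
initial = [0.6, 0.4]              # p_{X_1}
P = [[0.9, 0.1],                  # P[i][j] = P(X_{n+1} = j | X_n = i)
     [0.3, 0.7]]

def path_probability(path):
    """Joint probability p(x_1, ..., x_n) of a state sequence under the chain."""
    prob = initial[path[0]]
    for prev, nxt in zip(path, path[1:]):
        prob *= P[prev][nxt]      # multiply one-step transition probabilities
    return prob

print(path_probability([0, 0, 1, 1, 0]))   # 0.6 * 0.9 * 0.1 * 0.7 * 0.3
```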


Markov Process or Markov Chain

A Markov chain is irreducible if it is possible to go from any state to any other state in a finite number of steps.

A Markov chain is time invariant if the conditional probability does not depend on the time index $n$:

$P(X_{n+1} = a \mid X_n = b) = P(X_2 = a \mid X_1 = b)$ for all $a, b \in \mathcal{X}$.

$X_n$ is the state of the Markov chain at time $n$.
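Irreducibility can be checked mechanically. A minimal sketch (the matrices are illustrative): test whether every state can reach every other state along transitions with positive probability.

```python
def is_irreducible(P):
    """True if every state can reach every other state of the chain."""
    n = len(P)
    for start in range(n):
        reached, frontier = {start}, [start]
        while frontier:                           # search over positive-probability edges
            i = frontier.pop()
            for j in range(n):
                if P[i][j] > 0 and j not in reached:
                    reached.add(j)
                    frontier.append(j)
        if len(reached) < n:
            return False
    return True

print(is_irreducible([[0.9, 0.1], [0.3, 0.7]]))   # True
print(is_irreducible([[1.0, 0.0], [0.5, 0.5]]))   # False: state 1 is unreachable from state 0
```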


Markov Process or Markov Chain

A time invariant Markov chain is characterized by its initial state and a probability transition matrix $P$, whose element $(i, j)$ is given by $P(X_{n+1} = j \mid X_n = i)$.

Stationary distributions: a distribution $\mu$ over the states is stationary if $\mu P = \mu$, so that starting the chain from $\mu$ keeps the state distribution equal to $\mu$ at every time.
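A minimal sketch (same illustrative two-state matrix as above) that approximates the stationary distribution by repeatedly applying the transition matrix until the state distribution settles:

```python
P = [[0.9, 0.1],
     [0.3, 0.7]]

def stationary_distribution(P, iters=1000):
    """Approximate the distribution mu satisfying mu = mu P by power iteration."""
    mu = [1.0 / len(P)] * len(P)                  # start from the uniform distribution
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(len(P)))   # (mu P)_j
              for j in range(len(P))]
    return mu

print(stationary_distribution(P))   # approximately [0.75, 0.25] for this matrix
```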


Entropy Rate

Given a sequence of random variables $X_1, X_2, \dots, X_n$, how does the entropy of the sequence grow with $n$? The entropy rate is defined as this rate of growth:

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)$

when the limit exists.


Entropy Rate: Examples

Typewriter with $m$ equally likely output letters: after $n$ keystrokes there are $m^n$ possible sequences and $H(X_1, \dots, X_n) = \log m^n$, so

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n) = \lim_{n \to \infty} \frac{1}{n} \log m^n = \log m$

$X_1, X_2, \dots$ independent and identically distributed random variables: $H(X_1, \dots, X_n) = n H(X_1)$, so

$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n) = H(X_1)$


Entropy Rate

Another definition of entropy rate:

$H'(\mathcal{X}) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \dots, X_1)$

when the limit exists.

For stationary stochastic processes, $H(\mathcal{X}) = H'(\mathcal{X})$.

For a stationary Markov chain, $H(\mathcal{X}) = H(X_2 \mid X_1)$.
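For a stationary Markov chain this gives a simple recipe, sketched below with the same illustrative two-state matrix: weight the entropy of each row of the transition matrix by the stationary probability of that state.

```python
import math

P = [[0.9, 0.1],
     [0.3, 0.7]]
mu = [0.75, 0.25]     # stationary distribution of P (see the sketch above)

def row_entropy(row):
    return -sum(p * math.log2(p) for p in row if p > 0)

# H(X) = H(X_2 | X_1) = sum_i mu_i * H(P[i])
entropy_rate = sum(mu_i * row_entropy(row) for mu_i, row in zip(mu, P))
print(round(entropy_rate, 4))    # about 0.57 bits per symbol
```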


Why is the entropy rate important?

There is a version of the AEP for stationary ergodic sources:

$-\frac{1}{n} \log p_{X^n}(x^n) \to H(\mathcal{X})$

Like the AEP presented last class: there are about $2^{nH(\mathcal{X})}$ typical sequences, each with probability about $2^{-nH(\mathcal{X})}$.

We can represent typical sequences of length $n$ using $nH(\mathcal{X})$ bits.
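A minimal simulation sketch (same illustrative chain as before): generate a long path, accumulate $-\log_2$ of its probability, and check that the per-symbol value approaches the entropy rate computed above.

```python
import math, random

random.seed(0)
P = [[0.9, 0.1],
     [0.3, 0.7]]
mu = [0.75, 0.25]     # stationary distribution of P

n = 100_000
state = 0 if random.random() < mu[0] else 1
neg_log_prob = -math.log2(mu[state])
for _ in range(n - 1):
    nxt = 0 if random.random() < P[state][0] else 1
    neg_log_prob += -math.log2(P[state][nxt])     # accumulate -log2 p(x_{k+1} | x_k)
    state = nxt

print(round(neg_log_prob / n, 4))   # close to the entropy rate, about 0.57 bits/symbol
```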


Other Source Codes


Shannon-Fano-Elias codes
Arithmetic codes
Lempel-Ziv codes


Shannon-Fano-Elias Codes

Simple encoding procedure that uses the cumulative distribution function (CDF) to allot codewords:

$F_X(x) = \sum_{a \le x} p_X(a)$

Modified CDF:

$\bar{F}_X(x) = \sum_{a < x} p_X(a) + \frac{1}{2}\, p_X(x)$
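A minimal sketch of the encoding step (the four-symbol distribution is illustrative): each symbol's codeword is the binary expansion of its modified CDF value $\bar{F}_X(x)$, truncated to $\lceil \log_2(1/p_X(x)) \rceil + 1$ bits.

```python
import math

def sfe_code(pmf):
    """Shannon-Fano-Elias codewords from a pmf given as {symbol: probability}."""
    codewords = {}
    F = 0.0                                    # running sum of p_X(a) over a < x
    for symbol, p in pmf.items():
        F_bar = F + p / 2                      # modified CDF \bar{F}_X(x)
        length = math.ceil(math.log2(1 / p)) + 1
        bits, frac = [], F_bar                 # binary expansion of F_bar, truncated
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        codewords[symbol] = ''.join(bits)
        F += p
    return codewords

print(sfe_code({'a': 0.25, 'b': 0.5, 'c': 0.125, 'd': 0.125}))
# {'a': '001', 'b': '10', 'c': '1101', 'd': '1111'}
```

The resulting code is prefix-free, and its expected length is within two bits per symbol of the entropy.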