Information Theory: Principles and Applications
Tiago T. V. Vinhoza
April 9, 2010
Outline
1. AEP and Source Coding
2. Markov Sources and Entropy Rate
3. Other Source Codes
   Shannon-Fano-Elias Codes
   Arithmetic Codes
   Lempel-Ziv Codes
4. Channel Coding
   Types of Channel
   Channel Capacity
AEP and Source Coding
Asymptotic Equipartition Property: Summary

Definition of the typical set:

2^{−n(H(X)+ϵ)} ≤ p_{X^n}(x^n) ≤ 2^{−n(H(X)−ϵ)}

Size of the typical set:

(1 − δ) 2^{n(H(X)−ϵ)} ≤ |A_ϵ^{(n)}| ≤ 2^{n(H(X)+ϵ)}
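A minimal numerical sketch (not from the slides): enumerate the typical set of an i.i.d. Bernoulli(p) source by brute force and check the two bounds above. The parameters p, n, and eps are arbitrary illustrative choices.

import itertools, math

p, n, eps = 0.1, 16, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # per-symbol entropy

typical, prob_typical = [], 0.0
for xn in itertools.product([0, 1], repeat=n):
    k = sum(xn)                                       # number of ones
    p_xn = p**k * (1 - p)**(n - k)                    # p_{X^n}(x^n)
    if 2**(-n * (H + eps)) <= p_xn <= 2**(-n * (H - eps)):
        typical.append(xn)
        prob_typical += p_xn

print(f"H(X) = {H:.3f} bits")
print(f"|A| = {len(typical)} <= 2^(n(H+eps)) = {2**(n * (H + eps)):.0f}")
print(f"P(typical set) = {prob_typical:.3f}  (approaches 1 as n grows)")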
AEP and Source Coding
Source coding in the light of the AEP

A source coder operating on strings of n source symbols need only provide a codeword for each string x^n in the typical set A_ϵ^{(n)}.
If a sequence x^n occurs that is not in the typical set A_ϵ^{(n)}, a source coding failure is declared. The probability of failure can be made arbitrarily small by choosing n large enough.
Since |A_ϵ^{(n)}| ≤ 2^{n(H(X)+ϵ)}, fewer than 2^{n(H(X)+ϵ)} codewords need to be provided, so fixed-length codewords of length ⌈n(H(X)+ϵ)⌉ suffice. The rate is then L ≤ H(X) + ϵ + 1/n bits per symbol, as the sketch below illustrates.
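A sketch of this scheme under the same illustrative Bernoulli assumptions: each typical sequence gets a fixed-length binary index, and anything atypical triggers a failure.

import itertools, math

p, n, eps = 0.1, 16, 0.1
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def prob(xn):                                   # p_{X^n}(x^n), i.i.d. Bernoulli(p)
    k = sum(xn)
    return p**k * (1 - p)**(n - k)

typical = [xn for xn in itertools.product([0, 1], repeat=n)
           if 2**(-n * (H + eps)) <= prob(xn) <= 2**(-n * (H - eps))]
L = math.ceil(n * (H + eps))                    # fixed codeword length in bits
index = {xn: i for i, xn in enumerate(typical)}

def encode(xn):
    if xn not in index:                         # atypical sequence
        raise ValueError("source coding failure")
    return format(index[xn], f"0{L}b")          # L-bit binary codeword

print(L, "bits per block =", round(L / n, 3), "bits/symbol;",
      "H(X) + eps + 1/n =", round(H + eps + 1 / n, 3))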
AEP and Source Coding
Source coding theorem
For any discrete memoryless source with entropy H(X), any ϵ > 0, any δ > 0, and any sufficiently large n, there is a fixed-to-fixed-length source code with P(failure) ≤ δ that maps blocks of n source symbols into fixed-length codewords of length L ≤ H(X) + ϵ + 1/n bits per symbol. Compare this with the log M bits per symbol required by fixed-length source codes that never fail.
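A quick numerical illustration (numbers chosen for this example, not from the slides): a binary source with P(X = 1) = 0.1 has H(X) ≈ 0.469 bits. With ϵ = 0.05 and n = 100, the theorem guarantees a code with failure probability at most δ and rate L ≤ 0.469 + 0.05 + 0.01 = 0.529 bits per symbol, whereas a failure-free fixed-length code on the same binary alphabet needs log M = 1 bit per symbol.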
AEP and Source Coding
Source coding theorem: converse
Let X^n be a string of n discrete random variables X_i, i = 1, ..., n, each with entropy H(X). For any ν > 0, let X^n be encoded into fixed-length codewords of length ⌊n(H(X) − ν)⌋ bits. For any δ > 0 and all sufficiently large n,

P(failure) > 1 − δ − 2^{−νn/2}

Going from a fixed-length code with codeword lengths slightly larger than the entropy to one with codeword lengths slightly smaller than the entropy makes the probability of failure jump from almost 0 to almost 1.
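Plugging in illustrative numbers (same example source as before, H(X) ≈ 0.469 bits): with ν = 0.1 and n = 1000, the codewords have ⌊1000(0.469 − 0.1)⌋ = 369 bits, yet P(failure) > 1 − δ − 2^{−50}, so failure is essentially certain.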
Markov Sources and Entropy Rate
Sources with dependent symbols
The AEP established that nH(X) bits suffice, on average, to describe n independent and identically distributed random variables. What happens when the variables are dependent? What if the sequence of random variables forms a stationary stochastic process?
Markov Sources and Entropy Rate
Stochastic Processes
A stochastic process is an indexed sequence of random variables, characterized by the joint probability distribution p_{X_1,...,X_n}(x_1, ..., x_n), where (x_1, ..., x_n) ∈ X^n.
Markov Sources and Entropy Rate
Stochastic Processes
Stationarity: the joint probability distribution does not change under time shifts:

p_{X_{1+d},...,X_{n+d}}(x_1, ..., x_n) = p_{X_1,...,X_n}(x_1, ..., x_n)

for every shift d and for all x_1, ..., x_n ∈ X.
Markov Sources and Entropy Rate
Markov Process or Markov Chain
Each random variable depends on the one preceding it and is conditionally independent of all other preceding random variables:

P(X_{n+1} = x_{n+1} | X_n = x_n, ..., X_1 = x_1) = P(X_{n+1} = x_{n+1} | X_n = x_n)

for all x_1, ..., x_{n+1} ∈ X. The joint probability distribution therefore factors as

p_{X_1,...,X_n}(x_1, ..., x_n) = p_{X_1}(x_1) p_{X_2|X_1}(x_2|x_1) p_{X_3|X_2}(x_3|x_2) ⋯ p_{X_n|X_{n−1}}(x_n|x_{n−1})

as illustrated in the sketch below.
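A minimal sketch of this factorization (the two-state chain, its initial distribution, and the sample path are all made-up values):

import numpy as np

p0 = np.array([0.6, 0.4])            # p_{X_1}: assumed initial distribution
P = np.array([[0.9, 0.1],            # P[i, j] = P(X_{n+1} = j | X_n = i)
              [0.2, 0.8]])

def joint_prob(path):
    # chain rule for a Markov path: initial prob times one-step transitions
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]              # p_{X_{k+1}|X_k}(b | a)
    return prob

print(joint_prob([0, 0, 1, 1]))      # 0.6 * 0.9 * 0.1 * 0.8 = 0.0432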
Markov Sources and Entropy Rate
Markov Process or Markov Chain
A Markov chain is irreducible if it is possible to go from any state to any other state in a finite number of steps. A Markov chain is time invariant if the conditional probability does not depend on the time index n:

P(X_{n+1} = a | X_n = b) = P(X_2 = a | X_1 = b) for all a, b ∈ X.

X_n is the state of the Markov chain at time n.
Markov Sources and Entropy Rate
Markov Process or Markov Chain
A time-invariant Markov chain is characterized by its initial state and a probability transition matrix P, whose element (i, j) is given by P(X_{n+1} = j | X_n = i). A stationary distribution is a distribution μ over the states satisfying μP = μ; a sketch for computing one follows.
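A minimal sketch (transition probabilities are arbitrary illustrative values): a stationary distribution is a left eigenvector of P with eigenvalue 1, normalized to sum to one.

import numpy as np

P = np.array([[0.9, 0.1],                        # P[i, j] = P(X_{n+1} = j | X_n = i)
              [0.2, 0.8]])

vals, vecs = np.linalg.eig(P.T)                  # left eigenvectors of P
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
mu = mu / mu.sum()                               # normalize to a probability vector

print(mu)                                        # approx [2/3, 1/3]
print(np.allclose(mu @ P, mu))                   # True: mu P = mu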
Markov Sources and Entropy Rate
Entropy Rate
Given a sequence of random variables X_1, X_2, ..., X_n, how does the entropy of the sequence grow with n? The entropy rate is defined as this rate of growth:

H(𝒳) = lim_{n→∞} (1/n) H(X_1, X_2, ..., X_n)

when the limit exists.
Markov Sources and Entropy Rate
Entropy Rate: Examples
Typewriter with m equally likely output letters: after n keystrokes there are m^n possible sequences, all equally likely, so H(X_1, ..., X_n) = log m^n and

H(𝒳) = lim_{n→∞} (1/n) H(X_1, X_2, ..., X_n) = lim_{n→∞} (1/n) log m^n = log m.

X_1, X_2, ... independent and identically distributed random variables: H(X_1, ..., X_n) = nH(X_1), so

H(𝒳) = lim_{n→∞} (1/n) H(X_1, X_2, ..., X_n) = H(X_1).
Markov Sources and Entropy Rate
Entropy Rate
An alternative definition of the entropy rate:

H′(𝒳) = lim_{n→∞} H(X_n | X_{n−1}, ..., X_1)

when the limit exists. For stationary stochastic processes, H(𝒳) = H′(𝒳). For a stationary Markov chain, H(𝒳) = H(X_2 | X_1), which the sketch below evaluates for a concrete chain.
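A sketch of H(𝒳) = H(X_2 | X_1): average the entropy of each row of P under the stationary distribution μ (the same illustrative two-state chain as before).

import numpy as np

P = np.array([[0.9, 0.1],                        # illustrative two-state chain
              [0.2, 0.8]])
mu = np.array([2/3, 1/3])                        # its stationary distribution

row_entropies = -(P * np.log2(P)).sum(axis=1)    # H(X_2 | X_1 = i) for each state i
H_rate = mu @ row_entropies                      # H(X_2 | X_1), in bits per symbol

print(f"entropy rate = {H_rate:.4f} bits/symbol")   # about 0.553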
Markov Sources and Entropy Rate
Why is the entropy rate important?
There is a version of the AEP for stationary ergodic sources:

−(1/n) log p_{X^n}(x^n) → H(𝒳)

As with the AEP presented last class, there are about 2^{nH(𝒳)} typical sequences, each with probability about 2^{−nH(𝒳)}, so typical sequences of length n can be represented using nH(𝒳) bits.
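An empirical sketch of this convergence (chain and sample length are illustrative choices): simulate a long path of the two-state chain above and compare −(1/n) log2 p(x^n) with its entropy rate.

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],                        # same illustrative chain
              [0.2, 0.8]])
mu = np.array([2/3, 1/3])                        # stationary distribution
n = 100_000

x = [rng.choice(2, p=mu)]                        # start in stationarity
for _ in range(n - 1):
    x.append(rng.choice(2, p=P[x[-1]]))
xs = np.array(x)

log_p = np.log2(mu[xs[0]]) + np.log2(P[xs[:-1], xs[1:]]).sum()
print(f"-(1/n) log2 p(x^n) = {-log_p / n:.4f}")  # close to the 0.553 entropy rate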
Other Source Codes
Shannon-Fano-Elias codes. Arithmetic codes. Lempel-Ziv codes.
Other Source Codes
Shannon-Fano-Elias Codes
Simple encoding procedure based on the cumulative distribution function (CDF) to allot codewords:

F_X(x) = Σ_{a≤x} p_X(a)

Modified CDF:

F̄_X(x) = Σ_{a<x} p_X(a) + (1/2) p_X(x)