Chapter 8: Differential entropy. Chapter 8 outline

Chapter 8: Differential entropy University of Illinois at Chicago ECE 534, Fall 2009, Natasha Devroye Chapter 8 outline • Motivation • Definitions ...

Author: Marilyn Small

Chapter 8: Differential entropy

University of Illinois at Chicago ECE 534, Fall 2009, Natasha Devroye

Chapter 8 outline

• Motivation
• Definitions
• Relation to discrete entropy
• Joint and conditional differential entropy
• Relative entropy and mutual information
• Properties
• AEP for Continuous Random Variables

Motivation
• Our goal is to determine the capacity of an AWGN channel

[Figure: wireless channel with fading. The input X passes through the gain h and Gaussian noise N ~ N(0, P_N) is added, so that Y = hX + N; the signals are plotted against time.]

Motivation
• Our goal is to determine the capacity of an AWGN channel

[Figure: wireless channel with fading, Y = hX + N, with Gaussian noise N ~ N(0, P_N).]

$$C = \frac{1}{2}\log\left(\frac{|h|^2 P + P_N}{P_N}\right) = \frac{1}{2}\log\left(1 + \mathrm{SNR}\right) \quad \text{(bits/channel use)}$$
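The capacity expression can be evaluated numerically. The following is a minimal sketch, assuming the logarithm is base 2 (so the result is in bits per channel use) and that SNR stands for |h|^2 P / P_N, as the two forms of the formula imply. The function name `awgn_capacity` and its parameter names are illustrative and do not come from the slides.

```python
import math


def awgn_capacity(h, signal_power, noise_power):
    """Return 0.5 * log2(1 + |h|^2 * P / P_N) in bits per channel use.

    h            : channel gain (real or complex); an assumed scalar
    signal_power : transmit power P
    noise_power  : noise variance P_N (must be positive)
    """
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    snr = (abs(h) ** 2) * signal_power / noise_power
    return 0.5 * math.log2(1.0 + snr)


if __name__ == "__main__":
    # With h = 1, P = 3 and P_N = 1 the SNR is 3, so C = 0.5 * log2(4).
    print(awgn_capacity(1.0, 3.0, 1.0))
```

Doubling the gain h multiplies the SNR by four, which at high SNR adds roughly one bit per channel use.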

Motivation
• Need to define entropy and mutual information between CONTINUOUS random variables
• Can you guess?
• Discrete X, p(x):
• Continuous X, f(x):

Definitions - densities

Properties - densities

Quantized random variables

... an interpretation of the differential entropy: It is the logarithm of the equivalent side length of the smallest set that contains most of the probability. Hence low entropy implies that the random variable is confined to a small effective volume and high entropy indicates that the random variable is widely dispersed.

Note. Just as the entropy is related to the volume of the typical set, there is a quantity called Fisher information which is related to the surface area of the typical set. We discuss Fisher information in more detail in Sections 11.10 and 17.8.

8.3 RELATION OF DIFFERENTIAL ENTROPY TO DISCRETE ENTROPY

Consider a random variable X with density f(x) illustrated in Figure 8.1. Suppose that we divide the range of X into bins of length Δ. Let us assume that the density is continuous within the bins. Then, by the mean value theorem, there exists a value x_i within each bin such that

$$f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx. \tag{8.23}$$

Consider the quantized random variable X^Δ, which is defined by

$$X^{\Delta} = x_i \quad \text{if } i\Delta \le X < (i+1)\Delta. \tag{8.24}$$

FIGURE 8.1. Quantization of a continuous random variable. [Axes: f(x) against x, with bins of width Δ.]
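The quantization in (8.23) and (8.24) can be tried numerically. The sketch below assumes X is a zero-mean Gaussian (the excerpt does not fix a density) and computes the discrete entropy of X^Δ from the bin probabilities. It then checks the relation H(X^Δ) + log Δ ≈ h(X), which for Δ = 2^{-n} is the summary statement H([X]_{2^{-n}}) ≈ h(X) + n. All function names and the truncation parameter `span` are illustrative.

```python
import math


def gaussian_cdf(x, sigma=1.0):
    """CDF of N(0, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def quantized_entropy_bits(delta, sigma=1.0, span=12.0):
    """Entropy in bits of X^Delta for X ~ N(0, sigma^2).

    Bins are [i*delta, (i+1)*delta). Bins beyond span*sigma are ignored
    because their probability is negligible (an assumed truncation).
    """
    n_bins = int(math.ceil(span * sigma / delta))
    entropy = 0.0
    for i in range(-n_bins, n_bins):
        p = gaussian_cdf((i + 1) * delta, sigma) - gaussian_cdf(i * delta, sigma)
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


def gaussian_differential_entropy_bits(sigma=1.0):
    """h(N(0, sigma^2)) = 0.5 * log2(2*pi*e*sigma^2)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)


if __name__ == "__main__":
    for n in (2, 4, 6):
        delta = 2.0 ** (-n)
        lhs = quantized_entropy_bits(delta) + math.log2(delta)
        print(n, lhs, gaussian_differential_entropy_bits())
```

As Δ shrinks, the discrete entropy itself grows without bound; only after adding log Δ does it settle at the differential entropy.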

Differential entropy - definition

Examples

[Figure: a density f(x) plotted against x, with the points a and b marked on the x-axis.]

Differential entropy - the good the bad and the ugly

Differential entropy - multiple RVs

Differential entropy of a multi-variate Gaussian
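As a rough numerical companion to this slide, the sketch below evaluates the formula h(N_n(µ, K)) = (1/2) log (2πe)^n |K| that appears in the chapter summary. It assumes base-2 logarithms and a positive definite covariance matrix K given as a list of rows. The helper names `determinant` and `gaussian_entropy_bits` are hypothetical, and the determinant is computed by plain Gaussian elimination to keep the example free of external libraries.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def gaussian_entropy_bits(cov):
    """0.5 * log2((2*pi*e)^n * |K|) for an n x n covariance matrix K."""
    n = len(cov)
    det = determinant(cov)
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (n * math.log2(2.0 * math.pi * math.e) + math.log2(det))


if __name__ == "__main__":
    # Two independent unit-variance components: twice the scalar entropy.
    print(gaussian_entropy_bits([[1.0, 0.0], [0.0, 1.0]]))
    # Correlation shrinks |K| and therefore lowers the entropy.
    print(gaussian_entropy_bits([[1.0, 0.9], [0.9, 1.0]]))
```

The mean µ does not appear in the computation, consistent with the formula depending only on K.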

Parallels with discrete entropy....

Proof: We have

$$2^{-H(p) - D(p\|r)} = 2^{\sum p(x)\log p(x) + \sum p(x)\log\frac{r(x)}{p(x)}} \tag{2.151}$$
$$= 2^{\sum p(x)\log r(x)} \tag{2.152}$$
$$\le \sum p(x)\, 2^{\log r(x)} \tag{2.153}$$
$$= \sum p(x)\, r(x) \tag{2.154}$$
$$= \Pr(X = X'), \tag{2.155}$$

where the inequality follows from Jensen's inequality and the convexity of the function f(y) = 2^y. □

SUMMARY

The following telegraphic summary omits qualifying conditions.

Definition The entropy H(X) of a discrete random variable X is defined by

$$H(X) = -\sum_{x \in \mathcal{X}} p(x)\log p(x). \tag{2.156}$$

Properties of H

1. H(X) ≥ 0.
2. H_b(X) = (log_b a) H_a(X).
3. (Conditioning reduces entropy) For any two random variables, X and Y, we have
$$H(X|Y) \le H(X) \tag{2.157}$$
with equality if and only if X and Y are independent.
4. $H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$, with equality if and only if the X_i are independent.
5. $H(X) \le \log|\mathcal{X}|$, with equality if and only if X is distributed uniformly over $\mathcal{X}$.
6. H(p) is concave in p.

Definition The relative entropy D(p‖q) of the probability mass function p with respect to the probability mass function q is defined by

$$D(p\|q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)}. \tag{2.158}$$

Definition The mutual information between two random variables X and Y is defined as

$$I(X;Y) = \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}. \tag{2.159}$$

Alternative expressions

$$H(X) = E_p \log\frac{1}{p(X)}, \tag{2.160}$$
$$H(X,Y) = E_p \log\frac{1}{p(X,Y)}, \tag{2.161}$$
$$H(X|Y) = E_p \log\frac{1}{p(X|Y)}, \tag{2.162}$$
$$I(X;Y) = E_p \log\frac{p(X,Y)}{p(X)p(Y)}, \tag{2.163}$$
$$D(p\|q) = E_p \log\frac{p(X)}{q(X)}. \tag{2.164}$$

Properties of D and I

1. I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y).
2. D(p‖q) ≥ 0 with equality if and only if p(x) = q(x), for all $x \in \mathcal{X}$.
3. I(X;Y) = D(p(x,y)‖p(x)p(y)) ≥ 0, with equality if and only if p(x,y) = p(x)p(y) (i.e., X and Y are independent).
4. If $|\mathcal{X}| = m$, and u is the uniform distribution over $\mathcal{X}$, then D(p‖u) = log m − H(p).
5. D(p‖q) is convex in the pair (p, q).

Chain rules

Entropy: $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1)$.
Mutual information: $I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y | X_1, X_2, \ldots, X_{i-1})$.
Relative entropy: D(p(x,y)‖q(x,y)) = D(p(x)‖q(x)) + D(p(y|x)‖q(y|x)).

Jensen's inequality. If f is a convex function, then Ef(X) ≥ f(EX).

Log sum inequality. For n positive numbers, a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

$$\sum_{i=1}^{n} a_i \log\frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} \tag{2.165}$$

with equality if and only if a_i / b_i = constant.

Data-processing inequality. If X → Y → Z forms a Markov chain, I(X;Y) ≥ I(X;Z).

Sufficient statistic. T(X) is sufficient relative to {f_θ(x)} if and only if I(θ; X) = I(θ; T(X)) for all distributions on θ.

Fano's inequality. Let $P_e = \Pr\{\hat{X}(Y) \ne X\}$. Then

$$H(P_e) + P_e \log|\mathcal{X}| \ge H(X|Y). \tag{2.166}$$

Inequality. If X and X' are independent and identically distributed, then

$$\Pr(X = X') \ge 2^{-H(X)}. \tag{2.167}$$

PROBLEMS

2.1 Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.
(a) Find the entropy H(X) in bits. The following expressions may be useful:

$$\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}, \qquad \sum_{n=0}^{\infty} n r^n = \frac{r}{(1-r)^2}.$$

(b) A random variable X is drawn according to this distribution. Find an "efficient" sequence of yes-no questions of the form, ...

Relative entropy and mutual information

Properties

ASIDE: A general definition of mutual information

Definition The mutual information between two random variables X and Y is given by

$$I(X;Y) = \sup_{P,Q} I([X]_P; [Y]_Q), \tag{8.54}$$

where the supremum is over all finite partitions P and Q.

This is the master definition of mutual information that always applies, even to joint distributions with atoms, densities, and singular parts. Moreover, by continuing to refine the partitions P and Q, one finds a monotonically increasing sequence $I([X]_P; [Y]_Q) \nearrow I$. By arguments similar to (8.52), we can show that this definition of mutual information is equivalent to (8.47) for random variables that have a density. For discrete random variables, this definition is equivalent to ...

A quick example
• Find the mutual information between the correlated Gaussian random variables with correlation coefficient ρ
• What is I(X;Y)?
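One way to approach the question is through the entropies. The sketch below assumes X and Y are jointly Gaussian with zero mean, a common variance σ^2, and covariance matrix K = σ^2 [[1, ρ], [ρ, 1]]; the slide does not spell out these details. It computes I(X;Y) = h(X) + h(Y) − h(X,Y) from the Gaussian entropy formulas in the chapter summary and compares the result with the closed form −(1/2) log(1 − ρ^2) that this calculation yields. Function names are illustrative and logarithms are base 2.

```python
import math


def mutual_information_from_entropies_bits(rho, sigma=1.0):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) for the assumed bivariate Gaussian."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    two_pi_e = 2.0 * math.pi * math.e
    h_x = 0.5 * math.log2(two_pi_e * sigma ** 2)
    h_y = h_x  # same variance assumed for Y
    det_k = sigma ** 4 * (1.0 - rho ** 2)  # |K| for K = sigma^2 [[1, rho], [rho, 1]]
    h_xy = 0.5 * math.log2(two_pi_e ** 2 * det_k)
    return h_x + h_y - h_xy


def mutual_information_closed_form_bits(rho):
    """Closed form -0.5 * log2(1 - rho^2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho,
              mutual_information_from_entropies_bits(rho),
              mutual_information_closed_form_bits(rho))
```

The variance σ^2 cancels, so the mutual information depends only on ρ: it is zero for ρ = 0 and grows without bound as |ρ| approaches 1.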

More properties of differential entropy

Examples of changes in variables

Concavity and convexity • Same as in the discrete entropy and mutual information....

Maximum entropy distributions
• For a discrete random variable taking on K values, what distribution maximizes the entropy?
• Can you think of a continuous counterpart?

[Look ahead to Ch.12, pg. 409-412]

Maximum entropy examples

Estimation error and differential entropy
• A counterpart to Fano's inequality for discrete RVs...

... calculate a function $g(Y) = \hat{X}$, where $\hat{X}$ is an estimate of X and takes on values in $\hat{\mathcal{X}}$. We will not restrict the alphabet $\hat{\mathcal{X}}$ to be equal to $\mathcal{X}$, and we will also allow the function g(Y) to be random. We wish to bound the probability that $\hat{X} \ne X$. We observe that $X \to Y \to \hat{X}$ forms a Markov chain. Define the probability of error

$$P_e = \Pr\{\hat{X} \ne X\}. \tag{2.129}$$

Theorem 2.10.1 (Fano's Inequality) For any estimator $\hat{X}$ such that $X \to Y \to \hat{X}$, with $P_e = \Pr(X \ne \hat{X})$, we have

$$H(P_e) + P_e \log|\mathcal{X}| \ge H(X|\hat{X}) \ge H(X|Y). \tag{2.130}$$

This inequality can be weakened to

$$1 + P_e \log|\mathcal{X}| \ge H(X|Y) \tag{2.131}$$

or

$$P_e \ge \frac{H(X|Y) - 1}{\log|\mathcal{X}|}. \tag{2.132}$$

Remark Note from (2.130) that P_e = 0 implies that H(X|Y) = 0, as intuition suggests.

Proof: We first ignore the role of Y and prove the first inequality in (2.130). We will then use the data-processing inequality to prove the more traditional form of Fano's inequality, given by the second inequality in (2.130). Define an error random variable,

$$E = \begin{cases} 1 & \text{if } \hat{X} \ne X, \\ 0 & \text{if } \hat{X} = X. \end{cases} \tag{2.133}$$

Then, using the chain rule for entropies to expand $H(E, X|\hat{X})$ in two different ways, we have

$$H(E, X|\hat{X}) = H(X|\hat{X}) + H(E|X, \hat{X}) \tag{2.134}$$

...

• Why can't we use Fano's?

The AEP for continuous RVs
• The AEP for discrete RVs said.....

... probability distribution. Here it turns out that p(X_1, X_2, ..., X_n) is close to 2^{-nH} with high probability. We summarize this by saying, "Almost all events are almost equally surprising." This is a way of saying that

$$\Pr\left\{(X_1, X_2, \ldots, X_n) : p(X_1, X_2, \ldots, X_n) = 2^{-n(H \pm \epsilon)}\right\} \approx 1 \tag{3.1}$$

if X_1, X_2, ..., X_n are i.i.d. ∼ p(x).

In the example just given, where $p(X_1, X_2, \ldots, X_n) = p^{\sum X_i} q^{\,n - \sum X_i}$, we are simply saying that the number of 1's in the sequence is close to np (with high probability), and all such sequences have (roughly) the same probability 2^{-nH(p)}. We use the idea of convergence in probability, defined as follows:

Definition (Convergence of random variables). Given a sequence of random variables, X_1, X_2, ..., we say that the sequence X_1, X_2, ... converges to a random variable X:
1. In probability if for every ε > 0, Pr{|X_n − X| > ε} → 0
2. In mean square if E(X_n − X)^2 → 0
3. With probability 1 (also called almost surely) if Pr{lim_{n→∞} X_n = X} = 1

3.1 ASYMPTOTIC EQUIPARTITION PROPERTY THEOREM

The asymptotic equipartition property is formalized in the following theorem.

Theorem 3.1.1 (AEP) If X_1, X_2, ... are i.i.d. ∼ p(x), then

$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{in probability.} \tag{3.2}$$

Proof: Functions of independent random variables are also independent random variables. Thus, since the X_i are i.i.d., so are log p(X_i). Hence, by the weak law of large numbers,

$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_{i} \log p(X_i) \tag{3.3}$$
$$\to -E \log p(X) \quad \text{in probability} \tag{3.4}$$
$$= H(X), \tag{3.5}$$

which proves the theorem. □

• The AEP for continuous RVs says.....

Typical sets
• One of the points of the AEP is to define typical sets.
• Typical set for discrete RVs...
• Typical set of continuous RVs....
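The continuous-variable statement f(X^n) ≐ 2^{-nh(X)} from the chapter summary can be illustrated with a short simulation. The sketch below assumes i.i.d. samples from N(0, σ^2), a density chosen here only because its differential entropy has the closed form (1/2) log 2πeσ^2. It computes −(1/n) log2 f(X_1, ..., X_n) and compares it with h(X). The function names, the sample size, and the seed are illustrative.

```python
import math
import random


def empirical_entropy_rate_bits(n, sigma=1.0, seed=0):
    """Return -(1/n) * log2 f(X_1, ..., X_n) for n i.i.d. N(0, sigma^2) draws."""
    rng = random.Random(seed)
    # -ln f(x) = 0.5 * ln(2*pi*sigma^2) + x^2 / (2*sigma^2)
    log_norm = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    total_nats = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        total_nats += log_norm + x * x / (2.0 * sigma ** 2)
    return total_nats / n / math.log(2.0)  # convert nats to bits


def gaussian_differential_entropy_bits(sigma=1.0):
    """h(N(0, sigma^2)) = 0.5 * log2(2*pi*e*sigma^2)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, empirical_entropy_rate_bits(n), gaussian_differential_entropy_bits())
```

Sequences whose value of −(1/n) log f(x^n) lies within ε of h(X) are the ones collected in the typical set, and the simulation suggests that long sequences fall in that set with high probability.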

Typical sets and volumes

Summary

$$h(X) = h(f) = -\int_{S} f(x)\log f(x)\,dx \tag{8.81}$$
$$f(X^n) \doteq 2^{-nh(X)} \tag{8.82}$$
$$\mathrm{Vol}(A_\epsilon^{(n)}) \doteq 2^{nh(X)} \tag{8.83}$$
$$H([X]_{2^{-n}}) \approx h(X) + n. \tag{8.84}$$
$$h(\mathcal{N}(0, \sigma^2)) = \frac{1}{2}\log 2\pi e\sigma^2. \tag{8.85}$$
$$h(\mathcal{N}_n(\mu, K)) = \frac{1}{2}\log (2\pi e)^n |K|. \tag{8.86}$$
$$D(f\|g) = \int f \log\frac{f}{g} \ge 0. \tag{8.87}$$
$$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i | X_1, X_2, \ldots, X_{i-1}). \tag{8.88}$$
$$h(X|Y) \le h(X). \tag{8.89}$$
$$h(aX) = h(X) + \log|a|. \tag{8.90}$$
$$I(X;Y) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)} \ge 0. \tag{8.91}$$
$$\max_{EXX^t = K} h(X) = \frac{1}{2}\log (2\pi e)^n |K|. \tag{8.92}$$
$$E(X - \hat{X}(Y))^2 \ge \frac{1}{2\pi e}\, e^{2h(X|Y)}.$$

2^{nH(X)} is the effective alphabet size for a discrete random variable.
2^{nh(X)} is the effective support set size for a continuous random variable.
2^C is the effective alphabet size of a channel of capacity C.

PROBLEMS

8.1 Differential entropy. Evaluate the differential entropy $h(X) = -\int f \ln f$ for the following:
(a) The exponential density, $f(x) = \lambda e^{-\lambda x}$, x ≥ 0.