Introduction to Bayesian inference and generative models

Author: Hillary Johnson
Dr. Richard E. Turner ([email protected])

Computational and Biological Learning Lab, Department of Engineering, University of Cambridge
Laboratory for Computational Vision, Center for Neural Science, New York University

Question

• Collected inter-spike interval (ISI) measurements, x, from a neuron
• x follows an exponential distribution with a characteristic time-scale λ, shifted to take account of the neuron's absolute refractory period, which is 5 ms long
• ISIs over 50 ms were not recorded (short trials were used for data collection)
• N ISIs are observed, $\{x_1, \ldots, x_N\}$. What is λ?

[Figure: observed ISIs on an axis x /ms, 0 to 50 ms]

[Figure: histogram of the observed ISIs, count (0 to 4) against x /ms (0 to 50 ms)]

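Ideas

Idea 1
• bin up into a histogram
  – where do we place the bins?
• fit to density
  – what error measure do we minimise?

Idea 2
• construct an estimator, e.g. the sample mean $\mu = \frac{1}{N}\sum_{n=1}^{N} x_n$
  – which estimator should we choose? mean, variance, higher moments?
• relate to parameters via the expectation of the estimator, e.g. $\mu \approx \langle x \rangle = f(\lambda)$
  – small sample effects can be problematic, e.g. if $\mu > \frac{1}{2}(50 + 5)$ ms no finite λ fits, because the mean of the shifted exponential truncated to [5, 50] ms always lies below the midpoint of that range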
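Idea 2 can be sketched in Python as moment matching. This is a minimal sketch, assuming the shifted, truncated exponential from the question; `truncated_mean` and `moment_estimate` are hypothetical helper names, and the ISIs in the usage lines are made up. Because f(λ) = ⟨x⟩ increases with λ towards the 27.5 ms midpoint, it can be inverted by bisection, and the function returns `None` when the sample mean leaves no λ to match.

```python
import math

T_REF = 5.0   # absolute refractory period (ms), from the question
T_MAX = 50.0  # ISIs longer than this (ms) were not recorded


def truncated_mean(lam: float) -> float:
    """<x> = f(lam) for an exponential with time-scale lam (ms),
    shifted by T_REF and truncated to [T_REF, T_MAX]."""
    width = T_MAX - T_REF
    r = math.exp(-width / lam)                # e^{-45/lam}
    one_minus_r = -math.expm1(-width / lam)   # 1 - e^{-45/lam}, computed stably
    return T_REF + lam - width * r / one_minus_r


def moment_estimate(xs, lam_cap=1e12):
    """Solve f(lam) = sample mean by bisection; None if no lam fits."""
    mu = sum(xs) / len(xs)                    # the sample-mean estimator
    midpoint = 0.5 * (T_REF + T_MAX)          # limit of f(lam) as lam -> infinity
    if not T_REF < mu < midpoint:
        return None
    lo, hi = 1e-6, 1.0
    while truncated_mean(hi) < mu:            # grow the bracket until f(hi) >= mu
        hi *= 2.0
        if hi > lam_cap:
            return None
    for _ in range(200):                      # f increases with lam, so bisect
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(moment_estimate([15.0, 20.0, 30.0]))  # hypothetical ISIs (ms)
print(moment_estimate([40.0, 45.0]))        # mean 42.5 ms > 27.5 ms, so None
```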

A less ad hoc method... probabilities as degrees of belief

Cox's axioms
• degrees of belief can be represented by real numbers
• take into account all evidence
• consistency: if things can be reasoned in more than one way, each must lead to the same answer
• equivalent states of knowledge → equivalent plausibility assignments

Conclusion: degrees of belief must follow the rules of probability.

Product rule: $p(\lambda, x) = p(\lambda \mid x)\,p(x) = p(x \mid \lambda)\,p(\lambda)$ (Bayes' Rule)
Sum rule: $p(\lambda) = \sum_x p(\lambda, x)$ (marginalisation)
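The product and sum rules can be checked numerically on a small discrete joint distribution. A minimal Python sketch; the joint table is hypothetical (two values of λ, three values of x). It computes marginals with the sum rule, conditionals with the product rule, and confirms that Bayes' rule recovers p(λ|x).

```python
import math

# Hypothetical joint distribution p(lam, x): rows are two values of lam,
# columns are three values of x. The entries sum to one.
joint = [
    [0.10, 0.25, 0.05],
    [0.30, 0.15, 0.15],
]
n_lam, n_x = len(joint), len(joint[0])

# Sum rule (marginalisation): p(lam) = sum_x p(lam, x), p(x) = sum_lam p(lam, x)
p_lam = [sum(joint[i]) for i in range(n_lam)]
p_x = [sum(joint[i][j] for i in range(n_lam)) for j in range(n_x)]

# Product rule read both ways: p(lam, x) = p(lam|x) p(x) = p(x|lam) p(lam)
p_lam_given_x = [[joint[i][j] / p_x[j] for j in range(n_x)] for i in range(n_lam)]
p_x_given_lam = [[joint[i][j] / p_lam[i] for j in range(n_x)] for i in range(n_lam)]

# Bayes' rule: p(lam|x) = p(x|lam) p(lam) / p(x), with p(x) from the sum rule
bayes = [[p_x_given_lam[i][j] * p_lam[i] / p_x[j] for j in range(n_x)]
         for i in range(n_lam)]

ok = all(math.isclose(bayes[i][j], p_lam_given_x[i][j])
         for i in range(n_lam) for j in range(n_x))
print(ok)
```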


Mathematical solution

[Equations: the posterior over λ, annotated with the limits 5 ms and 50 ms; "area must sum to one"; "what the data tell us (likelihood of parameters)"; "what we knew beforehand (prior)"]
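For this problem the quantities can be written out from the definitions in the question. This is a sketch, assuming the N ISIs are independent given λ and leaving the prior p(λ) unspecified; x and λ are in ms. Z(λ) is the normalising constant that makes the area sum to one, and the denominator follows from the sum rule.

```latex
p(x \mid \lambda) = \frac{1}{Z(\lambda)} \exp\!\left(-\frac{x - 5}{\lambda}\right),
  \qquad 5 \le x \le 50
\\[4pt]
Z(\lambda) = \int_{5}^{50} \exp\!\left(-\frac{x - 5}{\lambda}\right) \mathrm{d}x
           = \lambda \left(1 - e^{-45/\lambda}\right)
\\[4pt]
p(\lambda \mid x_1, \ldots, x_N)
  = \frac{p(\lambda) \prod_{n=1}^{N} p(x_n \mid \lambda)}
         {\int p(\lambda') \prod_{n=1}^{N} p(x_n \mid \lambda')\, \mathrm{d}\lambda'}
  \propto p(\lambda)\, Z(\lambda)^{-N}
    \exp\!\left(-\frac{1}{\lambda} \sum_{n=1}^{N} (x_n - 5)\right)
```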

[Figure: Density. p(x|λ) for λ = 10, 20 and 50 ms, plotted against x /ms from 5 to 50 ms]
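The densities in the figure can be reproduced numerically. A minimal Python sketch, assuming the model from the question; `isi_density` and `integrate` are hypothetical names. The normalising constant λ(1 - e^{-45/λ}) is computed with `math.expm1` for accuracy, and a midpoint-rule integral checks that each curve has unit area.

```python
import math

T_REF, T_MAX = 5.0, 50.0  # refractory period and recording cut-off (ms)


def isi_density(x: float, lam: float) -> float:
    """p(x|lam): exponential with time-scale lam (ms), shifted by T_REF and
    truncated at T_MAX; zero outside [T_REF, T_MAX]."""
    if not T_REF <= x <= T_MAX:
        return 0.0
    z = -lam * math.expm1(-(T_MAX - T_REF) / lam)  # lam * (1 - e^{-45/lam})
    return math.exp(-(x - T_REF) / lam) / z


def integrate(f, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


for lam in (10.0, 20.0, 50.0):  # the three curves in the figure
    area = integrate(lambda x: isi_density(x, lam), T_REF, T_MAX)
    print(f"lam = {lam:4.0f} ms: peak {isi_density(T_REF, lam):.4f}, area {area:.6f}")
```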

Likelihood of the parameters

[Figure: p(x=15|λ), p(x=20|λ) and p(x=30|λ) as functions of λ /ms, 0 to 100 ms]
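Each likelihood curve fixes one observation x and varies λ. A minimal Python sketch under the same model, with hypothetical names; it finds the grid value of λ that maximises p(x|λ) for each of the three plotted observations. For x = 30 ms the maximum lands on the edge of the grid: x - 5 = 25 ms exceeds 22.5 ms, the largest mean that x - 5 can have under the truncated model, so the likelihood keeps rising as λ grows.

```python
import math

T_REF, T_MAX = 5.0, 50.0  # refractory period and recording cut-off (ms)


def log_lik(x: float, lam: float) -> float:
    """log p(x|lam) for the shifted, truncated exponential (x, lam in ms)."""
    log_z = math.log(-lam * math.expm1(-(T_MAX - T_REF) / lam))
    return -(x - T_REF) / lam - log_z


LAM_GRID = [1.0 + 0.01 * k for k in range(9901)]  # lam from 1 to 100 ms


def best_lam_on_grid(x: float) -> float:
    """Grid value of lam that maximises the likelihood p(x|lam) for one ISI."""
    return max(LAM_GRID, key=lambda lam: log_lik(x, lam))


for x in (15.0, 20.0, 30.0):
    print(x, best_lam_on_grid(x))
```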

Posterior distribution

[Figure: the posterior p(λ|x1), p(λ|x1, x2), ..., p(λ|x1, ..., x5) as each ISI is added, against λ /ms from 0 to 100 ms (vertical scale ×10⁻³)]
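The sequence of posteriors can be sketched on a grid of λ values. A minimal Python sketch, assuming a flat prior on 0.5 to 100 ms and five hypothetical ISIs (the data behind the plots are not listed); all names are hypothetical. Each observation multiplies in one likelihood and renormalises, and the result matches the batch posterior, which is why ISIs can be added one at a time. Note that under a flat prior the posterior does not vanish as λ grows (each likelihood term tends to 1/45 as λ → ∞), so the 100 ms grid limit acts as part of the prior.

```python
import math

T_REF, T_MAX = 5.0, 50.0
DLAM = 0.5
LAM = [DLAM * k for k in range(1, 201)]  # grid of lam values: 0.5 to 100 ms


def log_lik(x, lam):
    """log p(x|lam) for the shifted, truncated exponential (ms)."""
    return -(x - T_REF) / lam - math.log(-lam * math.expm1(-(T_MAX - T_REF) / lam))


def normalise(w):
    """Scale grid values so they integrate to one (Riemann sum)."""
    total = sum(w) * DLAM
    return [v / total for v in w]


def update(post, x):
    """One step of Bayes' rule: multiply by p(x|lam), then renormalise."""
    return normalise([p * math.exp(log_lik(x, lam)) for p, lam in zip(post, LAM)])


xs = [8.0, 12.0, 10.0, 15.0, 9.0]     # hypothetical ISIs (ms)
post = normalise([1.0] * len(LAM))    # assumed flat prior on the grid
for x in xs:                          # p(lam|x1), p(lam|x1, x2), ...
    post = update(post, x)

# The same posterior computed in one batch from the product of likelihoods
batch = normalise([math.exp(sum(log_lik(x, lam) for x in xs)) for lam in LAM])
print(max(abs(a - b) for a, b in zip(post, batch)))
```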

Summarising the posterior distribution

[Figure: the posterior over λ /ms (0 to 100 ms), marked with its maximum a posteriori (MAP) value and error-bars]
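The MAP value and error-bars can be read off a grid posterior. A minimal Python sketch, assuming a flat prior on 0.01 to 100 ms and five hypothetical ISIs; the slide does not say how its error-bars are defined, so a central 95% credible interval is used here as one common choice.

```python
import math

T_REF, T_MAX = 5.0, 50.0
DLAM = 0.01
LAM = [DLAM * k for k in range(1, 10001)]  # lam grid: 0.01 to 100 ms
XS = [8.0, 12.0, 10.0, 15.0, 9.0]          # hypothetical ISIs (ms)


def log_post(lam):
    """Unnormalised log posterior: flat prior (assumed) times likelihoods."""
    log_z = math.log(-lam * math.expm1(-(T_MAX - T_REF) / lam))
    return sum(-(x - T_REF) / lam - log_z for x in XS)


logs = [log_post(lam) for lam in LAM]
peak = max(logs)
weights = [math.exp(v - peak) for v in logs]  # subtract the max to avoid underflow
total = sum(weights)
probs = [w / total for w in weights]          # posterior mass per grid cell

lam_map = LAM[logs.index(peak)]               # maximum a posteriori (MAP)

# Error-bars as a central 95% credible interval (an assumed choice)
cdf, lower, upper = 0.0, None, None
for lam, p in zip(LAM, probs):
    cdf += p
    if lower is None and cdf >= 0.025:
        lower = lam
    if upper is None and cdf >= 0.975:
        upper = lam
print(lam_map, lower, upper)
```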

Summarising the posterior distribution

[Figure: a Gaussian approximation to the posterior over λ /ms (0 to 100 ms)]
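A Gaussian approximation can be built by matching the mode and the curvature of the log posterior there. This is a Laplace approximation, named here as one standard way to obtain it; the slides do not say how their Gaussian was fitted. A minimal Python sketch, assuming a flat prior and five hypothetical ISIs; the mode is found by golden-section search, which is valid because the log-likelihood is unimodal in λ for this model, and the curvature by a central finite difference.

```python
import math

T_REF, T_MAX = 5.0, 50.0
XS = [8.0, 12.0, 10.0, 15.0, 9.0]  # hypothetical ISIs (ms)


def log_post(lam):
    """Unnormalised log posterior under an assumed flat prior on lam."""
    log_z = math.log(-lam * math.expm1(-(T_MAX - T_REF) / lam))
    return sum(-(x - T_REF) / lam - log_z for x in XS)


def golden_max(f, a, b, tol=1e-8):
    """Golden-section search for the maximum of a unimodal f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


mode = golden_max(log_post, 1.0, 100.0)
h = 1e-3
curv = (log_post(mode + h) - 2.0 * log_post(mode) + log_post(mode - h)) / h**2
sigma = 1.0 / math.sqrt(-curv)  # Gaussian approximation: N(mode, sigma^2)


def gaussian_approx(lam):
    """Density of the fitted Gaussian at lam."""
    return math.exp(-0.5 * ((lam - mode) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


print(mode, sigma)
```

Because this posterior has a long right tail, the Gaussian is accurate only near the mode.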

Summarising the posterior distribution

[Figure: samples from the posterior over λ /ms (0 to 100 ms)]
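Samples from the posterior can be drawn directly from the grid. A minimal Python sketch, assuming a flat prior and five hypothetical ISIs; it uses `random.choices` with the unnormalised grid posterior as weights, which amounts to sampling from a discretised posterior, and checks a sample-based summary (the mean) against the exact grid value.

```python
import math
import random

T_REF, T_MAX = 5.0, 50.0
DLAM = 0.01
LAM = [DLAM * k for k in range(1, 10001)]  # lam grid: 0.01 to 100 ms
XS = [8.0, 12.0, 10.0, 15.0, 9.0]          # hypothetical ISIs (ms)


def log_post(lam):
    """Unnormalised log posterior under an assumed flat prior on lam."""
    log_z = math.log(-lam * math.expm1(-(T_MAX - T_REF) / lam))
    return sum(-(x - T_REF) / lam - log_z for x in XS)


logs = [log_post(lam) for lam in LAM]
peak = max(logs)
weights = [math.exp(v - peak) for v in logs]  # unnormalised posterior on the grid

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = rng.choices(LAM, weights=weights, k=10_000)

# Any posterior summary can be estimated from the samples, e.g. the mean
sample_mean = sum(samples) / len(samples)
grid_mean = sum(lam * w for lam, w in zip(LAM, weights)) / sum(weights)
print(sample_mean, grid_mean)
```

Unlike a single Gaussian, the samples also capture the long right tail of this posterior.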

Question

• Record inter-spike interval measurements, x
• As before: an absolute refractory period of 5 ms, and ISIs above 50 ms are not recorded
• We know that if the neuron is...
  – quiescent: x follows an exponential distribution with time-scale λ0 = 25 ms
  – bursting: x follows an exponential distribution with time-scale λ1 = 5 ms
• You observe a single ISI, x = 15 ms. Is the neuron in a bursting state?

Intuition: should be close to 50:50

Mathematical solution

Introduce latent variable b (b = 0: not bursting, b = 1: bursting)
Generative model
Recognition model: inference
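The generative and recognition models can be written out as follows. This is a sketch, assuming equal prior probabilities for the two states (the slides do not state the prior, but this choice reproduces their value of 0.54) and the same shifted, truncated exponential as before, with x in ms.

```latex
p(b = 0) = p(b = 1) = \tfrac{1}{2} \quad \text{(assumed)}
\\[4pt]
p(x \mid b, \lambda_b) = \frac{\exp\!\left(-(x - 5)/\lambda_b\right)}
                              {\lambda_b \left(1 - e^{-45/\lambda_b}\right)},
  \qquad 5 \le x \le 50, \quad \lambda_0 = 25, \ \lambda_1 = 5
\\[4pt]
p(b \mid x) = \frac{p(x \mid b, \lambda_b)\, p(b)}
                   {\sum_{b'=0}^{1} p(x \mid b', \lambda_{b'})\, p(b')}
\\[4pt]
p(b = 0 \mid x = 15) \approx \frac{0.0321}{0.0321 + 0.0271} \approx 0.54
```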

Graphical solution

[Figure: p(x|b, λb) for b = 0 and b = 1 against x /ms, 5 to 50 ms; the two densities cross at x = 13.9 ms]
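The crossing point can also be found in closed form. Setting the two densities equal and taking logs gives x = 5 + ln(Z(λ0)/Z(λ1)) / (1/λ1 - 1/λ0), where Z(λ) = λ(1 - e^{-45/λ}) is the normalising constant. A minimal Python sketch with hypothetical helper names; it gives about 13.93 ms, consistent with the 13.9 ms in the figure.

```python
import math

T_REF, T_MAX = 5.0, 50.0
LAM0, LAM1 = 25.0, 5.0  # quiescent (b=0) and bursting (b=1) time-scales (ms)


def norm_const(lam):
    """Z(lam) = lam * (1 - exp(-(T_MAX - T_REF) / lam))."""
    return -lam * math.expm1(-(T_MAX - T_REF) / lam)


def isi_density(x, lam):
    """p(x|b, lam_b) on [T_REF, T_MAX], zero elsewhere."""
    if not T_REF <= x <= T_MAX:
        return 0.0
    return math.exp(-(x - T_REF) / lam) / norm_const(lam)


# Equal densities: exp(u * (1/lam1 - 1/lam0)) = Z(lam0) / Z(lam1), with u = x - T_REF
u_cross = math.log(norm_const(LAM0) / norm_const(LAM1)) / (1.0 / LAM1 - 1.0 / LAM0)
x_cross = T_REF + u_cross
print(round(x_cross, 2))
```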

Graphical solution

[Figure: top, p(x|b, λb) for b = 0 and b = 1 against x /ms (5 to 50 ms); bottom, the posterior p(b=0|x) against x /ms (5 to 50 ms), marked at 0.54, its value at the observed x = 15 ms]
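The posterior curve p(b=0|x) follows from Bayes' rule applied pointwise in x. A minimal Python sketch, assuming equal prior probabilities for the two states (exposed as a hypothetical `p_burst` parameter); it is valid for 5 ≤ x ≤ 50 ms, and at x = 15 ms it gives 0.54, matching the figure.

```python
import math

T_REF, T_MAX = 5.0, 50.0
LAM = {0: 25.0, 1: 5.0}  # b=0 quiescent, b=1 bursting (time-scales in ms)


def isi_density(x, lam):
    """Shifted, truncated exponential p(x|b, lam_b); zero outside [T_REF, T_MAX]."""
    if not T_REF <= x <= T_MAX:
        return 0.0
    z = -lam * math.expm1(-(T_MAX - T_REF) / lam)
    return math.exp(-(x - T_REF) / lam) / z


def p_quiescent(x, p_burst=0.5):
    """p(b=0|x) by Bayes' rule, for T_REF <= x <= T_MAX.
    p_burst = p(b=1) is an assumed prior, not given on the slides."""
    joint0 = isi_density(x, LAM[0]) * (1.0 - p_burst)
    joint1 = isi_density(x, LAM[1]) * p_burst
    return joint0 / (joint0 + joint1)


print(f"p(b=0|x=15) = {p_quiescent(15.0):.2f}")
```

With equal priors, p(b=0|x) = 0.5 exactly where the two densities cross, at about 13.9 ms; changing `p_burst` shifts that point.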

Generative models in neuroscience

• data analysis (spike sorting, fMRI, etc.)
• ideal observer models in psychophysics
• neural encoding models
• neural decoding models
• Bayesian brain: the brain is making inferences about the world using probabilistic calculus