Introduction to Bayesian inference and generative models
Dr. Richard E. Turner ([email protected])
Computational and Biological Learning Lab, Department of Engineering, University of Cambridge
Laboratory for Computational Vision, Center for Neural Science, New York University
Question
• Collected inter-spike interval measurements, x, from a neuron
• x follows an exponential distribution with a characteristic time-scale λ, shifted to take account of the absolute refractory period of the neuron, 5ms long
• ISIs over 50ms were not recorded (short trials used for data-collection)
• N ISIs are observed at {x1, ..., xN}. What is λ?
[Figure: the observed ISIs shown along the x /ms axis (0 to 50), then binned into a histogram of count (0 to 4) against x /ms.]
Ideas
Idea 1
• bin up into a histogram
– where do we place the bins?
• fit to density
– what error measure do we minimise?
Idea 2
• construct an estimator, e.g. the sample mean $\mu = \frac{1}{N} \sum_{n=1}^{N} x_n$
– which estimator should we choose? mean, variance, higher moments?
• relate to parameters via expectation of estimator, e.g. $\mu \approx \langle x \rangle = f(\lambda)$
– small sample effects can be problematic, e.g. if $\mu > \frac{1}{2}(50 + 5)$ ms
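Idea 2 can be sketched in a few lines. This is a minimal illustration, not the method of the slides: it assumes the density is p(x|λ) ∝ exp(−(x − 5)/λ) restricted to 5ms ≤ x ≤ 50ms, and the function names and example data are hypothetical. Under that assumption the mean ⟨x⟩ = f(λ) rises with λ but never reaches ½(50 + 5) = 27.5ms, so a larger sample mean has no matching λ.

```python
import math

LO, HI = 5.0, 50.0  # refractory period and recording cut-off, in ms


def truncated_mean(lam):
    """Mean f(lam) of p(x|lam) ∝ exp(-(x - LO)/lam) on LO <= x <= HI (assumed model)."""
    t = (HI - LO) / lam
    # Written with exp(-t) so that very small lam does not overflow.
    return LO + lam - (HI - LO) * math.exp(-t) / (-math.expm1(-t))


def moment_estimate(xs, lam_max=1e4):
    """Solve f(lam) = sample mean by bisection; return None when no lam can match."""
    mu = sum(xs) / len(xs)
    if mu <= LO or mu >= 0.5 * (LO + HI):
        return None  # f(lam) only takes values strictly between LO and the midpoint
    lo, hi = 1e-6, lam_max
    for _ in range(200):  # f is increasing in lam, so bisection applies
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(moment_estimate([7.0, 9.0, 12.0, 20.0, 33.0]))  # hypothetical ISIs, mean 16.2ms
print(moment_estimate([30.0, 40.0, 45.0]))            # mean above 27.5ms: prints None
```

The second call shows the small-sample problem: three long ISIs give a sample mean that no value of λ can reproduce.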
A less ad hoc method...probabilities as degrees of belief
Cox’s axioms
• degrees of belief can be represented by real numbers
• take into account all evidence
• consistency: if things can be reasoned in more than one way, each must lead to the same answer
• equivalent states of knowledge → equivalent plausibility assignments
Conclusion: degrees of belief must follow the rules of probability.
Product rule: $p(\lambda, x) = p(\lambda|x)\,p(x) = p(x|\lambda)\,p(\lambda)$ (Bayes’ Rule)
Sum rule: $p(\lambda) = \sum_x p(\lambda, x)$ (marginalisation)
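For reference, the step from the product rule to the usual form of Bayes' rule can be written out as follows. This is a short worked derivation rather than part of the original slides; since x is a continuous quantity here, the sum in the sum rule is read as an integral.

```latex
\begin{align}
p(\lambda \mid x)\, p(x) &= p(x \mid \lambda)\, p(\lambda)
  && \text{(product rule, both factorisations)} \\
p(\lambda \mid x) &= \frac{p(x \mid \lambda)\, p(\lambda)}{p(x)}
  && \text{(divide by } p(x) \text{)} \\
p(x) &= \int p(x \mid \lambda)\, p(\lambda)\, \mathrm{d}\lambda
  && \text{(sum rule over } \lambda \text{)}
\end{align}
```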
Mathematical solution
• the density of x is confined to the range 5ms to 50ms, and its area must sum to one
• likelihood of the parameters: what the data tell us
• prior: what we knew beforehand
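The equations on this slide did not survive extraction. One reconstruction that is consistent with the problem statement is given below; it assumes the shifted exponential is simply truncated at 50ms and renormalised, and that the N observations are independent given λ. The symbol Z(λ) for the normalising constant is introduced here for illustration.

```latex
\begin{align}
p(x \mid \lambda) &=
  \begin{cases}
    \dfrac{1}{Z(\lambda)} \exp\!\left(-\dfrac{x - 5}{\lambda}\right) & 5 \le x \le 50 \\[1ex]
    0 & \text{otherwise}
  \end{cases} \\
Z(\lambda) &= \int_{5}^{50} \exp\!\left(-\frac{x - 5}{\lambda}\right) \mathrm{d}x
            = \lambda \left(1 - e^{-45/\lambda}\right) \\
p(\lambda \mid x_1, \dots, x_N) &\propto p(\lambda) \prod_{n=1}^{N} p(x_n \mid \lambda)
\end{align}
```

The factor Z(λ) is what makes the area sum to one, the product over n is the likelihood of the parameters, and p(λ) is the prior.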
Density
[Figure: p(x|λ) against x /ms (5 to 50) for λ = 10, λ = 20 and λ = 50.]
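The three density curves can be reproduced with a short sketch. It assumes the truncated, shifted exponential form p(x|λ) ∝ exp(−(x − 5)/λ) on 5ms ≤ x ≤ 50ms; the function names are hypothetical, and the trapezoidal sum is only there to confirm numerically that the area is one.

```python
import math

LO, HI = 5.0, 50.0  # refractory period and recording cut-off, in ms


def density(x, lam):
    """p(x|lam) for the assumed truncated, shifted exponential."""
    if x < LO or x > HI:
        return 0.0
    z = lam * -math.expm1(-(HI - LO) / lam)  # normaliser: lam * (1 - exp(-45/lam))
    return math.exp(-(x - LO) / lam) / z


def area(lam, steps=20000):
    """Trapezoidal estimate of the area under p(x|lam) between LO and HI."""
    h = (HI - LO) / steps
    total = 0.5 * (density(LO, lam) + density(HI, lam))
    total += sum(density(LO + i * h, lam) for i in range(1, steps))
    return total * h


for lam in (10.0, 20.0, 50.0):
    print(lam, density(LO, lam), area(lam))
```

Smaller λ concentrates the mass just after the refractory period, while larger λ flattens the curve towards a uniform density over the recorded range.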
Likelihood of the parameters
[Figure: p(x|λ) against λ /ms (0 to 100) for fixed observations x = 15, x = 20 and x = 30, added one curve at a time.]
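Reading the same formula as a function of λ with x held fixed gives the likelihood curves. The sketch below assumes the truncated, shifted exponential model and evaluates it on a hypothetical grid of λ values; the helper names are illustrative.

```python
import math

LO, HI = 5.0, 50.0  # refractory period and recording cut-off, in ms


def density(x, lam):
    """p(x|lam) for the assumed truncated, shifted exponential (x inside [LO, HI])."""
    z = lam * -math.expm1(-(HI - LO) / lam)
    return math.exp(-(x - LO) / lam) / z


def likelihood_curve(x, lams):
    """Likelihood of each lam for one fixed observation x."""
    return [density(x, lam) for lam in lams]


def best_lambda(x, lams):
    """Grid value of lam with the highest likelihood for observation x."""
    return max(lams, key=lambda lam: density(x, lam))


lams = [0.5 * i for i in range(2, 201)]  # 1.0, 1.5, ..., 100.0 ms
for x in (15.0, 20.0, 30.0):
    print(x, best_lambda(x, lams), max(likelihood_curve(x, lams)))
```

Under this model an observation beyond the midpoint of the recorded range, such as x = 30, has a likelihood that keeps rising with λ, so its maximum sits at the edge of whatever grid is used.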
Posterior distribution: p(λ|x1), then p(λ|x1, x2), and so on up to p(λ|x1, x2, x3, x4, x5)
[Figure sequence: the posterior over λ /ms (0 to 100) after each additional observation; vertical axis labelled p(x|λ) in the original, scale ×10⁻³.]
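The sequence of posteriors can be imitated on a grid by multiplying in one likelihood term at a time and renormalising. The sketch assumes the truncated, shifted exponential model and a flat prior over the grid, which is an assumption since the prior used on the slides was not recovered. The first three data values are those shown in the likelihood plots; the last two are hypothetical.

```python
import math

LO, HI = 5.0, 50.0  # refractory period and recording cut-off, in ms


def density(x, lam):
    """p(x|lam) for the assumed truncated, shifted exponential (x inside [LO, HI])."""
    z = lam * -math.expm1(-(HI - LO) / lam)
    return math.exp(-(x - LO) / lam) / z


def grid_posterior(xs, lams, prior=None):
    """Posterior weights over the grid after each observation, updated one at a time."""
    weights = list(prior) if prior is not None else [1.0] * len(lams)  # flat prior
    history = []
    for x in xs:
        weights = [w * density(x, lam) for w, lam in zip(weights, lams)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise so the weights sum to one
        history.append(weights)
    return history


lams = [0.5 * i for i in range(2, 201)]  # 1.0, 1.5, ..., 100.0 ms
xs = [15.0, 20.0, 30.0, 8.0, 12.0]       # last two values are hypothetical
history = grid_posterior(xs, lams)
for n, post in enumerate(history, start=1):
    mode = lams[max(range(len(lams)), key=post.__getitem__)]
    print(n, mode)
```

Because each step multiplies by one more likelihood factor, the final weights are identical to those from processing all observations at once; the order of the data does not matter.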
Summarising the posterior distribution
• maximum a posteriori (MAP) estimate with error-bars
• Gaussian approximation
• samples from posterior
[Figure: each summary shown in turn on the posterior over λ /ms (0 to 100).]
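The three summaries can be sketched on a grid as follows. This is an illustration under stated assumptions, not the computation behind the slides: the model is the truncated, shifted exponential, the prior is flat, the data are hypothetical, and the Gaussian approximation is taken to be a Laplace approximation whose width comes from a finite-difference estimate of the curvature of the log posterior at the MAP value.

```python
import math
import random

LO, HI = 5.0, 50.0  # refractory period and recording cut-off, in ms


def log_density(x, lam):
    """log p(x|lam) for the assumed truncated, shifted exponential (x inside [LO, HI])."""
    z = lam * -math.expm1(-(HI - LO) / lam)
    return -(x - LO) / lam - math.log(z)


def log_posterior(lam, xs):
    """Unnormalised log posterior under a flat prior (an assumption)."""
    return sum(log_density(x, lam) for x in xs)


def summarise(xs, lams, n_samples=1000, seed=0, h=0.1):
    logp = [log_posterior(lam, xs) for lam in lams]
    peak = max(logp)
    weights = [math.exp(v - peak) for v in logp]  # subtract the peak for stability
    total = sum(weights)
    weights = [w / total for w in weights]
    # MAP: grid point with the largest posterior weight.
    lam_map = lams[max(range(len(lams)), key=weights.__getitem__)]
    # Gaussian (Laplace) width: sigma^2 = -1 / (second derivative of the log posterior).
    curv = (log_posterior(lam_map + h, xs) - 2.0 * log_posterior(lam_map, xs)
            + log_posterior(lam_map - h, xs)) / h ** 2
    sigma = math.sqrt(-1.0 / curv) if curv < 0 else float("nan")  # nan if no interior peak
    # Samples: draw grid values in proportion to their posterior weight.
    samples = random.Random(seed).choices(lams, weights=weights, k=n_samples)
    return lam_map, sigma, samples


lams = [0.5 * i for i in range(2, 201)]  # 1.0, 1.5, ..., 100.0 ms
xs = [15.0, 20.0, 30.0, 8.0, 12.0]       # hypothetical ISIs
lam_map, sigma, samples = summarise(xs, lams)
print(lam_map, sigma, sum(samples) / len(samples))
```

The posterior here has a long tail towards large λ, so the sample mean will generally sit above the MAP value, which is one reason a single point estimate with symmetric error-bars can be a poor summary.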
Question
• Record inter-spike interval measurements, x
• As before: absolute refractory period of 5ms and ISIs above 50ms not recorded
• We know if the neuron is...
– quiescent: x follows an exponential distribution with time-scale λ0 = 25ms
– bursting: x follows an exponential distribution with time-scale λ1 = 5ms
• You observe a single ISI, x = 15ms. Is the neuron in a bursting state?
Intuition: should be close to 50:50
Mathematical solution
• Introduce latent variable b (b = 0: not bursting, b = 1: bursting)
• Generative model
• Recognition model: inference
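The equations for this slide were not recovered. A reconstruction consistent with the question is sketched below. It reuses the truncated, shifted exponential with normaliser Z(λ) = λ(1 − e^{−45/λ}), and it assumes equal prior probability for the two states, which the surviving text does not state.

```latex
\begin{align}
\text{Generative model:} \quad
  & p(b = 0) = p(b = 1) = \tfrac{1}{2} \quad \text{(assumed)} \\
  & p(x \mid b) = \frac{1}{Z(\lambda_b)} \exp\!\left(-\frac{x - 5}{\lambda_b}\right),
    \qquad 5 \le x \le 50 \\
\text{Recognition model:} \quad
  & p(b \mid x) = \frac{p(x \mid b)\, p(b)}{\sum_{b'} p(x \mid b')\, p(b')}
\end{align}
```

The generative model runs from the hidden state b to the data x; the recognition model inverts it with Bayes' rule to infer b from an observed x.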
Graphical solution
[Figure, upper panel: p(x|b,λb) against x /ms (5 to 50) for b = 0 and b = 1; crossing point: 13.9.]
[Figure, lower panel: p(b=0|x) against x /ms (5 to 50), on a scale from 0 to 1, with the value 0.54 marked.]
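The two numbers on these plots can be checked with a short sketch. It assumes the truncated, shifted exponential density for each state and equal prior probabilities for bursting and not bursting; the function names are hypothetical.

```python
import math

LO, HI = 5.0, 50.0       # refractory period and recording cut-off, in ms
LAM = {0: 25.0, 1: 5.0}  # b = 0: quiescent, b = 1: bursting


def density(x, lam):
    """p(x|lam) for the assumed truncated, shifted exponential (x inside [LO, HI])."""
    z = lam * -math.expm1(-(HI - LO) / lam)
    return math.exp(-(x - LO) / lam) / z


def posterior_b0(x, prior_b0=0.5):
    """p(b=0|x) by Bayes' rule; the equal prior is an assumption."""
    j0 = density(x, LAM[0]) * prior_b0
    j1 = density(x, LAM[1]) * (1.0 - prior_b0)
    return j0 / (j0 + j1)


def crossing_point():
    """x at which the two densities are equal, found by bisection."""
    lo, hi = LO, HI  # bursting density is higher at LO, quiescent density at HI
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if density(mid, LAM[1]) > density(mid, LAM[0]):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(crossing_point(), 1))    # 13.9
print(round(posterior_b0(15.0), 2))  # 0.54
```

An ISI of 15ms lies just past the crossing point, so the evidence tilts only slightly towards the quiescent state, in line with the 50:50 intuition.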
Generative models in neuroscience
• data analysis (spike sorting, fMRI, etc.)
• ideal observer models in psychophysics
• neural encoding models
• neural decoding models
• Bayesian Brain: the brain is making inferences about the world using probabilistic calculus