Stan: A platform for Bayesian inference

Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell

Department of Statistics, Columbia University, New York (and other places)

5 Mar 2014


Stan code for linear regression

data {
  int N;
  int K;
  vector[N] y;
  matrix[N,K] X;
}
parameters {
  vector[K] b;
  real<lower=0> sigma;   // scale must be positive
}
model {
  y ~ normal(X*b, sigma);
}


Stan code for an item response model: 1

data {
  int I;                      // # questions
  int J;                      // # students
  int K;                      // # schools
  int N;                      // # observations
  int ii[N];                  // question for n
  int jj[N];                  // student for n
  int kk[N];                  // school for n
  int y[N];                   // correctness for n
}
parameters {
  vector[J] alpha;            // ability for student j
  vector[I] beta;             // difficulty for item i
  vector[K] gamma;            // ability for school k
  vector[I] delta;            // discrimination for item i
  real<lower=0> sigma_gamma;  // school ability scale (must be positive)
}


Stan code for an item response model: 2

model {
  alpha ~ normal(0,1);        // priors
  beta ~ normal(0,10);
  gamma ~ normal(0,sigma_gamma);
  delta ~ lognormal(-0.5,1);
  for (n in 1:N)              // likelihood
    y[n] ~ bernoulli_logit(delta[ii[n]]
             * (alpha[jj[n]] + gamma[kk[n]] - beta[ii[n]]));
}
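For orientation, here is one way the index data for this model might be assembled and the model fit from R with RStan. This is a sketch under assumptions: the data frame, its column names, and the file name irt.stan are illustrative, not taken from the slides.

library(rstan)

# Hypothetical long-format data: one row per (student, question) observation.
# Column names are illustrative assumptions.
responses <- read.csv("responses.csv")   # columns: question, student, school, correct

irt_data <- list(
  I  = length(unique(responses$question)),
  J  = length(unique(responses$student)),
  K  = length(unique(responses$school)),
  N  = nrow(responses),
  ii = as.integer(factor(responses$question)),   # question index for observation n
  jj = as.integer(factor(responses$student)),    # student index for observation n
  kk = as.integer(factor(responses$school)),     # school index for observation n
  y  = responses$correct                         # 0/1 correctness
)

# Assumes the data/parameters block from the previous slide and the model block
# above are saved together as irt.stan.
fit <- stan(file = "irt.stan", data = irt_data, iter = 2000, chains = 4)
print(fit, pars = c("beta", "delta", "sigma_gamma"))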


Stan overview

- Fit open-ended Bayesian models
- Specify log posterior density in C++
- Code a distribution once, then use it everywhere
- Hamiltonian No-U-Turn sampler
- Autodiff
- Runs from R or Python; postprocessing (example below)
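To illustrate the last point, a minimal RStan sketch that fits the linear regression program shown earlier and does some postprocessing; the file name lr.stan and the simulated data are illustrative assumptions.

library(rstan)

# Fake data for the linear regression program shown earlier;
# sizes and coefficient values are made up for illustration.
N <- 100
K <- 3
X <- matrix(rnorm(N * K), N, K)
y <- as.vector(X %*% c(1, -2, 0.5) + rnorm(N, sd = 1.5))
lr_data <- list(N = N, K = K, X = X, y = y)

# Assumes the data/parameters/model block above is saved as lr.stan.
fit <- stan(file = "lr.stan", data = lr_data, iter = 2000, chains = 4)

# Postprocessing: summaries and posterior draws.
print(fit)
draws <- extract(fit)    # named list with draws$b and draws$sigma
hist(draws$sigma)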


People

- Core:
  - Andrew Gelman (stats/poli-sci): R, proto-user, . . .
  - Bob Carpenter (comp sci/linguistics): C++, language design, . . .
  - Matt Hoffman (comp sci/acoustics): NUTS, optimization, C++, . . .
  - Daniel Lee (comp sci/stats): C++, make, integration, . . .
  - Ben Goodrich (stats/poli-sci): linear algebra, R, C++, . . .
  - Michael Betancourt (physics): geometric samplers, C++, . . .
  - Marcus Brubaker (comp sci): linear algebra, optimization, C++, . . .
  - Jiqiang Guo (stats): R, C++, . . .
  - Peter Li (stats/math): C++, . . .
  - Allen Riddell (comp sci): Python, C++, . . .
- Research collaborators
- Users


Funding

- U.S. National Science Foundation
- Novartis
- Columbia University
- U.S. Department of Energy


Roles of Stan

- Bayesian inference for unsophisticated users (alternative to BUGS)
- Bayesian inference for sophisticated users (alternative to programming it yourself)
- Fast and scalable gradient computation (see the sketch after this list)
- Environment for developing new algorithms
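A sketch of the gradient-computation role: a compiled model exposes the log posterior and its gradient on the unconstrained scale through RStan's stanfit methods, so other algorithms can reuse them. The fit object is assumed to come from an earlier stan() call such as the linear regression sketch above; exact argument names may vary across RStan versions.

library(rstan)

# `fit` is assumed to be a stanfit object from an earlier stan() call.
n_upars <- get_num_upars(fit)     # number of parameters on the unconstrained scale
theta   <- rep(0, n_upars)        # an arbitrary point in unconstrained space

lp <- log_prob(fit, theta)        # log posterior density (up to a constant)
g  <- grad_log_prob(fit, theta)   # its gradient, computed by reverse-mode autodiff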


Example from toxicology


Sparse data


Validation using predictive simulations


Public opinion: Health care reform


Public opinion: School vouchers


Public opinion: Death penalty

- Hierarchical time series model
- Open-ended


Tree rings


Speed dating

- Each person meets 10–20 “dates”
- Rate each date on Attractiveness, Sincerity, Intelligence, Ambition, Fun to be with, Shared interests
- Outcomes:
  - Do you want to see this person again? (Yes/No)
  - How much do you like this person? (1–10)
- How important are each of the 6 attributes?
- Logistic or linear regression
- Hierarchical model: coefficients vary by person (data setup sketched below)
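A rough sketch of how data for the hierarchical logistic regression could be passed to Stan from R, as mentioned in the last bullet. The data frame, its column names, and the file name speed_dating.stan are all illustrative assumptions, not the analysis actually used.

library(rstan)

# Hypothetical long-format data: one row per (rater, date) pair.
dates <- read.csv("speed_dating.csv")
attrs <- c("attractive", "sincere", "intelligent", "ambitious", "fun", "shared")

sd_data <- list(
  N  = nrow(dates),
  J  = length(unique(dates$rater)),        # number of raters
  K  = length(attrs),                      # the six attributes
  jj = as.integer(factor(dates$rater)),    # rater index for each row
  X  = as.matrix(dates[, attrs]),          # attribute ratings
  y  = dates$decision                      # 1 = wants to see this person again
)

# speed_dating.stan would give each rater a coefficient vector drawn from a
# common population distribution, so coefficients vary by person.
fit <- stan(file = "speed_dating.stan", data = sd_data, iter = 2000, chains = 4)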


Steps of Bayesian data analysis

- Model building
- Inference
- Model checking
- Model understanding and improvement


Background on Bayesian computation

- Point estimates and standard errors
- Hierarchical models
- Posterior simulation
- Markov chain Monte Carlo (Gibbs sampler and Metropolis algorithm)
- Hamiltonian Monte Carlo


Solving problems

- Problem: Gibbs too slow, Metropolis too problem-specific
- Solution: Hamiltonian Monte Carlo
- Problem: Interpreters too slow, won’t scale
- Solution: Compilation
- Problem: Need gradients of log posterior for HMC
- Solution: Reverse-mode algorithmic differentiation
- Problem: Existing algo-diff slow, limited, unextensible
- Solution: Our own algo-diff
- Problem: Algo-diff requires fully templated functions
- Solution: Our own density library, Eigen linear algebra


Radford Neal (2011) on Hamiltonian Monte Carlo

“One practical impediment to the use of Hamiltonian Monte Carlo is the need to select suitable values for the leapfrog stepsize, ε, and the number of leapfrog steps L . . . Tuning HMC will usually require preliminary runs with trial values for ε and L . . . Unfortunately, preliminary runs can be misleading . . . ”


The No U-Turn Sampler

- Created by Matt Hoffman
- Run the HMC steps until they start to turn around (bend with an angle > 180°)
- Computationally efficient
- Requires no tuning of #steps
- Complications to preserve detailed balance


NUTS Example Trajectory

[Figure: an example trajectory generated during one iteration of NUTS (Hoffman and Gelman, Figure 2). The blue ellipse is a contour of the target distribution; the black open circles are the positions traced out by the leapfrog integrator and associated with the set of visited states; the black solid circle is the starting position; the red solid circles are positions excluded from the set of possible next samples because their joint probability is below the slice variable u.]

- Blue ellipse is contour of target distribution
- Initial position at black solid circle
- Arrows indicate a U-turn in momentum

NUTS vs. Gibbs and Metropolis

[Figure: samples generated by random-walk Metropolis, Gibbs sampling, and NUTS (Hoffman and Gelman, Figure 7). The plots compare 1,000 independent draws from a highly correlated 250-dimensional distribution with 1,000,000 samples (thinned to 1,000 for display) from random-walk Metropolis, 1,000,000 samples (thinned to 1,000) from Gibbs sampling, and 1,000 samples from NUTS; only the first two dimensions are shown.]

- Two dimensions of highly correlated 250-dim distribution
- 1M samples from Metropolis, 1M from Gibbs (thin to 1K)
- 1K samples from NUTS, 1K independent draws


NUTS vs. Basic HMC: I

- 250-D normal and logistic regression models
- Vertical axis shows effective #sims (big is good)
- (Left) NUTS; (Right) HMC with increasing t = εL


NUTS vs. Basic HMC: II

- Hierarchical logistic regression and stochastic volatility
- Simulation time is step size ε times #steps L
- NUTS can beat optimally tuned HMC


Solving more problems in Stan

- Problem: Need ease of use of BUGS
- Solution: Compile directed graphical model language
- Problem: Need to tune parameters for HMC
- Solution: Auto tuning, adaptation (see the sketch after this list)
- Problem: Efficient up-to-proportion density calcs
- Solution: Density template metaprogramming
- Problem: Limited error checking, recovery
- Solution: Static model typing, informative exceptions
- Problem: Poor boundary behavior
- Solution: Calculate limits (e.g. lim_{x→0} x log x)
- Problem: Restrictive licensing (e.g., closed, GPL, etc.)
- Solution: Open-source, BSD license
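A small sketch of the auto-tuning point above: step size and the number of steps are adapted automatically during warmup, and RStan exposes a control list only for overriding the adaptation targets when needed. The model file and data list refer back to the linear regression sketch earlier; the values shown are arbitrary.

library(rstan)

# Step size and tree depth are adapted during warmup;
# control only changes the adaptation targets.
fit <- stan(file = "lr.stan", data = lr_data,
            chains = 4, iter = 2000, warmup = 1000,
            control = list(adapt_delta = 0.95,     # target acceptance statistic
                           max_treedepth = 12))    # cap on NUTS doublings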


Example: the “Kumaraswamy distribution”

p(θ | a, b) = a b θ^(a−1) (1 − θ^a)^(b−1), for a, b > 0 and θ ∈ (0, 1)

model {
  // Put priors on a and b here if you want
  // Put in the rest of your model
  // Kumaraswamy log-likelihood
  increment_log_prob(N*(log(a)+log(b)) + (a-1)*sum_log_theta);
  for (n in 1:N)
    increment_log_prob((b-1)*log1m(pow(theta[n],a)));
}


Check that it worked
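R code for such a check — a minimal sketch, not the slide's original listing: simulate θ from known (a, b) using the Kumaraswamy inverse CDF, refit, and see whether the posterior recovers the true values. The file name kumaraswamy.stan, the assumption that θ enters as data with sum_log_theta computed in a transformed data block, and the sample size are all illustrative.

library(rstan)

# Simulate fake data from known parameters via the inverse CDF:
# if U ~ Uniform(0,1), then theta = (1 - (1 - U)^(1/b))^(1/a) is Kumaraswamy(a, b).
a_true <- 2
b_true <- 5
N      <- 500
u      <- runif(N)
theta  <- (1 - (1 - u)^(1 / b_true))^(1 / a_true)

# Refit and check that the posterior covers the true values.
fit <- stan(file = "kumaraswamy.stan",
            data = list(N = N, theta = theta),
            iter = 2000, chains = 4)
print(fit, pars = c("a", "b"))   # should recover a_true = 2, b_true = 5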
