Probabilistic Programming
Miguel Lázaro-Gredilla
[email protected]
Machine Learning Group, January 2013
http://www.tsc.uc3m.es/~miguel/MLG/
Contents

- Towards a modern machine learning
- Probabilistic programming languages
- Infer.NET
  - The software stack: A digression
  - The modeling language
- References
Classical machine learning

- Large number of tools with diverse backgrounds:
  - k-means
  - Principal Component Analysis
  - Independent Component Analysis
  - Classical Neural Networks
  - Support Vector Machines
  - Density estimation via Parzen windows
  - Recursive Least Squares
  - Least Mean Squares
  - (just an arbitrary sample, we could go on and on...)
- Pragmatic, unsystematic, non-probabilistic
Third generation machine learning

- Data sets described as instances of a probabilistic model:
  - Gaussian Process regression/classification
  - Latent Dirichlet Allocation
  - Bayes Point Machine
  - ...
- We can infer unknowns in a principled way
- Features:
  - Each tool can be expressed as a Bayesian network
  - Systematic, modular, standardized approach
  - Proposals are models, not algorithms
  - Inference is detached from the model
Updating classical machine learning (I/V)

- Should we just dump classical ML and jump on the Bayesian bandwagon?
- Most classical ML tools can be written as some type of inference on some Bayesian network
- So instead, update classical ML using a Bayesian interpretation:
  - Gain insight into the model behind the algorithm
  - See overlaps between tools emerge
  - Use other types of inference
  - It may become obvious how to enhance them
Updating classical machine learning (II/V)

- Classical algorithm: k-means
- Bayesian model (assuming normalized data):

    p(x_n | z_n, {μ_k}) = N(x_n | μ_{z_n}, v I)
    p(v) = InvGamma(v | 1, 1)
    p(μ_k) = N(μ_k | 0, 10 I)
    p(z_n | w) = Discrete(z_n | w)
    p(w) = Dirichlet(w | 1_{K×1})

- The classical algorithm corresponds to:
  - Maximum likelihood for p({x_n} | {z_n}, {μ_k}), obtained using hard-EM optimization
  - A particular case of the Gaussian mixture model
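As a quick sanity check of the generative model above, we can sample from it in plain Python (a hypothetical sketch of my own, not Infer.NET code; InvGamma(1, 1) is drawn as the reciprocal of an Exponential(1) variate, and the symmetric Dirichlet via normalized Exponential(1) draws):

```python
import random

def sample_kmeans_model(N=5, K=2, D=2, seed=0):
    """Draw a dataset from the Bayesian k-means (Gaussian mixture) model."""
    rng = random.Random(seed)
    # p(v) = InvGamma(v | 1, 1): 1 / Gamma(1, 1), and Gamma(1, 1) = Exponential(1)
    v = 1.0 / rng.expovariate(1.0)
    # p(mu_k) = N(mu_k | 0, 10 I)
    mus = [[rng.gauss(0.0, 10.0 ** 0.5) for _ in range(D)] for _ in range(K)]
    # p(w) = Dirichlet(w | 1): normalized Gamma(1, 1) draws
    g = [rng.expovariate(1.0) for _ in range(K)]
    w = [gi / sum(g) for gi in g]
    data = []
    for _ in range(N):
        # p(z_n | w) = Discrete(z_n | w)
        z = rng.choices(range(K), weights=w)[0]
        # p(x_n | z_n, {mu_k}) = N(x_n | mu_z, v I)
        x = [rng.gauss(mus[z][d], v ** 0.5) for d in range(D)]
        data.append((z, x))
    return v, mus, w, data
```

Hard-EM on data like this, with v fixed and w uniform, is exactly classical k-means.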
Updating classical machine learning (III/V)

- Classical algorithm: Extended Recursive Least Squares (adaptive)
- Bayesian model:

    p(x_n | w_n) = N(x_n | w_n^T u_n, v)
    p(w_n | w_{n-1}, α, β) = N(w_n | (1-β) w_{n-1}, β α I)
    p(v) = InvGamma(v | 1, 1)
    p(α) = InvGamma(α | 1, 1)
    p(β/(1-β)) = InvGamma(β/(1-β) | 1, 1)

- The classical algorithm corresponds to:
  - The posterior mean for w_n, for some magically selected α and v
  - A particular case of the Kalman filter
- Contrived assumptions are needed to represent exponentially weighted RLS in this framework
- This hints that it might not make sense; see [KRLST]
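Since the model is a particular case of the Kalman filter, the recursive update can be sketched in pure Python for a scalar weight, with α, β, v fixed by hand (the "magically selected" values the slide mentions; this is my own illustration, not the slide's code):

```python
def rls_kalman_step(mean, var, u, x, alpha=1.0, beta=0.1, v=0.5):
    """One recursive Bayesian update of the scalar weight w_n.

    Predict with p(w_n | w_{n-1}) = N((1-beta) * w_{n-1}, beta * alpha),
    then correct with the observation p(x_n | w_n) = N(w_n * u_n, v).
    """
    # Predict step: propagate the posterior of w_{n-1} through the transition
    m_pred = (1.0 - beta) * mean
    p_pred = (1.0 - beta) ** 2 * var + beta * alpha
    # Correct step: standard Kalman gain for the linear-Gaussian observation
    gain = p_pred * u / (u * u * p_pred + v)
    m_new = m_pred + gain * (x - u * m_pred)
    p_new = (1.0 - gain * u) * p_pred
    return m_new, p_new
```

Iterating this over a stream of pairs (u_n, x_n) tracks a drifting weight; a full Bayesian treatment would also infer α, β, and v instead of fixing them.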
Updating classical machine learning (IV/V)

- Classical algorithm: Principal Component Analysis
- Bayesian model:

    p(x_n | y_n, W, v) = N(x_n | W^T y_n, v I)
    p(y_n) = N(y_n | 0, I)
    p([W]_{d-th col}) = N(w_d | 0, diag([α_1, ..., α_D]))
    p(v) = InvGamma(v | 1, 1)
    p(α_d) = InvGamma(α_d | 1, 1)

- The classical algorithm corresponds to:
  - Maximum likelihood for p(x_n | W, v) when v → 0, with W the product of an orthogonal matrix and an ordered diagonal matrix
  - The restrictions on W make it unique, but don't change the model
- New possibilities: what if we do maximum likelihood for p(x_n | {y_n}, v)?
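In the v → 0 maximum-likelihood limit, the classical solution is just the leading eigenvectors of the sample covariance. A pure-Python sketch (my own, avoiding any linear-algebra library) recovers the top principal direction by power iteration:

```python
def top_principal_direction(X, iters=200):
    """Leading eigenvector of the sample covariance (classical PCA direction)."""
    n, d = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(d)]
    # Sample covariance matrix S
    S = [[sum((x[i] - means[i]) * (x[j] - means[j]) for x in X) / n
          for j in range(d)] for i in range(d)]
    # Power iteration converges to the dominant eigenvector of S
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(S[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = sum(wi * wi for wi in w) ** 0.5
        w = [wi / norm for wi in w]
    return w
```

For data scattered along the diagonal of the plane, this returns a unit vector close to (1, 1)/√2, as classical PCA would.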
Updating classical machine learning (V/V)

- Classical algorithm: Support Vector Machines
- Bayesian model:
  - See Sollich, P. (2002). Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities. Machine Learning, 46:21-52.
  - An ingenious approach, but not very natural (it involves using three classes to solve a binary problem)
Some observations

According to the previous slides:
- Most ML tools have a Bayesian model description
- The full description takes only a few lines
- The type of inference (ML, MAP, point estimates, full Bayesian posterior) is independent of the model
  - Though the tractability of each does depend on the model

In the process of creating new ML tools:
- What makes a new ML tool worthwhile is: a novel model
- What we spend most time working on is: making inference tractable on the new model
The idea

Probabilistic programming:
- Define a language to describe Bayesian models ("programs")
- Create a "compiler" that understands those programs and generates inference engines for them

The new workflow (emphasis is on model design):
- Spend more time thinking about the model
- Program it (just a few lines!)
- Optional: sample data from the model
- Feed data to the inference engine and assess the results
- If the model wasn't that good, do it over
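The workflow can be illustrated end-to-end on the simplest possible model, a beta-Bernoulli coin. This is a plain-Python sketch of my own: exact conjugate inference stands in for the inference engine a probabilistic-programming compiler would generate.

```python
import random

def workflow_demo(true_theta=0.7, n=1000, seed=1):
    """Model: theta ~ Beta(1, 1); x_i ~ Bernoulli(theta).

    1. Sample data from the model (with a chosen 'true' theta).
    2. Infer: the conjugate posterior is Beta(1 + heads, 1 + tails).
    3. Assess: the posterior mean should sit near the true parameter.
    """
    rng = random.Random(seed)
    data = [1 if rng.random() < true_theta else 0 for _ in range(n)]
    heads = sum(data)
    a, b = 1 + heads, 1 + n - heads       # posterior Beta(a, b)
    post_mean = a / (a + b)
    post_var = a * b / ((a + b) ** 2 * (a + b + 1))
    return post_mean, post_var
```

If the posterior lands far from the chosen parameter, or the sampled data looks nothing like the real data, that is the cue to revise the model and repeat.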
Programming paradigms (I/II)

A random assortment of them:
- Imperative (Matlab)
- Procedural (C)
- Object oriented (C++)
- Declarative (SQL)
- Functional (OCaml, F#)
- Metaprogramming (LISP)
- Domain-specific language (Spice)
Programming paradigms (II/II)

For probabilistic programming:
- A domain-specific language may be simpler for the user, but:
  - It doesn't integrate well with existing codebases
  - It doesn't interface well with DB access, plotting capabilities, etc.
- Functional languages with metaprogramming can be used to write a "guest probabilistic program" within a "host programming language" (we'll see F# examples)
- This can also be done in an imperative style, but it looks uglier (we'll see Python examples)
An incomplete list of probabilistic programming languages

- BUGS: Bayesian inference using Gibbs sampling
- HANSEI: extends OCaml; discrete distributions only
- Hierarchical Bayesian Compiler (HBC): large-scale models and non-parametric process priors
- PyMCMC: MCMC algorithms for Python classes
- Church: extends Scheme to describe Bayesian models
- Infer.NET: provides a probabilistic language within the .NET platform

See more at http://probabilistic-programming.org
The Java case

Three big operating systems:
- Linux, OS X, Windows

...and one language to rule them all:
- Sun Microsystems designed Java to run on the JVM
- ...and implemented the JVM to run on Linux, Mac, and Windows
- Platform independence: bliss for programmers
The .NET case

Microsoft reacted and created the JVM counterpart: the CLR.

Which languages target the CLR?
- The .NET languages: VB.NET, C#, F#, IronPython...
- Different languages with a common set of libraries, so interfacing them is easy

Which operating systems does the CLR run on?
- Microsoft released the specification and standardized it
- CLR-like implementations arose for Linux/Mac: Mono
- ...but the Windows version is always ahead: more complete, with additional tools, better tested, etc.

Microsoft tries to win in the cross-platform territory (a sweet spot for developers) while still favoring its flagship product, Windows: opposing objectives.
Infer.NET's interoperability

- Infer.NET targets all .NET languages, with a focus on C#, F#, and IronPython
- It can be used on Mono (Mac/Linux) or on the CLR (Windows); the experience is better and less buggy on the latter
- The .NET languages have a growing set of tools for scientific computing, but are nowhere near Matlab yet
- IronPython cannot use NumPy/SciPy/Matplotlib natively
- A tool called Sho provides an IronPython shell with Matlab-like capabilities (but Windows only)
Using Infer.NET

Infer.NET provides:
- A probabilistic modeling language embedded in other languages
  - F# allows a more natural embedding
- Compilation to three inference engines:
  - EP (expectation propagation): the approximation includes all non-zero probability points
  - VB (variational Bayes): the approximation avoids zero probability points
  - MCMC: Gibbs sampling, slower

Let's browse the Microsoft Research examples and create a new model:
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
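The EP/VB contrast can be made concrete on a toy target: fitting a single Gaussian to a two-mode mixture. This is my own pure-Python illustration (not Infer.NET code); minimizing the inclusive KL, as EP effectively does, matches moments and covers both modes, while the exclusive KL minimized by VB collapses onto one mode (quoted here in closed form rather than optimized numerically).

```python
def moment_match(weights, means, variances):
    """EP-flavoured fit: minimizing KL(p || q) over Gaussians q matches p's
    mean and variance, so q spreads over every region where p has mass."""
    m = sum(w * mu for w, mu in zip(weights, means))
    v = sum(w * (var + mu * mu) for w, mu, var in zip(weights, means, variances)) - m * m
    return m, v

# Target: p(x) = 0.5 N(x | -2, 1) + 0.5 N(x | 2, 1), two well-separated modes
m_ep, v_ep = moment_match([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
# VB-flavoured fit: minimizing KL(q || p) is zero-forcing, so the optimum
# sits on a single mode, roughly q = N(x | 2, 1)
m_vb, v_vb = 2.0, 1.0
```

The moment-matched Gaussian is broad (variance 5, centered between the modes), while the zero-forcing fit is narrow and one-sided: the practical difference between the two deterministic engines.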
Two coins (F#)
Two coins (IronPython)
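The two-coins code from the slides is not reproduced here, but the same queries can be computed in plain Python by enumerating the four outcomes (a sketch of what the compiled inference engine returns for this tutorial model):

```python
from itertools import product
from fractions import Fraction

def two_coins():
    """Two fair coins; bothHeads = first AND second.

    Prior query: P(bothHeads). Posterior query: P(first = heads | not bothHeads).
    """
    half = Fraction(1, 2)
    p_both = Fraction(0)
    p_first_and_not_both = Fraction(0)
    p_not_both = Fraction(0)
    for first, second in product([True, False], repeat=2):
        p = half * half                      # each joint outcome has probability 1/4
        if first and second:
            p_both += p
        else:
            p_not_both += p
            if first:
                p_first_and_not_both += p
    return p_both, p_first_and_not_both / p_not_both
```

Enumeration gives P(bothHeads) = 1/4, and after observing that both heads did not come up, the belief in the first coin being heads drops from 1/2 to 1/3.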
Learning a Gaussian (F#)
Learning a Gaussian (IronPython)
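The slide's code is not reproduced here; as a stand-in, with the noise variance held fixed for simplicity (the full example also learns the precision), the posterior over a Gaussian's mean is available in closed form, so a plain-Python sketch can check what the engine should return:

```python
import random

def learn_gaussian_mean(data, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Exact posterior over the mean of a Gaussian with known noise variance.

    Prior: mean ~ N(prior_mean, prior_var); likelihood: x_i ~ N(mean, noise_var).
    """
    n = len(data)
    # Precisions add; the posterior mean is a precision-weighted average
    post_prec = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + sum(data) / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

rng = random.Random(0)
samples = [rng.gauss(5.0, 1.0) for _ in range(100)]
mean, var = learn_gaussian_mean(samples)
```

With 100 samples the posterior concentrates tightly around the true mean of 5.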
Truncated Gaussian (F#)
Truncated Gaussian (IronPython)
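The slide's code is not reproduced here; the moments of a truncated Gaussian, which EP-style message passing relies on for this model, have a classical closed form. A plain-Python sketch for the standard normal truncated to x > a:

```python
import math

def truncated_normal_mean(a):
    """E[x | x > a] for x ~ N(0, 1): phi(a) / (1 - Phi(a))."""
    # Standard normal density at a
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    # Standard normal CDF at a, via the error function
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return phi / (1.0 - Phi)
```

Truncating at zero shifts the mean from 0 up to about 0.798, and raising the threshold pushes the conditional mean past it.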
Gaussian Mixture (F#)
Gaussian Mixture (IronPython)
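The slide's code is not reproduced here; the machinery behind the mixture example can be sketched in 1-D pure Python, with plain EM over unit-variance components standing in for the full variational treatment (my own simplification, not Infer.NET's algorithm):

```python
import math

def em_two_gaussians(xs, iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians (weight and means)."""
    mu = [min(xs), max(xs)]   # crude initialization at the data extremes
    w = 0.5                   # mixing weight of component 0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point
        r = []
        for x in xs:
            p0 = w * math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = (1.0 - w) * math.exp(-0.5 * (x - mu[1]) ** 2)
            r.append(p0 / (p0 + p1))
        # M-step: re-estimate the weight and the means from the responsibilities
        n0 = sum(r)
        w = n0 / len(xs)
        mu[0] = sum(ri * x for ri, x in zip(r, xs)) / n0
        mu[1] = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - n0)
    return w, mu
```

Replacing the soft responsibilities with hard 0/1 assignments recovers exactly the k-means special case from slide II/V.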
k-means (F#)
[Demo]
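The demo's code is not reproduced here; for reference, classical k-means itself (the hard-EM special case discussed earlier) fits in a few lines of pure Python:

```python
def kmeans(points, centers, iters=20):
    """Plain Lloyd's algorithm: hard assignment, then centroid update."""
    for _ in range(iters):
        # Assignment step (the hard-EM E-step): nearest center wins
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step (the M-step): each center moves to its cluster mean;
        # an empty cluster keeps its old center
        centers = [
            [sum(p[j] for p in cl) / len(cl) for j in range(len(cl[0]))] if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers
```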
Conclusions

- Probabilistic programming is in its infancy
- We might produce alternative language definitions/implementations
- We can leverage it to test new models faster
- We can build custom models on the fly
References

[BayesSVM] Sollich, P. (2002). Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities. Machine Learning, 46:21-52.
[ProbProg] The probabilistic programming wiki. http://probabilistic-programming.org/wiki/Home
[InferNET] T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.5, Microsoft Research Cambridge, 2012. http://research.microsoft.com/infernet
[KRLST] M. Lázaro-Gredilla, S. Van Vaerenbergh, and I. Santamaría. "A Bayesian approach to tracking with kernel recursive least-squares", MLSP 2011.