Quantum Mechanics / Machine Learning Models
Matthias Rupp
Fritz Haber Institute of the Max Planck Society
[email protected]
Hands-on Workshop Density-Functional Theory and Beyond Berlin, Germany, July 13–23, 2015
Outline
• Introduction: What are QM/ML models?
• Machine learning: How does ML work?
• Applications: What can be done with them?
• Pitfalls: What can go wrong?
• Tutorial: Worked example
Matthias Rupp: QM/ML Models
[Figure: hierarchy of methods, from full configuration interaction through wave-function-based methods, density functional theory, and semi-empirical methods to empirical methods; successive approximations trade accuracy and generality for speed]
QM/ML models: The accuracy of quantum mechanics, at the speed of machine learning
QM/ML models

Exploit redundancy in a series of QM calculations:
• QM/ML = quantum mechanics + machine learning
• Interpolate between QM calculations using ML
• Smoothness assumption (regularization)
• Large systems, long simulations, many systems

[Figure: property vs. molecular structure; reference calculations (•), QM curve (solid), ML interpolation (dashed)]
Relationship to other models

                  Quantum mechanics     Molecular mechanics    Machine learning
Approach          Deductive             Mostly deductive       Inductive
Functional form   From physics          From physics           From statistics
Fitting           No or little          Fitted to one class    Fitted per dataset
Parameters        Few or none           Some                   Many
Applicability     General               Limited domain         General
Speed             Slow                  Fast                   In between
System size       Small systems         Large systems          Large systems
Overview

Selected milestones, in roughly chronological order:
• Sensitivity analysis & regularization for PES (Ho, Rabitz; J. Chem. Phys. 89)
• ANN for eigenvalues of the 2D harmonic oscillator (Darsey, Noid, Upadhyaya; Chem. Phys. Lett. 177)
• ANN for PES interpolation (Behler, Parrinello; Phys. Rev. Lett. 98)
• ANN MD simulation of silicon (Behler, Parrinello et al.; Phys. Rev. Lett. 100)
• Gaussian approximation potentials (Bartók, Csányi et al.; Phys. Rev. Lett. 104)
• ANNs for potential energy surface interpolation, reviews (Behler; Phys. Chem. Chem. Phys. 2011; Behler; J. Phys. Condens. Matter 2014)
• Atomization energies for compound space (Rupp, von Lilienfeld et al.; Phys. Rev. Lett. 108)
• ML functionals for orbital-free DFT (Snyder, Burke et al.; Phys. Rev. Lett. 108)
• Transition state theory dividing surfaces (Pozun, Henkelman et al.; J. Chem. Phys. 136)
• Polymer properties (Pilania, Ramprasad et al.; Sci. Rep. 3)
• Quantification of intrinsic motion (Leighty, Varma; J. Chem. Theor. Comp. 9)
• 134 000 molecule dataset (Ramakrishnan et al.; Sci. Data 1)
• Descriptor selection for materials (Ghiringhelli, Scheffler et al.; submitted)
• ANN + charge density for ionized systems (Ghasemi, Goedecker et al.; arXiv)
• On-the-fly learning of QM forces (Li, Kermode, De Vita; Phys. Rev. Lett., accepted)
What is machine learning?

Machine learning (ML) studies algorithms whose performance improves with data ("learning from experience"). (Mitchell, McGraw Hill, 1997)

• Widely applied, with many problems and algorithms
• Systematic identification of regularity in data for prediction and analysis
• Interpolation in high-dimensional spaces
• Inductive, data-driven; empirical in a principled way

Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer, 2nd ed., 2009. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Kernel learning

Idea:
• Transform samples into a higher-dimensional space
• Implicitly compute inner products there
• Rewrite the linear algorithm to use only inner products

[Figure: sin(x) on [−2π, 2π], mapped from input space to feature space]

Input space X, feature space H:

k : X × X → R,   φ : X → H,   k(x, z) = ⟨φ(x), φ(z)⟩

Schölkopf, Smola, Learning with Kernels, 2002; Hofmann et al., Ann. Stat. 36: 1171, 2008.
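The idea can be made concrete with a kernel whose feature map is small enough to write down explicitly. The homogeneous polynomial kernel of degree 2 below is an illustrative assumption, not a kernel used on these slides:

```python
import numpy as np

# For the degree-2 homogeneous polynomial kernel on R^2, the feature map
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2) satisfies <phi(x), phi(z)> = <x, z>^2.
def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

def k_poly2(x, z):
    return np.dot(x, z) ** 2  # inner product in feature space, computed implicitly

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # explicit transformation, then inner product
implicit = k_poly2(x, z)           # kernel evaluation, never forming phi
```

Both values agree, which is the point of the kernel trick: the algorithm only ever needs k(x, z), never φ itself.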
Kernels

Kernels correspond to inner products: if k : X × X → R is symmetric positive semi-definite, then k(x, z) = ⟨φ(x), φ(z)⟩ for some φ : X → H.

Inner products encode information about lengths and angles:

||x − z||² = ⟨x, x⟩ − 2⟨x, z⟩ + ⟨z, z⟩,   cos θ = ⟨x, z⟩ / (||x|| ||z||)

[Figure: geometric interpretation; projection of z onto x with length ||z|| cos θ]

• Well-characterized function class
• Closure properties
• Access data only via K_ij = k(x_i, x_j)
• X can be any non-empty set
• Examples: linear kernel ⟨x, z⟩; Gaussian kernel exp(−||x − z||² / (2σ²))
Examples of kernel functions

Linear kernel: k(x, z) = ⟨x, z⟩

[Figure: plot of the linear kernel k(x, z)]

• Recovers the original linear model
Examples of kernel functions

Gaussian kernel: k(x, z) = exp(−||x − z||² / (2σ²))

[Figure: plot of the Gaussian kernel k(x, z)]

• Length scale σ
• Infinite-dimensional feature space
• Universal local approximator
Examples of kernel functions

Laplacian kernel: k(x, z) = exp(−||x − z||₁ / σ)

[Figure: plot of the Laplacian kernel k(x, z)]

• Length scale σ
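The three kernels, written as plain functions together with the Gram matrix through which kernel models access the data (a minimal sketch in numpy):

```python
import numpy as np

# The three kernels from the slides, evaluated on vectors x and z.
def linear_kernel(x, z):
    return np.dot(x, z)

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def laplacian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum(np.abs(x - z)) / sigma)

# Gram matrix K_ij = k(x_i, x_j): the only way kernel models touch the data.
def gram_matrix(kernel, X):
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

X = np.array([[0.0], [1.0], [2.0]])
K = gram_matrix(gaussian_kernel, X)
# K is symmetric positive semi-definite, with k(x, x) = 1 on the diagonal
```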
Representer theorem

Kernel models have the form

f̂(z) = Σ_{i=1}^{n} α_i k(x_i, z)

due to the representer theorem: any function minimizing a regularized risk functional

Σ_{i=1}^{n} ℓ(x_i, y_i, f̂(x_i)) + g(||f̂||)

admits the above representation.

Schölkopf, Herbrich & Smola, COLT 2001.
Kernel ridge regression

• Regularized form of ordinary least-squares regression
• Regularization prevents over-fitting by penalizing large coefficients
• Kernels provide non-linearity

The solution has the form

f(x) = Σ_{i=1}^{n} α_i k(x_i, x)

The coefficients α are obtained by minimizing

Σ_{i=1}^{n} (f(x_i) − y_i)² + λ αᵀ K α,

which has the closed-form solution α = (K + λI)⁻¹ y.
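The closed-form solution above fits in a few lines of numpy. A minimal sketch, assuming the Gaussian kernel and a 1D toy problem (noisy samples of sin):

```python
import numpy as np

# Gaussian kernel matrix between 1D sample sets X and Z.
def kernel_matrix(X, Z, sigma):
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    K = kernel_matrix(X, X, sigma)
    # alpha = (K + lambda I)^{-1} y
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, Z, sigma):
    # f(z) = sum_i alpha_i k(x_i, z)
    return kernel_matrix(Z, X_train, sigma) @ alpha

rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(X) + 0.05 * rng.standard_normal(50)

alpha = krr_fit(X, y, sigma=0.5, lam=1e-3)
Z = np.linspace(0.0, 2.0 * np.pi, 200)
pred = krr_predict(X, alpha, Z, sigma=0.5)
rmse = np.sqrt(np.mean((pred - np.sin(Z)) ** 2))  # small for this smooth target
```

Solving the linear system with `np.linalg.solve` is preferred over forming the inverse explicitly; λ also stabilizes the system numerically.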
Density functional theory

Learning the map from electron density to kinetic energy:
• Orbital-free DFT
• 1D toy system
• DFT/LDA as reference

• Error decays to zero
• Self-consistent densities
• Bond breaking and formation

[Figures: H2 potential, H2 binding curve, H2 forces]

Snyder et al., Phys. Rev. Lett. 108: 253002, 2012. Snyder et al., J. Chem. Phys. 139: 224104, 2013.
Transition state theory

• Characterization of dividing surfaces
• Support vector machines
• No prior information required
• Iteratively refined by biased sampling along the dividing surface

[Figure: panels (a)–(d); dividing surface between reactant R and products P1, P2 via transition states TS1, TS2; saddle points marked]

Pozun et al., J. Chem. Phys. 136: 174101, 2012.
Gaussian approximation potentials

• Representation: local density, projection to a 4D sphere, hyperspherical harmonics, bispectrum
• Gaussian process regression
• Molecular dynamics
• Partitioned energies

[Figures: transition path energies from rhombohedral graphite to diamond (DFT-LDA, GAP, Brenner, Tersoff); errors on properties of tungsten (GAP, BOP, MEAM, FS): elastic constants C11, C12, C44, vacancy energy, surface energies (100), (110), (111), (112)]

Bartók, Csányi et al., Phys. Rev. Lett. 104: 136403, 2010. Szlachta et al., Phys. Rev. B 90: 104108, 2014.
Molecular properties

Data: 7165 small organic molecules, DFT/PBE0 atomization energies

Representation (Coulomb matrix):

M_IJ = ½ Z_I^2.4                  if I = J
M_IJ = Z_I Z_J / ||R_I − R_J||    if I ≠ J

Model:

E^ML(M) = Σ_{i=1}^{N} α_i k(M_i, M),   k(M_i, M) = exp(−||M_i − M||² / (2σ²)),   α = (K + λI)⁻¹ E^QM

[Figure: MAE and RMSE in kcal/mol vs. training set size N from 500 to 5000, log scale]

Rupp et al., Phys. Rev. Lett. 108: 058301, 2012.
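The Coulomb matrix definition above translates directly into code. A sketch, using H2 with an assumed bond length of 1.4 (atomic units) purely for illustration:

```python
import numpy as np

# Coulomb matrix as defined on the slide:
# M_II = 0.5 * Z_I^2.4, M_IJ = Z_I * Z_J / ||R_I - R_J|| for I != J.
def coulomb_matrix(Z, R):
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

# H2 with an assumed internuclear distance of 1.4 (illustrative coordinates)
M = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
# Diagonal: 0.5 * 1^2.4 = 0.5; off-diagonal: 1 * 1 / 1.4
```

The matrix is symmetric by construction and invariant to translation and rotation of the molecule (it depends only on charges and interatomic distances).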
Extension to other properties

Learning the map from molecular structure to molecular properties:
• Various properties
• Various levels of theory
• Small organic molecules
• Coulomb matrix representations
• Kernel learning, deep neural networks
• For 5k training molecules, errors are comparable to the reference

Montavon et al., New J. Phys. 15: 095003, 2013. Hansen et al., J. Chem. Theor. Comput. 9: 3404, 2013.
∆-learning

Learning the error between different levels of theory:
• Learn corrections to a baseline method (∆ = reference − baseline)
• Augmenting legacy QM methods
• Puts physics into the QM/ML model
• Examples: ∆ models with PM7 and HF baselines and B3LYP, CCSD(T), G4MP2 targets

Ramakrishnan, Dral, Rupp, von Lilienfeld, J. Chem. Theor. Comput. 11: 2087, 2015.
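The ∆-learning scheme can be sketched with kernel ridge regression trained on the difference between two levels of theory. The `baseline` and `reference` functions below are toy stand-ins, not real QM methods:

```python
import numpy as np

def baseline(x):   # cheap, systematically biased method (toy stand-in)
    return x ** 2

def reference(x):  # expensive, accurate method (toy stand-in)
    return x ** 2 + 0.3 * np.sin(3.0 * x)

# Gaussian kernel matrix between 1D sample sets.
def K(X, Z, sigma=0.5):
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2.0 * sigma ** 2))

X_tr = np.linspace(-2.0, 2.0, 30)
delta = reference(X_tr) - baseline(X_tr)                 # learn only the correction
alpha = np.linalg.solve(K(X_tr, X_tr) + 1e-6 * np.eye(30), delta)

X_te = np.linspace(-2.0, 2.0, 101)
pred = baseline(X_te) + K(X_te, X_tr) @ alpha            # baseline + ML correction
err = np.max(np.abs(pred - reference(X_te)))
```

Because the correction is smoother than the property itself, the ML model needs fewer training points than learning the reference level directly would.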
Properties of atoms in molecules

• Local interpolation is global extrapolation
• Linear scaling due to locality

[Figures: RMSE (%) of local properties (13C and 1H chemical shifts δ, 1s core levels of C, forces F_C and F_H) for ML models trained on 0.5k, 1k, and 10k molecules vs. GDB9 reference; compute time in days of DFT vs. ML as a function of polymer size (number of electrons; polymer length in nm)]

Rupp, Ramakrishnan, von Lilienfeld, arXiv: 1505.00350, 2015.
Overfitting: Model complexity and generalization error

[Figure: three fits of the same data on x ∈ [0, 2]. Underfitting (λ too large): training/test error 0.123 / 0.443. Fitting (λ right): 0.044 / 0.068. Overfitting (λ too small): 0.036 / 0.939.]

Rupp, PhD thesis, 2009. Vu, Snyder et al., Int. J. Quant. Chem. 115: 1115, 2015.
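The effect of λ on training and test error can be reproduced numerically. A sketch, assuming Gaussian-kernel ridge regression on noisy samples of a smooth toy function:

```python
import numpy as np

# Gaussian kernel matrix between 1D sample sets.
def gauss_K(X, Z, sigma=0.2):
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X_tr = np.linspace(0.0, 2.0, 20)
y_tr = np.sin(np.pi * X_tr) + 0.1 * rng.standard_normal(20)
X_te = np.linspace(0.0, 2.0, 101)
y_te = np.sin(np.pi * X_te)  # noise-free ground truth for the test error

def errors(lam):
    Km = gauss_K(X_tr, X_tr)
    alpha = np.linalg.solve(Km + lam * np.eye(len(X_tr)), y_tr)
    train_rmse = np.sqrt(np.mean((Km @ alpha - y_tr) ** 2))
    test_rmse = np.sqrt(np.mean((gauss_K(X_te, X_tr) @ alpha - y_te) ** 2))
    return train_rmse, test_rmse

# lambda too small: near-zero training error, fits the noise (overfitting);
# lambda too large: both errors large (underfitting)
results = {lam: errors(lam) for lam in (1e-12, 1e-3, 1e3)}
```

The training error grows monotonically with λ, while the test error is minimized at an intermediate value, matching the three panels on the slide.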
Overfitting: Early stopping rule

[Figure: training and test error vs. model complexity during training; the training error keeps decreasing while the test error starts to rise; stop at the minimum of the test error]
Validation

Golden rule: training must never use validation data.

Example 1, overfitting:
✗ train on all data, predict all data
✓ split data, train, predict

Example 2, centering:
✗ center data, split data, train & predict
✓ split data, center training set, train, center test set with training-set statistics, predict

Example 3, cross-validation with feature selection:
✗ feature selection, then cross-validation
✓ feature selection within each split of the cross-validation
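Example 2 can be sketched in a few lines: the centering statistics come from the training split only and are then applied unchanged to the test split.

```python
import numpy as np

# Synthetic data with a non-zero mean (illustrative, any dataset works the same way).
rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(100, 3))

train, test = X[:80], X[80:]

mean = train.mean(axis=0)   # computed on the training set only
train_c = train - mean      # training set is exactly centered
test_c = test - mean        # test set shifted by the *training* mean

# Wrong version: centering X with X.mean(axis=0) before splitting would leak
# information about the test set into the training procedure.
```

The test-set mean after centering is not exactly zero, and it must not be: forcing it to zero would use test statistics, violating the golden rule.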
Reliability of predictions

Predictive variance of the Gaussian process regression model

Snyder et al., Phys. Rev. Lett. 108: 253002, 2012.
Gradients

Functional derivative of the model, as-is and projected onto the training data

Snyder et al., J. Chem. Phys. 139: 224104, 2013.
Summary
• QM/ML models combine quantum chemistry with machine learning by interpolating between reference QM calculations • The concept is broadly applicable and allows investigation of larger systems, longer timescales, and more systems
Tutorial

International Journal of Quantum Chemistry 115(16), 2015. Special issue on Quantum Chemistry and Machine Learning.

Rupp, Int. J. Quant. Chem. 115: 1058, 2015.
Acknowledgements

Collaborators:
• O. A. von Lilienfeld, R. Ramakrishnan (U Basel)
• M. Scheffler, A. Tkatchenko, V. Gobre (FHI)
• K.-R. Müller, A. Ziehe, F. Biegler, F. Brockherde, G. Montavon (TU Berlin)
• J. C. Snyder, K. Hansen, S. Fazli (TU Berlin)
• K. Burke, I. M. Pelaschier, J. Huang, L. Blooston, L. Li (UCI)
• G. Schneider, G. Folkers, M. Reutlinger (ETHZ)
• G. Henkelman, D. Sheppard, Z. Pozun (U Austin)
• F. M. Boeckler, A. Lange, M. R. Bauer, R. Wilcken (U Tübingen)
• A. Knoll, A. Lopez-Bezanilla, A. Vazquez-Mayagoitia, P. O. Dral

Institutions:
• IPAM • SNF (205321-134783, PP00P2 138932) • EU IEF (273039) • sciCORE, U Basel • DFG (FOR1406TP4) • FHI, Max Planck Society
www.mrupp.info