Quantum Mechanics / Machine Learning Models
Matthias Rupp, Fritz Haber Institute of the Max Planck Society

Hands-on Workshop Density-Functional Theory and Beyond Berlin, Germany, July 13–23, 2015

Outline

• Introduction: What are QM/ML models?
• Machine learning: How does ML work?
• Applications: What can be done with them?
• Pitfalls: What can go wrong?
• Tutorial: Worked example

Matthias Rupp: QM/ML Models


[Figure: hierarchy of methods, ordered by increasing approximation. Accuracy and generality decrease, and speed increases, along the sequence: full configuration interaction → wave-function-based methods → density functional theory → semi-empirical methods → empirical methods.]

QM/ML models: The accuracy of quantum mechanics, at the speed of machine learning


QM/ML models
Exploit redundancy in a series of QM calculations
• QM/ML = quantum mechanics + machine learning
• Interpolate between QM calculations using ML
• Smoothness assumption (regularization)
• Large systems, long simulations, many systems

[Figure: property as a function of molecular structure. Legend: reference calculations (points), QM (solid line), ML (dashed line).]

Relationship to other models

Quantum mechanics: deductive; form from physics; no or little fitting; few or no parameters; general applicability; slow; small systems
Molecular mechanics: mostly deductive; form from physics; fitted to one class; some parameters; limited domain; fast; large systems
Machine learning: inductive; form from statistics; fitted per dataset; many parameters; general applicability; speed in between; large systems

Overview
Timeline of QM/ML work, 1990 to 2015 (roughly chronological):
• Sensitivity analysis & regularization for PES (Ho, Rabitz; J. Chem. Phys. 89)
• ANN for eigenvalues of 2d harmonic oscillator (Darsey, Noid, Upadhyaya; Chem. Phys. Lett. 177)
• ANN for PES interpolation (Behler, Parrinello; Phys. Rev. Lett. 98)
• ANN MD simulation of silicon (Behler, Parrinello et al; Phys. Rev. Lett. 100)
• Gaussian approximation potentials (Bartók, Csányi et al; Phys. Rev. Lett. 104)
• ANNs for potential energy surface interpolation (Behler; Phys. Chem. Chem. Phys. 2011. Behler; J. Phys. Condens. Matter 2014)
• Atomization energies for compound space (Rupp, von Lilienfeld et al; Phys. Rev. Lett. 108)
• ML functionals for orbital-free DFT (Snyder, Burke et al; Phys. Rev. Lett. 108)
• Transition state theory dividing surfaces (Pozun, Henkelman et al; J. Chem. Phys. 136)
• Polymer properties (Pilania, Ramprasad et al; Sci. Rep. 3)
• Quantification of intrinsic motion (Leighty, Varma; J. Chem. Theor. Comp. 9)
• 134 000 molecule dataset (Ramakrishnan et al; Sci. Data 1)
• Descriptor selection for materials (Ghiringhelli, Scheffler et al, submitted)
• ANN + charge density for ionized systems (Ghasemi, Goedecker et al; arXiv)
• On-the-fly learning of QM forces (Li, Kermode, De Vita; Phys. Rev. Lett. accepted)

What is machine learning?

Machine learning (ML) studies algorithms whose performance improves with data ("learning from experience"). (Mitchell, McGraw Hill, 1997)
• Widely applied, many problems and algorithms
• Systematic identification of regularity in data for prediction & analysis
• Interpolation in high-dimensional spaces
• Inductive, data-driven; empirical in a principled way

Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer, 2nd ed., 2009. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Kernel learning
Idea:
• Transform samples into higher-dimensional space
• Implicitly compute inner products there
• Rewrite linear algorithm to use only inner products

[Figure: input x on [−2π, 2π] (left) mapped to a plot of sin x against x (right).]

Input space $\mathcal{X}$ $\xrightarrow{\;\phi\;}$ feature space $\mathcal{H}$

$k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}, \quad k(x, z) = \langle \phi(x), \phi(z) \rangle$

Schölkopf, Smola: Learning with Kernels, 2002; Hofmann et al., Ann Stat 36: 1171, 2008
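The kernel idea can be checked numerically in a few lines. The sketch below uses a degree-2 polynomial kernel, which does not appear on the slides and is chosen here only because its feature map can be written out explicitly; the names `phi` and `poly2_kernel` are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel (illustrative)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, evaluated in input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product computed in feature space
implicit = poly2_kernel(x, z)             # same value, without ever forming phi
print(explicit, implicit)
```

Both numbers agree, so an algorithm that only needs inner products can work with `poly2_kernel` and never build the feature vectors.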

Kernels
Kernels correspond to inner products. If $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is symmetric positive semi-definite, then $k(x, z) = \langle \phi(x), \phi(z) \rangle$ for some $\phi : \mathcal{X} \to \mathcal{H}$.

Inner products encode information about lengths and angles:

$\|x - z\|^2 = \langle x, x \rangle - 2 \langle x, z \rangle + \langle z, z \rangle, \qquad \cos\theta = \frac{\langle x, z \rangle}{\|x\| \, \|z\|}$

[Figure: vectors x and z enclosing the angle θ, with lengths $\|x\|_2$, $\|z\|_2$, the distance $\|x - z\|_2$, and the projection $\|z\|_2 \cos\theta$.]

• Well characterized function class
• Closure properties
• Access data only by $K_{ij} = k(x_i, x_j)$
• $\mathcal{X}$ can be any non-empty set
• Examples: linear kernel $\langle x, z \rangle$, Gaussian kernel $\exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
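Because lengths and angles follow from inner products, they can be evaluated in feature space from kernel values alone. The following is a minimal sketch of that computation for the Gaussian kernel; the function names and the two sample points are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma**2)))

def feature_space_geometry(k, x, z):
    """Squared distance and cosine of the angle between phi(x) and phi(z)."""
    kxx, kzz, kxz = k(x, x), k(z, z), k(x, z)
    sq_dist = kxx - 2.0 * kxz + kzz          # ||phi(x) - phi(z)||^2
    cos_theta = kxz / np.sqrt(kxx * kzz)     # <phi(x), phi(z)> / (||phi(x)|| ||phi(z)||)
    return sq_dist, cos_theta

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])
sq_dist, cos_theta = feature_space_geometry(gaussian_kernel, x, z)
print(sq_dist, cos_theta)
```

For the Gaussian kernel every sample has unit length in feature space, so the squared distance reduces to 2 − 2 k(x, z).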

Examples of kernel functions

Linear kernel: $k(x, z) = \langle x, z \rangle$
• Recovers original linear model

Gaussian kernel: $k(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
• Length scale σ
• Infinite dimensional feature space
• Universal local approximator

Laplacian kernel: $k(x, z) = \exp\left(-\frac{\|x - z\|_1}{\sigma}\right)$
• Length scale σ
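The three kernels listed above can be written as short NumPy functions that return the kernel matrix for two sets of samples. This is a sketch on synthetic random data, not code from the talk; it also checks numerically that each kernel matrix is symmetric with non-negative eigenvalues.

```python
import numpy as np

def linear_kernel(X, Z):
    """K_ij = <x_i, z_j>."""
    return X @ Z.T

def gaussian_kernel(X, Z, sigma=1.0):
    """K_ij = exp(-||x_i - z_j||_2^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def laplacian_kernel(X, Z, sigma=1.0):
    """K_ij = exp(-||x_i - z_j||_1 / sigma)."""
    l1 = np.sum(np.abs(X[:, None, :] - Z[None, :, :]), axis=-1)
    return np.exp(-l1 / sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 synthetic samples with 3 features each

for name, kernel in [("linear", linear_kernel),
                     ("gaussian", gaussian_kernel),
                     ("laplacian", laplacian_kernel)]:
    K = kernel(X, X)
    min_eig = np.linalg.eigvalsh(K).min()
    print(f"{name:9s} symmetric={np.allclose(K, K.T)} min eigenvalue={min_eig:.2e}")
```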

Representer theorem
Kernel models have the form

$\hat{f}(z) = \sum_{i=1}^{n} \alpha_i \, k(x_i, z)$

due to the representer theorem: any function minimizing a regularized risk functional

$\ell\left( \left( x_i, y_i, \hat{f}(x_i) \right)_{i=1}^{n} \right) + g\left( \|\hat{f}\| \right)$

admits the above representation.

Schölkopf, Herbrich & Smola, COLT 2001

Kernel ridge regression
• Regularized form of ordinary regression
• Regularization prevents over-fitting by penalizing large coefficients
• Use of kernels for non-linearity

Solution has the form

$f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x)$

Coefficients α are obtained by minimizing

$\sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2 + \lambda \, \alpha^T K \alpha,$

which has solution

$\alpha = (K + \lambda I)^{-1} y.$
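The closed-form solution translates almost directly into code. Below is a minimal sketch of kernel ridge regression with a Gaussian kernel; the toy target sin(x), the hyperparameter values, and the helper names `krr_fit` and `krr_predict` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """K_ij = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lam I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at the new samples."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy data: noise-free samples of sin(x), standing in for reference calculations
X_train = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
y_train = np.sin(X_train[:, 0])
alpha = krr_fit(X_train, y_train, sigma=1.0, lam=1e-6)

X_test = np.linspace(0.2, 6.0, 50)[:, None]
y_pred = krr_predict(X_train, alpha, X_test, sigma=1.0)
max_err = np.max(np.abs(y_pred - np.sin(X_test[:, 0])))
print(f"max interpolation error: {max_err:.2e}")
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, which is the usual choice for numerical stability.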

Density functional theory
Learning the map from electron density to kinetic energy
• Orbital-free DFT
• 1D toy system
• DFT/LDA as reference
• Error decays to zero
• Self-consistent densities
• Bond breaking and formation

[Figure panels: H2 potential; H2 binding curve; H2 forces.]

Snyder et al, Phys Rev Lett 108: 253002, 2012. Snyder et al, J Chem Phys 139: 224104, 2013

Transition state theory
• Characterization of dividing surfaces
• Support vector machines
• No prior information required
• Iteratively refined by biased sampling along dividing surface

[Figure: panels (a) to (d) in the x-y plane, showing reactant R, products P1 and P2, transition states TS1 and TS2, the point x*, and saddle points.]

Pozun et al, J Chem Phys 136: 174101, 2012.

Gaussian approximation potentials
• Representation: local density, projection to 4d sphere, hyperspherical harmonics, bispectrum
• Gaussian process regression
• Molecular dynamics
• Partitioned energies

[Figure: transition path energies from rhombohedral graphite to diamond; energy / eV against reaction coordinate for DFT-LDA, GAP, Brenner, and Tersoff.]
[Figure: errors on properties of tungsten (elastic constants C11, C12, C44; vacancy energy; surface energies (100), (110), (111), (112)) for GAP, BOP, MEAM, and FS.]

Bartók, Csányi et al, Phys Rev Lett 104: 136403, 2010. Szlachta et al, Phys Rev B 90: 104108, 2014

Molecular properties

Data: 7 165 small organic molecules, DFT PBE0 atomization energies

Representation:

$M_{IJ} = \begin{cases} \frac{1}{2} Z_I^{2.4} & \text{if } I = J \\ \frac{Z_I Z_J}{\|R_I - R_J\|} & \text{if } I \neq J \end{cases}$

Model:

$E^{\mathrm{ML}}(M) = \sum_{i=1}^{N} \alpha_i \, k(M_i, M), \qquad k(M_i, M) = \exp\left( -\frac{\|M_i - M\|^2}{2\sigma^2} \right), \qquad \alpha = (K + \lambda I)^{-1} E^{\mathrm{QM}}$

[Figure: RMSE and MAE in kcal/mol against training set size N (logarithmic axis, ticks at 500, 1000, 2000, 5000).]

Rupp et al, Phys Rev Lett 108: 058301, 2012
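The matrix representation defined above is simple to compute from nuclear charges and positions. The sketch below assumes positions in atomic units (bohr) and uses a hypothetical H2 geometry with a bond length of 1.4 bohr; the function name `coulomb_matrix` is illustrative.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Matrix M for nuclear charges Z (n,) and positions R (n, 3).

    Positions are assumed to be given in atomic units (bohr).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)            # placeholder, avoids division by zero
    M = np.outer(Z, Z) / dist              # off-diagonal: Z_I Z_J / ||R_I - R_J||
    np.fill_diagonal(M, 0.5 * Z**2.4)      # diagonal: 0.5 * Z_I^2.4
    return M

# Hypothetical example: H2 with an assumed bond length of 1.4 bohr
M = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
print(M)
```

The matrix depends on the ordering of the atoms, so a practical model needs some convention for handling permutations; that step is outside this sketch.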

Extension to other properties
Learning the map from molecular structure to molecular properties
• Various properties
• Various levels of theory
• Small organic molecules
• Coulomb matrix representations
• Kernel learning, deep neural networks
• For 5k training molecules, errors are comparable to the reference

Montavon et al, New J Phys 15: 095003, 2013. Hansen et al, J Chem Theor Comput 9: 3404, 2013

∆-learning
Learning the error between different levels of theory
• Learn corrections to a baseline method (∆ = reference − baseline)
• Augmenting legacy QM methods
• Puts physics into the QM/ML model
• Examples: corrections from the baselines B3LYP, PM7, and HF toward the higher levels of theory G4MP2, B3LYP, and CCSD(T)

[Figure: excerpt from the cited paper, including its Figure 5, "Calculated reaction enthalpies at 298.15 K between the most stable molecule with C7H10O2 stoichiometry (6-oxabicyclooctan-7-one, in inset, with atomization enthalpy -1933 ...". The excerpt states that the 1k ML model estimates the isomerization enthalpies of the identified isomers with a maximal error of 0.6 kcal/mol (product 10), and that the predictions agree with G4MP2 results calculated a posteriori, never exceeding the threshold of chemical accuracy (1 kcal/mol).]

Ramakrishnan, Dral, Rupp, von Lilienfeld, J Chem Theor Comput 11: 2087, 2015.
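The idea of learning ∆ = reference − baseline can be illustrated on a one-dimensional toy problem. In the sketch below, `baseline` and `reference` are synthetic stand-ins for a cheap and an expensive method (not real QM data), and kernel ridge regression is assumed as the learner.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

def baseline(x):
    """Synthetic cheap method."""
    return x**2

def reference(x):
    """Synthetic expensive method: baseline plus a smooth correction."""
    return x**2 + 0.3 * np.sin(2.0 * x)

X_train = np.linspace(-2.0, 2.0, 15)[:, None]
X_test = np.linspace(-1.9, 1.9, 101)[:, None]

# Learn Delta = reference - baseline on the training points
delta_train = reference(X_train[:, 0]) - baseline(X_train[:, 0])
alpha = krr_fit(X_train, delta_train, sigma=0.7, lam=1e-6)

# Prediction = cheap baseline + learned correction
pred = baseline(X_test[:, 0]) + krr_predict(X_train, alpha, X_test, sigma=0.7)

err_baseline = np.max(np.abs(baseline(X_test[:, 0]) - reference(X_test[:, 0])))
err_delta = np.max(np.abs(pred - reference(X_test[:, 0])))
print(f"baseline error {err_baseline:.3f}, baseline + ML correction {err_delta:.3e}")
```

The model only has to represent the small, smooth difference between the two methods, while the baseline carries the bulk of the physics.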

Properties of atoms in molecules
• Local interpolation is global extrapolation
• Linear scaling due to locality

[Figure: RMSE / % for atomic properties (labels include 13C δ, 1s C δ, FC, FH); compute time / days against polymer length / nm and polymer size / # electrons; 13C δ / ppm. Legend entries: DFT, ML 0.5k, ML 1k, ML 10k, GDB9.]

Rupp, Ramakrishnan, von Lilienfeld, arXiv 1505.00350, 2015

Overfitting: Model complexity and generalization error

[Figure: three fits of y against x on [0, 2], each annotated with two error values (consistent with training error / test error).]
• Underfitting (λ too large): 0.123 / 0.443
• Fitting (λ right): 0.044 / 0.068
• Overfitting (λ too small): 0.036 / 0.939

Rupp, PhD thesis, 2009. Vu, Snyder et al, Int. J. Quant. Chem. 115: 1115, 2015
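The effect of the regularization strength λ can be reproduced qualitatively with a small experiment. The sketch below fits kernel ridge regression to noisy synthetic data for three values of λ and reports training and test errors; the data, kernel width, and λ values are assumptions and do not correspond to the numbers on the slide.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 2.0, 20)[:, None]
y_train = np.sin(np.pi * X_train[:, 0]) + 0.1 * rng.normal(size=20)  # noisy labels
X_test = np.linspace(0.0, 2.0, 200)[:, None]
y_test = np.sin(np.pi * X_test[:, 0])                                # noise-free truth

sigma = 0.2
K = gaussian_kernel(X_train, X_train, sigma)
K_test = gaussian_kernel(X_test, X_train, sigma)

errors = {}
for lam in (1e2, 1e-1, 1e-10):
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    errors[lam] = (rmse(K @ alpha, y_train), rmse(K_test @ alpha, y_test))
    print(f"lambda={lam:g}: train RMSE {errors[lam][0]:.3f}, test RMSE {errors[lam][1]:.3f}")
```

The training error always shrinks as λ decreases, which is why it cannot be used to choose λ. With a very large λ the model is shrunk toward zero and both errors are high; with a very small λ the model reproduces the noisy training labels, and the test error typically rises again.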

Overfitting: Early stopping rule

[Figure: error against training complexity for the training set and the test set, with the stopping point marked.]

Validation
Golden rule: Training must never use validation data

Example 1: overfitting
× train on all data, predict all data
✓ split data, train, predict

Example 2: centering
× center data, split data, train & predict
✓ split data, center training set, train, center test set (using the training-set mean), predict

Example 3: cross-validation with feature selection
× feature selection, cross-validation
✓ feature selection for each split of cross-validation
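The centering example is easy to get wrong in practice, so here is a minimal sketch of the correct order of operations on synthetic data. The 80/20 split and the array names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic features

# Split first, so that nothing computed below has seen the test samples
perm = rng.permutation(len(X))
train_idx, test_idx = perm[:80], perm[80:]
X_train, X_test = X[train_idx], X[test_idx]

# Estimate the centering from the training set only ...
mean_train = X_train.mean(axis=0)

# ... and apply the same shift to both sets
X_train_c = X_train - mean_train
X_test_c = X_test - mean_train

print("training mean after centering:", X_train_c.mean(axis=0))
print("test mean after centering:    ", X_test_c.mean(axis=0))
```

The centered test set generally does not have zero mean, and that is intended: the test samples are treated exactly like future data that was unavailable when the model was built.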

Reliability of predictions

[Figure: predictive variance of Gaussian process regression model.]

Snyder et al, Phys Rev Lett 108: 253002, 2012.
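For Gaussian process regression, the predictive variance at a new point x* is k(x*, x*) − k*ᵀ (K + σ_n² I)⁻¹ k*, where k* holds the kernel values between x* and the training samples. The sketch below evaluates it for a Gaussian kernel; the data, length scale, and noise variance are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    sq = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def gp_predict(X_train, y_train, X_new, sigma, noise_var):
    """Predictive mean and variance of GP regression with a Gaussian kernel."""
    K = gaussian_kernel(X_train, X_train, sigma) + noise_var * np.eye(len(X_train))
    K_star = gaussian_kernel(X_new, X_train, sigma)
    mean = K_star @ np.linalg.solve(K, y_train)
    # var(x*) = k(x*, x*) - k*^T (K + noise_var I)^{-1} k*, with k(x*, x*) = 1 here
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, var

X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(X_train[:, 0])
X_new = np.array([[0.5], [5.0]])  # one point inside the data, one far away

mean, var = gp_predict(X_train, y_train, X_new, sigma=0.5, noise_var=1e-6)
print("predictive variance:", var)
```

The variance is close to zero at a training point and approaches the prior variance far from the data, which is what makes it usable as a warning sign for extrapolation. The predictive mean coincides with the kernel ridge regression prediction when λ equals the noise variance.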

Gradients

[Figure: functional derivative of model as-is and projected on training data.]

Snyder et al, J Chem Phys 139: 224104, 2013.

Summary

• QM/ML models combine quantum chemistry with machine learning by interpolating between reference QM calculations
• The concept is broadly applicable and allows investigation of larger systems, longer timescales, and more systems

Tutorial

International Journal of Quantum Chemistry 115(16), 2015
Special issue on Quantum Chemistry and Machine Learning

Rupp, Int J Quant Chem 115: 1058, 2015.

Acknowledgements

Collaborators
• O A von Lilienfeld, R Ramakrishnan (U Basel)
• M Scheffler, A Tkatchenko, V Gobre (FHI)
• K-R Müller, A Ziehe, F Biegler, F Brockherde, G Montavon (TU Berlin)
• J C Snyder, K Hansen, S Fazli (TU Berlin)
• K Burke, I M Pelaschier, J Huang, L Blooston, L Li (UCI)
• G Schneider, G Folkers, M Reutlinger (ETHZ)
• G Henkelman, D Sheppard, Z Pozun (U Austin)
• F M Boeckler, A Lange, M R Bauer, R Wilcken (U Tübingen)
• A Knoll, A Lopez-Bezanilla, A Vazquez-Mayagoitia, P O Dral

Institutions
• IPAM
• SNF (205321-134783, PP00P2 138932)
• EU IEF (273039)
• sciCORE, U Basel
• DFG (FOR1406TP4)
• FHI, Max Planck Society

www.mrupp.info