Reaction simulation expert systems for synthetic organic chemistry

Jonathan H. Chen and Pierre Baldi
University of California, Irvine
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
School of Medicine

http://cdb.ics.uci.edu

Reaction Prediction

Given a mixture of reactants and reaction conditions, predict the major products.

[Figure: reactants + NaOMe, heat (Δ), yielding an unknown product (?)]

•  Fundamental problem-solving skill of expert human chemists
•  Critical for applications such as retro-synthesis design and reaction discovery

Need for Reproducible Expertise

Automated suggestion of synthetic reactions by pattern matching is straightforward, but “expertise” is knowing which suggestions are actually feasible and reasonable.

[Figure: example reagents (Pd(0), Mg, CO (gas), DCC, KMnO4) and drug targets (albuterol, fenofibrate, bupropion, atorvastatin)]

Transformation Rules

Chemical state machine modeling at mechanistic level of detail
•  State information: Molecular structure
•  State transition: Transformation rules

[Figure: π-bond protic acid addition, followed by carbocation halide addition]

SMIRKS and description:
•  Alkene, Protic Acid Addition: [C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4]
•  Carbocation, Halide Addition: [C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2]
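The state-machine view can be illustrated with a minimal sketch. It is not the Reaction Explorer implementation: species are treated as opaque labels rather than matched with SMIRKS, and the `RULES` table, `step`, and `run` are hypothetical names introduced only for this example. The state is the set of species present, and each rule is a state transition that fires when all of its reactants are in the state.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# A rule is (description, reactant labels, product labels).
# Labels are SMILES-like strings used only as opaque names (assumption:
# a real system would match substructures with SMIRKS instead).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]

RULES: List[Rule] = [
    ("Alkene, Protic Acid Addition",
     frozenset({"C=C", "Br"}), frozenset({"C[CH2+]", "[Br-]"})),
    ("Carbocation, Halide Addition",
     frozenset({"C[CH2+]", "[Br-]"}), frozenset({"CCBr"})),
]


def step(state: FrozenSet[str], rules: List[Rule]) -> Optional[Tuple[str, FrozenSet[str]]]:
    """Apply the first rule whose reactants are all present in the state."""
    for name, reactants, products in rules:
        if reactants <= state:
            return name, (state - reactants) | products
    return None


def run(start: Iterable[str], rules: List[Rule], max_steps: int = 10):
    """Fire transitions until no rule applies; return final state and trace."""
    state = frozenset(start)
    trace: List[str] = []
    for _ in range(max_steps):
        result = step(state, rules)
        if result is None:
            break
        name, state = result
        trace.append(name)
    return state, trace


final_state, trace = run({"C=C", "Br"}, RULES)
print(sorted(final_state), trace)
```

Starting from ethene and HBr, the trace records the two mechanistic steps listed above in order, which is the kind of explanation a rule-based predictor can attach to its product.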

Reaction Explorer [DEMO]

Product prediction for different reactions but using a common reagent
•  Sn2: CC1(Oc2cc(c3cc[nH]c3c2O1)CBr)C
•  Nucleophilic Acylation: c1c2c(c(cn1)Cl)CCOC2=O
•  Robinson Annulation: C[C@H]1c2c(cccn2)CCC1=O

Mechanistic detail explains how or why products are created

Use as synthesis workspace (Tylenol #94)

Results and Progress

Expert system with over
•  80 reagent models
•  1,500 reaction rules
•  4,500 validation examples

J. Chem. Educ. 2008, 85, 1699

Subject Categories Implemented

•  Substitution and Elimination of Alkyl Halides
•  Alcohols and Epoxides
•  Alkenes, Electrophilic Addition
•  Alkynes, Addition and Acetylide Ions
•  Alkanes, Radical Reactions
•  Dienes, Conjugation, Diels-Alder
•  Electrophilic Aromatic Substitution
•  Reactions of Substituted Benzenes
•  Oxidation-Reduction Reactions
•  Aldehydes and Ketones
•  Carboxylic Acid Derivatives
•  Enolate Chemistry
•  Aldol Chemistry
•  Amines and Arenediazonium Reactions
•  Transition Metal (Palladium) Catalysis
•  SnAr and Benzyne Reactions
•  Naphthalene and Heteroaromatic Reactions
•  Pericyclic Reactions
•  Carbohydrates
•  Amino Acid and Peptide Synthesis

Principle-Driven Simulations

•  Not based on transformation rules
•  Driven by principles of physical chemistry

Key Components
•  Core Reaction Unit Model
•  Scoring Function for Reactions
•  Chemical Kinetics Simulation

[Figure: relative energy ladder of orbital types (σ*, π*, p, n, π, σ) with example interactions nN > π*C-O and nO > σ*C-Cl; reaction coordinate diagram showing ΔG‡ and ΔG]

Core Reaction Unit Model

•  Bond-rearrangement patterns are the most typical choice.
•  These only represent the overall “symptom” of the reaction and not the underlying mechanistic steps.
•  Many such patterns must be “memorized” to get decent coverage.

Example patterns:
•  Sn2 Substitution: [CX4H2:1][Br:2]>>[C:1]O
•  Acyl Substitution (Saponification): [O:2]=[C:1][OH0:3]>>[O:2]=[C:1][O-].[O-:3]
•  Robinson Annulation: [*:3][C:2]1[C:11][C:10][C:9][C:8][C:1]1=[O:20].[C:5][C:4](=[O:12])[C:6]=[C:7]>>[*:3][C:2]12[C:11][C:10][C:9][C:8][C:1]1=[C:5][C:4](=[O:12])[C:6][C:7]2

Core Reaction Unit Model

Molecular Orbital Interactions as Elementary Reaction Steps

[Figure: pairs of orbital interactions compared as more favorable vs. less favorable, placed on the orbital energy ladder σ*, π*, p, n, π, σ]

Interaction pairs shown:
•  nCl > pC and nI > pC
•  πC=C > σ*H-Br and πC=C > σ*H-O
•  σH-B > π*C=O (ketone) and σH-B > π*C=O (amide)

Scoring Function for Reactions

Purpose
•  Identify favorable reaction steps
•  Ideally predicts transition state activation energies (ΔG‡)

Statistical Machine Learning
•  Limited quantitative data available
•  Inspiration from the problem-solving abilities of human experts
•  Use qualitative knowledge of reactivity trends as a major training data source

[Figure: reaction coordinate diagram, relative energy vs. reaction coordinate, showing ΔG‡ and ΔG]

C. A. Azencott, M. A. Kayala, P. Baldi, “Learning Scoring Functions for Chemical Expert Systems”
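One way qualitative reactivity trends can serve as training data is to express each trend as an ordered pair ("this step is more favorable than that one") and fit a scoring function that respects the ordering. The sketch below uses a simple pairwise perceptron as a stand-in; it is not the method of the cited work, and the two feature dimensions are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(weights: Vector, features: Vector) -> float:
    """Linear reactivity score: higher means more favorable."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs: List[Tuple[Vector, Vector]], n_features: int,
                   epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron: each pair is (more favorable, less favorable).

    Whenever the current weights fail to rank the pair correctly,
    move the weights toward the difference of the two feature vectors.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            if score(weights, better) - score(weights, worse) <= 0.0:
                for i in range(n_features):
                    weights[i] += lr * (better[i] - worse[i])
    return weights


# Hypothetical features: [acidity of the proton donor, steric hindrance].
# Each pair encodes a qualitative trend, not a measured energy.
trend_pairs = [
    ([1.0, 0.0], [0.2, 0.0]),   # stronger acid is the more favorable partner
    ([0.5, 0.1], [0.5, 0.9]),   # less hindered site is more favorable
]
learned = train_pairwise(trend_pairs, n_features=2)
print(learned)
```

The learned weights only reproduce orderings; mapping such scores onto actual ΔG‡ values would need quantitative calibration data.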

Chemical Kinetics Simulation

Law of Mass Action Simulation
•  Results depend on reactivity scores and concentrations
•  Reversible reactions driven by Le Chatelier’s principle
•  Catalytic quantities of highly reactive species
•  Discrete simulation approximation to bootstrap off incomplete information

Eyring-Evans-Polanyi Equation
•  Principled conversion of ΔG‡ to reaction rate constant k with temperature dependence
•  Theory only applies for elementary reaction steps
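The two ideas above can be sketched together: the Eyring equation, k = (k_B·T/h)·exp(−ΔG‡/RT), converts an activation free energy into a rate constant, and a mass-action loop uses that constant to evolve concentrations. This is a minimal illustration assuming a transmission coefficient of 1 and a single elementary step A + B → C; the function names and the fixed-step integration are choices made for this example, not the simulator's actual design.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # gas constant, J/(mol*K)


def eyring_rate_constant(dg_activation: float, temperature: float = 298.15) -> float:
    """Eyring equation with transmission coefficient 1.

    dg_activation is the activation free energy in J/mol.
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_activation / (R * temperature))


def simulate_a_plus_b(k: float, a0: float, b0: float, dt: float, steps: int):
    """Fixed-step mass-action simulation of the elementary step A + B -> C."""
    a, b, c = a0, b0, 0.0
    for _ in range(steps):
        rate = k * a * b                 # law of mass action
        delta = min(rate * dt, a, b)     # never consume more than is present
        a -= delta
        b -= delta
        c += delta
    return a, b, c


k_fast = eyring_rate_constant(60_000.0)   # 60 kJ/mol barrier
k_slow = eyring_rate_constant(80_000.0)   # 80 kJ/mol barrier
print(k_fast, k_slow)
print(simulate_a_plus_b(k_fast, a0=1.0, b0=0.5, dt=0.01, steps=1000))
```

Because the rate constant depends exponentially on ΔG‡, a 20 kJ/mol difference in barrier height changes k by several orders of magnitude at room temperature, which is why relative scores matter more than absolute ones in a competition between pathways.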

Reaction Simulator [DEMO]

Simple enolate deprotonation
•  No other input but starting materials; self-perception of reactive sites and combinations
•  Kinetic vs. thermodynamic simulator controls

Complex example evolving over time
•  Trace full reaction pathway to justify the prediction, including energy diagram

Summary Comparison

Transformation Rules
•  Immediately useful
•  Predictions within seconds
•  Only covers what has been programmed into it
•  Only provides information on major product(s)

General Principles
•  Development and results optimization ongoing
•  Longer simulation times (minutes)
•  Greater potential for generality and discovery
•  Kinetics simulations provide information on major and minor pathways

Acknowledgements

•  NIH/NLM Biomedical Informatics Training Grant
•  UCI Medical Scientist Training Program
•  Orange County ARCS® Foundation

http://cdb.ics.uci.edu

•  Prof. Pierre Baldi (ICS)
•  Prof. Elizabeth Jarvo (Chem)
•  Dr. Susan King (Chem)
•  Prof. Greg Weiss (Chem)
•  Prof. David Van Vranken (Chem)
•  Prof. James Nowick (Chem)

Students
•  Chloe Azencott
•  Matt Kayala
•  Paul Rigor
•  UCI Students

Course Instructors
•  Prof. Suzanne Blum
•  Prof. Zhibin Guan
•  Prof. Larry Overman
•  Prof. Ken Shea
•  Dr. Mare Taagepera
•  Prof. Chris Vanderwal

Academic Software
•  OpenEye Software
•  ChemAxon Software
•  Peter Ertl, Novartis (JME Editor)

Extend Orbital Chaining

•  Interactions between a small set of fundamental orbital types dominate organic reactivity
•  Higher order interactions can be composed by chaining fundamental units together

[Figure: two chained orbital interactions]
•  nO > πC=C > π*C=O
•  nN > σ*H-C > σ*C-Br
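Chaining can be pictured as composing a short list of orbitals that starts at an electron source and ends at an electron sink. The sketch below is only an illustration of that data structure: the `Orbital` class, the filled/unfilled sets, and the validity check are assumptions made for this example, with the split between filled (n, π, σ) and unfilled (p, π*, σ*) types read off the energy ladder used in these slides.

```python
from dataclasses import dataclass
from typing import Tuple

FILLED = {"n", "π", "σ"}        # electron sources (assumed classification)
UNFILLED = {"p", "π*", "σ*"}    # electron sinks (assumed classification)


@dataclass(frozen=True)
class Orbital:
    kind: str    # orbital type, e.g. "n" or "σ*"
    atoms: str   # atoms or bond carrying the orbital, e.g. "O" or "C-Br"

    def label(self) -> str:
        return f"{self.kind}{self.atoms}"


def chain(*orbitals: Orbital) -> Tuple[Orbital, ...]:
    """Compose orbitals into a chain: filled source first, unfilled sink last."""
    if len(orbitals) < 2:
        raise ValueError("a chain needs at least a source and a sink")
    if orbitals[0].kind not in FILLED:
        raise ValueError("chain must start at a filled orbital")
    if orbitals[-1].kind not in UNFILLED:
        raise ValueError("chain must end at an unfilled orbital")
    return tuple(orbitals)


def describe(orbitals: Tuple[Orbital, ...]) -> str:
    """Render a chain in the 'source > bridge > sink' notation."""
    return " > ".join(o.label() for o in orbitals)


# The two chained examples listed above.
conjugate = chain(Orbital("n", "O"), Orbital("π", "C=C"), Orbital("π*", "C=O"))
elimination = chain(Orbital("n", "N"), Orbital("σ*", "H-C"), Orbital("σ*", "C-Br"))
print(describe(conjugate))
print(describe(elimination))
```

Keeping the fundamental orbital set small and building longer interactions by composition avoids enumerating a separate pattern for every higher-order reaction type.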

Need for Reactivity Prediction

•  Retro-synthetic analysis usually only suggests precursors, but does not account for unintended reactivity
•  Existing systems may use exclusion rules
•  Best to reproduce forward sequence of suggested reactions to ensure reliability

Motivation

Total Synthesis of important drugs and chemicals
•  Morphine: Pain Medication (Opium Poppies)
•  Taxol: Anti-cancer (Yew Tree Sap)
•  Penicillin G: Antibiotic (Fungus)

Chemical Modification to optimize lead compounds
•  Andrimid: Anti-Tuberculosis Lead Compound

Goal / Hypothesis:

Can a computer expert system reproduce the core problem-solving capabilities needed of human chemists?

SMILES Extensions

Atom Mapping
•  Necessary to map reactant to product atoms
•  Proper transform requires balanced stoichiometry
•  Hydrogens generally must be explicitly specified

[Figure: atom-numbered scheme for amide formation, R1-C(=O)-OH + H2N-R2 giving R1-C(=O)-NH-R2 + H2O]

Carboxylic acid + Primary amine → Amide + Water

[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
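The requirement that every mapped reactant atom reappears among the products can be checked mechanically. The following sketch extracts atom-map numbers with a regular expression and compares the two sides; it is a plain-text check written for this example, not a substitute for a cheminformatics toolkit, and it assumes simple bracket atoms without nested brackets (so no recursive SMARTS).

```python
import re
from collections import Counter

# Matches the map number at the end of a bracket atom, e.g. [C:1] or [Cl,Br,I;-:4].
ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


def map_numbers(side: str) -> Counter:
    """Count how often each atom-map number appears on one side."""
    return Counter(int(n) for n in ATOM_MAP.findall(side))


def is_balanced(smirks: str) -> bool:
    """True if reactants and products carry exactly the same mapped atoms."""
    reactants, products = smirks.split(">>")
    return map_numbers(reactants) == map_numbers(products)


amide_formation = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)
sn2_pattern = "[CX4H2:1][Br:2]>>[C:1]O"

print(is_balanced(amide_formation))
print(is_balanced(sn2_pattern))
```

The amide transform accounts for all nine mapped atoms, including the explicit hydrogens that end up in water, whereas the short Sn2 bond-rearrangement pattern drops the mapped bromine and so is not balanced in this sense.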

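Molecular Orbital List

[Figure: three example molecules, each listed with its filled and unfilled orbitals]

Example 1
•  Filled: sp2 O, π CO, …
•  Unfilled: π* CO, σ* HC > π* CO, …

Example 2
•  Filled: sp3 O, σ CO, …
•  Unfilled: σ* HO, σ* CO, …

Example 3
•  Filled: sp3 O, sp2 O, …
•  Unfilled: π* SO, σ* HO > π* SO, …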

Outline

Motivation
•  Need for reactivity prediction

Rules-based Predictor Capabilities
•  Predictive general reagents (NaOH), with mechanism explanations
•  Synthesis workspace (Tylenol #94)

Rules vs. Principles

Simulations
•  Fundamental Reaction Unit Model: Chaining: Retain simple set of fundamental orbitals, then just compose for higher order
•  Scoring Interactions: Qualitative Knowledge vs. Quantitative Data
•  Chemical Kinetics: Discrete model for bootstrapping from incomplete starting information

Principle-based Functional Demo Intent
•  Complex example evolving over time
•  Kinetic vs. thermodynamic example to illustrate simulation controls

Ongoing Work
•  Parameter development for more reactivity classes