Reaction simulation expert systems for synthetic organic chemistry
Jonathan H. Chen and Pierre Baldi
University of California, Irvine
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
School of Medicine
http://cdb.ics.uci.edu
Reaction Prediction
Given a mixture of reactants and reaction conditions, predict the major products
[Scheme: starting materials with NaOMe, heat (Δ), unknown product (?)]
• Fundamental problem-solving skill of expert human chemists
• Critical for applications such as retro-synthesis design and reaction discovery
Need for Reproducible Expertise
Automated suggestion of synthetic reactions by pattern matching is straightforward, but “expertise” is knowing which suggestions are actually feasible and reasonable
[Figure labels: reagents Pd(0), Mg, CO (gas), DCC, KMnO4; drug targets albuterol, fenofibrate, bupropion, atorvastatin]
Transformation Rules
Chemical state machine modeling at mechanistic level of detail
• State information: Molecular structure
• State transition: Transformation rules
[Scheme: π-bond protic acid addition, then carbocation halide addition]
Description and SMIRKS
• Alkene, Protic Acid Addition: [C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4]
• Carbocation, Halide Addition: [C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2]
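The state-machine view can be illustrated with a minimal sketch. It assumes species are tracked as plain text labels rather than molecular structures, and the two rules are hypothetical stand-ins named after the SMIRKS entries above. A real rule would match and rewrite molecular graphs through a cheminformatics toolkit, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical rules: (name, consumed species, produced species).
# Species are text labels here, not structures (an assumption of this sketch).
RULES = [
    ("Alkene, Protic Acid Addition", ("alkene", "HBr"), ("carbocation", "Br-")),
    ("Carbocation, Halide Addition", ("carbocation", "Br-"), ("alkyl bromide",)),
]


def applicable(state, rule):
    """A rule can fire when every species it consumes is present in the state."""
    _, consumed, _ = rule
    needed = Counter(consumed)
    return all(state[species] >= count for species, count in needed.items())


def step(state, rule):
    """State transition: remove the consumed species, add the produced ones."""
    _, consumed, produced = rule
    new_state = state.copy()
    new_state.subtract(consumed)
    new_state.update(produced)
    return +new_state  # unary plus drops species whose count fell to zero


def run(state, rules, max_steps=10):
    """Fire the first applicable rule repeatedly until none applies."""
    trace = []
    for _ in range(max_steps):
        rule = next((r for r in rules if applicable(state, r)), None)
        if rule is None:
            break
        state = step(state, rule)
        trace.append(rule[0])
    return state, trace


final_state, trace = run(Counter({"alkene": 1, "HBr": 1}), RULES)
print(trace)
print(dict(final_state))
```

The trace plays the role of the mechanistic explanation: each entry names the rule that moved the mixture from one state to the next.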
Reaction Explorer [DEMO]
Product prediction for different reactions but using a common reagent
• Sn2: CC1(Oc2cc(c3cc[nH]c3c2O1)CBr)C
• Nucleophilic Acylation: c1c2c(c(cn1)Cl)CCOC2=O
• Robinson Annulation: C[C@H]1c2c(cccn2)CCC1=O
• Mechanistic detail explains how and why products are formed
• Use as synthesis workspace
Tylenol #94
Results and Progress
Expert system with over
• 80 reagent models
• 1,500 reaction rules
• 4,500 validation examples
J. Chem. Educ. 2008, 85, 1699
Subject Categories Implemented
• Substitution and Elimination of Alkyl Halides
• Alcohols and Epoxides
• Alkenes, Electrophilic Addition
• Alkynes, Addition and Acetylide Ions
• Alkanes, Radical Reactions
• Dienes, Conjugation, Diels-Alder
• Electrophilic Aromatic Substitution
• Reactions of Substituted Benzenes
• Oxidation-Reduction Reactions
• Aldehydes and Ketones
• Carboxylic Acid Derivatives
• Enolate Chemistry
• Aldol Chemistry
• Amines and Arenediazonium Reactions
• Transition Metal (Palladium) Catalysis
• SnAr and Benzyne Reactions
• Naphthalene and Heteroaromatic Reactions
• Pericyclic Reactions
• Carbohydrates
• Amino Acid and Peptide Synthesis
Principle-Driven Simulations
• Not based on transformation rules
• Driven by principles of physical chemistry
Key Components
• Core Reaction Unit Model
• Scoring Function for Reactions
• Chemical Kinetics Simulation
[Figures: orbital labels nN, π*C-O, nO, σ*C-Cl; Relative Energy axis with orbital types σ*, π*, p, n, π, σ; energy diagram marking ΔG‡ and ΔG along the Reaction Coordinate]
Core Reaction Unit Model
• Bond-rearrangement patterns are the most typical choice.
• These represent only the overall “symptom” of the reaction, not the underlying mechanistic steps.
• Many such patterns must be “memorized” to get decent coverage.
Sn2 Substitution
[CX4H2:1][Br:2]>>[C:1]O
Acyl Substitution (Saponification)
[O:2]=[C:1][OH0:3]>>[O:2]=[C:1][O-].[O-:3]
Robinson Annulation
[*:3][C:2]1[C:11][C:10][C:9][C:8][C:1]1=[O:20].[C:5][C:4](=[O:12])[C:6]=[C:7]>>[*:3][C:2]12[C:11][C:10][C:9][C:8][C:1]1=[C:5][C:4](=[O:12])[C:6][C:7]2
Core Reaction Unit Model
Molecular Orbital Interactions as Elementary Reaction Steps
[Figure: orbital types σ*, π*, p, n, π, σ; example interactions shown in More Favorable and Less Favorable columns]
• nCl > pC
• nI > pC
• πC=C > σ*H-Br
• πC=C > σ*H-O
• σH-B > π*C=O (ketone)
• σH-B > π*C=O (amide)
Scoring Function for Reactions
Purpose
• Identify favorable reaction steps
• Ideally predicts transition state activation energies (ΔG‡)
Statistical Machine Learning
• Limited quantitative data available
• Inspiration from the problem-solving abilities of human experts
• Use qualitative knowledge of reactivity trends as a major training data source
[Figure: Relative Energy vs. Reaction Coordinate, marking ΔG‡ and ΔG]
C. A. Azencott, M. A. Kayala, P. Baldi, “Learning Scoring Functions for Chemical Expert Systems”
Chemical Kinetics Simulation
Law of Mass Action Simulation
• Results depend on reactivity scores and concentrations
• Reversible reactions driven by Le Chatelier’s principle
• Catalytic quantities of highly reactive species
Eyring-Evans-Polanyi Equation
• Principled conversion of ΔG‡ to reaction rate constant k with temperature dependence
• Theory only applies for elementary reaction steps
Discrete simulation approximation to bootstrap off incomplete information
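As a rough illustration of how these two pieces fit together, the sketch below converts an activation free energy into a rate constant with the standard Eyring form, k = (k_B T / h) exp(-ΔG‡ / RT), and then integrates a single reversible step A + B ⇌ C under the law of mass action. The barrier heights (80 and 85 kJ/mol), the starting concentrations, the unit transmission coefficient, and the explicit Euler integrator are all assumptions chosen for illustration, not values or methods taken from the simulator itself.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(dg_act_kj_mol, temp_k=298.15):
    """Rate constant from an activation free energy (transmission coefficient = 1)."""
    return (K_B * temp_k / H) * math.exp(-dg_act_kj_mol * 1000.0 / (R * temp_k))


def simulate(kf, kr, a, b, c, dt=0.1, steps=4000):
    """Explicit Euler integration of A + B <=> C under the law of mass action."""
    for _ in range(steps):
        net = kf * a * b - kr * c  # forward rate minus reverse rate
        a -= net * dt
        b -= net * dt
        c += net * dt
    return a, b, c


# Hypothetical barriers: forward 80 kJ/mol, reverse 85 kJ/mol.
kf = eyring_rate(80.0)
kr = eyring_rate(85.0)
a, b, c = simulate(kf, kr, a=1.0, b=1.0, c=0.0)
print(f"kf={kf:.4g}  kr={kr:.4g}")
print(f"[A]={a:.4f}  [B]={b:.4f}  [C]={c:.4f}")
print(f"[C]/([A][B])={c / (a * b):.4f}  kf/kr={kf / kr:.4f}")
```

Run long enough, the concentration ratio settles at kf/kr, which is the thermodynamic limit; stopping early gives the kinetically controlled mixture. Raising `temp_k` increases both rate constants, which is the temperature dependence the Eyring form provides.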
Reaction Simulator [DEMO]
Simple enolate deprotonation
• No other input but starting materials; self-perception of reactive sites and combinations
• Kinetic vs. thermodynamic simulator controls
Complex example evolving over time
Trace full reaction pathway to justify the prediction, including energy diagram
Summary Comparison
Transformation Rules
• Immediately useful
• Predictions within seconds
• Only covers what has been programmed into it
• Only provides information on major product(s)
General Principles
• Development and results optimization ongoing
• Longer simulation times (minutes)
• Greater potential for generality and discovery
• Kinetics simulations provide information on major and minor pathways
Acknowledgements
NIH/NLM Biomedical Informatics Training Grant
UCI Medical Scientist Training Program
Orange County ARCS® Foundation
http://cdb.ics.uci.edu
Prof. Pierre Baldi (ICS)
Prof. Elizabeth Jarvo (Chem)
Dr. Susan King (Chem)
Prof. Greg Weiss (Chem)
Prof. David Van Vranken (Chem)
Prof. James Nowick (Chem)
Students
• Chloe Azencott
• Matt Kayala
• Paul Rigor
UCI Students
Course Instructors
• Prof. Suzanne Blum
• Prof. Zhibin Guan
• Prof. Larry Overman
• Prof. Ken Shea
• Dr. Mare Taagepera
• Prof. Chris Vanderwal
Academic Software
• OpenEye Software
• ChemAxon Software
• Peter Ertl, Novartis (JME Editor)
Extend Orbital Chaining
• Interactions between a small set of fundamental orbital types dominate organic reactivity
• Higher order interactions can be composed by chaining fundamental units together
[Schemes: two chained interactions]
• nO > πC=C > π*C=O
• nN > σ*H-C > σ*C-Br
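One way to read the chain notation is as a sequence of pairwise units, one per ">" in the written chain. The sketch below splits a chain string into those adjacent pairs and checks that it starts from a filled orbital (n, π, σ) and ends at an unfilled one (p, π*, σ*). The helper names and the start/end check are hypothetical conventions of this sketch, not part of the system described on the slides.

```python
FILLED_TYPES = ("n", "π", "σ")      # electron sources
UNFILLED_TYPES = ("σ*", "π*", "p")  # electron sinks


def orbital_type(label):
    """Return the orbital type prefix of a label such as 'π*C=O' or 'nO'."""
    # Starred prefixes are checked first so 'π*C=O' is not read as filled 'π'.
    for prefix in ("σ*", "π*", "n", "p", "π", "σ"):
        if label.startswith(prefix):
            return prefix
    raise ValueError(f"unrecognized orbital label: {label!r}")


def is_filled(label):
    return orbital_type(label) in FILLED_TYPES


def parse_chain(chain):
    """Split 'a > b > c' into a list of orbital labels."""
    orbitals = [part.strip() for part in chain.split(">")]
    if len(orbitals) < 2 or not all(orbitals):
        raise ValueError(f"a chain needs at least two orbitals: {chain!r}")
    return orbitals


def chain_steps(chain):
    """Adjacent (source, sink) pairs, one per '>' in the written chain."""
    orbitals = parse_chain(chain)
    return list(zip(orbitals, orbitals[1:]))


def starts_filled_ends_unfilled(chain):
    orbitals = parse_chain(chain)
    return is_filled(orbitals[0]) and not is_filled(orbitals[-1])


for chain in ("nO > πC=C > π*C=O", "nN > σ*H-C > σ*C-Br"):
    print(chain_steps(chain), starts_filled_ends_unfilled(chain))
```

Keeping the fundamental set small means only `orbital_type` needs to know about orbital classes; longer chains are handled by the same pairwise split.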
Need for Reactivity Prediction
• Retro-synthetic analysis usually only suggests precursors, but does not account for unintended reactivity
• Existing systems may use exclusion rules
• Best to reproduce forward sequence of suggested reactions to ensure reliability
Motivation
Total Synthesis of important drugs and chemicals
• Morphine: Pain Medication (source: Opium Poppies)
• Taxol: Anti-cancer (source: Yew Tree Sap)
• Penicillin G: Antibiotic (source: Fungus)
Chemical Modification to optimize lead compounds
• Andrimid: Anti-Tuberculosis Lead Compound
Goal / Hypothesis:
Can a computer expert system reproduce the core problem-solving capabilities needed of human chemists?
SMILES Extensions
Atom Mapping
• Necessary to map reactant to product atoms
• Proper transform requires balanced stoichiometry
• Hydrogens generally must be explicitly specified
Carboxylic acid + Primary amine → Amide + Water
[Scheme: atom-numbered structures with substituents R1 and R2; the atom numbers correspond to the map indices in the SMIRKS]
[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
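The mapping requirement can be checked mechanically. The sketch below uses only a regular expression to collect the atom map numbers on each side of a SMIRKS string and report any that appear on one side only. It is a text-level check on map indices, not a chemical validation: unmapped atoms are ignored, and the function and constant names are hypothetical. The second test string is the Sn2 bond-rearrangement pattern from the Core Reaction Unit Model slide.

```python
import re
from collections import Counter

# Matches the ':N]' that closes a mapped atom such as [C:2] or [Cl,Br,I;-:4].
MAP_INDEX = re.compile(r":(\d+)\]")

AMIDE_SMIRKS = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)
SN2_SMIRKS = "[CX4H2:1][Br:2]>>[C:1]O"


def atom_maps(side):
    """Count each atom map number appearing on one side of a transform."""
    return Counter(int(n) for n in MAP_INDEX.findall(side))


def mapping_report(smirks):
    """Compare map numbers on the reactant and product sides of a SMIRKS."""
    reactants, products = smirks.replace(" ", "").split(">>")
    r_maps, p_maps = atom_maps(reactants), atom_maps(products)
    return {
        "balanced": r_maps == p_maps,
        "lost": sorted((r_maps - p_maps).elements()),    # mapped only in reactants
        "gained": sorted((p_maps - r_maps).elements()),  # mapped only in products
    }


print(mapping_report(AMIDE_SMIRKS))
print(mapping_report(SN2_SMIRKS))
```

The amide transform maps every reactant atom, including the explicit hydrogens, onto a product atom. The Sn2 pattern drops atom 2 (the bromine) and introduces an unmapped oxygen, so it records the overall change without accounting for every atom.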
Molecular Orbital List
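[Figure: three example structures, each annotated with its filled and unfilled orbitals]
Example 1
• Filled: sp2 O, π CO, …
• Unfilled: π* CO, σ* HC π* CO, …
Example 2
• Filled: sp3 O, σ CO, …
• Unfilled: σ* HO, σ* CO, …
Example 3
• Filled: sp3 O, sp2 O, …
• Unfilled: π* SO, σ* HO π* SO, …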
Outline
Motivation
Rules-based Predictor Capabilities
Qualitative Knowledge vs. Quantitative Data
Simulations
Chaining: Retain simple set of fundamental orbitals, then just compose for higher order
Scoring Interactions
• Complex example evolving over time
• Kinetic vs. thermodynamic example to illustrate simulation controls
Fundamental Reaction Unit Model
• Predictive general reagents (NaOH), with mechanism explanations
• Synthesis workspace (Tylenol #94)
Principle-based Functional Demo Intent
Need for reactivity prediction
Chemical Kinetics
• Discrete model for bootstrapping from incomplete starting information
Rules vs. Principles
Ongoing Work
Parameter development for more reactivity classes