The “Wired” Universe of Organic Chemistry
Bartosz Grzybowski Chemistry & Chemical and Biological Engineering Northwestern University Evanston, IL
Chemistry is What? 1783 Lavoisier discovers the law of mass conservation, marking the inception of “modern” chemistry.
TODAY Chemistry has grown and evolved to include ~7 million known/published substances and ~8 million known reactions.
QUESTIONS? • Are there LAWS that govern the structure and evolution of Chemistry at large? • How can we find and apply such laws?
Translating Chemistry into a Network (example node: k_in = 2, k_out = 1)
• Compounds (6,957,807) → Nodes
• Reactions (7,539,158) → Directed Edges
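The mapping above (compounds as nodes, reactions as directed edges) can be sketched in a few lines. This is only an illustration on a made-up toy data set: the compound labels and the `degree_counts` helper are hypothetical, and the sketch assumes each reaction contributes one edge from every substrate to every product.

```python
from collections import defaultdict

# Hypothetical toy reactions as (substrates, products); not taken from the database.
reactions = [
    (("A", "B"), ("C",)),
    (("C",), ("D",)),
    (("A",), ("D",)),
]

def degree_counts(reactions):
    """Return (k_in, k_out, edges) for the substrate -> product network."""
    edges = set()
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                edges.add((s, p))  # one directed edge per substrate/product pair
    k_in = defaultdict(int)
    k_out = defaultdict(int)
    for s, p in edges:
        k_out[s] += 1  # s is used to make p
        k_in[p] += 1   # p is made from s
    return dict(k_in), dict(k_out), edges

k_in, k_out, edges = degree_counts(reactions)
for compound in sorted({c for edge in edges for c in edge}):
    print(compound, "k_in =", k_in.get(compound, 0), "k_out =", k_out.get(compound, 0))
```

In this toy network compound C ends up with k_in = 2 and k_out = 1, the same kind of node the slide labels.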
Chemistry is a Network
Chemistry is a LARGE Network
Chemical reaction network of organic chemistry (~ 10 million substances, ~ 6.5 million reactions and counting …). Fiałkowski, M., et al., (2005) Angew. Chem. Int. Ed. 44(44)
Human metabolic network (1,496 ORFs, 2,004 proteins, 2,766 metabolites, and 3,311 metabolic and transport reactions). Duarte, N.C. et al., (2007) PNAS 104(6); Patil, K.R., Tune, P., (2007) BioZoom 2
1835: 176 compounds
1850: 867 compounds
2006: ~10^7 compounds
Order?
M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
Our Database (ALL published chemical knowledge) (example node: k_in = 2, k_out = 1)
Network Evolution I
• Chemistry has the same “Scale-Free” topology as the WWW: $\rho(k) \propto k^{-\nu}$
• Chemistry evolves via a “Preferential Attachment” growth mechanism
M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
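One common way to check a scale-free claim is to estimate the exponent ν from a list of node degrees. The sketch below uses the standard continuous maximum-likelihood estimator, ν ≈ 1 + n / Σ ln(k_i / k_min). This is a swapped-in textbook technique, not necessarily the fitting procedure used in the cited paper. The sample data are synthetic, and the function names are hypothetical.

```python
import math
import random

def sample_power_law(n, nu, k_min=1.0, seed=0):
    """Draw n synthetic values with density proportional to k**(-nu) for k >= k_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling of a continuous power law.
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (nu - 1.0)) for _ in range(n)]

def estimate_exponent(ks, k_min=1.0):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [k for k in ks if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)

ks = sample_power_law(20000, nu=2.5)
print("estimated exponent:", round(estimate_exponent(ks), 3))
```

Real degree data are integers, so treating them as continuous is an approximation that works best for large k_min.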
Network Evolution II
• “Scale-Free”: $\rho(k) \propto k^{-\nu}$
• Fractal
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
Network Evolution III • Preferential Attachment = “The rich get even richer, the poor stay poor”
M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
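The “rich get richer” rule can be illustrated with a minimal growth simulation. This is a sketch under simplifying assumptions: the network is undirected, each new node makes exactly one connection, and the probability of choosing an existing node is proportional to its current degree. The function names are hypothetical, and the real chemical network is directed.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network where each new node links to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # every node appears once per edge it belongs to
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)  # degree-weighted choice
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

def degrees(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

degree = degrees(grow_network(5000, seed=1))
print("largest degree:", max(degree.values()))
print("nodes with degree 1:", sum(1 for d in degree.values() if d == 1))
```

Most nodes stay at degree 1 while a few early nodes collect many connections, which is the qualitative picture the slide describes.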
How It Works Chemistry is a molecular LEGO and some patterns simply must repeat!
• Part 2: Applications
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
Optimizing Chemical Synthesis How to produce this molecule as efficiently as possible?
Desired Product
How to do so for any of >20 million organic molecules?
For an average molecule…
• In one step: 3 different ways to make the Desired Product, X, from possible substrates
• In two steps: 48 different ways to make Product X
• In three steps: 7,203 different ways to make Product X
Only one of these pathways is the cheapest… but how to find it?
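Finding the cheapest pathway without listing every route can be sketched as a memoized recursion: the cost of a compound is the lower of its purchase price and the cheapest reaction that makes it. Everything below is a hypothetical toy (compound labels, prices, reaction costs, and the `cheapest` function), and the sketch assumes the toy network has no cycles. It is not the authors' algorithm.

```python
import math
from functools import lru_cache

# Hypothetical purchase prices for commercially available compounds.
PRICES = {"A": 5.0, "B": 1.0, "C": 2.0, "D": 30.0}

# Hypothetical reactions: product -> list of (substrates, cost of running the reaction).
REACTIONS = {
    "X": [(("D",), 1.0), (("E", "C"), 2.0), (("F",), 1.0)],  # 3 one-step routes to X
    "E": [(("A",), 1.0), (("B", "C"), 1.5)],
    "F": [(("A", "B"), 10.0)],
}

@lru_cache(maxsize=None)
def cheapest(compound):
    """Lowest cost of obtaining `compound`, by purchase or by synthesis."""
    best = PRICES.get(compound, math.inf)  # inf if it cannot be bought
    for substrates, rxn_cost in REACTIONS.get(compound, []):
        best = min(best, rxn_cost + sum(cheapest(s) for s in substrates))
    return best

print("cheapest cost of X:", cheapest("X"))
```

Each compound is evaluated once, so the work grows with the size of the network rather than with the number of pathways. A substrate used in two branches is paid for twice here, which a real cost model may want to handle differently.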
One (1) Product
[Figure: number of reaction pathways (log scale, 1 to 10^18) versus number of reaction steps (1 to 5). Reference magnitudes marked on the vertical axis: World Population, WWW Pages, U.S. Debt, Age of the Universe (sec).]
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
Ten (10) Products
[Figure: number of reaction pathways (log scale, 10^0 to 10^180) versus number of reaction steps (1 to 5), with one curve for 1 molecule and one for 10 molecules. Reference magnitude marked: Size of the Universe (m).]
Optimal Substrates for a Small Chemical Company (www.prochimia.com)
~200 thiols, disulfides, silanes for SAMs Q: How to optimize substrates for all syntheses simultaneously?
ProChimia’s Optimized Steiner Tree: $\mathrm{Cost} = \left(\alpha N_{\mathrm{rxn}} + \sum_i \$_{\mathrm{substr},\,i}\right) / N_{\mathrm{products}}$
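The cost function above can be evaluated for a candidate set of reactions as follows. This is a minimal sketch: the reactions, prices, and the `plan_cost` helper are hypothetical, and it assumes that any substrate not produced by a chosen reaction must be purchased. It only scores a given plan and does not search for the optimal tree.

```python
def plan_cost(reactions, targets, prices, alpha):
    """Cost = (alpha * N_rxn + sum of purchased substrate prices) / N_products."""
    produced = {product for _, product in reactions}
    needed = {s for substrates, _ in reactions for s in substrates}
    missing = set(targets) - produced
    if missing:
        raise ValueError(f"plan does not make: {sorted(missing)}")
    purchased = needed - produced  # substrates nobody in the plan synthesizes
    total = alpha * len(reactions) + sum(prices[s] for s in purchased)
    return total / len(targets)

# Hypothetical plan: intermediate I is made once and shared by both products.
reactions = [
    (("S1",), "I"),
    (("I",), "P1"),
    (("I", "S2"), "P2"),
]
prices = {"S1": 4.0, "S2": 2.0}

print(plan_cost(reactions, targets=["P1", "P2"], prices=prices, alpha=0.5))
```

Sharing the intermediate means its substrate is bought once for both products, which is where a tree over the whole catalog can beat planning each synthesis separately.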
Chemist vs. Machine…
[Figure: two four-step routes to the same product, one labeled “Prochimia®” and one labeled “From Steiner Tree”. Reagents shown: NaI, acetone; CH3COSH; MsCl, NEt3; NH(Boc)2, K2CO3; HCl + MeOH.]
• One route: 1st step $3.73, 2nd step $0.29, 3rd step $43.87, 4th step $6.19. Total cost: $54.08 / 1 g product
• Other route: 1st step $17.77, 2nd step $0.99, 3rd step $0.46, 4th step $6.19. Total cost: $25.41 / 1 g product
Chemical Network Analysis for Terrorist Risk Assessment
•The 1995 terrorist attack in the Tokyo subway was carried out with sarin synthesized by cult members using common and unregulated precursors obtained through a network of front companies. -Environmental Health Perspectives (EHP) “ehp.niehs.nih.gov”
[Legend: dangerous substance; precursor]
Are we well protected? (DHS, DTRA, EPA, OSHA, FBI, CIA, Army…)
Trust DHS – A bad joke (1)? Sarin
Tabun
Ethanol
Isopropyl alcohol
Na+
Methyl acetate
Phosphorus trichloride
Tetramethyl phosphorodiamidic acid chloride
Sodium salt
Tribenzylamine
Trust DHS – A bad joke (2)? PCP
TNT
N2O
Phenylmagnesium bromide
1-methyl piperidine
Carbonic acid
Nitrous oxide
Methylamine
Benzaldehyde
… and something REALLY scary
VX Nerve Gas
[Screenshot: terroristSciFinder interface, with panels labeled MenuBar, GRAPH, dangerous chemicals, List of minimal sets, Selected target’s information, Selected element’s information, Selected set’s elements, and Search measures.]
Software (terroristSciFinder) Software that identifies and ranks minimal sets of reactants/precursors to Chemical Weapons via Network-Topological Measures
[Figure: minimal sets 1, 2, and 3 shown as sorted sets, ranked from MORE DANGEROUS to less dangerous. Legend: dangerous substance; substance; reaction.]
Rational Discovery of Chemical Systems
Reactions at the Same Time and Same Place
Network of Chemistry is not (YET) a Chemical System
Why is that important? Discovery of One-Pot/Tandem Reactions is the Holy Grail of Modern Chemistry
1. Reduction of synthetic costs by ca. 80%
2. Reduction in chemical waste
3. Reduction in time and labor
Algorithmic search for “one-pot” reactions
[Figure: three panels. Abstract chemical reaction network (~ 10 million substances, ~ 6.5 million reactions); network subset; “one-pot” motif connecting substances A, B, and C. Legend: substance; reaction.]
Network search for “one-pot” reactions
Connectivity criteria: [Figure: motif connecting substances A, B, and C]
Functional group compatibility rules (1 = incompatible, 0 = compatible):

            25 -COCl   26 -OH   27 -CHO
12 -NH2        1          0        1
13 -CH3        0          0        0
14 -RX         0          1        0
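A compatibility check of this kind can be sketched as a lookup over pairs of functional groups. The excerpt below assumes the flattened matrix reads with -NH2, -CH3, -RX as rows and -COCl, -OH, -CHO as columns. That orientation is an inference from the slide text, and the function names are hypothetical. Pairs that the excerpt does not list are treated as compatible here, which a full rule table would not do.

```python
from itertools import combinations

# Assumed orientation of the slide's matrix excerpt (1 = incompatible, 0 = compatible).
ROWS = ["-NH2", "-CH3", "-RX"]
COLS = ["-COCl", "-OH", "-CHO"]
MATRIX = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]

INCOMPATIBLE = {
    frozenset((row, col))
    for i, row in enumerate(ROWS)
    for j, col in enumerate(COLS)
    if MATRIX[i][j] == 1
}

def clashes(groups):
    """Return the incompatible pairs among the given functional groups."""
    unique = sorted(set(groups))
    return [(a, b) for a, b in combinations(unique, 2)
            if frozenset((a, b)) in INCOMPATIBLE]

def one_pot_candidate(steps):
    """True if no two groups present across all steps are marked incompatible."""
    return not clashes([g for step in steps for g in step])

print(one_pot_candidate([["-NH2", "-OH"], ["-CH3"]]))
print(clashes(["-NH2", "-CHO", "-OH"]))
```

Passing this check is only a necessary condition in this sketch. The connectivity criteria on the slide would be applied separately to find reaction sequences worth testing.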
IT WORKS! Some examples of “One-Pot” Reactions
Rewiring Chemistry: Networks of Confirmed Tandem Reactions
Pyridine Network
Quinoline Network
PI3Kδ inhibitor Network
red arrow = two step tandem reaction purple arrow = three step tandem reaction
Rewiring and Optimizing Pharmaceutical Synthesis
Inhibitors of PI3K proteins for long-term asthma treatment
PI3Kδ Inhibitor Chemical Network
red arrow = two step tandem reaction purple arrow = three step tandem reaction
Summary and Outlook
• For the first time in history we can analyze and learn from ALL chemical knowledge
• CHEMICAL SYSTEMS are the next REVOLUTION in chemistry
• This business is going to be worth BILLIONS of dollars
The Networks Team
Prof. Kyle Bishop (PSU)
Dr. Christopher Gothard
Prof. Rafal Klajn (Weizmann)
Ms. Nosheen Gothard
Dr. Siowling Soh (Harvard)
Mr. Patrick Fuller
Funding: US Department of Energy, US Army
Questions?
M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
K.J.M. Bishop et al. “The Core and Most Useful Molecules in Organic Chemistry.” Angew. Chem. Int. Ed. 45, 5348 (2006).
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
B.A. Grzybowski et al. US Pat. Appl. #US 2010/0225650 A1
The Grzybowski Group Self-Assembly and Chemical Systems Materials & Nano-Assembly
New Nanoscale Phenomena
Dynamic Nano-Catalysis
Chemical Networks & Tandem Reactions
Chemical Systems
Biological Systems
www.dysa.northwestern.edu
Currently under way
• Collected ALL molecular structures
• Have algorithm to decompose molecules into functional groups
SOME QUESTIONS:
• Which groups “travel” together?
• Which are mutually exclusive (protection chemistries?)
• What are the “objective” retrosynthetic strategies?
The two types of “optimal” chemical companies
χ = average cost of labor/substrates per rxn.
Catalog Essentials for a Large Chemical Company
QUESTION: How might a chemical company optimize its product line so that the compounds it sells allow a maximal number of other chemicals to be made?
ANSWER: Combine knowledge of the core and periphery with Monte Carlo (MC) optimization.
Maximize “Usefulness”: $U = \left(N_{\max} - N(M)\right) / N_{\max}$
The Optimal 300 • The “optimal” set of M = 300 molecules can be used to synthesize 1,200,000 compounds within 7 synthetic steps
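A selection of this kind can be sketched as a random-swap search over candidate catalogs. This is a simplified stand-in under stated assumptions: the objective is the number of compounds reachable within a fixed number of steps, not the slide's U (N(M) is not defined in this text). The search is zero-temperature Monte Carlo that accepts only non-worsening swaps, it needs m to be smaller than the number of candidates, and the toy reactions and function names are hypothetical.

```python
import random

def reachable(start, reactions, max_steps):
    """Compounds obtainable from `start` in at most `max_steps` reaction rounds."""
    have = set(start)
    for _ in range(max_steps):
        new = {p for subs, p in reactions
               if p not in have and all(s in have for s in subs)}
        if not new:
            break
        have |= new
    return have

def monte_carlo_select(candidates, reactions, m, max_steps, n_iter=200, seed=0):
    """Swap one catalog member at a time, keeping swaps that do not reduce reach."""
    rng = random.Random(seed)
    current = set(rng.sample(sorted(candidates), m))
    score = len(reachable(current, reactions, max_steps))
    for _ in range(n_iter):
        out = rng.choice(sorted(current))
        inn = rng.choice(sorted(candidates - current))
        trial = (current - {out}) | {inn}
        trial_score = len(reachable(trial, reactions, max_steps))
        if trial_score >= score:
            current, score = trial, trial_score
    return current, score

# Hypothetical toy network: (substrates, product).
reactions = [(("A", "B"), "E"), (("E",), "F"), (("C",), "G")]
best, score = monte_carlo_select({"A", "B", "C", "D"}, reactions, m=2, max_steps=2)
print(sorted(best), score)
```

A production version would likely accept some worsening swaps with a temperature schedule to escape local optima. The greedy rule is used here only to keep the sketch short.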
[Figure: reaction scheme connecting catalog compounds priced at $26.20, $54.20, and $62.60. Reagents shown: HBr, acetic acid (glacial); thiourea, NaOH (aq); LiAlH4, Et2O; KI3, CHCl3; I2, methanol.]
Dots + Arrows = Savings
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
Cost vs. Connectivity
Cost ∝ 1 / k
Optimal Substrates for a Small Chemical Company (www.prochimia.com)
~200 thiols, disulfides, silanes for SAMs Q: How to optimize substrates for all syntheses simultaneously?
ProChimia’s Optimized Steiner Tree: $\mathrm{Cost} = \left(\alpha N_{\mathrm{rxn}} + \sum_i k_i^{-1/2}\right) / N_{\mathrm{products}}$, with α = 0.1 (labor is still cheap in Poland…)
I need to talk to our chemists…
[Figure: the same two four-step routes shown under “Chemist vs. Machine…”, labeled “Prochimia®” and “From Steiner Tree”. Total cost: $54.08 / 1 g product for one route versus $25.41 / 1 g product for the other.]
If ProChimia moves to the US: $\mathrm{Cost} = \left(\alpha N_{\mathrm{rxn}} + \sum_i k_i^{-1/2}\right) / N_{\mathrm{products}}$, with α = 1 (labor is expensive)
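The effect of the labor weight α can be seen by scoring two competing plans with the connectivity-based cost, in which each substrate contributes k^(-1/2). The plans and their numbers below are hypothetical, chosen only to show the trade-off: one plan uses few reactions but poorly connected substrates, the other uses many reactions and highly connected substrates.

```python
def connectivity_cost(n_rxn, substrate_degrees, n_products, alpha):
    """Cost = (alpha * N_rxn + sum_i k_i**(-1/2)) / N_products."""
    substrates = sum(k ** -0.5 for k in substrate_degrees)
    return (alpha * n_rxn + substrates) / n_products

# Hypothetical plans making the same 2 products.
PLANS = {
    "short": {"n_rxn": 2, "substrate_degrees": [4, 4]},                 # few steps, rare substrates
    "long": {"n_rxn": 6, "substrate_degrees": [100, 100, 100, 100]},    # many steps, common substrates
}

def best_plan(alpha, n_products=2):
    """Name of the plan with the lowest cost for a given labor weight."""
    return min(PLANS, key=lambda name: connectivity_cost(
        PLANS[name]["n_rxn"], PLANS[name]["substrate_degrees"], n_products, alpha))

for alpha in (0.1, 1.0):
    print("alpha =", alpha, "-> cheapest plan:", best_plan(alpha))
```

With cheap labor the longer route built from common substrates wins. With expensive labor the shorter route wins despite its costlier substrates.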