The “Wired” Universe of Organic Chemistry

Bartosz Grzybowski Chemistry & Chemical and Biological Engineering Northwestern University Evanston, IL

Chemistry is What? 1783 Lavoisier discovers the law of mass conservation, marking the inception of “modern” chemistry.

TODAY Chemistry has grown and evolved to include ~7 million known/published substances and ~8 million known reactions.

QUESTIONS? • Are there LAWS that govern the structure and evolution of Chemistry at large? • How can we find and apply such laws?

Translating Chemistry into a Network kin=2, kout=1

• Compounds (6,957,807) → Nodes • Reactions (7,539,158) → Directed Edges
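The mapping from compounds and reactions to nodes and directed edges can be sketched in a few lines. This is a minimal illustration on hypothetical toy data (the compound names and the one-substrate, one-product record format are assumptions, not the database schema used in the talk). It counts the incoming and outgoing connections, k_in and k_out, of each compound.

```python
from collections import defaultdict

# Toy reactions (hypothetical data): each maps substrates to products.
reactions = [
    ({"A"}, {"C"}),   # A -> C
    ({"B"}, {"C"}),   # B -> C
    ({"C"}, {"E"}),   # C -> E
]

def build_network(reactions):
    """Return the set of directed edges (substrate, product)."""
    edges = set()
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                edges.add((s, p))
    return edges

def degrees(edges):
    """Count incoming (k_in) and outgoing (k_out) edges per compound."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    for s, p in edges:
        k_out[s] += 1
        k_in[p] += 1
    return k_in, k_out

edges = build_network(reactions)
k_in, k_out = degrees(edges)
print(k_in["C"], k_out["C"])
```

In this toy network compound C is made in two ways and used in one, so it is a node with kin=2, kout=1.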

Chemistry is a Network

Chemistry is a LARGE Network

Chemical reaction network of organic chemistry (~ 10 million substances, ~ 6.5 million reactions and counting …). Fialkowski, M., et al., (2005) Angew. Chemie. Intl. Ed. 44(44)

Human metabolic network (1,496 ORFs, 2,004 proteins, 2,766 metabolites, and 3,311 metabolic and transport reactions). Duarte, N.C. et al., (2007) PNAS 104(6); Patil, K.R., Tune, P., (2007) BioZoom 2

1835: 176 compounds

1850: 867 compounds

2006: ~10^7 compounds

Order?

M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).

Our Database (ALL published chemical knowledge) kin=2, kout=1

Network Evolution I • Chemistry has “Scale-Free” topology identical to the WWW: $\rho(k) = k^{-\nu}$ • Chemistry evolves via “Preferential Attachment” growth mechanism

M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).

Network Evolution II • “Scale-Free” $\rho(k) = k^{-\nu}$ • Fractal

B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry” Nature Chemistry 1, 31-36 (2009).

Network Evolution III • Preferential Attachment = “The rich get even richer, the poor stay poor”

M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
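The “rich get even richer” rule can be illustrated with a generic preferential-attachment simulation. This is a textbook-style sketch, not the analysis from the cited paper. The function name, the one-link-per-new-node rule, and the network size are assumptions made for illustration.

```python
import random

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time; each new node links to one
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    degree = [1, 1]       # start from two connected nodes
    endpoints = [0, 1]    # each node appears once per edge it touches
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)   # degree-proportional choice
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new])
    return degree

degree = grow_network(10_000)
print("largest degree:", max(degree))
print("nodes with a single link:", degree.count(1))
```

Sampling from the list of edge endpoints is a common shortcut: a node with k links appears k times in the list, so well-connected nodes are picked more often and a few hubs emerge while most nodes keep a single link.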

How It Works Chemistry is a molecular LEGO and some patterns simply must repeat!


• Part 2: Applications

B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry” Nature Chemistry 1, 31-36 (2009).

Optimizing Chemical Synthesis How to produce this molecule as efficiently as possible?

Desired Product

How to do so for any of >20 million organic molecules?

For an average molecule… In one step…3 different ways to make Product X

[Figure: desired product X connected to three possible substrates, labeled 1, 2, and 3]

In two steps…48 different ways to make Product X

Desired Product, X

In three steps…7,203 different ways to make Product X

Desired Product, X

This one’s the cheapest…but how to find it?

Desired Product, X
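One way to frame the search for the cheapest route is as a recursion: a compound costs either its purchase price or the cost of one reaction step plus the cheapest cost of each substrate. The sketch below is a minimal illustration on hypothetical toy data (compound names, prices, and step costs are all invented). It assumes the network has no cycles and is not the optimization method used by the authors.

```python
from functools import lru_cache

# Hypothetical toy data: purchasable substrates with prices, and
# reactions as (product, substrates, cost of running the step).
PRICES = {"s1": 5.0, "s2": 1.0, "s3": 2.0, "s4": 0.5}
REACTIONS = [
    ("X", ("i1",), 1.0),
    ("X", ("s1", "s2"), 3.0),
    ("i1", ("s3", "s4"), 1.0),
]

@lru_cache(maxsize=None)
def cheapest(compound):
    """Lowest total cost to obtain `compound` (assumes no cycles)."""
    best = PRICES.get(compound, float("inf"))  # buy it, if it is sold
    for product, substrates, step_cost in REACTIONS:
        if product == compound:
            total = step_cost + sum(cheapest(s) for s in substrates)
            best = min(best, total)
    return best

print(cheapest("X"))
```

Memoization keeps each compound from being re-evaluated, which matters because the number of distinct pathways grows very quickly with the number of steps. A shared intermediate is charged once per use in this simple version.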

[Figure: One (1) Product. Number of reaction pathways (log scale, 1 to 10^18) versus number of reaction steps (1 to 5). Reference levels marked: age of the Universe (sec), U.S. debt, WWW pages, world population.]

B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry” Nature Chemistry 1, 31-36 (2009).

[Figure: Ten (10) Products. Number of reaction pathways (log scale, 10^0 to 10^180) versus number of reaction steps (1 to 5), plotted for 1 molecule and for 10 molecules. Reference level marked: size of the Universe (m).]

Optimal Substrates for a Small Chemical Company (www.prochimia.com)

~200 thiols, disulfides, silanes for SAMs Q: How to optimize substrates for all syntheses simultaneously?

ProChimia’s Optimized Steiner Tree: $\text{Cost} = \left(\alpha N_{\text{rxn}} + \sum_i \$_{\text{substr},i}\right) / N_{\text{products}}$

Chemist vs. Machine…

[Scheme: two four-step routes to the same product, with the labels “Prochimia®” and “From Steiner Tree”. Reagents shown: NaI, acetone; CH3COSH; MsCl, NEt3; NH(Boc)2, K2CO3; HCl + MeOH.]

Route 1: 1st step $3.73, 2nd step $0.29, 3rd step $43.87, 4th step $6.19. Total cost: $54.08 / 1g product

Route 2: 1st step $17.77, 2nd step $0.99, 3rd step $0.46, 4th step $6.19. Total cost: $25.41 / 1g product

Chemical Network Analysis for Terrorist Risk Assessment

• The 1995 terrorist attack in the Tokyo subway was carried out with sarin synthesized by cult members using common and unregulated precursors obtained through a network of front companies. - Environmental Health Perspectives (EHP), ehp.niehs.nih.gov

[Legend: dangerous substance; precursor]

Are we well protected? (DHS, DTRA, EPA, OSHA, FBI, CIA, Army…)

Trust DHS – A bad joke (1)? Sarin

Tabun

Ethanol

Isopropyl alcohol

Na+

Methyl acetate

Phosphorus trichloride

Tetramethyl phosphorodiamidic acid chloride

Sodium salt

Tribenzylamine

Trust DHS – A bad joke (2)? PCP

TNT

N2O

Phenylmagnesium bromide

1-methyl piperidine

Carbonic acid

Nitrous oxide

Methylamine

Benzaldehyde

… and something REALLY scary

VX Nerve Gas

[Screenshot: terroristSciFinder interface. Panels: MenuBar, GRAPH (dangerous chemicals highlighted), list of minimal sets, selected target’s information, selected element’s information, selected set’s elements, search measures.]

Software (terroristSciFinder) Software that identifies and ranks minimal sets of reactants/precursors to Chemical Weapons via Network-Topological Measures

[Figure: minimal sets 1, 2, and 3, sorted from MORE DANGEROUS to less dangerous. Legend: dangerous substance; substance; reaction.]

Rational Discovery of Chemical Systems

Reactions at the Same Time and Same Place

Network of Chemistry is not (YET) a Chemical System

Why is that important? Discovery of One-Pot/Tandem Reactions: the Holy Grail of Modern Chemistry

1. Reduction of synthetic costs by ca. 80%
2. Reduction in chemical waste
3. Reduction in time and labor

Algorithmic search for “one-pot” reactions

[Figure: three panels. Abstract chemical reaction network (~ 10 million substances, ~ 6.5 million reactions); network subset; “one-pot” motif linking substances A, B, and C. Legend: substance; reaction.]

Network search for “one-pot” reactions

Connectivity criteria: [Figure: motif linking substances A, B, and C]

Functional group compatibility rules (1 = incompatible, 0 = compatible):

             25 -COCl   26 -OH   27 -CHO
12 -NH2         1          0        1
13 -CH3         0          0        0
14 -RX          0          1        0
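The two filters, connectivity and functional-group compatibility, can be combined in a short search. The sketch below is a simplified illustration on hypothetical toy data. It reads the 0/1 entries as rows -NH2, -CH3, -RX against columns -COCl, -OH, -CHO (an interpretation of the slide layout), and it assumes a sequence A → B → C is a candidate only if no two groups present on the three substances are flagged incompatible. The actual search criteria may differ.

```python
from itertools import combinations

# Hypothetical toy data: functional groups carried by each substance.
GROUPS = {"A": {"-NH2"}, "B": {"-CH3"}, "C": {"-OH"}, "D": {"-RX"}}
# Reactions as directed edges: substrate -> product.
EDGES = [("A", "B"), ("B", "C"), ("D", "B")]
# Unordered group pairs flagged incompatible (entry 1 in the matrix).
INCOMPATIBLE = {
    frozenset({"-NH2", "-COCl"}),
    frozenset({"-NH2", "-CHO"}),
    frozenset({"-RX", "-OH"}),
}

def compatible(substances):
    """True if no two groups on the given substances are incompatible."""
    groups = set().union(*(GROUPS[s] for s in substances))
    return all(frozenset(pair) not in INCOMPATIBLE
               for pair in combinations(groups, 2))

def one_pot_candidates():
    """Two-step sequences a -> b -> c that pass the compatibility check."""
    found = []
    for a, b in EDGES:
        for b2, c in EDGES:
            if b == b2 and compatible((a, b, c)):
                found.append((a, b, c))
    return found

print(one_pot_candidates())
```

Here the sequence starting from D is rejected because -RX and -OH are flagged incompatible, while the sequence starting from A passes.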

IT WORKS! Some examples of “One-Pot” Reactions


Rewiring Chemistry Networks of Confirmed Tandem Reactions Pyridine Network

Quinoline Network

PI3Kδ inhibitor Network

red arrow = two step tandem reaction purple arrow = three step tandem reaction

Rewiring and Optimizing Pharmaceutical Synthesis Inhibitors of PI3K proteins for long-term asthma treatment PI3Kδ Inhibitor Chemical Network

red arrow = two step tandem reaction purple arrow = three step tandem reaction

Summary and Outlook

• For the first time in history we can analyze and learn from ALL chemical knowledge
• CHEMICAL SYSTEMS are the next REVOLUTION in chemistry
• This business is going to be worth BILLIONS of dollars

The Networks Team

Prof. Kyle Bishop (PSU)

Dr. Christopher Gothard

Prof. Rafal Klajn (Weizmann)

Ms. Nosheen Gothard

Dr. Siowling Soh (Harvard)

Mr. Patrick Fuller

Funding: US Department of Energy, US Army

Questions?

M. Fiałkowski et al. “Architecture and Evolution of Organic Chemistry.” Angew. Chem. Int. Ed. 44, 7263 (2005).
K.J.M. Bishop et al. “The Core and Most Useful Molecules in Organic Chemistry.” Angew. Chem. Int. Ed. 45, 5348 (2006).
B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry.” Nature Chemistry 1, 31-36 (2009).
B.A. Grzybowski et al. US Pat. Appl. #US 2010/0225650 A1

The Grzybowski Group Self-Assembly and Chemical Systems Materials & Nano-Assembly

New Nanoscale Phenomena

Dynamic Nano-Catalysis

Chemical Networks & Tandem Reactions

Chemical Systems

Biological Systems


www.dysa.northwestern.edu

Currently under way

• Collected ALL molecular structures
• Have algorithm to decompose molecules into functional groups

SOME QUESTIONS:

• Which groups “travel” together?
• Which are mutually exclusive (protection chemistries?)
• What are the “objective” retrosynthetic strategies?
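The question of which groups “travel” together can be approached by counting how often pairs of functional groups appear in the same molecule. The sketch below assumes the decomposition into functional groups has already been done and uses hypothetical toy data. The molecule list and group labels are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: functional groups found in each molecule.
molecules = [
    {"-OH", "-NH2"},
    {"-OH", "-NH2", "-CH3"},
    {"-COCl", "-CH3"},
]

def cooccurrence(molecules):
    """Count how many molecules contain each pair of functional groups."""
    counts = Counter()
    for groups in molecules:
        # Sort so each pair is stored under one consistent key.
        for pair in combinations(sorted(groups), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence(molecules)
print(counts.most_common(1))
```

Pairs with high counts are candidates for groups that travel together. Pairs of common groups that never co-occur are candidates for mutually exclusive groups.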

The two types of “optimal” chemical companies

χ = average cost of labor/substrates per rxn.

Catalog Essentials for a Large Chemical Company

QUESTION: How might a chemical company optimize its product line so that the compounds it sells allow making a maximal number of other chemicals?

ANSWER: Combine knowledge of the core and periphery with Monte Carlo (MC) optimization.

Maximize “Usefulness” $U = (N_{\max} - N(M)) / N_{\max}$

The Optimal 300 • The “optimal” set of M = 300 molecules can be used to synthesize 1,200,000 compounds within 7 synthetic steps
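The idea of searching for a good catalog can be sketched as a simple stochastic search over candidate sets. This is a stand-in for the Monte Carlo optimization named on the slide, not a reproduction of it. The toy network, the scoring (number of compounds reachable within a fixed number of steps, rather than the usefulness U), and the accept-if-not-worse rule are all assumptions, and the core/periphery knowledge mentioned above is not modeled.

```python
import random

# Hypothetical toy network: product -> list of alternative substrate sets.
REACTIONS = {
    "p1": [("a", "b")],
    "p2": [("a",)],
    "p3": [("p1", "c")],
    "p4": [("d",)],
}
CATALOG = ["a", "b", "c", "d"]   # compounds the company could sell

def reachable(stock, max_steps):
    """Compounds that can be made from `stock` in at most `max_steps` layers."""
    have = set(stock)
    for _ in range(max_steps):
        new = {p for p, routes in REACTIONS.items()
               if any(all(s in have for s in r) for r in routes)}
        have |= new
    return have - set(stock)

def monte_carlo(m, max_steps, n_trials=200, seed=0):
    """Randomly swap catalog entries, keeping swaps that do not hurt."""
    rng = random.Random(seed)
    best = rng.sample(CATALOG, m)
    best_n = len(reachable(best, max_steps))
    for _ in range(n_trials):
        trial = list(best)
        trial[rng.randrange(m)] = rng.choice(CATALOG)  # swap one molecule
        n = len(reachable(trial, max_steps))
        if n >= best_n and len(set(trial)) == m:
            best, best_n = trial, n
    return set(best), best_n

best, best_n = monte_carlo(m=3, max_steps=2)
print(sorted(best), best_n)
```

On a network of realistic size the set of candidate catalogs is far too large to enumerate, which is the motivation for a randomized search of this kind.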

[Scheme: synthesis network of thiol and disulfide compounds. Reagents shown: HBr, acetic acid (glacial); thiourea, NaOH (aq); LiAlH4, Et2O; KI3, CHCl3; I2, methanol. Prices shown: $26.20, $54.20, $62.60.]

Dots + Arrows = Savings

B.A. Grzybowski et al. “The Wired Universe of Organic Chemistry” Nature Chemistry 1, 31-36 (2009).

Cost vs. Connectivity

Cost ∝ 1 / k

Optimal Substrates for a Small Chemical Company (www.prochimia.com)

~200 thiols, disulfides, silanes for SAMs Q: How to optimize substrates for all syntheses simultaneously?

ProChimia’s Optimized Steiner Tree: $\text{Cost} = \left(\alpha N_{\text{rxn}} + \sum_i k_i^{-1/2}\right) / N_{\text{products}}$, with $\alpha = 0.1$ (labor is still cheap in Poland…)

I need to talk to our chemists…

[Scheme: the same two four-step routes as on the “Chemist vs. Machine…” slide, with the labels “Prochimia®” and “From Steiner Tree”. Total costs: $54.08 / 1g product and $25.41 / 1g product.]

If ProChimia moves to the US: $\text{Cost} = \left(\alpha N_{\text{rxn}} + \sum_i k_i^{-1/2}\right) / N_{\text{products}}$, with $\alpha = 1$ (labor is expensive)
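The effect of the labor weight α on which synthesis tree is preferred can be shown with a small calculation. In this sketch k_i is read as the connectivity of substrate i (an interpretation based on the “Cost vs. Connectivity” slide), and the two candidate trees are hypothetical numbers chosen only to illustrate the trade-off.

```python
def tree_cost(alpha, n_rxn, connectivities, n_products):
    """(alpha * N_rxn + sum_i k_i**-0.5) / N_products."""
    substrate_term = sum(k ** -0.5 for k in connectivities)
    return (alpha * n_rxn + substrate_term) / n_products

# Two hypothetical candidate trees making the same 10 products.
few_steps = dict(n_rxn=12, connectivities=[4, 4, 4, 4, 1, 1, 1, 1],
                 n_products=10)
cheap_substrates = dict(n_rxn=20, connectivities=[100, 100, 100, 100],
                        n_products=10)

for alpha in (0.1, 1.0):
    a = tree_cost(alpha, **few_steps)
    b = tree_cost(alpha, **cheap_substrates)
    winner = "few_steps" if a < b else "cheap_substrates"
    print(f"alpha={alpha}: few_steps={a:.2f}, "
          f"cheap_substrates={b:.2f}, prefer {winner}")
```

With cheap labor the tree built from highly connected substrates is preferred even though it needs more reactions. With expensive labor the tree with fewer reactions is preferred.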