Combinatorics on words in DNA computing

Author: Annice Moore

. Starosta

17. & 20. 5.2011

Outline

1. DNA computing
   - Introduction
   - DNA molecule
   - The first DNA computation
   - Operations in DNA computing
   - DNA computing revisited
   - What to avoid in a test tube

2. Combinatorics on Words
   - Setting in CoW
   - Results
   - More results
   - If there is some time left...

DNA computing

Introduction

Barriers for traditional computers:

1. HUP (Heisenberg uncertainty principle)
2. von Neumann bottleneck

DNA/biomolecular computers: introduced by Leonard Adleman in 1994

DNA - basics

- DNA: deoxyribonucleic acid
- 1950s: DNA carries genetic information (double helix model)
- structure: polymer chains (strands) consisting of bases attached to a sugar-phosphate backbone
- 4 bases: A, G, T, C
- Watson-Crick complementarity: A pairs with T, C pairs with G
- bases are spaced 3.4 ångströms apart, resulting in 18 Mbits per inch

Structure of DNA

Watson-Crick complementarity and WK-palindromes

Hamiltonian Path Problem

Leonard Adleman in 1994: solved an instance of the Hamiltonian Path Problem (NP-complete) on a graph with 7 vertices
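To make the problem concrete, here is a minimal brute-force sketch in Python. It is only a software analogue of a generate-and-filter search, not a model of the laboratory protocol. The function name `hamiltonian_paths` and the 4-vertex graph are hypothetical and chosen for illustration; this is not Adleman's 7-vertex instance.

```python
from itertools import product

def hamiltonian_paths(vertices, edges, start, end):
    """Return all Hamiltonian paths from start to end by generate-and-filter."""
    edge_set = set(edges)
    n = len(vertices)
    paths = []
    # 1. Generate every vertex sequence of length n (a test tube would
    #    form candidates massively in parallel instead of looping).
    for candidate in product(vertices, repeat=n):
        # 2. Keep only candidates that begin at start and finish at end.
        if candidate[0] != start or candidate[-1] != end:
            continue
        # 3. Keep only candidates whose consecutive vertices share a directed edge.
        if any((a, b) not in edge_set for a, b in zip(candidate, candidate[1:])):
            continue
        # 4. Keep only candidates that visit every vertex exactly once.
        if len(set(candidate)) == n:
            paths.append(candidate)
    return paths

# Hypothetical directed graph on 4 vertices.
VERTICES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(hamiltonian_paths(VERTICES, EDGES, start=0, end=3))  # [(0, 1, 2, 3)]
```

The loop inspects n^n candidates, which illustrates why the search space, and with it the amount of DNA needed, grows so quickly with the number of vertices.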

Adleman's DNA computation I

Adleman's DNA computation II

A 200-node instance would require $10^{24}$ Earth masses of DNA

The likely frame

Manipulation with DNA

- synthesis
- denaturing, annealing and ligation
- separation: affinity purification
- detection: gel electrophoresis
- PCR: polymerase chain reaction
- cutting (using restriction enzymes)

Very general model of a DNA calculation

1. select the language - code
2. do the computation
3. decode

Advantages and drawbacks

Advantages:

- Size: the information density could go up to 1 bit per cubic nm
- High parallelism: $10^{9}$ calculations per ml of DNA per second
- Energy efficiency: $10^{19}$ operations per Joule

Drawbacks:

- required mass of DNA
- reagents
- need of manipulation by a human
- ...

What problems can be solved by a DNA computer

Solved problems reported in the literature:

- TSP: travelling salesman problem
- addition
- SAT: satisfiability problem
- DES cracking
- maximal clique problem
- ...

Intramolecular hybridization - hairpins

Intermolecular hybridization

Combinatorics on Words

Representing DNA strands

- DNA strands are considered in their 5′ → 3′ orientation as finite words over $\Delta = \{A, G, C, T\}$
- involutive antimorphism WK: A ↔ T, C ↔ G
- an encoding (i.e. initial set of words) for a computation is a language $L \subset \Delta^{*}$
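As an illustration, the Watson-Crick map can be written as a small Python function. This is a minimal sketch that is not part of the slides; the names `COMPLEMENT`, `theta` and `is_wk_palindrome` are chosen here for illustration. It checks on one example that the map is an involution and that it reverses concatenation (the antimorphism property), and it tests whether a word is a WK-palindrome, i.e. equal to its own image.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word: str) -> str:
    """Watson-Crick image: complement every base and reverse the word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))

def is_wk_palindrome(word: str) -> bool:
    """A word is a WK-palindrome when it equals its Watson-Crick image."""
    return word == theta(word)

u, v = "AACG", "TTG"
assert theta(theta(u)) == u                  # involution
assert theta(u + v) == theta(v) + theta(u)   # antimorphism: order is reversed

print(theta(u))                    # CGTT
print(is_wk_palindrome("GAATTC"))  # True
```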

Questions

How to choose the initial coding set L?

1. simulation
2. algorithm
3. theory

Bio-operations vs. L: after each bio-operation the language changes. Do the properties still hold?

Constraints on L

To avoid possible hybridizations, we impose several constraints on L:

- L is a Θ-k-m-subword code if $\forall u \in A^{k},\ 1 \le i \le m:\ A^{*} u A^{i} \Theta(u) A^{*} \cap L = \emptyset$
- L is a Θ-k-code if $\forall u, v \in A^{k} \cap L:\ u \neq \Theta(v)$
- L is bond-free if $\forall u, v \in A^{k} \cap L:\ H(u, \Theta(v)) > d$, where H is the Hamming distance
- the frequency of C and G in each word should be $\frac{1}{2}$
- ...
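The constraints above can be checked mechanically on a finite candidate set. The following Python sketch is one possible reading, with loudly stated assumptions: Θ is taken to be the Watson-Crick map, and u and v are taken to range over the length-k subwords of the words of L (the usual convention in the literature, which may differ from the exact notation on the slide). All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word):
    """Watson-Crick involution: complement every base and reverse."""
    return "".join(COMPLEMENT[b] for b in reversed(word))

def subwords(language, k):
    """All length-k subwords of the words in the language (assumed reading)."""
    return {w[i:i + k] for w in language for i in range(len(w) - k + 1)}

def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def is_theta_k_code(language, k):
    subs = subwords(language, k)
    return all(theta(v) not in subs for v in subs)

def is_bond_free(language, k, d):
    subs = subwords(language, k)
    return all(hamming(u, theta(v)) > d for u in subs for v in subs)

def is_theta_k_m_subword_code(language, k, m):
    """No word contains u, then 1..m letters, then theta(u), with |u| = k."""
    for w in language:
        for start in range(len(w) - k + 1):
            image = theta(w[start:start + k])
            for gap in range(1, m + 1):
                pos = start + k + gap
                if w[pos:pos + k] == image:
                    return False
    return True

def gc_fraction(word):
    """Fraction of positions holding C or G."""
    return sum(b in "CG" for b in word) / len(word)

print(is_theta_k_code(["AACC"], 2))                # True
print(is_theta_k_code(["ACGT"], 2))                # False: GT is the image of AC
print(is_bond_free(["AACC"], 2, 1))                # True
print(is_bond_free(["ACCT"], 2, 1))                # False: H(AC, theta(CT)) = 1
print(is_theta_k_m_subword_code(["ACAGT"], 2, 1))  # False: AC, gap A, GT
print(gc_fraction("AACC"))                         # 0.5
```

With d = 0 the bond-free test coincides with the Θ-k-code test, so the Hamming-distance version can be seen as a stricter variant that also rejects near-complementary subwords.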

Conjugates and Θ-conjugates

Proposition (Lyndon and Schützenberger, 1962)

Let $u, v, w \in A^{+}$ such that $uv = vw$. Then there exist $p, q \in A^{*}$ such that $u = pq$, $w = qp$ and $v = p(qp)^{i}$ for some $i \ge 0$.

Proposition (Kari and Mahalingam, 2007)

Let $u, v, w \in A^{+}$ such that $uv = \Theta(v)w$, where Θ is an involutive morphism or antimorphism over A.

1. If Θ is an antimorphism, then there exist $x, y \in A^{*}$ such that either $u = xy$, $w = y\Theta(x)$ and $v = \Theta(x)$; or $u = x = \Theta(w)$, $y = \Theta(y)$ and $v = yw$.
2. If Θ is a morphism, then there exist $x, y \in A^{*}$ such that $u = xy$ and one of the following holds:
   - $w = y\Theta(x)$ and $v = (\Theta(xy)xy)^{i}\Theta(x)$ for some $i \ge 0$;
   - $w = \Theta(y)x$ and $v = (\Theta(xy)xy)^{i}\Theta(xy)x$ for some $i \ge 0$.
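A small Python sketch can make the classical conjugacy result tangible. Given non-empty u with uv = vw, it recovers factors p, q and an exponent i with u = pq, w = qp and v = p(qp)^i; p comes out empty when |u| divides |v|. It then spot-checks the first antimorphic case of the Θ-conjugacy equation for the Watson-Crick map. The function names and the sample words are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word):
    """Watson-Crick involution (an antimorphism): complement and reverse."""
    return "".join(COMPLEMENT[b] for b in reversed(word))

def conjugacy_factors(u, v, w):
    """For non-empty u with uv == vw, return (p, q, i) such that
    u == p + q, w == q + p and v == p + (q + p) * i."""
    if not u or u + v != v + w:
        raise ValueError("expected a non-empty u with uv == vw")
    i = len(v) // len(u)   # v consists of i copies of u ...
    p = v[i * len(u):]     # ... followed by a proper prefix p of u
    q = u[len(p):]
    assert u == p + q and w == q + p and v == p + (q + p) * i
    return p, q, i

print(conjugacy_factors("ACG", "ACGAC", "GAC"))  # ('AC', 'G', 1)

# Theta-conjugacy, first antimorphic case: u = xy, w = y theta(x), v = theta(x).
x, y = "AAC", "GT"
u, v, w = x + y, theta(x), y + theta(x)
print(u + v == theta(v) + w)  # True
```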

Commutativity and Θ-commutativity

Proposition (Lyndon and Schützenberger, 1962)

Let $u, v \in A^{+}$ such that $uv = vu$. Then there exists $p \in A^{+}$ such that $u = p^{i}$ and $v = p^{j}$ for some $i, j > 0$ (i.e. the equation has only cyclic solutions).

Proposition (Kari and Mahalingam, 2007)

Let $u, v \in A^{+}$ such that $uv = \Theta(v)u$, where Θ is an involutive morphism or antimorphism over A.

1. If Θ is an antimorphism, then there exist $x, y \in A^{*}$ such that $u = x(yx)^{n}$, $v = (yx)^{m}$, $x = \Theta(x)$, $y = \Theta(y)$, $m \ge 1$ and $n \ge 0$.
2. If Θ is a morphism, then there exists $x \in A^{+}$ such that:
   - $x = \Theta(x)$, $u = x^{i}$ and $v = x^{j}$ for some $i, j \ge 1$; or
   - $u = x(\Theta(x)x)^{i}$ and $v = (\Theta(x)x)^{j}$ for some $i \ge 0$ and $j \ge 1$.
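Both statements can be tried out on short words. The Python sketch below is illustrative only: `primitive_root` and `theta_commutes` are hypothetical helper names, Θ is assumed to be the Watson-Crick map, and the sample words are built from the WK-palindromes AT and GC following the antimorphic case above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word):
    """Watson-Crick involution (an antimorphism): complement and reverse."""
    return "".join(COMPLEMENT[b] for b in reversed(word))

def primitive_root(word):
    """Shortest p such that word is a power of p (word must be non-empty)."""
    n = len(word)
    for d in range(1, n + 1):
        if n % d == 0 and word[:d] * (n // d) == word:
            return word[:d]

def theta_commutes(u, v):
    """True when uv == theta(v)u."""
    return u + v == theta(v) + u

# Classical commutation: both words are powers of the same word.
u, v = "ACAC", "ACACAC"
print(u + v == v + u, primitive_root(u), primitive_root(v))  # True AC AC

# Theta-commutation: u = x(yx)^n, v = (yx)^m with WK-palindromes x and y.
x, y = "AT", "GC"
u, v = x + (y + x) * 1, (y + x) * 2
print(theta_commutes(u, v))  # True
```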

Set of Θ-palindromes

Proposition

The set of all Θ-palindromes (Θ being an involutive antimorphism) is not regular.

Lemma (Pumping lemma for regular languages)

Let L be a regular language. Then there exists an integer $p \ge 1$ depending only on L such that every $w \in L$ of length at least p can be written as $w = xyz$ satisfying the following conditions:

1. $|y| \ge 1$,
2. $|xy| \le p$,
3. for all $i \ge 0$, $xy^{i}z \in L$.

Proposition

The set of all Θ-palindromes is context-free.
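One way to see both propositions is sketched below. This is not taken from the slides and rests on two assumptions: the alphabet A contains two distinct letters a and b, and the empty word counts as a Θ-palindrome. The grammar generates exactly the words w with w = Θ(w). The pumping argument uses the fact that |xy| ≤ p forces y to lie inside the leading block of a's.

```latex
% Context-free grammar for the \Theta-palindromes
% (assumption: the empty word is a \Theta-palindrome)
S \;\to\; a\,S\,\Theta(a) \quad (a \in A)
  \;\mid\; b \quad (b \in A,\ \Theta(b) = b)
  \;\mid\; \varepsilon

% Pumping sketch for distinct letters a \neq b and pumping length p
w = a^{p}\, b\, \Theta(b)\, \Theta(a)^{p}, \qquad \Theta(w) = w,

xy^{2}z = a^{p+k}\, b\, \Theta(b)\, \Theta(a)^{p} \quad (k = |y| \ge 1),

\Theta(xy^{2}z) = a^{p}\, b\, \Theta(b)\, \Theta(a)^{p+k} \;\neq\; xy^{2}z .
```

The last inequality holds because the two words differ at position p + 1, where one has the letter a and the other has b, so the pumped word is not a Θ-palindrome.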

Untitled frame

Proposition

Let Θ be either an involutive morphism or antimorphism and let $w, v, u \in A^{+}$ such that $wu = \Theta(u)v$ and $w\Theta(u) = uv$. Then $w = (xy)^{m}$, $v = (yx)^{m}$ and $u = (xy)^{n}x$ for some $x, y \in A^{*}$, $x = \Theta(x)$, $y = \Theta(y)$, $m \ge 1$, $n \ge 0$.

Proposition

Let $u \in A^{*}$ be a non-empty word such that $u \neq \Theta(u)$. Then the following statements are equivalent:

1. $\sqrt{u}$ is the product of two non-empty Θ-palindromes.
2. There exists a non-empty Θ-palindrome v such that v Θ-commutes with u.
3. u is a product of two non-empty Θ-palindromes.
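The equivalence can be explored on a concrete word with a short Python sketch. The assumptions are labeled here: Θ is the Watson-Crick map, and "v Θ-commutes with u" is read as vu = Θ(u)v, by analogy with the equation uv = Θ(v)u used for Θ-commutativity. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(word):
    """Watson-Crick involution (an antimorphism): complement and reverse."""
    return "".join(COMPLEMENT[b] for b in reversed(word))

def is_theta_palindrome(word):
    return word == theta(word)

def palindrome_factorizations(u):
    """All ways to write u as a product of two non-empty theta-palindromes."""
    return [(u[:i], u[i:]) for i in range(1, len(u))
            if is_theta_palindrome(u[:i]) and is_theta_palindrome(u[i:])]

u = "ATGC"                           # u differs from theta(u) == "GCAT"
print(palindrome_factorizations(u))  # [('AT', 'GC')]

# With u = xy, the palindrome v = y satisfies vu == theta(u)v (assumed reading).
x, y = palindrome_factorizations(u)[0]
v = y
print(is_theta_palindrome(v), v + u == theta(u) + v)  # True True
```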

Second untitled frame

Proposition

Let Θ be either a morphic or an antimorphic involution and let Σ be such that for all $a \in \Sigma$, $a \neq \Theta(a)$. (...)

References

Lila Kari

Thank you.
