Combinatorics on words in DNA computing

Š. Starosta

17 and 20 May 2011

Outline

1. DNA computing
   - Introduction
   - DNA molecule
   - The first DNA computation
   - Operations in DNA computing
   - DNA computing revisited
   - What to avoid in a test tube
2. Combinatorics on Words
   - Setting in CoW
   - Results
   - More results
   - If there is some time left...


DNA computing

Combinatorics on Words

Introduction

Barriers for traditional computers:

1. HUP (Heisenberg uncertainty principle)
2. von Neumann bottleneck

DNA/biomolecular computers: introduced in 1994 by Leonard Adleman


DNA - basics

- deoxyribonucleic acid
- 1950s: DNA carries genetic information (double helix model)
- structure: polymer chains (strands) consisting of bases attached to a sugar-phosphate backbone
- 4 bases: A, G, T, C
- Watson-Crick complementarity: A pairs with T, C pairs with G
- bases are at a distance of 3.4 ångströms, resulting in 18 Mbits per inch


Structure of DNA


Watson-Crick complementarity and WK-palindromes


Hamiltonian Path Problem

Leonard Adleman in 1994: solved an instance of the Hamiltonian Path Problem (NP-complete) on a graph with 7 vertices using DNA
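Adleman's experiment generated candidate paths in parallel and then filtered out the ones that were not Hamiltonian. The sketch below imitates that generate-and-filter idea in software. It is only an illustration: the graph `EDGES` is hypothetical (it is not the graph from the experiment), and the function name `hamiltonian_paths` is chosen here for convenience.

```python
from itertools import permutations


def hamiltonian_paths(n, edges, start, end):
    """Return every path from start to end that visits each of the n vertices once."""
    edge_set = set(edges)
    found = []
    # Generate: every ordering of the vertices is a candidate path.
    for path in permutations(range(n)):
        # Filter 1: keep only candidates with the required end points.
        if path[0] != start or path[-1] != end:
            continue
        # Filter 2: keep only candidates whose consecutive vertices share an edge.
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            found.append(path)
    return found


# Hypothetical 7-vertex directed graph, used only for illustration.
EDGES = [(0, 1), (0, 3), (0, 6), (1, 2), (1, 3), (2, 1), (2, 3),
         (3, 2), (3, 4), (4, 1), (4, 5), (5, 2), (5, 6)]

print(hamiltonian_paths(7, EDGES, start=0, end=6))
# prints [(0, 1, 2, 3, 4, 5, 6)]
```

The software version inspects the 7! orderings one after another; in a test tube the candidates are formed simultaneously, which is where the parallelism comes from.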


Adleman's DNA computation I


Adleman's DNA computation II

a 200-node instance would require $10^{24}$ Earth masses of DNA


The likely frame


Manipulation with DNA

- synthesis
- denaturing, annealing and ligation
- separation - affinity purification
- detection - gel electrophoresis
- PCR - polymerase chain reaction
- cutting (using restriction enzymes)


Very general model of a DNA calculation

1. select the language - code
2. do the computation
3. decode


Advantages and drawbacks

Advantages:

- Size: the information density could go up to 1 bit per cubic nm
- High parallelism: $10^{9}$ calculations per ml of DNA per second
- Energy efficiency: $10^{19}$ operations per joule

Drawbacks:

- required mass of DNA
- reagents
- need of manipulation by a human
- ...


What problems can be solved by a DNA computer

Problems solved that can be found in literature:

- TSP - travelling salesman problem
- addition
- SAT - satisfiability problem
- DES cracking
- maximal clique problem
- ...


Intramolecular hybridization - hairpins


Intermolecular hybridization


Representing DNA strands

- DNA strands are considered in their 5′ → 3′ orientation as finite words over $\Delta = \{A, G, C, T\}$
- WK: $A \leftrightarrow T$, $C \leftrightarrow G$ (an involutive antimorphism)
- an encoding (i.e. initial set of words) for a computation is a language $L \subset \Delta^*$
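As a small illustration of the Watson-Crick map as an involutive antimorphism, the sketch below represents strands as Python strings over A, C, G, T read in the 5′ → 3′ direction. The names `WK`, `theta` and `is_wk_palindrome` are chosen here for illustration only.

```python
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[base] for base in reversed(word))


def is_wk_palindrome(word):
    """A word is a WK-palindrome when it equals its own image under theta."""
    return word == theta(word)


u, v = "AAC", "GT"
assert theta(theta(u)) == u                  # involutive: applying theta twice is the identity
assert theta(u + v) == theta(v) + theta(u)   # antimorphism: the order of factors is reversed
print(theta(u), is_wk_palindrome("ACGT"), is_wk_palindrome("AAC"))
# prints GTT True False
```

The reversal reflects that the complementary strand runs in the opposite direction, so `theta(word)` is the complement written again in its own 5′ → 3′ orientation.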


Questions

How to choose the initial coding set L?

1. simulation
2. algorithm
3. theory

Bio-operations vs. L: after each bio-operation the language changes - do the properties hold?


Constraints on L

to avoid possible hybridizations, we impose several constraints on L

- L is a Θ-k-m-subword code if $\forall u \in A^k$, $1 \le i \le m$: $A^* u A^i \Theta(u) A^* \cap L = \emptyset$
- L is a Θ-k-code if $\forall u, v \in A^k \cap L$: $u \neq \Theta(v)$
- L is bond-free if $\forall u, v \in A^k \cap L$: $H(u, \Theta(v)) > d$, where H is the Hamming distance
- the frequency of C and G in each word should be $\frac{1}{2}$
- ...
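For a finite candidate set of code words, the constraints above can be checked mechanically. The sketch below is one possible reading, not a definitive implementation: Θ is taken to be the Watson-Crick involution, and the Θ-k-code and bond-free conditions are read as ranging over all length-k subwords of the words in L (an assumption about the terse notation on the slide). All function names are chosen here for illustration.

```python
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[b] for b in reversed(word))


def subwords(language, k):
    """All length-k contiguous subwords of the words in the language."""
    return {w[i:i + k] for w in language for i in range(len(w) - k + 1)}


def hamming(u, v):
    """Hamming distance between two words of equal length."""
    if len(u) != len(v):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(u, v))


def is_theta_k_m_subword_code(language, k, m):
    """No word contains u, then 1..m arbitrary letters, then theta(u), with |u| = k."""
    for word in language:
        for i in range(1, m + 1):
            for start in range(len(word) - (2 * k + i) + 1):
                u = word[start:start + k]
                tail = word[start + k + i:start + 2 * k + i]
                if tail == theta(u):
                    return False  # u ... theta(u) could fold into a hairpin
    return True


def is_theta_k_code(language, k):
    """No length-k subword is the theta-image of a length-k subword."""
    subs = subwords(language, k)
    return all(theta(v) not in subs for v in subs)


def is_bond_free(language, k, d):
    """Every length-k subword is at Hamming distance > d from every theta-image."""
    subs = subwords(language, k)
    return all(hamming(u, theta(v)) > d for u in subs for v in subs)


def gc_fraction(word):
    """Fraction of the letters of a non-empty word that are C or G."""
    return sum(b in "CG" for b in word) / len(word)


print(is_theta_k_code({"AACCA"}, 2), is_theta_k_code({"ACGT"}, 2))
# prints True False
```

With d = 0 the bond-free check coincides with the Θ-k-code check, since a Hamming distance greater than 0 just means the two words differ.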



Conjugates and Θ-conjugates

Proposition (Lyndon and Schützenberger, 1962)

Let $u, v, w \in A^*$ such that $uv = vw$. Then there exist $p, q \in A^+$ such that $u = pq$, $w = qp$ and $v = p(qp)^i$ for $i \ge 0$.

Proposition (Kari and Mahalingam, 2007)

Let $u, v, w \in A^+$ such that $uv = \Theta(v)w$, where $\Theta$ is an involutive morphism or antimorphism over $A$.

1. If $\Theta$ is an antimorphism, then there exist $x, y \in A^*$ such that either $u = xy$, $w = y\Theta(x)$ and $v = \Theta(x)$; or $u = x = \Theta(w)$, $y = \Theta(y)$ and $v = y$.
2. If $\Theta$ is a morphism, then there exist $x, y \in A^*$ such that $u = xy$ and one of the following holds:
   - $w = y\Theta(x)$ and $v = (\Theta(xy)xy)^i \Theta(x)$ for some $i \ge 0$;
   - $w = \Theta(y)x$ and $v = (\Theta(xy)xy)^i \Theta(xy)x$ for some $i \ge 0$.
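A quick sanity check of two of the solution families above can be done by enumeration. The sketch below assumes Θ is the Watson-Crick involution and covers only the classical family $u = pq$, $w = qp$, $v = p(qp)^i$ and the first antimorphic family $u = xy$, $w = y\Theta(x)$, $v = \Theta(x)$. It checks that these words satisfy the equations, not the converse direction. The helper names are chosen here for illustration.

```python
from itertools import product

WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[b] for b in reversed(word))


def words(max_len):
    """All words over {A, C, G, T} of length 0..max_len."""
    return ["".join(t) for n in range(max_len + 1) for t in product("ACGT", repeat=n)]


def ls_family(p, q, i):
    """(u, v, w) with u = pq, v = p(qp)^i, w = qp."""
    return p + q, p + (q + p) * i, q + p


def theta_family(x, y):
    """(u, v, w) with u = xy, v = theta(x), w = y theta(x)."""
    return x + y, theta(x), y + theta(x)


for a, b in product(words(2), repeat=2):
    for i in range(3):
        u, v, w = ls_family(a, b, i)
        assert u + v == v + w             # classical conjugacy: uv = vw
    u, v, w = theta_family(a, b)
    assert u + v == theta(v) + w          # theta-conjugacy: uv = theta(v) w

print(theta_family("AC", "G"))
# prints ('ACG', 'GT', 'GGT')
```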


Commutativity and Θ-commutativity

Proposition (Lyndon and Schützenberger, 1962)

Let $u, v \in A^*$ such that $uv = vu$. Then there exists $p \in A^+$ such that $u = p^i$ and $v = p^j$ for $i, j > 0$ (i.e. it has a cyclic solution).

Proposition (Kari and Mahalingam, 2007)

Let $u, v \in A^+$ such that $uv = \Theta(v)u$, where $\Theta$ is an involutive morphism or antimorphism over $A$.

1. If $\Theta$ is an antimorphism, then there exist $x, y \in A^*$ such that $u = x(yx)^n$, $v = (yx)^m$, $x = \Theta(x)$, $y = \Theta(y)$, $m \ge 1$ and $n \ge 0$.
2. If $\Theta$ is a morphism, then there exists $x \in A^+$ such that:
   - $x = \Theta(x)$, $u = x^i$ and $v = x^j$ for some $i, j \ge 1$;
   - $u = x(\Theta(x)x)^i$ and $v = (x\Theta(x))^j$ for some $i \ge 0$ and $j \ge 1$.
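The antimorphic case can be explored by brute force on short words. The sketch below assumes Θ is the Watson-Crick involution, lists all pairs of non-empty words of length at most 4 with $uv = \Theta(v)u$, and searches each pair for a decomposition $u = x(yx)^n$, $v = (yx)^m$ with Θ-palindromes $x, y$. The function `decompose` is a hypothetical helper written for this illustration.

```python
from itertools import product

WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[b] for b in reversed(word))


def decompose(u, v):
    """Find (x, y, n, m) with u = x(yx)^n, v = (yx)^m, x = theta(x), y = theta(y)."""
    for x_len in range(len(u) + 1):
        x = u[:x_len]
        if x != theta(x):
            continue
        for t_len in range(max(x_len, 1), len(v) + 1):
            if len(v) % t_len or (len(u) - x_len) % t_len:
                continue
            t = v[:t_len]                     # candidate for the block yx
            y = t[:t_len - x_len]
            m, n = len(v) // t_len, (len(u) - x_len) // t_len
            if t.endswith(x) and y == theta(y) and v == t * m and u == x + t * n:
                return x, y, n, m
    return None


# All non-empty words of length 1..4 over {A, C, G, T}.
WORDS = ["".join(t) for k in range(1, 5) for t in product("ACGT", repeat=k)]
IMAGE = {w: theta(w) for w in WORDS}

# Pairs (u, v) in which u theta-commutes with v: uv = theta(v) u.
solutions = [(u, v) for u in WORDS for v in WORDS if u + v == IMAGE[v] + u]

assert all(decompose(u, v) is not None for u, v in solutions)
print(decompose("AT", "CGAT"))
# prints ('AT', 'CG', 0, 1)
```

Because no base is its own Watson-Crick complement, every non-empty Θ-palindrome here has even length, so every solution pair found consists of words of even length.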


Set of Θ-palindromes

Proposition

The set of all Θ-palindromes (Θ being an involutive antimorphism) is not regular.

Lemma (Pumping lemma for regular languages)

Let $L$ be a regular language. Then there exists an integer $p \ge 1$ depending only on $L$ such that every $w \in L$ of length at least $p$ can be written as $w = xyz$ satisfying the following conditions:

1. $|y| \ge 1$,
2. $|xy| \le p$,
3. for all $i \ge 0$, $xy^i z \in L$.

Proposition

The set of all Θ-palindromes is context-free.
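Both statements can be illustrated for the Watson-Crick involution. The sketch below assumes the grammar $S \to a\,S\,\Theta(a) \mid \varepsilon$ for $a \in \{A, C, G, T\}$, which suffices here because no base is its own complement (a general antimorphism with fixed letters would also need rules $S \to a$). It compares the grammar's output with a brute-force search, then shows the kind of word the pumping lemma cannot handle. Function names are chosen for illustration.

```python
from itertools import product

WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[b] for b in reversed(word))


def cfg_palindromes(half_len):
    """Words derived from S -> a S theta(a) | empty with half_len nested steps."""
    if half_len == 0:
        return {""}
    inner = cfg_palindromes(half_len - 1)
    return {a + s + WK[a] for a in "ACGT" for s in inner}


def brute_force(length):
    """All theta-palindromes of the given length, found by exhaustive search."""
    candidates = ("".join(t) for t in product("ACGT", repeat=length))
    return {w for w in candidates if w == theta(w)}


# The grammar generates exactly the theta-palindromes (checked up to length 6).
for n in range(7):
    expected = cfg_palindromes(n // 2) if n % 2 == 0 else set()
    assert brute_force(n) == expected

# Pumping witness: A^p T^p is a theta-palindrome, but pumping inside the
# leading block of A's (forced by |xy| <= p) gives A^(p+k) T^p, which is not.
p = 5
assert "A" * p + "T" * p == theta("A" * p + "T" * p)
assert all("A" * (p + k) + "T" * p != theta("A" * (p + k) + "T" * p)
           for k in range(1, p + 1))
print(sorted(cfg_palindromes(1)))
# prints ['AT', 'CG', 'GC', 'TA']
```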


Untitled frame

Proposition

Let $\Theta$ be either an involutive morphism or antimorphism and let $w, v, u \in A^+$ such that $wu = \Theta(u)v$ and $w\Theta(u) = uv$. Then $w = (xy)^m$, $v = (yx)^m$ and $u = (xy)^n x$ for some $x, y \in A^*$, $x = \Theta(x)$, $y = \Theta(y)$, $m \ge 1$, $n \ge 0$.
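The stated form of the solutions can be checked directly in the antimorphic case. The sketch below assumes Θ is the Watson-Crick involution and reads the second word as $v = (yx)^m$ (an assumption about the garbled statement). It builds $w$, $v$, $u$ from a few Θ-palindromes and confirms both equations. The helper `solution` is hypothetical.

```python
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    """Watson-Crick involution: reverse the word and complement every base."""
    return "".join(WK[b] for b in reversed(word))


def solution(x, y, m, n):
    """Words (w, v, u) built as w = (xy)^m, v = (yx)^m, u = (xy)^n x."""
    return (x + y) * m, (y + x) * m, (x + y) * n + x


# A few theta-palindromes, including the empty word.
PALINDROMES = ["", "AT", "CG", "ACGT", "TTAA"]
assert all(p == theta(p) for p in PALINDROMES)

for x in PALINDROMES:
    for y in PALINDROMES:
        for m in range(1, 4):
            for n in range(3):
                w, v, u = solution(x, y, m, n)
                if not (w and v and u):
                    continue  # the proposition is about non-empty words
                assert w + u == theta(u) + v
                assert w + theta(u) == u + v

print(solution("AT", "CG", 1, 1))
# prints ('ATCG', 'CGAT', 'ATCGAT')
```

In this case $u = (xy)^n x$ is itself a Θ-palindrome, so the two equations coincide.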

Proposition

Let $u \in A^*$ be a non-empty word such that $u \neq \Theta(u)$. Then the following statements are equivalent:

1. u is the product of two non-empty Θ-palindromes.
2. There exists a non-empty Θ-palindrome v such that […] Θ-commutes with u.
3. u is a product of two non-empty Θ-palindromes.


Second untitled frame

Proposition

Let $\Theta$ be either a morphic or an antimorphic involution and let $\Sigma$ be such that for all $a \in \Sigma$, $a \neq \Theta(a)$. (...)


References

Lila Kari


Thank you.
