A Sticker-Based Model for DNA Computation

A Sticker-Based Model for DNA Computation Roweis, S., Winfree, E., Burgoyne, R., Chelyapov, N. V., Goodman, M. F., Rothemund, P. W. K., & Adleman, L. M. (1998). A Sticker-Based Model for DNA Computation. Journal of Computational Biology, 5(4), 615–629. doi:10.1089/cmb. 1998.5.615

Slides by Reem Mokhtar

Sticker(s) Model

Each memory complex encodes one bit string:
-  Memory strand: N bases, divided into K bit regions of M bases each
-  Sticker strands: M bases, one complementary to each region
-  1/on = memory strand region with a hybridized sticker strand
-  0/off = memory strand region without a sticker strand


Each memory strand along with its annealed stickers (if any) represents one bit string. Such partial duplexes are called memory complexes. A large set of bit strings is represented by a large number of identical memory strands, each of which has stickers annealed only at the required bit positions. We call such a collection of memory complexes a tube. This differs from previous representations of information using DNA, in which the presence or absence of a particular subsequence in a strand corresponded to a particular bit being on or off (e.g., see Adleman, 1994; Lipton, 1995). In this new model, each possible bit string is represented by a unique association of memory strands and stickers, whereas previously each bit string was represented by a unique molecule. To give a feel for the numbers involved, a reasonably sized problem (for example, breaking DES as discussed in Adleman et al., 1999) might use memory strands of roughly 12,000 bases (N), which represent 580 binary variables (K) using 20-base regions (M). The information density in this storage scheme is (1/M) bits/base, directly comparable to the density of previous schemes (Adleman, 1994; Boneh et al., 1996; Lipton, 1995). We remark that while information storage in DNA has a theoretical maximum value of 2 bits/base, exploiting such high values in a separation-based molecular computer would require the ability to reliably separate strands using only single-base mismatches.

Sticker(s) Model

[FIG. 1. A memory strand and associated stickers (together called a memory complex) represent a bit string. The top complex on the left has all three bits off; the bottom complex has two annealed stickers and thus two bits on.]


Sticker(s) Model

[Figure: tubes of memory complexes, drawn as memory strands with annealed stickers, stepping through the sticker operations Combine, Separate (on bit 1), Set (bit 3), and Clear (bit 1).]
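Abstractly, the figures above act on tubes of K-bit strings. The following Python sketch is my own illustration of that abstraction; the tuple-based tube representation and function names are not from the paper.

# Sketch: sticker-model operations on a "tube", modelled as a list (multiset)
# of K-bit strings. Bits are numbered 1-based in the slides; tuples are 0-indexed.

def combine(t1, t2):
    """Pour two tubes into one."""
    return t1 + t2

def separate(tube, i):
    """Split a tube into the complexes with bit i on and those with bit i off."""
    on = [s for s in tube if s[i] == 1]
    off = [s for s in tube if s[i] == 0]
    return on, off

def set_bit(tube, i):
    """Anneal the sticker for bit i to every complex in the tube."""
    return [s[:i] + (1,) + s[i + 1:] for s in tube]

def clear_bit(tube, i):
    """Strip the sticker for bit i from every complex in the tube."""
    return [s[:i] + (0,) + s[i + 1:] for s in tube]

# Three 3-bit memory complexes, all bits initially off.
tube = [(0, 0, 0)] * 3
tube = set_bit(tube, 2)        # "Set bit 3" (index 2)
on, off = separate(tube, 0)    # "Separate on bit 1" (index 0)
print(on, off)                 # every complex lands in `off`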


Graphical/Data

•  Purpose:
   -  Ease of interaction and design
   -  Aid in validating designs

•  Representations might include:
   -  GUI input
   -  Rendered results
   -  Back-end data structures

Dot Bracket Notation

[Figure: ViennaRNA RNAfold terminal session (panel a) and the corresponding secondary-structure drawing (panel b); Hofacker et al.]

Example output (command approximately: RNAfold -T 42 -p):

  Input sequence: UUGGAGUACACAACCUGUACACUCUUUC (length = 28)
  MFE structure:  ..(((((..(((...)))..)))))...
  minimum free energy = -3.71
  free energy of ensemble = -4.39
  frequency of MFE structure in ensemble = 0.337231
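As a small aside (my own sketch, not part of the RNAfold output), the matched parentheses in a dot-bracket string can be turned into base-pair index pairs with a stack:

# Sketch: convert a dot-bracket secondary structure into (i, j) base-pair indices.

def dot_bracket_pairs(structure):
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == '(':
            stack.append(j)                  # remember an unmatched opening base
        elif ch == ')':
            pairs.append((stack.pop(), j))   # close the most recent open base
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs

seq = "UUGGAGUACACAACCUGUACACUCUUUC"
mfe = "..(((((..(((...)))..)))))..."
print(dot_bracket_pairs(mfe))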

Pseudoknots are recognized within the list of triples where features (i, s, o)_j and (i, s, o)_k satisfy:

  i_k < i_j + s_j + o_j,   and   i_k + s_k + o_k ≥ i_j + 2s_j + o_j

Figure 3: 104-base RNA segment with two stem-loops, five internal loops and one three-way multi-branch [2]. The ISO representation, starting from the marked 5′ end, is [(2, 5, 92), (8, 7, 57), (17, 3, 46), (21, 2, 40), (24, 3, 32), (29, 7, 16), (40, 4, 3), (82, 6, 4)].

The first constraint places the opening bases of feature (i, s, o)_k within the unpaired open region of feature (i, s, o)_j. The second constraint places the corresponding closing bases of (i, s, o)_k outside feature (i, s, o)_j, hence the features are crossing. We see an example in Figure 2, with three areas of intercalated bindings. First, P1 (0,7,23) and P2 (9,7,50) qualify, since 9 < 0 + 7 + 23 and 9 + 7 + 50 ≥ 0 + 14 + 23. Second, P3 (16,3,8) and P1.1 (20,2,15) qualify, since 20 < 16 + 3 + 8 and 20 + 2 + 15 ≥ 16 + 6 + 8. Finally, P1 (0,7,23) and P1.1 (20,2,15) qualify, since 20 < 0 + 7 + 23 and 20 + 2 + 15 ≥ 0 + 14 + 23. This final binding intercalation is not usually described in the literature as a pseudoknot; however, the bindings are distinct and not well nested. An additional filter equation for recursive pseudoknots can be written in similar fashion to check for structure within the opening of a pseudoknot.
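A minimal sketch in Python of the same check; the function name and tuple convention are mine, not the text's:

# Sketch: decide whether feature f_k crosses feature f_j, where each feature is
# a triple (i, s, o): start index, stem length and open-region length.

def crossing(f_j, f_k):
    i_j, s_j, o_j = f_j
    i_k, s_k, o_k = f_k
    opens_inside = i_k < i_j + s_j + o_j                       # first constraint
    closes_outside = i_k + s_k + o_k >= i_j + 2 * s_j + o_j    # second constraint
    return opens_inside and closes_outside

# The three qualifying pairs discussed in the text:
print(crossing((0, 7, 23), (9, 7, 50)))     # P1 vs P2    -> True
print(crossing((16, 3, 8), (20, 2, 15)))    # P3 vs P1.1  -> True
print(crossing((0, 7, 23), (20, 2, 15)))    # P1 vs P1.1  -> True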

2. Internal loops. An internal loop is formed by unpaired open regions surrounded by exactly two stems, where at least one unpaired base must occur on both sides. An internal loop is recognized within the list of triples where features (i, s, o)_j and (i, s, o)_{j+1} satisfy:

4. ALTERNATIVE REPRESENTATIONS

The earliest non-tabular secondary structure representation, the dot-parenthesis (dot-bracket) notation, was introduced in 1984 [5]. Thermodynamic modeling programs such as Vienna [4], Mfold [6], and Nupack [13] all use the basic dot-parenthesis notation as either input or output.

Languages/Grammar

•  Purpose:
   -  Represent reaction/interaction sequences and networks

•  Representations might include:
   -  Regular grammars
   -  Context-free grammars
   -  Graph grammars
   -  etc.

Languages

Alphabet: any finite set of symbols (∑).
  Example: the 26 letters of the English alphabet.

String: a finite-length (n ≥ 0) sequence of symbols over an alphabet ∑ (like words).
  Example: a, ab, abc and bba are strings defined over ∑2.

1)  Sipser, M. (2006). Introduction to the Theory of Computation (Vol. 27). Thomson Course Technology Boston, MA. 2)  Lecture Notes on Regular Languages and Finite Automata

Languages ∑* is the set of all strings over an alphabet ∑ of ANY finite length. Null (empty) string → ε Example: If ∑ = {a,b} then: ∑* = {ε, a, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, ….}
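For instance, a one-line enumeration of ∑* up to length 3 for ∑ = {a, b} (a small sketch, not from the cited notes):

from itertools import product

sigma = "ab"
strings = [""] + ["".join(p) for n in range(1, 4) for p in product(sigma, repeat=n)]
print(strings)   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', ..., 'bbb']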


Languages

Language: a set of strings, all of which choose their symbols from a single alphabet. (Think of the words in a dictionary, not grammatical rules.)

Problem: Is this string in this language?


Languages

Finite Automata involve:
1)  states
2)  transitions among states, in response to
3)  inputs

Useful for building: compilers, verification systems (ex: circuits, protocols)

1)  Sipser, M. (2006). Introduction to the Theory of Computation (Vol. 27). Thomson Course Technology Boston, MA. 2)  Lecture Notes on Regular Languages and Finite Automata 3)  Images: Wikipedia. Retrieved from: http://en.wikipedia.org/wiki/Finite-state_machine

Languages

Retrieved from Prat, A. M., Lecture Notes on Regular Languages and Finite Automata (pp. 11)

Languages

Regular Expressions: structural notation for describing a pattern that can be represented by a finite automaton.

Some RE symbols: ( ) | *   For languages A, B:
  (AB) → concatenation of the enclosed languages A and B
  A | B → A or B (the union A ∪ B)
  A* → 0 or more of A


Languages

A language L over an alphabet ∑ is just a set of strings over ∑. Thus any subset of ∑* determines a language over ∑. The language determined by a regular expression r over ∑ is written L(r).

Example: L((01)*) represents the language of all strings of alternating 0's and 1's that begin with 0 and end with 1 (together with the empty string).
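A quick way to test membership in L((01)*) is Python's re module (a sketch, not from the cited notes):

import re

for w in ["", "01", "0101", "0110", "10"]:
    print(repr(w), bool(re.fullmatch(r"(01)*", w)))
# '' True, '01' True, '0101' True, '0110' False, '10' False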


Languages

Regular language: a language that can be expressed using a regular expression.

Example: M is a regular language given by the regular expression r = 1*(0(1*)0(1*))*

Some languages cannot be represented this way.
Example: the set of strings {a^n b^n | n ≥ 0} cannot be recognized by a finite automaton.

Retrieved from Wikipedia: http://en.wikipedia.org/wiki/Deterministic_finite_automaton

DFA

A Deterministic Finite Automaton (DFA) involves:
1)  a finite set of states (Q)
2)  a finite set of input symbols (∑)
3)  a transition function δ: (state, input symbol) → another state
4)  a start state (a state from Q)
5)  a set of final/accepting states F s.t. F ⊆ Q


DFA

Example: M is a DFA over a binary alphabet that requires the input to contain an even number of 0s.

M = (Q, Σ, δ, q0, F) where Q = {S1, S2}, Σ = {0, 1}, q0 = S1, F = {S1}, and δ is given by the state diagram for M.

Retrieved from Wikipedia: http://en.wikipedia.org/wiki/Deterministic_finite_automaton
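A direct encoding of M in Python (a sketch; the transition table below is the usual even-number-of-0s table, δ(S1,0)=S2, δ(S1,1)=S1, δ(S2,0)=S1, δ(S2,1)=S2, stated here as an assumption since the diagram itself is not reproduced above):

# Sketch: the DFA M that accepts binary strings containing an even number of 0s.
delta = {("S1", "0"): "S2", ("S1", "1"): "S1",
         ("S2", "0"): "S1", ("S2", "1"): "S2"}
q0, F = "S1", {"S1"}

def accepts(word):
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in F

print(accepts("0110"))   # True: two 0s
print(accepts("000"))    # False: three 0s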

Instead of regular expressions...

We will use production rules (which do the same thing as REs). A production rule is:

1)  a 'rewrite rule' of the form A → X, where A is a 'label' and X is a 'sequence of labels or semantic units'
2)  which recursively substitutes a non-terminal
3)  with a string of terminals and/or non-terminals

A terminal:
1)  is a literal that actually occurs in the language being expressed
2)  cannot be rewritten/broken down further
3)  cannot be changed using the rules of the grammar

1)  Sipser, M. (2006). Introduction to the Theory of Computation (Vol. 27). Thomson Course Technology, Boston, MA.
2)  Wikipedia: Context-free grammar. Retrieved from: http://en.wikipedia.org/wiki/Context-free_grammar
3)  Wikipedia: Terminal and non-terminal symbols. Retrieved from: http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols

Chomsky Hierarchy

Grammar  | Languages               | Automaton
Type-0   | Recursively enumerable  | Turing machine
Type-1   | Context-sensitive       | Linear-bounded non-deterministic Turing machine
Type-2   | Context-free            | Non-deterministic pushdown automaton
Type-3   | Regular                 | Finite state automaton

Production rules (constraints): Type-0 grammars place no restrictions on their production rules.


References

[1] Chomsky, Noam (1956). "Three models for the description of language". IRE Transactions on Information Theory (2): 113–124. doi:10.1109/TIT.1956.1056813. http://www.chomsky.info/articles/195609--.pdf

[2] Chomsky, Noam (1959). "On certain formal properties of grammars". Information and Control 2 (2): 137–167. doi:10.1016/S0019-9958(59)90362-6. http://www.diku.dk/hjemmesider/ansatte/henglein/papers/chomsky1959.pdf

Context-Free Languages

Languages that can be represented by context-free grammars.

What is a context-free grammar? A formal grammar:
  -  with a set of production rules
  -  that describes how to generate strings from the language's alphabet
  -  with a starting non-terminal, or head

Example context-free grammar:
  A → aB | a | ε
  B → bAS
  S → c

The following strings can be generated by these production rules: abc, aabc, abac, ababc, …
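A short sketch that enumerates terminal strings derivable from this grammar by breadth-first expansion of the leftmost non-terminal (it assumes A is the start symbol, which the slide does not state explicitly):

from collections import deque

rules = {"A": ["aB", "a", ""],   # "" plays the role of ε
         "B": ["bAS"],
         "S": ["c"]}

def generate(start="A", max_len=8, limit=10):
    """Return up to `limit` terminal strings of length <= max_len."""
    results, seen = [], {start}
    queue = deque([start])
    while queue and len(results) < limit:
        form = queue.popleft()
        nt = next((ch for ch in form if ch.isupper()), None)
        if nt is None:                        # no non-terminals left
            results.append(form)
            continue
        i = form.index(nt)
        for rhs in rules[nt]:                 # expand the leftmost non-terminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

print(generate())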

Context-Free Grammars

Image from: Pratt, A. M., Lecture Notes on Regular Languages and Finite Automata, (pp 47)

Some other stuff

Chomsky grammars? Pumping Lemma?

What does this have to do with DNA self-assembly?

Generative grammars closely correspond to self-assembly and ligation.

Example: DNA-based computing
  -  DNA strands → a set of strings over an alphabet: a strand is a string over {A, C, G, T} read 5' → 3'
  -  Specific notations can deal with:
       Complementarity (ACGTGCG' = CGCACGT)
       Circular strands (prefix symbol ◦)
  -  Use a code book s.t. each symbol in ∑ is represented by an N-base sequence.
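A small sketch of both notations in Python; reading the primed strand as the reverse Watson-Crick complement (5' → 3') is my interpretation of the notation, and the code-book entries are made-up examples:

# Sketch: complementarity and a toy code book.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Reverse Watson-Crick complement of a strand given 5' -> 3'."""
    return "".join(COMP[b] for b in reversed(strand))

print(complement("ACGTGCG"))   # -> CGCACGT

# Toy code book (hypothetical 4-base words): each abstract symbol maps to a DNA word.
codebook = {"a": "ACGT", "b": "TTGC"}

def encode(word):
    return "".join(codebook[s] for s in word)

print(encode("ab"))            # -> ACGTTTGC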

Winfree, E., Eng, T., & Rozenberg, G. (2001). String tile models for DNA computing by self-assembly. In A. Condon & G. Rozenberg (Eds.), DNA Computing, Lecture Notes in Computer Science (Vol. 2054, pp. 63–88). Springer Berlin / Heidelberg. Retrieved from http://www.springerlink.com/content/m6p32bcpldkghx4f/abstract/


What are Graph Grammars?

A graph grammar, or graph transformation, is a rule-based modification of graphs. The core of a rule (or production) is p = (L, R). Applying p means finding a match of L in the source graph and replacing L with R to get to the target graph of the graph transformation.

H. Ehrig, K. Ehrig, U. Prange, and G. Taentzer. 2006. Fundamentals of Algebraic Graph Transformation (Monographs in Theoretical Computer Science. An EATCS Series). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Graph Grammars

A graph grammar, or transformation, is a set of rules or productions. A production p = (L, R) is a pair of graphs: the left-hand side L and the right-hand side R. Applying the rule p = (L, R) means finding a match of L in the source graph G and replacing L by R, leading to a target graph H.

Graph Grammars

We use the notion of graph transformation to comprise the concepts of graph grammars and graph rewriting. In any case, the main idea of graph transformation is the rule-based modification of graphs: the core of a rule or production p = (L, R) is a pair of graphs, the left-hand side L and the right-hand side R.

[Fig. 1.1. Rule-based modification of graphs.]

Graph Grammars

A graph transformation G ⇒ H usually consists of the following steps:

1.  Choose a production rule.
2.  Check the application conditions.
3.  Remove from G the parts of L that are not in R, to get D.
4.  Glue R to D at the part of L that still has an image in D: the part of R not in L (R \ L) is added to D to get E (if an additional embedding relation exists, embed further), and the end result is H.

Graph Grammars

Graphs, Productions and Derivations

Direct derivation: consider a production rule p : L → R as a finite, schematic representation of a potentially infinite set of direct derivations. If a match m fixes an occurrence of L in a given graph G, then G ⇒ H denotes the direct derivation where p is applied to G, leading to the derived graph H; intuitively, H is obtained by replacing the occurrence of L in G by R.

A match m : L → G for a production p is a graph homomorphism mapping nodes and edges of L to G in such a way that the graphical structure and the labels are preserved.

In the example of Figure 3.1, applying p1 to graph G1 at match m1 deletes every object of G1 that matches an element of L1 with no corresponding element in R1 (vertex 2 of G1 is deleted), adds every element of R1 with no corresponding element in L1, and preserves all remaining elements of G1; roughly speaking, the derived graph H1 is constructed as G1 − (L1 − R1) ∪ (R1 − L1).

Graph Grammars

[Figure 3.1: Direct derivations.]

… edges are also deleted; the graph obtained is called D. Glue the right-hand side R to the graph D at the part of L which still has an image in D; the part of R not coming from L is added disjointly to D, and the resulting graph is E. If the production p contains an additional embedding relation, then embed the right-hand side R further into the graph E according to this embedding relation. The end result is the graph H.

Graph Grammars

[Fig. 1.2. Graph transformation from an operational point of view: L, R (the rule) and G, D, E, H (the intermediate and resulting graphs).]

Graph transformation systems can show two kinds of nondeterminism: first, several productions might be applicable and one of them is chosen arbitrarily; second, a chosen production may be applicable at several matches, and one of those matches has to be chosen.

Distinguish Approaches

If the match m fixes an occurrence of L in a given graph G, then G ⇒ H denotes the direct derivation where p is applied to G, leading to a derived graph H; intuitively, H is obtained by replacing the occurrence of L in G by R. The essential questions distinguishing the particular graph transformation approaches are:

1. What is a "graph"?
2. How can we match L with a subgraph of G?
3. How can we define the replacement of L by R in G?

In the algebraic approaches, a graph is considered as a two-sorted algebra in which the sets of vertices V and edges E are the carriers, while source s : E → V and target t : E → V are two unary operations. Moreover, we have label functions lv : V → LV and le : E → LE, where LV and LE are fixed label alphabets for vertices and edges, respectively. Since edges are objects on their own, this allows for multiple (parallel) edges with the same label. Each graph production p : L → R defines a partial correspondence …

So what?

Retrieved from http://www.mathsisfun.com/sets/injectivesurjective-bijective.html

Problems?

Sometimes the rules can apply in more than one way on the same graph (derivation (2) in Figure 3.1).

[Figure 3.1: Direct derivations.]

[Figure 3.2: Example and schematic representation of a direct derivation G ⇒ H: the production p : L → R at the top, the match m and co-match m′ as vertical morphisms, and the co-production p* : G → H at the bottom.]

The application of a production can be seen as an embedding into a context, namely the part of the given graph G that is not part of the match (the edge 4 in the example). Hence, at the bottom of the direct derivation, the co-production p1* : G1 → H1 relates the given and the derived graphs, keeping track of all elements that are preserved by the direct derivation. Symmetrically, the co-match m′1, which is a graph homomorphism too, maps the right-hand side R1 of the production to its occurrence in the derived graph H1. A direct derivation from G to H resulting from an application of a production p at a match m is schematically represented as in Figure 3.2 and denoted by d = (G ⇒ H).

In general, L does not have to be isomorphic to its image m(L) in G, i.e. elements of L may be identified by m. This is not always unproblematic, as we may see in direct derivation (2) of Figure 3.1. A production p2, assuming two vertices and deleting one of them, is applied to a graph G2 containing only one vertex, i.e., vertices 1 and 2 of L2 are both mapped to the vertex 3 of G2. Thus the production p2 specifies both the deletion and the preservation of vertex 3; we say that the match m2 contains a conflict. There are three possible ways to solve this conflict: vertex 3 of G2 may be preserved, deleted, or the application of the production p2 at this match may be forbidden. In derivation (2) of Figure 3.1 the second alternative has been chosen, i.e., the vertex is deleted.

Problems?

It could also result in the deletion of a vertex that has an edge which is not part of the match (derivation (3) in Figure 3.1).

[Figure 3.1: Direct derivations.]

Approaches

1.  Node label replacement approach
2.  Hyperedge replacement approach
3.  Algebraic approach, based on pushout constructions:
    1.  SPO (single pushout)
    2.  DPO (double pushout)
4.  Logical approach
5.  Theory of 2-structures
6.  Programmed graph replacement approach

Algebraic Approach

Double or single pushout approaches.

http://www.cs.le.ac.uk/events/segravis/material/Ehrig-Tutorial4.pdf
Handbook of Graph Transformations
Fundamentals of Algebraic Graph Transformations

to the graph H and hence to a DPO graph transformation G ⇒ H via p and m. For the construction of the first step, however, a gluing condition has to be satisfied, which allows us to construct D with L +K D = G. In the case of an injective match m, the gluing condition states that all dangling points of L, i.e. the nodes x in L such that m(x) is the source or target of an edge e in G \ L, must be gluing points x in K.
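A small sketch of the dangling-point check for an injective match, written with plain Python sets (the helper name and graph encoding are my own, not the book's):

# Sketch: DPO gluing condition. Every dangling point of L (a node x whose image
# m(x) touches an edge of G outside the image of L) must be a gluing point in K.

def gluing_condition(L_nodes, L_edges, K_nodes, G_edges, m):
    image_edges = {frozenset((m[x], m[y])) for (x, y) in L_edges}
    context_edges = [e for e in G_edges if frozenset(e) not in image_edges]
    for x in L_nodes:
        dangling = any(m[x] in e for e in context_edges)
        if dangling and x not in K_nodes:
            return False      # a dangling point that is not a gluing point
    return True

# Node 2 of L is deleted (not in K) but its image "b" still touches edge (b, c) in G.
L_nodes, L_edges, K_nodes = {1, 2}, [(1, 2)], {1}
m = {1: "a", 2: "b"}
G_edges = [("a", "b"), ("b", "c")]
print(gluing_condition(L_nodes, L_edges, K_nodes, G_edges, m))   # False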

Algebraic Approach

DPO

[Fig. 1.3. DPO graph transformation: a production L ← K → R, a match m : L → G, and two pushout squares (1) and (2) over G ← D → H.]

A simple example of a DPO graph transformation step is given in Fig. 1.4, corresponding to the general scheme in Fig. 1.3. In the diagram (PO1), G is the gluing of the graphs L and D along K, where the numbering of the nodes indicates how the nodes are mapped by graph morphisms; the mapping of the edges can be uniquely deduced from the node mapping. The gluing condition is satisfied in Fig. 1.4, because the dangling points (1) and (2) of L are also gluing points. Moreover, H is the gluing of R and D along K in (PO2), leading to a graph transformation G ⇒ H via p. In fact, the diagrams (PO1) and (PO2) are pushouts in the category Graphs of graphs and graph morphisms (see Chapter 2).

[Fig. 1.4. Example of DPO graph transformation.]

For technical reasons, the morphisms K → L and K → R in the productions are usually restricted to injective graph morphisms. However, we allow …

Algebraic Approach

SPO

… edge e in G after deletion of L \ dom(p) in Fig. 1.5. As a result, edge e would be deleted in H. A detailed presentation and comparison of the two approaches is given in Volume 1 of the Handbook of Graph Grammars and Computing by Graph Transformation [Roz97]. In this book, however, we consider only the DPO approach as the algebraic graph transformation approach.

[Fig. 1.5. Example of SPO graph transformation.]

1.2.4 From Graphs to High-Level Structures

The algebraic approach to graph transformation is not restricted to graphs of the form G = (V, E, s, t) considered above, but has been generalized to a large variety of different types of graphs and other kinds of high-level structures.

Example: Directed Self-Assembly

We begin with basic definitions. Much of this section appeared first elsewhere [14]; we include it for completeness. Basic graph-theory definitions are not numbered, but only recalled [6].

A simple labeled graph over an alphabet Σ is a triple G = (V, E, l) where V is a set of vertices, E is a set of pairs or edges from V, and l : V → Σ is a labeling function. We restrict our discussion to simple labeled graphs and thus simply use the term graph. We denote by V_G, E_G and l_G the vertex set, edge set and labeling function of the graph G, or by V, E and l when there is no danger of confusion. We will usually use the alphabet Σ = {a, b, c, ...}.

Given graphs G1 and G2, we write f : G1 → G2 and f : V_G1 → V_G2 equivalently to mean that f is a function from the vertex set of G1 to the vertex set of G2. A function h : G1 → G2 is a label preserving embedding if

1. h is injective,
2. {x, y} ∈ E_G1 ⇔ {h(x), h(y)} ∈ E_G2,
3. l_G1 = l_G2 ◦ h.

If h is also surjective then it is called an isomorphism. The graphs G1 and G2 are said to be isomorphic (written G1 ≃ G2) if there exists an isomorphism relating them.

Definition 1. A rule is a pair of graphs r = (L, R) where V_L = V_R. The graphs L and R are called the left hand side and right hand side of r, respectively. The size of r is |V_L| = |V_R|. Rules whose vertex sets have one, two and three vertices are called unary, binary and ternary, respectively.

We may refer to rules as being constructive (E_L ⊂ E_R), destructive (E_L ⊃ E_R) or mixed (neither constructive nor destructive). A rule is acyclic if its right hand side contains no cycles (the left hand side may contain cycles).

Definition 2. A rule r is applicable to a graph G if there exists an embedding h : L → G. In this case the function h is called a witness. An action on a graph G is a pair (r, h) such that r is applicable to G with witness h.

Definition 3. Given a graph G = (V, E, l) and an action (r, h) on G with r = (L, R), the application of (r, h) to G yields a new graph G′ = (V′, E′, l′) defined by

  V′ = V
  E′ = (E − {{h(x), h(y)} | {x, y} ∈ L}) ∪ {{h(x), h(y)} | {x, y} ∈ R}
  l′(x) = l(x) if x ∉ h(V_L), and l′(x) = l_R ◦ h⁻¹(x) otherwise.

We write G --(r,h)--> G′ to denote that G′ was obtained from G by the application of (r, h).

Definition 4. A graph assembly system is a pair (G0, Φ) where G0 is the initial graph and Φ is a set of rules (called the rule set).

We often refer to a system simply by its rule set Φ and assume that the initial graph is the infinite graph defined by

  G0 ≜ (N, ∅, λx.a)    (1)

where a ∈ Σ is the initial symbol (here λx.a is the function assigning the label a to all vertices).

Definition 5. An assembly sequence of a system (G0, Φ) is a finite sequence {Gi}_{i=0}^{k} such that there exists a sequence of actions {(ri, hi)}_{i=1}^{k} where ri ∈ Φ and

  Gi --(ri,hi)--> Gi+1

for i ∈ {0, ..., k − 1}.

Thus, a system (G0, Φ) defines a non-deterministic dynamical system whose states are labeled graphs over V_G0.

The system is non-deterministic since, at any step, many rules in Φ may be simultaneously applicable, each possibly via several witnesses. Two vertices in a graph G are connected if there is a path (sequence of edges) connecting them in G. The connectivity relation on V is an equivalence relation partitioning V into sets {Vi}_{i∈I} where v1 and v2 are connected if and only if v1, v2 ∈ Vi for some i. The sets Vi are called the components of G. A graph G is connected if it has exactly one component.

Definition 6. A connected graph G is reachable in a system (G0, Φ) if there exists an assembly sequence {Gi}_{i=0}^{k} of (G0, Φ) such that G is isomorphic to some component of Gk. The set of all such reachable graphs is denoted R(G0, Φ), or just R(Φ) if G0 is defined by (1).

Definition 7. A graph G ∈ R(G0, Φ) is stable if for all G′ there does not exist an action (r, h) on the disjoint union G ⨿ G′ such that r = (L, R) ∈ Φ and h(L) ∩ V_G is nonempty. The set of all such stable graphs is denoted S(G0, Φ), or just S(Φ) if G0 is defined by (1).
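Definitions 2 and 3 translate almost directly into code. The sketch below uses my own encoding (edges as frozensets, a rule given as the edge sets of L and R over shared vertices plus R's labeling); it is an illustration, not Klavins' implementation. The toy rule is the first rule of Example 2 below, a a ⇒ b − c.

# Sketch: apply an action (r, h) to a labeled graph G = (V, E, l) per Definition 3.
#   V' = V
#   E' = (E - {{h(x), h(y)} : {x, y} in L}) | {{h(x), h(y)} : {x, y} in R}
#   l'(v) = l(v) if v not in h(V_L), else l_R(h^-1(v))

def apply_action(V, E, l, rule, h):
    E_L, E_R, l_R = rule                       # edge sets of L and R, labels of R
    h_inv = {v: x for x, v in h.items()}
    E_new = (E - {frozenset((h[x], h[y])) for (x, y) in E_L}) \
            | {frozenset((h[x], h[y])) for (x, y) in E_R}
    l_new = {v: (l_R[h_inv[v]] if v in h_inv else l[v]) for v in V}
    return V, E_new, l_new                     # vertices are never added or removed

# Toy rule "a a => b - c": two isolated a-labeled vertices become a b-c edge.
rule = (set(), {(0, 1)}, {0: "b", 1: "c"})
V, E, l = {10, 11, 12}, set(), {10: "a", 11: "a", 12: "a"}
print(apply_action(V, E, l, rule, {0: 10, 1: 11}))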

Example

Example 2. Define a mixed rule set Φ2 with three binary constructive rules, two ternary constructive rules and one binary destructive rule:

  a a ⇒ b − c,   a c ⇒ e − d,   a e ⇒ g − f,   b2 − f2 ⇒ b3 f3,
  [plus two ternary rules, drawn as triangles in the original figure.]

An example assembly sequence for Φ2 is shown in Figure 1(b). The three binary constructive rules yield chains of length 4. The two ternary constructive rules "triangulate" the cycle. The last rule removes the first triangulating edge to yield a length-4 cycle, which is the unique stable graph of the system.

Graph grammars are a generalization of grammars on strings studied in computer science. Thus, they can do anything that string grammars can do (regular languages, arbitrary computation, ...). In the next example, we show how to simulate a string grammar in Chomsky Normal Form [9] with a graph grammar. From the example, it should be clear that any string grammar, context-free or context-sensitive, can be simulated in a similar fashion.

Particles float in a stirred or agitated fluid. Occasionally, particles collide. If their states correspond to an assembly rule, the particles stick and can change state. Else, the particles rebound.

[Fig. 6. A particle-based embedding of self-assembly using graph grammars.]

Single Pushout Approach

Double Pushout Approach

Brijder et al

Kawamata et al

McCaskill et al

Klavins et al