CSE 105: Introduction to the Theory of Computation, Winter 2003 Solution to Problem Set 2

A. Hevia and J. Mao February 14, 2003

Solution to Problem Set 2

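2.1 Given the grammar $G$:
$E \to E + T \mid T$
$T \to T \times F \mid F$
$F \to (E) \mid a$
Give the parse trees and derivations for each string.
a. The derivation is $E \Rightarrow T \Rightarrow F \Rightarrow a$. The parse tree is shown in Figure 1.
b. The derivation is $E \Rightarrow E + T \Rightarrow T + T \Rightarrow F + T \Rightarrow a + T \Rightarrow a + F \Rightarrow a + a$. The parse tree is shown in Figure 1.
c. The derivation is $E \Rightarrow E + T \Rightarrow E + T + T \Rightarrow T + T + T \Rightarrow F + T + T \Rightarrow a + T + T \Rightarrow a + F + T \Rightarrow a + a + T \Rightarrow a + a + F \Rightarrow a + a + a$. The parse tree is shown in Figure 1.
d. The derivation is $E \Rightarrow T \Rightarrow F \Rightarrow (E) \Rightarrow (T) \Rightarrow (F) \Rightarrow ((E)) \Rightarrow ((T)) \Rightarrow ((F)) \Rightarrow ((a))$. The parse tree is shown in Figure 1.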
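The leftmost derivations above can be reproduced mechanically. The following Python sketch parses a string with a recursive-descent parser for $G$; the helper names `parse` and `leftmost_derivation` are our own, and `*` is used as an ASCII stand-in for $\times$. Because $E \to E + T$ and $T \to T \times F$ are left-recursive, the parser uses loops that build left-nested trees. Expanding the leftmost unexpanded node one step at a time then lists the sentential forms of the leftmost derivation.

```python
def parse(s):
    """Recursive-descent parser for G. A tree is a tuple (label, child, ...);
    terminals are plain strings. Left recursion is replaced by loops that
    build left-nested (left-associative) trees."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def E():  # E -> E + T | T
        tree = ("E", T())
        while peek() == "+":
            eat("+")
            tree = ("E", tree, "+", T())
        return tree

    def T():  # T -> T * F | F   ('*' stands in for the multiplication sign)
        tree = ("T", F())
        while peek() == "*":
            eat("*")
            tree = ("T", tree, "*", F())
        return tree

    def F():  # F -> ( E ) | a
        if peek() == "(":
            eat("(")
            inner = E()
            eat(")")
            return ("F", "(", inner, ")")
        eat("a")
        return ("F", "a")

    tree = E()
    if pos != len(s):
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


def leftmost_derivation(tree):
    """Return the sentential forms obtained by always expanding the leftmost node."""
    form = [tree]
    steps = []
    while True:
        steps.append("".join(x if isinstance(x, str) else x[0] for x in form))
        idx = next((i for i, x in enumerate(form) if not isinstance(x, str)), None)
        if idx is None:
            return steps
        form[idx:idx + 1] = list(form[idx][1:])  # replace the node by its children
```

For example, `leftmost_derivation(parse("a+a"))` returns `['E', 'E+T', 'T+T', 'F+T', 'a+T', 'a+F', 'a+a']`, the derivation of part (b) written without spaces.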


Figure 1: 2.1 (a), (b), (c), and (d).

2.3 The given context-free grammar $G$ is


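$R \to XRX \mid S$
$S \to aTb \mid bTa$
$T \to XTX \mid X \mid \varepsilon$
$X \to a \mid b$
a. The variables of $G$ are $\{R, X, S, T\}$. The terminals of $G$ are $\{a, b\}$. The start variable is $R$.
b. Strings in $L(G)$ are $ab$, $ba$, and $aaaabbaaa$.
c. Strings not in $L(G)$ are $aaa$, $aba$, and $bb$.
d. $T \Rightarrow aba$: false.
e. $T \stackrel{*}{\Rightarrow} aba$: true.
f. $T \Rightarrow T$: false.
g. $T \stackrel{*}{\Rightarrow} T$: true.
h. $XXX \stackrel{*}{\Rightarrow} aba$: true.
i. $X \stackrel{*}{\Rightarrow} aba$: false.
j. $T \stackrel{*}{\Rightarrow} XX$: true.
k. $T \stackrel{*}{\Rightarrow} XXX$: true, since $T \Rightarrow XTX \Rightarrow XXX$.
l. $S \stackrel{*}{\Rightarrow} \varepsilon$: false.
m. The language generated by $G$ is the language of all strings $w$ over $\{a, b\}$ that are not palindromes, that is, $w \neq w^R$.

2.6 b. $L$ is the complement of the language $\{a^n b^n : n \ge 0\}$. First, let's describe $L$ explicitly. A string over $\{a, b\}$ fails to have the form $a^n b^n$ either because it has the form $a^n b^m$ with $n \neq m$, or because it is not of the form $a^* b^*$ at all, that is, it contains the substring $ba$:
$L = \{a^n b^m : n \neq m\} \cup (a \cup b)^* ba (a \cup b)^*$
Let's call the first language $L_1$ and the second $L_2$. A context-free grammar that generates $L_1$ is
$S_1 \to aS_1b \mid T \mid U$
$T \to aT \mid a$
$U \to Ub \mid b$
A context-free grammar that generates $L_2$ is
$S_2 \to RbaR$
$R \to RR \mid a \mid b \mid \varepsilon$
Therefore, a context-free grammar $G$ that generates $L = L_1 \cup L_2$ is
$S \to S_1 \mid S_2$
$S_1 \to aS_1b \mid T \mid U$
$S_2 \to RbaR$
$T \to aT \mid a$
$U \to Ub \mid b$
$R \to RR \mid a \mid b \mid \varepsilon$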
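One way to gain confidence in the answers to 2.3 (m) and 2.6 (b) is to turn each grammar into a recursive membership test that mirrors its rules and compare it with the claimed description on all short strings. The Python sketch below does this; the function names are our own, and the tests rely on the particular shape of these grammars (for example, that $T$ in 2.3 and $R$ in 2.6 generate every string over $\{a, b\}$) rather than on a general parsing algorithm such as CYK. Agreement up to a fixed length is evidence, not a proof.

```python
from itertools import product


def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# 2.3:  R -> XRX | S,  S -> aTb | bTa,  T -> XTX | X | eps,  X -> a | b
def derives_T(w):
    # T -> XTX | X | eps generates every string over {a, b}
    return all(c in "ab" for c in w)


def derives_S(w):
    # S -> aTb | bTa: the outer letters differ and T derives the middle
    return len(w) >= 2 and {w[0], w[-1]} == {"a", "b"} and derives_T(w[1:-1])


def derives_R(w):
    # R -> XRX | S: strip one letter from each end, or finish with S
    if derives_S(w):
        return True
    return len(w) >= 2 and w[0] in "ab" and w[-1] in "ab" and derives_R(w[1:-1])


# 2.6(b):  S -> S1 | S2
def derives_S1(w):
    # S1 -> a S1 b | T | U,  with T -> aT | a (a+) and U -> Ub | b (b+)
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and derives_S1(w[1:-1]):
        return True
    return len(w) >= 1 and (set(w) == {"a"} or set(w) == {"b"})


def derives_S2(w):
    # S2 -> R ba R, and R -> RR | a | b | eps generates every string over {a, b}
    return all(c in "ab" for c in w) and "ba" in w


def is_anbn(w):
    n = len(w) // 2
    return w == "a" * n + "b" * n


if __name__ == "__main__":
    for w in strings("ab", 10):
        assert derives_R(w) == (w != w[::-1])                          # 2.3(m)
        assert (derives_S1(w) or derives_S2(w)) == (not is_anbn(w))   # 2.6(b)
    print("2.3(m) and 2.6(b) agree with brute force up to length 10")
```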


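c. $L = \{w\#x : w^R \text{ is a substring of } x, \text{ where } w, x \in \{0,1\}^*\}$. A context-free grammar $G$ that generates $L$ is
$S \to TR$
$T \to 0T0 \mid 1T1 \mid \#R$
$R \to RR \mid 0 \mid 1 \mid \varepsilon$
Here $R$ generates every string in $\{0,1\}^*$, and $T$ generates exactly the strings $w\#yw^R$ with $w, y \in \{0,1\}^*$; hence $S \to TR$ generates exactly the strings $w\#yw^Rz$, which are the strings of $L$.

2.7 a). The language of strings over the alphabet $\{a, b\}$ with twice as many $a$'s as $b$'s. The PDA that recognizes this language is shown in Figure 2.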
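The same kind of brute-force check applies to the grammar of 2.6 (c). For 2.7 (a), the sketch also includes a stack-based recognizer in the spirit of a PDA: the stack holds either surplus $a$'s or placeholder symbols `h` standing for $a$'s that are still owed, and each $b$ cancels two units. This is our own illustration, not a transcription of Figure 2; in an actual PDA, the two stack operations per $b$ would take an extra state and an $\varepsilon$-move. All function names are our own.

```python
from itertools import product


def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# 2.6(c):  S -> TR,  T -> 0T0 | 1T1 | #R,  R -> RR | 0 | 1 | eps
def derives_R(w):
    # R generates every string over {0, 1}
    return all(c in "01" for c in w)


def derives_T(w):
    if w[:1] == "#":
        return derives_R(w[1:])                              # T -> #R
    return (len(w) >= 2 and w[0] == w[-1] and w[0] in "01"
            and derives_T(w[1:-1]))                          # T -> 0T0 | 1T1


def derives_S(w):
    # S -> TR: try every split point between the T part and the R part
    return any(derives_T(w[:k]) and derives_R(w[k:]) for k in range(len(w) + 1))


def in_L(s):
    # Direct definition: w#x with w, x in {0,1}* and reverse(w) a substring of x
    parts = s.split("#")
    return (len(parts) == 2 and derives_R(parts[0]) and derives_R(parts[1])
            and parts[0][::-1] in parts[1])


# 2.7(a): stack-based recognizer for "twice as many a's as b's"
def twice_as_many_a(w):
    stack = []  # always holds only "a" (surplus a's) or only "h" (a's still owed)
    for c in w:
        if c == "a":
            if stack and stack[-1] == "h":
                stack.pop()              # this a pays off one owed a
            else:
                stack.append("a")
        elif c == "b":
            for _ in range(2):           # each b must be matched by two a's
                if stack and stack[-1] == "a":
                    stack.pop()
                else:
                    stack.append("h")
        else:
            return False
    return not stack                     # accept iff nothing is left over


if __name__ == "__main__":
    for s in strings("01#", 7):
        assert derives_S(s) == in_L(s)
    for w in strings("ab", 10):
        assert twice_as_many_a(w) == (w.count("a") == 2 * w.count("b"))
    print("2.6(c) and 2.7(a) checks passed")
```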


Figure 2: 2.7 (a).

d). $L = \{x_1\#x_2\#\cdots\#x_k : k \ge 1, \text{ each } x_i \in \{a, b\}^*, \text{ and for some } i \text{ and } j,\ x_i = x_j^R\}$. The PDA of Figure 3 recognizes $L$.

Figure 3: 2.7 (d).
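A PDA for 2.7 (d) nondeterministically guesses the words $x_i$ and $x_j$; a deterministic program can simulate those guesses by trying every pair. The Python sketch below is our own and not a transcription of Figure 3: it pushes $x_i$ and pops while reading a later $x_j$, which succeeds exactly when $x_j = x_i^R$. It reads "for some $i$ and $j$" literally, so $i = j$ is allowed and any palindromic $x_i$ suffices; if the intended reading requires $i \neq j$, drop that branch.

```python
def in_L(s):
    """Decide 2.7(d) by trying every choice a nondeterministic PDA could guess."""
    xs = s.split("#")                         # x_1, ..., x_k (always k >= 1)
    if any(c not in "ab" for x in xs for c in x):
        return False
    for i, xi in enumerate(xs):
        # Guess i == j: x_i must be its own reverse (a PDA would push the first
        # half of x_i, optionally skip a middle symbol, then pop-match the rest).
        if xi == xi[::-1]:
            return True
        for xj in xs[i + 1:]:
            # Guess i < j: push x_i, skip ahead, then pop one symbol per symbol
            # of x_j; the branch succeeds if every pop matches and the stack
            # ends empty, i.e. x_j == reverse(x_i).
            stack = list(xi)                  # last symbol of x_i ends on top
            matched = True
            for c in xj:
                if not stack or stack.pop() != c:
                    matched = False
                    break
            if matched and not stack:
                return True
    return False
```

Since $x_i = x_j^R$ is symmetric in $i$ and $j$, trying only pairs with $i < j$ (plus $i = j$) covers every case.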


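2.8 The sentence “the girl touches the boy with the flower” has two distinct parse trees, shown in Figure 4: in one, the prepositional phrase “with the flower” is attached to the noun phrase “the boy”; in the other, it is attached to the verb phrase. This shows that the CFG is ambiguous.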
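Ambiguity can also be confirmed by counting parse trees. The sketch below assumes that the grammar is the small English-fragment grammar of the course textbook (Sipser, Introduction to the Theory of Computation), with node names abbreviated as in Figure 4 (S, NP, VP, PP, CN, CV, A, N, V, P). The rules are not restated in this document, so treat them as an assumption. The hypothetical helper `count_parse_trees` counts, by dynamic programming over spans of words, how many parse trees derive a sentence; a count above 1 means the sentence is ambiguous.

```python
from functools import lru_cache

# Assumed grammar (textbook English fragment, abbreviated); not given in this document.
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("CN",), ("CN", "PP")],
    "VP": [("CV",), ("CV", "PP")],
    "PP": [("P", "CN")],
    "CN": [("A", "N")],
    "CV": [("V",), ("V", "NP")],
    "A":  [("a",), ("the",)],
    "N":  [("boy",), ("girl",), ("flower",)],
    "V":  [("touches",), ("likes",), ("sees",)],
    "P":  [("with",)],
}


def count_parse_trees(sentence, start="S"):
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # Number of parse trees rooted at sym whose yield is words[i:j]
        if sym not in GRAMMAR:                   # terminal symbol
            return 1 if j == i + 1 and words[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence rhs derives words[i:j]
        # (every symbol covers at least one word, since there are no eps-rules)
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j))

    return count(start, 0, len(words))
```

Under this grammar, `count_parse_trees("the girl touches the boy with the flower")` is 2, while dropping “with the flower” leaves a single tree.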


Figure 4: 2.8.

2.13 Given the grammar $G$:
$S \to TT \mid U$
$T \to 0T \mid T0 \mid \#$
$U \to 0U00 \mid \#$
a). The language $L = L(G)$ is the set of strings that either consist of exactly three arbitrary-length blocks of zeroes separated by two $\#$ symbols, or have the form $0^k\#0^{2k}$ for $k \ge 0$. Indeed, $T$ generates exactly the strings $0^i\#0^j$ with $i, j \ge 0$, so $TT$ generates the strings $0^i\#0^{j+l}\#0^m$, while $U$ generates the strings $0^k\#0^{2k}$. More formally, a word $w$ in $L$ is either of the form $w = 0^{k_1}\#0^{k_2}\#0^{k_3}$ (where $k_1, k_2, k_3 \ge 0$) or of the form $w = 0^k\#0^{2k}$ for $k \ge 0$.
b). Let's prove that $L$ is not regular. Towards a contradiction, assume $L$ is regular. By the pumping lemma, there exists a pumping length $p > 0$ such that any word $w \in L$ with $|w| \ge p$ can be split into three pieces $w = xyz$ with $|y| > 0$ and $|xy| \le p$ such that, for every integer $i \ge 0$, the word $w' = xy^iz$ also belongs to $L$. Now consider the word $w = 0^p\#0^{2p}$. Clearly $w \in L$ and $|w| \ge p$, so it must satisfy the conditions of the lemma. Any partition of $w$ into three pieces $xyz$ must be such that $y$ consists only of $0$'s from the leftmost block of $0$'s (otherwise $|xy|$ would be larger than $p$, which is not allowed). Therefore, $y = 0^k$ for some $k > 0$. We now "pump down" this word, obtaining $w' = xy^0z = 0^{p-k}\#0^{2p}$. Since $w'$ contains a single $\#$, it could only be in $L$ if it had the form $0^m\#0^{2m}$, which would require $2(p - k) = 2p$; this fails because $k > 0$. So no partition of $w$ can be pumped for all $i \ge 0$ (already $i = 0$ fails). We have obtained a contradiction. Therefore, $L$ is not regular.
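Both parts of 2.13 can be spot-checked in Python. The sketch compares a recognizer that follows the rules of $G$ with the description from part (a) on every string over $\{0, \#\}$ of length at most 9, and confirms for small $p$ that pumping down $0^p\#0^{2p}$ leaves $L$ for every split allowed by the lemma. The function names are our own; a finite check supports but does not replace the proof.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives_T(w):
    # T -> 0T | T0 | #
    if w == "#":
        return True
    return len(w) >= 2 and ((w[0] == "0" and derives_T(w[1:]))
                            or (w[-1] == "0" and derives_T(w[:-1])))


@lru_cache(maxsize=None)
def derives_U(w):
    # U -> 0U00 | #
    if w == "#":
        return True
    return len(w) >= 4 and w[0] == "0" and w.endswith("00") and derives_U(w[1:-2])


def derives_S(w):
    # S -> TT | U
    return derives_U(w) or any(derives_T(w[:k]) and derives_T(w[k:])
                               for k in range(1, len(w)))


def in_L(w):
    # Description from part (a): exactly three blocks of 0's, or 0^k # 0^(2k)
    blocks = w.split("#")
    if any(set(b) - {"0"} for b in blocks):
        return False
    if len(blocks) == 3:
        return True
    return len(blocks) == 2 and len(blocks[1]) == 2 * len(blocks[0])


def pumping_down_always_fails(p):
    # Part (b): every split w = xyz of 0^p # 0^(2p) with |xy| <= p and |y| > 0
    # gives x y^0 z = xz outside L.
    w = "0" * p + "#" + "0" * (2 * p)
    return all(not in_L(w[:xlen] + w[xlen + ylen:])
               for xlen in range(p)
               for ylen in range(1, p - xlen + 1))


if __name__ == "__main__":
    for n in range(10):
        for t in product("0#", repeat=n):
            w = "".join(t)
            assert derives_S(w) == in_L(w)
    assert all(pumping_down_always_fails(p) for p in range(1, 9))
    print("2.13 checks passed")
```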

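The pumping arguments given next for 2.18 (b) and (c) can be sanity-checked by brute force for small pumping lengths. The hypothetical helper `cfl_pumping_fails` below enumerates every split $w = uvxyz$ with $|vxy| \le p$ and $|vy| > 0$ and reports whether each one leaves the language for $i = 0$ or $i = 2$, the exponents used in the proofs. It only covers the values of $p$ it is run on, so it illustrates the case analysis rather than proving it.

```python
def cfl_pumping_fails(w, p, in_lang, pumps=(0, 2)):
    """Return True if every split w = uvxyz with |vxy| <= p and |vy| > 0 yields
    a word outside the language for at least one exponent i in `pumps`."""
    n = len(w)
    for start in range(n):                                   # start = |u|
        for vlen in range(p + 1):
            for xlen in range(p + 1 - vlen):
                for ylen in range(p + 1 - vlen - xlen):
                    end = start + vlen + xlen + ylen
                    if vlen + ylen == 0 or end > n:
                        continue
                    u, v = w[:start], w[start:start + vlen]
                    x = w[start + vlen:start + vlen + xlen]
                    y, z = w[start + vlen + xlen:end], w[end:]
                    if all(in_lang(u + v * i + x + y * i + z) for i in pumps):
                        return False                         # this split survives
    return True


def in_L_b(s):
    # 2.18(b): 0^n # 0^(2n) # 0^(3n)
    blocks = s.split("#")
    if len(blocks) != 3 or any(set(b) - {"0"} for b in blocks):
        return False
    n = len(blocks[0])
    return len(blocks[1]) == 2 * n and len(blocks[2]) == 3 * n


def in_L_c(s):
    # 2.18(c): z#x with z, x in {a,b}* and z a substring of x
    parts = s.split("#")
    return (len(parts) == 2 and all(set(q) <= {"a", "b"} for q in parts)
            and parts[0] in parts[1])


if __name__ == "__main__":
    for p in range(1, 5):
        wb = "0" * p + "#" + "0" * (2 * p) + "#" + "0" * (3 * p)
        wc = "b" * p + "a" * p + "#" + "b" * p + "a" * p
        assert in_L_b(wb) and cfl_pumping_fails(wb, p, in_L_b)
        assert in_L_c(wc) and cfl_pumping_fails(wc, p, in_L_c)
    print("no split survives for p = 1..4")
```

Passing `pumps=(0,)` alone would make the check for (c) fail, which matches the observation that pumping down the leftmost $b$'s of $b^pa^p\#b^pa^p$ stays inside the language.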

2.18 b). $L = \{0^n\#0^{2n}\#0^{3n} : n \ge 0\}$. Let's prove that $L$ is not context free. Towards a contradiction, assume $L$ is context free. By the pumping lemma for context-free languages (PL4CFL), there exists a pumping length $p > 0$ such that any word $w \in L$ with $|w| \ge p$ can be split into five pieces $w = uvxyz$ with $|vy| > 0$ and $|vxy| \le p$ such that, for every integer $i \ge 0$, the word $w' = uv^ixy^iz$ belongs to $L$. Consider the word $w = 0^p\#0^{2p}\#0^{3p}$, which is in $L$ and has length at least $p$. Let's analyze all possible partitions of $w$ into five pieces. Recall that $|vxy| \le p$; since the middle block has $2p > p$ zeroes, the substring $vxy$ cannot touch both $\#$ symbols, so it spans at most two consecutive blocks of zeroes.
• Case 1: $vxy$ lies within the leftmost block of zeroes. If we pump down ($i = 0$), we decrease the number of zeroes in the leftmost block without changing the number of zeroes in the second and third blocks. Therefore, the resulting word is not in the language.
• Case 2: $v$ or $y$ contains the leftmost $\#$. If we pump up ($i = 2$), we obtain a word $w' = uv^2xy^2z$ that contains three $\#$ symbols, and thus $w'$ is not in the language.
• Case 3: $v$ lies within the leftmost block of zeroes, $x$ contains the leftmost $\#$, and $y$ lies within the middle block of zeroes. If we pump down ($i = 0$), the number of zeroes in the first or second block (or both) decreases, but the number of zeroes in the third block does not. Therefore, the word is not in the language.
• Case 4: $vxy$ lies entirely within the middle or the rightmost block of zeroes. These cases are analogous to Case 1.
• Case 5: $v$, $x$, or $y$ contains the rightmost $\#$. If $v$ or $y$ contains it, the case is analogous to Case 2; if $x$ contains it, the case is analogous to Case 3.
Since for every possible partition the pumped word $w'$ is not in $L$ for some $i$, we obtain a contradiction. Therefore, $L$ is not context free.

c).
$L = \{z\#x : z \text{ is a substring of } x, \text{ where } z, x \in \{a, b\}^*\}$. Let's prove that $L$ is not context free. Towards a contradiction, assume $L$ is context free, and let $p$ be the pumping length given by the PL4CFL. Consider the word $w = b^pa^p\#b^pa^p$, which is in $L$ and has length at least $p$. Let's analyze all possible partitions of $w$ into five pieces $w = uvxyz$ with $|vy| > 0$ and $|vxy| \le p$.
• Case 1: $vxy$ lies within a single block of the left string (only $b$'s or only $a$'s, left of the $\#$). If we pump up ($i = 2$), the left string contains a run of more than $p$ equal symbols, while every run in the right string (right of the $\#$) has length $p$. Therefore, the left string is not a substring of the right string, and the resulting word is not in the language. (Pumping down does not work here: for instance, $b^{p-k}a^p$ is still a substring of $b^pa^p$.)
• Case 2: $vx$ lies within the leftmost block of $b$'s, but $y$ includes some $b$'s and some $a$'s from the left string. If we pump up ($i = 2$), we obtain a word $w' = uv^2xy^2z$ whose left string contains symbols "out of order", that is, a substring of the form $ab$, which cannot be a substring of the right string. Therefore, $w'$ is not in the language.


• Case 3: $v$ lies within the leftmost block of $b$'s, $x$ contains $b$'s and $a$'s, and $y$ consists only of $a$'s (all left of the $\#$). If we pump up ($i = 2$), the left string becomes $b^{p+|v|}a^{p+|y|}$, which is longer than the unmodified right string $b^pa^p$ (because $|vy| > 0$), so it cannot be a substring of it. Therefore, the word is not in the language.
• Case 4: $v$ contains both $b$'s and $a$'s from the left string. This case is analogous to Case 2: if we pump up, there will be $a$'s and $b$'s out of order, which cannot be a substring of the right string.
• Case 5: $y$ contains the symbol $\#$. If we pump up, we obtain a word with more than one $\#$, which cannot belong to the language.
• Case 6: $v$ contains only $a$'s from the left string, $x$ contains the symbol $\#$, and $y$ contains only $b$'s from the right string. If $v$ is nonempty, pumping up ($i = 2$) makes the left string end with $p + |v|$ consecutive $a$'s, more than the right string contains. If $v$ is empty, then $y$ is nonempty, and pumping down ($i = 0$) leaves the right string with only $p - |y|$ $b$'s, so it cannot contain $b^p$. In both cases the resulting word is not in the language.
• Case 7: $v$ contains the symbol $\#$. This case is analogous to Case 5.
• Case 8: $vxy$ lies entirely within the right string. If we pump down ($i = 0$), the right string becomes shorter than the unmodified left string $b^pa^p$, which therefore cannot be a substring of it.
Since for every possible partition the pumped word $w'$ is not in $L$ for some $i$, we obtain a contradiction. Therefore, $L$ is not context free.

2.25 Notice that $Y$ generates all possible strings. The grammar generates all strings NOT of the form $a^kb^k$ for $k \ge 0$. Thus the complement of the language generated is $\overline{L(G)} = \{a^kb^k : k \ge 0\}$. A CFG for this language is easy to write:
$S \to aSb \mid \varepsilon$

2.26 $C = \{x\#y : x, y \in \{0,1\}^*, \text{ and } x \neq y\}$.
Solution: If $x$ and $y$ differ, then either $|x| \neq |y|$, or there is a position $k$ such that $x_k \neq y_k$. For the second case, our PDA, while reading the first string, counts symbols (by pushing a marker symbol onto the stack) until it nondeterministically decides to remember the current symbol. It keeps this symbol in its finite control (one copy of the remaining states for each possible symbol) rather than on the stack, since a symbol pushed on top of the markers would block them from being popped later. Then, while reading the second string, the PDA counts symbols by popping the markers; once they are exhausted, it checks whether the current symbol differs from the remembered one. If it does, the PDA accepts; otherwise, this branch rejects. To cover strings with $|x| \neq |y|$, where no such position need exist, an additional nondeterministic branch pushes one marker per symbol of $x$, pops one per symbol of $y$, and accepts exactly when the counts differ.

Figure 5: 2.26. PDA recognizing language $C$.
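The 2.26 construction can be exercised with a small nondeterministic PDA simulator. The Python sketch below explores all configurations (state, input position, stack) breadth first. Its transition table is our own and is not read off Figure 5: markers `m` count positions in $x$, the symbol at the guessed mismatch position is remembered in the state (states `s0`/`s1` and `t0`/`t1`), and a second branch (`r1` to `r4`) handles $|x| \neq |y|$. The helper names `npda_accepts` and `in_C` are hypothetical.

```python
from collections import deque
from itertools import product

EPS = ""  # the empty string stands for epsilon in transition labels


def npda_accepts(delta, start, accepting, w):
    """Breadth-first search over configurations (state, input position, stack).
    delta maps (state, input symbol or EPS, symbol to pop or EPS) to a list of
    (next state, symbol to push or EPS); the stack is a tuple with its top last."""
    seen = set()
    queue = deque([(start, 0, ())])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, i, stack = config
        if i == len(w) and state in accepting:
            return True
        for (src, a, pop), moves in delta.items():
            if src != state:
                continue
            if a != EPS and (i == len(w) or w[i] != a):
                continue
            if pop != EPS and (not stack or stack[-1] != pop):
                continue
            base = stack[:-1] if pop != EPS else stack
            for dst, push in moves:
                new_stack = base + (push,) if push != EPS else base
                queue.append((dst, i + (a != EPS), new_stack))
    return False


DELTA = {
    ("q0", EPS, EPS): [("q1", "$"), ("r1", "$")],
    # Branch 1: push a marker per symbol before position k, then remember x_k.
    ("q1", "0", EPS): [("q1", "m"), ("s0", EPS)],
    ("q1", "1", EPS): [("q1", "m"), ("s1", EPS)],
    ("s0", "0", EPS): [("s0", EPS)], ("s0", "1", EPS): [("s0", EPS)],  # rest of x
    ("s1", "0", EPS): [("s1", EPS)], ("s1", "1", EPS): [("s1", EPS)],
    ("s0", "#", EPS): [("t0", EPS)], ("s1", "#", EPS): [("t1", EPS)],
    ("t0", "0", "m"): [("t0", EPS)], ("t0", "1", "m"): [("t0", EPS)],  # count in y
    ("t1", "0", "m"): [("t1", EPS)], ("t1", "1", "m"): [("t1", EPS)],
    ("t0", "1", "$"): [("q4", "$")],  # markers exhausted and y_k != x_k
    ("t1", "0", "$"): [("q4", "$")],
    ("q4", "0", EPS): [("q4", EPS)], ("q4", "1", EPS): [("q4", EPS)],
    ("q4", EPS, "$"): [("q5", EPS)],
    # Branch 2: |x| != |y|.
    ("r1", "0", EPS): [("r1", "m")], ("r1", "1", EPS): [("r1", "m")],
    ("r1", "#", EPS): [("r2", EPS)],
    ("r2", "0", "m"): [("r2", EPS)], ("r2", "1", "m"): [("r2", EPS)],
    ("r2", "0", "$"): [("r3", EPS)], ("r2", "1", "$"): [("r3", EPS)],  # y longer
    ("r3", "0", EPS): [("r3", EPS)], ("r3", "1", EPS): [("r3", EPS)],
    ("r2", EPS, "m"): [("r4", EPS)],                                    # x longer
}
ACCEPTING = {"q5", "r3", "r4"}


def in_C(s):
    parts = s.split("#")
    return (len(parts) == 2 and all(set(q) <= {"0", "1"} for q in parts)
            and parts[0] != parts[1])


if __name__ == "__main__":
    for n in range(7):
        for t in product("01#", repeat=n):
            s = "".join(t)
            assert npda_accepts(DELTA, "q0", ACCEPTING, s) == in_C(s)
    print("PDA agrees with the definition of C up to length 6")
```

Breadth-first search terminates here because every push except the initial `$` consumes an input symbol, so each input has finitely many reachable configurations.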