Right Linear Grammars, Regular Languages, and Finite State Automata

Author: Alannah Gibson
1 Regular Languages

A · B =def {x⌢y | x ∈ A, y ∈ B}    (concatenation product)

A∗ =def
(a) e ∈ A∗ (e is the empty string)
(b) If x ∈ A and y ∈ A∗ then x⌢y ∈ A∗

1. ∅ is a regular language.
2. For any string x ∈ Σ∗, {x} is a regular language.
3. If A and B are regular languages, so is A ∪ B.
4. If A and B are regular languages, so is A · B.
5. If A is a regular language, so is A∗.
6. Nothing else is a regular language except by way of 1-5.

Examples:

1. ∅ is a regular language.
2. If Σ = {a, b, c, d}, then {cdddab} is a regular language.
3. {cdddab}, {acddb}, and {aaa} are regular; therefore, so are {cdddab, acddb}, {cdddab, aaa}, {acddb, aaa}, and {cdddab, aaa, acddb}.
4. A = {ab, cd} and B = {ba, dc} are regular; therefore so are {abba, cddc, abdc, cdba} (A · B) and {baab, dccd, bacd, dcab} (B · A).
5. If {aaa} is a regular language, so is L = (aaa)∗. L: all strings of a’s with length evenly divisible by 3.

Cute fact department: By the above definition of A∗ (clause (a)), e ∈ ∅∗. Note that {e} = ∅∗ and therefore, by clause 5, {e} is a regular language.
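The closure operations in the examples above can be tried out on small finite sets. The following is a minimal Python sketch, not part of the original notes; the function names `union`, `concat`, and `star` are illustrative, and `star` only approximates A∗ by cutting off at an assumed maximum length, since A∗ is usually infinite.

```python
from itertools import product

def union(A, B):
    """A ∪ B for two finite languages given as sets of strings."""
    return set(A) | set(B)

def concat(A, B):
    """Concatenation product A · B = {xy | x in A, y in B}."""
    return {x + y for x, y in product(A, B)}

def star(A, max_len):
    """Finite approximation of A*: the members of A* of length <= max_len."""
    result = {""}            # clause (a): the empty string e is in A*
    frontier = {""}
    while frontier:
        # clause (b): if x in A and y in A*, then xy in A*
        frontier = {x + y for x in A for y in frontier
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

A = {"ab", "cd"}
B = {"ba", "dc"}
print(sorted(concat(A, B)))                 # ['abba', 'abdc', 'cdba', 'cddc']
print(sorted(star({"aaa"}, 9), key=len))    # ['', 'aaa', 'aaaaaa', 'aaaaaaaaa']
print(star(set(), 5))                       # {''}
```

The last line illustrates the "cute fact": the star of the empty language contains exactly the empty string.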

1.1 Regular Expressions

1. e
2. {x}
3. A ∪ B.
4. [A, B].
5. If A is a regular language, so is A∗.
5′. If A is a regular language, so is A∗ − {e} = A+.

Intuition: The regular languages are those that can be obtained from the empty language (∅) and the languages with just one string ({a}, {e}, {ab}, ...) by repeated application of

1. Union
2. Concatenation
3. Kleene star

1.2 Examples

ab+c∗d

In:

1. abd
2. abbbbbbbbbbbd
3. abbcd
4. abbbbbbbbbbbccccccccccccd

Out:

1. e
2. acbbbbbbbbbbbd
3. acd
4. bbbbbbbbbbbccccccccccccd
5. bbbbbbbbbbbcccccccccccc
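The In and Out strings for ab+c∗d can be checked mechanically. This is a small sketch using Python's standard `re` module; Python patterns are more expressive than formal regular expressions, but this one uses only concatenation, `+`, and `*`. The names `PATTERN`, `in_language`, `accepted`, and `rejected` are illustrative.

```python
import re

# ab+c*d: an a, one or more b's, zero or more c's, then a d.
PATTERN = re.compile(r"ab+c*d")

def in_language(s):
    """True if the whole string s belongs to the language ab+c*d."""
    return PATTERN.fullmatch(s) is not None

accepted = ["abd", "abbbbbbbbbbbd", "abbcd", "abbbbbbbbbbbccccccccccccd"]
rejected = ["", "acbbbbbbbbbbbd", "acd",
            "bbbbbbbbbbbccccccccccccd", "bbbbbbbbbbbcccccccccccc"]

for s in accepted + rejected:
    print(repr(s), in_language(s))
```

`fullmatch` is used rather than `search` because membership in the language is a property of the entire string, not of a substring.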

The language with 2 bs: B2 = a∗ba∗ba∗

The language with 3 bs: B3 = a∗ba∗ba∗ba∗

The language with 2 or 3 bs: a∗ba∗b?a∗ba∗

The language with an odd number of bs: a∗ba∗(ba∗ba∗)∗

Figure 1: A Finite State Automaton for the language ab+c∗d
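One way to gain confidence in expressions like these is to compare them against a direct count of b’s on every short string over {a, b}. The sketch below does this by brute force up to an assumed length bound of 8; it is a sanity check, not a proof, and the helper names are illustrative.

```python
import re
from itertools import product

ODD_B = re.compile(r"a*ba*(ba*ba*)*")        # odd number of b's
TWO_OR_THREE_B = re.compile(r"a*ba*b?a*ba*")  # exactly 2 or 3 b's

def all_strings(alphabet, max_len):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, max_len=8):
    """True if the pattern and the predicate classify every short string alike."""
    return all((pattern.fullmatch(s) is not None) == predicate(s)
               for s in all_strings("ab", max_len))

print(agrees(ODD_B, lambda s: s.count("b") % 2 == 1))
print(agrees(TWO_OR_THREE_B, lambda s: s.count("b") in (2, 3)))
```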

2 Finite-State Automata

Figure 2: A Finite State Automaton for the language ab{b, c}∗d

Figure 3: A tricky example

Figure 4: Another version of the tricky example: same language

Figure 5: FSA for the empty language

3 Sketch of 1/2 of Proof that FSA languages and Regular languages are the same

Theorem 17.1 (p. 464): A set of strings is a finite automaton language if and only if it is a regular language.

One half of Theorem 17.1: If a language L is a regular language, then L is a finite automaton language (L has a finite automaton that exactly describes it).

Proof by induction on regular languages.

3.1 Base Case

First we prove there are FSAs for the empty language (Figure 5) and the unit languages (Figure 6). Note that one of the latter is the FSA for the unit language admitting the empty string (Figure 7).

Figure 6: FSA for a unit language

Figure 7: FSA for the unit language admitting the empty string

3.2 Induction Step

To complete our demonstration, it is enough to show that if there are FSAs for L1 and L2, there are FSAs for L1 ∪ L2, L1 · L2, and L1∗.
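The induction step can be made concrete with automata that allow empty (e) transitions. The sketch below uses a standard textbook-style construction, which may differ in detail from the one drawn in the figures; the class `NFA` and the functions `unit`, `union`, `concat`, and `star` are illustrative names. It assumes each operand is a freshly built machine, so that state names never collide.

```python
from itertools import count

_fresh = count()   # global supply of unused state names

class NFA:
    """FSA with one start state, a set of final states, and a list of
    (source, symbol, target) transitions; symbol None is an e-move."""
    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        self.transitions = list(transitions)

    def _closure(self, states):
        """All states reachable from `states` using only e-moves."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for (p, sym, r) in self.transitions:
                if p == q and sym is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts(self, s):
        current = self._closure({self.start})
        for ch in s:
            step = {r for (p, sym, r) in self.transitions
                    if p in current and sym == ch}
            current = self._closure(step)
        return bool(current & self.finals)

def unit(x):
    """Base case: an FSA for the unit language {x}."""
    states = [next(_fresh) for _ in range(len(x) + 1)]
    trans = [(states[i], ch, states[i + 1]) for i, ch in enumerate(x)]
    return NFA(states[0], {states[-1]}, trans)

def union(m1, m2):
    """New start state with e-moves into both machines."""
    start = next(_fresh)
    trans = m1.transitions + m2.transitions
    trans += [(start, None, m1.start), (start, None, m2.start)]
    return NFA(start, m1.finals | m2.finals, trans)

def concat(m1, m2):
    """e-moves from every final state of m1 to the start of m2."""
    trans = m1.transitions + m2.transitions
    trans += [(f, None, m2.start) for f in m1.finals]
    return NFA(m1.start, m2.finals, trans)

def star(m):
    """New start state that is also final; finals of m loop back to it."""
    start = next(_fresh)
    trans = m.transitions + [(start, None, m.start)]
    trans += [(f, None, start) for f in m.finals]
    return NFA(start, {start}, trans)

# The language ab{b, c}*d, built only from unit languages and the three operations.
m = concat(concat(unit("ab"), star(union(unit("b"), unit("c")))), unit("d"))
print(m.accepts("abd"), m.accepts("abbcbd"), m.accepts("ad"))   # True True False
```

Because every construction returns another `NFA`, the operations can be nested to any depth, which mirrors the inductive structure of the proof.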

4 Type 3 Grammars

Also called right linear grammars. Types of rules allowed:

A → x
A → xB

where x stands for a string of terminals.

1. VN (capitals): nonterminals (allowed on left-hand sides of rules)
2. VT (lower case): terminals (words/alphabet/substrings of the language; allowed ONLY on right-hand sides of rules)

Some disallowed rules:

1. S → A B
2. S → A
3. S → A x

Allowed:

1. S → a
2. S → b
3. S → abaaaba B

Figure 8: A random regular language L1 and its Kleene closure

Figure 9: A random regular language L2 and the concatenation product of L1 with L2.

Figure 10: The union of L1 with L2.

Figure 11: The FSA for grammar G

Figure 12: An FSA that accepts strings such that the number of 1’s is congruent to 1 mod 3

A grammar: G = ⟨VN, VT, S, R⟩, where VN = {S, A, B}, VT = {a, b}, and

R = { S → aA,
      A → aA,
      A → bbB,
      B → bB,
      B → b }

A derivation: S ⇒ aA ⇒ aaA ⇒ aabbB ⇒ aabbbB ⇒ aabbbb

The procedure for going from an FSA to a grammar:

1. For each transition of the form (qi, x, qj), add a rule of the form qi → x qj.
2. For each transition of the form (qi, x, qj), if qj is a final state, add a rule of the form qi → x.

Note that transitions to final states will always generate two rules. For example, the FSA in Figure 12 will generate the following grammar:

Rule          Transition    Transition to final state
S0 → 1 S1     (S0 1 S1)     yes
S0 → 0 S0     (S0 0 S0)     no
S1 → 1 S2     (S1 1 S2)     no
S1 → 0 S1     (S1 0 S1)     yes
S2 → 1 S0     (S2 1 S0)     no
S2 → 0 S2     (S2 0 S2)     no
S0 → 1        (S0 1 S1)     yes
S1 → 0        (S1 0 S1)     yes
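The two-step procedure is mechanical enough to write down directly. The following Python sketch is illustrative only: the function name `fsa_to_grammar` and the representation of rules as (left-hand side, right-hand side) pairs are assumptions, and the transition list `FIG12` is read off the rule table for Figure 12, with S1 taken as the only final state.

```python
def fsa_to_grammar(transitions, finals):
    """Return right-linear rules as (lhs, rhs) pairs for an FSA given as a
    list of (source, symbol, target) transitions and a set of final states."""
    rules = []
    # Step 1: every transition (qi, x, qj) yields the rule qi -> x qj.
    for (qi, x, qj) in transitions:
        rules.append((qi, f"{x} {qj}"))
    # Step 2: a transition into a final state also yields qi -> x.
    for (qi, x, qj) in transitions:
        if qj in finals:
            rules.append((qi, x))
    return rules

# Transitions assumed for the FSA of Figure 12 (number of 1's congruent to 1 mod 3).
FIG12 = [("S0", "1", "S1"), ("S0", "0", "S0"),
         ("S1", "1", "S2"), ("S1", "0", "S1"),
         ("S2", "1", "S0"), ("S2", "0", "S2")]

for lhs, rhs in fsa_to_grammar(FIG12, {"S1"}):
    print(f"{lhs} -> {rhs}")
```

Six transitions give six rules in step 1, and the two transitions into S1 add two more in step 2, for eight rules in total.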

5 Possible and Impossible

Figure 13 gives an FSA that accepts strings such that the number of a’s and the number of b’s are congruent mod 3. Note that there are 3 kinds of states one can be in with respect to a’s in accepting such strings: a mod 0 state, a mod 1 state, and a mod 2 state. Similarly, there are 3 kinds of states with respect to b’s. Capturing all combinations of such a and b states requires 3 × 3 = 9 states.

The first digit in the name of each state encodes the congruence class for the number of a’s seen, and the second digit the congruence class for the number of b’s. For example, state 12 means the number of a’s seen is congruent to 1 mod 3 and the number of b’s seen is congruent to 2 mod 3. Thus the only states in which the numbers of a’s and b’s are congruent mod 3 are 00, 11, and 22, so these become the final states.

The rules for adding transitions are the following:

1. (xy, a, zy) is a transition, where x + 1 ≡ z mod 3
2. (xy, b, xz) is a transition, where y + 1 ≡ z mod 3

For example,

1. (00, a, 10) is a transition, because 0 + 1 ≡ 1 mod 3
2. (12, b, 10) is a transition, because 2 + 1 ≡ 0 mod 3
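The nine-state machine can be generated from the two transition rules rather than drawn by hand. The sketch below is a minimal Python rendering; it assumes the start state is 00 (no letters seen yet), which the text does not state explicitly, and the names `delta` and `accepts` are illustrative.

```python
from itertools import product

# State "xy": x = (number of a's) mod 3, y = (number of b's) mod 3.
STATES = [f"{x}{y}" for x, y in product(range(3), repeat=2)]
FINALS = {"00", "11", "22"}

def delta(state, symbol):
    """Transition function built from the two rules in the text."""
    x, y = int(state[0]), int(state[1])
    if symbol == "a":
        return f"{(x + 1) % 3}{y}"      # rule 1: (xy, a, zy), x + 1 ≡ z mod 3
    if symbol == "b":
        return f"{x}{(y + 1) % 3}"      # rule 2: (xy, b, xz), y + 1 ≡ z mod 3
    raise ValueError(f"symbol not in alphabet: {symbol!r}")

def accepts(s, start="00"):             # start state 00 is an assumption
    state = start
    for ch in s:
        state = delta(state, ch)
    return state in FINALS

print(len(STATES))                       # 9
print(delta("00", "a"), delta("12", "b"))  # 10 10
print(accepts("ab"), accepts("aaa"), accepts("aab"))  # True True False
```

Note that "aaa" is accepted: three a’s and zero b’s are congruent mod 3 even though the counts are not equal.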

Figure 13: An FSA that accepts strings such that the number of a’s and b’s is congruent mod 3

Figure 14: An FSA that accepts equal numbers of a’s and b’s

It is instructive to examine the difference between this language, for which an FSA exists, and one for which an FSA does not exist:

L = {a^n b^n | n ∈ ℕ}

This language requires the same number of a’s and b’s. To keep track of the fact that n a’s have been seen, and that n b’s are owed, a machine will need n states, arrayed as in Figure 14. Assuming S0 is the only final state, this machine accepts any string of a’s followed by exactly the same number of b’s, as long as there are n or fewer of them. Using loops, it will also accept many strings with n+1 or more a’s and exactly the same number of b’s, but only by interleaving a’s and b’s. (The limitation is that it cannot accept strings with more than n consecutive a’s.)

To accept any string with n+1 a’s and still keep track of the counting dependency to enforce the requirement of exactly n+1 following b’s, n+1 states are required. To accept a^n b^n for any n, an infinite number of states would be required. In the case of the machine of Figure 13, congruence mod 3 was at issue, requiring only 3 pieces of information per letter; in the case of Figure 14, equality is at issue.
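The behaviour described for the machine of Figure 14 can be simulated with a bounded counter. The sketch below rests on an assumed reading of the figure: states S0 to Sn in a row, each a moving one state to the right, each b moving one state to the left, and S0 the only final state. The name `make_bounded_counter` is illustrative.

```python
def make_bounded_counter(n):
    """Simulate a chain of states S0..Sn: 'a' steps right, 'b' steps left.
    The state index records how many b's are currently owed."""
    def accepts(s):
        state = 0                         # S0: nothing owed
        for ch in s:
            if ch == "a" and state < n:
                state += 1                # one more b is owed
            elif ch == "b" and state > 0:
                state -= 1                # one owed b is paid off
            else:
                return False              # no such transition: reject
        return state == 0                 # S0 is the only final state
    return accepts

m3 = make_bounded_counter(3)
print(m3("aaabbb"))      # True: 3 a's then 3 b's fits in S0..S3
print(m3("aaaabbbb"))    # False: a 4th consecutive a has nowhere to go
print(m3("aabbaabb"))    # True: 4 a's and 4 b's, but only by interleaving
```

However large n is chosen, the string with n+1 a’s followed by n+1 b’s is rejected, which is the point of the argument: no fixed, finite chain suffices for every n.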


6 The Pumping Lemma

Theorem (The Pumping Lemma): If L is an infinite finite automaton language over alphabet Σ, then there are strings x, y, z in Σ∗, y non-empty, such that x y^n z ∈ L for all n ≥ 0.

Example: a(ba)∗c is a regular language. Pumping string:

1. x = a
2. y = ba
3. z = c

Sketch of proof: We have an infinite language but a finite number of states. Let m be the number of states and consider a string s in L of length at least m; such a string exists because L is infinite. In admitting s, some state Si must have been visited twice. Let s = xyz, where y is the substring admitted by the state sequence connecting Si to Si. Then for any n ≥ 0, x y^n z is admitted by the machine.

The theorem is often used in its contrapositive form:

Contrapositive Pumping Lemma: If there are no strings x, y, z in Σ∗, y non-empty, such that x y^n z is in L for all n ≥ 0, then L is not an infinite finite automaton language.

Example: It can be shown, using the Pumping Lemma, that a^n b^n is not a finite automaton language: it is infinite, so it is enough to show that it has no pumping string. Proof by enumeration of cases. Since y cannot be empty, there are 3 possibilities for a pumping string for a^n b^n:

1. y consists entirely of a’s
2. y consists of some non-empty sequence of a’s followed by a non-empty sequence of b’s
3. y consists entirely of b’s

In each of these cases, using y as the repeating part of a pumping string generates strings that are not in the language:

1. y consists entirely of a’s: if xyz is in the language, x y^2 z is not, because x y^2 z has more a’s than b’s.
2. y consists of some non-empty sequence of a’s followed by a non-empty sequence of b’s: if xyz is in the language, x y^2 z is not, because x y^2 z has a’s following b’s.
3. y consists entirely of b’s: if xyz is in the language, x y^2 z is not, because x y^2 z has more b’s than a’s.
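Both halves of this discussion can be illustrated by brute force. The sketch below first pumps the string for a(ba)∗c, then tries every split xyz of a few strings in a^n b^n and reports the splits that survive pumping. It only examines short strings and small exponents, so it illustrates the case analysis rather than proving it; the helper names are illustrative.

```python
import re

def pump(x, y, z, n):
    """Return the pumped string x y^n z."""
    return x + y * n + z

# Positive example: x = a, y = ba, z = c stays inside a(ba)*c for every n tried.
REGULAR = re.compile(r"a(ba)*c")
print(all(REGULAR.fullmatch(pump("a", "ba", "c", n)) for n in range(10)))  # True

def is_anbn(s):
    """True if s is a^n b^n for some n >= 0."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

def pumpable_splits(s, max_n=3):
    """All splits s = xyz, y non-empty, with x y^n z in a^n b^n for n = 0..max_n."""
    good = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            x, y, z = s[:i], s[i:j], s[j:]
            if all(is_anbn(pump(x, y, z, n)) for n in range(max_n + 1)):
                good.append((x, y, z))
    return good

# Negative example: no split of a^k b^k survives pumping, for k = 1..5.
print([pumpable_splits("a" * k + "b" * k) for k in range(1, 6)])  # [[], [], [], [], []]
```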

7 Applications to English

(1) a. The cat died.
    b. The cat the dog chased died.
    c. The cat the dog the rat bit chased died.
    d. The cat the dog the rat the elephant admired bit chased died.

(2) a. A = {the cat, the dog, the rat, the kangaroo, ...}
    b. B = {chased, bit, admired, befriended, ...}
    c. CE = x^n y^(n−1), where x ∈ A, y ∈ B

The language CE in (2c) is not a regular language (exercise). This alone does not prove English is not a regular language. Why not?

In general, if a subset of L is not regular, that does not mean L is not regular:

a^n b^n ⊂ a∗b∗

Consider the regular language:

AB = x∗y∗, where x ∈ A, y ∈ B

Claim A: English ∩ AB = CE

From Claim A it follows that English is not a regular language: If English were a regular language, then its intersection with the regular language AB would have to be a regular language (closure under intersection). Since CE is provably NOT regular, English is not regular.

Consider what Claim A entails:

(3) a. * The cat the dog died.
    b. * The cat chased died.
    c. * The cat the dog bit chased died.
    d. * The cat the dog the rat chased died.

These strings are in AB but not in CE, so if these strings were in English, the intersection of AB and English would NOT be CE. So, a key part of the proof is not just that the strings in (1) are English, but that the strings in (3) are not. That is, the counting dependencies between the parts of the strings are enforced.

Other relevant English constructions:

Dependency        Type                      Unsuitable
Either ... or     Conjunction               * Either ... then
If ... then       Complex clause            * If ... or
Anyone ... is     Subject-Verb agreement    * Anyone ... are

Such dependencies can be nested arbitrarily deeply. Recall that regular languages can’t capture arbitrary palindromes:

abbabbabba

(Note: The proof that xx^R is not regular uses the Pumping Lemma indirectly. Intersecting xx^R with the regular language aa∗bbaa∗ gives {a^n b^2 a^n | n ≥ 1}. It is easy to show that THIS is not regular via the Pumping Lemma.)
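The intersection step in the palindrome argument can be checked by enumeration. The sketch below lists every string over {a, b} up to an assumed length bound that is both of the form xx^R (an even-length palindrome) and in aa∗bbaa∗; the names `FILTER`, `is_even_palindrome`, and `intersection` are illustrative.

```python
import re
from itertools import product

FILTER = re.compile(r"aa*bbaa*")   # the regular language aa*bbaa*

def is_even_palindrome(s):
    """True if s has the form x followed by the reverse of x."""
    return len(s) % 2 == 0 and s == s[::-1]

def intersection(max_len):
    """Strings over {a, b} of length <= max_len in both languages."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if is_even_palindrome(s) and FILTER.fullmatch(s):
                found.append(s)
    return found

print(intersection(8))   # ['abba', 'aabbaa', 'aaabbaaa']
```

Every string found has equally many a’s on each side of bb, which is exactly the counting dependency that the Pumping Lemma rules out for regular languages.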

Chomsky and Miller (1963) give an example showing that English pairing dependencies can nest arbitrarily deeply as well:

(4) Anyone₁ who feels that if₂ so-many₃ more₄ students₅ whom we₆ haven’t₆ actually admitted are₅ sitting in on the course than₄ ones we have that₃ the room had to be changed, then₂ probably auditors will have to be excluded, is₁ likely to agree that the curriculum needs revision.