Transforming a grammar for LL(1) parsing

Author: Randall Nelson

Ambiguous grammars are not LL(1), but unambiguous grammars are not necessarily LL(1) either.

Having a non-LL(1) unambiguous grammar for a language does not mean that this language is not LL(1). But there are languages for which there exist unambiguous context-free grammars but no LL(1) grammar.

We will see two grammar transformations that improve the chance of getting an LL(1) grammar:

- Elimination of left-recursion
- Left-factorization

Syntax analysis

Left-recursion

The following expression grammar is unambiguous but it is not LL(1):

    Exp  → Exp + Exp2
    Exp  → Exp - Exp2
    Exp  → Exp2
    Exp2 → Exp2 * Exp3
    Exp2 → Exp2 / Exp3
    Exp2 → Exp3
    Exp3 → num
    Exp3 → (Exp)

Indeed, First(α) is the same for all RHS α of the productions for Exp and Exp2. This is a consequence of left-recursion.

Left-recursion

Recursive productions are productions defined in terms of themselves. Examples: A → Ab or A → bA.

When the recursive nonterminal is at the left (resp. right) end of the right-hand side, the production is said to be left-recursive (resp. right-recursive).

Left-recursive productions can be rewritten with right-recursive productions. Example:

    N → N α1
      ...
    N → N αm
    N → β1
      ...
    N → βn

is equivalent to

    N  → β1 N'
      ...
    N  → βn N'
    N' → α1 N'
      ...
    N' → αm N'
    N' → ε
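The rewriting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a right-hand side is a list of symbols, the empty list stands for ε, and the function name `eliminate_left_recursion` and the convention of naming the new nonterminal by appending `'` are hypothetical choices, not part of the slides. It handles immediate left recursion only.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Rewrite N -> N a1 | ... | N am | b1 | ... | bn without left recursion.

    `productions` is a list of right-hand sides (lists of symbols).
    Returns a dict mapping each nonterminal to its new right-hand sides.
    """
    new_nt = nonterminal + "'"  # assumed naming convention for N'
    # alphas: what follows N in the left-recursive alternatives
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # betas: the alternatives that do not start with N
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}  # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # [] is epsilon
    }


# The Exp productions of the expression grammar
exp = [["Exp", "+", "Exp2"], ["Exp", "-", "Exp2"], ["Exp2"]]
result = eliminate_left_recursion("Exp", exp)
```

Applied to the productions of Exp, the sketch yields Exp → Exp2 Exp' together with Exp' → + Exp2 Exp', Exp' → - Exp2 Exp' and Exp' → ε.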

Right-recursive expression grammar

The left-recursive grammar

    Exp  → Exp + Exp2
    Exp  → Exp - Exp2
    Exp  → Exp2
    Exp2 → Exp2 * Exp3
    Exp2 → Exp2 / Exp3
    Exp2 → Exp3
    Exp3 → num
    Exp3 → (Exp)

is equivalent to

    Exp   → Exp2 Exp'
    Exp'  → + Exp2 Exp'
    Exp'  → - Exp2 Exp'
    Exp'  → ε
    Exp2  → Exp3 Exp2'
    Exp2' → * Exp3 Exp2'
    Exp2' → / Exp3 Exp2'
    Exp2' → ε
    Exp3  → num
    Exp3  → (Exp)

Left-factorisation

The RHS of these two productions have the same First set:

    Stat → if Exp then Stat else Stat
    Stat → if Exp then Stat

The problem can be solved by left-factorising the grammar:

    Stat     → if Exp then Stat ElseStat
    ElseStat → else Stat
    ElseStat → ε

Note:

- The resulting grammar is ambiguous and the parsing table will contain two rules for M[ElseStat, else] (because else ∈ Follow(ElseStat) and else ∈ First(else Stat)).
- Ambiguity can be solved in this case by letting M[ElseStat, else] = {ElseStat → else Stat}.
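As a rough illustration of the factoring step on the Stat productions, here is a minimal Python sketch. It is deliberately simplified: it only factors the prefix shared by all the alternatives it is given, whereas a full left-factoring pass would group alternatives by common prefix and repeat. The function name `left_factor` and the list-of-symbols representation are assumptions made for this example.

```python
def left_factor(nonterminal, productions, new_name):
    """Factor out the longest prefix common to ALL given right-hand sides.

    `productions` is a list of right-hand sides (lists of symbols);
    `new_name` is the name of the nonterminal that receives the tails.
    """
    prefix = []
    for symbols in zip(*productions):  # walk the alternatives in lockstep
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix or len(productions) < 2:
        return {nonterminal: productions}  # nothing to factor
    tails = [rhs[len(prefix):] for rhs in productions]  # [] is epsilon
    return {nonterminal: [prefix + [new_name]], new_name: tails}


stat = [
    ["if", "Exp", "then", "Stat", "else", "Stat"],
    ["if", "Exp", "then", "Stat"],
]
factored = left_factor("Stat", stat, "ElseStat")
```

The result contains Stat → if Exp then Stat ElseStat, ElseStat → else Stat and ElseStat → ε, which is the factored grammar shown on the slide.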

Hidden left-factors and hidden left recursion

Sometimes, left-factors or left recursion are hidden. Examples:

- The following grammar:

      A → da | acB
      B → abB | daA | Af

  has two overlapping productions: B → daA and B ⇒* daf.

- The following grammar:

      S → Tu | wx
      T → Sq | vvS

  has left recursion on T (T ⇒* Tuq).

Solution: expand the production rules by substitution to make left-recursion or left factors visible and then eliminate them.

Summary

Construction of an LL(1) parser from a context-free grammar:

- Eliminate ambiguity
- Eliminate left recursion
- Perform left factorization
- Add an extra start production S' → S$ to the grammar
- Calculate First for every production and Follow for every nonterminal
- Calculate the parsing table
- Check that the grammar is LL(1)
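The last three steps (First/Follow, table, LL(1) check) can be sketched as a fixed-point computation. The Python below is a minimal sketch, not a reference implementation: the grammar is assumed to be a list of (left-hand side, right-hand side) pairs with the end marker `$` already included through the extra start production, and the names `first_of`, `ll1_table` and `GRAMMAR_3_9` are hypothetical. The example grammar is the small grammar T' → T$, T → R | aTc, R → ε | bR used in the recursive-descent example of these slides.

```python
def first_of(seq, first, nullable):
    """Return (FIRST of the symbol sequence, whether the sequence is nullable)."""
    out = set()
    for sym in seq:
        out |= first.get(sym, {sym})  # a terminal is its own FIRST set
        if sym not in nullable:
            return out, False
    return out, True


def ll1_table(grammar):
    """Map (nonterminal, terminal) to the list of applicable production indices."""
    nts = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {n: set() for n in nts}
    follow = {n: set() for n in nts}
    changed = True
    while changed:  # iterate until nothing grows any more
        changed = False
        for lhs, rhs in grammar:
            f, eps = first_of(rhs, first, nullable)
            if eps and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nts:  # FOLLOW(sym) gets FIRST of what comes after it
                    rest, rest_eps = first_of(rhs[i + 1:], first, nullable)
                    new = rest | (follow[lhs] if rest_eps else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    table = {}
    for idx, (lhs, rhs) in enumerate(grammar):
        f, eps = first_of(rhs, first, nullable)
        for terminal in f | (follow[lhs] if eps else set()):
            table.setdefault((lhs, terminal), []).append(idx)
    return table


GRAMMAR_3_9 = [
    ("T'", ("T", "$")),     # 0
    ("T", ("R",)),          # 1
    ("T", ("a", "T", "c")), # 2
    ("R", ()),              # 3: R -> epsilon
    ("R", ("b", "R")),      # 4
]
table = ll1_table(GRAMMAR_3_9)
is_ll1 = all(len(entries) == 1 for entries in table.values())
```

With this representation, the grammar is LL(1) exactly when no table cell holds more than one production, which is what `is_ll1` checks.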

Recursive implementation

From the parsing table, it is easy to implement a predictive parser recursively (with one function per nonterminal).

Grammar:

    T' → T$
    T  → R
    T  → aTc
    R  → ε
    R  → bR

Parsing table:

          a          b          c          $
    T'    T' → T$    T' → T$               T' → T$
    T     T → aTc    T → R      T → R      T → R
    R                R → bR     R → ε      R → ε

Parser:

    function parseT'() =
      if next = 'a' or next = 'b' or next = '$' then
        parseT() ; match('$')
      else reportError()

    function parseT() =
      if next = 'b' or next = 'c' or next = '$' then
        parseR()
      else if next = 'a' then
        match('a') ; parseT() ; match('c')
      else reportError()

    function parseR() =
      if next = 'c' or next = '$' then
        (* do nothing *)
      else if next = 'b' then
        match('b') ; parseR()
      else reportError()

Figure 3.16: Recursive descent parser for grammar 3.9 (Mogensen)

For parseR, we must choose the empty production on symbols in FOLLOW(R) (c or $). The production R → bR is chosen on input b. Again, all other symbols produce an error.

The function match takes as argument a symbol, which it tests for equality with the next input symbol. If they are equal, the following symbol is read into the variable next. We assume next is initialised to the first input symbol before parseT' is called.

The program in figure 3.16 only checks if the input is valid. It can easily be extended to construct a syntax tree by letting the parse functions return the sub-trees for the parts of input that they parse.

3.12.2 Table-driven LL(1) parsing

In table-driven LL(1) parsing, we encode the selection of productions into a table instead of in the program text. A simple non-recursive program uses this table and a stack to perform the parsing.

The table is cross-indexed by nonterminal and terminal and contains for each such pair the production (if any) that is chosen for that nonterminal when that terminal is the next input symbol. This decision is made just as for recursive descent ...
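The recursive-descent parser for the grammar T' → T$, T → R | aTc, R → ε | bR translates almost line by line into a real language. The following Python version is a sketch under a few assumptions: the input is a string of single-character tokens, the end marker `$` is appended automatically, and the class and method names (`Parser`, `parse_t_prime`, `accepts`, `ParseError`) are chosen for this example only.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls reportError()."""


class Parser:
    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # end marker appended to the input
        self.pos = 0

    @property
    def next(self):
        return self.tokens[self.pos]

    def match(self, symbol):
        # Test the next input symbol and, if it matches, move to the following one
        if self.next != symbol:
            raise ParseError(f"expected {symbol!r}, found {self.next!r}")
        self.pos += 1

    def parse_t_prime(self):            # T' -> T $
        if self.next in ("a", "b", "$"):
            self.parse_t()
            self.match("$")
        else:
            raise ParseError(f"unexpected {self.next!r}")

    def parse_t(self):
        if self.next in ("b", "c", "$"):  # T -> R
            self.parse_r()
        elif self.next == "a":            # T -> a T c
            self.match("a")
            self.parse_t()
            self.match("c")
        else:
            raise ParseError(f"unexpected {self.next!r}")

    def parse_r(self):
        if self.next in ("c", "$"):       # R -> epsilon, chosen on FOLLOW(R)
            pass
        elif self.next == "b":            # R -> b R
            self.match("b")
            self.parse_r()
        else:
            raise ParseError(f"unexpected {self.next!r}")


def accepts(text):
    """Return True if `text` is derivable from T, False otherwise."""
    parser = Parser(text)
    try:
        parser.parse_t_prime()
    except ParseError:
        return False
    return parser.pos == len(parser.tokens)  # all input, including $, consumed
```

For instance, `accepts("aabcc")` succeeds while `accepts("acc")` fails on the unmatched trailing c.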

Outline

1. Introduction
2. Context-free grammar
3. Top-down parsing
4. Bottom-up parsing
   - Shift/reduce parsing
   - LR parsers
   - Operator precedence parsing
   - Using ambiguous grammars
5. Conclusion and some practical considerations

Bottom-up parsing

- A bottom-up parser creates the parse tree starting from the leaves towards the root
- It tries to convert the program into the start symbol
- Most common form of bottom-up parsing: shift-reduce parsing

Bottom-up parsing: example

Bottom-up parsing of int + (int + int + int)

Grammar:

    S → E$
    E → T
    E → E + T
    T → int
    T → (E)

[Figure: "One View of a Bottom-Up Parse", the parse tree of int + (int + int + int)$ built from the leaves up to S (Keith Schwarz)]

Bottom-up parsing: example

Bottom-up parsing of int + (int + int + int):

Grammar:

    S → E$
    E → T
    E → E + T
    T → int
    T → (E)

Successive reductions:

    int + (int + int + int)$
    T + (int + int + int)$
    E + (int + int + int)$
    E + (T + int + int)$
    E + (E + int + int)$
    E + (E + T + int)$
    E + (E + int)$
    E + (E + T)$
    E + (E)$
    E + T$
    E$
    S

Bottom-up parsing is often done as a rightmost derivation in reverse (there is only one if the grammar is unambiguous).

Terminology

A rightmost (canonical) derivation is a derivation where the rightmost nonterminal is replaced at each step. A rightmost derivation from α to β is noted α ⇒*rm β.

A reduction transforms uwv to uAv if A → w is a production.

α is a right sentential form if S ⇒*rm α with α = βx where x is a string of terminals.

A handle of a right sentential form γ (= αβw) is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:

    S ⇒*rm αAw ⇒rm αβw

- Informally, a handle is a production we can reverse without getting stuck.
- If the handle is A → β, we will also call β the handle.

Handle: example

Bottom-up parsing of int + (int + int + int)

Grammar:

    S → E$
    E → T
    E → E + T
    T → int
    T → (E)

The handle is shown in square brackets (in red on the original slide) in each right sentential form:

    [int] + (int + int + int)$
    [T] + (int + int + int)$
    E + ([int] + int + int)$
    E + ([T] + int + int)$
    E + (E + [int] + int)$
    E + ([E + T] + int)$
    E + (E + [int])$
    E + ([E + T])$
    E + [(E)]$
    [E + T]$
    [E$]
    S

Finding the handles

- Bottom-up parsing = finding the handle in the right sentential form obtained at each step
- This handle is unique provided the grammar is unambiguous (because in this case, the rightmost derivation is unique)
- Suppose that our current form is uvw and the handle is A → v (getting uAw after reduction). Then w cannot contain any nonterminals (otherwise we would have reduced a handle somewhere in w)

Shift/reduce parsing

Proposed model for a bottom-up parser:

- Split the input into two parts:
  - Left substring is our work area
  - Right substring is the input we have not yet processed
- All handles are reduced in the left substring
- Right substring consists only of terminals
- At each point, decide whether to:
  - Move a terminal across the split (shift)
  - Reduce a handle (reduce)

Shift/reduce parsing: example

Grammar:

    E → E + T | T
    T → T * F | F
    F → ( E ) | id

Bottom-up parsing of id + id * id:

    Left substring    Right substring    Action
    $                 id + id * id$      Shift
    $id               + id * id$         Reduce by F → id
    $F                + id * id$         Reduce by T → F
    $T                + id * id$         Reduce by E → T
    $E                + id * id$         Shift
    $E +              id * id$           Shift
    $E + id           * id$              Reduce by F → id
    $E + F            * id$              Reduce by T → F
    $E + T            * id$              Shift
    $E + T *          id$                Shift
    $E + T * id       $                  Reduce by F → id
    $E + T * F        $                  Reduce by T → T * F
    $E + T            $                  Reduce by E → E + T
    $E                $                  Accept

Shift/reduce parsing

In the previous example, all the handles were to the far right end of the left area (not inside).

This is convenient because we then never need to shift from the left to the right and thus could process the input from left-to-right in one pass.

Is it the case for all grammars? Yes!

Sketch of proof: by induction on the number of reduces

- After no reduce, the first reduction can be done at the right end of the left area
- After at least one reduce, the very right of the left area is a nonterminal (by induction hypothesis). This nonterminal must be part of the next reduction, since we are tracing a rightmost derivation backwards.

Shift/reduce parsing

Consequence: the left area can be represented by a stack (as all activities happen at its far right)

Four possible actions of a shift-reduce parser:

1. Shift: push the next terminal onto the stack
2. Reduce: replace the handle on the stack by the nonterminal
3. Accept: parsing is successfully completed
4. Error: discover a syntax error and call an error recovery routine

Shift/reduce parsing

There still remain two open questions. At each step:

- How to choose between shift and reduce?
- If the decision is to reduce, which rule to choose (i.e., what is the handle)?

Ideally, we would like this choice to be deterministic given the stack and the next k input symbols (to avoid backtracking), with k typically small (to make parsing efficient).

As for top-down parsing, this is not possible for all grammars.

Possible conflicts:

- shift/reduce conflict: it is not possible to decide between shifting and reducing
- reduce/reduce conflict: the parser cannot decide which of several reductions to make

Shift/reduce parsing

We will see two main categories of shift-reduce parsers:

LR-parsers

- They cover a wide range of grammars
- Different variants from the most specific to the most general: SLR, LALR, LR

Weak precedence parsers

- They work only for a small class of grammars
- They are less efficient than LR-parsers
- They are simpler to implement


LR-parsers

LR(k) parsing: Left-to-right, Rightmost derivation, k symbols lookahead.

Advantages:

- The most general non-backtracking shift-reduce parsing, yet as efficient as other less general techniques
- Can detect syntactic errors as soon as possible (on a left-to-right scan of the input)
- Can recognize virtually all programming language constructs (that can be represented by context-free grammars)
- The class of grammars that can be parsed by LR parsers is a proper superset of the class of grammars that can be parsed by predictive parsers (LL(k) ⊂ LR(k))

Drawbacks:

- More complex to implement than predictive (or operator precedence) parsers

Like table-driven predictive parsing, LR parsing is based on a parsing table.

Structure of an LR parser

[Figure: structure of an LR parser. The input a1 ... ai ... an $ and a stack S0 X1 S1 ... Xm-1 Sm-1 Xm Sm feed the LR parsing algorithm, which produces the output using two tables: an action table (rows: states; columns: terminals and $; each entry is one of four different actions) and a goto table (rows: states; columns: nonterminals; each entry is a state number).]

Structure of an LR parser

A configuration of an LR parser is described by the status of its stack and the part of the input not analysed (shifted) yet:

    (s0 X1 s1 ... Xm sm, ai ai+1 ... an $)

where Xi are (terminal or nonterminal) symbols, ai are terminal symbols, and si are state numbers (of a DFA).

A configuration corresponds to the right sentential form X1 ... Xm ai ... an.

Analysis is based on two tables:

- an action table that associates an action ACTION[s, a] to each state s and terminal a.
- a goto table that gives the next state GOTO[s, A] from state s after a reduction to a nonterminal A.

Actions of an LR-parser

Let us assume the parser is in configuration

    (s0 X1 s1 ... Xm sm, ai ai+1 ... an $)

(initially, the configuration is (s0, a1 a2 ... an $), where a1 ... an is the input word).

ACTION[sm, ai] can take four values:

1. Shift s: shifts the next input symbol and then the state s on the stack

       (s0 X1 s1 ... Xm sm, ai ai+1 ... an $) → (s0 X1 s1 ... Xm sm ai s, ai+1 ... an $)

2. Reduce A → β (denoted by rn where n is a production number)
   - Pop 2|β| items from the stack (let r = |β|)
   - Push A and s where s = GOTO[sm-r, A]

         (s0 X1 s1 ... Xm sm, ai ai+1 ... an $) → (s0 X1 s1 ... Xm-r sm-r A s, ai ai+1 ... an $)

   - Output the prediction A → β
3. Accept: parsing is successfully completed
4. Error: parser detected an error (typically an empty entry in the action table).

LR-parsing algorithm

    Create a stack with the start state s0
    a = getnexttoken()
    while (True)
        s = state on top of the stack
        if (ACTION[s, a] = shift t)
            Push a and t onto the stack
            a = getnexttoken()
        elseif (ACTION[s, a] = reduce A → β)
            Pop 2|β| elements off the stack
            Let state t now be the state on the top of the stack
            Push A onto the stack
            Push GOTO[t, A] onto the stack
            Output A → β
        elseif (ACTION[s, a] = accept)
            break // Parsing is over
        else
            call error-recovery routine
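To make the driver loop concrete, here is a small Python sketch of it, using the SLR table of the expression grammar (productions numbered 1. E → E+T, 2. E → T, 3. T → T*F, 4. T → F, 5. F → (E), 6. F → id). The encoding is an assumption of this example: actions are strings such as `"s5"`, `"r2"` and `"acc"`, the input is a list of token names, and errors raise `ValueError` instead of calling a recovery routine. The function name `lr_parse` is hypothetical.

```python
# Production number -> (left-hand side, length of the right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def reduce_row(n):
    """Row of a state that reduces by production n on +, *, ) and $."""
    return {symbol: f"r{n}" for symbol in ("+", "*", ")", "$")}


ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: reduce_row(4),
    4: {"id": "s5", "(": "s4"},
    5: reduce_row(6),
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: reduce_row(3),
    11: reduce_row(5),
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the list of production numbers used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]           # alternates states and symbols, starting with s0
    pos = 0
    output = []
    while True:
        state = stack[-1]                      # peek, do not pop
        action = ACTION[state].get(tokens[pos])
        if action is None:                     # empty table entry
            raise ValueError(f"syntax error at token {pos}: {tokens[pos]!r}")
        if action == "acc":
            return output
        if action[0] == "s":                   # shift: push symbol and state
            stack += [tokens[pos], int(action[1:])]
            pos += 1
        else:                                  # reduce by production n
            n = int(action[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[len(stack) - 2 * length:]    # pop 2|beta| items
            stack += [lhs, GOTO[stack[-1]][lhs]]   # push A and GOTO[t, A]
            output.append(n)


derivation = lr_parse(["id", "*", "id", "+", "id"])
```

On the input id * id + id, the productions are output in the order 6, 4, 6, 3, 2, 6, 4, 1, i.e. a rightmost derivation in reverse.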

Example: parsing table for the expression grammar (SLR)

Grammar:

    1. E → E + T
    2. E → T
    3. T → T * F
    4. T → F
    5. F → (E)
    6. F → id

Action table (columns id to $) and goto table (columns E to F):

    state   id    +     *     (     )     $       E    T    F
    0       s5                s4                  1    2    3
    1             s6                      acc
    2             r2    s7          r2    r2
    3             r4    r4          r4    r4
    4       s5                s4                  8    2    3
    5             r6    r6          r6    r6
    6       s5                s4                       9    3
    7       s5                s4                            10
    8             s6                s11
    9             r1    s7          r1    r1
    10            r3    r3          r3    r3
    11            r5    r5          r5    r5

Example: LR parsing with the expression grammar

    stack           input        action               output
    0               id*id+id$    shift 5
    0id5            *id+id$      reduce by F→id       F→id
    0F3             *id+id$      reduce by T→F        T→F
    0T2             *id+id$      shift 7
    0T2*7           id+id$       shift 5
    0T2*7id5        +id$         reduce by F→id       F→id
    0T2*7F10        +id$         reduce by T→T*F      T→T*F
    0T2             +id$         reduce by E→T        E→T
    0E1             +id$         shift 6
    0E1+6           id$          shift 5
    0E1+6id5        $            reduce by F→id       F→id
    0E1+6F3         $            reduce by T→F        T→F
    0E1+6T9         $            reduce by E→E+T      E→E+T
    0E1             $            accept

Constructing the parsing tables

There are several ways of building the parsing tables, among which:

- LR(0): no lookahead, works for only very few grammars
- SLR: the simplest one with one symbol of lookahead. Works with fewer grammars than the next ones
- LR(1): very powerful but generates potentially very large tables
- LALR(1): tradeoff between the other approaches in terms of power and simplicity
- LR(k), k > 1: exploit more lookahead symbols

Main idea of all methods: build a DFA whose states keep track of where we are in the parsing

Parser generators

LALR(1) is used in most parser generators like Yacc/Bison.

We will nevertheless only see SLR in detail:

- It's simpler.
- LALR(1) is only slightly more expressive.
- When a grammar is SLR, then the tables produced by SLR are identical to the ones produced by LALR(1).
- Understanding of SLR principles is sufficient to understand how to handle a grammar rejected by LALR(1) parser generators (see later).

LR(0) item

An LR(0) item (or item for short) of a grammar G is a production of G with a dot at some position of the body.

Example: A → XYZ yields four items:

    A → .XYZ
    A → X.YZ
    A → XY.Z
    A → XYZ.

(A → ε generates one item A → .)

An item indicates how much of a production we have seen at a given point in the parsing process.

- A → X.YZ means we have just seen on the input a string derivable from X (and we hope to get next YZ).

Each state of the SLR parser will correspond to a set of LR(0) items.

A particular collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.

Construction of the canonical LR(0) collection

The grammar G is first augmented into a grammar G' with a new start symbol S' and a production S' → S where S is the start symbol of G.

We need to define two functions:

- Closure(I): extends the set of items I when some of them have a dot to the left of a nonterminal
- Goto(I, X): moves the dot past the symbol X in all items in I

These two functions will help define a DFA:

- whose states are (closed) sets of items
- whose transitions (on terminal and nonterminal symbols) are defined by the Goto function

Closure

    Closure(I)
        repeat
            for any item A → α.Xβ in I
                for any production X → γ
                    I = I ∪ {X → .γ}
        until I does not change
        return I

Example:

    E' → E
    E  → E + T
    E  → T
    T  → T * F
    T  → F
    F  → (E)
    F  → id

    Closure({E' → .E}) = {E' → .E,
                          E → .E + T,
                          E → .T,
                          T → .T * F,
                          T → .F,
                          F → .(E),
                          F → .id}
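The closure computation can be sketched in Python with a worklist instead of the repeat/until loop; both reach the same fixed point. The representation is an assumption of this sketch: a grammar is a dict from nonterminal to a list of right-hand-side tuples, and an item is a triple (left-hand side, right-hand side, dot position). The names `GRAMMAR` and `closure` are hypothetical.

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    """Close a set of LR(0) items (lhs, rhs, dot) under the Closure rule."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        # Dot in front of a nonterminal X: add X -> .gamma for every production of X
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


i0 = closure({("E'", ("E",), 0)}, GRAMMAR)
```

Here `i0` contains the seven items listed in the example above. Returning a `frozenset` is a design choice that lets item sets be compared and stored as DFA states later on.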

Goto

    Goto(I, X)
        Set J to the empty set
        for any item A → α.Xβ in I
            J = J ∪ {A → αX.β}
        return closure(J)

Example:

    E' → E
    E  → E + T
    E  → T
    T  → T * F
    T  → F
    F  → (E)
    F  → id

    I0 = {E' → .E,
          E → .E + T,
          E → .T,
          T → .T * F,
          T → .F,
          F → .(E),
          F → .id}

    goto(I0, E)   = {E' → E., E → E. + T}
    goto(I0, T)   = {E → T., T → T. * F}
    goto(I0, F)   = {T → F.}
    goto(I0, '(') = Closure({F → (.E)}) = {F → (.E)} ∪ (I0 \ {E' → .E})
    goto(I0, id)  = {F → id.}
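A possible Python rendering of Goto is shown below, as a sketch only. It uses the same assumed representation as before: items are triples (left-hand side, right-hand side, dot position) and the grammar is a dict of right-hand-side tuples. A compact `closure` is repeated so that the block runs on its own; all names are hypothetical.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    """Close a set of LR(0) items (lhs, rhs, dot)."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol, grammar):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved, grammar)


i0 = closure({("E'", ("E",), 0)}, GRAMMAR)
after_e = goto(i0, "E", GRAMMAR)       # {E' -> E., E -> E.+T}
after_paren = goto(i0, "(", GRAMMAR)   # F -> (.E) plus the closure items
```

If no item of I has the dot in front of X, the sketch returns the empty set, which can be read as "no transition on X".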

Construction of the canonical collection

    C = {closure({S' → .S})}
    repeat
        for each item set I in C
            for each item A → α.Xβ in I
                C = C ∪ {Goto(I, X)}
    until C did not change in this iteration
    return C

- Collect all sets of items reachable from the initial state by one or several applications of goto.
- Item sets in C are the states of a DFA, goto is its transition function.
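The construction of C can be sketched as a traversal that keeps discovering new item sets until none are left. The Python below is a minimal sketch under the same assumptions as the previous examples (items as triples, grammar as a dict of tuples); `closure` and `goto` are repeated so the block is self-contained, and `canonical_collection` is a hypothetical name. The state numbers it assigns depend on the exploration order and need not match the numbering I0 to I11 used in the slides.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol, grammar):
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)


def canonical_collection(grammar, start):
    """Return (list of item sets, transitions {(state index, symbol): state index})."""
    states = [closure({(start, grammar[start][0], 0)}, grammar)]
    transitions = {}
    for state in states:  # the list grows while we iterate over it
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):  # sorted only to make numbering reproducible
            target = goto(state, symbol, grammar)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection(GRAMMAR, "E'")
```

For the augmented expression grammar this yields 12 item sets, one per state of the SLR parsing table shown earlier.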

Example

Canonical LR(0) collection for the augmented expression grammar:

    I0:  E' → .E, E → .E + T, E → .T, T → .T * F, T → .F, F → .(E), F → .id
    I1:  E' → E., E → E. + T
    I2:  E → T., T → T. * F
    I3:  T → F.
    I4:  F → (.E), E → .E + T, E → .T, T → .T * F, T → .F, F → .(E), F → .id
    I5:  F → id.
    I6:  E → E + .T, T → .T * F, T → .F, F → .(E), F → .id
    I7:  T → T * .F, F → .(E), F → .id
    I8:  F → (E.), E → E. + T
    I9:  E → E + T., T → T. * F
    I10: T → T * F.
    I11: F → (E).

Constructing the LR(0) parsing table

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G' (the augmented grammar)
2. State i of the parser is derived from Ii. Actions for state i are as follows:
   2.1 If A → α.aβ is in Ii and goto(Ii, a) = Ij, then ACTION[i, a] = Shift j
   2.2 If A → α. is in Ii, then set ACTION[i, a] = Reduce A → α for all terminals a.
   2.3 If S' → S. is in Ii, then set ACTION[i, $] = Accept
3. If goto(Ii, X) = Ij, then GOTO[i, X] = j.
4. All entries not defined by rules (2) and (3) are made "error"
5. The initial state s0 is the set of items containing S' → .S

⇒ LR(0) because the chosen action (shift or reduce) only depends on the current state (but the choice of the next state still depends on the token)
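The five rules can be turned into a short table builder that also reports conflicts. The Python below is a sketch, not a reference implementation, and rests on several assumptions: items are triples (left-hand side, right-hand side, dot position), each table cell holds a set of actions so that conflicts show up as cells with more than one action, rule 2.2 is not applied to the item S' → S. (which only yields Accept), and state numbers depend on iteration order. All function and variable names are hypothetical.

```python
def closure(items, grammar):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for production in grammar[rhs[dot]]:
                item = (rhs[dot], production, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol, grammar):
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)


def lr0_table(grammar, start):
    """Return (states, table); table maps (state, symbol) to a SET of actions."""
    terminals = {s for prods in grammar.values() for rhs in prods
                 for s in rhs if s not in grammar} | {"$"}
    states = [closure({(start, grammar[start][0], 0)}, grammar)]
    table = {}
    for i, state in enumerate(states):  # `states` grows during the loop
        for lhs, rhs, dot in state:
            if dot < len(rhs):          # rules 2.1 (shift) and 3 (goto)
                target = goto(state, rhs[dot], grammar)
                if target not in states:
                    states.append(target)
                kind = "goto" if rhs[dot] in grammar else "shift"
                table.setdefault((i, rhs[dot]), set()).add((kind, states.index(target)))
            elif lhs == start:          # rule 2.3
                table.setdefault((i, "$"), set()).add(("accept",))
            else:                       # rule 2.2: reduce on every terminal
                for a in terminals:
                    table.setdefault((i, a), set()).add(("reduce", lhs, rhs))
    return states, table


def conflicts(table):
    """Cells with more than one action; empty means the grammar is LR(0)."""
    return {cell: actions for cell, actions in table.items() if len(actions) > 1}


LIST_GRAMMAR = {"S'": [("S",)], "S": [("(", "L", ")"), ("x",)],
                "L": [("S",), ("L", ",", "S")]}
EXPR_GRAMMAR = {"E'": [("E",)], "E": [("E", "+", "T"), ("T",)],
                "T": [("T", "*", "F"), ("F",)], "F": [("(", "E", ")"), ("id",)]}

list_states, list_table = lr0_table(LIST_GRAMMAR, "S'")
expr_states, expr_table = lr0_table(EXPR_GRAMMAR, "E'")
```

With these inputs, the parenthesised-list grammar produces no conflicting cell, while the expression grammar does: for example, the state containing E → T. and T → T.*F gets both a shift and a reduce on `*`.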

Example of a LR(0) grammar

Grammar 3.20:

    0  S' → S$
    1  S  → (L)
    2  S  → x
    3  L  → S
    4  L  → L , S

Figure 3.21: LR(0) states for Grammar 3.20:

    1: S' → .S$, S → .x, S → .(L)
    2: S → x.
    3: S → (.L), L → .S, L → .L , S, S → .(L), S → .x
    4: S' → S.$
    5: S → (L.), L → L. , S
    6: S → (L).
    7: L → S.
    8: L → L , .S, S → .(L), S → .x
    9: L → L , S.

Table 3.22: LR(0) parsing table for Grammar 3.20 (sn: shift and go to state n; gn: goto state n; rk: reduce by rule k; a: accept):

    state   (     )     x     ,     $       S     L
    1       s3          s2                  g4
    2       r2    r2    r2    r2    r2
    3       s3          s2                  g7    g5
    4                               a
    5             s6          s8
    6       r1    r1    r1    r1    r1
    7       r3    r3    r3    r3    r3
    8       s3          s2                  g9
    9       r4    r4    r4    r4    r4

Rather than rescan the stack for each token, the parser can remember instead the state reached for each stack element. Then the parsing algorithm is:

    Look up top stack state, and input symbol, to get action;
    If action is
      Shift(n):  Advance input one token; push n on stack.
      Reduce(k): Pop stack as many times as the number of symbols on the right-hand side of rule k;
                 Let X be the left-hand-side symbol of rule k;
                 In the state now on top of stack, look up X to get "goto n";
                 Push n on top of stack.
      Accept:    Stop parsing, report success.
      Error:     Stop parsing, report failure.

(Appel)

Example of a non LR(0) grammar

[Figure: LR(0) item sets and SLR parsing table for the expression grammar]
anE$), 1 .:Parsing nE.! T .F ! for Ts5 ⇤ Fthe . r4 expres for Expression 9.F F .(E )T I0.2. E :a.E T I! :Tables E.! E + . 10 : T 9) T2 ! T ! T ! T .)! ⇤FFIs4 F .(E s5 1 I 3 .F T ! .F F ! .(E )E ! (s I : F ! (E ). I : F ! (.E 7 s5 s4 : E ! E + T . F ! .(E ) : E ! E + .T E ! E . + T 11 I : T ! T ⇤ F . 4 I : E ! E + T . 5) F → (E) 4 s5 s4 X s . . . X s , a a . . . a $) 2) E → T ! .(E ) F ! .(E ) I : E ! E + .T 9 T ! T . ⇤ F 6 F ! .(E ) E ! .T .ETn! TT 90 1 1 m m i i+1 F ) word) I9+! :T.FE! T . 6 : F ! (E ). TE ⇤ F+10 ! T ..! ⇤ Action F.T IF ! )E T .! ⇤ Table F.(E 5. F3. (E! 6 :!E F Fid .(E F ! .(E 1)3 :TF s6:F acc.T ! .F idE I60+ E Goto !!E E.T+ I! T ! .! F) !6).(EF )→ id 11 FT!! s11 FI6!: 1.. E idET FTable ! E ! .E , F⇤! 5. .T r6 ! E! .E + T 0 T ! +⇤3)T .T id ! .Tid.+ TT! T .I2⇤ :F EE F ! id .T I⇤10F: T ! . 0Istate T ! .T ⇤ F T ⇤. ! F .⇤! → T*F ! .T ).⇤ Fs6r6 r6 Action T T Fid idFT . ⇤I11F : F ! 89(E ! FT. ⇤ accept ! T ! FidT ACTION[s take values: T ! four .F :a(.E ! .r2 + T* ⇤ (! ) I $: E E !T F .T 213 ::: E s7 r2 id+ m , 0ai ] can 10 ! E ..! 4. T 6 .F s5state E! →I .E+T .F ids7 +s4 for *r1 E TF .F II! ! )E E .E + TE ! ET.E ! EF +! T .! state is (s:4) .1) . Fan⇤ $), where a9F .Ir2F .6F is the input 6 : IF E ! E .T Parsing T ! .T ⇤ F E + T .+ 9F:! EidI! 0, a 1 a→ 2 .F 1. .:⇤ n:T 6. idFF ! . id1) EI(SLR) I1 :(initially, E !2.the E . ! :I4 :TI.T T ! ⇤ F . : E+T ET! E + Tr1. Tables I : E ! E + T . I : ! E + T . I ! (E ). 10 T E ! T 9→ .F I T ! T F . 9 T ! .F T T ! .T ⇤ 11 9 T .T ⇤ F 10 0 s5 s4 1 2 3 F ! .(E ) 3 r4 r4 r4 r4 (E ). FI! ! (.E )! I:9.T :T⇤⇤TE ! E + T . 1. Shift s:word) shifts the next input symbol the state s :on the I ! T ⇤ F . 11 E.E .F+ T 4T 107 .(Es5) 0 r3 s5r3 s4 r3 I9 and : 2)E Ethen ! +! .EE I : T ! T ⇤ F . I)7 : T T .F →E T FT !. ⇤.(E F ! T ! . ⇤ F ! F .T 10 ! + T 10 I : E ! E + T . I : E ! E + .T T .T ⇤ F T ! T . ⇤ F E ! E . +F T!1..IE T ! F 5. F9 ! (E ) 2) E6 → T T ! T ..T F⇤+ ! T ..F ⇤! F⇤ E F T ! F T ! 1 T(E s6 acc F ! .(E ) I : E E ! E + T : F ! 
(E ). I F ! ). I : T ! F . Action T .F 5) F → (E) 4: s5 s4 8 .T 2 3 ! .(E ) 8 . id s6 s11 6 id ! E + .T 11 3 11 parsing table for the expression grammar E ! .E + T 6 T ! .F 11 r5 r5 r5 T I : E ! E + .T F ! stack Example: (s0 X1 s1 . .3. .X s , a a . . . a ) ! (s X s . . . X a s, a . . . a ) F ! . id T ! T . ⇤ F 0 1T 1→ T*F m i+1 ⇤ F n Im2! :i .F EE ! . In ⇤r2F F1011! Tm! 1 Ti+1 !T .T 6 i] T IT : F ! ⇤! F(E ! .T . ⇤! F+ 1. E. FT! E(E ! .F IT! :.F :T.(E F).ET ). 11 T! T)..3)T ⇤ FTI10→: T*F TT! F can take four 3) values:F T ! T ⇤r2 .F .r2 2 r6I r6 s7 r6 T! !9.T T ⇤⇤state Fr1. ids7 s6 ! id I2 :ACTION[s . mE, 0ai! :⇤10 T:I :T ! T ⇤) F II.T ! + + *r1 .T id)! 5 (.E r6..T 10 6 :: E ! .(E ) I E ! E + T ! ⇤ F 1) E → E+T T ! .F E ! .T I : F 6. F ! id : E . I : T ! ⇤ F F ! .(E 6) F → id 2. Reduce AE!! IT by rn where n is a production number) 6 T ! .T ⇤ F I : E ! E + T . 4 1(denoted : E ! E + T . I F ! (E ). 10 T . F 2. E ! T 9 .F 2 r2 s7 F ! . id I : T ! T ⇤ F . 4) T → F T ! .T ⇤ F 11 9 T ! .T ⇤ F .(E ).T 10 symbol and then the F ! .(E ) input 3 state r4 r4 TF ! I:11sr4FExample :on Fr4the ! (E ). I : ! (E ). 1. Shift s:.T⇤ shifts the next I : T ! T ⇤ F . 10.F r3 s5r3 r3 I : T ! T ⇤ F . I : T ⇤ .F 0 F ! .(E ) 0 11 T ! ⇤ F 10 4. ! F I ! (E ). 6 s5 s4 9 3 7 10 T ! T F 11 2. E ! T I6 → :F TE T ! .(E .T F (E T! ! ET + . ⇤.T F !E T :.Fstack I Pop 2| | (= r ) E F . 11 id+ I1Ftable :+E(E) E ! E .! 4) 2)T E → ! T)grammar ..T F: ⇤FF8 ! 2.(Eid3.) T !.F items from TT F ! )⇤F! T IE :5) ! (E F I! :1 .E T .F .(E → 4: TF s5.F! s4! id Ithe .T 11. .! 3). I⇤7 E Example: F !11.(E ) 31 r5 r5 s6r4 r4 T.mT+ ! r5 stack F (s0 X! .3. . id X s! ,6parsing ai a.F .FanFE )! (sF sfor .I..T .id Xthe . .! aIn6.) ). :! E ! + .T10 0 X! m ai s, aexpression 1 s1 . . m i+1 ⇤ i+1 . s4 F ! ! T 1. : (E EF! T !: .(E .FF)I! 71Tables s5 ).!E(E+).3)TF T→ 11 (SLR) Parsing for Grammar II3Push T I →(E) : TT! !.T T ⇤⇤ FF . : 2.TAReduce ! and sT =.)GOTO[s ,IA] E EF .! + IF T ! ⇤ F . 
E !I11 ..Fid))Expression 5!T r6I10F r6: T r6T.F 2 s:.Fwhere F id. F ! .(E 5. ! Iis! : .T (.E 5.T ! 6)r where F :→⇤E id AE! !!(E (denoted by nF! F ! .(E T)mrn . ⇤+ F3.) T ! T ⇤ 5) FT*F ! . .F id 4 s5 r2 s7 . .F id I.7)r6.T : T⇤ FF E T! F I10 F ! id.(E 84 a production s6: Enumber) s11.+ .(E F.+ .(E )! T ! I. 90⇤! ! E T E E .T : TF ! ! (E ).52 I : F ! id. F ! .(E ) . :.I.! XE ,+ a! . ⇤.! aT ! 11 4. T ! F I : F (E ). 6! s5 s4 ! 9 3 T . F m s) mT n) . i aE i+1 5 ! 11 I4(s:I06XF1:Is91! (.E T ! .T F 2. E ! T r6 r6 I : E ! T F ! . id I : E E . 4) T → F I : E ! E + .T E .E + T Action Table Goto Table Pop 2| | (=Fr )! items from the2.Fstack 91 6)Syntax FI9→ id T ! r4 r4 F! ! .(E ) IF7 E:!+ : analysis EF! +)T3 . . id F6Expression . :! id r1 s7 r1 ).(Eid.) analysis F .(E !10Syntax !E.(E Ir1 E.F .T 7Tables s4. T (s0 XE . .I6. .I.E X s! awhere ai+1 ..s.id . anGOTO[s )(SLR)TParsing T ! .T ⇤and Fid F.! ! .E*s5:id 1 s1! m rT m r As, i! ! T .! for T ⇤6+ F :F+ FF A s..F A] E .r3+E T TT !! F T ⇤ 5)F F →T(E) T! .F 64 s5 3Push EF Tr0,! T ⇤I9F.T (! ) TE $T E .T TGrammar F E ! E .4. I! id. s5 F T ! T F 5. ! (E+ )! 5 :id F=E+T )mstate 1) E→ r3s11! F3. F! !T. .id⇤ F IF.7⇤.T :! T⇤. Fidanalysis T ⇤+.F F! ! .+ id I Output the prediction .E ,+! 8 r3 s6: E r3 Syntax I E ! .I.(E T ! A ! I.69⇤:T E! T 0!: E 10 I6.F Em⇤ .T F .(E II25!::s5 ! id. . ::. .! XE ,+ ai aEE . .T aT T I.T ! m s) n) . i+1 .+ ! E .T E ! F1:Is991! (.E 75 . s5 r6 r6 TF ! FE s4 1E 2 )E3Table FT0! .(E EI. )10 ! TIr5 E ! Ir5+ :⇤ ! + .T ! T . F 4(s:0 X Action Table Goto 6 11 r5 .T r5T I : ! F 6) F → id Syntax analysis : T ! ⇤ F . I : E E + T F ! . id T ! .F I : T ! T ⇤ F 2) E → T 3 Syntax analysis 9 r1 s7 r1 r1 Syntax analysis F ! .(E ) 9 10 :F ! + T F! (EF) ! I! ! T ⇤ F⇤T .Fid (s .E . 6. . .E X s! . . . aE ) ! .E ! .T T . .⇤a⇤i aFi+1 0X 1! mF r! m r As, 3. Accept: parsing isT successfully completed EE! ! E T.F . T ⇤(EFF .) 5. 4. T ⇤.+ F ! idF I.F :. +⇤Ir3E9FT*:! +T ! .T ⇤.T F.I8T acc T! E1 sF + T T ! 6 s5 F1.I:! 
.(E )T s6 E + T T0s6 state id FnFT1! .(.E id ! .T T10 .T ⇤! T ! F ! (E T .). ⇤ 8F 1)! EIE E+T 10 r3 ( F) r3! r3.(E I11 : FT ! Syntax analysis F ESyntax ! . id ! )! E ! .E , action ).(E I9T(typically : TE 3).F the! prediction Iin :F TT! ! T). ⇤ET$T ..TEE1F 4→:+ 0 :entry :6T (E ! .+ Fanalysis E .I! .T FF 10 T 4. Error: parser detected an error empty the IIr56r2! :T ! + r2 s7 r2! s5 s4.T 2) 3 ::.FOutput FEIid ! )..T FTI011 ! .(E ! .T I! :E(E T !→ TAT*F ⇤2)!Syntax FIE5.an .! ⇤ TT:! ⇤r5I! 11 r5 r5 10 I! ! F F I. )10 ! T ⇤.E F+ ! .F : E.F E .E ! TI11 ! E +analysis .T6. → 3 :id. Syntax F! id(E ) I10 : T ! T ⇤978F . s5 s6r1 s7 : . T⇤analysis FEF2! 9T I2. + 6F :T I8.! :TidF (E .) 5. E1 r4 ! .E + F ! : T ! T ⇤ F . .E + T 6! T ! T EE.! T parsing is successfully completed F ! id 4) T → F 10 F ! . I : E ! E + .T T ! .T ⇤ F s6 acc T ! .T ⇤ F table). 3. Accept: F ! .(E ) 1. E ! E + T 3 r4 r4 r4 I : F ! (E ). 6 FF ! .(.E id )ITF! !: .TF⇤! F (E ). .(E )Syntax r3 r3 I11 : F ! (E ).10 .F analysis Syntax analysis Ian : empty !11 F). ! .(E 11 ! .(Eparser )TII! T → T*F :! T ! T ⇤analysis . E).T!⇤EF. + F :s7the Faction ! (E Tr2! ! . ⇤F! F T I94.F: Error: E ! + T . 5).⇤id E2 ! .TI11 10 TE F detected error (typically entry in ESyntax r2 s4 r28 E :.T ! .F I! :(E T3))..T ! T ⇤T F .4 : s5.T Syntax analysis r1 .T ⇤T! Fan → (E) 4 ! 3+ Syntax analysis 10 F ! .F :.(E E .! I9) :! E T .E + .T6. F ! id T11 ! : .ET E F I! .T! id F id. 119 r5 s7r5 9 T I2. :.F EF! 6F2+ 3. .! T ! ⇤EF E3! ! + 6! TE + F )IT ! id 4) T → F T I5! F table). ! idTT T !F.E .T ⇤IF FTr4! tax analysis r4 r4 F:Fr6! ! (E ). ⇤! T .! ⇤T F ! .(E .T ⇤ I:r610 T! ! Fanalysis ..F .F 11 10 r3 r3 r6 r6 r4 T78 .F :.T.id F⇤! (E ).F5 ! .(E Syntax analysis Syntax 11 .(ET ) I! 6) F→ T ! T . ⇤ F ) T ! .T ⇤ F I9 F: ! ESyntax ! E + T T T . ⇤ F F I : E ! E + T . E ! .T 9 F ! . id Syntax analysis analysis 88 r5 r5 s4 I! T4 !s5.F E) (E ! E32+.(E T3 .) I5 :I10F: ! . id 11 9F :.(E 4. T F.(E 3. T ⇤ 5) F F → (E)F s4 I11F:F! F98). ! T6 ! !s5..F ! ! Tid. !F! 
⇤ F.! .T)T!! FT ! id Syntax analysis Syntax analysis 78 id5 ! T . ⇤TF !E:T+ T ! .T r6⇤EFT Ir6.10 :F T ..F analysis .F 6) F → id r6I10 r6. TT! T⇤T F. .⇤⇤! F I.(E Syntax ! T10 FFSyntax I9 : F E! ! T .⇤ s4 ! 9 : )analysis . E! idT+ F F7 !s5.(E )! SyntaxSyntax analysisanalysis 88 Syntax analysis 80 ! id ! F I11 : 5. FI5! (E ). 4. T s5.F s4 T 3 .(E ) F9 .). ! T6. ! I11 !). (E F:IF10! (E )id :! T.id. !F! T!⇤F.(E F . ) I1 : E80 ! EF I10 T : T ! ⇤F:! F ..F⇤(E Syntax analysis . id I : F T ! T F s6 s11 ! T . ⇤ 11 I : T ! T ⇤ F . 10ISyntax F ! . Fid E ! T . !F E! analysis analysis Syntax analysis 7 !s5.(E )Syntax s4 10 :+Syntax Eanalysis +. T . Syntax 80 Syntax analysis idSyntax I9 : IE !analysis +(E T . . )id 9T I11 : I9FI:10 ! (E ).E !). FE! analysis E T 9 I! r1! s7 r1 r1 (E : T ! ⇤ F . : EE.80+ E . 1 I : F ! (E ). 6. F 11!: 5. id FF! I : T ! T ⇤ F . s6 s11 T ! T . ⇤ F 11 F10 ! id. F ! .10 id T T ⇤ F+ T . ISyntax Eanalysis !. E Syntax T !I9 T F E + TII.25 :: E r3. + Ts7I11 r3 : r3 : .E⇤ ! 9 :! Fr1). ! r1(E ). E ! TE.9 r3!analysis : F (E Syntax analysis 88 6. F ! id I10! : T !T ⇤ F . Syntax analysis 11 r1 I5 : F10r5!Iid. r5 r3 I r3 : T ! T ! . ⇤. F r3 r5 TT ⇤F ISyntax T ! TT⇤ ! F .T. ⇤ F T11 I! ⇤ FTr5 . r3 I11 10 2 : TE. ! 10 : analysis : F ! (E ). 88 Syntax analysis r5 !). T ⇤ F . I10 (E : ). T ! T ⇤ FI3. : T ! FT.11 ! T . r5⇤ F r5 I11r5 : I10F : !T (E I11 : F ! Syntax analysis Syntax analysis SyntaxI analysis I11 : F ! (E ). I : F ! 11 : F ! (E ). I3 : (.E T )! F . Syntax analysis 4 analysis Syntax analysis Syntax I! F+ !T(.E ) 4 : .E analysis ESyntax +T E ! Syntax .TE ! .E Syntax analysis 88 analysis E ! .T Syntax analysis 88 Syntax analysis 88 T ! .T ⇤ FSyntax analysis 88 TSyntax ! .T analysis ⇤F Syntax analysis Syntax analysis 80 Syntax analysis 80 T ! .FT ! .FSyntax analysis Syntax analysis F ! .(E Syntax analysis F )! .(E ) Syntax analysis 91 Syntax analysis 184 Syntax analysis 88 Syntax analysis F ! . id Syntax analysis
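As a rough illustration of how the sets of LR(0) items I0 to I11 of the expression grammar can be obtained, the following Python sketch builds the canonical collection with the usual closure and goto operations. The item encoding (production index, dot position) and all function names are choices made for this example only, and the states come out in discovery order, which need not match the numbering used on the slides.

```python
# Augmented expression grammar; production 0 is the added E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> .gamma for every nonterminal X found right after a dot."""
    result = set(items)
    todo = list(items)
    while todo:
        prod, dot = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    todo.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then close the result."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({(0, 0)})]            # I0 contains E' -> .E
    transitions = {}
    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    for index, state in enumerate(states):  # the list grows while we iterate
        for symbol in symbols:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
    return states, transitions


def show(item):
    """Render an item such as (1, 1) as 'E -> E . + T'."""
    lhs, rhs = GRAMMAR[item[0]]
    return f"{lhs} -> {' '.join(rhs[:item[1]] + ('.',) + rhs[item[1]:])}"


states, transitions = canonical_collection()
for index, state in enumerate(states):
    print(f"I{index}:", ", ".join(sorted(show(item) for item in state)))
```

The state that contains both E → T. and T → T. * F is the one where an LR(0) parser cannot choose between reducing and shifting.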

Example of a non-LR(0) grammar

[Figure: sets of LR(0) items of the example grammar]

Conflict: in state 2, we don’t know whether to shift or reduce.

Constructing the SLR parsing tables

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G' (the augmented grammar)
2. State i of the parser is derived from Ii. Actions for state i are as follows:
   2.1 If A → α.aβ is in Ii and goto(Ii, a) = Ij, then ACTION[i, a] = Shift j
   2.2 If A → α. is in Ii, then ACTION[i, a] = Reduce A → α for all terminals a in Follow(A), where A ≠ S'
   2.3 If S' → S. is in Ii, then set ACTION[i, $] = Accept
3. If goto(Ii, A) = Ij for a nonterminal A, then GOTO[i, A] = j
4. All entries not defined by rules (2) and (3) are made "error"
5. The initial state s0 is the set of items containing S' → .S

⇒ the simplest form of one-symbol lookahead, SLR (Simple LR)
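The construction rules above can be sketched in Python as follows, for the expression grammar. This is only an illustration under stated assumptions: the Follow sets are taken as given rather than computed, the item encoding and helper names are invented for the example, and states are numbered in discovery order. A conflict is recorded whenever two different actions are requested for the same table entry.

```python
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Follow sets of the expression grammar, taken as input (not computed here).
FOLLOW = {"E": {"$", "+", ")"}, "T": {"$", "+", "*", ")"}, "F": {"$", "+", "*", ")"}}


def closure(items):
    result, todo = set(items), list(items)
    while todo:
        prod, dot = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    todo.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_slr_tables():
    states = [closure({(0, 0)})]                  # rule 5: initial state
    action, goto_table, conflicts = {}, {}, []

    def set_action(state, terminal, value):
        old = action.setdefault((state, terminal), value)
        if old != value:                          # two actions for one entry
            conflicts.append((state, terminal, old, value))

    symbols = sorted({sym for _, rhs in GRAMMAR for sym in rhs})
    for i, items in enumerate(states):            # rule 1, built on the fly
        for symbol in symbols:
            target = goto(items, symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if symbol in NONTERMINALS:
                goto_table[(i, symbol)] = j                    # rule 3
            else:
                set_action(i, symbol, ("shift", j))            # rule 2.1
        for prod, dot in items:
            lhs, rhs = GRAMMAR[prod]
            if dot < len(rhs):
                continue
            if prod == 0:
                set_action(i, "$", ("accept",))                # rule 2.3
            else:
                for terminal in FOLLOW[lhs]:
                    set_action(i, terminal, ("reduce", prod))  # rule 2.2
    return action, goto_table, conflicts                       # rule 4: missing = error


action, goto_table, conflicts = build_slr_tables()
print(len(action), "action entries,", len(goto_table), "goto entries,",
      len(conflicts), "conflicts")
```

An empty conflict list is what makes the grammar SLR(1) in the sense used on these slides.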

Example

(SLR) parsing tables for the expression grammar

  1) E → E + T
  2) E → T
  3) T → T * F
  4) T → F
  5) F → (E)
  6) F → id

       First    Follow
  E    id (     $ + )
  T    id (     $ + * )
  F    id (     $ + * )

  (- denotes an empty entry)

          Action table                          Goto table
  state   id    +     *     (     )     $       E     T     F
  0       s5    -     -     s4    -     -       1     2     3
  1       -     s6    -     -     -     acc     -     -     -
  2       -     r2    s7    -     r2    r2      -     -     -
  3       -     r4    r4    -     r4    r4      -     -     -
  4       s5    -     -     s4    -     -       8     2     3
  5       -     r6    r6    -     r6    r6      -     -     -
  6       s5    -     -     s4    -     -       -     9     3
  7       s5    -     -     s4    -     -       -     -     10
  8       -     s6    -     -     s11   -       -     -     -
  9       -     r1    s7    -     r1    r1      -     -     -
  10      -     r3    r3    -     r3    r3      -     -     -
  11      -     r5    r5    -     r5    r5      -     -     -
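To show how such tables are used, here is a small table-driven LR driver in Python with the SLR table of the expression grammar hard-coded. It is a sketch rather than a reference implementation: the stack holds states only (the grammar symbols are implied by the states), and the names ACTION_ROWS, PRODUCTIONS and parse are invented for this example.

```python
TERMINALS = ["id", "+", "*", "(", ")", "$"]
ACTION_ROWS = [                     # one row per state, "-" = error entry
    "s5  -   -   s4  -   -",        # 0
    "-   s6  -   -   -   acc",      # 1
    "-   r2  s7  -   r2  r2",       # 2
    "-   r4  r4  -   r4  r4",       # 3
    "s5  -   -   s4  -   -",        # 4
    "-   r6  r6  -   r6  r6",       # 5
    "s5  -   -   s4  -   -",        # 6
    "s5  -   -   s4  -   -",        # 7
    "-   s6  -   -   s11 -",        # 8
    "-   r1  s7  -   r1  r1",       # 9
    "-   r3  r3  -   r3  r3",       # 10
    "-   r5  r5  -   r5  r5",       # 11
]
ACTION = {(state, terminal): entry
          for state, row in enumerate(ACTION_ROWS)
          for terminal, entry in zip(TERMINALS, row.split())
          if entry != "-"}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
# production number -> (left-hand side, length of the right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def parse(tokens):
    """Return the list of production numbers used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]                                  # initial state s0
    pos, output = 0, []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:                        # error: empty table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if entry == "acc":                       # accept
            return output
        if entry[0] == "s":                      # shift: push state, advance
            stack.append(int(entry[1:]))
            pos += 1
        else:                                    # reduce by production n
            n = int(entry[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[-length:]                  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(n)


print(parse(["id", "+", "id", "*", "id"]))
```

The printed list is the sequence of reductions, that is, a rightmost derivation of the input in reverse.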

SLR(1) grammars

A grammar for which there is no (shift/reduce or reduce/reduce) conflict during the construction of the SLR table is called SLR(1) (or SLR in short).
All SLR grammars are unambiguous, but many unambiguous grammars are not SLR.
There are more SLR grammars than LL(1) grammars, but there are LL(1) grammars that are not SLR.

Conflict example for SLR parsing

[Figure (Dragonbook): sets of LR(0) items for the grammar S → L = R | R, L → * R | id, R → L]

Follow(R) contains '='. In I2, when seeing '=' on the input, we don't know whether to shift or to reduce with R → L.

Summary of SLR parsing

Construction of an SLR parser from a context-free grammar:
- Eliminate ambiguity (or not, see later)
- Add the production S' → S, where S is the start symbol of the grammar
- Compute the canonical collection of LR(0) item sets and the goto function (transition function)
- Add a shift action in the action table for transitions on terminals and goto actions in the goto table for transitions on nonterminals
- Compute Follow for each nonterminal (which implies first adding S'' → S' $ to the grammar and computing First and Nullable)
- Add the reduce actions in the action table according to Follow
- Check that the grammar is SLR (and if not, try to resolve conflicts, see later)

Outline

1. Introduction
2. Context-free grammar
3. Top-down parsing
4. Bottom-up parsing
   - Shift/reduce parsing
   - LR parsers
   - Operator precedence parsing
   - Using ambiguous grammars
5. Conclusion and some practical considerations

Operator precedence parsing

Bottom-up parsing methods that follow the idea of shift-reduce parsers.
Several flavors: operator, simple, and weak precedence. In this course, only weak precedence.
Main differences compared to LR parsers:
- There is no explicit state associated to the parser (and thus no state pushed on the stack)
- The decision of whether to shift or reduce is taken based solely on the symbol on the top of the stack and the next input symbol (and stored in a shift-reduce table)
- In case of reduction, the handle is the longest sequence at the top of the stack matching the RHS of a rule

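Structure of the weak precedence parser

[Figure: the weak precedence parsing program reads the input a1 ... ai ... an $, maintains a stack X1 X2 ... Xm-1 Xm, produces an output, and consults a shift-reduce table whose rows are indexed by terminals, nonterminals and $, whose columns are indexed by terminals and $, and whose entries are Shift/Reduce/Error]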

Weak precedence parsing algorithm

Create a stack with the special symbol $
a = getnexttoken()
while (True)
  if (Stack == $S and a == $)
    break                          // Parsing is over
  Xm = top(Stack)
  if (SRT[Xm, a] = shift)
    Push a onto the stack
    a = getnexttoken()
  elseif (SRT[Xm, a] = reduce)
    Search for the longest RHS that matches the top of the stack
    if no match found
      call error-recovery routine
    Let Y → Xm-r+1 ... Xm denote this rule
    Pop r elements off the stack
    Push Y onto the stack
    Output Y → Xm-r+1 ... Xm
  else
    call error-recovery routine

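Example for the expression grammar

  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Shift/reduce table (rows: symbol on top of the stack; columns: next input symbol; S = shift, R = reduce, - = error):

        *    +    (    )    id   $
  E     -    S    -    S    -    R
  T     S    R    -    R    -    R
  F     R    R    -    R    -    R
  *     -    -    S    -    S    -
  +     -    -    S    -    S    -
  (     -    -    S    -    S    -
  )     R    R    -    R    -    R
  id    R    R    -    R    -    R
  $     -    -    S    -    S    -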

Example of parsing

  Stack            Input             Action
  $                id + id * id $    Shift
  $ id             + id * id $       Reduce by F → id
  $ F              + id * id $       Reduce by T → F
  $ T              + id * id $       Reduce by E → T
  $ E              + id * id $       Shift
  $ E +            id * id $         Shift
  $ E + id         * id $            Reduce by F → id
  $ E + F          * id $            Reduce by T → F
  $ E + T          * id $            Shift
  $ E + T *        id $              Shift
  $ E + T * id     $                 Reduce by F → id
  $ E + T * F      $                 Reduce by T → T * F
  $ E + T          $                 Reduce by E → E + T
  $ E              $                 Accept
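The trace above can be reproduced with a short Python sketch of the weak precedence parsing algorithm. The shift and reduce entries are hard-coded from the shift/reduce table of the expression grammar; the names RULES, SHIFT, REDUCE and parse are illustrative, and the error-recovery routine is replaced by simply raising an exception.

```python
RULES = [("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
# Entries of the shift/reduce table: (symbol on top of the stack, next input symbol).
SHIFT = {("E", "+"), ("E", ")"), ("T", "*")}
SHIFT |= {(x, a) for x in ("*", "+", "(", "$") for a in ("(", "id")}
REDUCE = {(x, a) for x in ("T", "F", ")", "id") for a in ("+", ")", "$")}
REDUCE |= {(x, "*") for x in ("F", ")", "id")} | {("E", "$")}


def parse(tokens, start="E"):
    """Return the list of actions performed, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack, pos, trace = ["$"], 0, []
    while True:
        a = tokens[pos]
        if stack == ["$", start] and a == "$":
            return trace                           # parsing is over
        top = stack[-1]
        if (top, a) in SHIFT:
            stack.append(a)
            pos += 1
            trace.append("shift")
        elif (top, a) in REDUCE:
            # Search for the longest RHS that matches the top of the stack.
            matches = [(lhs, rhs) for lhs, rhs in RULES
                       if tuple(stack[-len(rhs):]) == rhs]
            if not matches:
                raise SyntaxError(f"no rule matches the stack {stack}")
            lhs, rhs = max(matches, key=lambda rule: len(rule[1]))
            del stack[-len(rhs):]                  # pop r elements
            stack.append(lhs)                      # push Y
            trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
        else:
            raise SyntaxError(f"unexpected {a!r} after {top!r}")


for step in parse(["id", "+", "id", "*", "id"]):
    print(step)
```

Note that no state is kept on the stack: the decision depends only on the top symbol and the next token, as described for this parser.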

Precedence relation: principle

We define the (weak precedence) relations ⋖ and ⋗ between symbols of the grammar (terminals or nonterminals):
- X ⋖ Y if XY appears in the RHS of a rule or if X precedes a reducible word whose leftmost symbol is Y
- X ⋗ Y if X is the rightmost symbol of a reducible word and Y the symbol immediately following that word

Shift when Xm ⋖ a, reduce when Xm ⋗ a.
Reducing changes the precedence relation only at the top of the stack (there is thus no need to shift backward).

Precedence relation: formal definition

Let G = (V, Σ, R, S) be a context-free grammar and $ a new symbol acting as left and right end-marker for the input word. Define V' = V ∪ {$}.
The weak precedence relations ⋖ and ⋗ are defined respectively on V' × V and V × V' as follows:

1. X ⋖ Y if A → αXBβ is in R, and B ⇒+ Yγ
2. X ⋖ Y if A → αXYβ is in R
3. $ ⋖ X if S ⇒+ Xα
4. X ⋗ a if A → αBβ is in R, and B ⇒+ γX and β ⇒* aδ
5. X ⋗ $ if S ⇒+ αX

for some α, β, γ, δ, and B

Construction of the SR table: shift

Shift relation, ⋖ (built as a set of pairs; S below denotes the start symbol):

   Initialize the set to the empty set.
1  add $ ⋖ S to the set
2  for each production X → L1 L2 ... Lk
     for i = 1 to k - 1
       add Li ⋖ Li+1 to the set
3  repeat
     for each(*) pair X ⋖ Y in the set
       for each production Y → L1 L2 ... Lk
         add X ⋖ L1 to the set
   until the set did not change in this iteration.

(*) We only need to consider the pairs X ⋖ Y with Y a nonterminal that were added to the set at the previous iteration.

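Example of the expression grammar: shift

  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

  Step 1:    $ ⋖ E
  Step 2:    E ⋖ +,  + ⋖ T,  T ⋖ *,  * ⋖ F,  ( ⋖ E,  E ⋖ )
  Step 3.1:  + ⋖ F,  * ⋖ id,  * ⋖ (,  ( ⋖ T
  Step 3.2:  + ⋖ id,  + ⋖ (,  ( ⋖ F
  Step 3.3:  ( ⋖ (,  ( ⋖ id

Step 3 also derives $ ⋖ T, $ ⋖ F, $ ⋖ ( and $ ⋖ id from $ ⋖ E; the last two give the shift entries in the $ row of the shift/reduce table.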
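The construction of the shift relation can be sketched in Python as below, applied to the expression grammar. The function name and the representation of the relation as a set of pairs are choices made for this example, and the sketch assumes that no production has an empty right-hand side. For simplicity it rescans all pairs at each iteration instead of only the recently added ones.

```python
RULES = [("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]


def shift_relation(rules, start):
    """Return the set of pairs (X, Y) such that X is in shift relation with Y."""
    relation = {("$", start)}                      # step 1
    for _, rhs in rules:                           # step 2: adjacent symbols
        relation |= set(zip(rhs, rhs[1:]))
    changed = True
    while changed:                                 # step 3: repeat until stable
        changed = False
        for x, y in list(relation):
            for lhs, rhs in rules:
                if lhs == y and (x, rhs[0]) not in relation:
                    relation.add((x, rhs[0]))      # X precedes the first symbol of Y's RHS
                    changed = True
    return relation


SHIFT = shift_relation(RULES, "E")
for x, y in sorted(SHIFT):
    print(x, "<.", y)
```

The result contains the pairs listed step by step above, together with the pairs whose left symbol is $.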

Construction of the SR table: reduce

Reduce relation, ⋗ (built as a set of pairs; S below denotes the start symbol):

   Initialize the set to the empty set.
1  add S ⋗ $ to the set
2  for each production X → L1 L2 ... Lk
     for each pair X ⋖ Y in the shift relation
       add Lk ⋗ Y to the set
3  repeat
     for each(*) pair X ⋗ Y in the set
       for each production X → L1 L2 ... Lk
         add Lk ⋗ Y to the set
   until the set did not change in this iteration.

(*) We only need to consider the pairs X ⋗ Y with X a nonterminal that were added to the set at the previous iteration.

Example of the expression grammar: reduce

  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

  Step 1:    E ⋗ $
  Step 2:    T ⋗ +,  F ⋗ *,  T ⋗ )
  Step 3.1:  T ⋗ $,  F ⋗ +,  ) ⋗ *,  id ⋗ *,  F ⋗ )
  Step 3.2:  F ⋗ $,  ) ⋗ +,  id ⋗ +,  ) ⋗ ),  id ⋗ )
  Step 3.3:  id ⋗ $,  ) ⋗ $
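A similar Python sketch computes the reduce relation for the expression grammar and assembles the shift/reduce table. The shift relation is entered here as data (the 20 pairs obtained for this grammar) so that the block is self-contained; all names are illustrative, and the sketch again assumes there are no empty right-hand sides.

```python
RULES = [("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
TERMINALS = {"+", "*", "(", ")", "id", "$"}
# Shift relation of the expression grammar, given as input.
SHIFT = {("$", "E"), ("E", "+"), ("+", "T"), ("T", "*"), ("*", "F"),
         ("(", "E"), ("E", ")"), ("+", "F"), ("(", "T"), ("(", "F")}
SHIFT |= {(x, y) for x in ("+", "*", "(") for y in ("(", "id")}
SHIFT |= {("$", y) for y in ("T", "F", "(", "id")}


def reduce_relation(rules, start, shift):
    """Return the set of pairs (X, Y) such that X is in reduce relation with Y."""
    relation = {(start, "$")}                       # step 1
    for lhs, rhs in rules:                          # step 2
        for x, y in shift:
            if x == lhs:
                relation.add((rhs[-1], y))          # last symbol of the RHS
    changed = True
    while changed:                                  # step 3: repeat until stable
        changed = False
        for x, y in list(relation):
            for lhs, rhs in rules:
                if lhs == x and (rhs[-1], y) not in relation:
                    relation.add((rhs[-1], y))
                    changed = True
    return relation


REDUCE = reduce_relation(RULES, "E", SHIFT)
# Shift/reduce table: only terminals (and $) can be the next input symbol.
table = {pair: "S" for pair in SHIFT if pair[1] in TERMINALS}
table.update({pair: "R" for pair in REDUCE})
print(len(REDUCE), "reduce pairs,", len(table), "table entries,",
      len(SHIFT & REDUCE), "conflicting pairs")
```

A pair that belongs to both relations would be a shift/reduce conflict in the table; the intersection is empty for this grammar.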

Weak precedence grammars

Weak precedence grammars are those that can be analysed by a weak precedence parser.
A grammar G = (V, Σ, R, S) is called a weak precedence grammar if it satisfies the following conditions:
1. There exists no pair of productions with the same right-hand side
2. There are no empty right-hand sides (A → ε)
3. There is at most one weak precedence relation between any two symbols
4. Whenever there are two syntactic rules of the form A → αXβ and B → β, we don't have X ⋖ B

Conditions 1 and 2 are easy to check.
Conditions 3 and 4 can be checked by constructing the SR table.

Example of the expression grammar

  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Shift/reduce table:

        *    +    (    )    id   $
  E     -    S    -    S    -    R
  T     S    R    -    R    -    R
  F     R    R    -    R    -    R
  *     -    -    S    -    S    -
  +     -    -    S    -    S    -
  (     -    -    S    -    S    -
  )     R    R    -    R    -    R
  id    R    R    -    R    -    R
  $     -    -    S    -    S    -

Conditions 1-3 are satisfied (there is no conflict in the SR table).
Condition 4:
- E → E + T and E → T, but we don't have + ⋖ E (see the shift relation computed for this grammar)
- T → T * F and T → F, but we don't have * ⋖ T (see the shift relation computed for this grammar)
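Conditions 1, 2 and 4 can be checked mechanically. The Python sketch below does so for the expression grammar, taking the shift relation as given data; the function name and the wording of the reported problems are invented for this example. Condition 3 amounts to checking that no pair is in both the shift and the reduce relation, which is not repeated here.

```python
RULES = [("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
# Shift relation of the expression grammar, given as input.
SHIFT = {("$", "E"), ("E", "+"), ("+", "T"), ("T", "*"), ("*", "F"),
         ("(", "E"), ("E", ")"), ("+", "F"), ("(", "T"), ("(", "F")}
SHIFT |= {(x, y) for x in ("+", "*", "(") for y in ("(", "id")}
SHIFT |= {("$", y) for y in ("T", "F", "(", "id")}


def check_weak_precedence(rules, shift):
    """Return a list of violated conditions (empty if conditions 1, 2, 4 hold)."""
    problems = []
    right_sides = [rhs for _, rhs in rules]
    if len(set(right_sides)) != len(right_sides):
        problems.append("condition 1: two productions share a right-hand side")
    if any(len(rhs) == 0 for rhs in right_sides):
        problems.append("condition 2: empty right-hand side")
    for a, rhs in rules:                            # A -> alpha X beta
        for i in range(1, len(rhs)):
            x, beta = rhs[i - 1], rhs[i:]
            for b, other in rules:                  # B -> beta
                if other == beta and (x, b) in shift:
                    problems.append(
                        f"condition 4: {a} -> {' '.join(rhs)} and "
                        f"{b} -> {' '.join(beta)} with {x} <. {b}")
    return problems


print(check_weak_precedence(RULES, SHIFT))
```

Condition 4 is what makes the "longest matching RHS" choice safe: when the stack ends with E + T, reducing T alone by E → T would leave + directly below E, which the shift relation never allows.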

Removing ε rules

Removing rules of the form A → ε is not difficult.
For each rule with A in the RHS, add a set of new rules consisting of the different combinations of A replaced or not with ε.

Example:

  S → AbA | B
  B → b | c
  A → ε

is transformed into

  S → AbA | Ab | bA | b | B
  B → b | c
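The transformation can be sketched in Python as follows. This is a simplified illustration: the set of nonterminals that have an ε rule is passed in by hand, indirect nullability (a nonterminal deriving ε through other nonterminals) is not handled, and the names are invented for the example.

```python
from itertools import product


def remove_epsilon(rules, nullable):
    """rules: list of (lhs, rhs tuple); nullable: nonterminals having an ε rule."""
    result = []
    for lhs, rhs in rules:
        if not rhs:
            continue                                # drop the ε rule itself
        # Each nullable symbol can either be kept or be replaced with ε.
        options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs and (lhs, new_rhs) not in result:
                result.append((lhs, new_rhs))
    return result


RULES = [("S", ("A", "b", "A")), ("S", ("B",)),
         ("B", ("b",)), ("B", ("c",)),
         ("A", ())]
for lhs, rhs in remove_epsilon(RULES, {"A"}):
    print(lhs, "->", " ".join(rhs))
```

On the example of the slide this yields the alternatives AbA, Ab, bA, b and B for S, and leaves the rules of B unchanged.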

Summary of weak precedence parsing

Construction of a weak precedence parser:
- Eliminate ambiguity (or not, see later)
- Eliminate productions with ε and ensure that there are no two productions with identical RHS
- Construct the shift/reduce table
- Check that there is no conflict during the construction
- Check condition 4 of the definition of weak precedence grammars


Using ambiguous grammars with bottom-up parsers

All grammars used in the construction of shift/reduce parsing tables must be unambiguous.
We can still create a parsing table for an ambiguous grammar, but there will be conflicts.
We can often resolve these conflicts in favor of one of the choices to disambiguate the grammar.
Why use an ambiguous grammar?
- Because the ambiguous grammar is much more natural and the corresponding unambiguous one can be very complex
- Using an ambiguous grammar may eliminate unnecessary reductions

Example:

  E → E + E | E * E | (E) | id     ⇒     E → E + T | T
                                          T → T * F | F
                                          F → (E) | id

Set of LR(0) items of the ambiguous expression grammar

E → E + E | E * E | (E) | id

[Figure (Dragonbook): sets of LR(0) items of this grammar]

Follow(E) = {$, +, *, )} ⇒ states 7 and 8 have shift/reduce conflicts for + and *.

Disambiguation

Example:
Parsing of id + id * id will give the configuration (0 E 1 + 4 E 7, * id $). We can choose:
- ACTION[7, *] = shift 5 ⇒ precedence to *
- ACTION[7, *] = reduce E → E + E ⇒ precedence to +

Parsing of id + id + id will give the configuration (0 E 1 + 4 E 7, + id $). We can choose:
- ACTION[7, +] = shift 4 ⇒ + is right-associative
- ACTION[7, +] = reduce E → E + E ⇒ + is left-associative

(same analysis for I8)
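The choices above can be made systematically from a precedence level and an associativity per operator, which is the idea behind the %left declarations of Yacc and Bison. The Python sketch below is a hypothetical helper, not part of any tool: it decides one shift/reduce conflict between reducing by E → E op E and shifting the next operator, assuming that * binds tighter than + and that both are left-associative.

```python
# Assumed precedence levels (higher binds tighter) and associativities.
PRECEDENCE = {"+": 1, "*": 2}
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(rule_operator, lookahead):
    """Choose between reducing E -> E rule_operator E and shifting lookahead."""
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"                    # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"                   # the operator on the stack binds tighter
    # Same precedence: associativity decides.
    return "reduce" if ASSOCIATIVITY[rule_operator] == "left" else "shift"


# E + E on the stack: shift on *, reduce on +.
print(resolve("+", "*"), resolve("+", "+"))
# E * E on the stack: reduce on both + and *.
print(resolve("*", "+"), resolve("*", "*"))
```

With these assumptions, the first line corresponds to the choices "precedence to *" and "+ is left-associative" in the state reached after E + E.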


Top-down versus bottom-up parsing

Top-down:
- Easier to implement (recursively), enough for most standard programming languages
- Need to modify the grammar sometimes strongly, less general than bottom-up parsers
- Used in most hand-written compilers and some parser generators (JavaCC, ANTLR)

Bottom-up:
- More general, less strict rules on the grammar, SLR(1) powerful enough for most standard programming languages
- More difficult to implement, less easy to maintain (add new rules, etc.)
- Used in most parser generators (Yacc, Bison)

Hierarchy of grammar classes

[Figure 3.29 (Appel): a hierarchy of grammar classes. Regions labelled LL(0), LL(1), LL(k), LR(0), SLR, LALR(1), LR(1) and LR(k) within "Unambiguous grammars", next to "Ambiguous grammars"]

Error detection and recovery

In table-driven parsers, there is an error as soon as the table contains no entry (or an error entry) for the current stack (state) and input symbols.
The least one can do: report a syntax error and give information about the position in the input file and the tokens that were expected at that position.
In practice, it is however desirable to continue parsing to report more errors.
There are several ways to recover from an error:
- Panic mode
- Phrase-level recovery
- Introduce specific productions for errors
- Global error repair

Panic-mode recovery

In case of syntax error within a "phrase", skip until the next synchronizing token is found (e.g., semicolon, right parenthesis) and then resume parsing.
In LR parsing:
- Scan down the stack until a state s with a goto on a particular nonterminal A is found
- Discard zero or more input symbols until a symbol a is found that can follow A
- Stack the state GOTO(s, A) and resume normal parsing
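The three steps for LR parsing can be sketched in Python as below, using the goto table and Follow sets of the expression grammar as data. The function name, the choice of E as the nonterminal to recover on, and the convention that the token list ends with $ are assumptions of this example; the stack holds LR states only.

```python
# Goto table and Follow sets of the expression grammar (SLR table states).
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
FOLLOW = {"E": {"$", "+", ")"}, "T": {"$", "+", "*", ")"}, "F": {"$", "+", "*", ")"}}


def panic_mode_recover(stack, tokens, pos, nonterminal):
    """Return the input position where parsing resumes, or None if recovery fails.

    `stack` (list of LR states) is modified in place; `tokens` must end with "$",
    which belongs to every Follow set used here, so the scan of the input stops.
    """
    # 1. Scan down the stack until a state with a goto on the nonterminal.
    while stack and (stack[-1], nonterminal) not in GOTO:
        stack.pop()
    if not stack:
        return None
    # 2. Discard input symbols until one that can follow the nonterminal.
    while tokens[pos] not in FOLLOW[nonterminal]:
        pos += 1
    # 3. Stack the state GOTO(s, A); normal parsing resumes from here.
    stack.append(GOTO[(stack[-1], nonterminal)])
    return pos


# Input "( id + * id )": the error is detected on "*" (position 3),
# with states 0, 4, 8, 6 on the stack.
tokens = ["(", "id", "+", "*", "id", ")", "$"]
stack = [0, 4, 8, 6]
resume_at = panic_mode_recover(stack, tokens, 3, "E")
print(stack, tokens[resume_at])
```

In this example the parser behaves as if a complete E had been read after the opening parenthesis and resumes on the closing parenthesis.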

Phrase-level recovery

Examine each error entry in the parsing table and decide on an appropriate recovery procedure based on the most likely programmer error.
Examples in LR parsing, for E → E + E | E * E | (E) | id:
- id + * id: * is unexpected after a +: report a "missing operand" error, push an arbitrary number on the stack and go to the appropriate next state
- id + id) + id: report an "unbalanced right parenthesis" error and remove the right parenthesis from the input

Other error recovery approaches

Introduce specific productions for detecting errors:
- Add rules in the grammar to detect common errors
- Examples for a C compiler:
    I → if E I (parentheses are missing around the expression)
    I → if (E) then I (then is not needed in C)

Global error repair:
- Try to find globally the smallest set of insertions and deletions that would turn the program into a syntactically correct string
- Very costly and not always effective

Building the syntax tree

Parsing algorithms presented so far only check that the program is syntactically correct.
In practice, the parser also needs to build the parse tree (also called concrete syntax tree).
Its construction is easily embedded into the parsing algorithm.
Top-down parsing:
- Recursive descent: let each parsing function return the sub-trees for the parts of the input they parse
- Table-driven: each nonterminal on the stack points to its node in the partially built syntax tree. When the nonterminal is replaced by one of its RHS, nodes for the symbols on the RHS are added as children to the nonterminal node
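For the recursive descent case, a possible Python sketch is given below for the right-recursive expression grammar obtained earlier by eliminating left recursion (Exp → Exp2 Exp', Exp' → + Exp2 Exp' | - Exp2 Exp' | ε, and so on). Each parsing function returns the subtree for the part of the input it parses. The tree representation (a pair of a nonterminal name and a list of children, with tokens as leaves) and the function names are choices made for this example.

```python
def parse_exp(tokens):
    """Parse a token list and return its concrete syntax tree."""
    tokens = list(tokens) + ["$"]
    pos = 0

    def peek():
        return tokens[pos]

    def expect(token):
        nonlocal pos
        if tokens[pos] != token:
            raise SyntaxError(f"expected {token!r}, found {tokens[pos]!r}")
        pos += 1
        return token                              # a leaf of the tree

    def exp():                                    # Exp -> Exp2 Exp'
        return ("Exp", [exp2(), exp_rest()])

    def exp_rest():                               # Exp' -> + Exp2 Exp' | - Exp2 Exp' | ε
        if peek() in ("+", "-"):
            return ("Exp'", [expect(peek()), exp2(), exp_rest()])
        return ("Exp'", [])

    def exp2():                                   # Exp2 -> Exp3 Exp2'
        return ("Exp2", [exp3(), exp2_rest()])

    def exp2_rest():                              # Exp2' -> * Exp3 Exp2' | / Exp3 Exp2' | ε
        if peek() in ("*", "/"):
            return ("Exp2'", [expect(peek()), exp3(), exp2_rest()])
        return ("Exp2'", [])

    def exp3():                                   # Exp3 -> num | ( Exp )
        if peek() == "(":
            return ("Exp3", [expect("("), exp(), expect(")")])
        return ("Exp3", [expect("num")])

    tree = exp()
    expect("$")                                   # the whole input must be consumed
    return tree


def leaves(tree):
    """Left-to-right list of the tokens at the leaves of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]


tree = parse_exp(["num", "+", "num", "*", "num"])
print(tree)
```

Reading the leaves of the returned tree from left to right gives back the input, which is a quick sanity check on the construction.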

Building the syntax tree

Bottom-up parsing:
- Each stack element points to a subtree of the syntax tree
- When performing a reduce, a new syntax tree is built with the nonterminal at the root and the popped-off stack elements as children

Note:
- In practice, the concrete syntax tree is not built but rather a simplified (abstract) syntax tree
- Depending on the complexity of the compiler, the syntax tree might even not be constructed

[Figure: a parse tree and the corresponding syntax tree, with operators as interior nodes]
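The bottom-up case can be illustrated by replaying a sequence of shift and reduce actions while keeping subtrees on the stack. In the Python sketch below, the action script is the one obtained when parsing id + id * id with the expression grammar; the helper names and the tree representation (a pair of a nonterminal and its list of children) are assumptions of this example, and the parser that would normally decide the actions is left out.

```python
def shift(stack, token):
    """A shifted token becomes a leaf on the stack."""
    stack.append(token)


def reduce(stack, lhs, length):
    """Pop `length` subtrees and push a new tree rooted at `lhs` with them as children."""
    children = stack[-length:]
    del stack[-length:]
    stack.append((lhs, children))


# Actions taken by a shift-reduce parser on "id + id * id":
# ("shift", token) or ("reduce", left-hand side, length of the right-hand side).
SCRIPT = [
    ("shift", "id"), ("reduce", "F", 1), ("reduce", "T", 1), ("reduce", "E", 1),
    ("shift", "+"),
    ("shift", "id"), ("reduce", "F", 1), ("reduce", "T", 1),
    ("shift", "*"),
    ("shift", "id"), ("reduce", "F", 1),
    ("reduce", "T", 3),            # T -> T * F
    ("reduce", "E", 3),            # E -> E + T
]

stack = []
for action in SCRIPT:
    if action[0] == "shift":
        shift(stack, action[1])
    else:
        reduce(stack, action[1], action[2])

tree = stack[0]
print(tree)
```

At the end the stack holds a single element, the tree for the whole input, whose root is E with the children E, + and T.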

For your project

The choice of a parsing technique is left open for the project.
You can either use a parser generator or implement the parser by yourself.
Motivate your choice in your report and explain any transformation you had to apply to your grammar to make it fit the constraints of the parser.
Parser generators:
- Yacc: Unix parser generator, LALR(1) (companion of Lex)
- Bison: free implementation of Yacc, LALR(1) (companion of Flex)
- ANTLR: LL(*), implemented in Java but output code in several languages
- ...

http://en.wikipedia.org/wiki/Comparison_of_parser_generators

An example with Flex/Bison

Example: parsing of the following expression grammar:

  Input → Input Line
  Input → ε
  Line  → Exp EOL
  Line  → EOL
  Exp   → num
  Exp   → Exp + Exp
  Exp   → Exp - Exp
  Exp   → Exp * Exp
  Exp   → Exp / Exp
  Exp   → (Exp)

https://github.com/prashants/calc

Flex file: calc.lex

%{
#define YYSTYPE double      /* Define the main semantic type */
#include "calc.tab.h"       /* Define the token constants */
#include <stdlib.h>         /* atof */
int yyerror(char *s);       /* Defined in calc.y */
%}
%option yylineno            /* Ask flex to put line number in yylineno */
white    [ \t]+
digit    [0-9]
integer  {digit}+
exponent [eE][+-]?{integer}
real     {integer}("."{integer})?{exponent}?
%%
{white}  {}
{real}   { yylval=atof(yytext); return NUMBER; }
"+"      { return PLUS; }
"-"      { return MINUS; }
"*"      { return TIMES; }
"/"      { return DIVIDE; }
"("      { return LEFT; }
")"      { return RIGHT; }
"\n"     { return END; }
.        { yyerror("Invalid token"); }

Bison file: calc.y

Declaration:

%{
#include <math.h>
#include <stdio.h>          /* printf, fprintf, FILE */
#include <stdlib.h>         /* EXIT_SUCCESS, EXIT_FAILURE */
#define YYSTYPE double      /* Define the main semantic type */
extern char *yytext;        /* Global variables of Flex */
extern int yylineno;
extern FILE *yyin;
int yylex(void);            /* Scanner generated by Flex */
int yyerror(char *s);
%}

Definition of the tokens and start symbol:

%token NUMBER
%token PLUS MINUS TIMES DIVIDE
%token LEFT RIGHT
%token END

%start Input

Bison file: calc.y

Operator associativity and precedence:

%left PLUS MINUS
%left TIMES DIVIDE
%left NEG

Production rules and associated actions:

%%
Input: /* epsilon */
     | Input Line
     ;

Line: END
    | Expression END     { printf("Result: %f\n", $1); }
    ;

Bison file: calc.y

Production rules and actions (continued):

Expression: NUMBER                         { $$ = $1; }
          | Expression PLUS Expression     { $$ = $1 + $3; }
          | Expression MINUS Expression    { $$ = $1 - $3; }
          | Expression TIMES Expression    { $$ = $1 * $3; }
          | Expression DIVIDE Expression   { $$ = $1 / $3; }
          | MINUS Expression %prec NEG     { $$ = -$2; }
          | LEFT Expression RIGHT          { $$ = $2; }
          ;

Error handling:

%%
int yyerror(char *s) {
  printf("%s on line %d - %s\n", s, yylineno, yytext);
  return 0;
}

Bison file: calc.y

Main function:

int main(int argc, char **argv) {
  /* if any input file has been specified read from that */
  if (argc >= 2) {
    yyin = fopen(argv[1], "r");
    if (!yyin) {
      fprintf(stderr, "Failed to open input file\n");
      return EXIT_FAILURE;
    }
  }
  if (yyparse() == 0) {     /* yyparse() returns 0 when parsing succeeds */
    fprintf(stdout, "Successful parsing\n");
  }
  fclose(yyin);
  fprintf(stdout, "End of processing\n");
  return EXIT_SUCCESS;
}

Bison file: makefile

How to compile:

bison -v -d calc.y
flex -o calc.lex.c calc.lex
gcc -o calc calc.lex.c calc.tab.c -lfl -lm

Example:

>./calc
1+2*3-4
Result: 3.000000
1+3*-4
Result: -11.000000
*2
syntax error on line 3 - *
End of processing

The state machine

Excerpt of calc.output (with Expression abbreviated in Exp):

state 9

    6 Exp: Exp . PLUS Exp
    7    | Exp . MINUS Exp
    8    | Exp . TIMES Exp
    9    | Exp . DIVIDE Exp
   10    | MINUS Exp .

    $default  reduce using rule 10 (Exp)

state 10

    6 Exp: Exp . PLUS Exp
    7    | Exp . MINUS Exp
    8    | Exp . TIMES Exp
    9    | Exp . DIVIDE Exp
   11    | LEFT Exp . RIGHT

    PLUS    shift, and go to state 11
    MINUS   shift, and go to state 12
    TIMES   shift, and go to state 13
    DIVIDE  shift, and go to state 14
    RIGHT   shift, and go to state 16

state 11

    6 Exp: Exp PLUS . Exp

    NUMBER  shift, and go to state 3
    MINUS   shift, and go to state 4
    LEFT    shift, and go to state 5

    Exp  go to state 17