Parsers: Terminology. Parsing. Compilers & Translators

Compilers & Translators

Parsing
•  Terminology
•  LL(1) Parsers
•  Overview of LR Parsing

Parsers: Terminology
•  G : Grammar G=(Vn,Vt,P,S)
–  Vn is a set of non-terminals
–  Vt is a set of terminals
–  P is a set of productions
–  S is the start symbol
•  A simple grammar: G=({S,A,B}, {a,b}, {S → A B $, A → A a, A → a, B → B b, B → b}, S)

Parsers: Terminology
G=({S,A,B}, {a,b}, {S → A B $, A → A a, A → a, B → B b, B → b}, S)
•  Vocabulary V of terminal (Vt) and nonterminal (Vn) symbols
•  Vn = {S,A,B}
–  Non-terminals are symbols that are on the left-hand side of a production
–  Non-terminals are constructs in the language that are recognized during parsing
•  Vt = {a,b}
–  Terminals are tokens recognized by the lexer
–  They correspond to symbols in the textual representation of the program
•  Productions (rewriting rules) tell how to derive strings (from other strings). We will use the standard BNF form.
•  P = {
S → A B $
A → A a
A → a
B → B b
B → b
}

Parsers: Terminology
S → A B $
A → A a
A → a
B → B b
B → b

Given a start rule, productions tell us how to rewrite a non-terminal into a different string of symbols. By convention, the first production applied has the start symbol on its left-hand side, and there is only one such production.

To derive the string a a b b b we can do the following rewrites:

S ⇒ A B $ ⇒ A a B $ ⇒ a a B $ ⇒ a a B b $ ⇒ a a B b b $ ⇒ a a b b b $
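The rewrites above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names `PRODUCTIONS`, `rewrite_leftmost` and `derive` are made up for illustration, and productions are chosen by their index in the list.

```python
# Productions of the example grammar, in the order they are listed on the slide.
PRODUCTIONS = [
    ("S", ["A", "B", "$"]),
    ("A", ["A", "a"]),
    ("A", ["a"]),
    ("B", ["B", "b"]),
    ("B", ["b"]),
]
NONTERMINALS = {"S", "A", "B"}


def rewrite_leftmost(form, production_index):
    """Replace the leftmost non-terminal in `form` using the chosen production."""
    lhs, rhs = PRODUCTIONS[production_index]
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != lhs:
                raise ValueError(f"leftmost non-terminal is {symbol}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def derive(choices):
    """Return every sentential form produced by applying `choices` in order."""
    forms = [["S"]]
    for index in choices:
        forms.append(rewrite_leftmost(forms[-1], index))
    return forms


# The sequence of production indices used in the derivation of a a b b b $.
forms = derive([0, 1, 2, 3, 3, 4])
for form in forms:
    print(" ".join(form))
```

Each printed line is one sentential form of the derivation; the last one contains only terminals.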

Copyright 2002 Purdue University, Prof. R. Eigenmann, Prof. S. Midkiff

•  Strings are composed of symbols
–  A A a a B b b A a is a string
–  Greek letters are used to refer to strings of terminals and non-terminals
•  L(G) is the language produced by the grammar G.
–  For the example grammar, L(G) is a+b+$
–  L(G) for the example grammar can therefore also be described by a regular expression
•  All regular expressions can be expressed as grammars for context free languages
•  Some tools exploit this by doing scanning using a parser.

Chomsky Hierarchy
•  Regular grammars
•  Context free grammars
•  Context sensitive grammars
•  Unrestricted grammars

Regular grammars
•  All productions are of the form:
–  P → a B
–  P → a
•  Or of the form:
–  P → B a
–  P → a
•  Productions of the form P → λ can appear if P never appears on the RHS of a production.
•  Correspond to DFAs

Context free grammars
•  What we’ve been looking at, productions of the form
–  P → β where β is a string of terminals and non-terminals
•  Correspond to push-down automata

Context sensitive grammars
•  Productions are of the form
–  α A β → α γ β
•  Where α, β can be empty, but γ cannot
•  Productions of the form P → λ can appear if P never appears on the RHS of a production.
•  Correspond to linear bounded automata (non-deterministic Turing machines with a bounded tape)

Unrestricted grammars
•  Left and right hand sides of productions are unrestricted.
•  Corresponds to a Turing Machine

Leftmost Derivation
•  At each step, the leftmost non-terminal of the string is rewritten
Exercise: do a leftmost derivation of input program F(V+V) given the Grammar:
1: E → Prefix ( E )
2: E → V Tail
3: Prefix → F
4: Prefix → λ
5: Tail → + E
6: Tail → λ
Draw the parse tree

Micro in Standard BNF
 1  Program         ::= BEGIN Statement-list END
 2  Statement-list  ::= Statement StatementTail
 3  StatementTail   ::= Statement StatementTail
 4  StatementTail   ::= λ
 5  Statement       ::= ID := Expression ;
 6  Statement       ::= READ ( Id-list ) ;
 7  Statement       ::= WRITE ( Expr-list ) ;
 8  Id-list         ::= ID IdTail
 9  IdTail          ::= , ID IdTail
10  IdTail          ::= λ
11  Expr-list       ::= Expression ExprTail
12  ExprTail        ::= , Expression ExprTail
13  ExprTail        ::= λ
14  Expression      ::= Primary PrimaryTail
15  PrimaryTail     ::= Add-op Primary PrimaryTail
16  PrimaryTail     ::= λ
17  Primary         ::= ( Expression )
18  Primary         ::= ID
19  Primary         ::= INTLITERAL
20  Add-op          ::= PLUSOP
21  Add-op          ::= MINUSOP
22  System-goal     ::= Program SCANEOF

Compare this to slide 39, Aug25.pdf. In standard BNF the extended forms are rewritten as follows:
A ::= B | C    becomes    A ::= B
                          A ::= C
A ::= B {C}    becomes    A ::= B tail
                          tail ::= C tail
                          tail ::= λ

What is parsing
•  Parsing is recognizing members in a language specified/defined/generated by a grammar
•  When a construct (corresponding to a production in a grammar) is recognized, a typical parser will take some action.
–  In a compiler, this action generates an intermediate representation of the program construct
–  In an interpreter, this action might be to perform the action specified by the construct. Thus if a+b is recognized, the values of a and b would be added and the result placed in a temporary variable.

Another simple grammar
PROGRAM → begin STMTLIST $
STMTLIST → STMT ; STMTLIST
STMTLIST → end
STMT → id
STMT → if ( id ) STMTLIST

A sentence in the grammar: begin if ( id ) if ( id ) id ; end ; end ; end $
What are the terminals and non-terminals of this grammar?

Parsing this grammar
An observation:
•  to parse the STMT in STMTLIST → STMT ; STMTLIST it is necessary to parse either STMT → id or STMT → if ( id ) STMTLIST
•  Which of STMT → id or STMT → if ( id ) STMTLIST to parse can be determined by finding out if the next token is id or if
–  I.e. what production the next input token matches
–  This is the FIRST set of the production

Another example
S → A B $
A → x a A
A → y a A
A → λ
B → b
String to parse: x a b $
•  Consider S ⇒ A B $ ⇒ x a A B $ ⇒ x a B $ ⇒ x a b $
•  When parsing x a b $ we know from the goal production that we need to match an A, and because the next token x is in FIRST(x a A) we apply A → x a A
•  The parser matches x, matches a, and now needs to parse an A.
•  How does it know which A production to apply, i.e. to apply A → λ?
–  When A → λ is the right choice, the next token is a terminal that follows A
–  The tokens that can follow A are the set FOLLOW(A)

Top-down and Bottom-up Parsers
•  Top-down parsers use left-most derivation
•  Bottom-up parsers use right-most derivation (traced in reverse)
Notation:
–  LL(1) : Leftmost deriv. with 1 symbol lookahead
–  LL(k) : Leftmost deriv. with k symbols lookahead
–  LR(1) : Rightmost deriv. (in reverse) with 1 symbol lookahead

Towards Parser Generators
The main issue: as the parser reads the source program tokens, it needs to decide what productions to use.

Step 1: find the tokens that can tell which production P (of the form A → X1 ... Xm) applies

Predict(P) :
   if not (λ in First(X1 ... Xm))
      return First(X1 ... Xm)
   else
      return (First(X1 ... Xm) - λ) U Follow(A)

Grammar Analysis Algorithms
First(α) = { a ∈ Vt | α ⇒* aβ } U { λ | if α ⇒* λ }
Follow(A) = { a ∈ Vt | S ⇒+ …Aa… } U { λ | if S ⇒+ …A }

S: start symbol of the grammar
a: a terminal symbol
A: a non-terminal symbol
α,β: any string (composed of terminals & non-terminals)
typically α is a production RHS

⇒    derived in 1 step
⇒+   derived in 1 or more steps
⇒*   derived in 0 or more steps
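As a rough illustration of First, Follow and Predict, here is a fixed-point sketch for the small grammar S → A B $, A → x a A | y a A | λ, B → b used earlier. The function names and data layout are assumptions made for this example. The sketch also assumes the grammar ends with the $ end marker, so the "λ in Follow" case from the definition never arises and is not implemented.

```python
LAMBDA = "λ"
GRAMMAR = [
    ("S", ["A", "B", "$"]),
    ("A", ["x", "a", "A"]),
    ("A", ["y", "a", "A"]),
    ("A", []),                # A → λ
    ("B", ["b"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of_string(symbols, first):
    """First(α) for a string α, given the First sets of the non-terminals."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in NONTERMINALS else {symbol}
        result |= symbol_first - {LAMBDA}
        if LAMBDA not in symbol_first:
            return result
    result.add(LAMBDA)        # α is empty or every symbol in it can derive λ
    return result


def compute_first():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:            # repeat until no First set grows any more
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first):
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                rest = first_of_string(rhs[i + 1:], first)
                new = rest - {LAMBDA}
                if LAMBDA in rest:        # nothing (or only λ) follows in this RHS
                    new |= follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


def predict(production, first, follow):
    lhs, rhs = production
    rhs_first = first_of_string(rhs, first)
    if LAMBDA not in rhs_first:
        return rhs_first
    return (rhs_first - {LAMBDA}) | follow[lhs]


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
PREDICT = [predict(p, FIRST, FOLLOW) for p in GRAMMAR]
```

For this grammar the predict set of A → λ comes entirely from Follow(A), which is what lets the parser choose A → λ when it sees b.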

Parse Table (LL(1))
Given some non-terminal in Vn and a terminal in Vt, the parse table tells us what production P to use (or that we have an error).

More formally: T : Vn x Vt → P U {Error}
Since we start with a goal, we always have a non-terminal in Vn.

Building the Parse Table
Step 2: building the parse table.

T[A][t]: initialize all fields to “error”
Foreach A:
   Foreach P with A on its LHS:
      Foreach t in Predict(P) :
         T[A][t] = P

Exercise: build the parse table for Micro
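The table-building loop of Step 2 might look like the sketch below for the begin/end grammar from the earlier slides. The predict sets are written out by hand here, which is an assumption that holds because that grammar has no λ-productions, so Predict(P) is just First of the RHS. The names `build_parse_table` and `lookup` are illustrative.

```python
# Productions of the begin/end grammar, keyed by production number.
PRODUCTIONS = {
    1: ("PROGRAM", ["begin", "STMTLIST", "$"]),
    2: ("STMTLIST", ["STMT", ";", "STMTLIST"]),
    3: ("STMTLIST", ["end"]),
    4: ("STMT", ["id"]),
    5: ("STMT", ["if", "(", "id", ")", "STMTLIST"]),
}
# Hand-written predict sets (no λ-productions, so Predict(P) = First(RHS)).
PREDICT = {1: {"begin"}, 2: {"id", "if"}, 3: {"end"}, 4: {"id"}, 5: {"if"}}


def build_parse_table(productions, predict):
    """Fill T[A][t] = P for every t in Predict(P); missing entries mean error."""
    table = {}
    for number, (lhs, _rhs) in productions.items():
        for token in predict[number]:
            key = (lhs, token)
            if key in table:
                # Two productions predicted by the same token: not LL(1).
                raise ValueError(f"not LL(1): conflict at {key}")
            table[key] = number
    return table


TABLE = build_parse_table(PRODUCTIONS, PREDICT)


def lookup(nonterminal, token):
    """Return the production number to apply, or "error"."""
    return TABLE.get((nonterminal, token), "error")
```

Storing only the non-error entries in a dictionary is a design choice of the sketch; a dense two-dimensional array initialized to "error" works equally well.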

Building Recursive-Descent Parsers from LL(1) Parse Tables Given the parse table we can create a program that writes the recursive descent parse procedures discussed earlier. Remember the algorithm on page 33 & 34. (If the choice of production is not unique, the parse table tells us which one to take.)

However there is an easier method...

A Stack-Based Parser Driver for LL(1)
Given the parse table, a stack-based algorithm looks much simpler than the generator of a recursive-descent parser. The basic algorithm is
1 push the RHS of the production onto the stack
2 pop a symbol. If it’s a terminal, match it;
3 if it’s a non-terminal, take its production according to the parse table and goto 1

Algorithm on page 121
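The three steps of the stack-based driver can be sketched as follows. This is an illustration under assumptions, not the textbook algorithm: the parse table for the begin/end grammar is hard-coded, and the RHS is pushed in reverse so that its leftmost symbol ends up on top of the stack.

```python
PRODUCTIONS = {
    1: ("PROGRAM", ["begin", "STMTLIST", "$"]),
    2: ("STMTLIST", ["STMT", ";", "STMTLIST"]),
    3: ("STMTLIST", ["end"]),
    4: ("STMT", ["id"]),
    5: ("STMT", ["if", "(", "id", ")", "STMTLIST"]),
}
NONTERMINALS = {"PROGRAM", "STMTLIST", "STMT"}
# LL(1) parse table: (non-terminal, lookahead token) -> production number.
TABLE = {
    ("PROGRAM", "begin"): 1,
    ("STMTLIST", "id"): 2,
    ("STMTLIST", "if"): 2,
    ("STMTLIST", "end"): 3,
    ("STMT", "id"): 4,
    ("STMT", "if"): 5,
}


def ll1_parse(tokens, start="PROGRAM"):
    """Parse `tokens`; return the production numbers in the order predicted."""
    stack = [start]
    position = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[position] if position < len(tokens) else None
        if top in NONTERMINALS:
            number = TABLE.get((top, lookahead))
            if number is None:
                raise SyntaxError(f"no production for {top} on {lookahead!r}")
            applied.append(number)
            # Push the RHS reversed so its first symbol is popped first.
            stack.extend(reversed(PRODUCTIONS[number][1]))
        elif top == lookahead:
            position += 1                 # terminal matched, consume the token
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    if position != len(tokens):
        raise SyntaxError("input remains after the parse finished")
    return applied


SENTENCE = "begin if ( id ) if ( id ) id ; end ; end ; end $".split()
print(ll1_parse(SENTENCE))
```

The returned list is the leftmost derivation of the sentence, expressed as production numbers.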

Including Semantic Actions in a Stack-Based Parser Generator
•  Action symbols are simply pushed onto the stack as well.
•  When popped, the semantic action routines are called.

Turning Non-LL(1) into LL(1) Grammar
consider :
stmt ::= if then endif
stmt ::= if then else end if
It is not LL(1) because it has a common prefix
We can turn this into:
stmt ::= if then
::= end if
::= else endif

Left-Recursion
E ::= E + T is left-recursive (the LHS is also the first symbol of the RHS)
How would the stack-based parser algorithm handle this production?

Removing Left Recursion
Example:
E → E + X
E → X
becomes
E → E1 Etail
E1 → X
Etail → + X Etail
Etail → λ
This can be simplified
(Algorithm on page 125)
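A minimal sketch of removing immediate left recursion is given below. It is not the algorithm from page 125, only an illustration that handles a single non-terminal whose alternatives start directly with itself; the function name and the `tail` naming scheme are assumptions. It produces the simplified form directly, with E1 inlined.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A → A α | β as A → β Atail, Atail → α Atail | λ.

    `alternatives` is a list of right-hand sides (lists of symbols).
    An empty list stands for λ in the result.
    """
    tail = nonterminal + "tail"
    # α parts: the rest of each left-recursive alternative.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # β parts: alternatives that do not start with the non-terminal.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [rhs + [tail] for rhs in others],
        tail: [rhs + [tail] for rhs in recursive] + [[]],
    }


result = remove_immediate_left_recursion("E", [["E", "+", "X"], ["X"]])
for lhs, right_hand_sides in result.items():
    for rhs in right_hand_sides:
        print(lhs, "→", " ".join(rhs) if rhs else "λ")
```

Indirect left recursion (A → B ..., B → A ...) is outside the scope of this sketch.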

If-Then-Else Problem (a motivating example for LR grammars)
If x then y else z
If a then if b then c else d
this is analogous to a bracket notation when left brackets >= right brackets: [ [ ]   ([^i ]^j, i >= j)

Grammar:
S → [ S C
S → λ
C → ]
C → λ

For [ [ ] the ] can be derived from either C: S S λ C or S S C λ
The grammar is ambiguous.

Solving the If-Then-Else Problem
•  The ambiguity exists at the language level as well. The semantics need to be defined properly:
e.g., “the else part belongs to the closest matching if”

S → [ S
S → S1
S1 → [ S1 ]
S1 → λ

This grammar is still not LL(1), nor is it LL(k). Show that this is so.

Parsing the If-Then-Else Construct
LL(k) parsers can look ahead k tokens.
(LL(k) will not be discussed in this class, but be able to
–  explain in English what LL(k) means and
–  recognize a simple LL(k) grammar.)

For the If-Then-Else construct, a parsing strategy is needed that can look ahead at the entire RHS of a production (not just k tokens) before deciding what production to take. LR parsers can do that.

LR Parsers
A Shift-Reduce Parser:
•  Basic idea: put tokens on a stack until an entire production is found.
•  Issues:
–  recognize the end point of a production
–  find the length of the production (RHS)
–  find the corresponding nonterminal (i.e., the LHS of the production)

Data Structures for Shift-Reduce Parsers
At each state, given the next token,
•  a goto table defines the successor state
•  an action table defines whether to
–  shift (put the next state and token on the stack)
–  reduce (a RHS is found, process the production)
–  terminate (parsing is complete)

Example of Shift-Reduce Parsing
Consider the simple Grammar:
→ begin end $
→ SimpleStmt ;
→ begin end ;
→ λ

Shift Reduce Driver Algorithm on page 142, Fig 6.1

LR Parser Generators
(OR: HOW TO COME UP WITH GOTO AND ACTION TABLES)

Basic idea:
•  shift tokens onto the stack; at any step keep the set of productions that match the read-in tokens.
•  Reduce the RHS of recognized productions to the corresponding nonterminal on the LHS of the production by replacing the RHS symbols on the stack with the LHS non-terminal.

LR(k) Parsers
LR(0) parsers:
•  no lookahead
•  predict which production to use by looking only at the symbols already read.
LR(k) parsers:
•  k symbol lookahead
•  most powerful class of deterministic bottom-up parsers (in terms of grammars read)

Terminology for LR Parsing
•  Configuration: A → X1 . . . Xi • Xi+1 . . . Xj
–  • marks the point to which the production has been recognized
•  Configuration set: all the configurations that apply at a given point in the parse. For example:
A → B • CD
A → B • GH
T → B • Z

Configuration Closure Set
•  Include all configurations necessary to recognize the next symbol after the •.
•  For example:
S → E $
E → E + T | T
T → ID | (E)

closure0({S → • E $}) = {
S → • E $
E → • E + T
E → • T
T → • ID
T → • (E)
}
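The closure computation can be sketched with a worklist. In this illustration a configuration is assumed to be a tuple `(lhs, rhs, dot)`, where `dot` is the position of the • in the RHS; that representation and the helper `show` are choices made for the example.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure0(configurations):
    """Add A → • γ for every non-terminal A that appears right after a •."""
    result = set(configurations)
    worklist = list(configurations)
    while worklist:
        _lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p_lhs, p_rhs in GRAMMAR:
                if p_lhs == rhs[dot]:
                    item = (p_lhs, p_rhs, 0)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


def show(configuration):
    """Format a configuration with the • at its dot position."""
    lhs, rhs, dot = configuration
    return f"{lhs} → {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}"


start = closure0({("S", ("E", "$"), 0)})
for configuration in sorted(start):
    print(show(configuration))
```

The five configurations printed are the ones listed in the closure0 example above, possibly in a different order.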

Successor Configuration Set
•  Starting with the initial configuration set s0 = closure0({S → • α $}), an LR(0) parser will find the successor, given a (next) symbol X.
•  X can be either a terminal (a token from the scanner) or a nonterminal (the result of a reduction)
•  Determining the successor s’ = go_to(s,X) :
1. pick all configurations in s of the form A → β • X γ
2. advance the • past X in each of them, giving A → β X • γ
3. take closure0 of this set

Building the Characteristic Finite State Machine (CFSM)
•  Nodes are configuration sets
•  Arcs are go_to relationships

Example:
S’ → S $
S → ID

State 0: S’ → • S $
         S → • ID
State 1 (from state 0 on ID): S → ID •
State 2 (from state 0 on S):  S’ → S • $
State 3 (from state 2 on $):  S’ → S $ •
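Building the CFSM for the two-production example can be sketched by repeatedly applying go_to to every state found so far. The tuple representation `(lhs, rhs, dot)` and the function names are assumptions of this sketch. Symbols are tried in sorted order, which for this grammar happens to give the same state numbers as the example.

```python
GRAMMAR = [("S'", ("S", "$")), ("S", ("ID",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure0(configurations):
    """Add A → • γ for every non-terminal A that appears right after a •."""
    result = set(configurations)
    worklist = list(configurations)
    while worklist:
        _lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p_lhs, p_rhs in GRAMMAR:
                item = (p_lhs, p_rhs, 0)
                if p_lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def go_to(state, symbol):
    """Advance the • past `symbol` where possible, then take the closure."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in state
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return frozenset(closure0(moved))


def build_cfsm():
    """Return (states, arcs); arcs maps (state number, symbol) to a state number."""
    states = [frozenset(closure0({("S'", ("S", "$"), 0)}))]
    arcs = {}
    index = 0
    while index < len(states):
        state = states[index]
        symbols = sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)})
        for symbol in symbols:
            successor = go_to(state, symbol)
            if successor not in states:
                states.append(successor)
            arcs[(index, symbol)] = states.index(successor)
        index += 1
    return states, arcs


STATES, ARCS = build_cfsm()
```

The `ARCS` dictionary is exactly the go_to table: every missing (state, symbol) pair is an error entry.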

Building the go_to Table
•  Building the go_to table is straightforward from the CFSM. For the previous example the table looks like this:

          Symbol
State   ID    $    S
0       1          2
1
2             3
3

Building the Action Table
Given the configuration set s:
•  We shift if the next token matches a terminal after the • in some configuration:
   A → α • a β ∈ s and a ∈ Vt , else error
•  We reduce production P if the • is at the end of a production:
   B → α • ∈ s and production P is B → α
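With the go_to table above and an action per state, a shift-reduce driver for the example S’ → S $, S → ID can be sketched as below. This is not the driver from Fig 6.1: it is a simplification that keeps only state numbers on the stack, and the encoding of `ACTION` is an assumption of the sketch.

```python
# go_to table of the example CFSM: (state, symbol) -> successor state.
GOTO = {(0, "ID"): 1, (0, "S"): 2, (2, "$"): 3}
# One action per state, since LR(0) decides without looking ahead.
ACTION = {
    0: ("shift",),
    1: ("reduce", "S", 1),     # S → ID •   (LHS, length of RHS)
    2: ("shift",),
    3: ("accept",),            # S' → S $ •
}


def lr0_parse(tokens):
    """Run the shift-reduce driver; return the list of actions taken."""
    stack = [0]
    position = 0
    trace = []
    while True:
        state = stack[-1]
        action = ACTION[state]
        trace.append(action[0])
        if action[0] == "shift":
            if position >= len(tokens) or (state, tokens[position]) not in GOTO:
                raise SyntaxError(f"unexpected input in state {state}")
            stack.append(GOTO[(state, tokens[position])])
            position += 1
        elif action[0] == "reduce":
            _, lhs, length = action
            del stack[len(stack) - length:]          # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])     # go_to on the LHS non-terminal
        else:
            if position != len(tokens):
                raise SyntaxError("input remains after accept")
            return trace


print(lr0_parse(["ID", "$"]))
```

The reduce step shows the three issues named earlier: the end point of the production (state 1), its length (1), and its LHS (S).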

LR(0) and LR(k) Grammars
•  For LR(0) grammars the action table entries just described are unique.
•  For most useful grammars we cannot decide on shift or reduce based only on the symbols already read. Instead, we have to look ahead k tokens. This leads to LR(k).
•  However, it is possible to create an LR(0) grammar that is equivalent to any given LR(k) grammar (provided there is an end marker). This is only of theoretical interest because this grammar may be very complex and unreadable.

Exercise
•  Create CFSM, go_to table, and action table for
1: S → E $
2: E → E + T
3: E → T
4: T → ID
5: T → (E)

LR(1) Parsing
•  LR(0) parsers may generate
–  shift-reduce conflicts (both actions possible in same configuration set)
–  reduce-reduce conflicts (two or more reduce actions possible in same configuration set)
•  The configurations for LR(1) are extended to include a lookahead symbol l:
   A → X1 . . . Xi • Xi+1 . . . Xj , l      l ∈ Vt ∪ {λ}
Configurations within a closure that differ only in the lookahead symbol are combined:
   A → X1 . . . Xi • Xi+1 . . . Xj , {l1…lm}

Configuration Set Closure for LR(1)
S → E $
E → E + T | T
T → ID | (E)

closure1({S → • E $, {λ}}) = {
S → • E $ , {λ}
E → • E + T , {$}
E → • T , {$}
T → • ID , {$}
T → • (E) , {$}
E → • E + T , {+}
E → • T , {+}
T → • ID , {+}
T → • (E) , {+}
}

Merge sets that differ only in lookahead:

closure1({S → • E $, {λ}}) = {
S → • E $ , {λ}
E → • E + T , {$,+}
E → • T , {$,+}
T → • ID , {$,+}
T → • (E) , {$,+}
}

Goto and Action Table for LR(1)
•  The function goto1(configuration-set,symbol) is analogous to goto0() for LR(0)
•  Goto table is also created the same way as for LR(0).
•  The Action table makes the difference. The lookahead symbol is used to decide if a reduction is applicable. Hence, the lookahead symbol resolves possible shift-reduce conflicts.
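The closure with lookaheads can be sketched in the same worklist style as closure0. This illustration assumes a configuration is a tuple `(lhs, rhs, dot, lookahead)` and relies on the example grammar having no λ-productions, so the lookahead of a new configuration is either a terminal that can start the symbols after the non-terminal or, if nothing follows it, the inherited lookahead. All names are made up for the sketch.

```python
LAMBDA = "λ"
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_symbols(symbol):
    """Terminals that can start `symbol` (assumes no λ-productions)."""
    seen, result, worklist = set(), set(), [symbol]
    while worklist:
        current = worklist.pop()
        if current not in NONTERMINALS:
            result.add(current)
        elif current not in seen:
            seen.add(current)
            worklist.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == current)
    return result


def closure1(configurations):
    """LR(1) closure over (lhs, rhs, dot, lookahead) configurations."""
    result = set(configurations)
    worklist = list(configurations)
    while worklist:
        _lhs, rhs, dot, lookahead = worklist.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        # Lookaheads for the new configurations: First(rest followed by lookahead).
        lookaheads = first_symbols(rest[0]) if rest else {lookahead}
        for p_lhs, p_rhs in GRAMMAR:
            if p_lhs != rhs[dot]:
                continue
            for symbol in lookaheads:
                item = (p_lhs, p_rhs, 0, symbol)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def merge_lookaheads(configurations):
    """Combine configurations that differ only in their lookahead symbol."""
    merged = {}
    for lhs, rhs, dot, lookahead in configurations:
        merged.setdefault((lhs, rhs, dot), set()).add(lookahead)
    return merged


ITEMS = closure1({("S", ("E", "$"), 0, LAMBDA)})
MERGED = merge_lookaheads(ITEMS)
```

`ITEMS` corresponds to the unmerged listing and `MERGED` to the merged one, with each configuration mapped to its lookahead set.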

Example: LR(1) for G3
S → E $
E → E + T | T
T → T * P | P
P → ID | (E)

•  Exercise:
–  create states and the goto table
–  create the action table
–  explain how you see that this is LR(1) and not LR(0)
–  Hint: look at state 7

Problems with LR(1) Parsers
LR(1) parsers are very powerful. However,
•  The table size can grow by a factor of | Vt |
•  Storage-efficient representations are an important issue.
Example: an LR(1) parser for Algol 60 (a simple language) includes several thousand states.

Solutions to the LR(1) Size Problem
Several parser schemes similar to LR(1) have been proposed
•  LALR: merge certain states. There are several LR optimization techniques (will not be discussed further).
•  SLR (simple LR): build a CFSM for LR(0) then add lookahead. Lookahead symbols are taken from the Follow set of the non-terminal on the LHS of a production.

Exercise
•  Determine if G3 is an SLR Grammar:
Hint: the states 7 and 11 have shift-reduce conflicts. Can they be resolved by looking at the Follow set?
Follow(E) = {$,+,)}
Shift is on “*”
So the shift and reduce actions can be distinguished.

We have covered ...
•  Scanners, scanner generators
•  Parsers:
–  Parser terminology
–  LL(1) parsing and parser generation: building stack-based parsers, including action symbols.
–  Overview of LR parsers: shift-reduce parsers. CFSM. Basics of LR(1).

Semantic Processing

Some “Philosophy” About the Structure of Compilers at First.

Properties of 1-Pass Compilers
•  efficient
•  coordination and communication of passes not an issue
•  single traversal of source program restricts semantics checks and actions.
•  no (or little) code optimization (peephole optimization can be added as a separate pass)
•  difficult to retarget, architecture-dependent. Architecture-dependent and independent decisions are mixed.

1-Pass Analysis + 1-Code Generation Pass
[Figure: Analysis, followed by Code Generation]
•  More machine independent
•  Can add optimization pass
•  There is an intermediate representation (IR) that represents the analyzed program. It is input to the code generator.
•  Each pass can now be exchanged independently of the others
•  Note: The use of “analysis” in this context is different than in optimization papers.

Multi-Pass Analysis
•  Scanner can be a separate pass, writing a stream (file) of tokens.
•  Parser can be a separate pass writing a stream of semantic actions.
•  Analysis is very important in all optimizing compilers and in programming tools
•  Advantages of Multi-Pass Analysis:
–  can handle languages w/o variable declarations (need multi-pass analysis for static semantics checking)
–  no “forward declarations” necessary
–  For memory bound applications, can make very small compilers this way.

Multi-Pass Synthesis
We view a compiler as performing two major tasks.
Analysis: understanding syntax and semantics of the source program.
Synthesis: generating the output (usually the target code)
•  Simple multi-pass synthesis: code-generation + peephole optimization
•  Several optimization passes can be added
•  Split into machine independent and dependent code generation phases is desirable
•  Importance of early multi-pass compilers : space savings.

Families of Compilers
•  Compilers that can understand multiple languages.
[Figure: C, C++, Java, Fortran feeding into one compiler]
–  Syntax analysis has to be different.
–  Some program analysis passes are generic.
–  The choice of IR influences the range of analyzable languages.
•  Compilers that generate code for multiple architectures.
[Figure: one compiler generating code for X86, Sparc, Mips]
–  Analysis and architecture-independent code generation can be the same for all machines.
–  Example: GNU C compiler. GCC uses two IRs: a tree-oriented IR and RTL.

Deleted slides

Families of Compilers
[Figure: C, C++, Java, Fortran feeding into a compiler, which feeds a compiler generating code for X86, Sparc, Mips]
•  Within a given part of the compiler (e.g. the frontend (analysis) part of the compiler, or the backend (synthesis) part), the IR may be transformed into a form more appropriate to the job to be done
–  IBM w-code uses a stack based language for its IR
•  Each pass translates from w-code into a pass-specific IR
•  Passes use (fairly complicated) trees of expressions, lists of statements, etc.
•  Passes may be written in different languages
•  Gone, but not forgotten

Let’s first derive a sentence in the language
(1) PROGRAM → begin STMTLIST $
(2) STMTLIST → STMT ; STMTLIST
(3) STMTLIST → end
(4) STMT → id
(5) STMT → if ( id ) STMTLIST

Always start with the goal production, i.e. the production that has the start symbol on its left-hand side.

PROGRAM ⇒ begin STMTLIST $
⇒ begin STMT ; STMTLIST $
⇒ begin if ( id ) STMTLIST ; STMTLIST $
⇒ begin if ( id ) STMT ; STMTLIST ; STMTLIST $
⇒ begin if ( id ) if ( id ) STMTLIST ; STMTLIST ; STMTLIST $
⇒ begin if ( id ) if ( id ) STMT ; STMTLIST ; STMTLIST ; STMTLIST $
⇒ begin if ( id ) if ( id ) id ; STMTLIST ; STMTLIST ; STMTLIST $
⇒ begin if ( id ) if ( id ) id ; end ; STMTLIST ; STMTLIST $
⇒ begin if ( id ) if ( id ) id ; end ; end ; STMTLIST $
⇒ begin if ( id ) if ( id ) id ; end ; end ; end $

begin if ( id ) if ( id ) id ; end ; end ; end $
•  How does a parser recognize this string?
•  Two approaches
–  Look at the next n input symbols, and predict what production to apply
•  These are LL(n) parsers
•  Recursive descent parsers are a form of LL parsing
•  Often easier to write, less general
–  Look at a string of symbols and decide (using an FSA) when they match a production
•  These are LR parsers
•  Get to look at all terminals derived from a production to recognize the production
•  Most general deterministic parsers for context free grammars

Parsing with a LL parser
begin if ( id ) if ( id ) id ; end ; end ; end $
•  Parser knows the goal production is
(1) PROGRAM → begin STMTLIST $
•  Can trivially determine the first terminal in the production is begin
•  Thus, given all (one) productions with PROGRAM as the left-hand side, if the parser sees begin on the input, it can predict that production 1 will be used.
•  Parser reads begin, predicts production 1, and then begins parsing using production 1
•  First parse action - match begin on input with begin in the production.

Parsing with a LL parser
•  While parsing the goal production, sees STMTLIST
(1) PROGRAM → begin STMTLIST $
•  There are two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST
(3) STMTLIST → end
•  Which STMTLIST production applies next? We want to predict based on the next input symbol.
–  What is the first terminal that can be seen in production (3) STMTLIST → end
–  What is the first terminal that can be seen in production (2) STMTLIST → STMT ; STMTLIST
–  This is a harder problem, and requires computing a FIRST set

Parsing with a LL parser
begin if ( id ) if ( id ) id ; end ; end ; end $
•  While parsing the goal production, sees STMTLIST
(1) PROGRAM → begin STMTLIST $
•  There are two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST
(3) STMTLIST → end
•  Which STMTLIST production applies next? We want to predict based on the next input symbol.
–  What is the first terminal that can be seen in production (3) STMTLIST → end? It is end.
–  What is the first terminal that can be seen in production (2) STMTLIST → STMT ; STMTLIST?
–  This is a harder problem, and requires computing a FIRST set

FIRST(STMT ; STMTLIST)
•  What is the first terminal that can be seen in production STMTLIST → STMT ; STMTLIST
•  The first symbol in this production is STMT, so the first terminal will be the first terminal in STMT
(4) STMT → id
(5) STMT → if ( id ) STMTLIST
•  First terminal in production 4 will be id
•  First terminal in production 5 will be if
•  Thus FIRST(STMT ; STMTLIST) is {id, if}

Parsing with a LL parser
•  We are parsing the goal production, and see STMTLIST
(1) PROGRAM → begin STMTLIST $
•  There are two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
(3) STMTLIST → end                  FIRST(end) = {end}
•  Which STMTLIST production applies next? We want to predict based on the next input symbol.
–  Look at the first sets of the productions that can be applied
–  See which one has a terminal that matches the input
–  Apply that production
–  Production 2 in this case

Parsing with a LL parser
•  Now parsing STMTLIST
(1) PROGRAM → begin STMTLIST $
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
•  There are two STMT productions:
(4) STMT → id
(5) STMT → if ( id ) STMTLIST
•  Which STMT production applies next? We want to predict based on the next input symbol.
–  Look at the first sets of the productions that can be applied
–  What is FIRST(id)? What is FIRST(if ( id ) STMTLIST)?
   FIRST(id) = {id}
   FIRST(if ( id ) STMTLIST) = {if}
–  Given the next symbol of if, we apply production 5, STMT → if ( id ) STMTLIST

Parsing with a LL parser
(1) PROGRAM → begin STMTLIST $
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
(5) STMT → if ( id ) STMTLIST       FIRST(if ( id ) STMTLIST) = {if}
•  Match the if in the input with the if in the production
•  Match the ( in the input with the ( in the production
•  Match the id in the input with id in the production
•  Match the ) in the input with ) in the production

Parsing with a LL parser
begin if ( id ) if ( id ) id ; end ; end ; end $
Productions predicted so far: (1) (2) (5)
•  We again have to pick between two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
(3) STMTLIST → end                  FIRST(end) = {end}
•  With the next input of if, the parser predicts production 2

Parsing with a LL parser
Productions predicted so far: (1) (2) (5) (2)
•  We again have to pick between two STMT productions:
(4) STMT → id                       FIRST(id) = {id}
(5) STMT → if ( id ) STMTLIST       FIRST(if ( id ) STMTLIST) = {if}
•  And again, based on the first symbol, pick 5. Matching the terminals on the input with the terminals at the front of the production, we match if ( id )

Parsing with a LL parser
Productions predicted so far: (1) (2) (5) (2) (5)
•  We again have to pick between two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
(3) STMTLIST → end                  FIRST(end) = {end}
•  Since the next symbol is id, it is again production 2

Parsing with a LL parser
Productions predicted so far: (1) (2) (5) (2) (5) (2)
•  We again have to pick between two STMT productions:
(4) STMT → id                       FIRST(id) = {id}
(5) STMT → if ( id ) STMTLIST       FIRST(if ( id ) STMTLIST) = {if}
•  The next symbol is id, and so production 4 is predicted
•  id in the production is matched against id in the input
•  The production STMT → id is recognized.

Parsing with a LL parser
Productions predicted so far: (1) (2) (5) (2) (5) (2)
•  The parser now matches against ;

Parsing with a LL parser
Productions predicted so far: (1) (2) (5) (2) (5) (2)
•  We again have to pick between two STMTLIST productions:
(2) STMTLIST → STMT ; STMTLIST      FIRST(STMT ; STMTLIST) = {id, if}
(3) STMTLIST → end                  FIRST(end) = {end}
•  Since the next symbol is end, it is production 3

Parsing with a LL parser
begin if ( id ) if ( id ) id ; end ; end ; end $
Productions predicted so far: (1) (2) (5) (2) (5) (2) (3)
•  The parser matches against the end, and recognizes production 3
•  With STMTLIST recognized (production 3), the STMTLIST of production 2 is also recognized.
•  Production 2 recognizes another STMTLIST, which is the last symbol to be recognized in production 5

Parsing with a LL parser
Productions still being parsed: (1) (2) (5) (2)
•  The parser matches the ; predicts production 3 (STMTLIST → end) for STMTLIST, matches against the end, and then recognizes the STMTLIST production (2)
•  The STMTLIST recognized by production 2 is the last symbol in production 5, thus the STMT of production 5 is recognized.

Test Sept. 21 - evening 7:30-9pm EE 270
•  Let me know if you have a conflict -- I will schedule makeups.
–  Let me know by this Friday midnight if at all possible
•  Open book, open notes
•  Errors are non-cumulative, i.e. if I say use the result of question X to answer Y, your answer to Y needs to be correct given the answer to X.

Parsing with a LL parser
Productions still being parsed: (1) (2)
•  The ; is matched
•  The end causes production 3 to be predicted, and recognized, for STMTLIST.
–  Production 3 is parsed, causing STMTLIST to be recognized
–  Production 2 is now recognized
•  The STMTLIST of 2, and the $ on input allows PROGRAM to be recognized.
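The prediction steps traced above correspond directly to a recursive-descent parser with one procedure per non-terminal. The sketch below is an illustration only: the class name, method names and the `applied` list that records predicted production numbers are assumptions, and each `if` test simply checks the lookahead against the FIRST set of a production.

```python
class Parser:
    """Recursive-descent parser for the begin/end grammar (productions 1 to 5)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0
        self.applied = []          # production numbers in the order predicted

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.position += 1

    def program(self):
        self.applied.append(1)     # (1) PROGRAM → begin STMTLIST $
        self.match("begin")
        self.stmtlist()
        self.match("$")

    def stmtlist(self):
        if self.peek() in ("id", "if"):    # FIRST(STMT ; STMTLIST)
            self.applied.append(2)
            self.stmt()
            self.match(";")
            self.stmtlist()
        elif self.peek() == "end":         # FIRST(end)
            self.applied.append(3)
            self.match("end")
        else:
            raise SyntaxError(f"cannot start STMTLIST with {self.peek()!r}")

    def stmt(self):
        if self.peek() == "id":            # FIRST(id)
            self.applied.append(4)
            self.match("id")
        elif self.peek() == "if":          # FIRST(if ( id ) STMTLIST)
            self.applied.append(5)
            for terminal in ("if", "(", "id", ")"):
                self.match(terminal)
            self.stmtlist()
        else:
            raise SyntaxError(f"cannot start STMT with {self.peek()!r}")


parser = Parser("begin if ( id ) if ( id ) id ; end ; end ; end $".split())
parser.program()
print(parser.applied)
```

The recorded production numbers follow the same order as the predictions in the slide-by-slide trace.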

What will be covered
•  Chapter 1, 2 -- introduction
–  Know the difference between syntax and semantics
–  Know what a 1 pass compiler does
–  Know the parts of a compiler
–  Know why we need a symbol table
•  Chapter 3, scanning
–  Regular expressions, RE to NDFA, NDFA to DFA, DFA to minimal DFA
–  How to handle reserved words
–  What cannot be recognized by a scanner (for example (^i )^j, i > j)
•  Will not cover 3.4, most of 3.5

Grammars and parsers
•  Grammars (chapter 3)
–  What is the difference between a grammar and the language recognized by a grammar
–  What is a non-terminal and terminal
–  What is a production
–  Be able to tell if a string is produced by a grammar, and to create a parse tree for a string
–  Be able to compute first and follow sets

LL(1)
•  Given a first and follow set, be able to compute a predict set (first and follow may have been computed by you.)
•  Given a predict set (perhaps computed by you) create an LL(1) parse table
•  Show how a stack based parser will work

LR(0)
•  Be able to build an LR(0) CFSM (characteristic finite state machine)
•  Given the CFSM, give the action and goto table for an LR(0) grammar
•  Given a CFSM or action and goto tables, be able to show the steps in parsing a string with a LR(0) grammar
•  Given a CFSM, be able to tell if a grammar is not LR(0)
•  Note that LR(1), LALR(k), SLR(k), etc. will not be covered.