- Syntax/semantics
- Program, program execution
- Compiler/interpreter
- Syntax
  - Grammars
  - Regular expressions
  - Syntax diagrams
  - Automata
  - Scanning/Parsing
- Meta models

IF 3110/4110 - 2007, 8/31/2008

Program, program execution

[Figure: a program with its syntax and its semantics]

Syntax, semantics
- A description of a programming language consists of two main components:
  - Syntactic rules: what form a legal program has.
  - Semantic rules: what the sentences in the language mean.
- Static semantics: rules that may be checked before the program is executed, e.g.:
  - All variables must be declared.
  - Declaration and use of variables agree (type check).
- Dynamic semantics: rules saying what shall happen during (as part of) the execution of the program, e.g. in terms of an operational semantics, that is, a semantics that describes the behaviour of an (idealised) abstract processor/machine executing the program, or by mapping the program to something else that is well known and well defined.
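As a small illustration of a static-semantic rule, the sketch below checks "all variables must be declared" on the program text alone, without executing anything. The statement forms `var x` and `x := y`, the class name `DeclarationChecker` and the assumption that statements are well formed are all hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeclarationChecker {
    /**
     * Hypothetical toy language: a statement is either "var x" (a declaration)
     * or "x := y" (an assignment). Returns the names used without an earlier
     * declaration, in order of first use.
     */
    static List<String> undeclared(List<String> program) {
        Set<String> declared = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (String statement : program) {
            String[] parts = statement.trim().split("\\s+");
            if (parts[0].equals("var")) {
                declared.add(parts[1]);
                continue;
            }
            // parts = [target, ":=", source]; numeric literals need no declaration
            for (String name : new String[] {parts[0], parts[2]}) {
                boolean isVariable = Character.isLetter(name.charAt(0));
                if (isVariable && !declared.contains(name) && !errors.contains(name)) {
                    errors.add(name);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Only the program text is inspected; nothing is run.
        System.out.println(undeclared(List.of("var x", "x := 1", "y := x"))); // [y]
    }
}
```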

Compiler/interpreter
- A compiler translates a program to another language, typically a machine language or a language for a virtual machine.
- An interpreter reads a program and simulates its operations.
- Both are based upon a syntax tree representation of the program.

[Figure: Program (source) → Scanner → tokens → Parser → Parse/(Abstract) Syntax Tree → Static Semantic Checker → decorated Abstract Syntax Tree, which is either executed by an Interpreter or translated by a Code Generator into Machine/Byte Code for a Machine or a Virtual Machine]
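The difference between the two can be sketched on a syntax tree for the expression grammar used in the following slides (e ::= n | e + e | e - e). This is a minimal sketch, assuming a hypothetical stack machine with the instructions `PUSH n`, `ADD` and `SUB`; all class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CompileVsInterpret {
    // Syntax tree for e ::= n | e + e | e - e
    abstract static class Expr {}
    static final class Num extends Expr {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class BinOp extends Expr {
        final char op;                 // '+' or '-'
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    /** The tree for (10 - 15) + 12. */
    static Expr example() {
        return new BinOp('+', new BinOp('-', new Num(10), new Num(15)), new Num(12));
    }

    // Interpreter: walks the tree and carries out each operation directly.
    static int interpret(Expr e) {
        if (e instanceof Num) return ((Num) e).value;
        BinOp b = (BinOp) e;
        int l = interpret(b.left), r = interpret(b.right);
        return b.op == '+' ? l + r : l - r;
    }

    // Compiler: translates the tree (postorder) into code for the stack machine.
    static List<String> compile(Expr e) {
        List<String> code = new ArrayList<>();
        emit(e, code);
        return code;
    }
    private static void emit(Expr e, List<String> code) {
        if (e instanceof Num) { code.add("PUSH " + ((Num) e).value); return; }
        BinOp b = (BinOp) e;
        emit(b.left, code);
        emit(b.right, code);
        code.add(b.op == '+' ? "ADD" : "SUB");
    }

    // Virtual machine: executes the generated code with an operand stack.
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : code) {
            if (instruction.startsWith("PUSH ")) {
                stack.push(Integer.parseInt(instruction.substring(5)));
            } else {
                int r = stack.pop(), l = stack.pop();
                stack.push(instruction.equals("ADD") ? l + r : l - r);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> code = compile(example());
        System.out.println(code);                  // [PUSH 10, PUSH 15, SUB, PUSH 12, ADD]
        System.out.println(interpret(example()));  // 7
        System.out.println(run(code));             // 7
    }
}
```

Both routes give the same value; the difference is that the compiled code no longer needs the tree and can be run again by the virtual machine without repeating the translation.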

Syntax described by BNF grammars

e ::= n
e ::= e + e
e ::= e - e
n ::= d
n ::= n d
d ::= 0
d ::= 1
...
d ::= 9

Each line is a production (rule). e, n and d are nonterminals; +, - and the digits 0 to 9 are terminals; ::= is a metasymbol.

Extended BNF
- In Extended BNF we can use the following metasymbols on the right-hand side:
  |      alternatives
  ?      optionality
  *      zero or more times
  +      one or more times
  {...}  grouping symbols

e ::= n | e + e | e - e
n ::= d | n d
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
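Because the sentences of e are just numbers separated by + or -, the same language can be written with EBNF repetition as e ::= n { { + | - } n }*, and each EBNF metasymbol has a direct counterpart in a regular expression. A sketch using `java.util.regex`, assuming sentences are written without blanks:

```java
import java.util.regex.Pattern;

class EbnfAsRegex {
    // EBNF metasymbol -> regex:  |  ->  |,  ?  ->  ?,  *  ->  *,  +  ->  +,  {...}  ->  (...)
    // d ::= 0 | 1 | ... | 9        [0-9]
    // n ::= d { d }*   (= d+)      [0-9]+
    // e ::= n { { + | - } n }*     [0-9]+([+-][0-9]+)*
    static final Pattern NUMBER = Pattern.compile("[0-9]+");
    static final Pattern EXPRESSION = Pattern.compile("[0-9]+([+-][0-9]+)*");

    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }
    static boolean isExpression(String s) { return EXPRESSION.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isExpression("10-15+12")); // true
        System.out.println(isExpression("10-+12"));   // false
    }
}
```

A regular expression only answers yes or no; unlike the grammar, it says nothing about which tree a sentence has.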

Derivation of sentences
- The possible sentences in a language defined by a BNF grammar are those that emerge by following this procedure:
  1. Start with the start symbol (e).
  2. Replace a nonterminal symbol (e, n or d) with one of the alternatives on the right-hand side of the production defining this nonterminal.
  3. Repeat step 2 until only terminal symbols remain.
- This is called a derivation from the start symbol to a sentence, and it may be represented by a parse tree / syntax tree.
- Removing unnecessary derivation steps and nodes like + and - gives an abstract syntax tree.

[Figure: parse tree deriving the sentence 10 - 15 + 12 from e]
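The procedure can be sketched directly: the grammar is stored as a map from each nonterminal to its alternatives, and the leftmost nonterminal is replaced until only terminals remain (a leftmost derivation; for a grammar of this kind the order in which nonterminals are replaced does not change the set of sentences). The random choice, the `maxSteps` cut-off and the class name are assumptions added so the sketch always terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class Derivation {
    // e ::= n | e + e | e - e,  n ::= d | n d,  d ::= 0 | 1 | ... | 9
    static final Map<String, List<List<String>>> GRAMMAR = new HashMap<>();
    static {
        GRAMMAR.put("e", List.of(List.of("n"), List.of("e", "+", "e"), List.of("e", "-", "e")));
        GRAMMAR.put("n", List.of(List.of("d"), List.of("n", "d")));
        List<List<String>> digits = new ArrayList<>();
        for (int i = 0; i <= 9; i++) digits.add(List.of(String.valueOf(i)));
        GRAMMAR.put("d", digits);
    }

    /**
     * Derives a sentence from e, printing every sentential form. Alternatives are
     * chosen at random for the first maxSteps steps; after that the first
     * (non-recursive) alternative is always taken, so the derivation terminates.
     */
    static String derive(long seed, int maxSteps) {
        Random random = new Random(seed);
        List<String> form = new ArrayList<>(List.of("e"));          // 1. start symbol
        System.out.println(String.join(" ", form));
        for (int step = 0; ; step++) {
            int i = 0;                                              // leftmost nonterminal
            while (i < form.size() && !GRAMMAR.containsKey(form.get(i))) i++;
            if (i == form.size()) return String.join("", form);   // 3. only terminals left
            List<List<String>> alternatives = GRAMMAR.get(form.get(i));
            List<String> chosen = step < maxSteps
                    ? alternatives.get(random.nextInt(alternatives.size()))
                    : alternatives.get(0);
            form.remove(i);                                         // 2. replace it
            form.addAll(i, chosen);
            System.out.println(String.join(" ", form));
        }
    }

    public static void main(String[] args) {
        System.out.println(derive(2008, 12));
    }
}
```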

10 - 15 + 12?

[Figure: two different parse trees for 10 - 15 + 12]
- 10 - (15 + 12) = 10 - 27 = -17
- (10 - 15) + 12 = -5 + 12 = 7

Unambiguous/ambiguous grammars
- If every sentence in the language can be derived by one and only one parse tree, then the grammar is unambiguous (Norwegian: entydig); otherwise it is ambiguous (flertydig).

  e ::= 0 | 1 | e + e | e - e | e * e

- Ambiguity is handled by precedence and associativity rules.

[Figure: two different parse trees for the same sentence of this grammar]
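One common way to apply such rules while parsing is precedence climbing. The sketch below assumes that * binds tighter than + and -, and that all three are left-associative, so 1-1+1 is read as (1-1)+1; the class name and the fully parenthesised output are illustrative choices.

```java
class PrecedenceParser {
    private final String input;   // sentence without blanks, e.g. "1-1+1"
    private int pos = 0;

    private PrecedenceParser(String input) { this.input = input; }

    // Assumed rules: * binds tighter than + and -, and all three are left-associative.
    private static int precedence(char c) {
        if (c == '*') return 2;
        if (c == '+' || c == '-') return 1;
        return 0;                 // not an operator
    }

    /** Parses e ::= 0 | 1 | e + e | e - e | e * e and returns the chosen tree, fully parenthesised. */
    static String parse(String sentence) {
        PrecedenceParser p = new PrecedenceParser(sentence);
        String tree = p.expression(1);
        if (p.pos != sentence.length()) {
            throw new IllegalArgumentException("unexpected '" + sentence.charAt(p.pos) + "' at " + p.pos);
        }
        return tree;
    }

    // Precedence climbing: this level handles only operators with precedence >= minPrec.
    private String expression(int minPrec) {
        String left = operand();
        while (pos < input.length() && precedence(input.charAt(pos)) >= minPrec) {
            char op = input.charAt(pos++);
            // The right operand may only contain tighter operators, so equal ones group to the left.
            String right = expression(precedence(op) + 1);
            left = "(" + left + op + right + ")";
        }
        return left;
    }

    private String operand() {
        if (pos < input.length() && (input.charAt(pos) == '0' || input.charAt(pos) == '1')) {
            return String.valueOf(input.charAt(pos++));
        }
        throw new IllegalArgumentException("expected 0 or 1 at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("1-1+1"));    // ((1-1)+1)
        System.out.println(parse("1+1*0-1"));  // ((1+(1*0))-1)
    }
}
```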

s ::= v := e | s ; s | if b then s | if b then s else s
v ::= x | y | z
e ::= v | 0 | 1 | 2 | 3 | 4
b ::= e = e

x:=1; y:=2; if x=y then y:=3

[Figure: a parse tree for this sentence]

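if b1 then if b2 then s1 else s2: is s2 executed?

[Figure: two parse trees for this statement]
- If the else belongs to the inner if: with b1 = true and b2 = false, s2 is executed (yes); with b1 = false and b2 = true, it is not (no).
- If the else belongs to the outer if: with b1 = true and b2 = false, s2 is not executed (no); with b1 = false and b2 = true, it is (yes).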
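Java is one language that settles this ambiguity by a fixed rule: the Java Language Specification lets an else belong to the innermost if it could belong to, so here the else pairs with `if (b2)`. A small sketch that records which statement runs (the method name `run` and the strings are illustrative):

```java
class DanglingElse {
    /** Runs "if b1 then if b2 then s1 else s2" in Java and reports what was executed. */
    static String run(boolean b1, boolean b2) {
        String executed = "nothing";
        if (b1)
            if (b2)
                executed = "s1";
            else                  // Java attaches this else to the inner if
                executed = "s2";
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run(true, false));   // s2
        System.out.println(run(false, true));   // nothing
    }
}
```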

[Figure: another parse tree for x:=1; y:=2; if x=y then y:=3]

Alternatives to grammars
- Syntax diagrams
- Automata
  - Deterministic
  - Non-deterministic
- Meta models

Syntax diagram

exp ::= exp + term | term
term ::= term * num | num

[Figure: syntax diagrams for exp and term]

Deterministic automata

{ - { 0 | 1 } | 0 | 1 } { 0 | 1 }*

[Figure: deterministic automaton for this language, accepting -010]
- Transitions are marked with terminals; there is one start state and a number of stop states.
- The automaton recognizes a string as being in the language if the terminals represent a valid sequence of transitions that ends in a stop state upon reading the last symbol.

- If the string is not in the language, there is no valid sequence of transitions for it, e.g. -0-1.

[Figure: the same automaton rejecting -0-1]
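The automaton can be coded as a transition function over a few states. A minimal sketch, assuming our own state names (`START`, `AFTER_MINUS`, `NUMBER`, plus a `DEAD` state standing for "no valid transition"):

```java
class SignedBinaryDfa {
    // States (our own numbering) of a DFA for { - { 0 | 1 } | 0 | 1 } { 0 | 1 }*.
    static final int START = 0, AFTER_MINUS = 1, NUMBER = 2, DEAD = 3;

    static int step(int state, char c) {
        boolean bit = c == '0' || c == '1';
        switch (state) {
            case START:       return c == '-' ? AFTER_MINUS : (bit ? NUMBER : DEAD);
            case AFTER_MINUS: return bit ? NUMBER : DEAD;
            case NUMBER:      return bit ? NUMBER : DEAD;
            default:          return DEAD;
        }
    }

    /** Accepts iff every symbol has a transition and the last one ends in NUMBER, the only stop state. */
    static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            state = step(state, c);
            if (state == DEAD) return false;   // not a valid sequence of transitions
        }
        return state == NUMBER;
    }

    public static void main(String[] args) {
        System.out.println(accepts("-010"));   // true
        System.out.println(accepts("-0-1"));   // false
    }
}
```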

identifier ::= letter { letter | digit }*
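A two-state automaton recognizes identifiers: a letter leads from the start state to the stop state, which then loops on letters and digits. A sketch, assuming that "letter" and "digit" mean ASCII letters and digits:

```java
class IdentifierAutomaton {
    // Assumption: letters are a-z and A-Z, digits are 0-9.
    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    /** identifier ::= letter { letter | digit }*, run as a two-state automaton. */
    static boolean isIdentifier(String s) {
        boolean inStopState = false;                 // start state: nothing read yet
        for (char c : s.toCharArray()) {
            boolean ok = inStopState ? (isLetter(c) || isDigit(c)) : isLetter(c);
            if (!ok) return false;                   // no transition for c: reject
            inStopState = true;                      // first letter enters, later symbols loop in, the stop state
        }
        return inStopState;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("OutText")); // true
        System.out.println(isIdentifier("1x"));      // false
    }
}
```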

Yet another example
- Allowed words are 0, 1, 101, 0.10, 100.1010 and 10.1:

  { 0 | 1 { 0 | 1 }* } { . { 0 | 1 }+ }?

- However, leading zeros are not allowed, nor a "decimal point" without digits both before and after it, so the following are not allowed: 001, 10., .01

  ibp ::= 1 ibp | 0 ibp | bp

[Figure: automata, including ε-transitions, for this language]
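The allowed and disallowed words can be checked against the same expression written in `java.util.regex` syntax, where `{ }` becomes `( )` and the decimal point must be escaped. A sketch (the class name is illustrative):

```java
import java.util.regex.Pattern;

class BinaryFraction {
    // { 0 | 1 { 0 | 1 }* } { . { 0 | 1 }+ }?   in java.util.regex notation
    static final Pattern WORD = Pattern.compile("(0|1[01]*)(\\.[01]+)?");

    static boolean allowed(String word) { return WORD.matcher(word).matches(); }

    public static void main(String[] args) {
        for (String w : new String[] {"0", "1", "101", "0.10", "100.1010", "10.1", "001", "10.", ".01"}) {
            System.out.println(w + " " + allowed(w));
        }
    }
}
```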

Types of languages
- Regular languages (type 3)
  - A BNF grammar with one nonterminal on the left-hand side of every production and only terminal symbols on the right-hand side, possibly with a nonterminal as the last symbol.
  - Analysable with automata (used in scanners).
- Context-free languages (type 2)
  - A BNF grammar with one nonterminal on the left-hand side of every production.
  - Almost all programming languages are defined this way.
  - Analysable with parsers.
- Type 1 languages ("context-sensitive")
  - Require that the right-hand side is at least as long as the left-hand side. This makes it possible to define name binding and type information. Seldom used.
- Type 0 languages: no restrictions.
  - Only of theoretical interest.

Scanning
Program (source) → Scanner → tokens → Parser

- A scanner groups characters into symbols called tokens.
- A scanner is normally constructed as a deterministic automaton.

Example:
  begin   OutText   (      "Hallo"   )      end
  BEGIN   IDENT     LPAR   TEXT      RPAR   END
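A hand-written scanner for this example might look as follows. It is a sketch: the token names come from the example, but the rules for identifiers (a letter followed by letters or digits) and for text literals (everything up to the next double quote) are assumptions. Each branch follows one path through a deterministic automaton, written as code rather than as a transition table.

```java
import java.util.ArrayList;
import java.util.List;

class TinyScanner {
    /** Groups the characters of src into tokens named as in the example. */
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                          // blanks separate tokens
            } else if (Character.isLetter(c)) {               // keyword or identifier
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(word.equals("begin") ? "BEGIN" : word.equals("end") ? "END" : "IDENT");
            } else if (c == '"') {                            // text up to the closing quote
                int close = src.indexOf('"', i + 1);
                if (close < 0) throw new IllegalArgumentException("unterminated text at " + i);
                tokens.add("TEXT");
                i = close + 1;
            } else if (c == '(') {
                tokens.add("LPAR");
                i++;
            } else if (c == ')') {
                tokens.add("RPAR");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("begin OutText(\"Hallo\") end"));
        // [BEGIN, IDENT, LPAR, TEXT, RPAR, END]
    }
}
```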

Parsing
- Parsing means checking that a sentence (or a program) is syntactically correct, that is, constructing the corresponding syntax tree.
- In general we would like to construct the tree by reading the sentence once, from left to right.
- Example grammar:

  exp ::= exp + term | term
  term ::= term * num | num

- This example shows that parsing may not be done by means of deterministic automata.

Top-down parsing
The parse tree is constructed downwards, that is, we start with the start symbol and try to derive the actual sentence by selecting appropriate rules:

  exp ::= exp + term | term
  term ::= term * num | num

[Figure: parse tree for num * num + num, constructed from exp downwards]
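A top-down parser for this grammar can be sketched with one method per nonterminal. A left-recursive rule such as exp ::= exp + term would make its method call itself without consuming any input, so the sketch uses the usual EBNF rewriting exp ::= term { + term }* and term ::= num { * num }*, and builds the tree so that it still groups to the left. The input is assumed to be a list of already scanned tokens; the bracket notation for the tree is our own.

```java
import java.util.List;

class TopDownParser {
    private final List<String> tokens;   // e.g. [num, *, num, +, num]
    private int pos = 0;

    private TopDownParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalArgumentException("expected " + token + " but found " + peek());
        }
        pos++;
    }

    // exp ::= term { + term }*; wrapping the tree built so far keeps left grouping.
    private String exp() {
        String tree = "exp(" + term() + ")";
        while (peek().equals("+")) {
            expect("+");
            tree = "exp(" + tree + " + " + term() + ")";
        }
        return tree;
    }

    // term ::= num { * num }*
    private String term() {
        expect("num");
        String tree = "term(num)";
        while (peek().equals("*")) {
            expect("*");
            expect("num");
            tree = "term(" + tree + " * num)";
        }
        return tree;
    }

    /** Parses a token list and returns its parse tree in bracket notation. */
    static String parse(List<String> tokens) {
        TopDownParser parser = new TopDownParser(tokens);
        String tree = parser.exp();
        if (!parser.peek().equals("$")) {
            throw new IllegalArgumentException("unexpected " + parser.peek());
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("num", "*", "num", "+", "num")));
        // exp(exp(term(term(num) * num)) + term(num))
    }
}
```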

Bottom-up parsing

  exp ::= exp + term | term
  term ::= term * num | num

The tree is constructed upwards. The parser starts by finding a part of the sentence that corresponds to the right-hand side of a production, and reduces this part of the sentence to the corresponding nonterminal. The goal is to keep reducing until only the start symbol remains.

[Figure: parse tree for num * num + num, constructed from the leaves upwards]
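A bottom-up (shift-reduce) sketch for the same grammar: symbols are shifted onto a stack, and whenever the top of the stack is a right-hand side that should be reduced, it is replaced by its nonterminal. The shift/reduce decisions are hand-coded for this grammar using one symbol of lookahead; a generated LR parser would take them from a table instead. Names and the returned list of reductions are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ShiftReduceParser {
    /** Parses tokens for exp ::= exp + term | term, term ::= term * num | num; returns the reductions made. */
    static List<String> parse(List<String> input) {
        List<String> stack = new ArrayList<>();
        List<String> reductions = new ArrayList<>();
        int pos = 0;
        while (true) {
            String next = pos < input.size() ? input.get(pos) : "$";   // one symbol of lookahead
            String top = stack.isEmpty() ? "" : stack.get(stack.size() - 1);
            if (top.equals("num")) {
                if (endsWith(stack, "term", "*", "num")) reduce(stack, 3, "term", reductions, "term ::= term * num");
                else reduce(stack, 1, "term", reductions, "term ::= num");
            } else if (top.equals("term") && !next.equals("*")) {
                // a following * would mean this term is still the left part of term * num
                if (endsWith(stack, "exp", "+", "term")) reduce(stack, 3, "exp", reductions, "exp ::= exp + term");
                else reduce(stack, 1, "exp", reductions, "exp ::= term");
            } else if (stack.size() == 1 && top.equals("exp") && next.equals("$")) {
                return reductions;                                      // reduced to the start symbol
            } else if (!next.equals("$")) {
                stack.add(next);                                        // shift
                pos++;
            } else {
                throw new IllegalArgumentException("not a sentence; stack = " + stack);
            }
        }
    }

    private static boolean endsWith(List<String> stack, String... symbols) {
        int offset = stack.size() - symbols.length;
        if (offset < 0) return false;
        for (int i = 0; i < symbols.length; i++) {
            if (!stack.get(offset + i).equals(symbols[i])) return false;
        }
        return true;
    }

    private static void reduce(List<String> stack, int n, String lhs, List<String> log, String rule) {
        for (int i = 0; i < n; i++) stack.remove(stack.size() - 1);
        stack.add(lhs);
        log.add(rule);
    }

    public static void main(String[] args) {
        parse(List.of("num", "*", "num", "+", "num")).forEach(System.out::println);
    }
}
```

For num * num + num the reductions come out as term ::= num, term ::= term * num, exp ::= term, term ::= num, exp ::= exp + term, i.e. the tree is built from the leaves towards exp.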

LL(1) parsing
- LL(1) parsing is a top-down strategy with a leftmost derivation from the start symbol.
- Recursive descent:
  - To each nonterminal there is a method.
  - The method takes care of the rule for this nonterminal, and may call other methods.
  - For each terminal in the right-hand side: check that the next symbol (from the scanner) is this terminal.
  - For each nonterminal in the right-hand side: call the corresponding method.
  - When the method is called, the scanner shall have as its next symbol the first symbol of the corresponding production.
  - When the method is finished, the scanner shall have as its next symbol the first symbol after the part of the sentence derived by that production.

Example

program    → stmt+
stmt       → assignment | input | output
input      → ? variable
output     → ! variable
assignment → variable = variable operator operand
operator   → + | -
operand    → variable | digit
variable   → v
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

static void assignment() {
    // assignment → variable = variable operator operand
    variable();
    readSymbol('=');
    variable();
    operator();
    operand();
}

static void stmt() {
    // the next symbol decides which alternative to parse
    if (checkSymbol('v')) {
        assignment();
    } else if (checkSymbol('?')) {
        input();
    } else if (checkSymbol('!')) {
        output();
    }
}
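The two methods can be completed into a runnable recursive-descent parser for experimenting. This sketch assumes that a program is one or more statements, that operator → + | - and operand → variable | digit, and that the sentence is a string with one character per symbol and no blanks; `next`, `checkSymbol` and `readSymbol` are hypothetical stand-ins for the scanner.

```java
class StatementParser {
    private static String text;   // the sentence: one character per symbol, no blanks
    private static int pos;       // index of the next symbol

    // Hypothetical scanner interface: next symbol, lookahead test, checked read.
    private static char next() { return pos < text.length() ? text.charAt(pos) : '$'; }
    private static boolean checkSymbol(char c) { return next() == c; }
    private static void readSymbol(char c) {
        if (next() != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos + ", found '" + next() + "'");
        }
        pos++;
    }

    static void program() {                   // assumed: program → stmt+
        do {
            stmt();
        } while (pos < text.length());
    }

    static void stmt() {                      // stmt → assignment | input | output
        if (checkSymbol('v')) {
            assignment();
        } else if (checkSymbol('?')) {
            input();
        } else if (checkSymbol('!')) {
            output();
        } else {
            throw new IllegalArgumentException("no statement starts with '" + next() + "'");
        }
    }

    static void input()  { readSymbol('?'); variable(); }   // input → ? variable
    static void output() { readSymbol('!'); variable(); }   // output → ! variable

    static void assignment() {                // assignment → variable = variable operator operand
        variable();
        readSymbol('=');
        variable();
        operator();
        operand();
    }

    static void operator() {                  // assumed: operator → + | -
        if (checkSymbol('+')) readSymbol('+'); else readSymbol('-');
    }

    static void operand() {                   // assumed: operand → variable | digit
        if (checkSymbol('v')) variable(); else digit();
    }

    static void variable() { readSymbol('v'); }

    static void digit() {                     // digit → 0 | 1 | ... | 9
        char c = next();
        if (c < '0' || c > '9') throw new IllegalArgumentException("expected a digit at " + pos);
        pos++;
    }

    /** True if the whole sentence can be parsed as a program. */
    static boolean parses(String sentence) {
        text = sentence;
        pos = 0;
        try {
            program();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("?vv=v+1!v"));  // true
        System.out.println(parses("v=v"));        // false: operator and operand missing
    }
}
```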

Meta models
- An alternative to grammars and syntax trees
- An object model representing the program (not its execution)

[Figure: the grammar rule D → A | B | C drawn as a class D with subclasses A, B and C; multiplicities n..m, 1, 0..1, 0..* and *; example: statement specialised into assignment, if-then-else and while-do]
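In an object-oriented language the correspondence can be sketched with classes: an alternative such as statement → assignment | if-then-else | while-do becomes a superclass with subclasses, and multiplicities become single fields or lists. The class and field names below are hypothetical and not taken from any particular metamodel; expressions are kept as plain text only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

class MetaModel {
    // "statement → assignment | if-then-else | while-do" as a class hierarchy
    abstract static class Statement {}

    static final class Assignment extends Statement {
        final String variable;                              // multiplicity 1
        final String expression;                            // multiplicity 1
        Assignment(String variable, String expression) {
            this.variable = variable;
            this.expression = expression;
        }
    }

    static final class IfThenElse extends Statement {
        final String condition;                             // multiplicity 1
        final List<Statement> thenPart = new ArrayList<>(); // multiplicity 0..*
        final List<Statement> elsePart = new ArrayList<>(); // multiplicity 0..* (empty: no else)
        IfThenElse(String condition) { this.condition = condition; }
    }

    static final class WhileDo extends Statement {
        final String condition;                             // multiplicity 1
        final List<Statement> body = new ArrayList<>();     // multiplicity 0..*
        WhileDo(String condition) { this.condition = condition; }
    }

    /** x:=1; y:=2; if x=y then y:=3 as an object structure rather than a parse tree. */
    static List<Statement> example() {
        IfThenElse test = new IfThenElse("x=y");
        test.thenPart.add(new Assignment("y", "3"));
        return List.of(new Assignment("x", "1"), new Assignment("y", "2"), test);
    }

    /** Counts all statement objects, including nested ones. */
    static int count(List<Statement> statements) {
        int n = 0;
        for (Statement s : statements) {
            n++;
            if (s instanceof IfThenElse) {
                IfThenElse i = (IfThenElse) s;
                n += count(i.thenPart) + count(i.elsePart);
            } else if (s instanceof WhileDo) {
                n += count(((WhileDo) s).body);
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(example()));   // 4
    }
}
```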

Why meta models?
- Inspired by abstract syntax trees in terms of object structures, and by interchange formats between tools.
- Not all modelling/programming tools are parser-based (e.g. wizards).
- Growing interest in domain-specific languages, often with a mixture of text and graphics.
- Meta models often include binding and type information in addition to the pure abstract syntax tree.

Example Metamodel
