Syntax/semantics
Program and program execution
Compiler/interpreter
Syntax
Grammars
Regular expressions
Syntax diagrams
Automata
Scanning/Parsing
Meta models
8/31/2008
INF3110/4110 - 2007

Program and program execution
[Figure: a program and its execution, labelled Syntax and Semantics]

Syntax / Semantics
A description of a programming language consists of two main components:
• Syntactic rules: what form does a legal program have.
• Semantic rules: what the sentences in the language mean.
Static semantics: rules that may be checked before the execution of the program, e.g.:
• All variables must be declared.
• Declaration and use of variables coincide (type check).
Dynamic semantics: rules saying what shall happen during (as part of) the execution of the program, e.g. in terms of an operational semantics, that is, a semantics that describes the behaviour of an (idealised) abstract processor/machine performing a program, or by mapping to something else (but well-known and well-defined).
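
As a rough illustration of the distinction, the sketch below checks one static rule (all variables must be declared) without running anything, and then gives the same program a dynamic meaning by executing it. The tuple-based mini-language and the names `check_declared` and `run` are assumptions made for this example only.

```python
# A program is a list of statements in a hypothetical mini-language:
#   ("declare", name)           declares a variable
#   ("assign", name, operand)   name := operand; operand is an int or a variable name

def check_declared(program):
    """Static check: return the names that are used before being declared."""
    declared, errors = set(), []
    for stmt in program:
        if stmt[0] == "declare":
            declared.add(stmt[1])
        elif stmt[0] == "assign":
            _, target, operand = stmt
            for name in (target, operand):
                if isinstance(name, str) and name not in declared:
                    errors.append(name)
    return errors

def run(program):
    """Dynamic semantics: execute the statements and return the final store."""
    store = {}
    for stmt in program:
        if stmt[0] == "declare":
            store[stmt[1]] = 0
        elif stmt[0] == "assign":
            _, target, operand = stmt
            store[target] = store[operand] if isinstance(operand, str) else operand
    return store

ok = [("declare", "x"), ("declare", "y"), ("assign", "x", 3), ("assign", "y", "x")]
bad = [("declare", "x"), ("assign", "x", "z")]   # z is never declared
```

`check_declared(bad)` reports the undeclared name without executing the program, while `run(ok)` only has a result because the program is actually performed.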

Compiler/interpreter
A compiler translates a program to another language, typically a machine language or a language for a virtual machine.
An interpreter reads a program and simulates its operations.
Both are based upon a syntax tree representation of the program.
[Figure: Program (source) → Scanner → tokens → Parser → parse/(abstract) syntax tree → Static Semantic Checker → decorated abstract syntax tree → Code Generator → machine/byte code; the figure also shows Machine, Virtual Machine and Interpreter boxes]
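
The difference between interpreting and compiling can be sketched on a tiny expression tree. The tree encoding, the stack-machine instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) and all function names below are hypothetical and chosen only for this illustration.

```python
# Syntax tree: an int literal, or a tuple (op, left, right) with op in "+", "-", "*".

def interpret(tree):
    """Interpreter: walk the tree and compute the value directly."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a - b if op == "-" else a * b

def compile_tree(tree):
    """Compiler: translate the tree to code for a hypothetical stack machine."""
    if isinstance(tree, int):
        return [("PUSH", tree)]
    op, left, right = tree
    names = {"+": "ADD", "-": "SUB", "*": "MUL"}
    return compile_tree(left) + compile_tree(right) + [(names[op],)]

def run_machine(code):
    """Virtual machine: execute the stack code produced by compile_tree."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[instr[0]])
    return stack.pop()

tree = ("+", ("-", 10, 15), 12)   # (10 - 15) + 12
```

Both routes start from the same syntax tree: `interpret(tree)` produces a value at once, whereas `compile_tree(tree)` produces a program in another language that `run_machine` executes later.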

Syntax described by BNF-grammars
e ::= n
e ::= e+e
e ::= e-e
n ::= d
n ::= nd
d ::= 0
d ::= 1
...
d ::= 9
Each line is a production (rule). e, n and d are nonterminals; the digits, + and - are terminals; ::= is a metasymbol.

Extended BNF
In Extended BNF we can use the following metasymbols on the right-hand side:
|       alternatives
?       optionality
*       zero or more times
+       one or more times
{...}   grouping symbols

e ::= n | e + e | e - e
n ::= d | nd
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation of sentences
e ::= n | e + e | e - e
n ::= d | nd
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The possible sentences in a language defined by a BNF-grammar are those that emerge by following this procedure:
1. Start with the start symbol (e).
2. For each nonterminal symbol (e, n, d), exchange it with one of the alternatives on the right-hand side of the production defining this nonterminal.
3. Repeat step 2 until only terminal symbols remain.
This is called a derivation from the start symbol to a sentence, and it may be represented by a parse tree / syntax tree.
Removing unnecessary derivations and nodes like * and + gives an abstract syntax tree.
[Figure: parse tree deriving the sentence 10 - 15 + 12 from the start symbol e]
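
The derivation procedure can be sketched as a small sentence generator for the grammar e ::= n | e + e | e - e, n ::= d | nd, d ::= 0 | ... | 9. The names `GRAMMAR` and `derive`, the random choice of alternatives, and the `max_depth` cut-off (added only so that the sketch always terminates) are assumptions of this example, not part of the procedure itself.

```python
import random

# The grammar as a table; a symbol with no entry is a terminal.
GRAMMAR = {
    "e": [["n"], ["e", "+", "e"], ["e", "-", "e"]],
    "n": [["d"], ["n", "d"]],
    "d": [[str(i)] for i in range(10)],
}

def derive(symbol, rng, depth=0, max_depth=4):
    """Replace nonterminals by one of their alternatives until only terminals remain."""
    if symbol not in GRAMMAR:
        return [symbol]                      # terminal: nothing to replace
    alternatives = GRAMMAR[symbol]
    # Past max_depth, always take the first (non-recursive) alternative so we terminate.
    chosen = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    result = []
    for part in chosen:
        result.extend(derive(part, rng, depth + 1, max_depth))
    return result

sentence = "".join(derive("e", random.Random(1)))
```

Every string produced this way is a sentence of the language, for example a number or a chain of numbers joined by + and -.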

10 - 15 + 12 ?
The sentence has two parse trees:
• e - e at the root, where the right e derives 15 + 12: 10 - (15 + 12) = 10 - 27 = -17
• e + e at the root, where the left e derives 10 - 15: (10 - 15) + 12 = -5 + 12 = 7
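
The two readings can be checked with a short sketch. The nested-tuple encoding of the parse trees and the function names `evaluate` and `sentence` are assumptions made for this illustration.

```python
def evaluate(tree):
    """Evaluate a tree given as an int or a tuple (left, op, right)."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)

def sentence(tree):
    """Read the terminals of the tree from left to right."""
    if isinstance(tree, int):
        return str(tree)
    left, op, right = tree
    return f"{sentence(left)} {op} {sentence(right)}"

tree_a = (10, "-", (15, "+", 12))   # 10 - (15 + 12)
tree_b = ((10, "-", 15), "+", 12)   # (10 - 15) + 12
```

Both trees yield the same sentence when their leaves are read left to right, yet they evaluate to different values, which is why the meaning depends on which tree is chosen.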

Unambiguous/Ambiguous grammars
If every sentence in the language can be derived by one and only one parse tree, then the grammar is unambiguous (Norwegian: entydig), otherwise it is ambiguous (Norwegian: flertydig).
e ::= 0 | 1 | e + e | e - e | e * e
Ambiguity handled by precedence and associativity rules.
[Figure: two parse trees illustrating the ambiguity]

s ::= v := e | s ; s | if b then s | if b then s else s
v ::= x | y | z
e ::= v | 0 | 1 | 2 | 3 | 4
b ::= e = e

Transitions are marked with terminals; there is one start state and a number of stop states.
An automaton recognizes a string in the language if the terminals represent a valid sequence of transitions ending up in a stop state upon reading the last symbol.
A string is not in the language if, e.g., its terminals do not form a valid sequence of transitions.
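
A minimal sketch of such an automaton, written as a transition table. The language chosen here (binary numerals without leading zeros) and the state names are hypothetical and serve only to illustrate recognition.

```python
# Hypothetical automaton: accepts "0", or "1" followed by any number of 0s and 1s.
START = "start"
STOP_STATES = {"zero", "number"}
TRANSITIONS = {
    ("start", "0"): "zero",
    ("start", "1"): "number",
    ("number", "0"): "number",
    ("number", "1"): "number",
}

def recognizes(text):
    """Return True if the symbols of text lead from the start state to a stop state."""
    state = START
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            return False                     # no valid transition: not in the language
        state = TRANSITIONS[(state, symbol)]
    return state in STOP_STATES
```

A string such as `101` follows valid transitions and ends in a stop state, while `01` is rejected because no transition leaves the state reached after the leading 0.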

However, a leading 0, or a "decimal point" without preceding or following digits, is not allowed, so the following is not allowed:
ibp ::= 1 ibp | 0 ibp | bp
[Figure: automata with transitions labelled 0, 1, . and ε]

Types of languages
Regular languages (type 3)
• A BNF-grammar with one nonterminal on the left side of all productions and only terminal symbols on the right-hand side, possibly with a nonterminal as the last symbol.
• Analysable with automata (used in scanners).
Context-free languages (type 2)
• A BNF-grammar with one nonterminal on the left side of all productions.
• Almost all programming languages are defined this way.
• Analysable with parsers.
Type 1 languages ("context-sensitive") require that the right-hand side is at least of the same length as the left-hand side. This makes it possible to define name binding and type information. Seldom used.
Type 0 languages: no restrictions. Only of theoretical interest.

Scanning
Program (source) → Scanner → tokens → Parser
A scanner groups characters into symbols called tokens.
A scanner is normally constructed as a deterministic automaton.

Source:  begin  OutText  (     "Hallo"  )     end
Tokens:  BEGIN  IDENT    LPAR  TEXT     RPAR  END
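
A small scanner for this example might look as follows. It is only a sketch: Python's `re` module stands in for a hand-built deterministic automaton, and the regular expressions, the `SKIP` kind for whitespace and the names `TOKEN_SPEC`, `KEYWORDS` and `scan` are assumptions of this illustration.

```python
import re

# Token kinds from the example, each with an assumed regular expression.
TOKEN_SPEC = [
    ("TEXT",  r'"[^"]*"'),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LPAR",  r"\("),
    ("RPAR",  r"\)"),
    ("SKIP",  r"\s+"),
]
KEYWORDS = {"begin": "BEGIN", "end": "END"}
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def scan(source):
    """Group the characters of source into (token kind, text) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT":
            kind = KEYWORDS.get(text, "IDENT")   # keywords look like identifiers
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens

tokens = scan('begin OutText("Hallo") end')
```

The parser then only sees the token kinds and never has to deal with individual characters or whitespace.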

Parsing
To check that a sentence (or a program) is syntactically correct, that is, to construct the corresponding syntax tree.
In general we would like to construct the tree by reading the sentence once, from left to right.
Example grammar (nonterminals exp and term, terminal num):
exp ::= ... + ... | ...
term ::= ... * ... | ...
This example shows that parsing cannot be done by means of deterministic automata.

Top-down parsing
The parse tree is constructed downwards, that is, we start with the start symbol and try to derive the actual sentence by selecting appropriate rules.
[Figure: parse tree with exp and term nodes, built from the start symbol down to the sentence num * num + num]

Bottom-up parsing
The tree is constructed upwards. We start by finding a part of the sentence that corresponds to the right-hand side of a production and reduce this part of the sentence to the corresponding nonterminal. The goal is to reduce until only the start symbol is left.
[Figure: parse tree with exp and term nodes, built from the sentence num * num + num up to the start symbol]
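
The reduction idea can be sketched as a hand-written shift-reduce loop. The grammar used here, exp ::= exp + term | term and term ::= term * num | num, is an assumption, as are the one-symbol lookahead used to decide between shifting and reducing and the name `shift_reduce`; practical bottom-up parsers are normally table-driven.

```python
def shift_reduce(tokens):
    """Reduce tokens to the start symbol exp; return the reductions in order."""
    stack, reductions = [], []
    rest = list(tokens) + ["$"]              # "$" marks the end of the input

    def reduce(count, nonterminal, rule):
        stack[-count:] = [nonterminal]       # replace a right-hand side by its nonterminal
        reductions.append(rule)

    while True:
        look = rest[0]
        if stack[-3:] == ["term", "*", "num"]:
            reduce(3, "term", "term ::= term * num")
        elif stack[-1:] == ["num"]:
            reduce(1, "term", "term ::= num")
        elif stack[-3:] == ["exp", "+", "term"] and look != "*":
            reduce(3, "exp", "exp ::= exp + term")
        elif stack[-1:] == ["term"] and look != "*":
            reduce(1, "exp", "exp ::= term")
        elif look != "$":
            stack.append(rest.pop(0))        # shift the next symbol onto the stack
        elif stack == ["exp"]:
            return reductions                # reduced to the start symbol
        else:
            raise SyntaxError(f"cannot reduce {stack}")

steps = shift_reduce(["num", "*", "num", "+", "num"])
```

The list in `steps` records the productions in the order they were applied, which corresponds to building the parse tree from the leaves upwards.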

LL(1)-parsing
LL(1)-parsing is a top-down strategy with a leftmost derivation from the start symbol.
Recursive descent
To each nonterminal there is a method. The method takes care of the rule for this nonterminal, and may call other methods.
• For each terminal in the right-hand side: check that the next symbol (from the scanner) is this terminal.
• For each nonterminal in the right-hand side: call the corresponding method.
When the method is called, the scanner shall have as its next symbol the first symbol of the corresponding production.
When the method is finished, the scanner shall have as its next symbol the first symbol after the sentence.
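
A recursive descent parser following these rules could be sketched as below. The grammar is an assumption written in Extended BNF, exp ::= term { + term } and term ::= num { * num }; the class `Parser`, the `<eof>` end marker and the tuple-shaped syntax tree are likewise hypothetical.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]   # end marker after the sentence
        self.pos = 0

    @property
    def next_symbol(self):
        return self.tokens[self.pos]

    def expect_num(self):
        """Terminal num: check that the next symbol is a number and consume it."""
        symbol = self.next_symbol
        if not symbol.isdigit():
            raise SyntaxError(f"expected num, found {symbol!r}")
        self.pos += 1
        return int(symbol)

    def exp(self):
        """Method for the nonterminal exp ::= term { + term }."""
        tree = self.term()
        while self.next_symbol == "+":
            self.pos += 1
            tree = ("+", tree, self.term())
        return tree

    def term(self):
        """Method for the nonterminal term ::= num { * num }."""
        tree = self.expect_num()
        while self.next_symbol == "*":
            self.pos += 1
            tree = ("*", tree, self.expect_num())
        return tree

def parse(tokens):
    """Parse a whole sentence; the next symbol afterwards must be the end marker."""
    parser = Parser(tokens)
    tree = parser.exp()
    if parser.next_symbol != "<eof>":
        raise SyntaxError(f"unexpected {parser.next_symbol!r}")
    return tree

tree = parse(["2", "*", "3", "+", "4"])
```

Each method decides what to do by looking only at the next symbol, which is the single symbol of lookahead that the 1 in LL(1) refers to.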
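
Meta models
Alternative to grammars and syntax trees.
Object model representing the program (not the execution).
statement → assignment | if-then-else | while-do
[Figure: class diagram notation with classes A, B, C and D and the multiplicities n..m, 1, 0..1, 0..* and *; statement with the subclasses assignment, if-then-else and while-do]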
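
As a sketch, the statement part of such a meta model can be written as classes whose objects represent a program. The attribute names, the multiplicities noted in the comments and the use of plain strings for conditions and expressions are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Statement:
    """Abstract superclass: every kind of statement is a Statement."""

@dataclass
class Assignment(Statement):
    variable: str
    expression: str

@dataclass
class IfThenElse(Statement):
    condition: str
    then_part: Statement                      # multiplicity 1
    else_part: Optional[Statement] = None     # multiplicity 0..1

@dataclass
class WhileDo(Statement):
    condition: str
    body: List[Statement] = field(default_factory=list)   # multiplicity 0..*

# An object structure representing a program (not its execution).
program = WhileDo("x < 10", [
    IfThenElse("x = 3", Assignment("y", "1")),
    Assignment("x", "x + 1"),
])
```

The alternatives of the grammar rule become subclasses, and the parts of each alternative become attributes with multiplicities.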
Why meta models?
• Inspired by abstract syntax trees in terms of object structures, and by interchange formats between tools.
• Not all modeling/programming tools are parser-based (e.g. wizards).
• Growing interest in domain specific languages, often with a mixture of text and graphics.
Meta models often include binding and type information in addition to the pure abstract syntax tree.