Chapter 3 Describing Syntax and Semantics

ISBN 0-321-33025-0

Language

• Alphabet: A finite set of letters. Example: Σ = {a, b, c, d}

• Language: A subset of the set Σ*, which is the set of all strings of letters from Σ (* is the Kleene star; this definition allows the null string).
• Grammar: A rule for generating a subset of Σ*.
• Syntax: The form of the allowed strings (sentences) in a language. Syntax is related to grammatical correctness: what a program looks like (its expressions, statements and program units). Example: the syntax of the C if statement is
if (<expr>) <statement>

Comp4730/Melikyan/Spring06


Language

• Semantics: Meaning of sentences (expressions, statements and program units).

Example: the semantics of the above if statement is: evaluate the <expr>; if the value is true, then the embedded <statement> is selected for execution.
Sentence (or statement): a string of the language.
The syntax rules determine which strings of characters over the alphabet are in the language.

Lexemes: meaningful units that compose a sentence (the words of the language). The lexemes of a programming language include its identifiers, literals, and special words.


Language

• Token: a category of lexemes.

Example: the sentence

if index > 2 then count := 17 ;

is broken into lexemes by the lexical-analyzer part of a compiler; the lexemes are then classified into tokens:

Token                        Lexemes
identifier (id)              index, count
constant (const)             2, 17
relational operator (relop)  >
assignment operator (:=)     :=
end of statement (;)         ;
if                           if
then                         then
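The lexeme-to-token grouping above can be sketched as a small scanner. This is an illustrative sketch, not a production lexer: the `TOKEN_SPEC` table and token names (`id`, `const`, `relop`, ...) are assumptions chosen to match the slide's example.

```python
import re

# Token categories for the toy language; the order matters: keywords
# must be tried before the general identifier pattern, and ":=" before
# the single-character operators.
TOKEN_SPEC = [
    (r"\bif\b",       "if"),
    (r"\bthen\b",     "then"),
    (r"[A-Za-z_]\w*", "id"),         # identifiers such as index, count
    (r"\d+",          "const"),      # integer constants such as 2, 17
    (r":=",           "assign_op"),
    (r"[<>]=?|=",     "relop"),
    (r";",            "semicolon"),
    (r"\s+",          None),         # skip whitespace
]

def tokenize(source):
    """Group the character stream into (token, lexeme) pairs, left to right."""
    tokens, pos = [], 0
    while pos < len(source):
        for pattern, name in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if name is not None:
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
    return tokens

print(tokenize("if index > 2 then count := 17 ;"))
```

Running it on the slide's sentence yields exactly the token/lexeme pairs listed in the table.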

Steps of a compiler
Lexical analyzer: the first phase of a compiler. It reads the stream of characters making up the source program from left to right and groups them into tokens.
Parser (syntactic analyzer): the second phase of a compiler. It groups tokens hierarchically into phrases with collective meaning.
Symbol table: a data structure containing a record (with attributes such as type) for each identifier (user-defined name) in the program.
The lexical analyzer stores particular lexemes in the symbol table and passes only a categorized stream of tokens to the syntactic analyzer (parser), which checks the statement for grammatical correctness.


LANGUAGE GENERATOR AND RECOGNIZER
In general, a language can be formally defined in two ways.
Recognizer: a device that either accepts or rejects an input string. The syntax-analysis part of a compiler is a recognizer for the language the compiler translates. In this role the recognizer (the syntax analyzer, or more generally the compiler) only needs to determine whether a given sentence belongs to the language.
Generator: a device that generates the sentences (strings of characters over the alphabet) of a language.
Relation between a generator and a recognizer: given a context-free grammar (equivalently a BNF description, a generator), a recognizer for the language defined by the grammar can be constructed algorithmically. We will discuss this relationship a little later.


Generative Grammar
• Builds a sentence through a series of applications of well-specified rules.

Example productions (rules)
<sentence> → <noun_phrase> <verb_phrase>
<noun_phrase> → <article> <noun>
<article> → a | the
<noun> → dog | cat
<verb_phrase> → <verb> <noun_phrase>
<verb> → chases | meets

Derivation of a sentence

<sentence> ⇒ <noun_phrase> <verb_phrase>
⇒ <article> <noun> <verb_phrase>
⇒ the <noun> <verb_phrase>
⇒ the dog <verb_phrase>
⇒ the dog <verb> <noun_phrase>
⇒ the dog chases <noun_phrase>
⇒ the dog chases <article> <noun>
⇒ the dog chases a <noun>
⇒ the dog chases a cat

The symbol ⇒ is read "derives".
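The rewrite-until-terminal process above can be sketched as a tiny sentence generator. The dictionary encoding of the grammar and the name `generate` are illustrative choices, not any standard API.

```python
import random

# The toy English grammar from the slide, written as a dictionary from
# nonterminals to lists of alternative right-hand sides.
GRAMMAR = {
    "<sentence>":    [["<noun_phrase>", "<verb_phrase>"]],
    "<noun_phrase>": [["<article>", "<noun>"]],
    "<article>":     [["a"], ["the"]],
    "<noun>":        [["dog"], ["cat"]],
    "<verb_phrase>": [["<verb>", "<noun_phrase>"]],
    "<verb>":        [["chases"], ["meets"]],
}

def generate(symbol="<sentence>", rng=random):
    """Derive a sentence by repeatedly replacing nonterminals."""
    if symbol not in GRAMMAR:          # terminal symbol: emit it
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])  # pick one production for the nonterminal
    out = []
    for s in rhs:
        out.extend(generate(s, rng))
    return out

print(" ".join(generate()))  # e.g. "the dog chases a cat"
```

Every derivation in this grammar ends in a five-word sentence of the form article, noun, verb, article, noun.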


Definition
A generative grammar G is a quadruple (T, N, S, R), where
T : a finite set of terminal symbols
N : a finite set of nonterminal symbols
S : a unique starting symbol (S ∈ N)
R : a finite set of productions (or rules) of the form α → β, where α and β are strings of nonterminals and terminals. The left-hand side "can be replaced by" the right-hand side during generation.
Terminal: actual words or tokens.
Nonterminal: an abstraction of a syntactic structural element, such as <sentence>, <noun_phrase>, <verb>.
Derivation: sentences in a language are generated by starting with the starting symbol and applying productions until we are left with nothing but terminals. The process of generating a sentence is called a derivation.
• A generative grammar derives a sentence by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal.
• The set of token strings that can be derived from the start symbol forms the language defined by the generative grammar.


TYPES OF GRAMMARS (THE CHOMSKY HIERARCHY)
Let a, b, ... be terminals; A, B, ... be nonterminals; α, β, ... be strings of terminals and nonterminals.

1. Regular (Type 3) Grammar
• Allowed productions: A → a, A → aB
• The syntax of tokens is described by regular grammars.
• Recognized by a finite state automaton (FSA).


Definition of FSM

A finite state machine is a quintuple M = (Q, Σ, f, q0, F), where
Q : a finite set of states
Σ : a finite set of input symbols
q0 ∈ Q : a starting state
F ⊆ Q : a set of accepting states
f : (Q × Σ) → Q : a state-transition function (the domain is the Cartesian product of the state and input sets; the range is the state set). It defines how M will change state given any combination of present state and input. In functional form, a combination of current state qi ∈ Q and an input a ∈ Σ produces an image qj according to f(qi, a) = qj.
The FSA accepts a string x of input symbols if the sequence of transitions corresponding to the symbols of x leads to an accepting state.
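As a sketch of how the transition function f drives recognition, here is a tiny FSA for the regular language of identifiers (a letter followed by letters or digits). The state names and the `classify` helper are hypothetical, chosen only for this illustration.

```python
# A DFA for identifiers: a letter followed by any number of letters or digits.
def classify(ch):
    """Map a raw character onto an input-alphabet symbol."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

DELTA = {                       # f : Q x Sigma -> Q, as a lookup table
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"):  "in_id",
}
ACCEPTING = {"in_id"}           # F, the accepting states

def accepts(string, start="start"):
    """Run the transition function; accept iff we end in an accepting state."""
    state = start
    for ch in string:
        state = DELTA.get((state, classify(ch)))
        if state is None:       # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("count2"), accepts("2count"))  # True False
```

A missing entry in `DELTA` plays the role of a dead (trap) state: the machine rejects as soon as no transition applies.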


Context-Free (Type 2) Grammar

• Allowed productions (the production applies regardless of context): A → α
• The syntax of programming languages can be specified by context-free grammars, with minor exceptions (cf. attribute grammars for static semantics).
• Recognized by a pushdown automaton.


Definition of pushdown automaton
A pushdown automaton M is a 7-tuple M = (Q, Σ, Γ, δ, q0, Z0, F), where
Q : a finite set of states
Σ : a finite set of input symbols
Γ : a finite set of stack symbols
q0 ∈ Q : a starting state
Z0 ∈ Γ : a starting stack symbol
F ⊆ Q : a set of accepting states
δ : (Q × (Σ ∪ {ε}) × Γ) → Q × Γ*
A mapping from the Cartesian product of the state set, the input set (extended with ε), and the stack symbols to the state set and stack strings. The machine M can change state without consuming any input (an ε-move). * is the Kleene star, denoting any number of repetitions.

The machine is nondeterministic, having some finite number of choices in each situation. The mapping is either of the following:
δ(q ∈ Q, a ∈ Σ, Z ∈ Γ) = {(q1 ∈ Q, γ1 ∈ Γ*), ..., (qm, γm)}
or
δ(q ∈ Q, ε, Z ∈ Γ) = {(q1 ∈ Q, γ1 ∈ Γ*), ..., (qm, γm)}
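The stack discipline of a pushdown automaton can be illustrated with the classic context-free language {aⁿbⁿ : n ≥ 0}, which no FSA can recognize. This deterministic sketch hard-codes the machine's behavior rather than tabulating δ; the names are illustrative.

```python
def accepts_anbn(s):
    """Sketch of pushdown-automaton behavior for {a^n b^n : n >= 0}.
    Each 'a' pushes a stack symbol, each 'b' pops one; accept when the
    input is consumed with only the start symbol Z0 left on the stack."""
    stack = ["Z0"]              # the starting stack symbol
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            stack.append("A")   # push one 'A' per leading 'a'
        elif ch == "b" and stack[-1] == "A":
            seen_b = True
            stack.pop()         # pop one 'A' per 'b'
        else:
            return False        # 'a' after 'b', too many 'b's, or bad symbol
    return stack == ["Z0"]

print(accepts_anbn("aabb"), accepts_anbn("aab"), accepts_anbn("abab"))
# True False False
```

The stack is exactly what gives the machine its counting power: an FSA with finitely many states cannot match an unbounded number of a's against b's.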


Grammar TYPE 0 and TYPE 1
Context-Sensitive (Type 1) Grammar
• Allowed productions (a nonterminal is rewritten depending on its context γ and δ): γAδ → γβδ (β may not be ε)
• Recognized by a linear bounded automaton (a Turing machine whose tape is bounded by the length of the input)

Unrestricted (Type 0) Grammar
• Allowed productions: α → β
• Recognized by a Turing machine (an abstraction of a stored-program computer).


Turing machine
The machine consists of: i) a finite control, like a CPU; ii) a "rewritable" tape, like a memory; iii) a tape head that scans one cell of the tape at a time. The tape has a leftmost cell but is infinite to the right. Initially the n leftmost cells hold the string of input symbols; the remaining cells hold a blank symbol. Given a combination of current state and scanned tape symbol, the machine can: i) change state; ii) print a symbol on the scanned tape cell, replacing what was written there; iii) move its head left or right one cell.


Definition

A Turing machine is a 7-tuple M = (Q, Σ, Γ, f, q0, B, F), where
Q : a finite set of states
Γ : a finite set of tape symbols
B ∈ Γ : a blank symbol
Σ ⊆ Γ : a finite set of input symbols, not including B
q0 ∈ Q : a starting state
F ⊆ Q : a set of accepting states
f : (Q × Γ) → Q × Γ × {L, R} : a next-move function
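The next-move function f can be simulated directly. Below is a minimal sketch of a Turing-machine simulator with an example machine that walks right, overwriting every 0 with 1, and accepts on reaching a blank; the state names and step bound are illustrative assumptions.

```python
# A minimal Turing-machine simulator over the next-move function
# f : (Q x Gamma) -> Q x Gamma x {L, R}, encoded as a dictionary.
BLANK = "B"
DELTA = {
    ("scan", "0"):   ("scan", "1", "R"),     # rewrite 0 -> 1, move right
    ("scan", "1"):   ("scan", "1", "R"),     # leave 1 alone, move right
    ("scan", BLANK): ("accept", BLANK, "R"), # end of input: accept
}

def run(tape, state="scan", accepting=("accept",), max_steps=10_000):
    """Run the machine until it enters an accepting state (or give up)."""
    tape = list(tape)
    head = 0
    for _ in range(max_steps):
        if state in accepting:
            return state, "".join(tape).rstrip(BLANK)
        symbol = tape[head] if head < len(tape) else BLANK
        if head >= len(tape):
            tape.append(BLANK)   # the tape is infinite to the right
        state, write, move = DELTA[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("no halt within step bound")

print(run("0010"))  # ('accept', '1111')
```

This machine only moves right, but the same loop supports left moves, which is what separates Turing machines from the weaker automata above.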


ALL together


BNF: Backus-Naur Form
In the mid-1950s Chomsky, a linguist, introduced four classes (type 0 through type 3) of generative devices (grammars). Two of these grammar classes, context-free and regular, turned out to be useful for describing the syntax of programming languages: tokens can be described by regular grammars, and whole programming languages, with minor exceptions, by context-free grammars.
In 1959 Backus presented a description of ALGOL 58 using a new formal notation for specifying programming-language syntax. This notation was later modified by Peter Naur for ALGOL 60. The revised version became known as Backus-Naur Form, or simply BNF. It is remarkable that BNF is nearly identical to Chomsky's generative device for context-free languages, the context-free grammar.


Metalanguage
A language that is used to describe another language. BNF is a metalanguage for programming languages. BNF uses abstractions for syntactic structures.
Example: a simple assignment statement in C might be represented by the abstraction <assign>, and the actual definition of <assign> may be given by
<assign> → <var> = <expression>
(a definition like this is often called a production, or rule)
Abstractions are the nonterminals of the rules; the lexemes and tokens are the terminals. A BNF description, or grammar, is simply a collection of rules.


MORE BNF (lists in BNF)
A nonterminal symbol can have more than one distinct definition. In BNF we use | as a logical OR to combine multiple definitions in a single rule.
Example (Pascal's if statement):
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
Recursion: a rule is recursive if its LHS appears in its RHS. Using recursion we are able to describe variable-length lists:
<ident_list> → identifier
             | identifier , <ident_list>


Grammars for a Simple Language, and Derivations
A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).

Here is an example of a simple language (Example 3.1 in the textbook):
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

The language has only one statement form: assignment. Programs, expressions, operations and variable names are all specified by the grammar.


<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

Derivation
<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> ; <stmt_list> end
⇒ begin <var> := <expression> ; <stmt_list> end
⇒ begin A := <expression> ; <stmt_list> end
⇒ begin A := <var> + <var> ; <stmt_list> end
⇒ begin A := B + <var> ; <stmt_list> end
⇒ begin A := B + C ; <stmt_list> end
⇒ begin A := B + C ; <stmt> end
⇒ begin A := B + C ; <var> := <expression> end
⇒ begin A := B + C ; B := <expression> end
⇒ begin A := B + C ; B := <var> end
⇒ begin A := B + C ; B := C end


Every string of symbols in a derivation is a sentential form. A sentence is a sentential form that has only terminal symbols. A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. A derivation may be neither leftmost nor rightmost.

By choosing alternative RHSs of rules, one ends up with different sentences. Even this simple language has infinitely many sentences, so all of its sentences cannot be generated in finite time.


PARSE TREE
A derivation in a generative grammar can be represented by a parse tree, which pictorially shows how the start symbol of a grammar derives a string in the language. A parse tree is a tree with the following properties:
1) The root is labeled with the starting symbol;
2) Each leaf is labeled with a terminal symbol (token);
3) Each internal node is labeled with a nonterminal symbol;
4) If A is the nonterminal labeling some interior node and X1, X2, ..., Xn are the labels of the children of that node from left to right, then A → X1 X2 ... Xn is a production. Here each Xi stands for a symbol that is either a terminal or a nonterminal.
In other words, the parent-children relation corresponds to a production. A parse tree is a mathematically well-defined representation of the syntactic structure of a sentence.


(Example) Simple language


AMBIGUITY: THE DANGLING-ELSE PROBLEM
A grammar is ambiguous if it generates a sentence for which there are two or more different parse trees. A usual solution to ambiguity is to introduce more nonterminals.

Dangling-else problem
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
Ambiguous sentence:
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>


Parse Tree for Ambiguous sentence


Example 3.3 (textbook)

<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
       | <expr> * <expr>
       | ( <expr> )
       | <id>


Two distinct parse trees for the same sentence, A = B + C * A (Figure 3.2)


Solution
A solution to the dangling-else problem: the rule for the if construct in most languages is that an else clause is matched with the nearest previous unmatched then; between a then and its matching else, there cannot be an if statement without an else. Only (b) in the previous figure is legal. Two nonterminals, <matched> and <unmatched>, are introduced to distinguish the two situations.


Correct Grammar
<stmt> → <matched> | <unmatched>
<matched> → if <logic_expr> then <matched> else <matched>
          | any non-if statement

<unmatched> → if <logic_expr> then <stmt>
            | if <logic_expr> then <matched> else <unmatched>

Note that between then and else only <matched> appears.


Parse Tree


ASSOCIATIVITY OF OPERATORS

If an expression 9+5+2 is equivalent to (9+5)+2, we say that the operator + associates to the left, because an operand with plus signs on both sides of it is taken by the operator to its left. The assignment operator = in C is right-associative, so a=b=c is treated as a=(b=c).


Right-associative assignment operator
<assign> → <id> = <assign> | <id>
<id> → a | b | . . . | z
When a production has its left-hand side also appearing at the beginning (end) of its right-hand side, the rule is said to be left (right) recursive. Left recursion specifies left associativity; right recursion, as in the rule above, specifies right associativity.


OPERATOR PRECEDENCE
Conventionally 9+5*2 is interpreted as 9+(5*2). We say that * has higher precedence than + because * takes its operands before + does. An operator with higher precedence attracts operands more strongly; in other words, operator precedence expresses operand affinity. Precedence can be expressed in a grammar by creating multiple levels of nonterminals.


Ambiguous grammar for expressions

<expr> → <expr> + <expr>
       | <expr> * <expr>
       | <id>
<id> → A | B | C

The expression A + B * C has two distinct parse trees.


Unambiguous Grammar by Layering
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → A | B | C
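The layering maps directly onto a recursive-descent evaluator: one function per nonterminal, so * binds tighter than + and the loops give left associativity. This is an illustrative sketch; the function names and the `values` mapping for A, B, C are assumptions.

```python
# A recursive-descent evaluator following the layered grammar.
def parse_expr(tokens, values):
    def peek():
        return tokens[0] if tokens else None

    def take():
        return tokens.pop(0)

    def factor():                 # <factor> -> A | B | C
        return values[take()]

    def term():                   # <term> -> <term> * <factor> | <factor>
        result = factor()
        while peek() == "*":
            take()
            result *= factor()    # the loop encodes left associativity
        return result

    def expr():                   # <expr> -> <expr> + <term> | <term>
        result = term()
        while peek() == "+":
            take()
            result += term()
        return result

    return expr()

# With A=1, B=2, C=3: A + B * C parses as A + (B * C) = 7, not (A+B)*C = 9.
print(parse_expr(list("A+B*C"), {"A": 1, "B": 2, "C": 3}))  # 7
```

Because `expr` calls `term`, which calls `factor`, the deeper the nonterminal sits in the grammar, the tighter its operator binds.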


Extended BNF

Extended BNF (EBNF):
1. Optional parts are placed in brackets ([ ]):
   <proc_call> → ident [ ( <expr_list> ) ]
2. Alternative parts of RHSs are placed in parentheses and separated by vertical bars:
   <term> → <term> (+ | -) const
3. Repetitions (0 or more) are placed in braces ({ }):
   <ident> → letter {letter | digit}


EXAMPLE: BNF and EBNF versions of an Expression Grammar

BNF:

<expr> → <expr> + <term>
       | <expr> - <term>
       | <term>
<term> → <term> * <factor>
       | <term> / <factor>
       | <factor>

EBNF:

<expr> → <term> { (+ | -) <term> }
<term> → <factor> { (* | /) <factor> }


Syntax Graph
The information in BNF and EBNF rules can also be represented in a directed graph, called a syntax graph. A separate graph is used for each syntactic unit. Terminals are placed in circles or ellipses and nonterminals in rectangles; the nodes are connected with lines with arrowheads.


The syntax graph and EBNF descriptions of the Ada if statement (Figure 3.6)


Attribute Grammars

Static semantics: "syntax" that cannot be described by a context-free grammar (or BNF). Many static-semantic rules state type constraints, such as type compatibility.
Examples: Is an integer variable assigned a real value? (In Pascal, no; in C, yes.) This is tedious to express in BNF. A variable must be declared before it is referenced (context!), which cannot be described in BNF at all.
CFGs cannot describe all of the syntax of programming languages, so attribute grammars add to CFGs a way of carrying semantic information along through parse trees.
Primary value of attribute grammars (AGs):
1. Static semantics specification
2. Compiler design (static semantics checking)


Attribute

Attribute: grammatical rules such as type constraints can be incorporated into a grammar by associating with each grammar symbol (nonterminal or terminal) a kind of variable called an attribute, which specifies, e.g.:
i) the type of an instance of a variable;
ii) whether the variable has already been declared or not.


Attribute grammar

An attribute grammar specifies:
i) ways to compute attributes given a production (attribute computation functions);
ii) the legal combinations of attributes of nonterminals allowed in productions (predicate functions).
For each grammar symbol x there is a set A(x) of attribute values. Each rule has a set of functions that define certain attributes of the nonterminals in the rule, and a (possibly empty) set of predicates to check for attribute consistency.


Attribute computation functions (formal)

The set A(X) consists of two disjoint sets: S(X), the synthesized attributes, and I(X), the inherited attributes. For a production X0 → X1 ... Xn, the synthesized attributes of X0 are computed with a semantic function
S(X0) = f(A(X1), ..., A(Xn))
Inherited attributes of the symbols Xj (1 ≤ j ≤ n) are computed with a semantic function
I(Xj) = f(A(X0), ..., A(Xj-1))


Attribute Grammars • Context-free grammars (CFGs) cannot describe all of the syntax of programming languages • Additions to CFGs to carry some semantic info along parse trees • Primary value of attribute grammars (AGs) – Static semantics specification – Compiler design (static semantics checking)


Attribute Grammars : Definition • An attribute grammar is a context-free grammar G = (S, N, T, P) with the following additions:

– For each grammar symbol x there is a set A(x) of attribute values – Each rule has a set of functions that define certain attributes of the nonterminals in the rule – Each rule has a (possibly empty) set of predicates to check for attribute consistency


Attribute Grammars: Definition
• Let X0 → X1 ... Xn be a rule
• Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes
• Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 ≤ j ≤ n, define inherited attributes

Example grammar:
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C

• actual_type: synthesized for <var> and <expr>
• expected_type: inherited for <expr>
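Attribute evaluation for an assignment rule of this shape can be sketched in a few functions. This is a sketch under assumptions: the symbol table contents, the function names, and the typing rule (int + int is int, any mix involving real is real, which is the textbook's convention) are all illustrative.

```python
# Attribute evaluation for <assign> -> <var> = <expr>, where
# <expr> -> <var> + <var> | <var>.  actual_type is synthesized upward
# from declarations; expected_type is inherited by <expr> from the
# left-hand-side <var>.
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}   # illustrative lookup

def var_actual_type(name):
    """Synthesized: a <var>'s actual_type comes from its declaration."""
    return SYMBOL_TABLE[name]

def expr_actual_type(operands):
    """Synthesized: int + int is int; any mix involving real is real."""
    types = [var_actual_type(v) for v in operands]
    return "int" if all(t == "int" for t in types) else "real"

def check_assign(lhs, operands):
    """Predicate: <expr>.actual_type must equal <expr>.expected_type."""
    expected = var_actual_type(lhs)      # inherited from the LHS <var>
    actual = expr_actual_type(operands)  # synthesized from the operands
    return actual == expected

print(check_assign("A", ["A", "B"]))  # int = int + int  -> True
print(check_assign("B", ["A", "C"]))  # int = int + real -> False
```

The predicate is exactly the consistency check a compiler's static-semantics phase would perform at the `<assign>` node of the parse tree.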


How are attribute values computed?
1. If all attributes were inherited, the tree could be decorated in top-down order.
2. If all attributes were synthesized, the tree could be decorated in bottom-up order.
3. In many cases both kinds of attributes are used, and some combination of top-down and bottom-up decoration must be used.


Synthesized attributes
Synthesized attributes are used to pass semantic information up a parse tree, while inherited attributes pass semantic information down the tree. In the example above, actual_type and lhs_type are synthesized attributes and expected_type is an inherited attribute. Semantic information thus "flows" both up and down in a parse tree!


Predicate function (Example)

<expr>.actual_type == <expr>.expected_type

where the attribute expected_type is the expected type for the expression, as determined by the type of the variable on the left-hand side of the assignment statement. The only allowed derivations are those in which the predicates associated with every nonterminal are all true. Computing attributes requires the construction of a dependency graph to show attribute dependencies.
(Example) Procedure names in Ada
Syntax rule: <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2];
Semantic rule: <proc_name>[1].string == <proc_name>[2].string


Dynamic Semantics

Dynamic semantics: the meaning of a program. Corresponding to three different views of computation, there are three approaches to describing the meaning of a program. There is no single widely accepted notation or formalism for describing semantics.

View of computation                 Semantics
i)   mechanical view (imperative)   operational semantics
ii)  functional view (applicative)  denotational semantics
iii) logical view (declarative)     axiomatic semantics

These formal semantics are still research subjects; natural English description is usually employed in practice.


Operational semantics

The meaning of a program is described by the execution of a machine. The change in the machine's state (the contents of registers, program counter, and memory) when executing a given statement defines the meaning of that statement.
Virtual machine: to make operational semantics independent of individual hardware, a low-level virtual machine can be used, implemented as a software simulation (cf. Knuth's MIX computer in "The Art of Computer Programming").


Denotational Semantics
Operational semantics uses a programming language (such as an assembly language) to describe a language. Axiomatic semantics and denotational semantics use formal mathematical methods to describe semantics.
Denotational semantics uses mathematical functions to describe the meaning of a program. It defines, for each language entity, both a mathematical object and a function that maps instances of that entity onto instances of the mathematical object.
The state of a program: in denotational semantics, the state s of a program is defined by a set of ordered pairs s = {(i1, v1), (i2, v2), ..., (in, vn)}, where each i is the name of a variable and the associated v is the current value of that variable. For each language construct, a corresponding mathematical function must be defined.


Axiomatic semantics
Uses formal logic to prove the correctness of a program. Rather than describing the entire state of an abstract machine, each statement is both preceded and followed by a logical expression (called a predicate, or assertion) that specifies constraints (the precondition and postcondition) on program variables before and after execution of the statement.
Weakest precondition: given a statement and a postcondition, the weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition.
(Example) {x > 0} sum = 2*x + 1; {sum > 1}


If the weakest precondition can be computed from the given postcondition for each statement of a language, then correctness proofs can be constructed by starting from the last statement and working backwards until the precondition of the first statement of the program is calculated.
{precondition} ...; ...; ...; {postcondition}
Imagine that the postcondition for the entire program is that every defined variable has a value within the range of its type. The computed precondition then tells us what condition must hold before execution for the program to satisfy that postcondition.


Predicate calculus
Preconditions are calculated from postconditions by applying axioms and inference rules.
Axiom: a logical statement that is assumed to be true.
(Example) Axiom for an assignment statement x := E: given a postcondition Q, the precondition is
P = Q with x → E
which means that P is computed as Q with all instances of x replaced by E. For example, given an assignment statement with a postcondition
a := b/2 - 1 {a < 10}
the weakest precondition is computed by substituting b/2 - 1 for a in the assertion {a < 10}, i.e., b/2 - 1 < 10, or b < 22. So we obtain
{b < 22} a := b/2 - 1 {a < 10}
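The assignment axiom is pure textual substitution, so it can be sketched mechanically. This is a toy calculator under assumptions: conditions are plain Python expressions, and the name `wp_assign` is hypothetical; a real tool would parse the expression rather than substitute on strings.

```python
import re

def wp_assign(var, expr, postcondition):
    """Weakest precondition of `var := expr` for a given postcondition:
    Q with every whole-word occurrence of var replaced by (expr)."""
    return re.sub(rf"\b{var}\b", f"({expr})", postcondition)

pre = wp_assign("a", "b/2 - 1", "a < 10")
print(pre)  # (b/2 - 1) < 10, which simplifies to b < 22

# Sanity check on a sample of states: whenever the computed precondition
# holds before the assignment, the postcondition holds after it.
for b in range(-5, 30):
    if eval(pre):          # precondition {b < 22} (in substituted form)
        a = b / 2 - 1      # execute the assignment
        assert a < 10      # postcondition {a < 10}
```

The substitution gives the precondition in unsimplified form; rearranging `(b/2 - 1) < 10` to `b < 22` is ordinary algebra outside the axiom itself.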


Inference rule

A method of inferring the truth of one assertion on the basis of other true assertions.
(Example) Inference rule for an if-then-else statement:
{B and P} S1 {Q},  {(not B) and P} S2 {Q}
-----------------------------------------
{P} if B then S1 else S2 {Q}
This reads: "If {B and P} S1 {Q} is true and {(not B) and P} S2 {Q} is true, then the truth of {P} if B then S1 else S2 {Q} is inferred."
For
if (x > 0) y = y - 1; else y = y + 1; {y > 0}
the precondition is {y - 1 > 0} and {y + 1 > 0}; that is, {y > 1}.


CHAPTER III HOMEWORK

1. (Review Questions) Answer all the questions (## 1 - 17) on page 170 of your textbook.
2. Do all of the listed problems: #2 a), c); #4 d); #6 b); #7 c); #8; #11; #12; #13, on pages 170-172 (7th edition of the textbook).
Assigned 02/02/06. Due 02/09/06. Please send the solutions via email to [email protected] and hand in a hard copy by the beginning of the class.
