Diss. ETH No. 9328

A Framework for Syntactic and Morphological Analysis and its Application in a Text-to-Speech System

A dissertation submitted to the
SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZÜRICH
for the degree of
Doctor of Technical Sciences

presented by
THOMAS RUSSI
Dipl. El.-Ing. ETH
born December 13, 1960
Citizen of Andermatt, Switzerland

accepted on the recommendation of
Prof. Dr. W. Guggenbühl
Prof. Dr. A. Kündig

1990




The lexical s of the ending corresponds to a surface zero (s:0) between a stem-final s:s or z:z and the following t:t; otherwise, it corresponds to a surface s: ras+st ↔ rast, fliess+st ↔ fliesst.

The epenthesis rule describes the insertion of an e in the surface form between verb stems and endings (present and past tense). The type of stem is encoded into the citation form⁵ with special symbols indicating the morphological (and phonetic) feature; these special symbols are deleted (realized as null symbols) in the surface form. Examples: widmA+st ↔ widmest, arbeitA+st ↔ arbeitest, wartA+st ↔ wartest, leidA+t ↔ leidet, hiessC+st ↔ hiessest, ebnA+st ↔ ebnest.
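One possible two-level alignment for widmA+st ↔ widmest (the exact symbol-by-symbol alignment is an illustrative assumption) is the pair sequence w:w i:i d:d m:m A:0 +:0 0:e s:s t:t, where A:0 deletes the stem-type marker, +:0 deletes the morpheme boundary, and 0:e realizes the epenthetic e in the surface form.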

In the previous example, the ↔ operator was used to define that an e can be inserted at the morpheme boundary in the surface form if and only if one of the contexts of the restriction holds. Although this operator is by far the most frequently used, there are two other operators which can be used as well. The operators have the same meaning as in Koskenniemi [Kos83b, p 37 ff]:

context restriction: a:b → LC _ RC
The lexical character a matches the surface character b only when it is in the context of LC and RC. The pair a:b cannot appear in any other context.

surface coercion: a:b ← LC _ RC
In the context LC and RC, a lexical character a matches the surface character b and nothing else.

combined rule: a:b ↔ LC _ RC
This is a combination of the context restriction and surface coercion rules. It states that the lexical character a matches the surface character b only in the context LC and RC and that a:b is the only pair allowed in that context.

In the next section, we relate two-level rules with finite automata and give a procedural interpretation for two-level rules.

⁵ The encoding of morphological features into the lexical (graphemic and phonetic) string is somewhat awkward and basically introduces redundancy in lexical entries. Several modifications have been proposed ([Eme88], [Bea88b], [Bea88a], [Tro90]), which add an additional mechanism to the two-level rules to access lexical features.
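The following minimal Python sketch shows how the three operator semantics can be read as checks over an aligned sequence of lexical:surface pairs. The encoding of pairs and contexts as tuples and sets, and the helper names, are assumptions made for illustration; they are not the rule compiler discussed in the next section.

```python
# Minimal sketch: interpret one two-level rule over an aligned pair sequence.
# A "pair" is a (lexical, surface) tuple; contexts are given here as sets of
# single pairs for simplicity (real rules allow regular expressions over pairs).

def matches(pair, context):
    """True if the neighbouring pair is allowed by the context set."""
    return context is None or pair in context

def check_rule(pairs, a_b, left, right, kind):
    """Check pairs (list of (lex, surf)) against the rule 'a:b kind LC _ RC'.

    kind is one of '=>', '<=', '<=>' with the Koskenniemi meanings:
      '=>'  : a:b may occur only in the context      (context restriction)
      '<='  : in the context, lexical a must be b    (surface coercion)
      '<=>' : both of the above                      (combined rule)
    """
    ok = True
    for i, (lex, surf) in enumerate(pairs):
        prev = pairs[i - 1] if i > 0 else None
        nxt = pairs[i + 1] if i + 1 < len(pairs) else None
        in_ctx = matches(prev, left) and matches(nxt, right)
        if kind in ('=>', '<=>') and (lex, surf) == a_b and not in_ctx:
            ok = False          # pair used outside its licensed context
        if kind in ('<=', '<=>') and lex == a_b[0] and in_ctx and surf != a_b[1]:
            ok = False          # context forces a:b, but another pair was used
    return ok

# Hypothetical epenthesis-style check: 0:e licensed only between +:0 and s:s.
pairs = [('w', 'w'), ('i', 'i'), ('d', 'd'), ('m', 'm'),
         ('A', '0'), ('+', '0'), ('0', 'e'), ('s', 's'), ('t', 't')]
print(check_rule(pairs, ('0', 'e'), {('+', '0')}, {('s', 's')}, '<=>'))  # True
```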

2.1.3 Rules and Finite Automata

Two-level rules use regular expressions to state in a declarative manner the set of string pairs consisting of a lexical and a surface symbol. There are two basic approaches to processing regular expressions. One possibility is to have them processed directly by an interpreter. This approach is pursued by Bear [Bea86], who implemented an extended version of the two-level model. In Bear's system, rules are directly interpreted as constraints on pairings of surface strings and lexical strings. The second approach is to apply a well-known theorem of automata theory, which says that, for every regular expression r, a deterministic automaton can be constructed which accepts the language L(r) (see, for example, Hopcroft [HU79, p 28 ff]). The compiling of two-level rules into finite automata was put forward by Koskenniemi [Kos83b] and is pursued in this project as well. The description of such a compiler does not lie within the scope of this dissertation (see, for example, Karttunen [KKK87]). However, we introduce the definitions of finite automata and transition graphs for the following reasons:

• The operational semantics revealed by the finite automaton notation is contrasted to the declarative notation of the two-level rule. This leads to a better understanding of the procedural interpretation.

• The transition network formalism described in Section 2.3 is based on the concept of finite automata.

In the following sections we shall strive to use the same symbols to denote the same things. We adopt the notation of Hopcroft [HU79] as far as possible. Unless it is stated otherwise, the reader may assume that:

1. Q is the set of states of an automaton, q0 is the initial state, and q and p, with or without subscripts, are states.
2. Σ is an input alphabet; symbols a and b are input symbols.
3. δ is a state transition function.
4. F is a set of final states.
5. w, x and z are strings of input symbols; ε denotes the empty string (consisting of zero symbols).

A deterministic finite automaton (DFA) consists of a finite set of states and a set of transitions from state to state that occur on input symbols. We formally define a DFA as follows:

Definition 2.1 A deterministic finite automaton (DFA) M is a 5-tuple (Q, Σ, δ, q0, F), where
(1) Q is a finite set of states,
(2) Σ is a finite set of input symbols,
(3) δ is a (possibly partial) function, a mapping from Q × Σ to Q, called the state transition function,
(4) q0 ∈ Q is the initial state, and
(5) F ⊆ Q is the set of final states.
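As a minimal illustration of Definition 2.1, the transition function can be encoded as a Python dictionary (this encoding and the example automaton are illustrative assumptions, not part of the formalism):

```python
# Sketch of a DFA M = (Q, Sigma, delta, q0, F) and the usual acceptance test.
# delta is a possibly partial function, here a dict from (state, symbol) to state.

def accepts(delta, q0, final, word):
    """Return True iff the DFA accepts the input word."""
    state = q0
    for symbol in word:
        if (state, symbol) not in delta:   # partial function: no transition defined
            return False
        state = delta[(state, symbol)]
    return state in final

# Example: an automaton over {a, b} accepting strings ending in 'ab'.
delta = {
    ('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
    ('q1', 'a'): 'q1', ('q1', 'b'): 'q2',
    ('q2', 'a'): 'q1', ('q2', 'b'): 'q0',
}
print(accepts(delta, 'q0', {'q2'}, 'aab'))   # True
print(accepts(delta, 'q0', {'q2'}, 'abb'))   # False
```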

The transition from state s1 to state s2 recognizes the strings {a^n b^n}, n > 0, by recursively traversing the network A. The transition from state s2 to s3 recognizes the strings {c^m}, m > 0, by traversing the network C. The network grammar G1 recognizes the language:

L(M3) = { x ∈ {a, b, c}* | x = a^n b^n c^m and n, m > 0 }
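As a sketch of how such a network grammar can be recognized procedurally, a recursive recognizer for L(M3), with n, m ≥ 1, might look as follows; the decomposition into one routine per network mirrors the networks A and C of G1 but is an illustrative assumption, not the parser developed later.

```python
# Sketch: recognizer for L = { a^n b^n c^m | n, m > 0 }, mirroring how the RTN
# grammar G1 would be traversed (network A for a^n b^n, network C for c^m).

def parse_A(s, i):
    """Consume 'a ... b' with balanced nesting; return end position or None."""
    if i >= len(s) or s[i] != 'a':
        return None
    i += 1
    if i < len(s) and s[i] == 'a':      # inner recursive traversal of network A
        j = parse_A(s, i)
        if j is None:
            return None
        i = j
    if i < len(s) and s[i] == 'b':
        return i + 1
    return None

def parse_C(s, i):
    """Consume c^m with m >= 1; return end position or None."""
    if i >= len(s) or s[i] != 'c':
        return None
    while i < len(s) and s[i] == 'c':
        i += 1
    return i

def recognize(s):
    j = parse_A(s, 0)
    if j is None:
        return False
    k = parse_C(s, j)
    return k == len(s)

print(recognize("aaabbbcc"))   # True
print(recognize("aabbbc"))     # False
```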

An RTN grammar not only specifies the set of strings a language encompasses, but also assigns to each string a constituent structure tree. Figure 2.4 shows a constituent structure tree for the string aaabbbcc. Constituent structure trees represent three kinds of information on the syntactic structure of the string:

1. The hierarchical grouping of the string into constituents (dominance relation).

2. The grammatical type of each constituent.

3. The left-to-right order of the constituents (precedence relation).

Figure 2.4: Constituent structure tree for the string aaabbbcc

RTNs have certain obvious notational advantages over DFAs. Commonly occurring subpatterns can be expressed as named networks, and large grammars can be split into modular networks. In addition, RTNs reflect the recursive structure of language in a natural way.

RTNs are equivalent to context-free grammars in their generative capacity. For example, the network grammar of Figure 2.3 can be mapped into a strongly equivalent context-free grammar G = (V_T, V_N, S, P), where

(1) the terminal alphabet is V_T = {a, b, c},
(2) the nonterminal alphabet is V_N = {S, A, C},
(3) the start symbol is S ∈ V_N, and
(4) the grammar rules are P = { S → A C,  A → a A b,  A → a b,  C → C c,  C → c }.
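For example, the string aaabbbcc of Figure 2.4 can be derived with these rules as S ⇒ AC ⇒ aAbC ⇒ aaAbbC ⇒ aaabbbC ⇒ aaabbbCc ⇒ aaabbbcc.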

However, RTNs have severe limitations in specifying the syntax of natural languages. First, there are linguistic phenomena which exceed the generative capacity of context-free grammars. For example, cross-serial ordering of subordinate clauses in Swiss German⁶ can be formally stated as the string a^m b^n c^m d^n, which cannot be expressed by RTNs or type-2 rules. Second, other frequent linguistic phenomena, for example, case-gender-number agreement between determiners and nouns in German, can be expressed as RTNs (or context-free rules) only by introducing a large number of transitions (or rules). This obscures the real nature of agreement.

⁶ An example of cross-serial dependency in Swiss German is the subordinate clause ... Jan säit, dass mer d'Chind em Hans es Huus händ welle laa hälfe aastriche ... This is an instance of the pattern NP^m NP^n V^m V^n: m accusative NPs followed by n dative NPs, followed by m accusative-demanding verbs and n dative-demanding verbs. For a detailed description including a proof, see Shieber [Shi87].

2.3.2 Unification-Based Transition Networks

To overcome the limitations of RTNs, we have extended the concept in two important respects:

1. Terminal and nonterminal symbols are no longer monadic (atomic) symbols, but name-term pairs or feature structures.

2. In addition to the linear precedence and immediate dominance relations encoded in the topology of the networks, additional constraints between terminals and constituents can be specified by using unification equations.

These extensions considerably increase the generative power of the formalism, which now includes indexed and fully context-sensitive grammars, without changing the simplicity and declarativeness of RTNs. We have developed two variants of the UTN formalism. The variant we describe first is based on the notation of terms of first-order predicate logic as described in Section 2.2.2. The second variant is based on complex feature structures as described in Section 2.2.3.
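To make point 2 above concrete, here is a minimal unification sketch over flat sets of name-term pairs, enough to model determiner-noun agreement. The dictionary encoding and the feature names (case, num, gen) are illustrative assumptions, and the routine is simpler than full term unification (no variable sharing).

```python
# Sketch: unification of two flat sets of name-term pairs. A term is either a
# concrete value or a variable (a string starting with '?'), as in the
# name-term pair variant of the UTN formalism.

def unify(fs1, fs2):
    """Return the unified feature set, or None if the two sets clash."""
    result = dict(fs1)
    for name, term in fs2.items():
        if name not in result or result[name].startswith('?'):
            result[name] = term
        elif term.startswith('?') or result[name] == term:
            pass                      # variable on the other side, or identical values
        else:
            return None               # constant clash, e.g. num=sg vs num=pl
    return result

det  = {'case': 'nom', 'num': 'sg', 'gen': '?g'}
noun = {'case': '?c',  'num': 'sg', 'gen': 'masc'}
print(unify(det, noun))
# -> {'case': 'nom', 'num': 'sg', 'gen': 'masc'}
```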

To explain the two variants of the UTN formalism, we use the grammar G2 (see Figure 2.5), a transition network grammar consisting of four networks for simple German sentences. Network S, the top-level network, specifies an (infinite) set of sentences consisting of a noun phrase (NP) and a verb phrase (VP). The NP consists of

• an (optional) determiner, zero, one or more adjectives and a noun, e.g., der sternenübersäte Himmel (the star-spangled sky), or

• a proper name, e.g., Herbert, or

• a pronoun, e.g., er (he), or

• a recursively defined noun phrase followed by a prepositional phrase (PP), e.g., umweltfreundliche Autos mit niedrigem Benzinverbrauch (non-polluting cars with low petrol consumption).

Figure 2.5: Transition network grammar G2 for simple German sentences

The VP consists of an intransitive verb or of a transitive verb followed by an NP. A VP can also have a number of PPs attached to it.

This grammar recognizes sentences such as Die berühmte Astronomin beobachtet den sternenübersäten Himmel im Observatorium mit dem Radioteleskop (The famous woman astronomer observes the star-spangled sky in the observatory with the radio telescope). Appendix C contains the code for this example grammar. There is one version of this grammar based on name-term pairs and a second version based on feature structures.

The first variant of the UTN formalism is based



sets of name-term

pairs

to

on

the notation of

represent terminals and constituents

and •

unification

equations

to

speeify

constraints that must be satisfied

between terminals and constituents.

38

Chapter

Terminals and constituents name-term

represented

pair>

»(»{"("

::=

an

(unordered)





symbol and is (a symbol prefixed by "?") or an optional in peurentheses (infix notation). is a



as

set of

pairs

aBj

S

and

w

6

FIRST^)}

The REACHABILITY relation 3. holds between two B if there is

a

in the

dominated

string

Definition 3.5 LetR A £ N and B any sentential

symbols

A and

derivation from A to B such that B is the first element

_

by A.

More

formally:

(_V, S, M,S)

=

(N U E).

be

a

B is reachable

network, 4- Sa in

recursive transition

from A, Ä&B



A

form.

The FIRST relation is

a

subset of the REACHABILITY relation.

The REACHABILITY relation closure of the left-corner

can

also be defined

relation, i.e.,

as

the transitive

all

tuples consisting of the sym¬ bol of the left-hand side emd the first symbol of the right-hand side of all grammeir mies in em e-free contex-free grammar. For example, in grammar G2 (see Figure 2.5), the REACHABILITY relation includes the following tuples: {(S, NP), (S, det), (NP, noun),...}. The above relations

can

be

precomputed

and stored for

a

specific

At parse time, this information can be used to effieiently the rule invocation strategy. In the next two sections, we describe

grammeir.

guide four top-down relations.

and four

These

bottom-up strategies which make use of eight strategies have been implemented in our

these chart

parser.

3.3.1

Top-Down Strategies

Top-down parsing can be viewed as finding a derivation for an input string. Beginning with the start symbol, nonterminal symbols are re¬ placed step by step by the right-hand sides of the corresponding ruies until the string consists of terminal symbols only. Top-down parsing can also be regarded as construeting a parse tree for the input string starting from the root and creating the tree in preorder. There

several top-down strategies (see, for example, [AU72], but quite inefficient top-down algorithms based on general [ASU86]), recursive or backtracking algorithms and nonrecursive algorithms based on LL(k) tables, where k indicates the number of lookahead symbols. are

64

Chapter

Between the

which

general (exponential) algorithms based

be used to parse context-sensitive

can

(ünear) LL(k)-algorit_ms, which are, subset of the context-free languages, there is

ficient

Algorithms

3.

on

backtracking,

languages,

however,

and the ef¬

restricted to

a

class of

quite efficient general context-free languages. These algo¬ rithms, sometimes also called tabular parsing methods [AU72], belong to the family of chart parsing algorithms, which are especially well-suited to parse natural languages.

algorithms

In the

starting

which

can

following,

a

parse

we

present four top-down rule invocation strategies,

with the most

simple but leeist efficient one, which is a pure topThe three strategies use the FIRST and FOLLOW other strategy. relations to pmne search paths which do not lead to a parse. down

Strategy

Tl

(top-down)

Strategy

Tl is the

simplest top-down

rule

invocation strategy discussed here. After initializing the chart with an inactive edge [i,i + l,Cj,Cj,e], for eeich input word a< of category Ci,

edge [1,1, S,e, X] is added to the chart for each transition (qs,X,p). hypotheses predict that the input string will be parsed as a constituent of type S. The top-down parser proeeeds as follows: For every pedr of active and inactive edges, the fundamental rule is apphed. In addition, each time an active edge "seeking" as next symbol a nonterminal X is eidded to the chart, an empty active edge of category X is eidded to the chart at the vertex where the active edge ends (unless it is already in the chart). The fundamental mle and the prediction of new, empty active edges are apphed until no more edges a new

active

These initial

can

be added to the chart.

If the chart contains

one or more

inactive

edges of type 5 (i.e., [l,n,5,a, e]) that span the entire chart, the input string has been recognized. Otherwise, the string does not belong to the language defined by the grammar. Figure 3.7 shows a simplified version of the recognition algorithm. In

implementation, an edge can be extended to contain a parse tree, thereby turning the recognizer into a parser. In addition, an effi¬ cient indexing scheme can be used instead of a simple hst to maintain the set of active and inactive edges. Furthermore, a second data stmc¬ ture called an agenda can be used to störe the tuples of active and inactive edges to which the fundamental rule is to be apphed. Depend¬ ing on whether the agenda is implemented as a Stack, queue or sorted list, the algorithm behaves as a depth-first, breadth-first or heuristic an

3.3.

Chart

Algorithm:

65

Parsing

3

Input: A recursive transition network (RTN) -1-203 ...an with a< 6 input string w =

Output: An inactive edge [1, n Method: Initialize the set of

steps (2)

and

(3)

until

1,5, a, e]

+

an

active

edge

to set I

fedlure

input string, add an edge edges I. For each transition and p 6 Qs of Ms, add an edge of

[i,j, A,a,B] is eidded to set /, (qß,X,p) of Mb, a new active edge ej

=

(unless be

=

tive

Figure

this

edge is already

an

inactive

in set

Top-down

chart

I).

For each

=

3.7:

an

S

edge. [i,j,A,a, e] edge efc [k,i,B,ß, A] of set _" and for each (6*(QB,ßÄ),X,p), add a new edge [k,j,B,ßA,X]

3. Let et-

and

of the

every transition

[_/,_•',5,e,__]

(N, S, M, S)

by performing step (1). Repeat edges can be added to the set I.

[i,i + l,Ai,Ai,e] to the set (qs, X,p) with X € (S U N) [1,1,5, €,__"] toset/. add, for

=

I

edges

no new

1. For every terminal et,-

2. Whenever

or

R

parsing algorithm

ac¬

transition to set L

66

Chapter

search

3.

Algorithms

algorithm.

Strategy T2 (top-down with selectivity) Grammars for natural languages tend to have a large branching faetor, as, for a nonterminal A, there are frequently several ruies which expand A. It is often possible to restrict the number of alternatives if it is known which set of terminals can

derive the first nonterminal of the

network).

transition

This is

right-hand

side of

exactly the information

a

a

rule

(or

predictive top-

down parser uses to select one of a set of alternative ruies [ASU86]. Each time the parser enters a transition network of category A, each

edge [.,.,__,e,.B] is tested to see whether it cem derive the input symbol a,- by examining whether a< e FIRST(B). Therefore, step (2) two of the top-down algorithm of Figure 3.7 is modified as follows:

active

[i,j, A, a, B] is added to set /, eidd, for every transition (qi,,X,p) of Mb, a new active edge [j,j,B,e,X] to set I if aj £ FIRST(B). Whenever

an

active

edge

e

=

Remark:

This strategy is similar to

ever, it is

more

predictive LL(k) parsing.

How¬

general because it parses all context-free grammars. It to Kay's "directed top-down" scheme [Kay82], a directed

corresponds top-down strategy that uses the FIRST relation to test whether the next input symbol is in the FIRST set of the active edge each time an empty active edge is created.

Strategy T3 (top-down with lookahead) The use of the FIRST relation significantly reduces the number of useless active edges. The applieation of the FOLLOW relation can be used in a similar way to reduee the number of useless inactive edges. This is important, since inactive edges not only use storage but may also trigger new active edges. Each time an inactive edge [i,j,A,a,e] is added to the chart, it is tested whether the next input symbol aj (to the right) is in the FOLLOW set of the nonterminal A. Therefore, step (3) of the top-down algorithm (see Figure 3.7) is modified as follows: Let e,-

edge

=

ej_

[i,j,A,a,e] be [k,i,B,ß,A],

=

an

if

inactive

edge.

S*(qs,ßA)

£

For each active

Fß (a final

State

3.3.

Chart

Parsing

67

reached) and aj £ FOLLOW(B), add an inactive edge [k, j,B,ßA,c] to set I; eise, if 6*(qß,ßA) & Fb (non-final state reached), add, for each transition (6*(qs,ßA),X,p), an active edge [k, j,B,ßA,X] to set I. is

Remark: This

with

a

strategy corresponds one-symbol lookahead.

Strategy

T4

most directed

gies

(top-down strategy

to the

algorithm

with lookahead and

is obtained

T2 and T3. This leads to

a

of

Earley [Ear72]

selectivity)

by combining algorithm

very efficient

FIRST and FOLLOW relations whenever

The

the features of strate¬

active

that

uses

the

inactive

edge is added to the chart. Steps (2) and (3) of the top-down algorithm of Figure 3.7 are replaced by steps (2) and (3) of strategies T2 and T3, respectively. Remark:

an

or an

Predictive

top-down parsing has been proposed by several ([Kay82], [Wir87]). Top-down parsing with lookahead is described by Earley [Ear72]. However, the combination of prediction and lookahead has never been studied. Based on our experiments (see Chapter 4), a most directed strategy, such as strategy T4, seems to outperform other strategies. researchers

3.3.2

Bottom-Up Strategies

Bottom-up strategies can be considered to construet a parse tree for an input string beginning at the leaves (bottom) and working up to the root (top). Shift-reduce algorithms [ASU86] are among the bestknown bottom-up strategies that reduee an input string to the start symbol by creating a right-most derivation in reverse. A subclass of the shift-reduce family often used to implement parsers for programming languages are the LR(k) algorithms, which are basically non-backtrack shift-reduce parsers whose shift and reduee actions are guided by an FA. Besides the general backtracking-based bottom-up algorithms capable of handling all context-sensitive languages and the special shift-reduce algorithms capable of handling only a subset of the context-free lan¬ guages (called LR languages), there are quite efficient algorithms to recognize general context-free languages. These algorithms belong to

Chapter

68

3.

Algorithms

parsing methods [AU72]. In the foUowing, we de¬ algorithm, a type of bottom-up rule invocation strategy. We start with the simplest but least efficient one and continue with improved versions. the class of tabular

scribe four variants of the left-corner

Algorithm:

4

Input: A recursive tremsition network RTN R _ia2a3... an with _,• £ S input string w

=

(_V,E,M,5)

and

an

=

Output: An inactive edge [1,»' + l,5,a,e] Method: Initialize the set of

steps (2) and

(3)

until

edges

no new

or

failure

by performing step (1). Repeat edges cein be added to the set I. /

input string, add an edge [*,»' + input items _,- and for of all Mb € Af, add a new active (qB,A,p)

1. For every terminal a,- of the

1, .li, Ai,e]

to the set of edges I. For all

each transition

edge [i, i, B, c, Ai]

to set i\

[i,j, A, a, e] is ewlded to set I, add, for every tremsition (qß,A,p) of Mb £ M, a new active edge [i,i,B,e,A] to set I (unless this edge is edready in set

2. Whenever

an

inactive

edge

e

=

[i,j,A,a,e] be an inactive edge. For each ac¬ [k,i,B,ß,A] of set / and for each transition (6*(QB,ßA),X,p), add a new edge [k,j,B,ßA,X] to set I.

3. Let e,tive

=

edge

Figure

et,

=

3.8:

Bottom-up chart parsing algorithm

Strategy Bl (left-corner) Before describing the left-corner algo¬ rithm, we introduce some terminology. The left corner of a rule is the leftmost symbol (terminal or nonterminal) on the right side. Similar, the left

corner

terminals

a

of

a

transition network is the set of terminals and

network

can

start with.

non-

We often refer to the transitive

using the term reachability relation parsing is to index each transi¬ tion network by its left corners. When a phrase is found, networks that have that phrase as their left corner eure tried in turn by looking for closure of the left-corner relation as

well.

The basic idea of left-corner

3.3.

Chart

phrases

Parsing

69

that span

remaining paths through the network. Roughly, in peirsing, the left comer of a transition network is recognized bottom-up and the remainder of the network is recognized top-down. Figure 3.8 shows the algorithm for left-corner parsing. A left-corner parser traverses the parse tree bottom-up and inorder. left-corner

Strategy strategies

(bottom-up

B2

with

top-down filter)

Bottom-up

often propose constituents that do not match higher-level constituents. This is a severe problem for grammars that have many common right factors. If, for example, the NP network has two paths

which derive det the

noun

and noun, this network is triggered twice on (the man), once on der and once on Mann.

der Mann

input string Bottom-up parsers are overproductive in edges that do not attach to phrases on the left. Directed bottom-up parsing avoids this problem by a teehnique that is the dual of predictive parsing. Directed bottom-up parsing is somewhat like running a top-down parser in parallel. Each time an inactive edge is added to the chart, it is tested whether there is an active edge at the start-vertex of the inactive edge which can be extended by the inactive edge. Step (2) of the bottom-up algorithm is modified

as

follows:

Whenever

an

[i,i,B,e,A]

edge [i, j, A, a, e] is added to set I, add, (g_j, A,p) of Mb £ M, a new active edge i" if there is em active edge [k,i,C,a,D]

inactive

for each transition to set

and D$tA.

(bottom-up with lookahead) Left-corner parsing optimized in another way by using a kind of lookahead similar to that of strategy T3. Each time an inactive edge is added to the chart, it is tested whether the next input symbol to the right of the inactive edge is in the FOLLOW set of that edge. Step (2) of the bottom-up algorithm is modified as follows: Strategy

can

B3

also be

edge e [i,j, A, a, e] is added to set I for every transition (qs,A,p) FOLLOW(A), add, a active new M, edge \j, k,B,e, A] to set I.

Whenever and aj £ of Mb £

an

inactive

=

Chapter

70

Strategy

head)

B4

3.

Algorithms

(bottom-up

The most efficient

with top-down filtering and looka¬ bottom-up algorithm is obtained by com¬

bining the top-down filter of strategy B2 and the lookahead of strategy B3. Step (2) of the algorithm of Figure 3.8 is modified in the following way:

edge e [i,j,A,a,e] is added to set I for add, FOLLOW(A), every transition (qu, A,p) a active new M, edge [i,i,B,e,Ä] to set I if there active edge [k, i, C, a, D] and DMA.

Whenever

an

inactive

=

and _j £ of Mb £ is

an

Remark: This strategy is similar to Tomita's extended version of the edgorithm [Tom86] which can be used to parse general context-free

LR

languages.

Computational Complexity

3.3.3

presented eight rule invocation strategies parsing. In this section, we discuss the computational complexity of chart parsing, i.e., its worst-case asymptotic time and space complexity. Time complexity is a measure for the number of elementary mechanical Operations executed as a funetion of the input. Space complexity is a measure of the memory that is required to störe intermediate results as a funetion of the size of the input. To indicate complexity, we use the 0-notation9. In order to analyze the complexity of chart parsing, we restate the algorithm in a form revealing the parallelism between context-free parsing emd matrix multiplication. This was originally shown by Martin et al. [MCP87]. Without loss of generedity, we assume that the grammeir is in Chomsky Normal Form [AU72]. Edges between vertex -i and vertex Vj consist of all possible combinations of edges from vertices u,- to vk and edges from vertices vk to Vj as created by the applieation of the fundamental rule of chart parsing. In the

previous sections,

we

within the framework of chart

9The use of the O-notation for upper bound wipes out constants from complexity For example, an algorithm with complexity 8n3 + 5n is O(n'). More formally, we say that a funetion / is "of order g" ox 0(g) iff there exists positive constants c and fc such that, for all n > fc, |/(n)| < c|_f(n)|. formulas.

3.3.

Chart

Parsing

71

chart(i,j) The chart

:=

Ui"

"{"



char>

symbol>

expr>

Appendix

Syntax

B

of UTN

Formalism



::=

_}

:: "(" -[} ")" =



::=

::=

":term"

| ":graph"





-(}

::=

"(" {} ")"



::=

::=

"(" ")"

"(" {} ")"

107

Appendix

108



I

Syntax of

B.

UTN Formalism



::=



I |

::=

"(" "cat" ")"

::=

"(" "call" ")"

::= "(" "jump" I

")"

::=

"(" "reply" ( ) ")"

pairs>

::=

feature>

::=