University of Pennsylvania
ScholarlyCommons Technical Reports (CIS)
Department of Computer & Information Science
5-1-1981
Construction Methods of LR Parsers Karl Max Schimpf University of Pennsylvania
Follow this and additional works at: http://repository.upenn.edu/cis_reports Part of the Electrical and Computer Engineering Commons Recommended Citation Karl Max Schimpf, "Construction Methods of LR Parsers", . May 1981.
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-80-40. This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_reports/725 For more information, please contact
[email protected].
Construction Methods of LR Parsers Abstract
This paper presents five different LR parser generators and an error recovery method which is derived directly from the LR parser. The parsers presented include the original LR(1) parser defined by Knuth, the SLR(1) and LALR(1) parsers defined by DeRemer, and the weak and strong compatible LR parsers presented by Pager. All five parsers have been implemented by the author using two programs. Furthermore, the implementation of the SLR(1) parser generator includes an error recovery method and produces an SLR(1) parser with error recovery built in. Disciplines
Electrical and Computer Engineering Comments
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-80-40.
This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/725
UNIVERSITY OF PENNSYLVANIA THE MOORE SCHOOL OF ELECTRICAL ENGINEERING SCHOOL OF ENGINEERING AND APPLIED SCIENCE CONSTRUCTION METHODS OF LR PARSERS Karl Max Schimpf Philadelphia, Pennsylvania May 1981
A thesis presented to the Faculty of Engineering and Applied Science of the University of Pennsylvania in partial fulfillment of the requirements for the degree of Master of Science in Engineering for graduate work in Computer and Information Science.
Aravind Joshi
Abstract

This paper presents five different LR parser generators and an error recovery method which is derived directly from the LR parser. The parsers presented include the original LR(1) parser defined by Knuth, the SLR(1) and LALR(1) parsers defined by DeRemer, and the weak and strong compatible LR parsers presented by Pager. All five parsers have been implemented by the author using two programs. Furthermore, the implementation of the SLR(1) parser generator includes an error recovery method and produces an SLR(1) parser with error recovery built in.
Table of contents

Chapter I: Introduction
Chapter II: The construction of LR(1) parsing tables
  II.1 LR(1) grammars
    II.1.1 Derivations
    II.1.2 Languages generated by context free grammars
  II.2 Sentential forms and their viable prefixes
  II.3 LR(1) characteristic automata
  II.4 Construction of LR(1) parsers
Chapter III: Methods for reducing states in LR(1) parsers
  III.1 SLR(1) parsers
  III.2 LALR(1) parsers
  III.3 Pager's weak compatibility
  III.4 Pager's strong compatibility
Chapter IV: An error recovery method for LR parsers
Chapter V: Implementation
  V.1 Representation of the parsing tables
  V.2 SLR(1) implementation
    V.2.1 Input grammar
    V.2.2 Running the SLR(1) parser constructor
    V.2.3 Interpretation of the output file
    V.2.4 Conflict resolution
    V.2.5 Size restrictions
  V.3 LR(1), LALR(1), weak and strong compatible parser generators
    V.3.1 Input grammar
    V.3.2 Running the program
Appendix A: Sample PASCAL skeleton for use of the SLR(1) parsing tables
Appendix B: Sample PASCAL skeleton for use of the LR(1), LALR(1), weak and strong compatibility parser generators
References
Chapter I
Introduction

It is a well known fact that, of all the deterministic string parsers, LR parsers recognize the largest class of context free languages [Knu65]. LR parsers are quite powerful and are able to recognize virtually all programming languages in existence today. These parsers were first introduced by Knuth [Knu65] with his original version known as an LR(1) parser. Unfortunately, his method requires extensive resources and hence is impractical to use for parsing any programming language.

Several alternative parsing methods have since been presented which reduce the resource requirements, thus producing more practical LR parsers. Some of these parsers accomplish this result by reducing the class of languages accepted by the parsers. The result is a reduction in the number of parse states built and hence an overall reduction in the resource requirements. The most common forms of this type of LR parser are the SLR(1) and LALR(1) parsers presented by DeRemer [DeR69].

Another form of resource reduction used by LR parsers is a method of performing state minimization on the LR(1) parser. Two of these state minimization methods have been proposed by Pager [Pag77a, Pag77b], called weak and strong compatible LR parsers. In these parsers, he restricts the state reductions to maintain the power of the LR(1) parser, and hence the resultant parser recognizes the same class of languages as the original LR(1) parser.

This paper presents five different LR parser generators and an error recovery method which is derived directly from the LR parser. The parsers presented include the original LR(1) parser defined by Knuth [Knu65], the SLR(1) and LALR(1) parsers defined by DeRemer [DeR69], and the weak and strong compatible LR parsers presented by Pager [Pag77a]. All five parsers have been implemented by the author using two programs. Furthermore, the implementation of the SLR(1) parser generator includes an error recovery method and produces an SLR(1) parser with error recovery built in.

The method of construction of the weak and strong compatible LR parsers, presented by Pager [Pag77a], unfortunately only provides a partial explanation of the algorithms which build these parsers. These algorithms also contain minor inconsistencies and omissions which tend to obscure the basic nature of the algorithms. This paper presents Pager's algorithms in a modified notation which simplifies the comprehension of the code. It also provides a more complete explanation of the algorithms, and includes a few minor algorithms omitted by Pager.

The problem with LR parsers, when used in a compiler, is that they are designed as a syntactic method which only decides if the given input string belongs to the language in the class accepted by the LR parser. Hence, once the first illegal input symbol is found, the parser stops, reporting failure. However, when a compiler parses a program, it is advantageous to have the compiler report as many additional errors as possible.

In order to improve the LR parser's capabilities for use in a compiler, this paper also presents a purely syntactic error recovery scheme to recognize additional errors. Furthermore, the method has been designed so that it can be directly incorporated into the LR parser. Hence, no additional routines are necessary in order to perform error recovery and parse the rest of the input.

The method used in this paper to handle error recovery is based on the method used by Pennello and DeRemer [P&D79], which has a separate error recovery routine that includes error correction. The control strategy used is to search the remainder of the input, starting from the illegal symbol, and verify that it only consists of "viable fragments" (substrings derivable from its grammar). The error recovery method presented in this paper has been implemented using the SLR(1) parser as its basis. However, the method is general enough that the same method could easily be applied to any of the other LR parsers presented in this paper.

Chapter two starts by setting up preliminary notation for context free languages and derivations. This notation is used to describe the basic strategy used by LR parsers. The last sections of the chapter cover the actual construction methods which will yield the LR(1) parser as its result.

Chapter three describes how each of the other four implemented parser constructors is built. The SLR(1) and LALR(1) construction methods are presented using the LR(0) characteristic automaton as their basis for construction. Pager's notion of compatibility, the definitions of both weak and strong compatibility, and the algorithms used in conjunction with the construction of these two parsers are also described.

Chapter four discusses the error recovery method and an algorithm which takes in an LR parser and produces an LR parser with error recovery. It also explains how an LR parser is used to parse an input string and decide if the string is derivable from the grammar used to generate the LR parser.

Chapter five concludes the paper by briefly discussing the two programs used for the implementation. One program constructs an SLR(1) parser with error recovery built in. The other program, using our modification of Pager's concept of compatibility, can build either an LR(1), LALR(1), weakly or strongly compatible LR parser.
Chapter II
The construction of the LR(1) parsing tables

This chapter describes how LR(1) parsing tables are created. In order to do this, let me start out by setting up some preliminary notation.
II.1 LR(1) Grammars

A Context-Free Grammar (denoted CFG) G is a quadruple G = (N, T, P, S) where
  T is a finite alphabet of terminal symbols;
  N is a finite alphabet of nonterminal symbols;
  (N ∪ T) is the finite set of grammar symbols;
  S is a nonterminal symbol in N, called the start symbol; and
  P is a finite set of pairs \((A, \underline{a})\), called productions, such that \(A \in N\) and \(\underline{a} \in (N \cup T)^{*}\).

A production \((A, \underline{a})\) will be denoted in the form \(A \to \underline{a}\). Also, there is a special start production S -> S' where S' ∈ N and S does not occur in any other production in P. There is also a special symbol $ ∈ T, which denotes the end of the string being parsed, and does not appear in any production.

For notational convenience, upper case letters will be used to denote nonterminal symbols, lower case letters to denote terminal symbols, underlined upper case letters to denote grammar symbols, and underlined lower case letters to denote strings of grammar symbols (strings in \((N \cup T)^{*}\)). The symbol ε will be reserved to denote the empty string.

II.1.1 Derivations

Given a CFG G = (N, T, P, S), let the relation \(\Rightarrow\ \subseteq (N \cup T)^{*} \times (N \cup T)^{*}\) be defined by the set of pairs

\[ \{\, (\underline{a}B\underline{c},\ \underline{a}\,\underline{b}\,\underline{c}) \mid B \in N;\ \underline{a}, \underline{b}, \underline{c} \in (N \cup T)^{*};\ B \to \underline{b} \text{ in } P \,\} \]

In other words, given any string \(\underline{a}B\underline{c}\) in \((N \cup T)^{*}\), with nonterminal symbol B in N and production \(B \to \underline{b}\) in P, we say that the string \(\underline{a}B\underline{c}\) derives the string \(\underline{a}\,\underline{b}\,\underline{c}\) in a one step derivation using \(B \to \underline{b}\). This will be denoted as \(\underline{a}B\underline{c} \Rightarrow \underline{a}\,\underline{b}\,\underline{c}\). Also, let \(\Rightarrow^{+}\) and \(\Rightarrow^{*}\) denote the transitive and transitive reflexive closures of \(\Rightarrow\), respectively.
From the above relation, we can define another relation which implies an ordering of the rewrite steps. Let this new relation \(\Rightarrow_R\ \subseteq (N \cup T)^{*} \times (N \cup T)^{*}\) be defined as the set

\[ \{\, \underline{a}B\underline{c} \Rightarrow_R \underline{a}\,\underline{b}\,\underline{c} \mid \underline{a}B\underline{c} \Rightarrow \underline{a}\,\underline{b}\,\underline{c} \text{ and } \underline{c} \in T^{*} \,\} \]

In other words, \(\Rightarrow_R\) is the one step derivation, when the derivation is applied to the rightmost nonterminal occurring in the string \(\underline{a}B\underline{c}\). Let \(\Rightarrow_R^{+}\) and \(\Rightarrow_R^{*}\) denote the transitive and transitive reflexive closures of \(\Rightarrow_R\), respectively.
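
As a minimal sketch of the one step rightmost derivation, the Python fragment below rewrites the rightmost nonterminal of a sentential form with every production that applies. The grammar S -> A, A -> aAb, A -> ε and the names `PRODS`, `NONTERMS` and `rightmost_steps` are assumptions made for illustration only; they are not part of the thesis implementation.

```python
# Hypothetical grammar used only for illustration:
#   S -> A,  A -> a A b,  A -> (empty string)
PRODS = [("S", ("A",)), ("A", ("a", "A", "b")), ("A", ())]
NONTERMS = {"S", "A"}

def rightmost_steps(form):
    """Return every sentential form reachable from `form` in one
    rightmost derivation step (an empty list if `form` is all terminals)."""
    # Scan from the right so that everything after position i is terminal.
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NONTERMS:
            return [form[:i] + rhs + form[i + 1:]
                    for lhs, rhs in PRODS if lhs == form[i]]
    return []
```

For example, calling `rightmost_steps(("a", "A", "b"))` rewrites the single nonterminal A with each of its two productions.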
II.1.2 Language generated by a context-free grammar

Given a CFG G = (N, T, P, S), the language L(G) generated by G is the set of strings

\[ L(G) = \{\, \underline{a} \mid S \Rightarrow^{*} \underline{a},\ \underline{a} \in T^{*} \,\} \]

Note: The order in which \(\Rightarrow\) is applied has no effect on the resulting terminal string produced. Hence the language L(G), generated by G, could alternatively be defined as the set

\[ L(G) = \{\, \underline{a} \mid S \Rightarrow_R^{*} \underline{a},\ \underline{a} \in T^{*} \,\} \]
Using the above definitions, an LR(1) grammar can be loosely defined as follows:

An LR(1) grammar is a CFG G, such that each string \(\underline{a} \in L(G)\) (derived via a rightmost derivation) can be parsed deterministically in a single scan from left to right, having the ability to look ahead one symbol from the point of scanning.
II.2 Sentential forms and their viable prefixes

An LR(1) parser, when scanning the input (of the string to be parsed), is essentially looking for a match with one or more strings that can be derived from the CFG's start symbol. More formally, the LR(1) parser is trying to recognize a sentential form, which is an element in the set

\[ \{\, \underline{a} \mid S \Rightarrow_R^{*} \underline{a} \text{ and } \underline{a} \in (N \cup T)^{*} \,\} \]

In recognizing a sentential form, the LR(1) parser is really interested in knowing whether it has scanned enough of the input string such that a reduction can be performed, that is, when the sentential form is the string \(\underline{a}\,\underline{b}\,\underline{c}\) where \(\underline{a}, \underline{b} \in (N \cup T)^{*}\); \(\underline{c} \in T^{*}\); and \(B \to \underline{b} \in P\). Knowing this information, a reduction of \(\underline{b}\) to B can be made to get the rightmost derivation string that the sentential form came from. This is known as finding the handle. The handle is defined as the pair \((|\underline{a}\,\underline{b}|,\ B \to \underline{b})\) such that

\[ S \Rightarrow_R^{*} \underline{a}B\underline{c} \Rightarrow_R \underline{a}\,\underline{b}\,\underline{c}. \]

The \(|\underline{a}\,\underline{b}|\) of the handle denotes the length of the string \(\underline{a}\,\underline{b}\), which states the position where \(\underline{b}\) can be reduced to B using \(B \to \underline{b}\). The string \(\underline{a}\,\underline{b}\) is called the viable prefix or characteristic string [A&U77].

Using the above definitions, it is fairly easy to characterize what an LR(1) parser does. It scans the input, from left to right, looking for a viable prefix. Once it finds one, the string is reduced with the corresponding production of the viable prefix. Using the reduced string derived from the viable prefix, concatenated with the unscanned input, the parser repeats the above process looking for another viable prefix. This continues until either the input has been reduced to the start symbol, or failure occurs by not finding any legal viable prefixes.
II.3 LR(1) Characteristic Automaton

It is a fundamental result that the viable prefixes derived from CFG's are regular. Therefore a deterministic finite automaton, called the characteristic automaton, can be built to recognize the set of legal viable prefixes for a CFG. Furthermore, once the characteristic automaton has been built, the LR(1) parser can be directly derived from it.
Let a marked production be of the form \(A \to \underline{a} \,.\, \underline{b}\) where \(A \to \underline{a}\,\underline{b}\) is a production in P, and "." is assumed to be a symbol not in the set of grammar symbols (N ∪ T). These marked productions will be used to denote "how much" of the production's right hand side has been scanned. Hence the marked production \(A \to \underline{a} \,.\, \underline{b}\) represents the fact that the LR(1) parser has recognized the string \(\underline{c}\,\underline{a}\) in the string being scanned, where \(\underline{c}\) is some string that occurred before the string \(\underline{a}\) in the input.

Expanding this to include a set of look-ahead symbols, let an item be defined as the pair \([A \to \underline{a} \,.\, \underline{b},\ LA]\) where \(A \to \underline{a} \,.\, \underline{b}\) is a marked production, and LA is a subset of T denoting the set of all terminal symbols which can follow the production. LA is called the set of lookahead symbols.

Items, essentially, describe two things:
i) What portion of a production's right hand side can occur at the end of the set of viable prefixes being described
ii) What possible symbols can immediately follow the production's right hand side (and hence what symbols can follow the viable prefix with the given production).

Each state of the characteristic automaton is the set of all items with the same viable prefix. When building a state, there must be a way to ensure that all items, for a given state, are included. For example, if there is an item in the state with the marked production \(A \to \underline{a} \,.\, B\underline{b}\) and \(B \to \underline{c}\) is in P, then there must be an item with the marked production \(B \to \,.\, \underline{c}\) for that state. The new marked production will have the same viable prefix as the original item. The process of including all such items is called closing the state. However, in order to close a state, it is also necessary to describe how to propagate lookaheads to the added items. To do this, define the function first as follows:

\[ first(\underline{a}) = \{\, a \mid \underline{a} \Rightarrow^{*} a\underline{s},\ a \in T \,\} \]
Using the above definition, the closure of a set of items I (denoted as closure(I)) can be constructed using the rules:
i) Every item in I is also in closure(I)
ii) If the item \([A \to \underline{a} \,.\, B\underline{b},\ LA]\) is in closure(I), \(B \to \underline{c}\) is in P, and \(a \in LA\), then the item \([B \to \,.\, \underline{c},\ first(\underline{b}a)]\) is in closure(I).
example 2.1

Let the CFG G have the set of productions:

  S -> A
  A -> aAb
  A -> ε

where S -> A is the start production. Then the closure of the item set {[S -> . A, {$}]} is the set

  { [S -> . A, {$}], [A -> . aAb, {$}], [A -> . , {$}] }
The characteristic automaton of G is built from the set of states constructed above, with the transitions being grammar symbols. The path to a given state will then spell a legal prefix for some sentential form. The algorithm (shown below) starts by setting the initial state to the closure of the start production. Then, taking each state just built, it determines the transitions from the state as follows:

i) for each grammar symbol \(\underline{X}\) in (N ∪ T), if the item \([A \to \underline{a} \,.\, \underline{X}\,\underline{b},\ LA]\) is in the state, there is a unique transition, labeled \(\underline{X}\), to the state containing the item \([A \to \underline{a}\,\underline{X} \,.\, \underline{b},\ LA]\) obtained by shifting the dot across the grammar symbol \(\underline{X}\).
ii) if \([A \to \underline{a} \,.\,,\ LA]\) is in the state, then no transition should be produced for that item.
Algorithm for constructing the characteristic automaton

input: a CFG G = (N, T, P, S)
output: a set C, of states, and the function
  GOTO : (set of items) x (N ∪ T) -> (set of items),
  which defines the characteristic automaton.
method: The two procedures below, initiated by calling ITEMS(G);

procedure ITEMS(G);
begin
  C := { closure({[S -> . S', {$}]}) };
  {where "$" is a unique symbol in T which denotes the end of the string to parse}
  repeat
    for each set of items I in C, and each grammar symbol \(\underline{X}\)
        such that J = GOTO(I, \(\underline{X}\)) is not empty and J ∉ C do
      add J to C;
  until no more sets of items can be added to C
end;

function GOTO(I, \(\underline{X}\));
begin
  let J be the set of items \([A \to \underline{a}\,\underline{X} \,.\, \underline{b},\ LA]\)
      such that \([A \to \underline{a} \,.\, \underline{X}\,\underline{b},\ LA]\) is in I;
  return closure(J);
end;
Let the core of a state be the set of items in the state that have either of the two following forms:
i) [S -> . S', {$}]
ii) \([A \to \underline{b} \,.\, \underline{c},\ LA]\) where \(\underline{b} \neq \varepsilon\)

It can be shown that by closing the core of a state the original state can be retrieved. Hence, all examples in this paper will only show the core of each state.
example 2.2 Construction of a characteristic automaton

Let the CFG G be defined by the same set of productions as in example 2.1. Then, the LR(1) characteristic automaton of the grammar G is as follows:

[Figure: LR(1) characteristic automaton of G, states 1 to 8, with arcs labelled by grammar symbols]

where the transition arcs are defined by GOTO.
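
The closure rules and the ITEMS and GOTO procedures can be sketched in Python as follows. This is only an illustration under stated assumptions: the grammar is taken to be S -> A, A -> aAb, A -> ε; an item is stored as a triple (production index, dot position, single lookahead terminal), so an item with a lookahead set LA corresponds to one triple per element of LA; and all names (`PRODS`, `first_of`, `closure`, `goto`, `items`) are hypothetical. The state numbering it produces need not match the numbering used in the examples.

```python
# Assumed example grammar: S -> A, A -> a A b, A -> (empty string).
PRODS = [("S", ("A",)), ("A", ("a", "A", "b")), ("A", ())]
NONTERMS = {"S", "A"}
SYMBOLS = ["S", "A", "a", "b"]          # all grammar symbols

def nullable_and_first():
    """Fixed-point computation of the nullable nonterminals and their first sets."""
    nullable, first = set(), {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            for x in rhs:
                new = first[x] if x in NONTERMS else {x}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                if x not in nullable:
                    break
            else:                        # every symbol of rhs can vanish
                if lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
    return nullable, first

NULLABLE, FIRST = nullable_and_first()

def first_of(seq):
    """Terminals that can begin a string derived from the symbol sequence seq."""
    out = set()
    for x in seq:
        out |= FIRST[x] if x in NONTERMS else {x}
        if x not in NULLABLE:
            break
    return out

def closure(item_set):
    """Rule i) keeps the given items; rule ii) adds [B -> . c, first(b a)]."""
    result, work = set(item_set), list(item_set)
    while work:
        p, dot, la = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[dot]:
                    for t in first_of(rhs[dot + 1:] + (la,)):
                        if (q, 0, t) not in result:
                            result.add((q, 0, t))
                            work.append((q, 0, t))
    return frozenset(result)

def goto(state, x):
    """Shift the dot across x in every item where x follows the dot."""
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(PRODS[p][1]) and PRODS[p][1][d] == x}
    return closure(moved) if moved else frozenset()

def items():
    """Build the set of states C and the transition function (as a dict)."""
    states, trans = [closure({(0, 0, "$")})], {}
    for i, state in enumerate(states):   # the list grows while it is scanned
        for x in SYMBOLS:
            j = goto(state, x)
            if j:
                if j not in states:
                    states.append(j)
                trans[(i, x)] = states.index(j)
    return states, trans
```

Under these assumptions `items()` yields eight states for the example grammar, which agrees with the number of rows in the LR(1) parsing tables of this chapter.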
II.4 Construction of LR(1) Parsers

Using the characteristic automaton, the LR(1) parser can be directly generated. Let an LR(1) parser be defined as a quintuple M = (K, action, goto, G, start) where
  K is a finite set of parser states;
  action : K x T -> {shift j | j ∈ K} ∪ {reduce p | p ∈ P} ∪ {error} defines the parsing action table;
  goto : K x N -> K ∪ {error} defines the parsing goto table;
  G is a CFG such that L(G) is the language to recognize; and
  start is the initial state.

The set of parser states K contains a special state H, which is the accept state, such that action(H,$) = reduce S -> S'. Also, the action and goto parsing tables are enough to define an LR(1) parser. Using this definition, an LR(1) parser can be constructed using the following algorithm [A&U77, Gal79]:
Algorithm for constructing LR(1) parsing tables

input: The characteristic automaton CG = (C, GOTO) for a CFG G;
output: a parsing table (possibly with conflicts if the grammar G is not LR(1))
method: Let C = {\(I_1, I_2, \ldots, I_n\)} be a set of sets of items from the characteristic automaton CG. The states of the parser will be labelled 1, 2, ..., n where state i corresponds to the set of items \(I_i\). State 1 is the initial state.

The parsing actions are:
i) If \([A \to \underline{a} \,.\, a\underline{b},\ LA] \in I_i\) where a ∈ T and GOTO(\(I_i\), a) = \(I_j\), then action(i,a) = shift j
ii) If \([A \to \underline{a} \,.\,,\ LA]\) is in \(I_i\), then for each a ∈ LA, set action(i,a) = reduce \(A \to \underline{a}\)
iii) All entries of action not defined by the above rules are set to error.

The goto transition for state i is constructed using the two rules:
i) if GOTO(\(I_i\), A) = \(I_j\) where A is a nonterminal, then goto(i,A) = j
ii) All other entries of goto, not defined by the first rule, are set to error
example 2.3

Let the LR(1) characteristic automaton be defined as in example 2.2. Using the above algorithm, the two parsing tables produced are:

action
    +---------------+---------------+---------------+
    |       a       |       b       |       $       |
+---+---------------+---------------+---------------+
| 1 | shift 3       | error         | reduce A->ε   |
+---+---------------+---------------+---------------+
| 2 | error         | error         | reduce S->A   |
+---+---------------+---------------+---------------+
| 3 | shift 4       | reduce A->ε   | error         |
+---+---------------+---------------+---------------+
| 4 | shift 4       | reduce A->ε   | error         |
+---+---------------+---------------+---------------+
| 5 | error         | shift 7       | error         |
+---+---------------+---------------+---------------+
| 6 | error         | shift 8       | error         |
+---+---------------+---------------+---------------+
| 7 | error         | error         | reduce A->aAb |
+---+---------------+---------------+---------------+
| 8 | error         | reduce A->aAb | error         |
+---+---------------+---------------+---------------+

goto
    +---------------+---------------+
    |       S       |       A       |
+---+---------------+---------------+
| 1 | error         | 2             |
+---+---------------+---------------+
| 2 | error         | error         |
+---+---------------+---------------+
| 3 | error         | 5             |
+---+---------------+---------------+
| 4 | error         | 6             |
+---+---------------+---------------+
| 5 | error         | error         |
+---+---------------+---------------+
| 6 | error         | error         |
+---+---------------+---------------+
| 7 | error         | error         |
+---+---------------+---------------+
| 8 | error         | error         |
+---+---------------+---------------+
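
To illustrate how such tables drive the scan-and-reduce process of section II.2, here is a minimal Python sketch of a table driven recognizer. The `ACTION` and `GOTO` dictionaries are a transcription of the tables of this example as far as they can be read (the entry for state 4 on b is assumed to follow from rule ii of the algorithm), missing entries stand for error, and the names `ACTION`, `GOTO` and `parse` are hypothetical. A reduction is stored as ("reduce", left hand side, length of right hand side).

```python
# Action table: (state, terminal) -> action; absent keys mean "error".
ACTION = {
    (1, "a"): ("shift", 3), (1, "$"): ("reduce", "A", 0),
    (2, "$"): ("reduce", "S", 1),
    (3, "a"): ("shift", 4), (3, "b"): ("reduce", "A", 0),
    (4, "a"): ("shift", 4), (4, "b"): ("reduce", "A", 0),
    (5, "b"): ("shift", 7),
    (6, "b"): ("shift", 8),
    (7, "$"): ("reduce", "A", 3),
    (8, "b"): ("reduce", "A", 3),
}
# Goto table: (state, nonterminal) -> state; absent keys mean "error".
GOTO = {(1, "A"): 2, (3, "A"): 5, (4, "A"): 6}

def parse(text):
    """Return True if `text` (a string over {a, b}) is accepted."""
    tokens = list(text) + ["$"]      # "$" marks the end of the input
    stack, pos = [1], 0              # stack of parser states, start state 1
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:              # error entry: stop and report failure
            return False
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        else:
            _, lhs, length = act
            if lhs == "S":           # reducing by the start production accepts
                return True
            if length:
                del stack[-length:]  # pop one state per right hand side symbol
            stack.append(GOTO[(stack[-1], lhs)])
```

With these tables the recognizer accepts exactly the balanced strings such as "ab" and "aabb", and it stops at the first illegal symbol otherwise.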
From the above algorithm, one can tell directly when a CFG G is not an LR(1) grammar. This occurs when action is not a function but only a relation, or in other words, whenever there is more than one possible action for some input pair. These multiple entries are known as conflicts. The two types of conflicts that can exist are i) shift/reduce and ii) reduce/reduce conflicts, which are respectively denoted as S/R and R/R.
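
Treating action as a relation makes conflict detection straightforward. The sketch below, in Python, collects every action proposed for a (state, terminal) pair and classifies the pairs that receive more than one. The function name `find_conflicts` and the shape of the input triples are assumptions for illustration, not the thesis implementation.

```python
from collections import defaultdict

def find_conflicts(entries):
    """entries: iterable of (state, terminal, action) triples, where action is
    ("shift", j) or ("reduce", production). Returns a dict that maps each
    conflicting (state, terminal) pair to "S/R" or "R/R"."""
    table = defaultdict(set)
    for state, terminal, action in entries:
        table[(state, terminal)].add(action)
    report = {}
    for key, actions in table.items():
        if len(actions) > 1:                     # action is not a function here
            kinds = {kind for kind, _ in actions}
            report[key] = "S/R" if "shift" in kinds else "R/R"
    return report
```

A table builder could call this on the entries produced by rules i) and ii) before filling the remaining entries with error.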
Chapter III
Methods for reducing states in LR(1) parsers

LR(1) parsers have the nice property that they can be used for parsing most programming languages. Unfortunately, the parsers produced for these grammars, using the method described in the previous chapter, are too large to be considered useful. Hence, several modifications have been proposed which will reduce the size of the parser produced. This chapter discusses four of these methods. Two methods (SLR(1) and LALR(1)) reduce the number of states by reducing the size of the language accepted. The other two methods (proposed by Pager [Pag77a]) use conditions for merging states of an LR(1) parser while maintaining the full power to recognize LR(1) languages.
III.1 SLR(1) parsers

The SLR(1) parsing table construction is quite similar to that of the LR(1). The main difference is that the parser produced is based on a characteristic automaton with no lookahead (i.e. an LR(0) automaton). This simplification reduces, in general, the total number of states created.

To build an SLR(1) parser, redefine an item by removing the lookahead set, leaving just the marked production. Under this definition, the rules to close a set of SLR items I become:
i) every item in I is also in closure(I);
ii) If the item \(A \to \underline{a} \,.\, B\underline{c}\) is in closure(I), and \(B \to \underline{b} \in P\), then the item \(B \to \,.\, \underline{b}\) is also in closure(I);

The procedures to build the characteristic automaton are also simplified. These procedures are as follows:

function GOTO(I, \(\underline{X}\));
begin
  let J be the set of items \(A \to \underline{a}\,\underline{X} \,.\, \underline{b}\)
      such that \(A \to \underline{a} \,.\, \underline{X}\,\underline{b}\) is in I and \(\underline{X}\) is a grammar symbol;
  return closure(J);
end;

procedure ITEMS(G);
begin
  C := { closure({S -> . S'}) };
  repeat
    for each set of items I in C, and each grammar symbol \(\underline{X}\)
        such that J = GOTO(I, \(\underline{X}\)) is not empty and J ∉ C do
      add J to C;
  until no more sets of items can be added to C
end;
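example 3.1

Let a CFG G be defined by the set of productions in example 2.1. Then an LR(0) characteristic automaton is:

[Figure: LR(0) characteristic automaton of G, states 1 to 5, with arcs labelled by grammar symbols]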
The SLR(1) method does not use a set of lookaheads to decide what reduction to use once a viable prefix has been recognized. Instead, it uses a method to approximate the lookaheads, which in fact guarantees that the set of lookaheads will be included. This is done by the function FOLLOW : N -> \(2^{T}\), which computes all symbols which can follow a given nonterminal symbol. However, in order to compute FOLLOW, the lookahead terminal symbol $ must be included. Hence, for the definition of FOLLOW, it is assumed that there is an additional production of the form S" -> S$ where S" is a nonterminal and does not appear in any production in P. FOLLOW is defined as

\[ FOLLOW(X) = \{\, a \mid Y \Rightarrow^{*} \underline{a}X\underline{b} \text{ for some nonterminal } Y \text{ (including } S''\text{), where } a \in first(\underline{b}) \,\} \]
example 3.2

Using the CFG G described in example 2.1, the FOLLOW sets are:

  FOLLOW(S) = {$}
  FOLLOW(A) = {b, $}
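
The FOLLOW sets can be computed by a standard fixed-point iteration, sketched below in Python. This is not necessarily how the thesis implementation computes them. The function `follow_sets`, its parameters and the encoding of productions are assumptions for illustration; the extra production S" -> S$ is modelled by placing "$" in the FOLLOW set of the start symbol. The usage lines assume the grammar S -> A, A -> aAb, A -> ε.

```python
def follow_sets(prods, nonterms, start):
    """prods: list of (lhs, rhs tuple). Returns {nonterminal: set of terminals}."""
    nullable = set()
    first = {n: set() for n in nonterms}
    follow = {n: set() for n in nonterms}
    follow[start].add("$")           # stands in for the production S" -> S $

    def size():
        return (len(nullable)
                + sum(len(s) for s in first.values())
                + sum(len(s) for s in follow.values()))

    before = -1
    while before != size():          # repeat until nothing grows any more
        before = size()
        for lhs, rhs in prods:
            if all(x in nullable for x in rhs):
                nullable.add(lhs)
            for i, x in enumerate(rhs):
                if all(y in nullable for y in rhs[:i]):
                    first[lhs] |= first[x] if x in nonterms else {x}
                if x not in nonterms:
                    continue
                # terminals that can start what comes right after this x
                for y in rhs[i + 1:]:
                    follow[x] |= first[y] if y in nonterms else {y}
                    if y not in nullable:
                        break
                else:                # the rest of rhs can derive the empty string
                    follow[x] |= follow[lhs]
    return follow

# Assumed example grammar: S -> A, A -> a A b, A -> (empty string).
EXAMPLE = [("S", ("A",)), ("A", ("a", "A", "b")), ("A", ())]
FOLLOW = follow_sets(EXAMPLE, {"S", "A"}, "S")
```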
Using the characteristic automaton and the function FOLLOW, the SLR(1) parsing table can be created using the following algorithm:

SLR(1) parsing table construction algorithm

input: the SLR(1) characteristic automaton CG = (C, GOTO) for the CFG G.
output: a parsing table (possibly with conflicts if not SLR(1))
method: Let C = {\(I_1, \ldots, I_n\)} be the set of sets of items from the characteristic automaton CG. The states of the parser will be labeled 1, 2, ..., n where state i corresponds to the set of items \(I_i\). As with LR(1) parsers, let the initial state be state 1.

The parsing actions are defined as follows:
i) If \(A \to \underline{a} \,.\, b\underline{c} \in I_i\) where b ∈ T and GOTO(\(I_i\), b) = \(I_j\), then action(i,b) = shift j
ii) If \(A \to \underline{a} \,.\) is in \(I_i\), then for each b ∈ FOLLOW(A) set action(i,b) = reduce \(A \to \underline{a}\)
iii) all entries not defined by i) or ii) are set to error

The goto transitions are defined by the following two rules:
i) If GOTO(\(I_i\), A) = \(I_j\) where A ∈ N, then goto(i,A) = j
ii) all other entries of goto, not defined by i), are set to error
example 3.3

Using the LR(0) characteristic automaton in example 3.1, and the FOLLOW sets in example 3.2, the SLR(1) parser is defined by the following tables:

action
    +---------------+---------------+---------------+
    |       a       |       b       |       $       |
+---+---------------+---------------+---------------+
| 1 | shift 3       | reduce A->ε   | reduce A->ε   |
+---+---------------+---------------+---------------+
| 2 | error         | error         | reduce S->A   |
+---+---------------+---------------+---------------+
| 3 | shift 3       | reduce A->ε   | reduce A->ε   |
+---+---------------+---------------+---------------+
| 4 | error         | shift 5       | error         |
+---+---------------+---------------+---------------+
| 5 | error         | reduce A->aAb | reduce A->aAb |
+---+---------------+---------------+---------------+

goto
    +---------------+---------------+
    |       S       |       A       |
+---+---------------+---------------+
| 1 | error         | 2             |
+---+---------------+---------------+
| 2 | error         | error         |
+---+---------------+---------------+
| 3 | error         | 4             |
+---+---------------+---------------+
| 4 | error         | error         |
+---+---------------+---------------+
| 5 | error         | error         |
+---+---------------+---------------+
III.2 LALR(1) parsers

A second type of simplification, similar to the SLR(1), is the LALR(1) parser invented by DeRemer [DeR69]. Many algorithms for computing LALR(1) parsers have since been presented [LLH71, AEH72, A&U77, DeR72, Alp76, Pag77b]. The main difference from SLR(1) is a concise and more accurate method for computing the set of lookaheads than the definition of the function FOLLOW. The same LR(0) characteristic automaton can be used to construct either an LALR(1) or an SLR(1) parser. The LALR(1) lookahead function LA : state x P -> \(2^{T}\) is defined as follows:

\[ LA(k,\ A \to \underline{a}) = \{\, t \mid S \Rightarrow_R^{*} \underline{b}A\underline{c},\ t \in first(\underline{c}), \text{ and the string } \underline{b}\,\underline{a} \text{ is a viable prefix for the state } k \,\} \]
example 3.4

Using the CFG G, and the LR(0) characteristic automaton, from example 3.3, the function LA is defined as follows:

  LA(1, A -> ε) = {$}
  LA(2, S -> A) = {$}
  LA(3, A -> ε) = {b}
  LA(5, A -> aAb) = {b, $}
The construction of the LALR(1) parser is exactly the same as an SLR(1) except that the action function is computed as follows:
i) If \(A \to \underline{a} \,.\, a\underline{b} \in I_i\) where a ∈ T and GOTO(\(I_i\), a) = \(I_j\), then action(i,a) = shift j
ii) If \(A \to \underline{a} \,.\) is in \(I_i\), then for each a ∈ LA(i, \(A \to \underline{a}\)) set action(i,a) = reduce \(A \to \underline{a}\)
iii) all entries not defined in i) and ii) are set to error
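example 3.5

Using the LR(0) characteristic automaton in example 3.1, and the function LA as defined in example 3.4, the LALR(1) parsing tables are:

action
    +---------------+---------------+---------------+
    |       a       |       b       |       $       |
+---+---------------+---------------+---------------+
| 1 | shift 3       | error         | reduce A->ε   |
+---+---------------+---------------+---------------+
| 2 | error         | error         | reduce S->A   |
+---+---------------+---------------+---------------+
| 3 | shift 3       | reduce A->ε   | error         |
+---+---------------+---------------+---------------+
| 4 | error         | shift 5       | error         |
+---+---------------+---------------+---------------+
| 5 | error         | reduce A->aAb | reduce A->aAb |
+---+---------------+---------------+---------------+

goto
    +---------------+---------------+
    |       S       |       A       |
+---+---------------+---------------+
| 1 | error         | 2             |
+---+---------------+---------------+
| 2 | error         | error         |
+---+---------------+---------------+
| 3 | error         | 4             |
+---+---------------+---------------+
| 4 | error         | error         |
+---+---------------+---------------+
| 5 | error         | error         |
+---+---------------+---------------+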
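
One well known way to obtain LALR(1) lookaheads, different from the definition of LA given in this section, is to merge the LR(1) states that share the same marked productions and to union their lookahead sets. The Python sketch below applies that swapped-in technique to the eight LR(1) states of the running example. The state contents in `LR1_STATES` are written out by hand under the assumption that the grammar is S -> A, A -> aAb, A -> ε, and the names `LR1_STATES` and `merge_by_core` are hypothetical.

```python
# Each LR(1) state is a set of (marked production, lookahead terminal) pairs.
# The contents are an assumption derived by hand for the example grammar.
LR1_STATES = {
    1: {("S -> . A", "$"), ("A -> . a A b", "$"), ("A -> .", "$")},
    2: {("S -> A .", "$")},
    3: {("A -> a . A b", "$"), ("A -> . a A b", "b"), ("A -> .", "b")},
    4: {("A -> a . A b", "b"), ("A -> . a A b", "b"), ("A -> .", "b")},
    5: {("A -> a A . b", "$")},
    6: {("A -> a A . b", "b")},
    7: {("A -> a A b .", "$")},
    8: {("A -> a A b .", "b")},
}

def merge_by_core(states):
    """Group states with identical marked productions and union lookaheads.

    Returns {core: {marked production: set of lookahead terminals}}."""
    merged = {}
    for items in states.values():
        core = frozenset(marked for marked, _ in items)
        lookaheads = merged.setdefault(core, {})
        for marked, la in items:
            lookaheads.setdefault(marked, set()).add(la)
    return merged

MERGED = merge_by_core(LR1_STATES)
```

Under these assumptions the eight LR(1) states collapse into five, the same number of states as the LR(0) automaton used for the tables of example 3.5.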
The classes of grammars defined by SLR(1), LALR(1), and LR(1) are known to form a hierarchy as follows:

  SLR(1) ⊂ LALR(1) ⊂ LR(1)
III.3 Pager's weak compatibility
In the previous two sections, restrictions on the class of languages were imposed to reduce the number of states in the LR(1) parser. Pager [Pag77a] shows that the number of states may be reduced without affecting the class of languages accepted.
The modification introduced by weak compatibility is in the construction of the LR(1) characteristic automaton (see section II.3). In the algorithm for constructing the automaton there is the statement:
for each set of items I in C, and each grammar symbol X such that J = GOTO(I,X) is not empty and J ∉ C do add J to C;
In this statement, if two states are similar in form, that is, they contain the same set of items, they can be represented by a single state, and therefore similar copies of a state can be removed. The criterion for deciding whether two states can be combined is called a compatibility criterion, and the action of combining states is called a merge. For the LR(1) construction, two states are compatible if they are similar in form. Pager has found two other forms of compatibility, which he calls weak and strong compatibility.
Unfortunately, changing the compatibility criterion from the LR(1) case can cause problems. In particular, when two states satisfy Pager's compatibility criteria, merging the states may necessitate a propagation of lookaheads to states already created, which in turn will modify the merged state which caused the original propagation. However, these problems can be resolved using the following algorithm:
Algorithm for constructing an LR compatible characteristic automaton
input: a CFG G and a compatibility function compatible.
output: a set C of states, and the function
GOTO : (set of items) x (N U T) -> (set of items), which defines the characteristic automaton.
method: the three procedures below, initiated by calling ITEMS'(G);
function GOTO(I,X);
begin
  let J be the set of items [A -> αX . β, LA] such that [A -> α . Xβ, LA] is in I;
  return closure(J);
end;
-3
p r o c e d u r e ITEMS' ( G ) ; benin
C
: 3
c l o s u r e ( [ S ->
.
S'
, A [A -> A
{[A ->g
.b .b
.b
,
,
LA1] 6 S 1
,
LA2] 6 S2
and for all items [A ->
LA^
a
u LA^] I
.b
there exists an item [A -> 3 for all items [A ->
.b
,
LA1]
.b
,
6 S1
LA2] 6 S2 and
LA2] 6 S2
there exists an item [A -> α . β, LA1] ∈ S1}
Then, according to Pager's definition, two states S1 and S2 are weakly compatible if
i) S1 and S2 only have common marked productions in their item part. That is, if [A -> α . β, LA1] ∈ S1 then there exists an item [A -> α . β, LA2] ∈ S2, and if item [A -> α . β, LA2] ∈ S2 then there exists an item [A -> α . β, LA1] ∈ S1
ii) for each pair of items [A -> α . β, LA1] ∈ S1 and [B -> γ . δ, LA2] ∈ S2, at least one of the following is true:
   a) LA1 ∩ LA2 = ∅
   b) LA1 ∩ LA2 ≠ ∅ and there exists an item [B -> γ . δ, LA1'] ∈ S1 such that LA1 ∩ LA1' ≠ ∅
   c) LA1 ∩ LA2 ≠ ∅ and there exists an item [A -> α . β, LA2'] ∈ S2 such that LA2 ∩ LA2' ≠ ∅
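A minimal sketch of the weak compatibility test, assuming each state is represented as a dictionary from a marked production (here just a string) to its lookahead set. The function name, the representation, and the sample states are illustrative assumptions and are not taken from the author's implementation.

```python
from itertools import product


def weakly_compatible(s1, s2):
    """Return True if two LR(1) states may be merged under weak compatibility.

    s1, s2: dict mapping a marked production to its set of lookahead terminals.
    """
    # Condition i): both states must contain exactly the same marked productions.
    if s1.keys() != s2.keys():
        return False
    # Condition ii): every pair of distinct marked productions must satisfy a), b) or c).
    for i, j in product(s1, repeat=2):
        if i == j:
            continue
        la1, la2 = s1[i], s2[j]
        if not (la1 & la2):      # a) no common lookahead symbol
            continue
        if la1 & s1[j]:          # b) the overlap already exists inside s1
            continue
        if la2 & s2[i]:          # c) the overlap already exists inside s2
            continue
        return False
    return True


# Hypothetical states: merging these would put "d" on both reductions.
CLASH_1 = {"X -> a .": {"d"}, "Y -> a .": {"e"}}
CLASH_2 = {"X -> a .": {"e"}, "Y -> a .": {"d"}}
# Hypothetical states whose lookahead sets never overlap.
FINE_1 = {"X -> a .": {"a"}, "Y -> a .": {"b"}}
FINE_2 = {"X -> a .": {"c"}, "Y -> a .": {"d"}}
```

With these sample states, `weakly_compatible(CLASH_1, CLASH_2)` is rejected because the pair fails all three conditions, while `FINE_1` and `FINE_2` pass by condition a) alone.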
Condition a) states that if there are no items between the states which have a common lookahead symbol, then the merge cannot produce any conflicts, and in particular cannot produce a R/R conflict. (Note: it is also impossible to introduce S/R conflicts, since the states will only be merged if they have common marked productions. Therefore, the result of merging would only produce a S/R conflict if it existed in one of the unmerged states before merging.)
In conditions b) and c) the set of conditions is: LA1 ∩ LA2 ≠ ∅, and either LA1 ∩ LA1' ≠ ∅ or LA2 ∩ LA2' ≠ ∅, where [B -> γ . δ, LA1'] ∈ S1 and [A -> α . β, LA2'] ∈ S2. Since LA1 ∩ LA2 ≠ ∅, the only possible conflict arising from merging is a R/R conflict between the lookaheads on the productions A -> αβ and B -> γδ. However, this can only occur if β =>*R w and δ =>*R w, w ∈ T*, producing a common substate where both productions will be reducible. By condition b), if in addition β =>*R w and δ =>*R w, then there must already exist a state with a R/R conflict on some symbol a ∈ LA1 ∩ LA1'. Similarly for condition c). Hence, if the language is indeed LR(1), then it must be the case that β =>*R y; δ =>*R y'; y, y' ∈ T*; and y ≠ y', and therefore conditions a), b) and c) are sufficient to insure that no conflicts will be produced if the language generated by the grammar is indeed LR(1).
For example, let a CFG be defined with the set of productions in figure 3.1. The LR(1) characteristic automaton contains 38 states (shown in part in figure 3.2). Under weak compatibility, states 8 and 12 cannot be merged, since the items [X->a.AE,{d}] ∈ 12 and [Y->a.B,{d}] ∈ 8 have the common lookahead symbol d. However, states 30 and 33, for example, are in fact weakly compatible.
It can be shown that a weak compatible LR(1) parsing table will contain a number of states that is somewhere between that of the LALR(1) and LR(1) parsing tables.
figure 3.2
III.4 Strong compatibility
Pager's strong compatibility adds one condition to weak compatibility which guarantees the production of a LALR(1) parser if the language generated by the grammar is LALR(1). Otherwise it will produce an LR(1) parsing table with the number of states greater than the number of states produced by the LALR(1) method but less than the number produced by the LR(1) method. Strong compatibility requires that no two states be merged if they have a common descendant in the LR(1) characteristic automaton which will introduce R/R conflicts when the two states are merged.
For example, the grammar presented by figure 3.1 creates (in part) the LR(1) characteristic automaton in figure 3.2. States 8 and 12 are not weakly compatible because the items [X->a.AE,{d}] ∈ 12 and [Y->a.B,{d}] ∈ 8 have a common lookahead symbol "d". If these two states are merged (hence causing merges of states (20,28), (18,26), (17,25), (16,24), (29,32), (31,34), (30,33), (19,27), (36,38), and (35,37), where each pair are common descendants), the resulting states of the automaton would have no conflicts. Hence these two states, according to Pager's definition, are in fact strongly compatible.
On the other hand, let the grammar be that presented in figure 3.3, which creates (in part) the LR(1) characteristic automaton in figure 3.4. Merging states 7 and 10 (and hence causing common descendants 14 and 18 to be merged) would result in two R/R conflicts on the symbols "a" and "b" in its descendant state. Hence these states will not be merged under strong compatibility.
figure 3.3
figure 3.4
The way in which two items (from different states) can produce a common state with a R/R conflict is if the two items can derive the same substring. That is, if two states S1 and S2 are to be merged such that there exist two items [A -> α . β, LA1] ∈ S1 and [B -> γ . δ, LA2] ∈ S2 where LA1 ∩ LA2 ≠ ∅, β =>*R w and δ =>*R w, then the two states have common descendants such that a merge will introduce R/R conflicts.
For example, the reason that states 7 and 10 (in figure 3.4) could not be merged is that the items [X->a.B,{d}] ∈ 7 and [Y->a.b,{d}] ∈ 10 have a common lookahead symbol d, and the strings B and b both rewrite to the string b.
The search for a common substring between two states, when it is necessary to try all possible combinations of rewrites, involves as much work as building all descendant states. However, it is not necessary to expand all possible combinations of rewrite rules. This fact can be seen by understanding how expansion of the nonterminals is performed in building the characteristic automaton.
That is, when the item [A -> α . Xβ, LA] is closed, where X -> γ is in P and d ∈ LA, it will create the item [X -> . γ, first(βd)]. If β =>*R ε, it is clear that the elements in the lookahead set LA will be propagated to the new item. On the other hand, if β does not derive ε, the definition of the function first indicates that any element d ∈ LA is not in first(βd). Hence, in this case, the lookaheads defined by first(βd) are independent of LA, and LA does not affect states derived from the new item.
Stated differently, the only rewrites that should be performed are those which are applied to nonterminals which occur at the end of the marked productions. This restriction on the number of possible derivations to look at is what Pager calls a strong rightmost derivation (denoted =>SR) and is defined as:
αBγ =>SR αβγ iff
i) γ = ε
ii) αBγ =>R αβγ
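The lookahead propagation described above can be illustrated with a short sketch. The names `closure_lookaheads`, `FIRST` and `NULLABLE`, and the sample symbols, are assumptions made for the illustration; here `FIRST` maps every symbol to its set of first terminals (without ε) and `NULLABLE` is the set of nonterminals that derive ε.

```python
def closure_lookaheads(beta, la, first, nullable):
    """Lookahead set given to [X -> . c] when closing [A -> a . X beta, la].

    This is first(beta d) taken over all d in la: the first terminals of beta,
    plus la itself only when every symbol of beta can derive the empty string.
    """
    result = set()
    for symbol in beta:
        result |= first[symbol]
        if symbol not in nullable:
            return result          # la cannot show through a non-nullable symbol
    return result | set(la)        # beta vanished, so la is propagated


# Hypothetical FIRST sets and nullable nonterminals.
FIRST = {"b": {"b"}, "B": {"b"}, "E": {"e"}}
NULLABLE = {"E"}
```

When `beta` is empty or nullable the parent's lookaheads reach the new item, which is the only situation in which a rewrite needs to be followed when searching for a shared descendant.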
Pager has derived a procedure [Pag77a] which checks if two items, having a common lookahead symbol, will produce a shared descendant containing a R/R conflict. The author feels that the algorithm presented by Pager is opaque, as well as slightly incorrect, and that the algorithm in this paper (see page 49) has been corrected and modified to clarify its nature.
The algorithm is presented using two co-recursive procedures which try all possible strong rightmost derivations to see if the two given marked productions yield a common descendant state where two different productions will be reduced (since this is the only way that an R/R conflict can be produced). The procedure CHECK looks for trivial cases (i.e. cases where no rewrites are necessary to determine the result), while the procedure nontrivialcheck checks those cases requiring rewrites in order to determine the wanted criteria.
One possibility that procedure CHECK handles is if it is impossible for two items, with or without rewrites, to produce a common descendant. That is, let (1) A -> α . βXf and (2) B -> γ . βYg be two marked productions where X ≠ Y and neither f nor g derives ε. Assume that these two marked productions can derive a common substring which will produce a R/R conflict. Then it must be the case that Xf =>*R w and Yg =>*R w. Since both f and g do not derive ε, the lookaheads cannot propagate through f and g. But then, by the way LR(1) parsers are generated, the string derived from X will be reduced to X before scanning any string derived from f. Hence any string derived from Xf must be of a form beginning with X. Similarly, any string derived from Yg must be of a form beginning with Y. Therefore, since X ≠ Y, it is impossible for any items of this form to produce a common substring (and hence a common descendant) which will produce R/R conflicts.
The second trivial check in the procedure CHECK is if the two marked productions immediately indicate a common descendant which will produce R/R conflicts if merged. That is, if the two items are of the form (1) A -> α . βWf and (2) B -> γ . βZg where β ∈ (N U T)*, W, Z ∈ N, W =>*R ε and Z =>*R ε. It is clear, under the above conditions, that the closure of the items (3) [A -> αβ . Wf, LA1] and (4) [B -> γβ . Zg, LA2] will produce the items (5) [W -> . ε, Q] and (6) [Z -> . ε, Q] where Q = LA1 ∩ LA2. Hence this case will produce a common descendant where conflicts will be produced.
In all other cases, some rewriting is necessary and procedure nontrivialcheck is called to handle these cases.
One possibility, two
that requires rewriting,
m a r k e d p r o d u c t i o n s a r e of
is
when
the
and ( 2 )
t h e f o r m ( 1 ) A->=.bXf
B->c.bYg w h e r e
*
i ) X 6 N and X =>R
iii)
Y
6 (N U T);
In t h i s case,
3
in
a
X m u s t r e w r i t e t o some s t r i n g
However,
t h i s t h e same
p r o d u c t i o n X ->
a
X->.h- a n d B - > & . a produce R/R
derivable
h
where
as
1#
testing
if
from
there
2 such t h a t t h e items
w i l l s h a r e a common d e s c e n d a n t w h i c h
can
conflicts.
A second p o s s i b i l i t y
i t e m s of
a n d 1 # X
o r d e r t o p r o d u c e a common s t r i n g ( a n d h e n c e a common
descendant). exists
e
handled
t h e f o r m ( 1 ) A->a.bXf
in
and ( 2 )
nontrivialcheck
B->c.m w h e r e
are
i ) X 6 N ii)
g
(N U T);
6
*
a
and X
# Z
iii) f =>R 5
i v ) no p r o d u c t i o n
X->.h
X->h,
and B - > & . a
where
=,
exists
common s t r i n g d e r i v a b l e f r o m Xf
and
must b e of
implies
X#Z,
that
any
t h e form X g w h i l e
a n y common s t r i n g d e r i v a b l e f r o m Zg m u s t b e o f this
that
w i l l h a v e a common d e s c e n d a n t
I n t h i s c a s e , b e c a u s e of c o n d i t i o n i v )
But
such
t h e form
G.
t h a t t h e y c a n n o t d e r i v e t h e same s t r i n g
and hence can n o t have a s h a r e d descendant. The l a s t p o s s i b i l i t y c h e c k e d c h e c k e d by
t h e f o r m ( 1 ) A->g.bX a n d ( 2 ) B - > g . b Y The
XfY.
only
derive a
common
However,
this
p r o d u c t i o n s of marked
procedure
i s t h e c a s e when t h e m a r k e d p r o d u c t i o n s a r e
aontrivialcheck of
the
way
X,Y
6 N
and
t h a t t h e s e two m a r k e d p r o d u c t i o n s c a n
descendant
is
where
the
is
*
X =>
if
same a s t e s t i n g i f
t h e f o r m X->g a n d Y - > t
p r o d u c t i o n s A->&.X
R
and Y - > . t ,
w
-
and
-W .
t h e r e e x i s t s two
such t h a t o r X->.s
w i l l p r o d u c e a common d e s c e n d a n t w h i c h c a n
*
Y => R
either
the
a n d B->&.Y,
contain
an
R/R
conflict from merging.
For efficiency, the procedure nontrivialcheck uses a special global function tried : N x (marked productions) -> boolean. Before the top call to procedure CHECK is made, the function is set to false for all possible inputs, and it will return false the first time it is called with any given input. After that, anytime the function is again called with the same set of arguments, it will return true. Therefore, this function will prevent the procedure nontrivialcheck from checking more than once whether a nonterminal will rewrite to match some particular marked item.
Finally, it is assumed that on the top level call of CHECK(A -> α . α', B -> β . β') the following conditions hold:
i) A -> α . α' ≠ B -> β . β'
ii) α' ≠ ε and β' ≠ ε
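The behaviour of the global function tried described above can be sketched as a small memo object. The class name `Tried`, its `reset` method, and the tuple encoding of a marked production are hypothetical choices for this illustration only.

```python
class Tried:
    """Remembers which (nonterminal, marked production) pairs were already expanded."""

    def __init__(self):
        self._seen = set()

    def reset(self):
        # Called before each top level call of CHECK: every input is "not tried".
        self._seen.clear()

    def __call__(self, nonterminal, marked_production):
        key = (nonterminal, marked_production)
        if key in self._seen:
            return True            # any later call with the same arguments
        self._seen.add(key)
        return False               # first call with these arguments


tried = Tried()
# Hypothetical marked production B -> b . Y encoded as (lhs, rhs, dot position).
ITEM = ("B", ("b", "Y"), 1)
```

Because the second call with the same arguments returns true, a recursive search guarded by this function cannot loop forever on a recursive nonterminal.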
of two
Co-recursive procedures for 2 -
procedure check(A -> q
B -> {note:
to check
shared descendant
. ala2...a
n'
. blb2...b
a i y b i 6 (N U T);
m
A,B 6 N;
)
: boolean;
a y p 6 (N
U
* )
begin s:= maximum i s.t. t:=
maximum i set.
a ia i+l
...a n
bibi+l ...b,
match:= maximal i sat.
ai
?>g 2 ;
2;
bi;
then check:=false else -if match> max(s,t)
then check: =true else if s>t -
then check:=nonrrivialcheck( B ->
1
A -> g
. b1b2...bm,t . a l a2...a n ' s'match)
else check:=nontrivialcheck( A -> 2
B -> end
-9
p
. a l a*...a n ' s . blb2 . . .
bmyt,match)
-> g
p r o c e d u r e nontrivialcheck(A
ala2...an,s,
. blb2. *.bm,t,
b
B ->
match) : {note:
boolean;
s 5 t)
begin
terminate:=false; repeat if (match then -
-(sol))
< 0)
or
(s=t)
nontrivialcheck:=false; else if (as 6 N)
7 -
terminate:=true;
or
not tried(as,B -
s-1
bbl--b
.
bs..-b m )
then f o r each -
production C ->
s o t o a PC, 5
s
#
e,
and
s-1
.
6 P
C - > . c # B -> b b l " . b
do -
if check(C B ->
->
b s . . .b
. c,
bb l . . . b
s-1
m
. bs .'bn)
then nontrivia1check:ptrue;
else -if
(sat)
and
(match-1~s) and bt 6 N
and check (B -
...b s-1 bs. ..bn' ...as-1 . as ...an
-> bb
A -> a a
)
then nontrivialcheck:=true;
terminate:=true
fi*
-9
until terminate; end
-3
Using the above, two states S1 and S2 are strong compatible if
i) If the item [A -> α . β, LA1] ∈ S1 then there exists an item [A -> α . β, LA2] ∈ S2, and if the item [A -> α . β, LA2] ∈ S2 then there exists an item [A -> α . β, LA1] ∈ S1
ii) for each quadruple of items [A -> α . β, LA1], [B -> γ . δ, LA1'] ∈ S1, and [A -> α . β, LA2], [B -> γ . δ, LA2'] ∈ S2, either
   a) weak compatibility between the items holds, or
   b) β and δ do not share a descendant.
Chapter IV
An Error Recovery Method for LR Parsers
In the previous two chapters, five different constructions were discussed, all of which produce LR parsers. The downfall of all LR parsers is that they are designed only to decide if the given input is legal, that is, belongs to the language generated by its grammar. This causes the unfortunate result that when such a parser is used in a compiler, once the first illegal terminal symbol is found, the parse stops with failure. However, it would be more desirable to have the parse report as many additional errors as possible.
Several people have proposed various error recovery schemes for LR parsers [G&R75, D&R77, P&D79, O'H76, Pen77, P&D78]. This chapter will only deal with one such method, which is a modification of the one presented by DeRemer and Pennello [P&D79]. The algorithm presented here differs from theirs in that it is incorporated into the LR parser and does not attempt error correction.
In order to describe error recovery, we first describe how an LR parser works. Let a path be a sequence of states q0 q1 ... qn such that for each state qi, one of the following conditions holds:
i) goto(qi, X) = qi+1 for some X ∈ N
ii) action(qi, a) = shift qi+1 for some a ∈ T.
A path will be denoted as [q0:α] where α ∈ (N U T)*. That is, if α = a1a2...an then the path [q0:α] is the sequence of states q0q1...qn such that either action(qi-1, ai) = shift qi or goto(qi-1, ai) = qi. Also, let the function top : path -> state be defined as the state qn, where the result of the path is q0q1...qn. Finally, whenever the path [q:α] begins from the start state (of the LR parser) it will simply be denoted as [α].
The basic control of an LR parser can be defined by the decision function df : path x T -> (path U {reject, accept}) as follows:
i) df([α], b) = [αb] if action(top([α]), b) = shift j for some state j ∈ K.
ii) df([αw], b) = df([αA], b) if action(top([αw]), b) = reduce A -> w, and αw ≠ S when b = $
iii) df([S], $) = accept if action(top([S]), $) = reduce S -> S'
iv) df is defined as reject for all pairs ([α], b) not defined by rules i) through iii)
The algorithm to implement the above decision function is simply as follows:
procedure parse(df, input);
begin
  path := [start:ε];
  repeat
    t := next terminal symbol from input;
    path := df(path, t);
  until (path = accept) or (path = reject);
  print path;
end;
Note that the variable path is implicitly used as a stack which holds the prefix of sentential forms being recognized by the parser.
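The decision function and the procedure parse can be sketched as follows. This is an illustration under stated assumptions rather than the author's program: the tables are hypothetical ones for the grammar S -> A, A -> aAb, A -> ε (written `A->e` in the code), the path is kept as a Python list of state numbers, and the names `df` and `parse` simply mirror the text.

```python
ACCEPT, REJECT = "accept", "reject"

# Hypothetical tables; production name -> (left-hand side, length of right-hand side).
PRODUCTIONS = {"S->A": ("S", 1), "A->aAb": ("A", 3), "A->e": ("A", 0)}
ACTION = {
    (1, "a"): ("shift", 3), (1, "$"): ("reduce", "A->e"),
    (2, "$"): ("reduce", "S->A"),
    (3, "a"): ("shift", 3), (3, "b"): ("reduce", "A->e"),
    (4, "b"): ("shift", 5),
    (5, "b"): ("reduce", "A->aAb"), (5, "$"): ("reduce", "A->aAb"),
}
GOTO = {(1, "A"): 2, (3, "A"): 4}


def df(path, b):
    """One step of the decision function: reduce as needed, then shift b."""
    while True:
        entry = ACTION.get((path[-1], b))
        if entry is None:                      # rule iv): undefined means reject
            return REJECT
        kind, arg = entry
        if kind == "shift":                    # rule i)
            return path + [arg]
        lhs, length = PRODUCTIONS[arg]
        if lhs == "S" and b == "$":            # rule iii)
            return ACCEPT
        if length:                             # rule ii): pop the handle ...
            path = path[:-length]
        path = path + [GOTO[(path[-1], lhs)]]  # ... and follow goto on the left-hand side


def parse(tokens, start=1):
    path = [start]
    for t in list(tokens) + ["$"]:
        path = df(path, t)
        if path in (ACCEPT, REJECT):
            return path
    return REJECT
```

For instance, `parse("aabb")` walks through the states 1, 3, 3, 4, 5 before the reductions bring the path back to the accepting configuration, while `parse("aab")` stops with reject at the end marker.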
The error recovery strategy describes what to do if the parse of an input results in reject. As can be seen from the previous algorithm, LR parsers have the nice property that they stop reading input immediately after the input string is found to be illegal. The best recovery from such an error would be if the parse could somehow be restarted such that all other errors made in the input could be picked up. Unfortunately, this strategy is really unfeasible, since it carries the implicit assumption of knowing what the writer meant when he wrote the string to be parsed.
A much more conservative approach is to only state what remaining substrings of the input are impossible according to the given grammar. That is, if the string w ∈ T* is the remaining input after the error, and there doesn't exist a rightmost derivation such that S =>*R αwγ for some α ∈ (N U T)* and γ ∈ T*, then the substring w should be reported as an error.
For example, consider the two pseudo PASCAL productions
<stmt> -> FOR <var> := <exp> TO <exp> DO <stmt>
<stmt> -> WHILE <exp> DO <stmt>
with the erroneous input
FOR X := 1 5 DO BEGIN J:=X; L:=X END;
where the terminal symbol "TO" has accidentally been left out. Using an LR parse, parsing would stop after reading the symbol "5". As one looks for subsequent errors, it is clear that "5" is a valid substring derivable from S. It is also clear that 5 can occur at the following points in the given productions
<stmt> -> FOR <var> := "<exp>" TO <exp> DO <stmt>
<stmt> -> FOR <var> := <exp> TO "<exp>" DO <stmt>
<stmt> -> WHILE "<exp>" DO <stmt>
By expanding the substring to include the next input symbol, the next possible substring to test would be "5 DO". Here, the number of possible positions of this string has been reduced to
<stmt> -> FOR <var> := <exp> TO "<exp> DO" <stmt>
<stmt> -> WHILE "<exp> DO" <stmt>
Continuing this process, it is clear that the substring "5 DO BEGIN J:=X; L:=X END" can correspond to the following positions in the productions:
<stmt> -> FOR <var> := <exp> TO "<exp> DO <stmt>"
<stmt> -> WHILE "<exp> DO <stmt>"
At this point, the semicolon at the end of the parse string implies that a reduction should be performed by one of the above productions. One possibility is to take the string recognized before the reject point, and to either add or delete symbols to produce a match, which could therefore decide which reduction to choose. This type of error recovery is in fact the error correction method used by [P&D79]. However, the deterministic one chosen by the author assumes that the substring "5 DO BEGIN J:=X; L:=X END" is the maximal string that could be recognized, and hence removes it from further consideration. That is, it will restart the parse starting with the semicolon.
The above example in fact characterizes the error recovery method described in this chapter. To state the method more explicitly, let me start by defining an error state as a set of LR parser states, where each error state contains the set of LR parser states that the parse might be in. The restart state is a special error state containing all the LR parser states.
The first shift, in error recovery, is a forced shift through the illegal terminal symbol that produced the rejection. This shift can be viewed as a parallel shift, on the error symbol a, from all LR parser states I in the restart state to all states J such that action(I,a) = shift J. It will then try to parse the input, where the parse will start, simultaneously, from each of the LR parser states J existing after the forced shift through the illegal symbol. If along the way any of these parses produces an error, it will be dropped from further consideration for simultaneous parsing.
One possible result of the above process is that all parses will be dropped from the set of simultaneous parses. Under this condition, it is clear that there is no derivation such that S =>*R αwγ for the parsed input w. Hence, it is quite legal to assume that the next input symbol cannot occur, and to report it as an error. Since this is an error, the algorithm will then restart the recovery method on the next input symbol. Note that the first action on any error is a forced shift. This is done to guarantee that the remaining input is parsed. Also, the error recovery should not continue if the illegal terminal symbol was the end of string marker $.
The second problem is that if the above error recovery process is to be merged into the LR parser, the parallel parses have to be made deterministic. There is no problem with the action function for a set of states if the result for all possible inputs is a shift entry. In this case it is clear that the action is deterministic, since the resulting states can be lumped into a new set of states, hence creating a new error state. The same is true for the goto function. Therefore, nondeterminism can only occur if the action, for a set of states to be simultaneously parsed, contains either
i) shifts and reductions for the same input symbol
ii) reductions for different productions for the same input symbol (as shown in the previous example)
Unfortunately, neither of these cases seems to be resolvable deterministically. If, in either case, the parse was allowed to continue and the next action was performed, the result would produce two different paths. That is, the above two conditions would result in disjoint sentential prefixes. Such conditions will be called overdefined.
However, some decision still has to be made so that the remaining input can be parsed. Again, the conservative approach was taken. Whenever the input string being parsed becomes overdefined, the parser assumes that it is the maximal substring it can recognize, and restarts the whole error recovery process on the next input symbol.
By merging the error recovery into the LR parser, a new LR parser with error recovery can be built. If an LR parsing table is the tuple
M = (K, action, goto, G, start),
then let the parser with error recovery be defined as the tuple
M' = (K, K', action, goto, G, start, init-error)
where
K, G, and start are defined as in M,
K' is a set of new states called error recovery states,
init-error is a state in K' denoted as the restart state of the error recovery method,
goto : (K U K') x N -> K U K' U {error}
action : (K U K') x T -> {shift k | k ∈ K} U {error, overdefined} U {reduce p | p ∈ P}
Furthermore, the init-error state will be defined so that for each b ∈ T, action(init-error, b) = shift j for some state j. Each recovery state is a set of parsing states in K, such that it is the set of states that can occur simultaneously for the input string being parsed.
Using the above definition, LR parsers with error recovery can be built by the following algorithm:
Construction of an LR parser with error recovery
input: LR parsing table M = (K, action, goto, G, start)
output: LR parsing table M' = (K, K', action, goto, G, start, init-error)
method:
begin
  {initialize state init-error}
  set K' to the single state containing the set {s ∈ K} and label it as init-error;
  for each a ∈ T
    let s be the set {j ∈ K | action(i,a) = shift j for all i ∈ init-error};
    if s is a singleton
    then set s' to the element of s
    else if s ∈ K'
      then set s' to that state in K'
      else add s to K' and label the new state as s';
    fi
    set action(init-error,a) = shift s'
  od
  for each X ∈ N
    let s be the set {j ∈ K | goto(t,X) = j for all t ∈ init-error};
    if s is empty
    then set goto(init-error,X) = error
    else
      if s is a singleton
      then set s' to that element of s
      else if s ∈ K'
        then set s' to the state in K' containing s
        else add s to K', and set s' to its label
      fi
      set goto(init-error,X) = s';
    fi
  od
  {build each general error state}
  repeat
    for each state i ∈ K' such that the parsing table for that state is still undefined
      for each a ∈ T
        if there exist two states S1, S2 ∈ i s.t.
           [A -> α . , LA1] ∈ S1 where a ∈ LA1 and
           [B -> β . γ, LA2] ∈ S2 where first(γ) = a
        then set action(i,a) = overdefined
        else if there exist two states S1, S2 ∈ i s.t.
           [A -> α . , LA1] ∈ S1 and [B -> β . , LA2] ∈ S2
           where a ∈ LA1 ∩ LA2 and A -> α ≠ B -> β
        then set action(i,a) = overdefined
        else if there exists a state s ∈ i s.t. [A -> α . , LA] ∈ s where a ∈ LA
        then set action(i,a) = reduce A -> α
        else
          let s be the set {j ∈ K | action(t,a) = shift j for all t ∈ i};
          if s is empty
          then set action(i,a) = error
          else
            if s is a singleton
            then set s' to the element in s
            else if s ∈ K'
              then set s' to the state in K' containing s
              else add s to K', setting s' as the label of the added state;
            fi
            set action(i,a) = shift s'
          fi
        fi
      od
      for each X ∈ N
        let s be the set {j ∈ K | goto(t,X) = j for all t ∈ i};
        if s is empty
        then set goto(i,X) = error
        else
          if s is a singleton
          then set s' to the element in s
          else if s ∈ K'
            then set s' to the state in K' containing s
            else add s to K', and set s' to its label
          fi;
          set goto(i,X) = s'
        fi
      od
    od
  until no more states can be added to K'
end
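The action entry of a general error state can be sketched as below. The sketch makes a simplifying assumption that differs from the algorithm above: it works on the finished action table of the underlying parser instead of on the item sets, so a reduction is recognized by its table entry rather than by a completed item. The function name `combined_action`, the table layout, and the sample tables are hypothetical.

```python
def combined_action(error_state, a, action):
    """Action of an error state (a set of parser states) on terminal a."""
    shifts, reduces = set(), set()
    for t in error_state:
        entry = action.get((t, a))           # a missing entry means error
        if entry is None:
            continue
        if entry[0] == "shift":
            shifts.add(entry[1])
        else:
            reduces.add(entry[1])
    # A shift together with a reduction, or two different reductions: overdefined.
    if (shifts and reduces) or len(reduces) > 1:
        return "overdefined"
    if reduces:
        return ("reduce", next(iter(reduces)))
    if not shifts:
        return "error"
    if len(shifts) == 1:                     # singleton: an ordinary parser state
        return ("shift", next(iter(shifts)))
    return ("shift", frozenset(shifts))      # otherwise a (possibly new) error state


# Hypothetical action table for S -> A, A -> aAb, A -> (empty, written A->e).
ACTION = {
    (1, "a"): ("shift", 3), (1, "$"): ("reduce", "A->e"),
    (2, "$"): ("reduce", "S->A"),
    (3, "a"): ("shift", 3), (3, "b"): ("reduce", "A->e"),
    (4, "b"): ("shift", 5),
    (5, "b"): ("reduce", "A->aAb"), (5, "$"): ("reduce", "A->aAb"),
}
# Hypothetical table in which two member states shift to different targets.
DEMO = {(10, "x"): ("shift", 11), (12, "x"): ("shift", 13)}
```

Returning a frozenset for a multi-state shift target lets a caller use the set itself as the label of the error state and add it to K' if it has not been seen before.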
Using the resulting LR parser with error recovery, the basic control can be handled using the decision function df' : path x T -> path as follows:
i) df'([q:α], b) = [q:αb] when action(top([q:α]), b) = shift j for some j ∈ (K U K')
ii) df'([q:αw], b) = df'([q:αA], b) when action(top([q:αw]), b) = reduce A -> w, and if αw = S then b ≠ $
iii) df'([init-error:w], b) = df'([init-error:A], b) when action(top([init-error:w]), b) = reduce A -> w, where w ≠ ε and b ≠ $
iv) df'([S], $) = accept
v) df'([init-error:S], $) = reject if action(top([init-error:S]), $) = accept or overdefined
vi) df'([q:α], $) = reject when action(top([q:α]), $) = error
vii) df'([init-error:α], b) = [init-error:b] where b ≠ $, and action(top([init-error:α]), b) = overdefined
viii) df'([q:α], b) = [init-error:b] where b ≠ $ and action(top([q:α]), b) = error
Note that cases vi) or viii) represent that an error has been found in the string being parsed. Hence, any error messages produced are produced at these points.
Finally, an LR parser with error recovery can be implemented simply by calling the procedure parse, using df' as the decision function.
Chapter V
Implementation
This chapter discusses two programs. The first program creates an SLR(1) parser, with error recovery. The second program creates either an LR(1), LALR(1), weakly compatible or a strongly compatible LR parser. The first section discusses the representation of the parsing tables built by both programs. The second section describes the implementation of the SLR(1) parser constructor and how that system is used, while the third section does the same for the second parser constructor.
V.1 Representation of the parsing tables
The representation of the parsing tables naturally suggests using arrays. For uniformity of both access and values held in the arrays, all terminal symbols, nonterminal symbols, and productions are provided with an internal code of integers by both programs. For terminal symbols, the codes are defined by the set {i | 0 ≤ i < n where n is the number of distinct terminal symbols occurring in the productions} where 0 is reserved for the special terminal symbol $. Nonterminal symbols are encoded using the set {i | -m ≤ i ≤ -1 where m is the number of distinct nonterminals occurring in the productions} where the start symbol S will always be given the code -1. Productions are coded using integers i with 1 ≤ i, where the production involving S' is always given the code 1.
In representing the action and goto functions, only non-error values are kept internally, since the vast majority of the function values are in fact error. The remaining values are saved in groups, one for each state, where states having the same set of non-error values will be represented by a single copy of the groups.
For example, the grammar

would produce the following SLR(1) parsing tables:

Action table

[action table]

where shift j is represented by S j, reduce p is represented by p, overdefined is represented by 0, and error is omitted.

goto table

[goto table]

where goto(i,X) = error has been omitted.
By elimination of the error values, 58.8% of the above tables does not need to be saved. Also, states 1, 2, 8 and 9 in the previous action table all have the same set of values and therefore will be represented by only one group of values.

Each non-error value of the action table will be represented as follows:
i) action(i,a) = shift j will be represented by the pair (x,j) where x is the code of the terminal symbol a.

ii) action(i,a) = reduce A -> w will be represented by the pair (x,-p) where x is the code of the terminal symbol a and p is the code of the production A -> w.

iii) action(i,a) = overdefined will be represented by the pair (x,0) where x is the code of the terminal symbol a.

The non-error values of the goto table, for some state i, will be represented as the pair (x,j) where goto(i,A) = j and x is the code of the nonterminal A.
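The pair encoding and the sharing of identical groups described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary-based row format, and the use of plain tuple ordering for sorting are hypothetical and are not taken from the programs described in this chapter.

```python
OVERDEFINED = 0  # code for "overdefined" entries, as described in the text


def encode_action_row(row, terminal_code, production_code):
    """Encode the non-error action entries of one state as integer pairs.

    row maps a terminal name to ('shift', j), ('reduce', production) or
    ('overdefined',).  Error entries are simply absent from row.
    """
    pairs = []
    for terminal, entry in row.items():
        x = terminal_code[terminal]
        if entry[0] == "shift":
            pairs.append((x, entry[1]))                     # (x, j)
        elif entry[0] == "reduce":
            pairs.append((x, -production_code[entry[1]]))   # (x, -p)
        else:
            pairs.append((x, OVERDEFINED))                  # (x, 0)
    # Assumption: plain tuple ordering stands in for the sorting relation.
    return tuple(sorted(pairs))


def share_groups(rows):
    """Store each distinct group once; return (groups, state_to_group)."""
    groups, index, state_to_group = [], {}, []
    for row in rows:
        if row not in index:
            index[row] = len(groups)
            groups.append(row)
        state_to_group.append(index[row])
    return groups, state_to_group
```

States whose encoded rows are equal end up pointing at the same group, which is the saving the text describes for states with identical non-error values.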
For efficiency in retrieving the values from the action and goto tables, the integer pairs corresponding to each state are sorted using the relation ≤' where (a,b) ≤' (c,d) iff either a ...

... represent

    * : reduce E->T
    + : shift 8
    ) : reduce E->T

V.2 SLR(1) implementation
This section describes how to use the SLR(1) parser constructor with error recovery. This implementation has the restriction that no production can be of the form A -> ε. Included in this section is a brief description of the input grammar, how to run the system, and how to interpret the output produced.
V.2.1 Input Grammar

The input for the program is the set of productions defining the CFG from which the SLR(1) parsing table is to be constructed. The input will be parsed in a free style format, that is, no formatting by columns or line boundaries will be used. The end of line character will be treated as a blank character and each symbol on the input file must be separated by one or more blanks.

In general, a terminal symbol is represented by a nonblank string of 15 characters or less, not beginning with "<" and distinct from the metasymbols used by the program (such as "->", "!", "$", and "."). In the event that the user may use one of the metasymbols used by the program, or a nonblank string beginning with "<" ...

Nonterminal symbols are represented as character strings of 15 characters or less, enclosed by the symbols "<" and ">". The first symbol of the string, if not the empty string, must begin with a nonblank character, but blank characters can appear anywhere else in the string. The program also accepts the string "<>", which represents a nonterminal symbol whose name is the empty string.

Productions are represented by writing them in the form

    A -> w

where A is a nonterminal, w is a sequence of grammar symbols, and "->" is a metasymbol recognized by the program. Each production is separated from the next using the metasymbol "." and, after the last production, the metasymbol "$" must appear. The productions can be entered in any order except that the first production on the input file must be the start production. For example, the grammar presented in V.1 could be represented by the following piece of input:
A shorthand notation also exists for productions having the same left hand side (i.e. productions of the form A -> w where A remains constant between the productions). In these cases, the productions can be entered in the form

    A -> w1 ! w2 ! ... ! wn

where there exist the productions A -> w1, A -> w2, ..., A -> wn. For example, the grammar in section V.1 could alternatively have been written as:

The order in which productions are found in the input file corresponds to the order in which they will be coded internally. In a similar manner, the terminal and nonterminal symbols will be coded in the order corresponding to their first appearance in the set of productions.
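The input conventions of this section (free format, "." between productions, "!" for alternatives, "$" at the end, and coding in order of first appearance) can be illustrated with a small reader. This is a minimal sketch and not the author's program: the name read_grammar is hypothetical, nonterminal names are assumed to contain no blanks, and no mechanism for using a metasymbol as a terminal is modelled.

```python
def read_grammar(text):
    """Parse productions written as  <A> -> w1 ! w2 .  ...  $  in free format."""
    tokens = text.split()                      # line ends count as blanks
    if "$" not in tokens:
        raise ValueError('missing end marker "$"')
    tokens = tokens[: tokens.index("$") + 1]   # ignore anything after "$"

    productions = []                           # list index + 1 is the production code
    terminals, nonterminals = [], []           # kept in order of first appearance

    def note(symbol):
        seen = nonterminals if symbol.startswith("<") else terminals
        if symbol not in seen:
            seen.append(symbol)

    def finish(lhs, rhs):
        if not rhs:                            # A -> (empty) is not allowed here
            raise ValueError("empty right-hand side for " + lhs)
        productions.append((lhs, tuple(rhs)))

    i = 0
    while tokens[i] != "$":
        lhs = tokens[i]
        if not lhs.startswith("<") or tokens[i + 1] != "->":
            raise ValueError("illegal LHS near " + lhs)
        note(lhs)
        i += 2
        rhs = []
        while True:
            tok = tokens[i]
            if tok == "$":                     # "$" directly after the last production
                finish(lhs, rhs)
                break
            i += 1
            if tok in ("!", "."):
                finish(lhs, rhs)
                rhs = []
                if tok == ".":
                    break
            else:
                note(tok)
                rhs.append(tok)

    terminal_code = {"$": 0}                   # 0 is reserved for $
    terminal_code.update((t, k) for k, t in enumerate(terminals, start=1))
    nonterminal_code = {n: -k for k, n in enumerate(nonterminals, start=1)}
    return productions, terminal_code, nonterminal_code
```

Because the left hand side of the first production is noted first, the start symbol receives the code -1, in line with the coding scheme of section V.1.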
V.2.2 Running the SLR(1) parser constructor

The system can be run on the Vax-11 in the Moore School by entering the following monitor level procedure call:

    $ @[karl]slrbnf

After invocation, the procedure will ask the user for the files used by the program, and run the program. The first file to be requested is the file containing the set of productions, and is requested with the prompt:

    input :

The second file request is for the output file, which will contain all diagnostic and informatory messages, and is requested with the prompt:

    output :

The third file request is for the file that the created SLR(1) parsing tables should be saved on, and is requested with the prompt:

    internal representation:

The last two file requests are for temporary files that can be used by the program, and are both requested with the prompt:

    temporary storage unit:

Upon completion of the file requests, the program is run. The program will not produce any output on the user's screen, nor will it ask the user for any further information, unless the SLR(1) parsing table was created and contains conflicts (see section V.2.4 for handling this case). This paper will not mention how to use the file containing the SLR(1) parsing tables except for a PASCAL program skeleton in appendix A.
V.2.3 Interpretation of the output file

The output can be broken into two major sections, where the first section describes how the program parsed the input grammar and the second section prints the built SLR(1) parsing tables. However, the second section will be produced only if there were no errors detected in the first section.

The first page of the output is a copy of the input being parsed, along with any error messages indicating illegal syntax. If there were no syntactic mistakes in the input grammar, then this page will be an exact duplicate of the input file. Otherwise, portions of the input file will be written, and will be interspersed with syntactical errors recognized by the program. For example, the erroneous input:

would produce the following output:

    < S > -> .
    < A > -> a b .
    A ***illegal LHS

In this example, the program is reporting that the production has a terminal symbol on the left hand side of the production.

The next three subsections of the output report the coding scheme of terminals, nonterminals, and productions used by the program. For example, the input:

would produce the following output:

    TERMINAL NODES:
    -------- ------

    NONTERMINAL NODES:
    ----------- ------

    PRODUCTIONS:
    ------------
The program provides additional information with the coding schemes, that is, if the string "*undef*" precedes a nonterminal, then that nonterminal does not occur on the left hand side of any production recognized while parsing the input file.

Below the coding scheme is a diagnostic summary of how well the program did in parsing the given input bnf. If everything is acceptable to the program, it will print the message "successful parse" and attempt to construct the SLR(1) parsing tables. Otherwise, it will give an error summary of why it thought the input was wrong, and abort any further calculations.

Should the input grammar be successfully parsed, the program then attempts to build the SLR(1) parsing tables. To begin with, it computes the first and follow sets for each nonterminal, and prints out these sets. Second, it prints out the sets of SLR(1) items defining the core of each state. For example, the previous input grammar would produce output for the first five states as follows:
    .............................................
    STATE : 1
    1)  < S > -> .
    .............................................
    STATE : 2
    7)  < F > -> ( . )
    .............................................
    STATE : 3
    6)  < F > -> id
    .............................................
    STATE : 4
    5)  ->
    .............................................
    STATE : 5
    3)  < E > -> .
    4)  < T > -> . +

The last section of the output, for a run, is a readable form of the produced parsing table followed by the size of the array parsetable. Non-error values of the parsing tables for each state are listed separately, with the action values preceding the goto values. For example, the output produced by the program for the parsing values for the first state would be as follows:

    ..............................................
    STATE 1
    id    SHIFT TO 3
    (     SHIFT TO 2
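The first and follow sets mentioned in this section can be computed by a standard fixed-point iteration. The sketch below is an illustration, not the program's own code. It assumes, as this implementation requires, that no production has an empty right hand side, and the function names and data layout are hypothetical.

```python
def first_sets(productions, nonterminals):
    """FIRST sets for a grammar without empty right-hand sides."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            head = rhs[0]
            new = first[head] if head in first else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(productions, nonterminals, start, first):
    """FOLLOW sets; "$" marks the end of the input."""
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for k, sym in enumerate(rhs):
                if sym not in follow:          # terminals have no follow set
                    continue
                if k + 1 < len(rhs):
                    nxt = rhs[k + 1]
                    new = first[nxt] if nxt in first else {nxt}
                else:                          # sym ends the right-hand side
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow
```

In an SLR(1) construction, a completed item A -> w . contributes a reduce entry for exactly the terminals in the follow set of A, which is why these sets are computed before the states are built.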
V.2.4 Conflict Resolution

Sometimes, when a CFG G is provided as input to the SLR(1) parser constructor, it can not produce an SLR(1) parser for G since G is not an SLR(1) grammar. In such cases, the construction method has produced conflicts in the action table. For example, the grammar in figure 5.1 is an example of a natural grammar for arithmetic expressions with operators + and *. The LR(0) characteristic automaton for this grammar and the follow sets are shown in figure 5.2. In the states 9 and 10, there will exist S/R conflicts on the symbols + and * if the SLR(1) parser is built from the characteristic automaton. This can also be seen in the output produced by the program for such an input (see figure 5.3).

figure 5.1

figure 5.2

    ...............................................
    STATE : 9
    ***REDUCE/SHIFT CONFLICT ON SYMBOL +
       OLD ENTRY: -2   CONFLICTING ENTRY:
    ***REDUCE/SHIFT CONFLICT ON SYMBOL *
       OLD ENTRY: -2   CONFLICTING ENTRY:
    ...............................................
    STATE : 10
    ***REDUCE/SHIFT CONFLICT ON SYMBOL +
       OLD ENTRY: -3   CONFLICTING ENTRY:
    ***REDUCE/SHIFT CONFLICT ON SYMBOL *
       OLD ENTRY: -3   CONFLICTING ENTRY:

figure 5.3
It turns out that these conflicts can be resolved in favor of either a shift or a reduce action by knowing the precedence and associativity of these two operators. For example, looking at state 9 and the operator *, the parser is attempting to recognize the sentential form:

    E + E * E

Assuming that * has precedence over +, it is clear that we want to shift on the input symbol * to first recognize the string E * E and reduce it to the string E, producing the new sentential form E + E.
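The reasoning above can be written down as a small decision rule. The sketch below uses the common convention of comparing the precedence of the operator in the completed production with that of the lookahead symbol, and of using associativity to break ties. It is an assumption-laden illustration: the program described here does not apply such a rule automatically (it asks the user, as explained below), and the function and parameter names are hypothetical.

```python
def resolve(production_operator, lookahead, precedence, left_assoc):
    """Choose 'shift' or 'reduce' for a shift/reduce conflict.

    production_operator: operator of the production that could be reduced
    lookahead:           the input symbol causing the conflict
    precedence:          maps operator -> integer, larger binds tighter
    left_assoc:          set of left-associative operators
    """
    p_rule = precedence[production_operator]
    p_look = precedence[lookahead]
    if p_look > p_rule:
        return "shift"      # e.g. E + E . * E : recognize E * E first
    if p_look < p_rule:
        return "reduce"     # e.g. E * E . + E : reduce E * E first
    # Equal precedence: left-associative operators reduce.
    return "reduce" if lookahead in left_assoc else "shift"
```

For the grammar of figure 5.1, with * above + and both operators left associative, this gives shift for state 9 on *, and reduce for the other three conflicts.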
Should the grammar in the input file produce conflicts, the program will arbitrarily pick one of the action definitions for the symbol causing the conflict in the state and discard all other conflicting entries. This choice is reported to the user as shown in figure 5.3. In each case, the "OLD ENTRY: xx" represents the entry chosen by the program while the "CONFLICTING ENTRY: yy" states a discarded entry. Hence, in state 9, for the symbol *, the arbitrary choice was to reduce on the production labelled 2 (i.e. E->E+E).
To allow the user to change the arbitrary choice made by the program, the program will also become interactive if any conflicts arise in building the SLR(1) parser. That is, the program will prompt the user with the prompt:

    ENTER STATE TO RESOLVE:

To this prompt, two choices are available. If the user responds with the number 0, the program will stop so that the user can look at the output file in order to identify all existing conflicts in building the SLR(1) parser. If the user feels that these conflicts can not be resolved, then he is out of luck. Otherwise, the user should rerun the program and, when getting the above prompt, he should resolve the conflicts by using the second option.

The second option in responding to the above prompt is to type in the state that the user wants to resolve. After the user completes his answer, the program will print out the core of the state, and will ask the user, for verification, if it is the state he wanted.
The next request by the program is for the user to provide the integer code of the terminal symbol causing the conflict, using the prompt:

    ENTER SYMBOL NUMBER TO RESOLVE:

As above, the program will verify the user's response by printing out the terminal's name and asking the user if it is the correct terminal symbol. Again, a "N" response will cause the program to reprompt for a state to resolve, while a "Y" response will have the program continue processing the resolution.

The next request, after the symbol request, is for the action function's value for the state and symbol, with the prompt:

    ENTER NEW ACTION TO TAKE:

If the value provided by the user is a positive integer (and hence a shift action), the program will print out the core of the state the shift is to. If the value given by the user is negative (and hence a reduce entry), the program will print out the production associated with the label provided by the user. In either case, it will then ask the user if this was what the user wanted and again verify the user's input.

The program will provide one last chance, after the conflict resolution has been specified, to disregard the conflict resolution. A "Y" response by the user will cause the resolution to be processed, while a "N" response will disregard the resolution provided by the user. In either case, the program will then request another conflict resolution with the prompt:

    ENTER STATE TO RESOLVE:

At this point, the whole process repeats unless the user responds with a 0. If a 0 is typed in by the user, then no more conflict resolutions will be processed and the program will build the SLR(1) parser. Note that the program will not produce an SLR(1) parser unless at least one conflict has been resolved.
V.2.5
Size Restrictions
This program contains several size
restrictions
which
are as follows:
i) No more than 100 terminal symbols may be used.
ii) No more than 200 nonterminal symbols may be used.
iii) No more than 300 productions may appear in the input.
iv) No terminal or nonterminal name may exceed 15 characters.
v) For each production A -> w, w can not be a string, of terminal and nonterminal names, exceeding a length of ten names.
vi) The number of parse states, created by the program, must not exceed 600.
vii) The number of SLR(1) items, excluding the items of the form A -> . w, must not exceed 9,999.
viii) The size of the array parsetable can not exceed the dimensions of 10,000 x 2.

V.3 LR(1), LALR(1), Weak and Strong Compatibility parser generators
This section describes how to use the program which can build either LR(1), LALR(1), weak compatible, or strong compatible parsing tables. Included in this section is a brief description of the input grammar, how to run the program, and how to interpret the output.

V.3.1 Input Grammar

The input for the program is the set of productions defining the CFG from which the parsing tables are to be produced. These productions can be optionally preceded with a list of the terminals and nonterminals, allowing the user to specify the integer codes given to these symbols.

The input will be parsed in a free style format, that is, no formatting by columns or line boundaries will be used. The end of line character will be treated as a blank character and each symbol on the input must be separated by at least one blank.

In general, a terminal symbol is any nonempty string of nonblank characters which does not begin with the character "<". However, it can not be any of the metasymbols (i.e. "->", "|", "#", ".", or "e"). In the event that the user wants to use one of the metasymbols or a string beginning with "<" ... and includes the name composed by the empty string ("<>").

Productions are represented by writing them in the form
    A -> w

where A is the name of a nonterminal, w is a sequence of terminal and nonterminal names, and "->" is a metasymbol recognized by the program. The symbol "e" has been reserved to represent the empty string so that productions of the form A -> e can be written.

Productions are separated from each other using the metasymbol ".", and no symbols should follow the last production. Productions having the same left hand side, i.e. of the form A -> w1, A -> w2, ..., A -> wn, can be written in the form

    A -> w1 | w2 | ... | wn

where the metasymbol "|" is treated as an "or" symbol. For example, the grammar

    S -> A
    A -> aAb
    A -> ε

could be entered with the input:
Productions, when parsed, will be coded internally using the order in which they appear on the input. The only restriction on the order in which the productions are written is that the start production must appear first.

Unlike the SLR(1) parser constructor, this program allows the user to specify the coding scheme of the nonterminal and terminal symbols. That is, before the start production the user is optionally allowed to provide a list of terminals, followed by a list of nonterminals, followed by the metasymbol "#". Elements in these lists will be labelled in the order that they are found (1 for the first terminal, 2 for the second terminal, etc. and -1 for the first nonterminal, -2 for the second nonterminal, etc.). It is not necessary that all terminals and nonterminals appear in these lists, and either list may be empty. Any remaining terminals, or nonterminals, not specified by these lists will be labelled according to the order of first appearance in the set of productions.

For example, assume using the previous grammar that the user wants the terminal b to be labelled 1 and terminal a to be labelled 2. This could be done by using the input:
The program described by this section in fact has used the SLR(1) parsing tables (produced by running the SLR(1) program described in section V.2) to parse the input for this program. Hence, the description of the input rules can be formally described by the set of rules used in creating the SLR(1) parsing tables, which are as follows:
.
-> -> ! tother prods> -> nonterminal '-> nonterminal -> ! -> nonterminal '-> -> e-rule ! ! I e-rule ! I -> terminal ! nonterminal ! terminal ! nonterminal -> # ! # ! # -> terminal ! terminal -> nonterminal ! nonterminal $
'.
'.
.
'.
.
.
.
.
. . .
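The labelling rule of this section (listed symbols first, then any remaining symbols in order of first appearance) can be sketched as follows. The function name code_symbols and the tuple-based production format are hypothetical, and nonterminals are assumed to be written with a leading "<".

```python
def code_symbols(listed_terminals, listed_nonterminals, productions):
    """Assign 1, 2, ... to terminals and -1, -2, ... to nonterminals."""
    terminal_code, nonterminal_code = {}, {}

    def label(name):
        if name == "e":                      # "e" stands for the empty string
            return
        if name.startswith("<"):
            if name not in nonterminal_code:
                nonterminal_code[name] = -(len(nonterminal_code) + 1)
        elif name not in terminal_code:
            terminal_code[name] = len(terminal_code) + 1

    # Symbols named in the optional lists are labelled first ...
    for name in list(listed_terminals) + list(listed_nonterminals):
        label(name)
    # ... then the rest, in order of first appearance in the productions.
    for lhs, rhs in productions:
        label(lhs)
        for name in rhs:
            label(name)
    return terminal_code, nonterminal_code
```

With the grammar used above and the terminal list b a, the terminal b receives the label 1 and a the label 2. With no lists at all, the labels follow the order of first appearance instead.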
V.3.2 Running the program

The program can be run on the Vax-11 in the Moore School by entering the following monitor level procedure call:

    @[karl]runnewbnf

After invocation, the procedure will ask the user for the files used by the program, and then run the program. The first file requested by the procedure is the file containing the set of productions, and is requested with the prompt:

    BNF FILE:

The second file request is for the output file, which will contain all diagnostic and informatory messages, and is requested with the prompt:

    OUTPUT FILE:

The last request is for the file to save the parsing tables created, and is requested with the prompt:

    TABLE:

Upon completion of the file requests, the program is run. After the program finishes reading the input bnf file, the program will request the user to specify what type of parser should be created with the prompt:

    ENTER OPTION
    0 - COMPUTE FIRSTS ONLY
    1 - BUILD LR(1) PARSE TABLE
    2 - BUILD LALR(1) PARSE TABLE
    3 - BUILD WEAK COMPATIBLE LR PARSE TABLE
    4 - BUILD STRONG COMPATIBLE LR PARSE TABLE

Once the user responds, the program will build the corresponding parse table, printing out "BUILDING STATE X" as it tries to build state X. This completes all interaction the program has with the user.

The first page of the output file is a copy of the input being parsed, along with any error messages describing illegal syntax.
For example, the erroneous input:

    < S > -> . -> a b . A -> e

would produce the following output:

    INPUT PARSE OF PRODUCTIONS:
    --------- -- -----------
    *** 32) PRODUCTION DEFINITION EXPECTED

The above error is stating that on column 32 of the previous input line, the program was expecting to find the beginning of a production but found something else (i.e. the terminal symbol A).

The next three subsections of the output file, after the parse of the input, report the coding scheme of terminals, nonterminals, and productions used by the program. For example, the input:

    a b # <start symbol> -> <A> . <A> -> a <A> b | e

would produce the following output:

    TERMINALS:
    ----------

    NON-TERMINALS:
    --------------
    -1. <start symbol>   *START SYMBOL* *UNIQUE* *NOT USED ON RHS*
    -2.

    PRODUCTIONS:
    ------------
As can be seen by the above example, additional informational messages about nonterminal symbols are provided, and are as follows:

*START SYMBOL* - States that the nonterminal symbol has been recognized as the start symbol.

*UNIQUE* - States that the start symbol does not occur anywhere else in the productions and hence is a valid start symbol.

*NOT UNIQUE* - States that the start symbol occurs in another production besides the start production and hence is an invalid start symbol.

*NOT USED ON RHS* - States that the nonterminal never appears on the right hand side of any production.

*NT NOT REACHABLE* - States that the nonterminal can not appear in any of the sentential forms and hence need not be part of the input grammar.

*NT REPRESENTS NO TERMINAL STRINGS* - States that there are no terminal strings derivable from the nonterminal.

*NT NOT DEFINED* - States that the nonterminal does not appear on the left hand side of any production recognized from the input file.
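The conditions behind these messages can be checked with two simple closures over the productions: one for reachability from the start symbol and one for the ability to derive a terminal string. The sketch below is an illustration only. The function name diagnose, the production format (an empty tuple for an empty right hand side), and the order in which messages are attached are assumptions, not a description of the program.

```python
def diagnose(productions, start):
    """Return {nonterminal: [messages]} for a list of (lhs, rhs tuple)."""
    def is_nt(symbol):
        return symbol.startswith("<")

    defined = {lhs for lhs, _ in productions}
    used_on_rhs = {s for _, rhs in productions for s in rhs if is_nt(s)}
    all_nts = defined | used_on_rhs | {start}

    # Nonterminals that can appear in some sentential form.
    reachable, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for lhs, rhs in productions:
            if lhs == current:
                for s in rhs:
                    if is_nt(s) and s not in reachable:
                        reachable.add(s)
                        todo.append(s)

    # Nonterminals that derive at least one terminal string.
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in productive and all(
                (not is_nt(s)) or s in productive for s in rhs
            ):
                productive.add(lhs)
                changed = True

    notes = {n: [] for n in all_nts}
    start_productions = sum(1 for lhs, _ in productions if lhs == start)
    unique = start not in used_on_rhs and start_productions == 1
    notes[start].append("*START SYMBOL*")
    notes[start].append("*UNIQUE*" if unique else "*NOT UNIQUE*")
    for n in all_nts:
        if n not in used_on_rhs:
            notes[n].append("*NOT USED ON RHS*")
        if n not in reachable:
            notes[n].append("*NT NOT REACHABLE*")
        if n not in productive:
            notes[n].append("*NT REPRESENTS NO TERMINAL STRINGS*")
        if n not in defined:
            notes[n].append("*NT NOT DEFINED*")
    return notes
```

For the example grammar above, the start symbol is reported with *START SYMBOL*, *UNIQUE* and *NOT USED ON RHS*, and the nonterminal <A> receives no message.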
After the coding schemes, the program will print the first set of each nonterminal. Finally, if the user selects to have a parser constructed, the program will construct it and print the appropriate parsing tables. The output of the parser will be printed by states, where each state will contain its core (items) and non-error action and goto values.

For example, using the input grammar used above, and if the user chose to build a strong compatible LR parsing table, the parse tables printed would be as follows:

    STRONG COMPATIBLE LR(1) CHARACTERISTIC TABLE

    ......................  STATE : 1  .......................
    1) <start symbol> -> . <A>
       LOOKAHEADS : $EOF$
    TABLE ENTRIES:
       $EOF$   REDUCE BY 3
       a       SHIFT TO 3
       <A>     GO TO 2
    ......................  STATE : 2  .......................
    1) <start symbol> -> <A> .
       LOOKAHEADS : $EOF$
    TABLE ENTRIES:
       $EOF$   REDUCE BY 1
    ......................  STATE : 3  .......................
    2) <A> -> a . <A> b
       LOOKAHEADS : $EOF$ b
    TABLE ENTRIES:
       a       SHIFT TO 3
       b       REDUCE BY 3
       <A>     GO TO 4
    ......................  STATE : 4  .......................
    2) <A> -> a <A> . b
       LOOKAHEADS : $EOF$ b
    TABLE ENTRIES:
       b       SHIFT TO 5
    ......................  STATE : 5  .......................
    2) <A> -> a <A> b .
       LOOKAHEADS : $EOF$ b
    TABLE ENTRIES:
       $EOF$   REDUCE BY 2
       b       REDUCE BY 2
Appendix A

Sample PASCAL skeleton for use of SLR(1) parsing tables

program doparse(table, {any other files used by program});

const
  numberstates      = x;  {x >= number of actual parse states}
  parsetablesize    = y;  {y >= actual size of array parsetable}
  numberproductions = z;  {z >= actual number of productions}
  errorvalue        = n;  {n = value not in set of labels}

type
  {the path will be represented as a stack using a linear list}
  parsestack = ^stacknode;
  stacknode  = record
                 topstate : integer;
                 next     : parsestack
               end;

var
  table : file of integer;  {file containing parsing tables}

function push(stack : parsestack; newstate : integer) : parsestack;
{returns stack with new state added in front}
var temporary : parsestack;
begin
  new(temporary);
  with temporary^ do
    begin
      topstate := newstate;
      next := stack
    end;
  push := temporary
end;

function pop(stack : parsestack) : parsestack;
{removes the top element of the stack}
begin
  pop := stack^.next;
  dispose(stack)
end;

function top(stack : parsestack) : integer;
{returns state on top of stack}
begin
  top := stack^.topstate
end;

function empty(stack : parsestack) : parsestack;
{returns an empty stack}
begin
  while stack <> nil do stack := pop(stack);
  empty := nil
end;

function gettoken : integer;
{This routine returns the label of the next terminal occurring in
 the input file}
begin
end;

procedure semantics(stack : parsestack; production : integer);
{does any semantic routines associated with reducing the given
 production}
begin
end;

procedure errormessages(state, symbol : integer);
{prints out message corresponding to error value for state and symbol}
begin
end;

function parse : boolean;
{parses input. returns true if no parsing errors are found in
 parsing the input}
const
  eoftoken = 0;
type
  {representation of an entry in parsetable}
  tableentry = record symbol, value : integer end;
  {representation of a reference to a group of entries in parsetable}
  stateentry = record startposition, size : integer end;
  {representation of a production in productionlist}
  productionentry = record lhssymbol, rhslength : integer end;
var
  parsetable : array [1 .. parsetablesize] of tableentry;
  actionlist,
  gotolist : array [1 .. numberstates] of stateentry;
  productionlist : array [1 .. numberproductions] of productionentry;
  {other parameters passed with parsing tables}
  topstate,                   {actual number of parse states}
  parsestart,                 {start state}
  errorstart,                 {forced shift state on error recovery}
  errorcontinue,              {init-error state}
  topoftable,                 {actual size of parsetable}
  productioncount : integer;  {actual number of productions}
  {local variables}
  token : integer;            {next terminal from input}
  value : integer;            {next action to take in parsing input}
  stop : boolean;             {true when have parsed whole input}
  parseerror : boolean;       {true if any parsing errors}
  stack : parsestack;         {holds path}

procedure getparsetable;
{reads in parsing tables}
var index : integer;

  procedure getin(var invalue : integer);
  {reads in next integer from file table}
  begin
    invalue := table^;
    get(table)
  end;

begin
  reset(table);
  getin(topstate);
  getin(parsestart);
  getin(errorstart);
  getin(errorcontinue);
  getin(topoftable);
  getin(productioncount);
  for index := 1 to topstate do
    begin
      with actionlist[index] do
        begin getin(startposition); getin(size) end;
      with gotolist[index] do
        begin getin(startposition); getin(size) end
    end;
  for index := 1 to topoftable do
    with parsetable[index] do
      begin getin(symbol); getin(value) end;
  for index := 1 to productioncount do
    with productionlist[index] do
      begin getin(rhslength); getin(lhssymbol) end
end;

function clear(stack : parsestack; newbottom : integer) : parsestack;
{empties stack and puts value on bottom of stack}
begin
  clear := push(empty(stack), newbottom)
end;

function popelements(stack : parsestack; amount : integer) : parsestack;
{takes the requested amount of states off the stack}
begin
  if (amount = 0) or (stack = nil)
    then popelements := stack
    else popelements := popelements(pop(stack), pred(amount))
end;

function popoffproduction(stack : parsestack; count : integer) : parsestack;
{takes the requested amount of states off the stack, but if stack
 underflow occurs, it resets the bottom state}
begin
  stack := popelements(stack, count);
  if stack = nil
    then popoffproduction := push(stack, errorcontinue)
    else popoffproduction := stack
end;