CS143 Handout 11

Summer 2012 July 9th, 2012

SLR and LR(1) Parsing

Handout written by Maggie Johnson and revised by Julie Zelenski.

LR(0) Isn’t Good Enough

LR(0) is the simplest technique in the LR family. Although that makes it the easiest to

learn, these parsers are too weak to be of practical use for anything but a very limited set

of grammars. The examples given at the end of the LR(0) handout show how even small

additions to an LR(0) grammar can introduce conflicts that make it no longer LR(0). The

fundamental limitation of LR(0) is the zero, meaning no lookahead tokens are used. It is

a stifling constraint to have to make decisions using only what has already been read,

without even glancing at what comes next in the input. If we could peek at the next

token and use that as part of the decision-making, we would find that it allows for a much

larger class of grammars to be parsed.

SLR(1)

We will first consider SLR(1), where the S stands for simple. SLR(1) parsers use the

same LR(0) configurating sets and have the same table structure and parser operation,

so everything you've already learned about LR(0) applies here. The difference comes in

assigning table actions, where we are going to use one token of lookahead to help

arbitrate among the conflicts. If we think back to the kind of conflicts we encountered in

LR(0) parsing, it was the reduce actions that caused us grief. A state in an LR(0) parser

can have at most one reduce action and cannot have both shift and reduce instructions.

Since a reduce is indicated for any completed item, this dictates that each completed

item must be in a state by itself. But let's revisit the assumption that if the item is

complete, the parser must choose to reduce. Is that always appropriate? If we peeked at

the next upcoming token, it may tell us something that invalidates that reduction. If the

sequence on top of the stack could be reduced to the non-terminal A, what tokens do we

expect to find as the next input? What tokens would tell us that the reduction is not

appropriate? Perhaps Follow(A) could be useful here!

The simple improvement that SLR(1) makes on the basic LR(0) parser is to reduce only if

the next input token is a member of the follow set of the non-terminal being reduced.

When filling in the table, we don't assume a reduce on all inputs as we did in LR(0); we

selectively choose the reduction only when the next input symbol is a member of the

follow set. To be more precise, here is the algorithm for SLR(1) table construction (note

all steps are the same as for LR(0) table construction except for 2a):

1. Construct F = {I0, I1, ..., In}, the collection of LR(0) configurating sets for G'.

2. State i is determined from Ii. The parsing actions for the state are determined as

follows:

a) If A –> u• is in Ii then set Action[i,a] to reduce A –> u for all a in Follow(A) (A is

not S').

b) If S' –> S• is in Ii then set Action[i,$] to accept.

c) If A –> u•av is in Ii and successor(Ii, a) = Ij, then set Action[i,a] to shift j (a

must be a terminal).

3. The goto transitions for state i are constructed for all nonterminals A using the

rule: If successor(Ii, A) = Ij, then Goto [i, A] = j.

4. All entries not defined by rules 2 and 3 are errors.

5. The initial state is the one constructed from the configurating set containing

S' –> •S.
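The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the handout's own implementation: the toy grammar, the hand-computed FOLLOW table, and the names closure, successor, build_states and slr_table are all hypothetical choices made for this example. Items are stored as (production index, dot position) pairs.

```python
# Toy grammar (an assumption for illustration); production 0 is the augmented start.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("(", "E", ")")),
    ("T", ("id",)),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}
# Follow sets worked out by hand for this toy grammar.
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", ")", "$"}}


def closure(items):
    """Add X -> .w for every nonterminal X that appears right after a dot."""
    items, work = set(items), list(items)
    while work:
        p, d = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d] and (q, 0) not in items:
                    items.add((q, 0))
                    work.append((q, 0))
    return frozenset(items)


def successor(items, sym):
    """Move the dot over sym in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
    return closure(moved)


def build_states():
    """Step 1: collect the LR(0) configurating sets and their transitions."""
    states, trans, i = [closure({(0, 0)})], {}, 0   # step 5: state 0 holds S' -> .S
    while i < len(states):
        syms = {GRAMMAR[p][1][d] for p, d in states[i] if d < len(GRAMMAR[p][1])}
        for sym in sorted(syms):
            nxt = successor(states[i], sym)
            if nxt not in states:
                states.append(nxt)
            trans[(i, sym)] = states.index(nxt)
        i += 1
    return states, trans


def slr_table():
    states, trans = build_states()
    action, goto, conflicts = {}, {}, []

    def set_action(i, a, act):
        # Record a conflict if a different action already occupies the cell.
        if action.get((i, a), act) != act:
            conflicts.append((i, a))
        action[(i, a)] = act

    for i, items in enumerate(states):
        for p, d in sorted(items):
            lhs, rhs = GRAMMAR[p]
            if d == len(rhs) and p == 0:
                set_action(i, "$", ("accept",))              # step 2b
            elif d == len(rhs):
                for a in FOLLOW[lhs]:                         # step 2a
                    set_action(i, a, ("reduce", p))
            elif rhs[d] not in NONTERMS:                      # step 2c
                set_action(i, rhs[d], ("shift", trans[(i, rhs[d])]))
    for (i, sym), j in trans.items():                         # step 3
        if sym in NONTERMS:
            goto[(i, sym)] = j
    return action, goto, conflicts       # step 4: any missing entry is an error


action, goto, conflicts = slr_table()
print(conflicts)
```

For this toy grammar the conflict list comes out empty. The only place the sketch departs from an LR(0) construction is the loop over FOLLOW[lhs] in step 2a; an LR(0) table would instead enter the reduce for every terminal.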

In the SLR(1) parser, it is allowable for there to be both shift and reduce items in the

same state as well as multiple reduce items. The SLR(1) parser will be able to determine

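which action to take as long as, within each state, the follow sets of the nonterminals being reduced are disjoint from one another and from the terminals that can be shifted.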

Let's consider those changes at the end of the LR(0) handout to the simplified expression

grammar that would have made it no longer LR(0). Here is the version with the addition

of array access:

E' –> E

E –> E + T | T

T –> (E) | id | id[E]

Here are the first two LR(0) configurating sets entered if id is the first token of the input.

In an LR(0) parser, the second set, entered after shifting id, has a shift-reduce conflict. However, an SLR(1)

parser will compute Follow(T) = { + ) ] $ } and only enter the reduce action on those tokens. The

input [ will shift and there is no conflict. Thus this grammar is SLR(1) even though it is

not LR(0).
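To check the follow set quoted above, here is a small Python sketch that computes Follow sets by fixed-point iteration. It rests on assumptions: the T productions (E), id and id[E] are inferred from the description of the array-access grammar, the grammar has no empty productions (which keeps the First computation trivial), and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed array-access grammar; the T productions are inferred from the text.
GRAMMAR = [
    ("E'", ["E"]),
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["(", "E", ")"]),
    ("T", ["id"]),
    ("T", ["id", "[", "E", "]"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets by fixed-point iteration (valid only without empty productions)."""
    first = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets by fixed-point iteration; $ follows the augmented start symbol."""
    first = first_sets()
    follow = defaultdict(set)
    follow[GRAMMAR[0][0]].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for k, x in enumerate(rhs):
                if x not in NONTERMS:
                    continue
                if k + 1 < len(rhs):
                    nxt = rhs[k + 1]
                    # Whatever can start the next symbol can follow x.
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:
                    # x ends the production, so it inherits Follow(lhs).
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


follow = follow_sets()
# In the state holding both T -> id.  and  T -> id.[E]:
reduce_on = follow["T"]   # lookaheads that select the reduction T -> id
shift_on = {"["}          # the terminal right after the dot in T -> id.[E]
print(sorted(reduce_on), reduce_on & shift_on)
```

Under these assumptions Follow(T) contains exactly +, ), ] and $, so it shares no token with the shift on [, which is why the state is conflict-free in SLR(1).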

Similarly, the simplified expression grammar with the assignment addition:

which we have u as a handle on top of the stack which we then can reduce, i.e., replacing

u by X. We allow such a reduction whenever the next symbol is in Follow(X). However, it

may be that we should not reduce for every symbol in Follow(X), because the symbols

below u on the stack preclude u being a handle for reduction in this case. In other

words, SLR(1) states only tell us about the sequence on top of the stack, not what is

below it on the stack. We may need to divide an SLR(1) state into separate states to

differentiate the possible means by which that sequence has appeared on the stack.

Carrying more information in the state will allow us to rule out these invalid

reductions. Consider this example from Aho/Sethi/Ullman that defines a small grammar

for assignment statements, using the nonterminal L for l-value and R for r-value and *

for contents-of.

S' –> S

S –> L = R

S –> R

L –> *R

L –> id

R –> L

Consider parsing the expression id = id. After working our way to configurating set I2

having reduced the first id to L, we have a choice upon seeing = coming up in the input.
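The choice can be made concrete with a short Python sketch. It assumes the full Aho/Sethi/Ullman grammar, which also contains L –> id and R –> L, so that I2 holds the two items S –> L•=R and R –> L•. Follow(R) is written out by hand: the derivation S => L = R => *R = R places = after R, and S => R places $ after R. The function name slr_actions and the item encoding are hypothetical.

```python
# Items of I2, written as (lhs, rhs, dot position).
I2 = [
    ("S", ("L", "=", "R"), 1),   # S -> L . = R
    ("R", ("L",), 1),            # R -> L .
]
# Follow(R) worked out by hand (see the derivations in the lead-in).
FOLLOW = {"R": {"=", "$"}}
TERMINALS = {"=", "*", "id", "$"}


def slr_actions(items, lookahead):
    """Return every action SLR(1) would enter for this state and lookahead."""
    actions = set()
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            # Completed item: reduce only if the lookahead is in Follow(lhs).
            if lookahead in FOLLOW[lhs]:
                actions.add(f"reduce {lhs} -> {' '.join(rhs)}")
        elif rhs[dot] == lookahead and lookahead in TERMINALS:
            # Dot sits before the lookahead terminal: shift it.
            actions.add(f"shift {lookahead}")
    return actions


print(slr_actions(I2, "="))
print(slr_actions(I2, "$"))
```

On the lookahead = the function returns both a shift and the reduction R –> L, which is exactly the choice described above; on $ only the reduction remains.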