From: Joel de Guzman (joel_at_[hidden])
Date: 2004-12-26 13:44:08

David Abrahams wrote:

> I guess I was confused (and still am, if you say you've been agreeing
> with me) by this little exchange:
> Me:
> Okay, either I'm "right" and first-follow can be used to prune
> unproductive parses in a nondeterministic parse *or* using
> first-follow makes a parser deterministic. It can't be both :)
> You:
> Using first-follow makes a parser deterministic, it can't be both.
> But I'm happy to drop this now. It isn't crucial.

Ok, I looked back and re-read the exchanges. Now I understand
where we diverged. I should have been more careful. You are right,
of course, and I agree with you. I think I didn't quite communicate
what I meant. Re-reading, I realize there is indeed a lapse in
logic there. It should have read: first-follow is used to make a
parser deterministic.
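For concreteness, here is a minimal sketch of that corrected statement, assuming three string alternatives (the names and the table are illustrative, not Spirit's API): the lookahead character indexes a table built from the alternatives' FIRST sets, so exactly one branch is chosen and no backtracking is needed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: FIRST sets make the choice deterministic.
// One lookahead character selects exactly one alternative.
std::string predict(char lookahead)
{
    // FIRST("apple") = {'a'}, FIRST("banana") = {'b'}, FIRST("orange") = {'o'}
    static const std::map<char, std::string> table = {
        {'a', "apple"}, {'b', "banana"}, {'o', "orange"}
    };
    auto it = table.find(lookahead);
    return it == table.end() ? std::string() : it->second; // "" = no prediction
}
```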

>>When the grammar is ambiguous, the computation of the first set
>>will fail because the set itself cannot have multiple same entries.
> It's hard for me to understand how you reach that conclusion. When an
> algorithm says to add something to a set that is already in the set, the
> normal expectation is that the set remains unchanged and the algorithm
> proceeds, not that it "fails."

Another lapse in logic. The computation of the first set alone may
not fail; it is the building of the predictive parser from that
information that will fail. That was what I meant, sorry.
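A sketch of this distinction, using the same toy string-alternative model (illustrative code, not Spirit's): inserting a duplicate into the FIRST set just leaves the set unchanged, but constructing the predictive dispatch table fails the moment two alternatives claim the same FIRST symbol.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: table construction detects the ambiguity.
// std::set::insert reports whether the symbol was already claimed;
// two alternatives starting with the same character -> not LL(1).
bool build_predictive_table(const std::vector<std::string>& alternatives)
{
    std::set<char> first;
    for (const std::string& alt : alternatives)
    {
        if (alt.empty()) continue;
        if (!first.insert(alt[0]).second)
            return false;   // FIRST symbol already taken: construction fails
    }
    return true;
}
```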

> I've tried to say several times that the properties of LL(1) are
> irrelevant to the arguments I've been making, and you always seem to
> eventually agree, but here it is again. What is your point in bringing
> up LL(1) here?

Ok, let's prune this unproductive branch, backtrack, and move on :-)

>>Long story. Anyway, the basic problem is that Spirit parsers
>>are dynamic. The computation of the first set, for instance,
>>cannot be done at compile time.
> Sure, but I thought you were going to be mostly emphasizing a static
> subrule model.

Yes. But 1) unless I eliminate all the dynamic rules, this will
still be a problem. 2) There are very nice things you can do
with dynamic parsing that can't be done with pure static parsing.

Unfortunately, I can't solve 1 and I consider 2 very important.

> I guess if you use string literals you'll have this
> problem no matter what, though I'm not sure the use of string literals
> should be a foregone conclusion. If you consider parser generators like
> YACC, they cleanly separate lexical analysis from parsing and the parser
> itself is never working with string literals.

Again, true. However, one of the original intents of Spirit is to
be usable for micro-parsing tasks. When parsing small grammars
like, say, a complex number, you do not want to write a separate
front-end lexer.
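As a sketch of that micro-parsing point, here is a tiny hand-rolled recursive-descent parser for a complex number of the form "(real,imag)", working directly on characters with no lexer in sight (plain C++, not Spirit syntax; the input format is assumed for illustration):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical sketch: a lexer-free micro parser for "(real,imag)".
bool parse_complex(const std::string& s, double& re, double& im)
{
    const char* p = s.c_str();
    if (*p++ != '(') return false;
    char* end = nullptr;
    re = std::strtod(p, &end);          // real part
    if (end == p) return false;
    p = end;
    if (*p++ != ',') return false;
    im = std::strtod(p, &end);          // imaginary part
    if (end == p) return false;
    p = end;
    return *p == ')';
}
```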

Anyway, I do plan to write an optional but separate lexer.

>>Say for example:
>> r = str_p("apple") | "banana" | "orange";
>>the first set of r is: ['a', 'b', 'o']. This can be obtained
>>through a call to first(r) which calls first("apple"),
>>first("banana") and first("orange") recursively.
> Right.
>>The problem is that each alternative can potentially have
>>different types. A production such as:
>> a | b | c;
> (nit: that's a RHS, not a production. You need a LHS symbol to make it
> a production).

I wonder why I snipped "r = ". It was there when I wrote my
original message.
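The recursive first() computation quoted above can be sketched like this (illustrative signatures, not Spirit's actual interface): first of an alternative is the union of the first sets of its children, and for a string literal it is just the leading character.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: first("apple") = {'a'}.
std::set<char> first(const std::string& literal)
{
    return literal.empty() ? std::set<char>{} : std::set<char>{literal[0]};
}

// first(a | b | c) = first(a) U first(b) U first(c), computed recursively.
std::set<char> first(const std::vector<std::string>& alternatives)
{
    std::set<char> result;
    for (const std::string& alt : alternatives)
    {
        std::set<char> f = first(alt);
        result.insert(f.begin(), f.end());   // union of the children's sets
    }
    return result;
}
```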

>>can be thought of as a
>> composite<alternative_operation, tuple<A, B, C> >
>>where A, B and C are the types of a, b and c. So, now, when
>>creating the parser, you'll have to make a table of alternates
>>tuple<A, B, C> which dispatches properly based on information
>>known only at runtime: 1) the current input character and
>>2) the first follow sets of A, B and C.
>>One possible solution is to make the parsers truly static.
>> str<'a','p','p','l','e'>()
>> | str<'b','a','n','a','n','a'>()
>> | str<'o','r','a','n','g','e'>()
>>in this case, first/follow can be a static operation. But, I'm not
>>thrilled about this solution. It has lots of problems too (e.g.
>>the rule's opaque virtual function wall).
> Again, only an issue when crossing the static/dynamic boundary.

Inevitable, I guess.

> In YACC this would be:
> enum tokens { apple, banana, orange };
> Actually, I rather like
> typedef str<'a','p','p','l','e'> apple;
> for Spirit.

It should be an object so we don't have to instantiate it
to work with ETs:

     str<'a','p','p','l','e'> apple;
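To make the point concrete, here is a minimal sketch of why an object (rather than a bare typedef) is wanted: expression templates build the composite parser type through overloaded operators, and operators need values to operate on. The names here are illustrative, not Spirit's.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch of a fully static parser and its ET composite.
template <char... Cs>
struct str {};

template <class L, class R>
struct alternative {};

// operator| needs objects, which is why `apple` is a variable below.
template <char... L, char... R>
alternative<str<L...>, str<R...>> operator|(str<L...>, str<R...>)
{
    return {};
}

str<'a','p','p','l','e'>     apple;
str<'o','r','a','n','g','e'> orange;

// apple | orange builds the composite type entirely at compile time.
static_assert(std::is_same<decltype(apple | orange),
    alternative<str<'a','p','p','l','e'>,
                str<'o','r','a','n','g','e'>>>::value,
    "ET composite type is built statically");
```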

> Most grammars don't have billions of tokens. BTW, is
> typedef str<'appl','e'> apple;
> legal C++? I'm just curious about multi-char literals.

I would think so, but extracting the individual chars from the
multi-char literal would run into the endianness problem, and the
value (and number of bits) of the literal is implementation-defined;
so, I would assume that 'appl' (assuming 4 chars) would not work
on all platforms. AFAICT.
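A small illustration of that caveat: the multi-char literal compiles (typically with a warning), but only its type is guaranteed; its value, and hence any extraction of the individual characters, is implementation-defined.

```cpp
// 'appl' is a legal multi-char literal, but its value is
// implementation-defined, so the packing of the four chars differs
// across compilers and endiannesses -- not portable for extraction.
int packed = 'appl';

// The type, at least, is guaranteed: a multi-char literal has type int.
static_assert(sizeof('appl') == sizeof(int),
              "multi-char literal has type int");
```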

>>At the opposite extreme, another possible solution is to use virtual
>>functions and run-time dispatch all over. Hence, the productions can
>>be stored in standard homogeneous containers instead of heterogeneous
>>containers (tuples). I'm also not thrilled about this solution.
> Agreed. I don't really see why your previous idea was so bad in the
> case where you're *not* using runtime dispatch all over.

I'd feel very sad to leave dynamic parsers behind. They will be
a casualty once we go fully static. I might have a solution, but
it's still too early to tell if it's any good.

>>No. At least, not just yet. I'd like to concentrate first on top-down
>>RD approaches. I have a keen eye on recursive ascent, however.
> Hm. I didn't *think* their strategy for managing nondeterministic
> semantic actions was specific to bottom-up parsing. Is it??

Not sure. Isn't it? Ok, I'll investigate.

> Let me also mention that taking throw-away semantic actions is not so
> bad when:
> 1. The actions are all stateless and just produce synthesized
> attributes


> 2. First/Follow pruning eliminates most parse explorations that
> later turn out to fail anyway.



Joel de Guzman
