Boost logo

Boost :

From: Joel de Guzman (joel_at_[hidden])
Date: 2004-12-26 01:43:47


David Abrahams wrote:
> Joel de Guzman wrote:
>
>>David Abrahams wrote:
>>
>>>Joel de Guzman wrote:
>>>
>>>
>>>>Using first-follow makes a parser deterministic, it can't be both.
>>>>An LL(1) parser, for instance, doesn't do backtracking.
>>>
>>>Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
>>
>>Hmmm... I don't recall mentioning that. Did I?
>
>
> No. I'm arguing that the mere use of first/follow does not in and of
> itself make a parser deterministic.
>
>
>>>>>>typically with one symbol lookahead (LL(1)). A strictly predictive
>>>>>>parser is a lot more restrictive in the grammars that one can use,
>>>>>>however. There's no way to make it parse adhoc grammars like the
>>>>>>C pre-processor or, ahem, c++. What I am inclined to do is to mix
>>>>>>determinism with classic spirit's non-determinism. Exactly the
>>>>>>recipe that you are alluding to.
>>>>>
>>>>>I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
>>>>
>>>>Classic LL(1) does not have the "dynamic failure conditions" that
>>>>you mention.
>>>
>>>Yes, I know that. But then, LL(1) is not synonymous with "recursive
>>>descent with first/follow."
>>
>>Sure. But I don't see the point. What am I missing?
>
>
> Well, I say "you could use first/follow to prune unproductive parses
> while still using dynamic failure conditions to get nondeterminism," and
> you come back with "Classic LL(1) doesn't have those." I assume you're
> using that as an argument against my statement, but it's a hollow
> argument because classic LL(1) is only one way to use first/follow with
> recursive descent.

Oh. I see.

No, I'm not using it as an argument, and no, I am not disputing
your statement. That's the very reason I qualified my statement
with "classic LL(1)". Actually, we were in agreement (or at least I
thought so).

>>>>NLL(1) grammars cannot have ambiguity. If the grammar
>>>>is ambiguous, first follow computation will fail (first follow
>>>>conflicts).
>>>
>>>
>>>Hum? The algorithm for computing the first and follow sets of a symbol
>>>don't have an "otherwise the computation fails" branch, and the
>>>definitions of these sets are simple:
>>>
>>> first(a) = the set of tokens that can begin strings derived from a.
>>> follow(N) = the set of tokens that can follow N in grammatical sequences
>>>
>>>You can use this information to eliminate hopeless branches of a parse
>>>and avoid wasting effort backtracking without requiring the grammar to
>>>be LL(1) or LL(2) or... even to be unambiguous at all.
>>
>>Yes. In theory.
>
>
> So "first follow computation will fail" is false. BTW, I don't know
> what NLL(1) means. Nondeterministic LL(1)? I can't even picture that.

No, it is true.

BTW, the "NLL" is a typo introduced by your quoting of my post.
Here's the original:

> Classic LL(1) does not have the "dynamic failure conditions" that
> you mention. LL(1) grammars cannot have ambiguity. If the grammar
> is ambiguous, first follow computation will fail (first follow
> conflicts).

When the grammar is ambiguous, the computation of the first set
will fail because the set itself cannot have multiple same entries.
You'll have to look-ahead more than 1 tokens to resolve the ambiguity,
and by doing so, the grammar is no longer LL(1).

>>>Maybe we're talking about the same thing.
>>
>>Yes! That was what I was trying to point out from the start.
>>However, it's not as simple as it seems.
>
>
> Why not?

Long story. Anyway, the basic problem is that Spirit parsers
are dynamic. The computation of the first set, for instance,
cannot be done at compile time. Say for example:

     r = str_p("apple") | "banana" | "orange";

the first set of r is: ['a', 'b', 'o']. This can be obtained
through a call to first(r) which calls first("apple"),
first("banana") and first("orange") recursively.

The problem is that each alternative can potentially have
different types. A production such as:

     a | b | c;

can be thought of as a

     composite<alternative_operation, tuple<A, B, C> >

where A, B and C are the types of a, b and c. So, now, when
creating the parser, you'll have to make a table of alternates
tuple<A, B, C> which dispatches properly based on information
known only at runtime: 1) the current input character and
2) the first follow sets of A, B and C.

One possible solution is to make the parsers truly static.
Example:

     str<'a','p','p','l','e'>()
     | str<'b','a','n','a','n',','a'>()
     | str<'o','r','a','n','g','e'>()

in this case, first/follow can be a static operation. But, I'm not
thrilled about this solution. It has lots of problems too (e.g.
the rule's opaque virtual function wall).

At the opposite extreme, another possible solution is to use virtual
functions and run-time dispatch all over. Hence, the productions can
be stored in standard homogeneous containers instead of heterogenious
containers (tuples). I'm also not thrilled about this solution.

Searching for a viable solution to this problem is actually
one of the main motivations behind the development of Fusion. So,
IOTW, I had to develop some basic infrastructure/scaffoldings first
before I could tackle predictive parsing in Spirit. Hence, it wasn't
as easy at it seems.

>>>If you're talking about what I'm describing, I don't think there's
>>>anything particularly radical about it, and I'd be very surprised if it
>>>hadn't been done before.
>>
>>AFAIK, no. Correct me if I'm wrong.
>
>
> I once sent you a comparative study of some NLP parsing strategies. One
> of those was using first/follow computation and parsing ambiguous
> grammars. I'm pretty sure they were using backtracking or that they
> told me that they had thought of adding it.

Cool! I'll look into that.

>>>>An alternative scheme is to let the user switch
>>>>strategies (possibly using spirit directives).
>>>>
>>>>Another scheme, used by PCCTS, is to make the parser predictive
>>>>by default, and allow the user to use syntactic and semantic
>>>>predicates with arbitrary lookahead on specific points in
>>>>the grammar.
>>>
>>>
>>>Eewww, I smell a kludge ;-)
>>
>>Not really. There's a big problem with backtracking in general:
>>backtracking and semantic actions do not mix well (e.g. false
>>triggering).
>
>
> It's not a problem if you build a parse tree and run the analysis on
> that (which is sometimes required for synthesized attributes anyway).

Agreed. Yet, for that to succeed, you'll have to build it all the
way to the top (start rule). Only then will all correct branches
be known and all failed branches be thrown away. For simple parsing
tasks, this is expensive. In a lot of cases, I do not want to build
a parse tree.

>>In a syntactic and semantic predicate, all actions
>>are intentionally inhibited.
>
>
> Have you considered the strategy of
> http://docs.biostat.wustl.edu/cgi-bin/info2html?(bison.info.gz)GLR%2520Parsers
> (http://tinyurl.com/694kq)
> ??

No. At least, not just yet. I'd like to concentrate first on top-bottom
RD approaches. I have a keen eye on recursive ascent, however.

Cheers,

-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk