Boost logo

Boost :

From: Andreas Grosam (agrosam_at_[hidden])
Date: 2003-09-08 10:18:24


On Montag, September 8, 2003, at 03:38 Uhr, Joel de Guzman wrote:

> Hi Andreas,
>
> Gee, those are interesting insights! Would you mind
> if we discuss this over at Spirit's mailing list?
>
> Spirit-general mailing list
> Spirit-general_at_[hidden]
> https://lists.sourceforge.net/lists/listinfo/spirit-general
>
> Thanks, and see you there :-)

Ok, cu there.
But I would like to mention, that the source forge mailing list service
is currently not working well. I get a lot of errors, like "Error:
Forum not found", when I attempt to read a mail. Searching the archives
does not work, too. Also, i need to subscribe to this list first...

Andreas

>
> --
> Joel de Guzman
> http://www.boost-consulting.com
> http://spirit.sf.net
>
>
> ---- Original Message ----
> From: Andreas Grosam
>
>> Suppose, we have the following simple grammar:
>>
>> rule<> S, A, BC, C;
>>
>> S = A >> BC >> C;
>> A = ch_p('a');
>> BC = ch_p('b') >> !(ch_p('c'));
>> C = ch_p('c');
>>
>> We can see, that "abc" and "abcc" are valid languages.
>>
>> The problem with the grammar is, that it is not LL(1). For Spirit this
>> shouldn't be a problem, though - because it generates LL(inf) parsers.
>> (Please correct me!)
>> However, Spirit fails to parse "abc" :
>> It first processes A and finds an 'a', then it processes BC and finds
>> a 'b',
>> then it needs to make a choice:
>> either 'c' or nothing (epsilon) is valid. It chooses the first
>> alternative
>> and finds a 'c'. At this point BC has been successfully processed.
>> Finally,
>> it attempts to process rule C, however, there is no character anymore
>> and the
>> match fails. At this point, the parser states the string "abc" is not
>> valid
>> -it does not try the second alternative in BC.
>> This is how a LL(1) parser would behave.
>> In order to recognize the string correctly, a LL(2) parser would be
>> required,
>> so that it can look ahead two tokens and defer the decision which
>> alternative
>> to use in BC.
>>
>> So, why does a Spirit parser fail here? Is this the effect of "short-
>> circuiting" rules?
>>
>> If Spirit is actually LL(inf), how does it handle possibly
>> exponential time
>> complexity when parsing?
>>
>> Can we explicitly control look ahead? There is a class look_ahead,
>> what is it
>> for, and how can we use it?
>>
>> Suggestion: a directive controlling local look ahead:
>>
>> rule<> BC = look_ahead_d(2)[ ch_p('b') >> !(ch_p('c')) ];
>> (alternatively: rule<> BC = look_ahead_d<2>()[ /*...*/]; )
>>
>> look_ahead_d(spec) might be overloaded (or template functions, or
>> functors),
>> where spec is either an unsigned integral defining the number of
>> tokens for
>> lookahead or a rule - or whatever is useful for semantic, syntactic
>> and fixed
>> look ahead.
>>
>>
>> Would this be possible? Can we then mix rules with different local
>> look_ahead
>> specifications? (means, rules should not be distinct types, or shall
>> have a
>> common base type, e.g.:
>> S = look_ahead_d<2>()[ A ] >> look_ahead_d(C)[ B ];
>>
>> (Oh, yes, I'm aware of the longest_d and shortest_d directives, which
>> could
>> solve the example case above - but this is not sufficient in many
>> other
>> cases, e.g. where BC = b >> *c; )
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk