Suppose, we have the following simple grammar:
Courier
rule<<> S, A, BC, C;
S = A >> BC >> C;
A = ch_p('a');
BC = ch_p('b') >> !(ch_p('c'));
C = ch_p('c');
We can see, that "abc" and "abcc" are valid languages.
The problem with the grammar is, that it is not LL(1). For Spirit this
shouldn't be a problem, though - because it generates LL(inf) parsers.
(Please correct me!)
However, Spirit fails to parse "abc" :
It first processes A and finds an 'a', then it processes BC and finds
a 'b', then it needs to make a choice:
either 'c' or nothing (epsilon) is valid. It chooses the first
alternative and finds a 'c'. At this point BC has been successfully
processed. Finally, it attempts to process rule C, however, there is
no character anymore and the match fails. At this point, the parser
states the string "abc" is not valid -it does not try the second
alternative in BC.
This is how a LL(1) parser would behave.
In order to recognize the string correctly, a LL(2) parser would be
required, so that it can look ahead two tokens and defer the decision
which alternative to use in BC.
So, why does a Spirit parser fail here? Is this the effect of "short-
circuiting" rules?
If Spirit is actually LL(inf), how does it handle possibly exponential
time complexity when parsing?
Can we explicitly control look ahead? There is a class look_ahead,
what is it for, and how can we use it?
Suggestion: a directive controlling local look ahead:
rule<<> BC = look_ahead_d(2)[
Courierch_p('b') >> !(ch_p('c'))
];
(alternatively: rule<<> BC = look_ahead_d<<2>()[
Courier/*...*/]; )
look_ahead_d(spec) might be overloaded (or template functions, or
functors), where spec is either an unsigned integral defining the
number of tokens for lookahead or a rule - or whatever is useful for
semantic, syntactic and fixed look ahead.
Would this be possible? Can we then mix rules with different local
look_ahead specifications? (means, rules should not be distinct types,
or shall have a common base type, e.g.:
CourierS = look_ahead_d<<2>()[ A ] >>
look_ahead_d(C)[ B ];
(Oh, yes, I'm aware of the longest_d and shortest_d directives, which
could solve the example case above - but this is not sufficient in
many other cases, e.g. where BC = b >> *c; )
Thanks for help.
Andreas Grosam