Boost logo

Boost :

From: Joel de Guzman (djowel_at_[hidden])
Date: 2003-09-08 08:38:01

Hi Andreas,

Gee, those are interesting insights! Would you mind
if we discuss this over at Spirit's mailing list?

Spirit-general mailing list

Thanks, and see you there :-)

Joel de Guzman
---- Original Message ----
From: Andreas Grosam
> Suppose, we have the following simple grammar:
> rule<> S, A, BC, C;
> S = A >> BC >> C;
> A = ch_p('a');
> BC = ch_p('b') >> !(ch_p('c'));
> C = ch_p('c');
> We can see, that "abc" and "abcc" are valid languages.
> The problem with the grammar is, that it is not LL(1). For Spirit this
> shouldn't be a problem, though - because it generates LL(inf) parsers.
> (Please correct me!)  
> However, Spirit fails to parse "abc" :
> It first processes A and finds an 'a', then it processes BC and finds a 'b',
> then it needs to make a choice: 
> either 'c' or nothing (epsilon) is valid. It chooses the first alternative
> and finds a 'c'. At this point BC has been successfully processed. Finally,
> it attempts to process rule C, however, there is no character anymore and the
> match fails. At this point, the parser states the string "abc" is not valid
> -it does not try the second alternative in BC.    
> This is how a LL(1) parser would behave.
> In order to recognize the string correctly, a LL(2) parser would be required,
> so that it can look ahead two tokens and defer the decision which alternative
> to use in BC.  
> So, why does a Spirit parser fail here? Is this the effect of "short-
> circuiting" rules? 
> If Spirit is actually LL(inf), how does it handle possibly exponential time
> complexity when parsing? 
> Can we explicitly control look ahead? There is a class look_ahead, what is it
> for, and how can we use it? 
> Suggestion: a directive controlling local look ahead:
> rule<> BC = look_ahead_d(2)[ ch_p('b') >> !(ch_p('c')) ];
> (alternatively: rule<> BC = look_ahead_d<2>()[ /*...*/]; )
> look_ahead_d(spec) might be overloaded (or template functions, or functors),
> where spec is either an unsigned integral defining the number of tokens for
> lookahead or a rule - or whatever is useful for semantic, syntactic and fixed
> look ahead.   
> Would this be possible? Can we then mix rules with different local look_ahead
> specifications? (means, rules should not be distinct types, or shall have a
> common base type, e.g.:  
> S = look_ahead_d<2>()[ A ] >> look_ahead_d(C)[ B ];
> (Oh, yes, I'm aware of the longest_d and shortest_d directives, which could
> solve the example case above - but this is not sufficient in many other
> cases, e.g. where BC = b >> *c; )  

Boost list run by bdawes at, gregod at, cpdaniel at, john at