|
Boost : |
From: Joel de Guzman (djowel_at_[hidden])
Date: 2003-09-08 08:38:01
Hi Andreas,
Gee, those are interesting insights! Would you mind
if we discuss this over at Spirit's mailing list?
Spirit-general mailing list
Spirit-general_at_[hidden]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Thanks, and see you there :-)
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net ---- Original Message ---- From: Andreas Grosam > Suppose, we have the following simple grammar: > > rule<> S, A, BC, C; > > S = A >> BC >> C; > A = ch_p('a'); > BC = ch_p('b') >> !(ch_p('c')); > C = ch_p('c'); > > We can see, that "abc" and "abcc" are valid languages. > > The problem with the grammar is, that it is not LL(1). For Spirit this > shouldn't be a problem, though - because it generates LL(inf) parsers. > (Please correct me!) > However, Spirit fails to parse "abc" : > It first processes A and finds an 'a', then it processes BC and finds a 'b', > then it needs to make a choice: > either 'c' or nothing (epsilon) is valid. It chooses the first alternative > and finds a 'c'. At this point BC has been successfully processed. Finally, > it attempts to process rule C, however, there is no character anymore and the > match fails. At this point, the parser states the string "abc" is not valid > -it does not try the second alternative in BC. > This is how a LL(1) parser would behave. > In order to recognize the string correctly, a LL(2) parser would be required, > so that it can look ahead two tokens and defer the decision which alternative > to use in BC. > > So, why does a Spirit parser fail here? Is this the effect of "short- > circuiting" rules? > > If Spirit is actually LL(inf), how does it handle possibly exponential time > complexity when parsing? > > Can we explicitly control look ahead? There is a class look_ahead, what is it > for, and how can we use it? > > Suggestion: a directive controlling local look ahead: > > rule<> BC = look_ahead_d(2)[ ch_p('b') >> !(ch_p('c')) ]; > (alternatively: rule<> BC = look_ahead_d<2>()[ /*...*/]; ) > > look_ahead_d(spec) might be overloaded (or template functions, or functors), > where spec is either an unsigned integral defining the number of tokens for > lookahead or a rule - or whatever is useful for semantic, syntactic and fixed > look ahead. > > > Would this be possible? Can we then mix rules with different local look_ahead > specifications? (means, rules should not be distinct types, or shall have a > common base type, e.g.: > S = look_ahead_d<2>()[ A ] >> look_ahead_d(C)[ B ]; > > (Oh, yes, I'm aware of the longest_d and shortest_d directives, which could > solve the example case above - but this is not sufficient in many other > cases, e.g. where BC = b >> *c; )
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk