From: Joel de Guzman (joel_at_[hidden])
Date: 2004-12-27 19:42:53


Hartmut Kaiser wrote:
>
> Dave Handley wrote:
>
>
>>The grammar for Spirit was (in a slightly cut down form):
>>
>>keyword =
>>    str_p( "Group" ) |
>>    str_p( "Separator" ) |
>>    //etc.;
>>comment =
>>    lexeme_d[
>>        ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" )
>>    ];
>>stringLiteral =
>>    lexeme_d[
>>        ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" )
>>    ];
>>word =
>>    lexeme_d[
>>        ( alpha_p | ch_p( '_' ) ) >>
>>        * ( alnum_p | ch_p( '_' ) )
>>    ];
>>floatNum =
>>    real_p;
>>vrml = *( keyword | comment | stringLiteral | word | floatNum );
>>
>>I've cut down the keywords because there are over 60 of them.
>> I would be interested to know if there was any obvious ways
>>to optimise the Spirit parser.
>
>
> At least you could have used the symbol parser (look here:
> http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic
> parser usable especially for keyword matching. I'm pretty sure, that this
> alone would speed up your test case a lot, because your keyword rule from
> above (if it contains 60 alternatives) is a _very_ ineffective way to
> recognise known in advance keywords.

Yes. I'd be very interested to know the results when using the
symbol parser. A 60 alternative rule will definitely be slow!

Some more things to note:
* why use lexeme_d at the lexer stage? It only has an effect when a
   skip parser is in use.
* the comment rule can be rewritten as:
   '#' >> *(anychar_p - eol_p) >> eol_p;
* stringLiteral can be rewritten similarly.
* word can be rewritten using chsets.

One thing is for sure: the grammar as posted is not optimized.

Cheers,

-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
