Boost logo

Boost :

From: Dave Handley (dave_at_[hidden])
Date: 2004-12-27 09:18:23


A colleague and I are working on a fast lexical analyser designed to work
alongside Spirit. We currently have some working prototypes, and as such I
thought that I would gauge interest in this library. We plan to produce a
DFA based lexical analyser that provides output as a set of iterable
polymorphic flyweighted tokens. These could then be provided as input to
Spirit instead of character iterators (albeit with the addition of a token_p
parser in Spirit). The objectives of our project are as follows:

1) As fast as lex/flex.
2) Simple to use
3) Rules to generate tokens can be provided both statically and
dynamically. Static definition would be through an offline pre-process
stage much like lex.
4) Easy to interface with Spirit.

I understand from Joel's posts below (both on this group and on codeproject)
that he is already planning some work to this end. My understanding of
expression template based lexers is that they can outperform DFAs under some
circumstances, but with other situations DFAs can provide the best
performance. I am therefore recommending this in addition to Joel and
Eric's work rather than instead of it.

Finally, as a taster of the work we are doing, I have performed some rather
unscientific and arbitrary performance tests on a 20Mb VRML 1.0 file. The
performance of flex, Spirit, and our new library are as follows:

flex: < 1 second
new library: about 1 second
Spirit: 40-50 seconds.

These results were produced with no semantic actions to slow down any of the
3 solutions. They are also not conclusive, since we are almost an order of
magnitude slower on an Athlon 64 bit machine at present.

Dave Handley

Relevant posts:
Joel de Guzman wrote:

>> I guess if you use string literals you'll have this
>> problem no matter what, though I'm not sure the use of string literals
>> should be a foregone conclusion. If you consider parser generators like
>> YACC, they cleanly separate lexical analysis from parsing and the parser
>> itself is never working with string literals.
>
>Again, true. However, one of the original intent of Spirit is to
>be usable in micro parsing tasks. When parsing small grammars
>like, say, a complex number, you do not want to write a separate
>front end lexer.
>
>Anyway, I do plan to write an optional but separate lexer
>sometime.

Joel de Guzman wrote:

> As for the lexer, I am working with Eric Niebler, author of Greta and
>xpressive (google regular expression c++) with the hopes of developing
>a powerful expresion template based lexer front end.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk