From: Dave Handley (dave.handley_at_[hidden])
Date: 2006-11-21 05:31:24


Hartmut Kaiser wrote:

>
> Joel de Guzman wrote:
>
> > >> The main selling point of lexertl as far as boost goes is probably
> > >> speed. lexertl lexers are both fast to construct (typically under
> > >> 100 milliseconds on a modern machine) and fast to tokenise input
> > >> (the generated state machine uses the flex technique of equivalence
> > >> classes).

When I did some performance testing for Ben on an early version quite a
while back, I found that its performance compared very favourably with
the other lexers out in the wild at the time, flex included.
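
For anyone unfamiliar with the equivalence-class technique mentioned in
the quoted text, here is a minimal sketch of the idea. It is not taken
from lexertl or flex: the names (make_class_table, run, the cc_ and st_
enumerators) and the tiny identifier/number DFA are made up for
illustration, and it assumes an ASCII-compatible character set. Bytes
that the DFA treats identically share one column, so the transition
table is states x classes rather than states x 256.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Character classes (hypothetical): bytes the DFA cannot tell apart share a class.
enum char_class : std::uint8_t { cc_other = 0, cc_letter = 1, cc_digit = 2, cc_count = 3 };

// DFA states for a toy "identifier or integer" recogniser.
enum state : std::uint8_t { st_start = 0, st_ident = 1, st_number = 2, st_dead = 3, st_count = 4 };

// Build the 256-entry byte -> class lookup (assumes ASCII-compatible input).
std::array<std::uint8_t, 256> make_class_table() {
    std::array<std::uint8_t, 256> table{};  // zero-initialised: every byte starts as cc_other
    for (int c = 'a'; c <= 'z'; ++c) table[c] = cc_letter;
    for (int c = 'A'; c <= 'Z'; ++c) table[c] = cc_letter;
    table['_'] = cc_letter;
    for (int c = '0'; c <= '9'; ++c) table[c] = cc_digit;
    return table;
}

// transitions[state][class]: 4 x 3 entries instead of 4 x 256.
constexpr std::uint8_t transitions[st_count][cc_count] = {
    /* st_start  */ {st_dead, st_ident, st_number},
    /* st_ident  */ {st_dead, st_ident, st_ident},
    /* st_number */ {st_dead, st_dead,  st_number},
    /* st_dead   */ {st_dead, st_dead,  st_dead},
};

// Run the DFA over the whole input and return the state it ends in.
state run(const std::string& input) {
    static const std::array<std::uint8_t, 256> classes = make_class_table();
    std::uint8_t s = st_start;
    for (unsigned char c : input) {
        s = transitions[s][classes[c]];  // two small table lookups per byte
    }
    return static_cast<state>(s);
}
```

With this sketch, run("foo_1") ends in st_ident, run("123") in st_number
and run("12a") in st_dead. The smaller table is what matters: it is more
likely to stay in cache while tokenising.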

> > >>
> > >> I personally like the fact that you can dump the DFA as data and
> > >> process it later with any language you like too.
> > >
> > > Spirit is a parser, please don't compare apples and oranges.
> > > You cannot implement, say, Wave, with just a lexer. I suggest you
> > > don't go the "this-is-better-that-is-better" route. Spirit has its
> > > own lexer too, FYI. It's called SLex. See Slex in
> > > http://tinyurl.com/29mcn
> >
> > Oh and BTW, if you want to talk about speed, matching the
> > speed of Flex is not good enough. The thing to beat is Re2C!
> > Hartmut and I shall see how lexertl fares soon. Wave has an
> > adaptable front end where you can choose your own lexer. Re2C
> > (http://re2c.org/) is one of them.

Unfortunately, I never tested it head to head against re2c.

>
> Yes, and Slex is the other one. BTW, Slex seems to be very similar to
> lexertl (with the exception of not allowing to optimize the constructed
> state machine tables - but this is not a principal issue, merely lack
> of time).
>
> Wave is based on a modular (layered) design. The lexer sits on top of
> the input (character) stream, producing C++ tokens, exposed via an
> iterator interface. The preprocessing component consumes the lexer
> iterators and is almost completely independent. The only dependency is
> that both have to use the same token type (which is a template
> parameter to both).
>
> It is very easy to interface a different lexer to the preprocessor.
> The cpp_tokens example (libs/wave/samples/cpp_tokens) demonstrates this
> by using the Slex based lexer. I'll elaborate on this in another mail
> during the next couple of days.
>

Personally, I would like to see something like lexertl available as a
lexer front-end to spirit. Ben and I have discussed putting a fast
iterator-style interface onto the front end of his code, so that you
could simply use it as the iterator type for spirit, or wave, or
anything else for that matter. I love the way spirit works, but
performance has always limited how much I can use it in the real world.
I know Joel and team are working on spirit 2, which should address many
of these performance issues, but it is my firm belief that a fast lexer
attached to the front of spirit or spirit 2 would make a huge difference
to the performance of generated parsers.
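
To make the iterator idea concrete, here is a rough sketch of what such
an interface might look like. It is only an illustration under stated
assumptions: token, token_id, token_iterator and tokenise are
hypothetical names, not lexertl's or spirit's API, and the scanner is a
hand-written stand-in for a generated state machine. The point is that
the lexer is exposed as an ordinary input iterator over tokens, so
anything that accepts an iterator pair can consume it.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical token type, for illustration only.
enum class token_id { identifier, number, punctuation, end };

struct token {
    token_id id;
    std::string text;
};

// Input iterator that lexes one token each time it is incremented.
class token_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = token;
    using difference_type = std::ptrdiff_t;
    using pointer = const token*;
    using reference = const token&;

    // A default-constructed iterator is the end-of-input sentinel.
    token_iterator() : cur_(nullptr), end_(nullptr), tok_{token_id::end, ""} {}

    token_iterator(const char* first, const char* last)
        : cur_(first), end_(last), tok_{token_id::end, ""} {
        advance();  // lex the first token eagerly
    }

    reference operator*() const { return tok_; }
    pointer operator->() const { return &tok_; }
    token_iterator& operator++() { advance(); return *this; }
    token_iterator operator++(int) { token_iterator old = *this; advance(); return old; }

    friend bool operator==(const token_iterator& a, const token_iterator& b) {
        const bool a_end = a.tok_.id == token_id::end;
        const bool b_end = b.tok_.id == token_id::end;
        return (a_end && b_end) || (!a_end && !b_end && a.cur_ == b.cur_);
    }
    friend bool operator!=(const token_iterator& a, const token_iterator& b) { return !(a == b); }

private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_'; }
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Skip whitespace, then scan the next token into tok_.
    void advance() {
        while (cur_ != end_ && is_space(*cur_)) ++cur_;
        if (cur_ == end_) { tok_ = {token_id::end, ""}; return; }
        const char* start = cur_;
        if (is_alpha(*cur_)) {
            while (cur_ != end_ && (is_alpha(*cur_) || is_digit(*cur_))) ++cur_;
            tok_ = {token_id::identifier, std::string(start, cur_)};
        } else if (is_digit(*cur_)) {
            while (cur_ != end_ && is_digit(*cur_)) ++cur_;
            tok_ = {token_id::number, std::string(start, cur_)};
        } else {
            ++cur_;  // any other single character is punctuation
            tok_ = {token_id::punctuation, std::string(start, cur_)};
        }
    }

    const char* cur_;
    const char* end_;
    token tok_;
};

// Any consumer that takes an iterator pair can sit on top of the lexer.
std::vector<token> tokenise(const std::string& input) {
    const char* first = input.data();
    return std::vector<token>(token_iterator(first, first + input.size()), token_iterator());
}
```

Here std::vector stands in for the consumer. A parser would instead be
handed the same iterator pair and would see tokens rather than
characters.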

Also, the fact that it is very fast to build the DFA in Ben's code is
interesting. It opens the door to building parsers dynamically at run
time, which could be very useful in a whole range of apps.

Dave Handley

