From: Dave Handley (dave_at_[hidden])
Date: 2004-12-27 13:17:07

Hartmut Kaiser wrote:

>I'm definitely interested to have a look at your library. Besides my
>interest in Spirit I'd like to try it out as an alternative lexer
>for Wave. I'm pretty sure this should be interesting for you as well,
>because there are already two different lexers implemented, which gives a
>good opportunity to compare them in a real environment.

I have to confess to not knowing much about Wave, but I would be willing to
look in more detail at using this library for Wave. Once we have our first
stable version of the software we will be happy to let you have a look at it
in more detail. I expect this to happen within the next month or so.

>As for a dynamic DFA based lexer Wave already uses the Spirit based SLEX,
>but a static DFA based solution is very interesting to look at. Is the DFA
>generated at compile time?

By static and dynamic, I mean compile-time and run-time. The system
is designed to generate the DFA at run-time. But we are discussing a method
at the moment whereby the same code base could be used to generate
code-stubs for compile-time DFA creation. I am keen to support both.
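
To illustrate what generating the DFA at run-time means, here is a minimal C++ sketch. It is not lextl's interface, which this message does not show. Every name in it (RuntimeDfa, add_state, add_range, build_example_dfa, tokenize) is hypothetical. The transition table is filled in by ordinary function calls while the program runs, and a longest-match loop then walks it. A compile-time variant would emit the same table as generated source code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not lextl's API.
constexpr int kNone = -1;       // "no transition" / "not accepting"
constexpr int kIdentifier = 0;  // token kinds used by the example
constexpr int kNumber = 1;

struct Token {
    int kind;
    std::string text;
};

class RuntimeDfa {
public:
    // Adds a state at run time and returns its index. State 0 is the start.
    int add_state(int accept_kind = kNone) {
        State s;
        s.next.fill(kNone);
        s.accept_kind = accept_kind;
        states_.push_back(s);
        return static_cast<int>(states_.size()) - 1;
    }

    // Adds transitions from `from` to `to` for every character in [lo, hi].
    void add_range(int from, char lo, char hi, int to) {
        assert(from >= 0 && from < static_cast<int>(states_.size()));
        assert(to >= 0 && to < static_cast<int>(states_.size()));
        for (int c = static_cast<unsigned char>(lo);
             c <= static_cast<unsigned char>(hi); ++c)
            states_[from].next[c] = to;
    }

    // Longest match starting at `pos`; returns its length (0 if none).
    std::size_t match(const std::string& in, std::size_t pos, int& kind) const {
        assert(!states_.empty());
        int state = 0;
        std::size_t best = 0;
        kind = kNone;
        for (std::size_t i = pos; i < in.size(); ++i) {
            state = states_[state].next[static_cast<unsigned char>(in[i])];
            if (state == kNone) break;
            if (states_[state].accept_kind != kNone) {
                best = i - pos + 1;
                kind = states_[state].accept_kind;
            }
        }
        return best;
    }

private:
    struct State {
        std::array<int, 256> next;
        int accept_kind;
    };
    std::vector<State> states_;
};

// Builds a DFA for identifiers ([A-Za-z_][A-Za-z0-9_]*) and numbers ([0-9]+).
RuntimeDfa build_example_dfa() {
    RuntimeDfa dfa;
    const int start = dfa.add_state();
    const int ident = dfa.add_state(kIdentifier);
    const int number = dfa.add_state(kNumber);
    for (int from : {start, ident}) {
        dfa.add_range(from, 'a', 'z', ident);
        dfa.add_range(from, 'A', 'Z', ident);
        dfa.add_range(from, '_', '_', ident);
    }
    dfa.add_range(ident, '0', '9', ident);
    dfa.add_range(start, '0', '9', number);
    dfa.add_range(number, '0', '9', number);
    return dfa;
}

// Splits the input into tokens; characters the DFA rejects are skipped.
std::vector<Token> tokenize(const RuntimeDfa& dfa, const std::string& in) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        int kind = kNone;
        const std::size_t len = dfa.match(in, pos, kind);
        if (len == 0) { ++pos; continue; }  // whitespace or unknown character
        out.push_back({kind, in.substr(pos, len)});
        pos += len;
    }
    return out;
}
```

Calling tokenize(build_example_dfa(), "Separator 42 x_1") yields three tokens: an identifier, a number and another identifier.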

>We definitely should try the new upcoming Spirit-2 code base as well, since
>it should be a lot faster than the current version. Is it possible to have a
>look at your test code as well? This way we could try to make a comparison
>as soon as the Spirit-2 codebase evolves.

I would be quite keen to see how Spirit-2 performs on similar tests. If the
interest is there, then we will quite happily post the code into the Boost
Yahoo group once we have completed a bit of tidying up in the New Year. We
need to put some effort into writing more detailed test cases. At present,
we are only directly comparing lexical analysis, and have not looked at the
performance of the interface in real detail. We have a desire to properly
test the system with a complete parse. To do this I think there are a
number of useful test cases:

1) Flex and bison/lex and yacc.
2) Spirit - without any assistance from any lexer.
3) Spirit 2 once available.
4) Our library (called lextl at present) with yacc/bison.
5) Lextl with Spirit.
6) Lextl with Spirit 2.
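
One way the six cases above could be run side by side is a small timing harness. This is a sketch, not the test code discussed in this message. BenchCase, BenchResult and run_cases are hypothetical names, and each case is assumed to be wrapped in a callable that performs one full parse and returns the number of tokens it saw.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical harness; none of these names come from lextl, Spirit or flex.
struct BenchCase {
    std::string name;                  // e.g. "flex + bison"
    std::function<std::size_t()> run;  // one full parse; returns tokens seen
};

struct BenchResult {
    std::string name;
    double seconds;      // wall-clock time of one run
    std::size_t tokens;  // sanity check: all cases should agree on this
};

// Runs each case once, in order, and records its wall-clock time.
std::vector<BenchResult> run_cases(const std::vector<BenchCase>& cases) {
    std::vector<BenchResult> results;
    for (const BenchCase& c : cases) {
        assert(static_cast<bool>(c.run));  // every case needs a callable
        const auto start = std::chrono::steady_clock::now();
        const std::size_t tokens = c.run();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        results.push_back({c.name, elapsed.count(), tokens});
    }
    return results;
}
```

Recording the token count alongside the time makes it easy to confirm that every case really processed the same input before their speeds are compared.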

Case 1 would be the control, and the target performance for the other cases
to achieve. I am confident that case 4 should achieve the same speed, and I
think there is a good likelihood of cases 5 and 6 achieving the same speed.
The key question is whether 5 outperforms 2 and 6 outperforms 3. If this is
the case, then I think we have a viable library. My current test file, a
20 MB VRML 1.0 file, reduces to about 2e6 tokens when whitespace is stripped
and the file is lexed. By using flyweighted tokens, we are hoping to keep
the overhead of having Spirit parse tokens instead of characters to a
minimum, so the reduction from 2e7 to 2e6 input entities should give an
order of magnitude speed increase to Spirit. At least that is my hope :-)
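
The flyweight idea can be sketched as follows. This is an assumption about the general technique, not lextl's actual token type. FlyToken, LexemePool and input_reduction are hypothetical names. Each distinct lexeme is stored once in a shared pool, and a token is a small handle that a parser can copy and compare cheaply. The last function only computes the ratio of input entities from the figures above. It says nothing about the speed-up actually achieved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flyweight token: two integers, cheap to copy and compare.
struct FlyToken {
    std::uint32_t kind;
    std::uint32_t lexeme_id;  // index into the shared LexemePool
};

// Stores each distinct lexeme once and hands out stable integer ids.
class LexemePool {
public:
    std::uint32_t intern(const std::string& text) {
        const auto it = ids_.find(text);
        if (it != ids_.end()) return it->second;  // already stored
        const std::uint32_t id = static_cast<std::uint32_t>(texts_.size());
        texts_.push_back(text);
        ids_.emplace(text, id);
        return id;
    }

    const std::string& text(std::uint32_t id) const {
        assert(id < texts_.size());
        return texts_[id];
    }

    std::size_t size() const { return texts_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> texts_;
};

// Ratio of parser input entities: characters fed in versus tokens fed in.
double input_reduction(double characters, double tokens) {
    assert(tokens > 0);
    return characters / tokens;
}
```

Interning "Separator" twice returns the same id both times, so repeated node names in a large VRML file share one stored string. With the figures above, input_reduction(2e7, 2e6) is 10, the order of magnitude mentioned.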

Dave Handley
