
From: Dave Handley (dave_at_[hidden])
Date: 2004-12-28 06:14:30

Since I get the mailing list in a digest, it looks like I have a lot of
questions to answer. Sorry for the long(er) post, but I'm answering them
all in one post:

Darren Cook wrote:

>There seems some interest currently in alternatives/improvements to Spirit,
>so I thought I'd mention Hapy:

I'll have a look at Hapy and see whether we could interface to that as well.

Joel de Guzman wrote:

>Some more things to note:
>* why use lexeme_d on the lexer stage?
>* the comment rule can be rewritten as:
> '#' >> *(anychar_p - eol_p) >> eol_p;
>* stringLiteral can be rewriten similarly.
>* word can be rewritten using chsets.

I used lexeme_d because the VRML grammar uses whitespace to separate ints
and floats. Without lexeme_d, the skipper would run between digits, and
adjacent ints could merge into one.

The other optimisations are all valid, but unfortunately, don't make any
difference to the stopwatch. This is because comments, stringLiterals and
words don't occur too often in a VRML file - most of the time is spent
parsing floats and ints.

Joel de Guzman wrote:

>Sure, no problem. But watch out, Spirit-2 will have a speed improvement
>in the 10X-20X range over classic Spirit ;-) I already have some test
>cases, and this does not touch on predictive parsing and lexing.

Joel, I would be really keen to try running my VRML test case through
Spirit-2. Given that I've now got the Spirit-1 parser running at about 6-7
seconds, if you get a 10x speed improvement on this file, then it could make
the case for separate lexing much harder to support! I may also need a
bigger file to start doing test cases on - millisecond test timings don't
make for accuracy in my view :-)

Joel de Guzman wrote:

>Anyway, as I said, I'm very interested with your lexer.
>Would you be interested in a collaboration?

Yes, we would be very interested in a collaboration.

Hartmut Kaiser wrote:

>How do you specify production rules? Your above sample specifies the
>recognition part only ;-)
>The numbers are the token ID's, I assume?

The production rules are wrapped up in the different (and extensible)
concept of a token. The token_float, for example, will match a float, then
store the matched number as a float inside the token. The plain token
stores nothing, since for something like a keyword match you don't need the
overhead of the original text; the token_string, by contrast, stores the
string that was matched. I am contemplating an interface that allows you to
use a functor to perform some basic string manipulation before the token is
stored - for example, dropping the enclosing quotes - although this is
clearly similar to functionality already available in Spirit.

Yes, the numbers are token IDs. I have contemplated using const char *
template parameters as well, but at present I haven't done that because of
the added difficulty of ensuring that your names have external linkage.
Perhaps in the final version we can support both.

>Is your token type a predefined one or is it possible to use my own token
>types instead?
>I'm asking, because Wave in principle allows to use any token type for its
>work, but it requires the tokens to expose a certain (very small)

In the system as it stands, all token types must inherit from a base
version. This base version provides lots of useful virtual functions that
derived token types can work with. The iterator interface then returns the
token_base from its de-reference function. You can define new token types,
but accessing the specific interface on the derived token means doing one
of the following:

1) Ensuring the interface is exposed in the token_base.
2) Using polymorphic despatch to get the correct type, through a visitor
for example.
3) Casting directly to the derived type, if only a single token type is
used in the lexer.
4) Using a dynamic_cast.

You can also define new rule classes - at present we have written a rule
class for conventional tokens, and another rule class for tokens that have a
dynamic (run-time) ID rather than the static (compile-time) ID in the
template parameter.

Dave Handley

Boost list run by bdawes at, gregod at, cpdaniel at, john at