From: Hartmut Kaiser (hartmutkaiser_at_[hidden])
Date: 2003-03-08 02:14:20


Hi all,

Sorry for the somewhat lengthy post, but I hope it will be helpful to
some of you.

The Boost.Spirit-based C++ preprocessor iterator (the project name is
'Wave') is functionally complete now. All pp operators and pp statements
are in place, and the macro expansion engine works as expected. So I've
released a first version: Wave V0.9.0 (please consider it a beta).

Conceptually, the Wave library is a preprocessing C++ lexer that
conforms to the C++ Standard and exposes a (forward) iterator interface
for iterating over the preprocessed C++ tokens.

The main goals for this project are:
- full conformance with the C++ standard (INCITS/ISO/IEC 14882/1998)
- usage of Spirit for the parsing parts of the game (certainly :-)
- maximal usage of STL and/or Boost libraries (for compactness and
maintainability)
- straightforward extensibility for implementing additional
features (such as variadics and placemarkers)
- building a flexible library for different C++ lexing and preprocessing
needs.

For these first steps, a very high-performance or very small C++
preprocessor is not the goal. If those are your objectives, you will
probably have to look elsewhere.
The C++ preprocessor should nevertheless work as expected and be usable
as a reference implementation, for instance for testing other
preprocessor-oriented libraries such as Boost.Preprocessor et al., or
for developing new pp functionality.
Tests done by Paul Mensonides showed that the Wave library conforms
closely to the C++ Standard: it compiles several strictly conforming
modules written by him that cannot even be compiled with EDG-based
preprocessors (e.g. Comeau or Intel).

The C++ preprocessor is not built as a monolithic application; it is
rather a modular library that exposes a context object and an iterator
interface. The context object helps to configure the actual pp process
(search paths, predefined macros, etc.). The exposed iterators are
generated by this context object too. Iterating over the sequence
defined by the two iterators returns the preprocessed tokens, which
are generated on the fly from the underlying input stream.
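
To make the shape of such an interface concrete, here is a minimal,
self-contained sketch of the context-plus-iterator pattern. It is not
the Wave API: `toy_context`, `define` and `preprocess_to_string` are
hypothetical names invented for this illustration, the "lexer" merely
splits at whitespace, and macro handling is reduced to replacing single
words.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a preprocessing context (NOT the Wave API).
// It holds the configuration and hands out the iterator pair.
class toy_context {
public:
    explicit toy_context(const std::string& input) : input_(input) {}

    // Configuration step: register a predefined object-like macro.
    void define(const std::string& name, const std::string& replacement) {
        macros_[name] = replacement;
    }

    // Iterator that produces one token per increment, on the fly.
    class iterator {
    public:
        iterator() : ctx_(0), pos_(0), at_end_(true) {}   // end marker
        explicit iterator(const toy_context* ctx)
            : ctx_(ctx), pos_(0), at_end_(false) { advance(); }

        const std::string& operator*() const {
            assert(!at_end_);   // the end iterator must not be dereferenced
            return current_;
        }
        iterator& operator++() { advance(); return *this; }
        bool operator==(const iterator& rhs) const {
            if (at_end_ || rhs.at_end_) return at_end_ == rhs.at_end_;
            return ctx_ == rhs.ctx_ && pos_ == rhs.pos_;
        }
        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }

    private:
        static bool is_space(char c) {
            return std::isspace(static_cast<unsigned char>(c)) != 0;
        }
        // Read the next whitespace-delimited word and apply the macro table.
        void advance() {
            if (at_end_) return;
            const std::string& in = ctx_->input_;
            while (pos_ < in.size() && is_space(in[pos_])) ++pos_;
            if (pos_ == in.size()) { at_end_ = true; return; }
            const std::size_t start = pos_;
            while (pos_ < in.size() && !is_space(in[pos_])) ++pos_;
            current_ = in.substr(start, pos_ - start);
            std::map<std::string, std::string>::const_iterator m =
                ctx_->macros_.find(current_);
            if (m != ctx_->macros_.end()) current_ = m->second;
        }

        const toy_context* ctx_;
        std::size_t pos_;
        std::string current_;
        bool at_end_;
    };

    iterator begin() const { return iterator(this); }
    iterator end() const { return iterator(); }

private:
    std::string input_;
    std::map<std::string, std::string> macros_;
};

// Usage: configure the context first, then walk the iterator pair.
inline std::string preprocess_to_string(const toy_context& ctx) {
    std::string out;
    for (toy_context::iterator it = ctx.begin(); it != ctx.end(); ++it) {
        if (!out.empty()) out += ' ';
        out += *it;
    }
    return out;
}
```

The point of the design is that all configuration lives in one object,
while the consumer only ever sees an ordinary begin/end pair.
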
The overall preprocessing is a two-stage process:

     input stream
     (characters)
          |
          v
    +-----------+
    | C++ lexer | (tokenizer)
    +-----------+
          |
          v
      pp tokens
          |
          v
    +-----------+
    |preprocess.| (macro expansion etc.)
    +-----------+
          |
          v
     preprocessed
     C++ tokens

As you can see, the input stream feeds a full C++ lexer module (the
generated C++ tokens here are exposed through an iteration interface
too). This C++ lexer allows the preprocessing module to work on tokens
rather than directly on the character stream (performance!).
Additionally, this helps to resolve language ambiguities such as

   'some_class<include<some_term> >'

or similar (see C++ standard 2.1.1.3), which are difficult to handle in
a one-step process. During token generation the C++ lexer splices
physical source lines into logical source lines (removal of a '\\'
followed by a '\n'), recognizes trigraphs and alternative tokens, etc.
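
As a rough illustration of the line splicing step, the following
self-contained sketch removes every backslash that is immediately
followed by a newline. The function name `splice_lines` is made up for
this example, and the sketch runs as a separate pass over a whole
string, whereas a real lexer would do this while scanning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Join physical source lines into logical ones: a backslash directly
// followed by a newline is deleted together with that newline.
inline std::string splice_lines(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;        // skip the newline as well
            continue;
        }
        out += src[i];
    }
    return out;
}

// Usage: a macro definition continued over two physical lines
// becomes a single logical line.
inline void splice_lines_example() {
    assert(splice_lines("#define ADD(a, b) \\\n  ((a) + (b))\n")
           == "#define ADD(a, b)   ((a) + (b))\n");
}
```

A backslash that is not followed by a newline is left untouched, so
escape sequences inside string literals survive this step.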

The exposed C++ lexer iteration interface generates the preprocessing
tokens consumed by the preprocessing module, which does the actual work,
the preprocessing :-). After this the resulting tokens are converted to
C++ tokens exposed by the preprocessor iterator.

To make the C++ preprocessing library modular, the C++ lexer is held
completely separate from and independent of the preprocessor (it is
actually a template parameter). To prove this concept I've implemented
two different full-blown C++ lexers (one based on a re2c-based C++ lexer
written by Dan Nuffer some time ago [VERY fast], the other based on the
Spirit-based Slex dynamic lexing engine - a table-driven DFA [quite
compact]). Both lexers are pluggable into the preprocessor through a
unified iterator interface and are completely interchangeable.
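
The idea of passing the lexer as a template parameter can be sketched
as follows. Nothing here is taken from Wave itself: `toy_preprocessor`,
`stream_lexer` and `table_lexer` are hypothetical names, the two
"lexers" are plain standard-library iterators standing in for the re2c
and Slex based ones, and macro replacement is a simple one-to-one word
substitution without rescanning.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical preprocessing stage (NOT the Wave API). The token source
// is a template parameter: any iterator yielding std::string will do.
template <typename LexIterator>
class toy_preprocessor {
public:
    toy_preprocessor(LexIterator first, LexIterator last)
        : first_(first), last_(last) {}

    void define(const std::string& name, const std::string& replacement) {
        macros_[name] = replacement;
    }

    // Consume all tokens, replacing every defined name by its value.
    std::vector<std::string> run() {
        std::vector<std::string> out;
        for (; first_ != last_; ++first_) {
            std::map<std::string, std::string>::const_iterator m =
                macros_.find(*first_);
            out.push_back(m != macros_.end() ? m->second : *first_);
        }
        return out;
    }

private:
    LexIterator first_;
    LexIterator last_;
    std::map<std::string, std::string> macros_;
};

// "Lexer" A: splits a character stream at whitespace, on the fly.
typedef std::istream_iterator<std::string> stream_lexer;
// "Lexer" B: walks a table of tokens produced ahead of time.
typedef std::vector<std::string>::const_iterator table_lexer;

// Usage: the same preprocessing code runs on top of either lexer.
inline void interchangeable_lexers_example() {
    std::istringstream in("x = LIMIT ;");
    stream_lexer sfirst(in), slast;
    toy_preprocessor<stream_lexer> a(sfirst, slast);
    a.define("LIMIT", "10");

    std::vector<std::string> table;
    table.push_back("x");
    table.push_back("=");
    table.push_back("LIMIT");
    table.push_back(";");
    toy_preprocessor<table_lexer> b(table.begin(), table.end());
    b.define("LIMIT", "10");

    assert(a.run() == b.run());
}
```

Because the preprocessing stage only relies on the iterator operations,
swapping the lexer changes a single template argument and nothing else.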

BTW the C++ lexers are usable standalone, without using the
preprocessing part of the library. It would be very interesting to see
how the other existing and ongoing C++ lexers (see the Spirit examples)
fit into the picture. The user of the final library will then be able
to decide which C++ lexer best fits his/her needs.

There are a couple of things left to do:
- report the concatenation of unrelated tokens as an error
- write more complete documentation (for now please see the samples)
- test the Wave pp iterator more thoroughly
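
For readers unfamiliar with the first item: it concerns the `##`
operator. The C++ Standard (16.3.3) leaves the behaviour undefined when
pasting two tokens does not yield a single valid preprocessing token,
so reporting it is up to the implementation. The small sketch below
shows a well-formed paste; the macro name `CAT` and the function name
are chosen for illustration only.

```cpp
#include <cassert>

#define CAT(a, b) a##b   // paste two tokens into one

// Well-formed: `val` ## `ue` forms the single identifier `value`.
inline int pasted_identifier_example() {
    int CAT(val, ue) = 42;
    return value;
}

// Not well-formed: CAT(+, ;) would try to form "+;", which is not a
// single valid preprocessing token. This is the kind of use the list
// item above proposes to report as an error.

inline void pasting_example() {
    assert(pasted_identifier_example() == 42);
}
```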

There is already some documentation in place, which you may use as a
starting point. If this isn't enough, there is a sample driver program
for the Wave library (source: cpp.cpp etc.), which fully utilizes the
capabilities of the library, so you may look at the source for further
information (for now).

You can find the Wave library in the Spirit CVS
(cvs.spirit.sourceforge.net:/cvsroot/spirit): 'spirit/wave'.
Additionally there is a zip file that can be downloaded here:
http://sourceforge.net/projects/spirit/

Eventually there will be separate releases of binary packages, built
for different platforms.

Please note that to build the enclosed sample driver (essentially a
full-blown text stream --> text stream preprocessor) you will need a
correctly installed Boost distribution, because several different Boost
libraries are used (such as Boost.Filesystem,
Boost.inreview.program_options etc.).

It is planned to bundle the Wave library later on with a strict version
of the pp-lib from Paul Mensonides (Boost.Preprocessor) and put it into
the Boost CVS.

The Wave library compiles and works so far with
- VC7.1 (final beta)
- gcc 3.2 (Cygwin and linux)
- IntelV7/DinkumwareSTL (from VC6sp5)
(other compilers have not been tested yet).

Last but not least I want to thank Paul Mensonides for his invaluable
comments, thorough testing and very helpful tips, which made it
possible for me to write the Wave library in such a short amount of
time.

Regards Hartmut

