From: Vesa Karvonen (vesa_karvonen_at_[hidden])
Date: 2003-01-15 17:23:21


Hi Paul,

Paul Mensonides:
> > TOKEN_SEQ_TO_SEQ(FIRST, REST, IS_LAST, B o o s t SPACE 1 3 1)
> > ==> (B)(o)(o)(s)(t)(SPACE)(1)(3)(1)
>
>I already have something almost exactly like this in the high-precision
>arithmetic that I haven't committed yet. Of course, you'd have to have a
>macro for every letter of the alphabet, every number, and every
>"miscellaneous" thing like "SPACE". Also, it cannot be made to work on
>any operators, so you'd need special names for them too.

That would be true. In many interesting cases you would need more than one
macro per token:
- TOKEN_TO_CHAR_LIT(token) for making characters.
- TOKEN_TO_STR_LIT(token) for composing strings.
- TOKEN_TO_NUMBER(token) for comparing tokens (avoids having to duplicate
functionality for tokens).
- etc...
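
To make the per-token macros above concrete, here is a minimal sketch in C. It assumes that each conversion is a plain lookup done by token pasting; the TOKEN_TO_CHAR_LIT_x and TOKEN_TO_STR_LIT_x helper names and the boost_name array are made up for illustration and are not taken from any committed code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical dispatchers: paste the token onto a prefix and let the
   preprocessor rescan the result. One helper macro is needed for every
   supported token and every conversion. */
#define TOKEN_TO_CHAR_LIT(token) TOKEN_TO_CHAR_LIT_##token
#define TOKEN_TO_STR_LIT(token)  TOKEN_TO_STR_LIT_##token

/* Lookup table for character literals (only the tokens used here). */
#define TOKEN_TO_CHAR_LIT_B     'B'
#define TOKEN_TO_CHAR_LIT_o     'o'
#define TOKEN_TO_CHAR_LIT_s     's'
#define TOKEN_TO_CHAR_LIT_t     't'
#define TOKEN_TO_CHAR_LIT_SPACE ' '
#define TOKEN_TO_CHAR_LIT_1     '1'
#define TOKEN_TO_CHAR_LIT_3     '3'

/* Lookup table for one-character string literals. */
#define TOKEN_TO_STR_LIT_B     "B"
#define TOKEN_TO_STR_LIT_o     "o"
#define TOKEN_TO_STR_LIT_s     "s"
#define TOKEN_TO_STR_LIT_t     "t"
#define TOKEN_TO_STR_LIT_SPACE " "
#define TOKEN_TO_STR_LIT_1     "1"
#define TOKEN_TO_STR_LIT_3     "3"

/* Adjacent string literals are concatenated by the compiler, so the
   tokens B o o s t SPACE 1 3 1 compose into a single string. */
static const char boost_name[] =
    TOKEN_TO_STR_LIT(B) TOKEN_TO_STR_LIT(o) TOKEN_TO_STR_LIT(o)
    TOKEN_TO_STR_LIT(s) TOKEN_TO_STR_LIT(t) TOKEN_TO_STR_LIT(SPACE)
    TOKEN_TO_STR_LIT(1) TOKEN_TO_STR_LIT(3) TOKEN_TO_STR_LIT(1);
```

The sketch applies the conversion to each token by hand; a real library would map the macro over a sequence such as (B)(o)(o)(s)(t)(SPACE)(1)(3)(1).
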
I have been thinking about the possibility of using such a syntax to allow a
kind of string manipulation using the preprocessor. The preprocessor simply
does not have the means to destructure ordinary identifiers, numeric
literals, strings, etc. So the best that can be done is to pass them in
unstructured, as sequences of individual tokens. It is not ideal, of course,
but I think that it could be more than usable. Consider the following two
representations of a grammar production:

In Yacc/Bison:

  Expr
    : Term Add Expr { return $1 + $3; }
    | Term Sub Expr { return $1 - $3; }
    ;

Using the preprocessor, we could write:

  ((E x p r)
    ((T e r m) (A d d) (E x p r), { return _1 + _3; } )
    ((T e r m) (S u b) (E x p r), { return _1 - _3; } ))

Now, the reason why we can't just write `Expr' instead of `(E x p r)' is
that the preprocessor has no way of comparing arbitrary tokens like `Expr'.
However, it is not impossible to compare token sequences like `(E x p r)'.
If one is willing to spend a few macros, it is possible to get an even nicer
syntax:

  // ... sweetener macros earlier in the same .cpp file ...
  #define EXPR (E x p r)
  #define TERM (T e r m)
  #define ADD (A d d)
  #define SUB (S u b)

  // ... later in the grammar ...
  (EXPR
    (TERM ADD EXPR, { return _1 + _3; } )
    (TERM SUB EXPR, { return _1 - _3; } ))

The same token sequence technique could be used for many interesting kinds
of code generators that need to manipulate symbolic information.
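
As a rough illustration of why a sequence like `(E x p r)' can be compared while `Expr' cannot, here is a small sketch in C. It assumes a TOKEN_TO_NUMBER lookup like the one mentioned earlier, and it assumes that the letters have already been split into separate macro arguments. The number assigned to each letter and the NAME4_EQUAL macro are hypothetical, and the comparison is limited to names of exactly four tokens.

```c
#include <assert.h>

/* Hypothetical lookup: each supported letter token maps to a distinct
   number. Only the letters needed for Expr and Term are defined. */
#define TOKEN_TO_NUMBER(token) TOKEN_TO_NUMBER_##token
#define TOKEN_TO_NUMBER_E 0
#define TOKEN_TO_NUMBER_T 1
#define TOKEN_TO_NUMBER_e 2
#define TOKEN_TO_NUMBER_m 3
#define TOKEN_TO_NUMBER_p 4
#define TOKEN_TO_NUMBER_r 5
#define TOKEN_TO_NUMBER_x 6

/* Compare two four-token names position by position. The result is an
   integer constant expression, so it can also be used in #if. */
#define NAME4_EQUAL(a0, a1, a2, a3, b0, b1, b2, b3) \
    (TOKEN_TO_NUMBER(a0) == TOKEN_TO_NUMBER(b0) &&  \
     TOKEN_TO_NUMBER(a1) == TOKEN_TO_NUMBER(b1) &&  \
     TOKEN_TO_NUMBER(a2) == TOKEN_TO_NUMBER(b2) &&  \
     TOKEN_TO_NUMBER(a3) == TOKEN_TO_NUMBER(b3))

/* The comparison is evaluated by the preprocessor itself here. */
#if NAME4_EQUAL(E, x, p, r, E, x, p, r)
#define EXPR_MATCHES_EXPR 1
#else
#define EXPR_MATCHES_EXPR 0
#endif

#if NAME4_EQUAL(E, x, p, r, T, e, r, m)
#define EXPR_MATCHES_TERM 1
#else
#define EXPR_MATCHES_TERM 0
#endif
```

This only decides equality inside #if or a constant expression. A generator that has to branch during macro expansion would need equality macros that expand to 0 or 1 directly, which costs more helper macros.
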

Hmm... I think that one of the next things needed for lexer and parser
generators would be set and map data structures. Perhaps I'll
implement a functional red-black tree or AVL tree using the preprocessor.
Well, perhaps in a few weeks I'll have the time.

>How about this instead:
[...]

That looks *very* nice, and is probably also very fast!

Too bad that the same technique does not work if the elements are not
parenthesized.

-Vesa Karvonen



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk