From: Hartmut Kaiser (hartmut.kaiser_at_[hidden])
Date: 2005-12-12 10:22:25

Daryle Walker wrote:

> Looking at
> <>, I see
> various kinds of tokens. There are tokens for the
> preprocessor that are seen by the lexer and don't make it to
> the preprocessing iterator level.
> The other sets of tokens do make it to that level, modulo any
> transformations. The trigraphs are put in the operator token list.
> However, trigraphs should not be there.

The trigraph token types are generally in the same token set as the tokens
they represent. And yes, these are in the operator token set; this is because
of section 2.12 [lex.operators] of the Standard.

> They are processed
> before anything else, even before the preprocessor tokens.
> So there should be another level of lexer working here. As
> is, it doesn't seem that you could use the trigraph for "#",
> "??=", for preprocessor directives.
> ??=include <cstdio> // this should work

Wave correctly interprets this.

As I pointed out already, Wave currently doesn't strictly follow the
mandated translation phases. The trigraph tokens are processed at the lexer
level, i.e. before anything else. Wave has a runtime option to convert the
trigraph token values to their equivalent token representation (i.e. '??='
to '#'), but it always leaves the trigraph token id in place. Please let me
explain:

1. The trigraph token ids are essentially equivalent to their corresponding
non-trigraph token ids modulo a single bit, i.e. you can get at the
'real' token id by using the BASEID_FROM_TOKEN(t) macro. Because of this,
Wave correctly interprets the ??=include directive.

2. The token _values_ are converted only optionally to their non-trigraph
representation, so that the library user can access the original token value
(which may be useful in some contexts). I must admit, though, that the
current default, namely not converting the values, is a bug and should be
fixed (I'll do that asap).
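
To make the two points above concrete, here is a minimal sketch. It is not
Wave's code: the id values, the flag bit, and the names trigraph_flag,
id_pound, base_id, starts_directive and convert_value are all made up for
illustration, and base_id only stands in for what BASEID_FROM_TOKEN(t) is
described as doing. The trigraph is spelled "?\?=" in the string literals so
that the compiler itself does not replace it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// All ids and the flag bit below are invented for this sketch;
// Wave's actual values are defined in its own token id header.
constexpr std::uint32_t trigraph_flag     = 0x00010000u;  // assumed marker bit
constexpr std::uint32_t id_pound          = 0x00000101u;  // assumed id for '#'
constexpr std::uint32_t id_pound_trigraph = id_pound | trigraph_flag;  // '??='

// Stand-in for BASEID_FROM_TOKEN(t): clear the marker bit so that both
// spellings of a token map to the same 'real' id.
constexpr std::uint32_t base_id(std::uint32_t id) { return id & ~trigraph_flag; }

static_assert(base_id(id_pound_trigraph) == id_pound, "trigraph maps to '#'");
static_assert(base_id(id_pound) == id_pound, "plain '#' is unchanged");

struct token {
    std::uint32_t id;
    std::string value;
};

// A directive check that compares base ids accepts '#' and '??=' alike.
inline bool starts_directive(const token& t) { return base_id(t.id) == id_pound; }

// Optionally rewrite the token value to its non-trigraph spelling.
// The token id is left in place either way.
inline token convert_value(token t, bool convert) {
    if (convert && t.id == id_pound_trigraph) t.value = "#";
    return t;
}

// Small self-check exercising both points.
inline void check_trigraph_tokens() {
    const token t{id_pound_trigraph, "?\?="};
    assert(starts_directive(t));                              // point 1
    assert(convert_value(t, false).value == "?\?=");          // original kept
    assert(convert_value(t, true).value == "#");              // point 2
    assert(convert_value(t, true).id == id_pound_trigraph);   // id untouched
}
```

With a layout like this, code that only cares about the kind of token compares
base ids, while code that needs the original spelling still has it available
in the value whenever conversion is switched off.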

> On a related note, I thought maybe Wave should use a
> generator interface:
> template < typename Iterator, typename FileID >
> class phase1
> {
> public:
>     phase1( Iterator b, Iterator e, FileID id );
>     operator bool() const;  // TRUE while not done
>     cpp_p1_char_type operator ()();
> };
> template < typename Iterator, typename FileID >
> class phase2
> {
> public:
>     explicit phase2( phase1<Iterator, FileID> const &p );
>     operator bool() const;  // TRUE while not done
>     cpp_p2_line_string_type operator ()();
> };
> //...
> You generally can't rewind, of course. The cpp_p1_char_type
> would contain the expanded character's identity AND some
> indicator of its location (starting iterator, file ID, and
> line, row, and un-lined offset numbers).
> The cpp_p2_line_string_type would carry the locations for
> each character in its string. Then the tokens of later
> phases would know the location of their first characters.

Yes, I agree. Wave should be rewritten (and hopefully will be rewritten) in
a layered way, cleanly implementing each of the mandated translation phases
on top of the previous one. But I'm inclined to expose iterator interfaces,
not generators, to allow for rewinds, error handling etc. Such a layered
implementation would be generally useful for tool builders, allowing them to
use each of the translation steps separately, if necessary.

But that's for V2 and not done in a week or so.
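
As a rough illustration of the iterator-based layering discussed above, here
is a sketch of a single layer that performs only trigraph replacement on top
of an arbitrary character iterator. It is not part of Wave: the names
trigraph_replacement, phase1_iterator and run_phase1 are hypothetical, and
the sketch assumes the underlying iterator is at least a forward iterator.
Location tracking and the remaining phases are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Map the third character of "??x" to its replacement,
// or return '\0' when "??x" is not a trigraph.
inline char trigraph_replacement(char c) {
    switch (c) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return '\0';
    }
}

// Hypothetical phase 1 layer: reads characters from [pos, end) and yields
// them with trigraphs replaced. Declared as an input iterator because
// operator* returns by value; copies still remain independently usable
// as long as the underlying iterator is multi-pass.
template <typename Iterator>
class phase1_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = char;

    phase1_iterator(Iterator pos, Iterator end)
        : pos_(pos), end_(end), next_(pos) { decode(); }

    char operator*() const { assert(pos_ != end_); return current_; }
    phase1_iterator& operator++() { pos_ = next_; decode(); return *this; }
    phase1_iterator operator++(int) { phase1_iterator tmp(*this); ++*this; return tmp; }

    friend bool operator==(const phase1_iterator& a, const phase1_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const phase1_iterator& a, const phase1_iterator& b) {
        return !(a == b);
    }

private:
    // Decode the character at pos_ and remember where the next one starts.
    void decode() {
        next_ = pos_;
        current_ = '\0';
        if (pos_ == end_) return;
        current_ = *pos_;
        next_ = std::next(pos_);
        if (current_ != '?' || next_ == end_ || *next_ != '?') return;
        Iterator third = std::next(next_);
        if (third == end_) return;
        const char r = trigraph_replacement(*third);
        if (r != '\0') { current_ = r; next_ = std::next(third); }
    }

    Iterator pos_, end_, next_;
    char current_ = '\0';
};

// Convenience helper: run the layer over a whole string.
inline std::string run_phase1(const std::string& src) {
    using iter = phase1_iterator<std::string::const_iterator>;
    std::string out;
    for (iter it(src.begin(), src.end()), last(src.end(), src.end()); it != last; ++it)
        out.push_back(*it);
    return out;
}
```

Because the layer is an ordinary copyable iterator, a caller can keep a copy
and later resume from it, which is the kind of rewind a one-shot generator
cannot offer. A further layer (line splicing, for instance) could be written
the same way, taking phase1_iterator as its underlying iterator.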

Regards Hartmut
