From: Daryle Walker (darylew_at_[hidden])
Date: 2005-08-19 08:02:13


Wave is our C++ preprocessor, but preprocessing is the third phase of
translating a file (see section 2.1 of the standard). I have a gut
feeling that all the compilers out there mush the first three phases
together when parsing a file. Glancing over the Wave docs gives me the same
impression about it. Is either of these feelings accurate (this
requires a separate answer for each parser)? If the answer for Wave is
"yes", could we separate them, at least as an option? I feel that this is
important so we can gain a full understanding of each phase. It may be more
complicated[1], and most likely slower, but it could represent a clean
implementation. (BTW, which phases does Wave implement?)

The first two[2] phases are:

1. Native characters that match basic source characters are converted to
them (including line breaks). Trigraphs are expanded to basic source
characters[3]. Other characters are turned into internal Unicode expansions
(i.e. they act like "\uXXXX" or "\UXXXXXXXX"[4]).
2. Backslash-newline soft line-break combinations are collapsed, folding
multiple native lines into one logical line. We should spit out an error if
the folding creates Unicode escapes. For non-empty files, we need to spit
out an error if the file does not end with a hard line-break: ending with
either a non-newline character or a backslash-newline combination is
forbidden.
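
To make the separation concrete, here is a minimal sketch of the two phases
as independent functions. The names phase1_trigraphs and phase2_splice are
hypothetical (they are not part of Wave), the code works on whole strings
rather than iterators, and it leaves out two things described above: mapping
non-basic characters to Unicode expansions, and the error for folds that
create Unicode escapes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Phase 1 (partial): replace each of the nine trigraph sequences with its
// single-character equivalent. Question marks that are not part of a valid
// trigraph are copied through unchanged.
std::string phase1_trigraphs(const std::string& in) {
    static const std::string from = "=/'()!<>-";     // third trigraph char
    static const std::string to   = "#\\^[]|{}~";    // its replacement
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '?' && i + 2 < in.size() && in[i + 1] == '?') {
            const std::string::size_type k = from.find(in[i + 2]);
            if (k != std::string::npos) {
                out += to[k];
                i += 2;  // consume the whole three-character sequence
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Phase 2: delete every backslash that is immediately followed by a
// newline. A non-empty input must end in a newline that is not itself
// preceded by a backslash; otherwise an exception is thrown.
std::string phase2_splice(const std::string& in) {
    if (in.empty()) return in;
    if (in[in.size() - 1] != '\n')
        throw std::runtime_error("source does not end in a newline");
    if (in.size() >= 2 && in[in.size() - 2] == '\\')
        throw std::runtime_error("source ends in backslash-newline");
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == '\n') {
            ++i;  // skip both characters of the splice
            continue;
        }
        out += in[i];
    }
    return out;
}
```

Running phase 2 on the output of phase 1 matters: a trigraph that expands to
a backslash right before a newline only becomes a line splice once phase 1
has run.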

[1] Our "Wave-1" would convert the original text (iterators) into phase-1
tokens. Our "Wave-2" would convert phase-1 tokens (iterators) into phase-2
tokens, etc. Remember that any file-name and line/column positions will
have to be passed through each phase.
[2] I thought Wave just did phase-3, with phases 1 and 2 thrown in at the
same time. But now I'm not sure which phase Wave stops at. I don't think
it can go past phase-4, because doing phase-5 needs knowledge of the
destination platform.
[3] Only '?' characters that are part of a valid trigraph sequence are
converted; all others are left unchanged.
[4] But actual "\uXXXX" resolution doesn't happen until phase 5!
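
As a rough illustration of the layering in [1], the sketch below shows one
way positions could ride along through the phases. PosChar, annotate and
splice are made-up names for this example, not Wave types, and vectors stand
in for the iterator adaptors a real layered design would probably use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position-carrying character (file name omitted for brevity).
struct PosChar {
    char ch;
    std::size_t line;    // 1-based line in the original file
    std::size_t column;  // 1-based column in the original file
};

// Attach original line/column positions to the raw text.
std::vector<PosChar> annotate(const std::string& text) {
    std::vector<PosChar> out;
    std::size_t line = 1, column = 1;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const PosChar pc = { text[i], line, column };
        out.push_back(pc);
        if (text[i] == '\n') { ++line; column = 1; } else { ++column; }
    }
    return out;
}

// A "Wave-2"-style layer: removes backslash-newline pairs but copies the
// surviving elements untouched, so later layers still see the original
// coordinates.
std::vector<PosChar> splice(const std::vector<PosChar>& in) {
    std::vector<PosChar> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i].ch == '\\' && i + 1 < in.size() && in[i + 1].ch == '\n') {
            ++i;  // drop both characters of the splice
            continue;
        }
        out.push_back(in[i]);
    }
    return out;
}
```

After splicing, a character from the second native line still reports line 2
even though it now sits on the first logical line, which is the information a
later phase would need for diagnostics.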

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
