
From: Tobias Schwinger (tschwinger_at_[hidden])
Date: 2006-02-14 20:13:31


Paul Mensonides wrote:
>
> I'm trying very hard not to be harsh, but what you suggest has far-reaching
> implications that you don't understand.

There should be some sort of disclaimer in the documentation of Boost.PP mentioning that the design is, most unfortunately, influenced by the poor quality of some widely used preprocessors.
Otherwise readers will draw the false conclusion that pp-metaprogramming is either ugly by definition or that there is something wrong with the design of the library...

Here are my comments on the text. Note that I use the imperative for the sake of simplicity.

> The process of macro expansion is best viewed as the process of scanning
> for macro expansion (rather than the process of a single macro expansion
> alone). When the preprocessor encounters a sequence of preprocessing tokens
> and whitespace separations that needs to be scanned for macro expansion, it
> has to perform a number of steps. These steps are examined in this
> document in detail.

Strike that paragraph. It uses terms not yet defined and doesn't say much more than the title (assuming it's still "how macro expansion works").

>
> Conventions
>
> Several conventions are used in this document for the sake of simplicity.
> First, except in contexts where the distinction matters, this document
> uses the terms "token" and "preprocessing token" interchangeably. Where
> the distinction does matter, this document will explicitly differentiate
> between them by using the terms "regular token" and "preprocessing token."
> From a technical standpoint, the primary difference between the two token
> types relates to permissiveness. Preprocessing tokens are much more
> permissive than regular tokens. Ignoring classifications (e.g. keyword,

How can tokens be permissive? Their definition is probably more permissive (if you can say that).

> identifier), all regular tokens are trivially converted to preprocessing
> tokens. Not all preprocessing tokens can be converted to regular tokens. For
> example, 0.0.0.0 is a single, valid preprocessing token, but it cannot be
> converted to a regular token. Thus, preprocessing tokens can be thought of
> as a superset of regular tokens.
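
A tiny illustration of that last point might help; this is my own sketch,
not from the text (STR is a made-up name):

    #define STR(x) #x

    STR(0.0.0.0)  /* the pp-token 0.0.0.0 is only ever stringized here,
                     so it never has to become a regular token */
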
>
> Second, when this document uses the phrase "sequence of tokens" (i.e. "sequence of preprocessing tokens"), the possibility of whitespace separations is implied. In other words, "sequence of tokens" really means
> "sequence of tokens and whitespace separations." Both the C and C++
> standards do this as well, though they don't explicitly say they are doing
> it. Rather, it is implied by the phases of translation and by certain parts of the text such as §16.3.2/2 of the C++ standard and §6.10.3.2/2 of the C standard.
>
> Third, this document uses the term "scanning" to mean "scanning for macro
> expansion." There are other kinds of scanning (in the general sense) that
> are performed during the macro expansion process. For example, a
> replacement list is scanned for instances of formal parameters to
> substitute. Other kinds of scanning besides scanning for macro expansion are
> referred to explicitly.
>
> Lastly, this document uses a graphical notation to illustrate various parts of the process in examples. For clarity, all whitespace separations
> are ignored in these diagrams. Given a sequence of tokens, the current
> location of the scanner is indicated with a circumflex (^). For example,
>
> + + +
> ^
>
> A range of tokens is indicated by using a frame such as:
>
> + + +
> |^ |
> |_____|
> |
> LABEL
>
> These frames are labeled with the meaning of the frame or the sequence of
> tokens bounded by the frame. An identifier token that has been painted is

The reader probably has no idea what "painted" means at this point. Indicate the forward reference with "see below" or something like that.

> marked with an apostrophe (').
>
> A' token
> ^
>
> This document will discuss what it means for an identifier token to be
> "painted." (This document also avoids character literals ('a'), so the
> apostrophe notation is unambiguous.) It is important to note that the apostrophe is not literally
> added to the sequence of tokens. Rather, it is only a notation that
> indicates a property of the identifier to which it is "attached."
> _________________________________________________________________
>
> Locations
>
> There are several points where the preprocessor must scan a sequence of
> tokens looking for macro invocations to expand. The most obvious of these is
> between preprocessing directives (subject to conditional compilation). For
> example,

I had to read this sentence multiple times for it to make sense to me...

VVVVV

>
> #include "file.h"
>
> // ...
>
> #define A()
>
> // ...
>
> #undef A
>
> Both of the sections containing comments must be scanned for macro expansion.
>
> The preprocessor must also scan a few other sequences of tokens. These
> include the expression that controls an #if or #elif directive. Subject to
> certain conditions, the operands of the #include directive and the #line
> directive are also scanned for macro expansion.
>
> The places where a sequence of tokens is scanned for macro expansion in a
> source file are not really the subject of this document. As such, the
> remainder of this document concentrates on just sequences of tokens that
> must be scanned, though sometimes the #define directive is used.

^^^^^ simplify this section by using negative logic (in other words: enumerate the contexts where /no/ macro expansion is done).
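
For what it's worth, a short snippet showing the directive contexts that
/are/ scanned might make either version concrete (my own sketch; VERSION,
HEADER and "config.h" are made-up names):

    #define VERSION 3
    #define HEADER "config.h"

    #if VERSION >= 2   /* the controlling expression is scanned, so VERSION expands */
    #include HEADER    /* the operand of #include is scanned as well */
    #endif
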

>
> Before continuing, however, note that an argument to a macro that contains
> what could be a preprocessing directive results in undefined behavior. For
> example,
>
> #define MACRO(x) x
>
> MACRO(
> #include "file.h"
> )

Indicate more clearly that this code is not OK.

>
> This is undefined behavior because it gives a preprocessor implementation
> latitude. Specifically, it allows for a preprocessor to parse an entire file
> into directives and sequences of tokens between directives prior to semantic evaluation. It also allows for a preprocessor to semantically
> evaluate the result of preprocessing in a single pass.

Put the implementation details in parentheses or use a footnote.

> _________________________________________________________________
>
> Scanning
>
> Consider the following macro definitions and the subsequent sequence of
> tokens to be scanned:
>
> #define OBJECT OBJ ## ECT F()
> #define F() G(G(G))
> #define G(x) x
>
> + X OBJECT +
>
> These definitions and the subsequent sequence of tokens will be used as a
> running example in describing the steps of the expansion process.
>
> Given a sequence of tokens to be scanned, the preprocessor successively
> looks at each token looking for identifiers. If a given token is not an
> identifier, the preprocessor outputs the token and moves on to the next.
>
> + X OBJECT +
> ^
>
> + X OBJECT +
> ^
>

VVVVV
> The term "output" is being used generally. Sometimes it means that the
> preprocessor is finished with the token =nd that the token can be handed
> off to the underlying language parser =r output to a text stream (in the
> case of a compiler's preprocess-only =ode). Other times, it means that it
> is output to a buffer that is later =eused by the preprocessor for another
> purpose. In all cases, however, it does mean that this scan for macro expansion is finished with the token and that the token results from = this
> scan.
^^^^^ shouldn't this be part of the "Conventions" section?

>
> If the token is an identifier, the preprocessor must check to see if the
> identifier corresponds to the name of a macro that is currently defined.
> If it is not, the preprocessor outputs the token and moves on to the next.
> In the running example, the current token is the identifier X, which does
> not correspond to a macro name.
>
> + X OBJECT +
> ^
>
> + X OBJECT +
> ^
> _________________________________________________________________

>
> "Blue Paint"
>
> If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it
> outputs the token and moves on to the next.
>
> When an identifier token is painted, it means that the preprocessor will
> not attempt to expand it as a macro (which is why it outputs it and moves
> on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not correspond to a macro. This
> disabled flag is commonly referred to as "blue paint," and if the disabled
> flag is set on a particular token, that token is called "painted." (The
> means by which an identifier token can become painted is described below.)

Remove redundancy in the two paragraphs above.

I like the "behaves like an identifier that does not correspond to a macro name"-part.

>
> In the running example, the current token is the identifier OBJECT, which
> does correspond to a macro name. It is not painted, however, so the
> preprocessor moves on to the next step.
> _________________________________________________________________
>
> Disabling Contexts
>
> If the current token is an identifier token that corresponds to a macro
> name, and the token is not painted, the preprocessor must check to see if
> a disabling context that corresponds to the macro referred to by the
> identifier is active. If a corresponding disabling context is active, the
> preprocessor paints the identifier token, outputs it, and moves on to the
> next token.
>
> A "disabling context" corresponds to a specific macro and exists over =
> range of tokens during a single scan. If an identifier that refers to a
> macro is found inside a disabling =ontext that corresponds to the same
> macro, it is painted.
>
> Disabling contexts apply to macros themselves over a given geographic sequence of tokens, while blue paint applies to particular identifier tokens. The former causes the latter, and the latter is what prevents "recursion" in macro expansion. (The means by which a disabling context
> comes into existence is discussed below.)
>
> In the running example, the current token is still the identifier OBJECT.
> It is not painted, and there is no active disabling context that would
> cause it to be painted. Therefore, the preprocessor moves on to the next
> step.
>

The introduction of these terms feels structurally too abrupt to me. Introduce them along the way, continuing with the example.
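
A small mutual-recursion example could introduce both terms in one go
(my own sketch, not from the text):

    #define P() Q()
    #define Q() P()

    P()  /* P() -> Q() -> P'() : the final P is found inside P's disabling
            context, so it is painted; the textual result is just  P ( ) ,
            and the painted P will not expand even though it now looks
            like an invocation */
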

> _________________________________________________________________
>
> Object-like Macro Invocations
>
> If an identifier token that corresponds to a macro name is not painted,
> and there is no active disabling context that would cause it to be
> painted, the preprocessor must check to see what type of macro is being
> referenced--object-like or function-like. If the macro is defined as an
> object-like macro, which is the case with OBJECT in the running example, the
> identifier alone forms an invocation of the macro, and the preprocessor
> begins the replacement process of that invocation.
>
> + X OBJECT +
> |^ |
> |______|
> |
> OBJECT invocation (INV)
>
> For an object-like macro, the first step in the replacement process is the
> retrieval of the replacement list of the macro from the symbol table. In
> the running example, the OBJECT macro's replacement list is
>
> OBJ ## ECT F()
>
> The preprocessor then performs any token-pasting operations in the replacement list. (Note that there is no stringizing operator in object-like
> macros.) The result of token-pasting in OBJECT's replacement list is
>
> OBJECT F()
>
> After all token-pasting has been performed, the resulting sequence of tokens is used to replace the invocation of the macro. At the same time, a
> disabling context that corresponds to the macro being replaced is
> activated. This disabling context surrounds the tokens that came from the replacement list.
>
> + X OBJECT F() +
> | |
> |__________|
> |
> OBJECT disabling context (DC)

<-- explain what a disabling context is, and then what blue paint is, here

>
> Finally, scanning resumes beginning with the first token that came from
> the replacement list (or the next token after the invocation if the replacement list was empty). In the running example, scanning resumes at
> OBJECT.
>
> + X OBJECT F() +
> |^ |
> |__________|
> |
> OBJECT DC
>
> Note that this OBJECT identifier is a different identifier token than the
> one that formed an invocation of the OBJECT macro. It came from the
> replacement list of the OBJECT macro (indirectly by way of token-pasting).

<snip>

>
> Nullary Invocations
>
> If scanning finds an identifier (followed by a left-parenthesis) that
> refers to a nullary function-like macro (as is the case with F in the
> running example), the preprocessor must find the corresponding
> right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is
> an "incomplete invocation" error.
>
> While it does so, it must ensure that there are no tokens between the left-parenthesis and the right-parenthesis. Specifically, between the two
> parentheses, there must be either one or more whitespace separations or
> nothing at all. If any tokens are present between them, the result is a "too
> many arguments" error.
Try to remove some detail.
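
A couple of one-liners could perhaps replace some of the prose (my sketch;
N is a made-up name):

    #define N() ok

    N()    /* expands to: ok                             */
    N(  )  /* also fine: only whitespace between ( and ) */
    N(+)   /* error: "too many arguments"                */
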

>
> The identifier and the left- and right-parentheses form an invocation of
> the macro, and the preprocessor begins the replacement process of that
> invocation.
>
> + X OBJECT' F() +
> | |^ ||
> | |___||
> | | |
> | F INV
> |____________|
> |
> OBJECT DC
>
> As with an object-like macro, the first step in the replacement process is
> the retrieval of the replacement list of the macro from the symbol table.
> In the running example, the F macro's replacement list is
>
> G(G(G))
>
> The preprocessor then performs any token-pasting operations in the replacement list. (Note that a nullary macro is a function-like macro, so
> the stringizing operator exists. However, the stringizing operator must be
> applied to an instance of a formal parameter in the replacement list. A

<-- add missing "nullary"

> function-like macro has no formal parameters, and therefore any use of the
> stringizing operator is automatically an error.) The result of token-pasting

It's not clear to me why the stringizing operator leads to an error rather than a literal '#' character. Probably too much of a side note, anyway.

> in F's replacement list is
>
> G(G(G))
>
> In this case, there are no token-pasting operations to perform, so the
> result is a no-op.
>
> The sequence of tokens resulting from token-pasting is used to replace the
> invocation of the macro. At the same time, a disabling context that
> corresponds to the macro being replaced is activated. This disabling
> context surrounds the tokens that came from the replacement list.
>
> + X OBJECT' G(G(G)) +
> | | ||
> | |_______||
> | | |
> | F DC |
> |________________|
> |
> OBJECT DC
>
> Finally, scanning resumes beginning with the first token that came from
> the replacement list (or the next token after the invocation if the replacement list was empty).
>
> + X OBJECT' G(G(G)) +
> | |^ ||
> | |_______||
> | | |
> | F DC |
> |________________|
> |
> OBJECT DC
>
> Nullary function-like macro invocations are nearly identical to object-like macro invocations. The primary difference is that an invocation
> of a function-like macro requires multiple tokens.
> _________________________________________________________________
>
> Interleaved Invocations
>
> It is important to note that disabling contexts only exist during a single
> scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan
> results only in tokens and whitespace separations. Some of those tokens
> might be painted (and they remain painted), but disabling contexts are not
> part of the result of scanning. (If they were, there would be no need for
> blue paint.)

This misses (at least) a reference to §16.3.4/1 (the wording there, "with the remaining tokens of the source" or so, is quite nice, so consider using something similar).

I believe I wouldn't really understand what you are talking about here without knowing that part of the standard. "A single scan" -- the concept of rescanning was introduced too peripherally to make much sense to someone unfamiliar with the topic.

>
> In the diagrams used in this document, the tokens that have been output by
> a scan are left in the diagram to provide context for the reader, such as:
>

<snip>

>
> Note that interleaved invocations do not allow for infinite expansion.
> More tokens must physically be present after the replacement list to complete an interleaved invocation, and this sequence of tokens is ultimately limited to the finite sequence of tokens contained in the source file.
> _________________________________________________________________
>

The above part seems very good to me.
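
Maybe prepend a minimal, self-contained example of an interleaved invocation
before the big one (my own sketch, not from the text):

    #define FWD() USE
    #define USE(x) x

    FWD()(123)  /* FWD() yields USE; together with the (123) that follows
                   in the source it forms an interleaved invocation of USE,
                   so the result is: 123 */
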

> Non-nullary Invocations
>
> If scanning finds an identifier (followed by a left-parenthesis) that refers
> to a non-nullary function-like macro (as is the case with G in the
> running example), the preprocessor must find the corresponding
> right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is
> an "incomplete invocation" error.
>
> While it does so, it must delineate the actual arguments between the left-
> and right-parentheses. This delineation process is a distinct step that
> separates the argument list into individual arguments before any other
> processing of the arguments is done. Arguments are separated by comma
> tokens (,), but commas between matched pairs of left- and right-parentheses
> do not separate arguments, and the right-parenthesis of such matched pairs
> is not used as the right-parenthesis that terminates the argument list.
> For example, in
>
> MACRO(a, (b, c), d)
>
> the argument list to MACRO is delineated into three arguments.
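
It might be worth spelling those three arguments out, e.g. (my own sketch;
this MACRO definition is made up):

    #define MACRO(x, y, z) x | y | z

    MACRO(a, (b, c), d)  /* arguments:  a   (b, c)   d
                            result:     a | (b, c) | d */
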
>
> After the arguments have been delineated, the preprocessor compares the
> number of actual arguments to the arity of the macro. If there are more or
> fewer, then the result is a "too many arguments" or "too few arguments"
> error. (If the macro is variadic, the number of arguments must be at least
> one greater than the number of named formal parameters. This implies that a
> macro defined as:
>
> #define V(x, y, ...) // ...

Mention that variadic macros are not a C++ feature.

>
> has a minimum arity of three--not two.)
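
A concrete pair of invocations could illustrate that (my sketch; strictly
per the C99 rules):

    #define V(x, y, ...) __VA_ARGS__

    V(1, 2, 3)  /* OK: __VA_ARGS__ is 3                                     */
    V(1, 2)     /* error in C99: the ... must receive at least one argument */
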
>
> In C++, if any argument is empty or contains only whitespace separations,
> the behavior is undefined. In C, an empty argument is allowed, but gets
> special treatment. (That special treatment is described below.)

It requires at least C99, right? If so, say so (it's likely there are C compilers that don't support that version of the language).
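
An example of that special treatment might help as well (this is my
understanding of the C99 placemarker rules, so please double-check):

    #define PASTE(a, b) a ## b

    PASTE(x, )  /* C99: the empty argument becomes a placemarker, so the
                   result is just  x  (undefined behavior in C++98/03) */
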

<snip>

>
> + X OBJECT' G(G(G)) +
> | ||^ |||
> | ||_______|||
> | | | ||
> | | G INV |
> | |_________||
> | | |
> | F DC |
> |__________________|
> |
> OBJECT DC
>
> // recursive scan of argument #1 to G:
>
> G(G)
> |^ |
> |____|
> |
> G INV
> _ _ ______ _ _
> |
> F DC
> _ _ ______ _ _
> |
> OBJECT DC
>
> // recursive scan of argument #1 to G:
>
> G
> ^
> _ _ _ _ _
> |
> F DC
> _ _ _ _ _
> |
> OBJECT DC
>
> G
> ^
> _ _ _ _ _
> |
> F DC
> _ _ _ _ _
> |
> OBJECT DC
>
> // recursive scan results in: G
>
> G
> |^|
> |_|
> |
> G DC
> _ _ ___ _ _
> |
> F DC
> _ _ ___ _ _
> |
> OBJECT DC
>
> G'
> ^
> _ _ ___ _ _
> |
> F DC
> _ _ ___ _ _
> |
> OBJECT DC
>
> // recursive scan results in: G'
>
> + X OBJECT' G' +
> | ||^ |||
> | ||__|||
> | | | ||
> | | G DC
> | |____||
> | | |
> | F DC
> |_____________|
> |
> OBJECT DC
>
> + X OBJECT' G' +
> ^
>
> + X OBJECT' G' +
> ^
>
> Thus, the running example finally results in
>
> + X OBJECT' G' +
>
> Note that the disabling context that is created when a macro invocation is
> replaced is not created until after all arguments that need to be scanned
> for macro expansion have been. This is why both G's expand in the example.

You should repeat the "painting rules" here. You already said that paint applies to single tokens, but it's a good idea to point out more explicitly to the reader that no parentheses are needed for G to end up painted.
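
It might also help to show what the whole thing looks like in practice,
since the paint is invisible in the textual output:

    #define OBJECT OBJ ## ECT F()
    #define F() G(G(G))
    #define G(x) x

    + X OBJECT +  /* preprocesses to:  + X OBJECT G +  */
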

> _________________________________________________________________
>
> Arguments to Interleaved Invocations
>
> Because arguments not used as operands of token-pasting or stringizing are
> scanned separately but within any active disabling contexts, the
> possibility exists for an argument (or part of an argument) to be scanned
> in inside a disabling context that no longer exists after the macro

An "in" too much?

> invocation has been replaced. Consider,
>
> #define A(m) m( B(f)
> #define B(x) A(x)
>
> #define C(x) [ x ]
>
> A(C) ) *
>
> The following diagrams the expansion of this sequence of tokens:
>
> A(C) ) *
> |^ |
> |____|
> |
> A INV
>
> // recursive scan of argument #1 to A results in:
>
> C( B(f) ) *
> ||^ | |
> ||_______|_|
> | | |
> | C INV
> |________|
> |
> A DC
>
> // recursive scan of argument #1 to C:
>
> B(f)
> |^ ||
> |____||
> | |
> B INV
> _ _ ______|
> |
> A DC
>
> // recursive scan of argument #1 to B results in: f
>
> A(f)
> |^ ||
> |____||
> | |
> B DC
> _ _ ______|
> |
> A DC
>
> A'(f)
> | ^ ||
> |_____||
> | |
> B DC
> _ _ _______|
> |
> A DC
>
> // recursive scan results in: A'(f)
>
> [ A'(f) ] *
> |^ |
> |_________|
> |
> C DC
>
> The result of scanning for expansion is
>
> [ A'(f) ] *
>
> Note that the argument to C was scanned inside the disabling context
> corresponding to A, but that disabling context is no longer active when
> scanning resumes after the replacement of C's invocation. (Scenarios such
> as this one are tricky, and preprocessors rarely handle this kind of thing
> correctly.)

Point out explicitly that A remains painted.
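
For instance, the section could close with something like (taken from the
derivation above):

    #define A(m) m( B(f)
    #define B(x) A(x)
    #define C(x) [ x ]

    A(C) ) *  /* output:  [ A(f) ] *  -- the A in the output is painted and
                 stays painted, so from here on it behaves like an ordinary
                 identifier */
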

> _________________________________________________________________
>
> Virtual Tokens
>
> [...]

Interesting! Is it implemented somewhere?

BTW, I'm done with my comments so far.

I believe there is more that can be done to enhance the structure. Anyway, all in all it reads quite nicely.

I hope it's of some use.

>
> © Copyright Paul Mensonides 2003-2006

Regards,

Tobias

