
From: Paul Mensonides (pmenso57_at_[hidden])
Date: 2006-02-14 07:58:20


> -----Original Message-----
> From: boost-bounces_at_[hidden]
> [mailto:boost-bounces_at_[hidden]] On Behalf Of Oliver Kullmann

> First of all I think here the documentation of the library
> doesn't reach its full potential yet: I tried first to
> read through "Topics", but found all examples contrived
> and complicated (couldn't find a single plain example),

What is a "plain example"?

> and thus I skipped that and went to Reference, where
> I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious
> remarks about "The next available BOOST_PP_FOR repetition."
> etc. But at various places it somehow says that "reentry problems"
> whatever this means are solved now.

There is an article in the topics section about this (titled "Reentrancy").

> Only when looking it up
> in the rationale of C99 I found a clear statement about the issue
> (while the C++ standard somehow supposes you know already the
> issue).

It is pretty clear in both standards (6.10.3.4/2 in C and 16.3.4/2 in C++).

> So a clear statement about the limitations of the looping
> constructions
> *at each place* would help I would guess quite a few people trying to
> use this library.

It is unnecessary clutter. It is not the documentation's responsibility to
constantly tell you the way that the language works. Unless the documentation
says otherwise, it should be assumed that a macro cannot expand inside itself.
That is simply the way that the language works. The library makes it *appear*
that a few macros are recursive, but then it explicitly says that the macro can
be used as if it were recursive.

> (I believe documentation must be written so that
> it can be entered in many ways, and not assuming you are reading it
> from the start to the end --- redundancy is needed.)

I disagree on all counts. It is impractical to make every page of the
documentation a complete discourse on both how the preprocessor works and how
preprocessor metaprogramming works. Beyond that, redundancy in both code and
documentation is a maintenance nightmare.

> Second of all it seems the current facilities in the library should
> be enhanced (though I don't understand how FOR and FOR_r is supposed
> to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing,
> the use of it should be promoted, while FOR and its derivates look
> ugly.

Hmm. FOR is far more general. SEQ_FOR_EACH can be implemented with FOR, but
not vice versa.

> But it seems that the library favours FOR, and seems to envisage
> a main loop using FOR, and only at the second level using FOR_EACH.

The library doesn't favor it; FOR is simply more general purpose. In order to
allow the kind of reentrance that FOR allows, hundreds of macros are required
for *each* higher-order algorithm. SEQ_FOR_EACH is only one of many.

> My solution was to convert the second sequence B into a list, and
> use BOOST_PP_LIST_FOR_EACH for the inner loop. But it seems
> to me that this works is
> pure chance, and with any change in the implementation of
> BOOST_PP_SEQ_FOR_EACH
> or BOOST_PP_LIST_FOR_EACH (introducing a common macro) this solution
> will break.

It will only break if SEQ_FOR_EACH or LIST_FOR_EACH arbitrarily decides to use
the other--which won't happen. I design the algorithms in the library, and I'm
well aware of vertical dependencies and their implications.

> According to what I understand from the documentation, the
> natural solution,
> offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses
> macros guaranteed
> not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in
> favour of
> a more complicated solution.

You don't understand the implications of what you're saying. In order to do
that, I'd need to define hundreds of macros to implement SEQ_FOR_EACH alone. It
isn't as simple as just replicating the interface macro a few times. Instead,
I'd have to reimplement the entire algorithm from scratch--intentionally
avoiding the use of any other higher-order algorithm.

> But it seems this more
> complicated solution
> works only for one kind of loop, and not for others, and the
> documentation

I'm not sure what you're referring to by "more complicated solution". If you
specify what kind of thing you're referring to, I'll expound.

> seems so arcane that I don't have a clue how it could work
> (the typical
> phenomenon, that the documentation wants to show off by
> presenting only
> complicated examples, which show all sorts of templates,
> etc., and this
> without saying what actually should by achieved --- good code
> solves one
> problem at a time, and good documentation discusses one
> problem at a time).

The documentation has many examples, but you call them either complicated or
contrived. What exactly do you want? This kind of stuff doesn't boil down to
_simple_ use cases. Rather, typical use cases are parts of overarching designs
that are complex--which is precisely the reason that preprocessor
metaprogramming enters the picture at all. Furthermore, I have a responsibility
(as does any library author) not to promote naive or poor uses of the
library--which is almost always the case with examples that implement something
simple. So, I can do two things--I can provide simple contrived examples that
demonstrate what various things do, and I can provide complex examples that
implement real world scenarios. The documentation does both already.

> So if the library offers a natural way of nesting
> BOOST_PP_SEQ_FOR_EACH,
> then documenting this (directly, without solving other
> problems) would be good,
> or otherwise adding BOOST_PP_SEQ_FOR_EACH_2 would help some
> users (actually,
> just the presence of this second form would announce the
> underlying problem).

The library 1) shouldn't have to announce this problem and 2) already does
announce this problem--it just doesn't do it hundreds of times in hundreds of
different places. If you are writing code in the language defined by the
preprocessor, you should *at least* know the basics of that language. The
library is not the language, it is merely a systematic set of reusable constructs
written in that language. The documentation's only responsibility is to
document itself.

> Sometimes less general but simpler solutions are preferable over more
> general but complicated solutions.

Again, I'm not sure what you mean by "more complicated solutions".

-----

I'm trying very hard not to be harsh, but what you suggest has far-reaching
implications that you don't understand. Regarding examples in the
documentation, I don't think there is a way to please you--see posts by me in
this thread:

http://thread.gmane.org/gmane.comp.lib.boost.user/16132

-----

Regarding the technical implications:

(Please make the effort to follow this.)

As it stands now, the library emulates recursion by replicating macros. For
example, a WHILE loop can be implemented such as:

#define WHILE_1(pred, op, state) \
    IF(pred(2, state), WHILE_2, state TUPLE_EAT(3))(pred, op, op(2, state)) \
    /**/
#define WHILE_2(pred, op, state) \
    IF(pred(3, state), WHILE_3, state TUPLE_EAT(3))(pred, op, op(3, state)) \
    /**/
// etc. (i.e. hundreds...)

A predicate is passed that determines when the algorithm should stop, and an
operation is used to iterate the state on each step of the algorithm.

There are a couple of things here that must be understood. First, each WHILE_xyz
represents both an entry point into the algorithm (i.e. an interface to the
algorithm) *and* a single algorithmic step taken by the algorithm. So, WHILE_2
does not mean "separate nested WHILE loop" or "second looping dimension".
Instead, it means "the step after WHILE_1".

Now, when the algorithm (meaning the entire collection of macros) invokes the
predicate and the operation, it passes along the number of the next available
WHILE_xyz macro. This value is called the "state" of the algorithm. (Note that
this is not the 'state' parameter. That is the (arbitrary) state that a
particular *use* of the algorithm is iterating.) The predicate and the
operation can use this algorithm state to reenter the algorithm before the
algorithm itself actually continues into that macro. As the algorithm
continues, the algorithm state values passed to the predicate and the operation
change (i.e. they increase). This is important. They do not remain fixed,
which means that the predicate, for example, cannot just arbitrarily use
WHILE_2. It has to use the WHILE_xyz indicated by the algorithm state value
passed to it.

Obviously it is possible to provide several totally distinct implementations of
the entire loop, but that implies several hundred macros per loop, instead of
sharing a single set of several hundred macros for all loop uses. Say you did
this anyway, and you get WHILE_A, WHILE_B, WHILE_C (and so on, each with their
own WHILE_A_1, WHILE_A_2, etc.). This doesn't really buy you anything, because
you'll still ultimately end up passing A, B, C (etc.) as a reentry point.

E.g. suppose you write some macro that uses WHILE_A, and then you change another
macro so that it uses this macro, but this other macro was already using
WHILE_A. Fine, you just change this other macro to use WHILE_B. But then,
there is another macro that uses this second macro, and this third macro is
already using WHILE_B, so you have to change that to use WHILE_C. This is a
significant bubbling of implementation detail, and it is *way* worse when all
three macros are written by different people. Ultimately, you arrive at the
conclusion that it doesn't matter *which* WHILE the first macro uses, so you
parameterize the first macro with the WHILE to be used. Similarly with the second
(and when the second calls the first, it uses the next WHILE from the one it is
currently using). Similarly again with the third. So, instead of having the
most nested use of WHILE be WHILE_A, you have the outermost use be WHILE_A, and
each inner use adjusts, so that the first nested use uses WHILE_B, the next
nested use uses WHILE_C, and so on. This is the only way to do it that scales,
and this is ultimately no different than just having one WHILE with one set of
implementation macros.

This is exactly (minus a lot of workarounds) the way that the library emulates
recursion (particularly: in FOR). The problem comes when you write a
higher-order algorithm on top of another higher-order algorithm. For example,
SEQ_FOR_EACH on top of FOR. (Higher-order algorithms are fundamentally
different because the implementation of the algorithm cannot completely control
what macros are called inside of them.) It doesn't take hundreds of macros to
implement SEQ_FOR_EACH because it can reuse the more general purpose algorithmic
steps provided by FOR. In order to make SEQ_FOR_EACH reentrant, you need, at
minimum, a distinct interface macro (i.e. SEQ_FOR_EACH_1, SEQ_FOR_EACH_2, etc.)
and a distinct macro to be repeated by FOR for each possible entry point into
SEQ_FOR_EACH.

That isn't too bad in terms of number of macros, but you introduce yet another
algorithm state by doing so. If each higher-order algorithm did so, the user
would be inundated by the states of various algorithms. E.g. just to invoke
SEQ_FOR_EACH, you'd need to supply both the state of SEQ_FOR_EACH and the state
of FOR. Worse, if another algorithm (with multiple entry points) uses
SEQ_FOR_EACH, you'd need the state of that algorithm, the state of SEQ_FOR_EACH,
and the state of FOR to use it. You resolve this in one of two ways: 1) you
simply don't provide reentrance in higher-order algorithms that are implemented
using other higher-order algorithms, or 2) you reimplement the algorithm from
scratch to avoid using other higher-order algorithms. The former is what the
library does, the latter requires hundreds of macros to implement each
higher-order algorithm.

-----

All of this is the status quo of the library. There are two other models that
are superior in terms of ease of use and avoidance of these types of issues.

The first of these, I use in a different (far more powerful) preprocessor
library. The problem with this model is that it isn't portable to broken
preprocessors (which are heavily used by users of Boost--e.g. VC++).

The second alternative model revolves around automatically deducing all
algorithm states whenever they are needed. IOW, completely hiding them from
users altogether. The problem with this model is that it isn't portable because
of efficiency concerns. Some preprocessors would handle it without any
significant problem. Others would slow to a crawl--e.g. EDG-based
preprocessors.

Of these two, the first is much more extensible. It allows for the definition
of new algorithms without *any* macro replication. The second would still be
better (as far as ease-of-use is concerned) than what the library provides now,
but there is no way that I could get away with it. The second can be improved
beyond what I've described here (e.g. by separating the notions of "entry-point"
and "algorithmic step"--200+ loop iterations may be necessary, 200+ nested loops
is massive overkill), but it all boils down to the same thing--always deducing
the state when needed.

Unfortunately, I cannot do either of these in Boost because Boost cannot simply
drop support for a compiler just because it has a broken or slow preprocessor
(of all things).

> P.S. The situation with the preprocessor is complicated by
> the general silence
> on this issue in the books. And quite amazing, how the
> library achieves
> to squeeze out nearly a general purpose machine (but not with
> an infinite
> tape :-)) from the preprocessor machinery, which seems to
> have been castrated
> by the designers on purpose.

Incidentally (and this isn't exactly relevant to this conversation), the model
used by my other library is capable of using a set of macros exponentially, such
that (e.g.) the "end of the tape" is trillions of iterations away--effectively
removing it as a limitation. Given unlimited system resources (like memory),
any algorithm that needed that many steps would literally take centuries to
execute even on the fastest preprocessor. IOW, it removes the limitation for
all intents and purposes.

Also, attached is a draft of a document that I'm writing (for both libraries)
that describes the macro expansion process in detail. You might find it useful.
If you or others read this, feedback is welcome. I'm _not_ interested in
feedback related to formatting or layout. I want to avoid an argument about an
issue that 1) nobody will ever agree on and 2) I don't think is as important as
it is sometimes made out to be. I _am_ interested in feedback relating to
structure and content.

Regards,
Paul Mensonides


   The process of macro expansion is best viewed as the process of scanning
   for macro expansion (rather than the process of a single macro expansion
   alone). When the preprocessor encounters a sequence of preprocessing
   tokens and whitespace separations that needs to be scanned for macro
   expansion, it has to perform a number of steps. These steps are examined
   in this document in detail.
     _________________________________________________________________

    Conventions

   Several conventions are used in this document for the sake of simplicity.
   First, except in contexts where the distinction matters, this document
   uses the terms "token" and "preprocessing token" interchangeably. Where
   the distinction does matter, this document will explicitly differentiate
   between them by using the terms "regular token" and "preprocessing token."
   From a technical standpoint, the primary difference between the two token
   types relates to permissiveness. Preprocessing tokens are much more
   permissive than regular tokens. Ignoring classifications (e.g. keyword,
   identifier), all regular tokens are trivially converted to preprocessing
   tokens. Not all preprocessing tokens can be converted to regular tokens.
   For example, 0.0.0.0 is a single, valid preprocessing token, but it
   cannot be converted to a regular token. Thus, preprocessing tokens can be
   thought of as a superset of regular tokens.

   Second, when this document uses the phrase "sequence of tokens" (i.e.
   "sequence of preprocessing tokens"), the possibility of whitespace
   separations is implied. In other words, "sequence of tokens" really means
   "sequence of tokens and whitespace separations." Both the C and C++
   standards do this as well, though they don't explicitly say they are
   doing it. Rather, it is implied by the phases of translation and by
   certain parts of the text such as §16.3.2/2 of the C++ standard and
   §6.10.3.2/2 of the C standard.

   Third, this document uses the term "scanning" to mean "scanning for macro
   expansion." There are other kinds of scanning (in the general sense) that
   are performed during the macro expansion process. For example, a
   replacement list is scanned for instances of formal parameters to
   substitute. Other kinds of scanning besides scanning for macro expansion
   are referred to explicitly.

   Lastly, this document uses a graphical notation to illustrate various
   parts of the process in examples. For clarity, all whitespace separations
   are ignored in these diagrams. Given a sequence of tokens, the current
   location of the scanner is indicated with a circumflex (^). For example,

+ + +
  ^

   A range of tokens is indicated by using a frame such as:

 + + +
|^ |
|_____|
   |
   LABEL

   These frames are labeled with the meaning of the frame or the sequence of
   tokens bounded by the frame. An identifier token that has been painted is
   marked with an apostrophe (').

A' token
   ^

   This document will discuss what it means for an identifier token to be
   "painted." To avoid ambiguity with this notation, this document does not
   use character literals ('a'). It is important to note that the apostrophe
   is not literally added to the sequence of tokens. Rather, it is only a
   notation that indicates a property of the identifier to which it is
   "attached."
     _________________________________________________________________

    Locations

   There are several points where the preprocessor must scan a sequence of
   tokens looking for macro invocations to expand. The most obvious of these
   is between preprocessing directives (subject to conditional compilation).
   For example,

#include "file.h"

    // ...

#define A()

    // ...
    
#undef A

   Both of the sections containing comments must be scanned for macro expansion.

   The preprocessor must also scan a few other sequences of tokens. These
   include the expression that controls an #if or #elif directive. Subject
   to certain conditions, the operands of the #include directive and the
   #line directive are also scanned for macro expansion.

   The places where a sequence of tokens is scanned for macro expansion in a
   source file are not really the subject of this document. As such, the
   remainder of this document concentrates on just sequences of tokens that
   must be scanned, though sometimes the #define directive is used.

   Before continuing, however, note that an argument to a macro that
   contains what could be a preprocessing directive results in undefined
   behavior. For example,
   example,

#define MACRO(x) x

MACRO(
    #include "file.h"
)

   This is undefined behavior because it gives a preprocessor implementation
   latitude. Specifically, it allows for a preprocessor to parse an entire
   file into directives and sequences of tokens between directives prior to
   semantic evaluation. It also allows for a preprocessor to semantically
   evaluate the result of preprocessing in a single pass.
     _________________________________________________________________

    Scanning

   Consider the following macro definitions and the subsequent sequence of
   tokens to be scanned:

#define OBJECT OBJ ## ECT F()
#define F() G(G(G))
#define G(x) x

+ X OBJECT +

   These definitions and the subsequent sequence of tokens will be used as a
   running example in describing the steps of the expansion process.

   Given a sequence of tokens to be scanned, the preprocessor successively
   looks at each token looking for identifiers. If a given token is not an
   identifier, the preprocessor outputs the token and moves on to the next.

+ X OBJECT +
^

+ X OBJECT +
  ^

   The term "output" is being used generally. Sometimes it means that the
   preprocessor is finished with the token and that the token can be handed
   off to the underlying language parser or output to a text stream (in the
   case of a compiler's preprocess-only mode). Other times, it means that it
   is output to a buffer that is later reused by the preprocessor for
   another purpose. In all cases, however, it does mean that this scan for
   macro expansion is finished with the token and that the token results
   from this scan.

   If the token is an identifier, the preprocessor must check to see if the
   identifier corresponds to the name of a macro that is currently defined.
   If it is not, the preprocessor outputs the token and moves on to the
   next. In the running example, the current token is the identifier X,
   which does not correspond to a macro name.

+ X OBJECT +
  ^

+ X OBJECT +
    ^
     _________________________________________________________________

    "Blue Paint"

   If the current token is an identifier that refers to a macro, the
   preprocessor must check to see if the token is painted. If it is painted,
   it outputs the token and moves on to the next.

   When an identifier token is painted, it means that the preprocessor will
   not attempt to expand it as a macro (which is why it outputs it and moves
   on). In other words, the token itself is flagged as disabled, and it
   behaves like an identifier that does not correspond to a macro. This
   disabled flag is commonly referred to as "blue paint," and if the
   disabled flag is set on a particular token, that token is called
   "painted." (The means by which an identifier token can become painted is
   described below.)

   In the running example, the current token is the identifier OBJECT, which
   does correspond to a macro name. It is not painted, however, so the
   preprocessor moves on to the next step.
     _________________________________________________________________

    Disabling Contexts

   If the current token is an identifier token that corresponds to a macro
   name, and the token is not painted, the preprocessor must check to see if
   a disabling context that corresponds to the macro referred to by the
   identifier is active. If a corresponding disabling context is active, the
   preprocessor paints the identifier token, outputs it, and moves on to the
   next token.

   A "disabling context" corresponds to a specific macro and exists over a
   range of tokens during a single scan. If an identifier that refers to a
   macro is found inside a disabling context that corresponds to the same
   macro, it is painted.

   Disabling contexts apply to macros themselves over a given geographic
   sequence of tokens, while blue paint applies to particular identifier
   tokens. The former causes the latter, and the latter is what prevents
   "recursion" in macro expansion. (The means by which a disabling context
   comes into existence is discussed below.)

   In the running example, the current token is still the identifier OBJECT.
   It is not painted, and there is no active disabling context that would
   cause it to be painted. Therefore, the preprocessor moves on to the next
   step.
     _________________________________________________________________

    Object-like Macro Invocations

   If an identifier token that corresponds to a macro name is not painted,
   and there is no active disabling context that would cause it to be
   painted, the preprocessor must check to see what type of macro is being
   referenced--object-like or function-like. If the macro is defined as an
   object-like macro, which is the case with OBJECT in the running example,
   the identifier alone forms an invocation of the macro, and the
   preprocessor begins the replacement process of that invocation.

+ X OBJECT +
    |^ |
    |______|
        |
        OBJECT invocation (INV)

   For an object-like macro, the first step in the replacement process is
   the retrieval of the replacement list of the macro from the symbol table.
   In the running example, the OBJECT macro's replacement list is

OBJ ## ECT F()

   The preprocessor then performs any token-pasting operations in the
   replacement list. (Note that there is no stringizing operator in
   object-like macros.) The result of token-pasting in OBJECT's replacement
   list is

OBJECT F()

   After all token-pasting has been performed, the resulting sequence of
   tokens is used to replace the invocation of the macro. At the same time,
   a disabling context that corresponds to the macro being replaced is
   activated. This disabling context surrounds the tokens that came from the
   replacement list.

+ X OBJECT F() +
   | |
   |__________|
        |
        OBJECT disabling context (DC)

   Finally, scanning resumes beginning with the first token that came from
   the replacement list (or the next token after the invocation if the
   replacement list was empty). In the running example, scanning resumes at
   OBJECT.

+ X OBJECT F() +
   |^ |
   |__________|
         |
         OBJECT DC

   Note that this OBJECT identifier is a different identifier token than the
   one that formed an invocation of the OBJECT macro. It came from the
   replacement list of the OBJECT macro (indirectly by way of
   token-pasting).
     _________________________________________________________________

    Iterative Replacement vs. Functional Hierarchy

   Before continuing, it is important to note that a replacement list is not
   scanned for macro expansion prior to the replacement of an invocation.
   For example,

#define A B
#define B 123

A

   It is tempting to believe that the top-level scan calls A, A calls B, B
   returns 123 to A, and A returns 123 to the top-level. In other words, it
   is tempting to believe that the invocations form a functional hierarchy
   similar to the procedural programming model that is at the root of most
   programming paradigms. That is not the way that macro expansion works,
   however. Instead, the top-level scan replaces A with B and resumes. It
   then replaces B with 123 and resumes.

   This iterative replacement model may seem foreign, but it has a strong
   similarity to inlined functions. For example,

inline int g() {
    return 123;
}

inline int f() {
    return g();
}

const int x = f();

   Instead of calling f, the call of f is effectively replaced with the body
   of f. The call of g contained therein is effectively replaced with the
   body of g. Inlined functions are not quite that simple, of course. They
   operate under the rules of lexical scope, for example, which macro
   expansion does not. Nevertheless, inlined functions have a similarity to
   the iterative replacement model of macro expansion, and one can view
   macro expansion as a sort of "continual inlining."
     _________________________________________________________________

    Resumption of Scanning

   After the OBJECT invocation has been replaced in the running example,
   scanning resumes on the first token that came from the replacement list.

+ X OBJECT F() +
   |^ |
   |__________|
         |
         OBJECT DC

   This time, however, when the preprocessor looks at the OBJECT token, a
   disabling context that corresponds to OBJECT is active. Because of that,
   this particular OBJECT token is painted, and the preprocessor moves on to
   the next token.

+ X OBJECT' F() +
   | ^ |
   |___________|
         |
         OBJECT DC
     _________________________________________________________________

    Function-like Macro Invocations

   If an identifier token that corresponds to a macro name is not painted,
   and there is no active disabling context that would cause it to be
   painted, the preprocessor must check to see what type of macro is being
   referenced--object-like or function-like. If the macro is function-like,
   things are a little more complex than with object-like macros. The
   process is similar, but several more steps are performed by the
   preprocessor.

   After finding that an identifier token refers to a function-like macro,
   the preprocessor must look ahead to the next token in the sequence of
   tokens to see if it is a left-parenthesis. If it is not, the preprocessor
   outputs the identifier and moves on to the next token (i.e. the one that
   was not a left-parenthesis).

   If the next token after the identifier is a left-parenthesis, the
   preprocessor must check the arity of the macro referenced by the
   identifier. In the running example, the preprocessor has found the F
   identifier and the ( token immediately following it (as the next token;
   whitespace between the identifier and the left-parenthesis is ignored).
   The preprocessor sees that F is defined as a nullary macro--that is, it
   takes no arguments.
     _________________________________________________________________

    Nullary Invocations

   If scanning finds an identifier (followed by a left-parenthesis) that
   refers to a nullary function-like macro (as is the case with F in the
   running example), the preprocessor must find the corresponding
   right-parenthesis. If it cannot (e.g. it finds EOF instead), the result
   is an "incomplete invocation" error.

   While it does so, it must ensure that there are no tokens between the
   left-parenthesis and the right-parenthesis. Specifically, between the two
   parentheses, there must be either one or more whitespace separations or
   nothing at all. If any tokens are present between them, the result is a
   "too many arguments" error.

   The identifier and the left- and right-parentheses form an invocation of
   the macro, and the preprocessor begins the replacement process of that
   invocation.

+ X OBJECT' F() +
   | |^ ||
   | |___||
   | | |
   | F INV
   |____________|
          |
          OBJECT DC

   As with an object-like macro, the first step in the replacement process
   is the retrieval of the replacement list of the macro from the symbol
   table. In the running example, the F macro's replacement list is

G(G(G))

   The preprocessor then performs any token-pasting operations in the
   replacement list. (Note that a nullary macro is a function-like macro, so
   the stringizing operator exists. However, the stringizing operator must
   be applied to an instance of a formal parameter in the replacement list.
   A nullary function-like macro has no formal parameters, and therefore any
   use of the stringizing operator is automatically an error.) The result of
   token-pasting in F's replacement list is

G(G(G))

   In this case, there are no token-pasting operations to perform, so the
   result is a no-op.

   The sequence of tokens resulting from token-pasting is used to replace
   the invocation of the macro. At the same time, a disabling context that
   corresponds to the macro being replaced is activated. This disabling
   context surrounds the tokens that came from the replacement list.

+ X OBJECT' G(G(G)) +
   | | ||
   | |_______||
   | | |
   | F DC |
   |________________|
            |
            OBJECT DC

   Finally, scanning resumes beginning with the first token that came from
   the replacement list (or the next token after the invocation if the
   replacement list was empty).

+ X OBJECT' G(G(G)) +
   | |^ ||
   | |_______||
   | | |
   | F DC |
   |________________|
            |
            OBJECT DC

   Nullary function-like macro invocations are nearly identical to
   object-like macro invocations. The primary difference is that an
   invocation of a function-like macro requires multiple tokens.
     _________________________________________________________________

    Interleaved Invocations

   It is important to note that disabling contexts only exist during a single
   scan. Moreover, when scanning passes the end of a disabling context, that
   disabling context no longer exists. In other words, the output of a scan
   results only in tokens and whitespace separations. Some of those tokens
   might be painted (and they remain painted), but disabling contexts are not
   part of the result of scanning. (If they were, there would be no need for
   blue paint.)

   In the diagrams used in this document, the tokens that have been output by
   a scan are left in the diagram to provide context for the reader, such as:

+ + +
    ^

   Here, scanning is looking at the third + token. This particular scan is done
   with the two preceding tokens. Thus, if a disabling context existed around
   these tokens...

 + + +
|^ |
|_____|
   |
   DC

   ...the disabling context effectively "closes" as the scan moves on from
   token to token because the disabling context is not part of the result of
   the scan.

+ + +
 |^ |
 |___|
   |
   DC

+ + +
   |^|
   |_|
    |
    DC

+ + +
      ^

   Because function-like macro invocations require multiple tokens, it is
   possible for such an invocation to span the end of a replacement list. For
   example,
   example,

#define A() B
#define B() 123

A()()

   The A() invocation is replaced with B, and a B invocation is formed using
   the tokens that follow the A invocation.

 A() ()
|^ |
|___|
  |
  A INV

  B ()
||^| |
||_|__|
| | |
|__| B INV
  |
  A DC

   Thus, the invocation of B is geographically interleaved with the disabling
   context corresponding to A. When the invocation of B gets replaced by 123
   and scanning resumes at 123, the disabling context corresponding to A no
   longer exists.

   In order for an "outer" disabling context to remain active, invocations
   must be nested (rather than interleaved) within that disabling context.
   For example,

#define C() D()
#define D() 123

C()

   Here, the D invocation will be nested within the disabling context
   corresponding to C.

 C()
|^ |
|___|
  |
  C INV

  D()
||^ ||
||___||
| | |
| D INV
|_____|
   |
   C DC

  123
||^ ||
||___||
| | |
| D DC
|_____|
   |
   C DC

   Disabling contexts (or lack thereof) have no effect on the result of either of these two examples. However, that isn't always the case.

#define A() B
#define B() A

A()()()

   Here, the result is ultimately an unpainted B token--because no A or B
   token ever becomes painted.

 A() ()()
|^ |
|___|
  |
  A INV

  B () ()
||^| |
||_|__|
| | |
|__| B INV
  |
  A DC

  A ()
||^| |
||_|__|
| | |
|__| A INV
  |
  B DC

 B
|^|
|_|
 |
 A DC

   Things are different in the following example:

#define C() D()
#define D() C()

C()

   Here, the result is ultimately C'().

 C()
|^ |
|___|
  |
  C INV

  D()
||^ ||
||___||
| | |
| D INV
|_____|
   |
   C DC

  C()
||^ ||
||___||
| | |
| D DC
|_____|
   |
   C DC

  C'()
|| ^ ||
||____||
| | |
| D DC
|______|
    |
    C DC

   Note that interleaved invocations do not allow for infinite expansion.
   More tokens must physically be present after the replacement list to
   complete an interleaved invocation, and this sequence of tokens is
   ultimately limited to the finite sequence of tokens contained in the
   source file.
     _________________________________________________________________

    Non-nullary Invocations

   If scanning finds an identifier (followed by a left-parenthesis) that
   refers to a non-nullary function-like macro (as is the case with G in the
   running example), the preprocessor must find the corresponding
   right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is
   an "incomplete invocation" error.

   While it does so, it must delineate the actual arguments between the left-
   and right-parentheses. This delineation is a distinct step that separates
   the argument list into individual arguments before any other processing of
   the arguments is done. Arguments are separated by comma tokens (,), but
   commas between matched pairs of left- and right-parentheses do not
   separate arguments, and the right-parenthesis of such a matched pair is
   not used as the right-parenthesis that terminates the argument list. For
   example, in

MACRO(a, (b, c), d)

   the argument list to MACRO is delineated into three arguments.

   After the arguments have been delineated, the preprocessor compares the
   number of actual arguments to the arity of the macro. If there are more or
   fewer, the result is a "too many arguments" or "too few arguments" error.
   (If the macro is variadic, the number of arguments must be at least one
   greater than the number of named formal parameters. This implies that a
   macro defined as:

#define V(x, y, ...) // ...

   has a minimum arity of three--not two.)

   In C++, if any argument is empty or contains only whitespace separations,
   the behavior is undefined. In C, an empty argument is allowed, but gets
   special treatment. (That special treatment is described below.)

   Assuming that the number of actual arguments is correct, the identifier,
   the left- and right-parentheses, and all of the tokens between them form
   an invocation of the macro, and the preprocessor begins the replacement
   process of that invocation.

+ X OBJECT' G(G(G)) +
   | ||^ |||
   | ||_______|||
   | | | ||
   | | G INV |
   | |_________||
   | | |
   | F DC |
   |__________________|
             |
             OBJECT DC

   Here, the preprocessor has delineated the arguments to G, which results in
   the correct number of arguments for G (which is one, because G is unary).
   That argument is

G(G)

   As with both object-like and nullary function-like invocations, the first
   step in the replacement process of a function-like macro invocation is the
   retrieval of the replacement list of the macro from the symbol table. In
   the running example, the G macro's replacement list is

x

   which is an instance of the formal parameter.

   After retrieval of the replacement list, the preprocessor replaces all
   instances of formal parameters in the replacement list with actual
   arguments. However, it does this in a variety of ways depending on how an
   instance of the formal parameter is used in the replacement list. If a
   particular instance of a formal parameter in the replacement list is an
   operand of either the token-pasting operator or the stringizing operator,
   the actual argument is substituted in place of the formal parameter
   instance. (In C99, if the argument is empty or contains only whitespace
   separations, a placemarker is substituted instead.) If a particular
   instance of a formal parameter is not an operand of either the
   token-pasting operator or the stringizing operator, the actual argument is
   scanned for macro expansion, and the result is substituted in place of the
   formal parameter instance. (The scanning of an argument is the only true
   instance of recursion in the macro expansion process.) Though it is
   scanned independently, any disabling contexts still exist.

   Note that an argument is scanned for macro expansion if and only if at
   least one instance of a corresponding formal parameter exists in the
   replacement list without being an operand of one of the operators. Thus,
   in

#define EAT(x)
#define BINARY(a, b) // ...

EAT( BINARY() )

   no error should occur--even though there would be an error if the argument
   to EAT was scanned for macro expansion.

   After all instances of formal parameters in the replacement list have been
   substituted, the preprocessor performs all token-pasting and stringizing
   operations in the resulting sequence of tokens. Note that the precedence
   between the token-pasting and stringizing operators is unspecified, as is
   the associativity of the token-pasting operator. However, the operands of
   the token-pasting operator are single tokens (those immediately to the
   right and left of it), while the operand of the stringizing operator is
   the entire argument. (Recall that it must be applied to an instance of a
   formal parameter.) Thus, even though the order of evaluation between
   token-pasting and stringizing is unspecified, one can think of a use of
   the stringizing operator as being replaced with a third "stringized"
   version of the actual argument. (Otherwise, the preprocessor would have to
   mark the extent of the substituted argument in some way.) Of course, the
   order of evaluation is unspecified, and therefore code should avoid
   relying on any particular order.

   It is important to note that, even though the token-pasting operator
   operates on exactly two tokens (rather than entire arguments), it does
   affect entire arguments in the sense that the token-pasting operator
   prevents the entire argument from being scanned for macro expansion prior
   to substitution.

   In C99, placemarkers are used as temporary pseudo-tokens so that
   token-pasting and stringizing work "correctly." For example,

#define M(x) 1 x ## 3

M()

   Here, the argument for x is empty, so a placemarker is substituted in
   place of the x in the replacement list.

1 <placemarker> ## 3

   This causes the token-pasting operator to operate on 3 and the placemarker
   (instead of 3 and 1). Token-pasting of a token to a placemarker results in
   the token. Token-pasting of two placemarkers results in a placemarker.
   Stringizing of a placemarker results in the empty string:

#define STR(x) #x

STR() // ""

   After all token-pasting and stringizing operations have been performed,
   all remaining placemarkers are removed, and the resulting sequence of
   tokens is used to replace the invocation of the macro. At the same time, a
   disabling context that corresponds to the macro being replaced is
   activated. This disabling context surrounds the tokens that came from the
   replacement list.

   In the running example, the x formal parameter instance in G's replacement
   list is not an operand of either the token-pasting or stringizing
   operator. Therefore, the actual argument corresponding to x is scanned for
   macro expansion separately, but within any active disabling contexts. In
   this example, during the recursive scan of G's argument, another
   invocation of G is found, and its argument is recursively scanned as well.

+ X OBJECT' G(G(G)) +
   | ||^ |||
   | ||_______|||
   | | | ||
   | | G INV |
   | |_________||
   | | |
   | F DC |
   |__________________|
             |
             OBJECT DC

    // recursive scan of argument #1 to G:
    
         G(G)
        |^ |
        |____|
           |
           G INV
    _ _ ______ _ _
           |
           F DC
    _ _ ______ _ _
           |
           OBJECT DC

        // recursive scan of argument #1 to G:

            G
            ^
        _ _ _ _ _
            |
            F DC
        _ _ _ _ _
            |
            OBJECT DC

            G
             ^
        _ _ _ _ _
            |
            F DC
        _ _ _ _ _
            |
            OBJECT DC

        // recursive scan results in: G
    
         G
        |^|
        |_|
         |
         G DC
    _ _ ___ _ _
         |
         F DC
    _ _ ___ _ _
         |
         OBJECT DC

         G'
           ^
    _ _ ___ _ _
         |
         F DC
    _ _ ___ _ _
         |
         OBJECT DC

    // recursive scan results in: G'

+ X OBJECT' G' +
   | ||^ |||
   | ||__|||
   | | | ||
   | | G DC
   | |____||
   | | |
   | F DC
   |_____________|
          |
          OBJECT DC

+ X OBJECT' G' +
               ^

+ X OBJECT' G' +
                ^

   Thus, the running example finally results in

+ X OBJECT' G' +

   Note that the disabling context that is created when a macro invocation is
   replaced is not created until after all arguments that need to be scanned
   for macro expansion have been. This is why both G's expand in the example.
     _________________________________________________________________

    Arguments to Interleaved Invocations

   Because arguments not used as operands of token-pasting or stringizing are
   scanned separately but within any active disabling contexts, the
   possibility exists for an argument (or part of an argument) to be scanned
   inside a disabling context that no longer exists after the macro
   invocation has been replaced. Consider,

#define A(m) m( B(f)
#define B(x) A(x)

#define C(x) [ x ]

A(C) ) *

   The following diagrams the expansion of this sequence of tokens:

 A(C) ) *
|^ |
|____|
   |
   A INV

   // recursive scan of argument #1 to A results in:

  C( B(f) ) *
||^ | |
||_______|_|
| | |
| C INV
|________|
     |
     A DC

    // recursive scan of argument #1 to C:
    
         B(f)
        |^ ||
        |____||
           | |
           B INV
    _ _ ______|
           |
           A DC

        // recursive scan of argument #1 to B results in: f
    
         A(f)
        |^ ||
        |____||
           | |
           B DC
    _ _ ______|
           |
           A DC

         A'(f)
        | ^ ||
        |_____||
           | |
           B DC
    _ _ _______|
           |
           A DC
    
    // recursive scan results in: A'(f)

 [ A'(f) ] *
|^ |
|_________|
     |
     C DC

   The result of scanning for expansion is

[ A'(f) ] *

   Note that the argument to C was scanned inside the disabling context
   corresponding to A, but that disabling context is no longer active when
   scanning resumes after the replacement of C's invocation. (Scenarios such
   as this one are tricky, and preprocessors rarely handle this kind of thing
   correctly.)
     _________________________________________________________________

    Virtual Tokens

   In the implementation of a scanning routine, the single most
   design-affecting part of this whole process is the handling of the
   disabling contexts. This section discusses a possible way in which they
   can be implemented. Recall that disabling contexts can be viewed as
   closing contexts because they are only active during a single scan and
   because scanning never moves backwards. Because of this property,
   disabling contexts can be implemented using flags in the symbol table and
   virtual tokens. Instead of maintaining expensive data structures similar
   to the above diagrams, when a macro invocation is replaced, the
   preprocessor can flag the macro in the symbol table as "disabled" and
   insert a virtual token immediately following the sequence of tokens used
   to replace the invocation. When scanned, this virtual token clears the
   flag in the symbol table and is removed. Using @A to indicate a virtual
   token that clears the disabled flag on A in the symbol table...

#define A() A()

 A() A() *
|^ |
|___|
  |
  A INV

// set disabled flag on A in symbol table

A() @A A() *
^

A'() @A A() *
     ^

// clear disabled flag on A

A'() A() *
    |^ |
    |___|
      |
      A INV
 
// set disabled flag on A
 
A'() A() @A *
     ^

A'() A'() @A *
          ^

// clear disabled flag on A

A'() A'() *
          ^

   This implementation strategy is fairly straightforward (and it produces
   the correct results). It does, however, get a little more complex when
   parameters are involved. For any given formal parameter, how it is handled
   during substitution is a property of the formal parameter itself. If the
   parameter is unused, no variations of the parameter are required. If the
   parameter is used without being an operand of the token-pasting or
   stringizing operator, a scanned variation is required. If the parameter is
   used as an operand of either of the two operators, the actual argument
   itself is required. These last two variations are not mutually exclusive.
   For a given formal parameter, both may be required, as in

#define F(x) #x x

   Which variations are required for a particular formal parameter is known
   at the point of definition of the macro.

   When a macro invocation is expanded, the preprocessor processes each
   argument in the order in which they are passed (rather than the order in
   which they appear in the replacement list). This ensures that a virtual
   token that exists inside one argument is processed at the correct point.

   If a formal parameter is not used in the replacement list of a macro, the
   argument will be discarded. However, before it is discarded, the
   preprocessor must execute any virtual tokens that it contains. For
   example,

#define EAT(x) x
#define M() EAT(1

 M() 2)
|^ |
|___|
  |
  M INV

// set disabled flag on M

 EAT(1 @M 2)
|^ |
|___________|
      |
      EAT INV

   Here, even though the argument to EAT is discarded, the virtual token @M
   must be executed. Otherwise, the disabled flag in the symbol table will
   never be cleared.

   If only a scanned variation of a formal parameter is required, the
   preprocessor can simply recursively scan the actual argument. That
   recursive scan will automatically handle the virtual tokens correctly.

   If only the actual argument itself is required, the preprocessor must
   iterate through the tokens contained in the argument to paint tokens and
   execute virtual tokens. It must paint all identifier tokens that
   correspond to macros that are flagged in the symbol table as disabled and
   execute all virtual tokens. For example, if the argument to a macro is

A @A A

   and the actual argument itself is required for argument substitution, the
   sequence of tokens used for substitution must be

A' A

   If both a scanned variation and an actual argument variation are required,
   the preprocessor has to handle both of the above scenarios. This situation
   is a little more complex, because it has to handle both scenarios
   correctly with regard to painting and virtual tokens. The simplest
   strategy is to do the actual argument case first, but to keep track of all
   virtual tokens executed therein. After the actual argument variation is
   generated, the preprocessor can undo all of the state changes caused by
   the execution of those virtual tokens. Then, the scanned variation can be
   generated in the same way as above--simply by recursively scanning it for
   macro expansion. It is important that the actual argument variation is
   handled first, because during the recursive scanning of the scanned
   variation, any number of disabled flags could be set and later cleared by
   virtual tokens. At the start of processing either type of argument, the
   state must be the same for both cases, and because virtual tokens may be
   generated and executed during the recursive scan, it would be very
   difficult to keep track of which ones originally existed in the argument.
     _________________________________________________________________

    © Copyright [1]Paul Mensonides 2003-2006

    Distributed under the [2]Boost Software License, Version 1.0. See
    [3]http://chaos-pp.sourceforge.net for the most recent version of
    this document.

References

   1. mailto:pmenso57_at_[hidden]
   2. ./license.html
   3. http://chaos-pp.sourceforge.net



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk