Extending the Preprocessor and Limiting its Effects

Synopsis

This paper describes several proposed additions to the C++ preprocessor:  variadic macros, placemarker tokens, well-defined token-pasting, and a scoping mechanism.  A short summary of each follows.

Variadic Macros

Variadic macros were added to the C preprocessor as of C99.  They are, effectively, a way to pass a variable number of arguments to a macro.  The specific syntax is as follows:

#define A(...) __VA_ARGS__
A(1, 2, 3) // 1, 2, 3

#define B(a, ...) __VA_ARGS__
B(1, 2, 3) // 2, 3

The ellipsis is used to denote that the macro can accept any number of trailing arguments.  It must always occur as the last formal parameter of the macro.  The variadic arguments passed to the macro are identified by the special symbol __VA_ARGS__ in the replacement list of a variadic macro.  The use of this symbol is prohibited in any other context.

A slight extension to this facility, as it exists in C99, could be to allow the ellipsis without a preceding comma, as with variadic functions in the core language.

#define TRACE(a ...) /* ... */

This would ease the syntactic invocation of variadic macros when combined with placemarker tokens.
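
For illustration only, the sketch below shows one possible reading of the proposed syntax, in which the variadic portion may simply be omitted at the call site.  The argument binding shown here is an assumption about the proposal, not established semantics.

#define TRACE(a ...) /* a followed by a variadic portion */

TRACE(x)       // a is x; the variadic portion is empty (a placemarker)
TRACE(x, y, z) // a is x; __VA_ARGS__ is y, z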

Placemarker Tokens

Placemarker tokens (technically, preprocessing tokens) are simply a well-defined way of passing "nothing" as a macro argument.  This facility was also added to the C preprocessor as of C99.

#define X(p) f(p)

X("abc") // f("abc")
X()      // f()

#define Y(a, b) int[a][b]

Y(2, 2) // int[2][2]
Y(, 2)  // int[][2]

Well-Defined Token-Pasting

As of both C++98 and C99, if token-pasting results in multiple preprocessing tokens (i.e. the pasted spelling does not form a single valid preprocessing token), the behavior is undefined.  For example,

#define PASTE(a, b) a ## b

PASTE(1, 2) // okay
PASTE(+, -) // undefined behavior

There are two possible specifications to achieve well-defined behavior in this context.  First, if token-pasting yields multiple preprocessing tokens, the result is a no-op.  In other words, the behavior is as if the token-pasting never occurred.

PASTE(+, ==) // + ==

The second alternative is to retokenize the result of token-pasting.

PASTE(+, ==) // += =

Either specification would be acceptable, though the second is preferable because it matches what preprocessors already do in practice (see below).

Scoping Mechanism

One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language.  As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit.  The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor.  This would limit the scope of macro definitions without limiting their accessibility.

General Motivation

The motivations for these proposed additions derive from two perspectives.  The first is "normal" use of the C++ language, and the second is metaprogramming with the preprocessor.  Both of these contexts are relevant to all of the above additions to the C++ preprocessor.  Below are practical motivations in the context of "normal" C++ programming.

Both variadic macros and placemarker tokens have already been added to C.  This represents an unnecessary incompatibility between C and C++.  Adding these facilities to the C++ preprocessor would break no code that is currently well-defined.

Variadic macros, in particular, are important to C++ from a practical perspective.  Specifically, they allow macro arguments that contain unparenthesized commas, such as class template instantiations with more than one template argument.

#define TYPE(x) x
TYPE(std::basic_string<char, my_traits<char> >) // invalid: the comma splits this into two arguments

However, with variadic macros:

#define TYPE(...) __VA_ARGS__
TYPE(std::basic_string<char, my_traits<char> >) // valid
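
For comparison, one workaround under the current rules is to parenthesize the argument, but the parentheses then appear in the expansion, which makes the result unusable wherever a bare type name is required:

#define TYPE(x) x
TYPE((std::basic_string<char, my_traits<char> >)) // expands with the extra parentheses intact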

Placemarker tokens are a natural counterpart to variadic macros.  They formalize the optional nature of a variadic argument (or arguments), so that variadic macros parallel variadic functions in the core language, and they generalize this to named parameters as well, which may also receive empty arguments.
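
As a small illustration of this interplay under C99 (the macro name D is used only for this sketch), an empty argument may be passed for the variadic portion and it simply becomes a placemarker:

#define D(a, ...) a; __VA_ARGS__

D(1, 2, 3) // 1; 2, 3
D(1,)      // 1;   (the variadic portion is an empty, placemarker argument)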

Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple preprocessing tokens) is undefined for no substantial reason.  It is not dependent on architecture nor is it difficult for an implementation to diagnose.

Furthermore, retokenization is what most, if not all, preprocessors already do, and it is what most programmers already expect the preprocessor to do.  Defining the behavior would simply standardize existing practice and remove an arbitrary and unnecessary instance of undefined behavior from the standard.

A scoping mechanism is needed for the simple practicality of avoiding name collisions.  Both a named and an unnamed scoping mechanism are necessary in order to limit the scope of names while maintaining the utility of macro accessibility.

[author's note:  I have seen parts of the scoping mechanism that is currently proposed, but I may not have seen all of it.  There are, however, several major problems associated with what I have seen.  I will address these issues here and propose an alternative.  The following is an example posted by Francis Glassborow on comp.lang.c++.moderated:

#< // no preprocessor macros can permeate this barrier
   // unless explicitly asked for

#import XYZ // allow that macro through
#define ABC 123
#define FALSE 0
...
#export FALSE
#> // only explicitly exported macros escape

]

The first major problem is the lack of named scopes.  This is a major practical problem given the lazy evaluation of the preprocessor (a macro's replacement list is not expanded until the macro is invoked).

#<
#define DETAIL 123
#define INTERFACE() DETAIL
#export INTERFACE
#>

INTERFACE() // DETAIL?

Because DETAIL is not exported, it is no longer defined at the point of use, so INTERFACE() expands to the unrelated identifier DETAIL rather than 123.  A named scope would allow INTERFACE to refer to DETAIL by a qualified name, as shown below.

The second major problem is the use of #< and #> as scoping directives.  These directives are difficult to see in code that contains many other directives, and this lack of visibility would needlessly complicate such code.

An alternative mechanism is proposed here.  The equivalent of the above example would look like this:

# region // unnamed
#    import XYZ
#    define ABC 123
#    define FALSE 0
#
#    export FALSE
# endreg

Semantically, the above is exactly the same as the original example, but with the directives #region and #endreg replacing #< and #>.  These are much more visible relative to other directives, and therefore much clearer.  (Other names, such as #scope/#endscope, #namespace/#endnamespace, or #module/#endmodule, would also be viable alternatives.)

A named scope would simply be one that contains an identifier after the #region directive.  For example,

# region BOOST
#    define DETAIL 123
#    define INTERFACE() BOOST::DETAIL
#    export INTERFACE
#    define QUALIFIED 1
# endreg

DETAIL      // DETAIL
INTERFACE() // 123

BOOST::QUALIFIED // 1

Because INTERFACE refers to DETAIL by its qualified name, its expansion remains well-defined outside the region even though the unqualified name DETAIL is no longer visible there.

Further, nested scopes are necessary:

# region BOOST
#    region CONFIG
#        define MAX_ARITY 15
#    endreg
# endreg

BOOST::CONFIG::MAX_ARITY // 15

Names of scopes would also be subject to the #import directive:

# region LIBRARY
#    region TEXT
#        define NEWLINE "\n"
#    endreg
# endreg

// ...

# region
# import LIBRARY::TEXT

// ...

# endreg

Motivation in the Context of Preprocessor Metaprogramming

Preprocessor metaprogramming is another major context for these extensions.  Despite the fact that the preprocessor is not an ideal tool for metaprogramming, it is still useful if it is used in a structured way.

One of the major problems with the preprocessor is the lack of a scoping mechanism.  That issue can be addressed by the scoping mechanism described above.

The second major problem with the preprocessor is that it doesn't respect the syntax and semantics of the core language.  This is a weakness, but it is also a strength in different contexts.  Manipulation of core language constructs, without the interference of syntax and semantics, is vital for portability as well as code generation with the preprocessor.
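
As a small, hedged illustration of this point (the names below are invented for this sketch and are not taken from any existing library), the preprocessor can generate core language declarations from bare tokens:

#define DECLARE_HANDLER(name) void handle_ ## name(const name ## _event&);

DECLARE_HANDLER(mouse)    // void handle_mouse(const mouse_event&);
DECLARE_HANDLER(keyboard) // void handle_keyboard(const keyboard_event&);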

The Boost Preprocessor Library is a real-world, practical example of this type of programming.  Other libraries, in Boost in particular, rely heavily on its functionality.  The Boost Python Library and the Boost Metaprogramming Library [MPL] are working examples of the practicality of extensive use of the preprocessor as a code generation tool.  There are also many other users external to Boost itself.  In short, the precedent exists in the community and preprocessor metaprogramming is a valuable design tool.

The alternative to the preprocessor is to use an external tool.  This may be a reasonable choice for program developers, but it is not for library developers when user-customization and extensibility are involved.

Variadic macros and placemarkers make working with the macro expansion engine easier in general and allow certain idioms that are otherwise impossible or extremely convoluted.  [author's note:  I can provide many examples of this if necessary.]
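
One such idiom, sketched here with invented macro names, is counting the number of arguments passed to a variadic macro; note that the trailing comma deliberately produces an empty (placemarker) final argument:

#define SIZE(...) SIZE_I(__VA_ARGS__, 4, 3, 2, 1,)
#define SIZE_I(a, b, c, d, n, ...) n

SIZE(x)          // 1
SIZE(x, y)       // 2
SIZE(x, y, z)    // 3
SIZE(x, y, z, w) // 4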

Well-defined token-pasting semantics are also a major concern.  Without them, preprocessor metaprogramming becomes significantly more complex.  Take C99's placemarker tokens as an example.  It is not currently possible to detect an empty (placemarker) argument in a general fashion.  It is possible to get close, but there is always some form of input that fouls up the mechanism.  With well-defined token-pasting semantics, this becomes possible with a very simple set of macros:

// CAT expands its arguments before concatenation; PRIMITIVE_CAT does not
#define CAT(a, b) PRIMITIVE_CAT(a, b)
#define PRIMITIVE_CAT(a, b) a ## b

// SPLIT(0, ...) yields the first argument; SPLIT(1, ...) yields the rest
#define SPLIT(i, ...) PRIMITIVE_CAT(SPLIT_, i)(__VA_ARGS__)
#define SPLIT_0(a, ...) a
#define SPLIT_1(a, ...) __VA_ARGS__

// IS_VARIADIC expands to 1 if its arguments begin with a parenthesized
// group (which invokes IS_VARIADIC_C) and to 0 otherwise
#define IS_VARIADIC(...) \
    SPLIT(0, CAT(IS_VARIADIC_R_, IS_VARIADIC_C __VA_ARGS__)) \
    /**/
#define IS_VARIADIC_C(...) 1
#define IS_VARIADIC_R_1 1, *
#define IS_VARIADIC_R_IS_VARIADIC_C 0, *

// IS_EMPTY expands to 1 if its arguments are empty and to 0 otherwise; an
// empty argument pastes to IS_EMPTY_IS_EMPTY, which expands to the
// parenthesized group () that IS_VARIADIC detects
#define IS_EMPTY(...) IS_EMPTY_I(__VA_ARGS__)
#define IS_EMPTY_I(...) \
    IS_VARIADIC(IS_EMPTY_ ## __VA_ARGS__ ## IS_EMPTY) \
    /**/
#define IS_EMPTY_IS_EMPTY ()

IS_EMPTY(abc) // 0
IS_EMPTY()    // 1
IS_EMPTY(++)  // 0 (currently undefined behavior)
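
The following trace of IS_EMPTY(++) assumes the proposed retokenization semantics (and, for readability, left-to-right evaluation of ##, which is otherwise unspecified); it is a sketch of why well-defined pasting matters here:

// IS_EMPTY(++)
// -> IS_EMPTY_I(++)
// -> IS_VARIADIC(IS_EMPTY_ ## ++ ## IS_EMPTY)
//    // neither paste forms a single valid token; with retokenization the
//    // operands simply remain in place: IS_EMPTY_ ++ IS_EMPTY
// -> SPLIT(0, CAT(IS_VARIADIC_R_, IS_VARIADIC_C IS_EMPTY_ ++ IS_EMPTY))
//    // IS_VARIADIC_C is not followed by '(' and is therefore not invoked
// -> SPLIT(0, IS_VARIADIC_R_IS_VARIADIC_C IS_EMPTY_ ++ IS_EMPTY)
// -> SPLIT(0, 0, * IS_EMPTY_ ++ IS_EMPTY)
// -> SPLIT_0(0, * IS_EMPTY_ ++ IS_EMPTY)
// -> 0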

This is only a small example of preprocessor metaprogramming, but it illustrates one of the many problems associated with the current undefined behavior.

Preprocessor metaprogramming, and perhaps more importantly non-preprocessor programming, would greatly benefit from a scoping mechanism.  It would encapsulate the names defined by code such as the above, yet still provide a means for users to access them.

Conclusion

The lack of these features will lead to dialects of the language and perpetuate an unnecessary rift between C and C++.  There is at least one preprocessor that allows users to enable variadics and placemarkers in C++, and several others (that support C99) are likely to allow this in the future (though this is not a certainty).  Scoping mechanisms similar to the one proposed above are already being implemented.  Token-pasting of unrelated tokens is already handled in a well-defined way by nearly every preprocessor (if not all) in common use.  Consequently, if the standard does not provide these facilities, users will turn to tool-specific language extensions.

Author

Paul Mensonides - current author of the Boost Preprocessor Library

Supporters

Hartmut Kaiser - author of the Wave preprocessor