
Subject: Re: [boost] [preprocessor] check if a token is a keyword (was "BOOST_PP_IS_UNARY()")
From: Lorenzo Caminiti (lorcaminiti_at_[hidden])
Date: 2010-08-17 08:17:06


On Tue, Aug 17, 2010 at 2:12 AM, Paul Mensonides <pmenso57_at_[hidden]> wrote:
> On 8/16/2010 9:21 PM, Lorenzo Caminiti wrote:
>
>> Yes, I am aware of this "limitation". However, for my application it
>> is not a problem to limit the argument of `IS_PUBLIC()` to
>> pp-identifiers and pp-numbers with no decimal points (if interested,
>> see "MY APPLICATION" below).
>>
>> 1) Out of curiosity, is there a way to implement `IS_PUBLIC()`
>> (perhaps without using `BOOST_PP_CAT()`) so it does not have this
>> limitation? (I could not think of any.)
>
> The limitation is not BOOST_PP_CAT per se, but token-pasting in general.
>  The "good" part of using BOOST_PP_CAT in combination with
> BOOST_PP_IS_NULLARY, et al, is that they have been "hacked" together for
> preprocessors that are broken.  Effectively, the detection macros work by

Yes, I understand.

> manipulating the operational syntax of macro expansion.  For that to work,
> stuff has to happen (namely, macros being expanded) at roughly the correct
> time.  The basic problem with VC++, for example, is that they don't, so the
> pp-lib works overtime to attempt to _force_ expansions all over the library.

I got my pp-parsers to successfully work under both GCC and MSVC.
Especially on MSVC, I also had to "hack" some of the macros to make
sure they expand when they are supposed to. (BTW, having a library
like Boost.Preprocessor has proven to be immensely useful.)
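
For reference, the core of the detection idiom I am using looks roughly
like this (a minimal sketch with illustrative macro names; the real
macros also carry the workarounds for broken preprocessors discussed
here):

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/detail/is_unary.hpp>

// Pasting the probed tokens onto a prefix either hits this definition,
// yielding a unary parenthesized expression, or produces a plain identifier.
#define IS_PUBLIC_public (1)
#define IS_PUBLIC(tokens) BOOST_PP_IS_UNARY(BOOST_PP_CAT(IS_PUBLIC_, tokens))

IS_PUBLIC(public)     // 1: IS_PUBLIC_public -> (1), which is unary
IS_PUBLIC(private)    // 0: IS_PUBLIC_private is just an identifier
IS_PUBLIC(public abc) // 1: only the first token takes part in the paste
IS_PUBLIC(1.5)        // undefined behavior: the paste does not form one token

This is why the argument has to start with a pp-identifier (or a
pp-number with no decimal point), as per the limitation above.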

>  Unfortunately, there is a limit to what can be forced--particularly with
> more advanced manipulations of the macro expansion process such as those
> used by Chaos where there is analogy to the uncertainty principle (e.g. you
> cannot force expansion in many contexts without changing the result = you
> cannot measure particle velocity and position at the same time).  Even with
> those types of manipulations, however, there is no way to do the above with
> "smashing the particles together and seeing what comes out."

That's an interesting analogy :) (I do have an engineering/physics background).

> The limitation is caused by the ridiculous limitation that token-pasting
> arbitrary tokens together where the result is not a single token results in
> undefined behavior.  Even to detect this scenario, the simplest
> implementation in a preprocessor is to simply juxtapose the characters
> making up the tokens and re-tokenize them.  If there is more than one, issue
> diagnostic, otherwise insert the single token.  A better definition would be
> simply to insert the resulting sequence of tokens.
>
>> 2) Also, does the expansion of any of the following result in
>> undefined behavior? (I don't think so...)
>>
>>     IS_PUBLIC(public abc)            // Expand to 1.
>>     IS_PUBLIC(public::)                // Expand to 1.
>>     IS_PUBLIC(public(abc, ::))       // Expand to 1.
>>     IS_PUBLIC(public (abc) (yxz))  // Expand to 1.
>>
>> (My application relies on some of these expansions to work.)
>
> All of those look fine.  Basically, what happens in the following
>
> #define M(a) id ## a
>
> The appearance of the formal parameter 'a' adjacent to the token-pasting
> operator affects _which_ actual parameter is substituted.  Namely, the
> version of the actual parameter which has _not_ had macros replaced in it.
>  However, the token-pasting operation doesn't occur until after that
> substitution, and its operands are only the two _tokens_ immediately
> adjacent to it.  E.g.
>
> #define A() 123
> #define B(x) x id ## x
>
> B(A())
> => 123 id ## A()
> => 123 idA()

OK, now I understand much better how my `IS_PUBLIC()` macro actually
works -- thanks a lot!

> I.e. the token-pasting operator affects the expansion of the actual
> parameter (at least in that substitution context), but its operands are only
> the tokens on either side after that substitution.
>
> Because of that, you're basically getting:
>
> PREFIX_ ## public abc
> PREFIX_ ## public ::
> PREFIX_ ## public ( abc , :: )
> PREFIX_ ## public ( abc ) ( yxz )
>
> ...all of which are okay.
>
>> MY APPLICATION
>>
>> I am using `IS_PUBLIC()` and similar macros to program the
>> preprocessor to *parse* a Boost.Preprocessor sequence of tokens that
>> represents a function signature. For example:
>>
>>     class c {
>>         public: void f(int x) const; // Usual function declaration.
>>     };
>>
>>     class c {
>>         PARSE_FUNCTION_DECL( // Equivalent declaration using pp-sequences.
>>         (public) (void) (f)( (int)(x) ) (const)
>>         );
>>     };
>
> What happens with stuff like pointers, or does that not matter for your
> application?  E.g. (public) (void) (f)( (int*)(x) ) (const) ?

My library does not need to detect pointers at the preprocessor
metaprogramming level. I can wait until using the compiler at the
template metaprogramming level to detect and handle pointers (using
Boost.MPL, Boost.TypeTraits, etc). So my pp-parser macros simply have
to expand:

    IS_PUBLIC(int*) // Expand to 0.
    IS_INT(int*) // Expand to 1.

I never actually use the latter expansion because I use template
metaprogramming to detect and manipulate types. The same goes for
references, etc.

(There is actually one exception to this for functions returning
`void*`, because my pp-parser macros need to detect functions returning
`void`. I have implemented a workaround for this case allowing a
special syntax within the signature sequence... but that is _very_
specific to my application.)
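
Just to illustrate what I mean by handling pointers later, at the
template metaprogramming level, here is a minimal sketch (made-up
names, not my library's actual code):

#include <boost/type_traits/is_pointer.hpp>
#include <boost/type_traits/remove_pointer.hpp>
#include <boost/static_assert.hpp>

// The pp-parser just forwards the parsed type; the compiler inspects it.
template<typename ParsedType>
struct parsed_type_traits {
    static const bool is_ptr = boost::is_pointer<ParsedType>::value;
    typedef typename boost::remove_pointer<ParsedType>::type pointee;
};

BOOST_STATIC_ASSERT((parsed_type_traits<int*>::is_ptr));  // handled here...
BOOST_STATIC_ASSERT((!parsed_type_traits<int>::is_ptr));  // ...not by IS_XXX()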

>> The parser macro above can say "the signature sequence starts with
>> `public` so this is a member function" at a preprocessor
>> metaprogramming level and then expand to special code as a library
>> might need to handle member functions. The parser macros can even do
>> some basic syntax error checking -- for example, if `(const)` is
>> specified as cv-qualifier at the end of the signature sequence of a
>> non-member function, the parser macro can check that and expand to a
>> compile-time error like `SYNTAX_ERROR_unexpected_cv_qualifier` (using
>> `BOOST_MPL_ASSERT_MSG()`).
>>
>> Most of the tokens within C++ function signatures are composed of
>> pp-identifiers such as the words `public`, `void`, `f`, etc. There are
>> some exceptions like `,` to separate function parameters, `<`/`>` for
>> templates, `:` for constructors' member initializers, etc. The grammar
>> of my preprocessor parser macros requires the use of different tokens
>> in these cases. For example, parentheses `(`/`)` are used for
>> templates instead of `<`/`>`:
>>
>>     template< typename T > f(T x); // Usual.
>>
>>     PARSE_FUNCTION_DECL( // PP-sequence.
>>     (template)( (typename)(T) ) (f)( (T)(x) )
>>     );
>>
>> (Instead of `(template)(<) (typename) (T) (>) (f)( (T)(x) )` which
>> would have caused the parser macro to fail when inspecting `(<)` via
>> one of the `IS_XXX()` macros as per the limitation from using
>> `BOOST_PP_CAT()` mentioned above.)
>>
>> The grammar of my preprocessor parser macros clearly documents that
>> only pp-identifiers can be passed as tokens of the function signature
>> sequence. Therefore, the "limitation" of `IS_PUBLIC()` indicated above
>> is not a problem for my application.
>>
>>
>> Thank you very much.
>
> You're welcome.  I don't know the ultimate purpose of this encoding, but the

This encoding, which I am calling "parenthesized syntax" (given the
ridiculous amount of parentheses that it requires :) ), is used by my
library under construction "Boost.Contract"
https://sourceforge.net/projects/dbcpp/ to implement contract
programming for C++ as specified by N1962. For example:

template<typename T>
class myvector {
public:
    CONTRACT_FUNCTION(
    (public) (void) (push_back)( (const T&)(element) ) (copyable)
        (precondition)(
            (size() < max_size())
            // More preconditions here...
        )
        (postcondition)(
            (size() == (CONTRACT_OLDOF(this)->size() + 1))
            // More postconditions here...
        )
    ({
        ... // Original implementation.
    }) )
    ...
};

Note how I can define new "keywords" like `precondition`,
`postcondition`, `copyable`, etc; program `IS_XXX()` macros for those;
and use the pp-parser macros to parse them and expand to code that
checks these assertions at the right time during execution.
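
For instance, each such keyword gets its own detection macro and the
parser can branch on it at preprocessing time. A hypothetical sketch
(placeholder names, not Boost.Contract's actual macros):

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/detail/is_unary.hpp>
#include <boost/preprocessor/control/iif.hpp>

#define IS_PRECONDITION_precondition (1)
#define IS_PRECONDITION(tokens) \
    BOOST_PP_IS_UNARY(BOOST_PP_CAT(IS_PRECONDITION_, tokens))

// Placeholder handlers standing in for the real code generation:
#define HANDLE_PRECONDITION(tokens) generate_precondition_checks
#define HANDLE_OTHER(tokens) generate_something_else

#define PARSE_CLAUSE(tokens) \
    BOOST_PP_IIF(IS_PRECONDITION(tokens), \
        HANDLE_PRECONDITION, HANDLE_OTHER)(tokens)

PARSE_CLAUSE(precondition) // expands via HANDLE_PRECONDITION
PARSE_CLAUSE(copyable)     // expands via HANDLE_OTHER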

I have also extended the parenthesized syntax to support concepts
(interfacing with Boost.ConceptCheck) and named parameters
(interfacing with Boost.Parameter). The idea being that contracts +
concepts + named parameters fully specify the interface requirements.
An example of concepts + contracts:

CONTRACT_FUNCTION(
(template)( (typename)(T) )
    (requires)(
        (boost::CopyConstructible<T>)
        (boost::Assignable<T>)
        (Addable<T>)
    )
(T) (sum)( (T*)(array) (int)(n) (T)(result) )
    (precondition)( (array) (n > 0) )
({
    ... // Original implementation.
}) )

> encoding itself doesn't look too bad.

In my experience, the parenthesized syntax is OK for this application
-- it's not terrible but it's not great either... My programmer's life
would be better without this syntax but worse without contracts :)

However, using the preprocessor to parse and generate every function
declaration (with a contract) slows down compilation quite a bit... I
think I can optimize the code of my macros and the way I am using
Boost.Preprocessor, but I am still finishing up the implementation and
I am leaving optimizations for later. BTW, for this optimization it
would be useful to assess the computational complexity (maybe in terms
of "number of macro expansions"?) of the Boost.Preprocessor macros --
how can I do that?

Thank you.

-- 
Lorenzo
