Subject: Re: [boost] [preprocessor] check if a token is a keyword (was "BOOST_PP_IS_UNARY()")
From: Lorenzo Caminiti (lorcaminiti_at_[hidden])
Date: 2010-08-17 08:17:06
On Tue, Aug 17, 2010 at 2:12 AM, Paul Mensonides <pmenso57_at_[hidden]> wrote:
> On 8/16/2010 9:21 PM, Lorenzo Caminiti wrote:
>
>> Yes, I am aware of this "limitation". However, for my application it
>> is not a problem to limit the argument of `IS_PUBLIC()` to
>> pp-identifiers and pp-numbers with no decimal points (if interested,
>> see "MY APPLICATION" below).
>>
>> 1) Out of curiosity, is there a way to implement `IS_PUBLIC()`
>> (perhaps without using `BOOST_PP_CAT()`) so it does not have this
>> limitation? (I could not think of any.)
>
> The limitation is not BOOST_PP_CAT per se, but token-pasting in general.
> The "good" part of using BOOST_PP_CAT in combination with
> BOOST_PP_IS_NULLARY, et al, is that they have been "hacked" together for
> preprocessors that are broken. Effectively, the detection macros work by
Yes, I understand.
> manipulating the operational syntax of macro expansion. For that to work,
> stuff has to happen (namely, macros being expanded) at roughly the correct
> time. The basic problem with VC++, for example, is that they don't, so the
> pp-lib works overtime to attempt to _force_ expansions all over the library.
I got my pp-parsers to successfully work under both GCC and MSVC.
Especially on MSVC, I also had to "hack" some of the macros to make
sure they expand when they are supposed to -- BTW, having a library
like Boost.Preprocessor has proven to be immensely useful.
> Unfortunately, there is a limit to what can be forced--particularly with
> more advanced manipulations of the macro expansion process such as those
> used by Chaos where there is analogy to the uncertainty principle (e.g. you
> cannot force expansion in many contexts without changing the result = you
> cannot measure particle velocity and position at the same time). Even with
> those types of manipulations, however, there is no way to do the above without
> "smashing the particles together and seeing what comes out."
That's an interesting analogy :) (I do have an engineering/physics background).
> The limitation is caused by the ridiculous limitation that token-pasting
> arbitrary tokens together where the result is not a single token results in
> undefined behavior. Even to detect this scenario, the simplest
> implementation in a preprocessor is to simply juxtapose the characters
> making up the tokens and re-tokenize them. If there is more than one, issue a
> diagnostic; otherwise insert the single token. A better definition would be
> simply to insert the resulting sequence of tokens.
>
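To make the single-token rule concrete, here is a small sketch. The names
PASTE, PREFIX_public and PREFIX_42 are made up for illustration and are not
part of Boost.Preprocessor. The valid pastes are checked at compile time; the
invalid ones are left commented out because their behavior is undefined, and
compilers such as GCC typically diagnose them.

```cpp
// Hypothetical helper: pastes its two arguments without expanding them first.
#define PASTE(a, b) a ## b

// Hypothetical names that the valid pastes below resolve to.
constexpr int PREFIX_public = 1;
constexpr int PREFIX_42 = 2;

// identifier ## identifier forms one identifier token.
static_assert(PASTE(PREFIX_, public) == 1, "PREFIX_public");
// identifier ## pp-number made only of digits still forms one identifier.
static_assert(PASTE(PREFIX_, 42) == 2, "PREFIX_42");

// Not a single token, hence undefined behavior:
// PASTE(PREFIX_, <)    // "PREFIX_<" would be two tokens
// PASTE(PREFIX_, 1.5)  // "PREFIX_1.5" would be "PREFIX_1" followed by ".5"
```

This is why restricting the argument to pp-identifiers and pp-numbers without
decimal points keeps the paste well defined.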
>> 2) Also, does the expansion of any of the following result in
>> undefined behavior? (I don't think so...)
>>
>> IS_PUBLIC(public abc) // Expand to 1.
>> IS_PUBLIC(public::) // Expand to 1.
>> IS_PUBLIC(public(abc, ::)) // Expand to 1.
>> IS_PUBLIC(public (abc) (yxz)) // Expand to 1.
>>
>> (My application relies on some of these expansions to work.)
>
> All of those look fine. Basically, what happens in the following
>
> #define M(a) id ## a
>
> The appearance of the formal parameter 'a' adjacent to the token-pasting
> operator affects _which_ actual parameter is substituted. Namely, the
> version of the actual parameter which has _not_ had macros replaced in it.
> However, the token-pasting operation doesn't occur until after that
> substitution, and its operands are only the two _tokens_ immediately
> adjacent to it. E.g.
>
> #define A() 123
> #define B(x) x id ## x
>
> B(A())
> => 123 id ## A()
> => 123 idA()
OK, now I understand much better how my `IS_PUBLIC()` macro actually
works -- thanks a lot!
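The expansion steps quoted above can be checked with a compiler. This is only
a sketch: a `+` is added to the `B` macro so that the result is a valid C++
expression, and `idA` is a hypothetical function introduced so the pasted name
refers to something.

```cpp
#define A() 123
// Variant of the B macro from the message, with a '+' added so the result
// is a valid C++ expression (an assumption made for this sketch).
#define B(x) x + id ## x

// Hypothetical function that the pasted name idA refers to.
constexpr int idA() { return 1000; }

// B(A()) => 123 + id ## A() => 123 + idA()
// The first x is macro-expanded (A() becomes 123). The x next to ## is not
// expanded, and only its first token (A) is pasted onto id.
static_assert(B(A()) == 1123, "expected 123 + idA()");
```

If the argument next to `##` had been expanded first, the paste would have
produced `id123` and the assertion would not compile.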
> I.e. the token-pasting operator affects the expansion of the actual
> parameter (at least in that substitution context), but its operands are only
> the tokens on either side after that substitution.
>
> Because of that, you're basically getting:
>
> PREFIX_ ## public abc
> PREFIX_ ## public ::
> PREFIX_ ## public ( abc , :: )
> PREFIX_ ## public ( abc ) ( yxz )
>
> ...all of which are okay.
>
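For reference, a self-contained sketch of an `IS_PUBLIC()`-style macro follows.
It is not the implementation discussed in this thread, which builds on
BOOST_PP_CAT and BOOST_PP_IS_UNARY. The helpers CAT, SECOND and CHECK are
hypothetical stand-ins, and a conforming preprocessor with variadic macros is
assumed (GCC or Clang; MSVC's traditional preprocessor treats `__VA_ARGS__`
differently).

```cpp
// Hypothetical stand-ins for BOOST_PP_CAT and a detection macro.
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define SECOND(a, b, ...) b
// Yields the second argument after expansion, or 0 if there is only one.
#define CHECK(...) SECOND(__VA_ARGS__, 0, ~)

// Only the keyword produces commas that shift 1 into second position.
#define IS_PUBLIC_public ~, 1,
#define IS_PUBLIC(tokens) CHECK(CAT(IS_PUBLIC_, tokens))

// The four cases from the question: only the leading token is pasted.
static_assert(IS_PUBLIC(public abc) == 1, "");
static_assert(IS_PUBLIC(public::) == 1, "");
static_assert(IS_PUBLIC(public(abc, ::)) == 1, "");
static_assert(IS_PUBLIC(public (abc) (yxz)) == 1, "");
// A different leading identifier pastes to an undefined name and yields 0.
static_assert(IS_PUBLIC(private abc) == 0, "");
```

In each case the paste operands are `IS_PUBLIC_` and the single token `public`
(or `private`); whatever follows is left untouched.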
>> MY APPLICATION
>>
>> I am using `IS_PUBLIC()` and similar macros to program the
>> preprocessor to *parse* a Boost.Preprocessor sequence of tokens that
>> represents a function signature. For example:
>>
>> class c {
>> public: void f(int x) const; // Usual function declaration.
>> };
>>
>> class c {
>> PARSE_FUNCTION_DECL( // Equivalent declaration using pp-sequences.
>> (public) (void) (f)( (int)(x) ) (const)
>> );
>> };
>
> What happens with stuff like pointers, or does that not matter for your
> application? E.g. (public) (void) (f)( (int*)(x) ) (const) ?
My library does not need to detect pointers at the preprocessor
metaprogramming level. I can wait until using the compiler at the
template metaprogramming level to detect and handle pointers (using
Boost.MPL, Boost.TypeTraits, etc). So my pp-parser macros simply have
to expand:
    IS_PUBLIC(int*) // Expand to 0.
    IS_INT(int*) // Expand to 1.
Where I never use the last expansion because I use template
metaprogramming to detect and manipulate types. Similarly for
references, etc.
(There is actually one exception to this for functions returning
`void*` because my pp-parser macros need to detect functions returning
`void`. I have implemented a workaround for this case allowing a
special syntax within the signature sequence... but that is _very_
specific to my application.)
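A sketch of why pointers are invisible at this level, and why `void*` is the
awkward case: a macro built on token-pasting only ever inspects the leading
token. The helper macros below are hypothetical (not Boost.Preprocessor's) and
assume a conforming preprocessor with variadic macros.

```cpp
// Hypothetical helpers, as in a typical paste-and-detect idiom.
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define SECOND(a, b, ...) b
#define CHECK(...) SECOND(__VA_ARGS__, 0, ~)

// One "keyword table" entry per detector.
#define IS_PUBLIC_public ~, 1,
#define IS_INT_int ~, 1,
#define IS_VOID_void ~, 1,
#define IS_PUBLIC(tokens) CHECK(CAT(IS_PUBLIC_, tokens))
#define IS_INT(tokens) CHECK(CAT(IS_INT_, tokens))
#define IS_VOID(tokens) CHECK(CAT(IS_VOID_, tokens))

static_assert(IS_PUBLIC(int*) == 0, "");
static_assert(IS_INT(int*) == 1, "");
// The trailing '*' is never examined, so void and void* look the same.
static_assert(IS_VOID(void) == 1, "");
static_assert(IS_VOID(void*) == 1, "");
```

With a detector like this, a function returning `void*` cannot be told apart
from one returning `void`, which is presumably why a special syntax is needed
for that case.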
>> The parser macro above can say "the signature sequence starts with
>> `public` so this is a member function" at a preprocessor
>> metaprogramming level and then expand to special code as a library
>> might need to handle member functions. The parser macros can even do
>> some basic syntax error checking -- for example, if `(const)` is
>> specified as cv-qualifier at the end of the signature sequence of a
>> non-member function, the parser macro can check that and expand to a
>> compile-time error like `SYNTAX_ERROR_unexpected_cv_qualifier` (using
>> `BOOST_MPL_ASSERT_MSG()`).
>>
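The "starts with `public`" decision described above could be sketched as
follows. This is an illustration only, not the library's parser: SEQ_HEAD,
IS_MEMBER_DECL and the other helpers are hypothetical names, the sequence
elements are assumed to contain no top-level commas, and a conforming
preprocessor with variadic macros is assumed.

```cpp
// Hypothetical helpers for pasting and detection.
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define FIRST(a, ...) a
#define SECOND(a, b, ...) b
#define CHECK(...) SECOND(__VA_ARGS__, 0, ~)

#define IS_PUBLIC_public ~, 1,
#define IS_PUBLIC(tokens) CHECK(CAT(IS_PUBLIC_, tokens))

// Extracts the first element of a sequence (a)(b)(c)...
// SEQ_HEAD_II consumes the first parenthesized element and appends a comma,
// so FIRST can pick it out.
#define SEQ_HEAD(seq) SEQ_HEAD_I(SEQ_HEAD_II seq)
#define SEQ_HEAD_I(...) FIRST(__VA_ARGS__)
#define SEQ_HEAD_II(x) x, ~

// 1 if the signature sequence starts with (public), 0 otherwise.
#define IS_MEMBER_DECL(seq) IS_PUBLIC(SEQ_HEAD(seq))

static_assert(IS_MEMBER_DECL((public) (void) (f)( (int)(x) ) (const)) == 1, "");
static_assert(IS_MEMBER_DECL((void) (g)( (int)(x) )) == 0, "");
```

A real parser would then dispatch on this 0/1 result to emit different code
for member and non-member functions.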
>> Most of the tokens within C++ function signatures are composed of
>> pp-identifiers such as the words `public`, `void`, `f`, etc. There are
>> some exceptions like `,` to separate function parameters, `<`/`>` for
>> templates, `:` for constructors' member initializers, etc. The grammar
>> of my preprocessor parser macros requires the use of different tokens
>> in these cases. For example, parentheses `(`/`)` are used for
>> templates instead of `<`/`>`:
>>
>> template< typename T> f(T x); // Usual.
>>
>> PARSE_FUNCTION_DECL( // PP-sequence.
>> (template)( (typename)(T) ) (f)( (T)(x) )
>> );
>>
>> (Instead of `(template)(<) (typename) (T) (>) (f)( (T)(x) )` which
>> would have caused the parser macro to fail when inspecting `(<)` via
>> one of the `IS_XXX()` macros as per the limitation from using
>> `BOOST_PP_CAT()` mentioned above.)
>>
>> The grammar of my preprocessor parser macros clearly documents that
>> only pp-identifiers can be passed as tokens of the function signature
>> sequence. Therefore, the "limitation" of `IS_PUBLIC()` indicated above
>> is not a problem for my application.
>>
>>
>> Thank you very much.
>
> You're welcome. I don't know the ultimate purpose of this encoding, but the
This encoding, which I am calling "parenthesized syntax" (given the
ridiculous number of parentheses that it requires :) ), is used by my
library under construction "Boost.Contract"
https://sourceforge.net/projects/dbcpp/ to implement contract
programming for C++ as specified by N1962. For example:
    template<typename T>
    class myvector {
    public:
        CONTRACT_FUNCTION(
        (public) (void) (push_back)( (const T&)(element) ) (copyable)
        (precondition)(
            (size() < max_size())
            // More preconditions here...
        )
        (postcondition)(
            (size() == (CONTRACT_OLDOF(this)->size() + 1))
            // More postconditions here...
        )
        ({
            ... // Original implementation.
        }) )
        ...
    };
Note how I can define new "keywords" like `precondition`,
`postcondition`, `copyable`, etc; program `IS_XXX()` macros for those;
and use the pp-parser macros to parse them and expand to code that
checks these assertions at the right time during execution.
I have also extended the parenthesized syntax to support concepts
(interfacing with Boost.ConceptCheck) and named parameters
(interfacing with Boost.Parameter). The idea being that contracts +
concepts + named parameters fully specify the interface requirements.
An example of concepts + contracts:
    CONTRACT_FUNCTION(
    (template)( (typename)(T) )
    (requires)(
        (boost::CopyConstructible<T>)
        (boost::Assignable<T>)
        (Addable<T>)
    )
    (T) (sum)( (T*)(array) (int)(n) (T)(result) )
    (precondition)( (array) (n > 0) )
    ({
        ... // Original implementation.
    }) )
> encoding itself doesn't look too bad.
In my experience, the parenthesized syntax is OK for this application
-- it's not terrible but it's not great either... My programmer's life
would be better without this syntax but worse without contracts :)
However, using the preprocessor to parse and generate every function
declaration (with a contract) slows down compilation quite a bit... I
think I can optimize the code of my macros and the way I am using
Boost.Preprocessor but I am still finishing up the implementation and
I am leaving optimizations for later. BTW, for this optimization it
would be useful to assess the computational complexity (maybe in terms
of "number of macro expansions"?) of the Boost.Preprocessor macros --
how can I do that?
Thank you.
-- Lorenzo