Subject: Re: [boost] Review Request: Variadic Macro Data library
From: Paul Mensonides (pmenso57_at_[hidden])
Date: 2011-02-21 14:37:33


On Mon, 21 Feb 2011 12:57:05 -0500, Edward Diener wrote:

> On 2/21/2011 3:57 AM, Paul Mensonides wrote:

>> Another way to provide comfort is via education. Hardcore
>> pp-metaprogramming knowledge is not required for this.
>
> Providing function-like syntax for invoking a macro with a variable
> number of parameters, as an alternative to pp-lib data syntax, is
> important to end-users and library developers if just for the sake of
> familiarity and regularity. A programmer using a "call" syntax which may
> be a macro or a function is not going to stop and say: this is a
> function so I can call it as 'somefunction(a,b,c)', this is a macro and
> therefore I must call it as 'somemacro((a,b,c))'. Instead he will ask
> that the same syntax be applied to both. You seem to feel this is wrong
> and that someone invoking a macro should realize that it is a macro (
> and normally does because it is capital letters ) and therefore be
> prepared to use a different syntax, but I think that regularity in this
> respect is to be valued.

Most C/C++ developers perceive macro expansion mechanics to be similar to
function call mechanics. I.e. where a user "calls" a macro A, and that
macro "calls" the macro B, the macro B "returns" something, which is, in
turn "returned" by A. That is fundamentally *not* how macro expansion
behaves. This perceived similarity, where none exists, predates
preprocessor metaprogramming by a long way, and it is how developers
have gotten into so much trouble with macros.

I take serious issue with anything that intentionally perpetuates this
mentality. It is one thing if the syntax required is the same by
coincidence. It's another thing altogether when something is done to
intentionally make it so.
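
As a deliberately small illustration of that mismatch (a sketch only; TWICE,
twice, and next are hypothetical names, not part of any library): a macro
does not "return" a value. It pastes tokens into the surrounding text, so
the result depends on the context of the invocation and on how often the
argument text is repeated.

```c
#include <assert.h>

/* Hypothetical macro that looks like a function "returning" x + x. */
#define TWICE(x) x + x

/* A real function with the same apparent meaning. */
static int twice(int x) { return x + x; }

/* Counts its own calls, to expose repeated expansion of an argument. */
static int calls = 0;
static int next(void) { return ++calls; }
```

Here twice(3) * 2 is 12, but TWICE(3) * 2 expands to 3 + 3 * 2, which is 9.
Likewise TWICE(next()) expands to next() + next() and calls next twice.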

>> ----
>>
>> BOOST_VMD_DATA_ELEM(n, ...)
>>
>> The direct analog of this in Chaos is CHAOS_PP_VARIADIC_ELEM(n, ...).
>> If I (or you) added it to the pp-lib, I would prefer it be called
>> BOOST_PP_VARIADIC_SIZE(...).
>
> Did you mean BOOST_PP_VARIADIC_ELEM(n,...) ?

Yes, sorry!
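
For readers unfamiliar with the idea, a minimal sketch of such an element
selector follows. MY_VARIADIC_ELEM and its helpers are hypothetical names
used only for illustration, limited here to indices 0 through 3, and the
sketch assumes a conforming preprocessor (GCC or Clang, for example). It is
not how Chaos or the pp-lib implement it.

```c
#include <assert.h>

/* Two-step concatenation so that arguments are expanded before pasting. */
#define MY_CAT(a, b) MY_CAT_I(a, b)
#define MY_CAT_I(a, b) a ## b

/* Hypothetical MY_VARIADIC_ELEM(n, ...): selects element n (0..3).
   The trailing ~ is a dummy so the helpers' "..." is never empty. */
#define MY_VARIADIC_ELEM(n, ...) \
    MY_CAT(MY_VARIADIC_ELEM_, n)(__VA_ARGS__, ~) \
    /**/

#define MY_VARIADIC_ELEM_0(e0, ...) e0
#define MY_VARIADIC_ELEM_1(e0, e1, ...) e1
#define MY_VARIADIC_ELEM_2(e0, e1, e2, ...) e2
#define MY_VARIADIC_ELEM_3(e0, e1, e2, e3, ...) e3
```

With these definitions, MY_VARIADIC_ELEM(2, 10, 20, 30) expands to 30.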

>> ----
>>
>> BOOST_VMD_DATA_TO_PP_TUPLE(...)
>>
>> Chaos has no direct analog of this because (as you know), it's
>> pointless (unless you're doing something internally for compiler
>> workarounds).
>
> I do not think it is pointless. I am going variadics -> tuple. The
> end-user wants regularity, even if it's a no-brainer to write '(
> __VA_ARGS__ )'.
>
> I think this is where we may differ, not technically, but in our view of
> what should be presented to an end-user. I really value regularity ( and
> orthogonality ) even when something is trivial. My view is generally
> that if it costs the library developer little and makes the end-user see
> a design as "regular", it is worth implementing as long as it is part of
> the design even if it is utterly trivial. You may feel I am cosseting an
> end-user, and perhaps you are right. But as an end-user myself I
> generally want things as regular as possible even if it causes some very
> small extra compile time.

It isn't the end of the world to provide it for the sake of symmetry.

>> The primary reason that there is no direct conversion (other than
>> (__VA_ARGS__)) to the other data types is that with variadic data there
>> is no such thing as an empty sequence and there is no safe way to
>> define a rogue value for it without artificially limiting what the data
>> can contain.
>
> I understand this.
>
> My point of view is that although it's theoretically a user error to use
> an empty sequence for a variadic macro even if a corresponding pp-lib
> data type can be empty, in reality it should be allowed since there is
> no way to detect it. I understand your point of view that you want to do
> everything possible to eliminate user error, and I agree with it, but
> sometimes nothing one can do in a language is going to work ( as you
> have pointed out with variadics and an empty parameter ).

That is not what I'm referring to. To clarify, using the STL as an
example, a typical algorithm processes a finite sequence by progressively
closing a range of iterators [i,j). Effectively, this iterator range is
a "view" of a sequence of elements. (The underlying data structure also
has its own natural view.) However, with the preprocessor, there is no
indirection and there is no imperative functionality (i.e. assignment).
Because of that, you cannot form views (in whatever form). Instead, you
have to embed the entire data structure.

At that point, you can do one of two things. Either use the data
structure itself as your "view" and progressively make it smaller as you
"iterate" or you can embed the data structure into another data structure
which lets you add an arbitrary "terminal" state--the equivalent of the
iterator range [j,j). For variadic content as a sequence, you cannot
directly use the variadic content as the view because you cannot encode
this terminal state. Instead, you'd have to go with option two, but then
you pay the price for it elsewhere.
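
The missing terminal state can be seen in a small sketch. All names below
are hypothetical and the counter only handles up to four arguments. Raw
variadic content cannot represent "zero elements", because an empty
argument list is indistinguishable from a single empty element, whereas a
wrapper that carries its own size can.

```c
#include <assert.h>

/* Hypothetical argument counter for 1..4 arguments. */
#define VARIADIC_SIZE(...) VARIADIC_SIZE_I(__VA_ARGS__, 4, 3, 2, 1, ~)
#define VARIADIC_SIZE_I(e0, e1, e2, e3, n, ...) n

/* Option two: embed the elements in a structure that records its size,
   here an array-like pair (size, (elements)). */
#define WRAPPED_SIZE(wrapped) WRAPPED_SIZE_I wrapped
#define WRAPPED_SIZE_I(size, elems) size
```

VARIADIC_SIZE() and VARIADIC_SIZE(x) both expand to 1, so there is no way
to encode the equivalent of [j,j) in the arguments themselves. The wrapped
form (0, ()) can say "empty", at the cost of a second layer of structure.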

> Going the other way from a pp data type to variadics, I admit I did not
> consider the case in my current library where the pp data type is empty.
> Since this can be detected on the pp data type side, I think I can put
> out a BOOST_PP_ASSERT_MSG(cond, msg) in that case, or I can choose to
> ignore it if that is what I decide to do. But in either case it is a
> detectable problem.

For the most part, however, these macros already exist. They are named
(e.g.) BOOST_PP_SEQ_ENUM. However, some do not exist such as
BOOST_PP_ARRAY_ENUM, and others have different naming conventions such as
BOOST_PP_TUPLE_REM_CTOR. For the sake of symmetry, BOOST_PP_ARRAY_ENUM
and BOOST_PP_TUPLE_ENUM could be added. However, having them use the ENUM
nomenclature is preferable for two reasons. First, it
expresses a distinction between a comma-separated list of arguments
(variadic content) and a comma-separated list of elements which provides
a definition that avoids the zero-element vs. single-empty-element
problem. Second, if something like BOOST_PP_SEQ_ENUM is used to
attempt to create a comma-separated list of _arguments_, the user is in
for a world of hurt trying to write portable code.

>> BOOST_VMD_PP_TUPLE_ELEM(n, tuple)
>>
>> The direct analog of this in Chaos is CHAOS_PP_TUPLE_ELEM(n, tuple)
>> where it ignores the 'n' if variadics are enabled.
>
> The 'n' tells which tuple element to return. How can it be ignored ?
>
>
>> CHAOS_PP_TUPLE_ELEM(?, 0, (a, b, c)) => a
>
> OK, I see. The '?' is just a marker for the size of the tuple when there
> are variadics and must be specified when there are not.

Sorry, I was mixing up the arguments. Without variadics, you must have a
size. With variadics, you don't need it, so you can leave it there for
compatibility with the non-variadic scenario and ignore it and
additionally provide an "overload" that doesn't have it at all. Except
for compiler workarounds (which I'm sure you know how to solve in this
case), detecting the difference between two and three arguments (where it
must be either 2 or 3 arguments) is simple:

#if VARIADICS

#define TUPLE_ELEM(...) \
    CAT( \
        TUPLE_ELEM_, \
        TEST_23(__VA_ARGS__, 3, 2,) \
    )(__VA_ARGS__) \
    /**/

#define TEST_23(_1, _2, _3, n, ...) n

#define TUPLE_ELEM_2(n, tuple) // ...
#define TUPLE_ELEM_3(size, n, tuple) TUPLE_ELEM_2(n, tuple)

#else

#define TUPLE_ELEM(size, n, tuple) // ...

#endif

This is a very fast dispatch.
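
A self-contained version of the variadic branch, with the elided part
filled in, might look like the sketch below. The PICK helpers are an
assumption made for illustration (they are not from Chaos or the pp-lib)
and only support element indices 0 through 2. The sketch also assumes a
conforming preprocessor such as GCC or Clang; older VC++ preprocessors
would need the workarounds mentioned above.

```c
#include <assert.h>

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define REM(...) __VA_ARGS__

/* Dispatch on two vs. three arguments. */
#define TUPLE_ELEM(...) \
    CAT( \
        TUPLE_ELEM_, \
        TEST_23(__VA_ARGS__, 3, 2,) \
    )(__VA_ARGS__) \
    /**/

/* Yields 2 for two arguments and 3 for three arguments. */
#define TEST_23(_1, _2, _3, n, ...) n

/* Assumed implementation: strip the tuple's parentheses and pick
   element n; the size argument of the 3-argument form is ignored. */
#define TUPLE_ELEM_2(n, tuple) PICK(n, REM tuple)
#define TUPLE_ELEM_3(size, n, tuple) TUPLE_ELEM_2(n, tuple)

/* The trailing ~ keeps the helpers' "..." from ever being empty. */
#define PICK(n, ...) PICK_ ## n(__VA_ARGS__, ~)
#define PICK_0(e0, ...) e0
#define PICK_1(e0, e1, ...) e1
#define PICK_2(e0, e1, e2, ...) e2
```

Both TUPLE_ELEM(1, (10, 20, 30)) and TUPLE_ELEM(3, 1, (10, 20, 30)) expand
to 20, so existing three-argument invocations keep working.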

> That is an interesting technique below. Bravo ! As long as it is
> documented, since it confused me until I took a few more looks and
> realized what you are doing.

It is just a dispatcher to emulate overloading on number of arguments.
You'd actually do something that makes the dispatch as fast as possible
(as above) which is easy with a small set of possibilities (like 1|2 or
2|3).

>> BOOST_VMD_PP_{ARRAY,LIST,SEQ}_TO_DATA(ds)
>>
>> I don't see the point of these--particularly with the pp-lib
>> because of compiler issues. These already exist as BOOST_PP_LIST_ENUM
>> and SEQ_ENUM. There isn't an ARRAY_ENUM currently, but it's easy to
>> implement. The distinction between these names and the *_TO_DATA
>> variety is that these are primary output macros.
>
> That's just a name. Names can always be changed. My macros are the same
> as yours in that they are output macros which convert the pp-lib data
> types to variadic data.

The primary distinction is the perspective induced by the names. Users
who attempt to use these macros to produce macro argument lists are in
for portability problems. Particularly:

#define REM(...) __VA_ARGS__

#define A(im) B(im) // im stands for "intermediate"
                    // (chaos-pp nomenclature)
#define B(x, y) x + y

A(REM(1, 2)) // should expand to 1 + 2, most likely won't on many preprocessors

My understanding is that you want to take a list of macro arguments and
convert it to something that can be processed as a sequential data
structure. That's one concept. Converting from a sequential data
structure to a list of comma-separated values is another concept. But
converting from a sequential data structure to a list of macro arguments
is another concept altogether--one that is fraught with portability
issues that cannot be encapsulated by the library.

>> If an attempt is made by a user
>> to use the result as macro arguments, all of the issues with compilers
>> (e.g. VC++) will be pulled into the user's domain.
>
> The returned variadic data is no different from what the user will enter
> himself when invoking a variadic data macro. The only compiler issue I
> see is the empty variadic data one.

I think the difference is conceptual. A list of comma-separated things
(like function parameters, structure initializers, etc.) is conceptually
different from a list of macro arguments. Going back to a list of
arguments is where things go wrong.

>> BOOST_VMD_DATA_TO_PP_TUPLE(...)
>> -> (nothing, unless workarounds are necessary)
>
> I know it's trivial but I still think it should exist.

It is quite possible that workarounds need to be applied anyway to (e.g.)
force VC++ to "let go" of the variadic arguments as a single entity.

>> BOOST_VMD_DATA_TO_PP_ARRAY(...)
>> -> BOOST_PP_TUPLE_TO_ARRAY((...))
>> or BOOST_PP_TUPLE_TO_ARRAY(size, (...))
>>
>> BOOST_VMD_DATA_TO_PP_LIST(...)
>> -> BOOST_PP_TUPLE_TO_LIST((...))
>> or BOOST_PP_TUPLE_TO_LIST(size, (...))
>>
>> BOOST_VMD_DATA_TO_PP_SEQ(...)
>> -> BOOST_PP_TUPLE_TO_SEQ((...))
>> or BOOST_PP_TUPLE_TO_SEQ(size, (...))
>
> For the previous three, see above discussion about using
> SOME_MACRO(a,b,c) vs. SOME_MACRO((a,b,c)). I do understand your reason
> for this as a means of getting around the empty-variadics user error.

It isn't that. I don't like interface bloat. That's like not being able
to decide on size() versus length() so providing both.

If the use case is something like what you mentioned before:

#define MOC(...) /* yes, that's you, Qt */ \
    GENERATE_MOC_DATA(TUPLE_TO_SEQ((__VA_ARGS__))) \
    /**/

Then why does the TUPLE_TO_SEQ((__VA_ARGS__)) part matter to the
developer who invokes MOC?

> But I still feel that treating variadic data here as tuples is wrong
> from the end-user point of view even though it elegantly solves the
> empty variadic data problem. In my internal code I am solving the
> problem in the exact same way, but I am keeping the syntax as
> SOME_MACRO(a,b,c) as opposed to SOME_MACRO((a,b,c)).
>
> So I would say, please consider using the SOME_MACRO(a,b,c) instead as I
> am doing.
>
> I would even say to change my names to:
>
> BOOST_PP_ENUM_TUPLE(...)
> BOOST_PP_ENUM_ARRAY(...)
> BOOST_PP_ENUM_LIST(...)
> BOOST_PP_ENUM_SEQ(...)

I'm not terribly opposed to just BOOST_PP_TO_TUPLE(...), etc.

#define BOOST_PP_TO_TUPLE(...) (__VA_ARGS__)
#define BOOST_PP_TO_ARRAY(...) \
    (BOOST_PP_VARIADIC_SIZE(__VA_ARGS__), BOOST_PP_TO_TUPLE(__VA_ARGS__)) \
    /**/
    // BTW, an "array" is a pointless data structure
    // when you have variadics, but whatever
#define BOOST_PP_TO_LIST(...) \
    BOOST_PP_TUPLE_TO_LIST((__VA_ARGS__)) \
    /**/
#define BOOST_PP_TO_SEQ(...) \
    BOOST_PP_TUPLE_TO_SEQ((__VA_ARGS__)) \
    /**/

I'm a lot more opposed to going back from a proper data structure to an
"argument list".

> Again I value the orthogonality of the pp-data to variadic data idea in
> common names. BOOST_PP_TUPLE_REM_CTOR does not suggest that to the
> end-user. How about:
>
> #define BOOST_PP_TUPLE_ENUM(tuple) \
> BOOST_PP_TUPLE_REM_CTOR(tuple)
>
> in order to mimic your three following names.

Sure, but with a better definition:

#define BOOST_PP_TUPLE_ENUM BOOST_PP_TUPLE_REM_CTOR

>> BOOST_VMD_PP_ARRAY_TO_DATA(array)
>> -> BOOST_PP_ARRAY_ENUM(array)
>>
>> BOOST_VMD_PP_LIST_TO_DATA(list)
>> -> BOOST_PP_LIST_ENUM(list)
>>
>> BOOST_VMD_PP_SEQ_TO_DATA(seq)
>> -> BOOST_PP_SEQ_ENUM(seq)
>>
>> also add:
>> BOOST_PP_REM, BOOST_PP_EAT
>
> OK.

These latter two (REM and EAT) have nothing to do with data structures
per se, but they are extremely useful macros.

>> The basic gist is to add the low-level variadic stuff and adapt the
>> existing tuple stuff to not require the size.
>
> I think our only real disagreements can be summed up as:
>
> I want the end-user to view variadic data as such from a perceptual
> point of view, even with the
> empty-variadics-is-an-error-which-can-not-be-caught problem. That is why
> I supply the various conversions from variadic sequences to pp-lib types
> and back explicitly, and I want some regularity in names reflecting that
> although I do not insist on my own names.
>
> You feel that variadics as input for conversion should in general be
> treated as a pp-lib tuple since creating a tuple from variadic macro
> data is trivial.

I don't like interface bloat, but if it is minor, it isn't the end of the
world.

The one thing that I really don't like is the blending of what I consider
two different concepts: output and return value (even though I'm going
against my own diatribe about macros != functions above by calling it
"return value").

Going from a data structure to a list of comma-separated values (like
enumerators, function arguments, whatever) is output and is reflected by
the name ENUM. Going from a data structure to a list of comma-separated
macro arguments is return value (for input into other macros as disparate
arguments). This latter use scenario is fraught with portability
problems on the user end, and not necessarily ones that immediately show
up.
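
To make the "output" side concrete, here is a minimal sketch in which the
comma-separated elements land in ordinary C syntax rather than in another
macro's argument list. TUPLE_ENUM below is a hypothetical stand-in written
for this example, not the pp-lib macro, and VALUES and sum3 are likewise
illustrative names.

```c
#include <assert.h>

#define REM(...) __VA_ARGS__

/* Hypothetical stand-in for a TUPLE_ENUM: strips the parentheses. */
#define TUPLE_ENUM(tuple) REM tuple

#define VALUES (1, 2, 3)

/* Output into an initializer list. */
static const int values[] = { TUPLE_ENUM(VALUES) };

/* Output into a function call's argument list (a function, not a macro). */
static int sum3(int a, int b, int c) { return a + b + c; }
```

Here values has three elements and sum3(TUPLE_ENUM(VALUES)) is 6. Neither
use depends on how the preprocessor splits the expansion into macro
arguments, which is the part that varies between preprocessors.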

Regards,
Paul Mensonides

