Subject: Re: [boost] [vmd] Library Review
From: Edward Diener (eldiener_at_[hidden])
Date: 2014-09-13 20:59:41


On 9/13/2014 7:05 PM, Paul Mensonides wrote:
> On 9/13/2014 1:51 PM, Edward Diener wrote:
>> On 9/13/2014 9:44 AM, Paul Mensonides wrote:
>
>>> Perhaps Edward can enlighten me about the use-cases for which this is
>>> ideal. I do see some use-cases, but in most cases it requires dispatch
>>> based on what the particular "v-identifier" is anyway.
>>
>> The use case is to be able to determine if some preprocessor input
>> matches a v-identifier and then to logically work off of that decision.
>> The idea is to parse some input and, for instance, if the input is X,
>> create output A, and if the input is Y, create output B etc. etc., where
>> X and Y are v-identifiers.
>>
>> In any given situation since the v-keys are being
>> supplied, the end-user knows that if 1 is returned identifier X has been
>> matched and if 2 is returned identifier Y has been matched, and of
>> course if 0 is returned no identifier match has been found. This enables
>> the end-user to dispatch on the result and to know which identifier has
>> been found. Admittedly it does not extract the identifier itself as
>> output from the macro.
>
> What I meant WRT dispatching directly off of the v-identifier was something
> along the lines of extracting the identifier and then using it to form
> the suffix of another macro. I.e. a conditional without going through
> an intermediate integer.

I understood that once you mentioned it.

> However, there are still cases where something
> like
>
> #define COMPARE_ABC_ABC ()
>
> is useful

I understand this. You concatenate your result to 'COMPARE_ABC_' and if
it expands to an empty single element tuple you know your result is
'ABC'. Ditto with any identifier 'XXX' for which you have
'#define COMPARE_XXX_XXX ()' macros.
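
A minimal sketch of that check, assuming a standard-conforming
preprocessor. IS_PAREN and SAME_ID are hypothetical helper names used
only for illustration; they are not VMD or Chaos macros.

```cpp
// One "COMPARE_<id>_<id>" registration per identifier, as in the quoted text.
#define COMPARE_ABC_ABC ()
#define COMPARE_XYZ_XYZ ()

// IS_PAREN(...): hypothetical helper. Expands to 1 if its argument begins
// with a parenthesized group, otherwise to 0.
#define IS_PAREN(...) IS_PAREN_CHECK(IS_PAREN_PROBE __VA_ARGS__)
#define IS_PAREN_PROBE(...) ~, 1,
#define IS_PAREN_CHECK(...) IS_PAREN_PICK(__VA_ARGS__, 0, ~)
#define IS_PAREN_PICK(a, b, ...) b

// SAME_ID(a, b): paste the two identifiers onto COMPARE_ and test whether
// the result expanded to a parenthesized group.
#define SAME_ID(a, b) IS_PAREN(SAME_ID_PASTE(a, b))
#define SAME_ID_PASTE(a, b) COMPARE_ ## a ## _ ## b

static_assert(SAME_ID(ABC, ABC) == 1, "COMPARE_ABC_ABC expands to ()");
static_assert(SAME_ID(XYZ, XYZ) == 1, "COMPARE_XYZ_XYZ expands to ()");
static_assert(SAME_ID(ABC, XYZ) == 0, "COMPARE_ABC_XYZ is not defined");
```

A pair with no matching COMPARE_ definition is left as an ordinary
identifier, so the test yields 0 for it.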

> and direct comparison can be implemented with something like:
>
> #include <chaos/preprocessor.h>
>
> #define IDENTIFIER_CIRCLE (CIRCLE),
> #define IDENTIFIER_SQUARE (SQUARE),
> #define IDENTIFIER_TRIANGLE (TRIANGLE),
> #define IDENTIFIER_RECTANGLE (RECTANGLE),
>
> #define COMPARE_CIRCLE_CIRCLE ()
> #define COMPARE_SQUARE_SQUARE ()
> #define COMPARE_TRIANGLE_TRIANGLE ()
> #define COMPARE_RECTANGLE_RECTANGLE ()
>
> #define IS_IDENTIFIER(...) \
> CHAOS_PP_QUICK_OVERLOAD(IS_IDENTIFIER_, __VA_ARGS__)(__VA_ARGS__) \
> /**/
> #define IS_IDENTIFIER_1(vseq) \
> CHAOS_PP_BITAND \
> (CHAOS_PP_COMPL(CHAOS_PP_IS_VARIADIC(vseq))) \
> (IS_IDENTIFIER_1A(vseq)) \
> /**/
> #define IS_IDENTIFIER_1A(vseq) \
> CHAOS_PP_BITAND \
> (CHAOS_PP_IS_VARIADIC(IDENTIFIER_ ## vseq)) \
> (CHAOS_PP_IS_EMPTY_NON_FUNCTION( \
> CHAOS_PP_SPLIT(1, IDENTIFIER_ ## vseq) \
> )) \
> /**/
> #define IS_IDENTIFIER_2(vseq, tuple) \
> IS_IDENTIFIER_3(CHAOS_PP_STATE(), vseq, tuple) \
> /**/
> #define IS_IDENTIFIER_3(s, vseq, tuple) \
> CHAOS_PP_BITAND \
> (IS_IDENTIFIER_1(vseq)) \
> (CHAOS_PP_VARIADIC_ELEM(0, \
> CHAOS_PP_EXPR_S(s)(CHAOS_PP_TUPLE_FOLD_LEFT_S( \
> s, IS_IDENTIFIER_3A, tuple, 0, \
> CHAOS_PP_REM_CTOR( \
> CHAOS_PP_SPLIT(0, IDENTIFIER_ ## vseq) \
> ) \
> )) \
> )) \
> /**/
> #define IS_IDENTIFIER_3A(s, e, b, id) \
> CHAOS_PP_BITOR \
> (b) \
> (CHAOS_PP_IS_VARIADIC(COMPARE_ ## e ## _ ## id)), id \
> /**/
>
> IS_IDENTIFIER(RECTANGLE, (CIRCLE, SQUARE, RECTANGLE))
>
> Note this isn't finding the particular index, but it could be made to do
> so.
>
>>> One could do better for this low-level part with registrations such as
>>> (without a BOOST_VMD_ prefix for brevity)
>>>
>>> #define IDENTIFIER_CIRCLE (CIRCLE),
>>> #define IDENTIFIER_SQUARE (SQUARE),
>>> #define IDENTIFIER_TRIANGLE (TRIANGLE),
>>> #define IDENTIFIER_RECTANGLE (RECTANGLE),
>>
>> Is the ending comma there for a reason ?
>
> Yes, though I think you already figured it out. The parentheses are
> there so that you can detect that your prefix token-pasting did
> something, the "v-identifier" is there so you can extract it, and the
> comma is there to separate the "v-identifier" from the rest of a
> "v-sequence".

When I saw what the comma did that light in my brain went on <g>.
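
To make the three roles concrete, here is a small sketch, assuming a
standard-conforming preprocessor. Apart from the IDENTIFIER_
registrations, every macro name in it (BEGINS_PAREN, STARTS_WITH_ID,
FRONT_ID, AFTER_ID, HEAD, TAIL, UNWRAP, STR) and the str_eq function are
hypothetical and exist only for this illustration.

```cpp
// Registrations in the form discussed above: "(identifier),".
#define IDENTIFIER_CIRCLE (CIRCLE),
#define IDENTIFIER_SQUARE (SQUARE),

// BEGINS_PAREN(...): 1 if the argument begins with a parenthesized group.
#define BEGINS_PAREN(...) BEGINS_PAREN_CHECK(BEGINS_PAREN_PROBE __VA_ARGS__)
#define BEGINS_PAREN_PROBE(...) ~, 1,
#define BEGINS_PAREN_CHECK(...) BEGINS_PAREN_PICK(__VA_ARGS__, 0, ~)
#define BEGINS_PAREN_PICK(a, b, ...) b

// Split a comma-separated list into its first element and the remainder.
#define HEAD(a, ...) a
#define TAIL(a, ...) __VA_ARGS__
// Remove one layer of parentheses.
#define UNWRAP(x) UNWRAP_I x
#define UNWRAP_I(...) __VA_ARGS__

// Role 1, the parentheses: detect that the paste hit a registration.
#define STARTS_WITH_ID(vseq) BEGINS_PAREN(IDENTIFIER_ ## vseq)
// Role 2, the identifier inside the parentheses: extract it.
#define FRONT_ID(vseq) FRONT_ID_I(IDENTIFIER_ ## vseq)
#define FRONT_ID_I(x) UNWRAP(HEAD(x))
// Role 3, the comma: separate the identifier from the rest of the input.
#define AFTER_ID(vseq) AFTER_ID_I(IDENTIFIER_ ## vseq)
#define AFTER_ID_I(x) TAIL(x)

// Stringize after expansion, plus a compile-time string comparison,
// so the results can be checked without running anything.
#define STR(...) STR_I(__VA_ARGS__)
#define STR_I(...) #__VA_ARGS__
constexpr bool str_eq(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || str_eq(a + 1, b + 1));
}

static_assert(STARTS_WITH_ID(CIRCLE SQUARE) == 1, "CIRCLE is registered");
static_assert(STARTS_WITH_ID(OVAL SQUARE) == 0, "OVAL is not registered");
static_assert(str_eq(STR(FRONT_ID(CIRCLE SQUARE)), "CIRCLE"), "front");
static_assert(str_eq(STR(AFTER_ID(CIRCLE SQUARE)), "SQUARE"), "rest");
```

FRONT_ID and AFTER_ID are only meaningful when STARTS_WITH_ID yields 1,
and none of the three accepts input that begins with a parenthesis,
since that cannot be pasted onto the prefix.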

>
>>> These definitions don't allow one to directly test for specific
>>> "v-identifiers", but they do allow one to extract registered
>>> "v-identifiers" from a "v-sequence".
>>>
>>> If all such registrations are required to be the same, there is no need
>>> for separate per-library prefixes. Macros can be defined multiple times
>>> if they are defined the same way.
>>>
>>> Similarly, the "v-numbers" could be pre-registered by the library as
>>>
>>> #define NUMBER_0 (0),
>>> #define NUMBER_1 (1),
>>> #define NUMBER_2 (2),
>>> // ...
>>
>> Ditto about the ending commas.
>
> As above, only difference is prefix allowing a type-dispatch as opposed
> to a value-dispatch.
>
>>> Tests for specific "v-identifiers" can be done via extraction from the
>>> above and using macro definitions such as
>>>
>>> #define COMPARE_CIRCLE_CIRCLE ()
>>> #define COMPARE_SQUARE_SQUARE ()
>>> // ...
>>
>> Sure.
>>
>>> In many cases this sort of comparison is not needed, as one would
>>> directly dispatch off of the "v-identifier" itself.
>>>
>>
>> I appreciate your technique of matching a v-identifier/v-number by
>> returning a registered identifier/number and then dispatching off of the
>> identifier itself. It appears to make parsing a v-sequence much easier,
>> but I need to consider this further.
>
> One of the big things about it (IMO) is to make the registrations
> reusable. If something is defined the same, it can be defined more than
> once. All C/C++ keywords can also be registered by the library.

Understood. I agree this is a nice feature.

My own key-value v-identifier implementation was to try to make things
as simple as possible for the end-user. I now realize that if I keep it,
possibly along with your own methodology, as a primary or alternative
implementation of finding a v-identifier, it would have been better to
have the key-value macros expand to a single comma (',') rather than
expand to emptiness. Then the whole complication of parsing a single
input sequence for an identifier will be much easier. I really
appreciate your alerting me to the 'comma' technique. Substituting a
comma into an input sequence is easy to detect and will tell me if I
have matched an identifier or a number without having to recursively
parse the rest of the v-sequence or have to know what type of input
comes next. That is a wonderful improvement to VMD's overly complicated
v-sequence parsing macros, and I thank you very much for pointing it out.
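
A sketch of how such a substituted comma could be detected, assuming a
standard-conforming preprocessor. The KEY_ prefix and the macros
HAS_COMMA and STARTS_WITH_KEY are hypothetical names for illustration,
not current VMD macros, and this HAS_COMMA only handles input that ends
up as at most eight comma-separated elements.

```cpp
// Hypothetical key macros that expand to a single comma.
#define KEY_CIRCLE ,
#define KEY_SQUARE ,

// HAS_COMMA(...): 1 if the arguments contain a comma outside parentheses,
// otherwise 0. The padding shifts a 1 or the 0 into the ninth position.
#define HAS_COMMA(...) HAS_COMMA_PICK(__VA_ARGS__, 1, 1, 1, 1, 1, 1, 1, 0, ~)
#define HAS_COMMA_PICK(a1, a2, a3, a4, a5, a6, a7, a8, a9, ...) a9

// Paste the prefix onto the front of the input. A registered key turns
// into a comma; anything else stays an ordinary identifier.
#define STARTS_WITH_KEY(vseq) HAS_COMMA(KEY_ ## vseq)

static_assert(STARTS_WITH_KEY(CIRCLE (a, b) 3) == 1, "key matched");
static_assert(STARTS_WITH_KEY(OVAL (a, b) 3) == 0, "no key matched");
```

The commas inside (a, b) do not disturb the test because they are
protected by the parentheses; only the comma produced by the key macro
counts.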

Do realize that your own v-identifier technique requires dispatching off
of the v-identifier itself using concatenation, a technique some macro
programmers may not know as well as dispatching off of a Boost PP number
using BOOST_PP_EQUAL. I agree that dispatching off of the identifier
itself is more elegant and a bit faster as far as the preprocessor is
concerned.
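
For anyone unfamiliar with dispatching by concatenation, a minimal
sketch follows. The SIDES_ handler macros and the CAT helper are
hypothetical names chosen for the example, and a standard-conforming
preprocessor is assumed.

```cpp
// Paste two tokens after their arguments have been expanded.
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b

// One handler per identifier; the identifier is the suffix of the name.
#define SIDES_CIRCLE 0
#define SIDES_TRIANGLE 3
#define SIDES_SQUARE 4

// Direct dispatch: the identifier itself selects the handler, with no
// intermediate integer and no comparison.
#define SIDES(id) CAT(SIDES_, id)

static_assert(SIDES(CIRCLE) == 0, "dispatches to SIDES_CIRCLE");
static_assert(SIDES(TRIANGLE) == 3, "dispatches to SIDES_TRIANGLE");
static_assert(SIDES(SQUARE) == 4, "dispatches to SIDES_SQUARE");
```

An identifier without a handler is left as an undefined name such as
SIDES_OVAL, so this form relies on the identifier having been matched
against the registrations first.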

>
>>> In any case, one could "parse" a "v-sequence" into "v-types" without
>>> many of the limitations of the current design. The following example
>>> uses Chaos, but I do not think there is anything in the design that
>>> fundamentally cannot be made to work on (e.g.) VC++.
>>
>> This is a bit unfair. You are asking me to understand Chaos code as
>> opposed to Boost PP and VMD. But I think I get the idea from your
>> original suggestion above and will study it further.
>
> It is just a demonstration of a possibility. I did it quickly so had to
> reuse some existing stuff, but essentially it is just a while loop. It
> terminates when its (constrained) input is empty. Its "operator" (which
> iterates the data) tests the front of the "v-sequence" for a
> parenthetic expression. If it has one, the operator extracts it and
> puts it into the result. If it does not, it tests it (via
> token-pasting) for starting with a "v-number". If it does, the operator
> extracts it and puts it into the result. If it does not, it tests it
> (again via token-pasting) for starting with a registered "v-identifier".
> If it does, the operator extracts it and puts it into the result. If
> it does not, it puts the entire remainder of the "v-sequence" into the
> result.
>

Thanks for the explanation. I can follow it.
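
As a way of checking my understanding, here is a sketch of a single
classification step of that operator, without the loop and without
extracting anything. It uses no Chaos macros; CAT, IIF, IS_PAREN and the
FRONT_ macros are hypothetical stand-ins, and a standard-conforming
preprocessor is assumed. The type codes are the ones from the quoted
example.

```cpp
// Registrations, as in the quoted example.
#define NUMBER_1 (1),
#define NUMBER_2 (2),
#define NUMBER_3 (3),
#define IDENTIFIER_CIRCLE (CIRCLE),
#define IDENTIFIER_SQUARE (SQUARE),

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b

// IIF(bit)(t, f): selects t when bit is 1 and f when bit is 0.
#define IIF(bit) CAT(IIF_, bit)
#define IIF_0(t, f) f
#define IIF_1(t, f) t

// IS_PAREN(...): 1 if the argument begins with a parenthesized group.
#define IS_PAREN(...) IS_PAREN_CHECK(IS_PAREN_PROBE __VA_ARGS__)
#define IS_PAREN_PROBE(...) ~, 1,
#define IS_PAREN_CHECK(...) IS_PAREN_PICK(__VA_ARGS__, 0, ~)
#define IS_PAREN_PICK(a, b, ...) b

// FRONT_TYPE(vseq): type code of the first element of vseq.
//   3 = tuple, 1 = v-number, 2 = registered identifier, 0 = unrecognized
// The tuple test comes first and selects a macro by name, so that the
// token-pasting tests are never applied to input starting with "(".
#define FRONT_TYPE(vseq) IIF(IS_PAREN(vseq))(FRONT_TUPLE, FRONT_OTHER)(vseq)
#define FRONT_TUPLE(vseq) 3
#define FRONT_OTHER(vseq) \
    IIF(IS_PAREN(NUMBER_ ## vseq))(FRONT_NUMBER, FRONT_WORD)(vseq)
#define FRONT_NUMBER(vseq) 1
#define FRONT_WORD(vseq) IIF(IS_PAREN(IDENTIFIER_ ## vseq))(2, 0)

static_assert(FRONT_TYPE((a, b) CIRCLE) == 3, "tuple at the front");
static_assert(FRONT_TYPE(3 (xyz)) == 1, "v-number at the front");
static_assert(FRONT_TYPE(CIRCLE 3) == 2, "registered identifier at the front");
static_assert(FRONT_TYPE(UNREGISTERED SQUARE) == 0, "unrecognized front");
```

The real operator would also extract the matched element and hand the
remainder back to the loop; this sketch only reports the type.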

>>>
>>> #include <chaos/preprocessor.h>
>>>
>>> #define NUMBER_1 (1),
>>> #define NUMBER_2 (2),
>>> #define NUMBER_3 (3),
>>> #define NUMBER_4 (4),
>>> #define NUMBER_5 (5),
>>> // ...
>>>
>>> // parser implementation...
>>> #define PARSE(vseq) PARSE_S(CHAOS_PP_STATE(), vseq)
>>> #define PARSE_S(s, vseq) \
>>> CHAOS_PP_SPLIT(0, CHAOS_PP_EXPR_S(s)(CHAOS_PP_WHILE_S( \
>>> s, PARSE_P, PARSE_O,, vseq \
>>> ))) \
>>> /**/
>>> #define PARSE_P(s, seq, vseq) \
>>> CHAOS_PP_COMPL(CHAOS_PP_IS_EMPTY_NON_FUNCTION(vseq)) \
>>> /**/
>>> #define PARSE_O(s, seq, vseq) \
>>> seq CHAOS_PP_IIF(CHAOS_PP_IS_VARIADIC(vseq))( \
>>> PARSE_O_A, PARSE_O_B \
>>> )(vseq) \
>>> /**/
>>> #define PARSE_O_A(vseq) PARSE_O_A_A vseq
>>> #define PARSE_O_A_A(...) (3, (__VA_ARGS__)),
>>> #define PARSE_O_B(vseq) \
>>> CHAOS_PP_IIF(CHAOS_PP_IS_VARIADIC(NUMBER_ ## vseq))( \
>>> PARSE_O_B_A, \
>>> CHAOS_PP_IIF(CHAOS_PP_IS_VARIADIC(IDENTIFIER_ ## vseq))( \
>>> PARSE_O_B_B, PARSE_O_B_C \
>>> ) \
>>> )(vseq) \
>>> /**/
>>> #define PARSE_O_B_A(vseq) \
>>> (1, CHAOS_PP_REM_CTOR( \
>>> CHAOS_PP_SPLIT(0, NUMBER_ ## vseq) \
>>> )), \
>>> CHAOS_PP_SPLIT(1, NUMBER_ ## vseq) \
>>> /**/
>>> #define PARSE_O_B_B(vseq) \
>>> (2, CHAOS_PP_REM_CTOR( \
>>> CHAOS_PP_SPLIT(0, IDENTIFIER_ ## vseq) \
>>> )), \
>>> CHAOS_PP_SPLIT(1, IDENTIFIER_ ## vseq) \
>>> /**/
>>> #define PARSE_O_B_C(vseq) (0, vseq),
>>>
>>> // registrations...
>>> #define IDENTIFIER_CIRCLE (CIRCLE),
>>> #define IDENTIFIER_SQUARE (SQUARE),
>>> #define IDENTIFIER_TRIANGLE (TRIANGLE),
>>> #define IDENTIFIER_RECTANGLE (RECTANGLE),
>>>
>>> PARSE(
>>> CIRCLE
>>> RECTANGLE
>>> (a, b, c)
>>> TRIANGLE
>>> 3
>>> (xyz)
>>> UNREGISTERED
>>> SQUARE
>>> )
>>>
>>> (Apologies for any mistakes in the above. I wrote it up quickly.)
>>>
>>> This particular example results in key-value pairs of the form:
>>> (0, <unrecognized>)
>>> (1, <v-number>)
>>> (2, <registered-identifier>)
>>> (3, <tuple>)
>>>
>>> (2, CIRCLE) (2, RECTANGLE) (3, (a, b, c)) (2, TRIANGLE) (1, 3) (3,
>>> (xyz)) (0, UNREGISTERED SQUARE)
>>>
>>> There are a number of things that could be done instead of producing a
>>> binary sequence like this (which Boost.Preprocessor cannot handle). One
>>> could fold the results via a user-supplied macro, for example.
>>>
>>> Aside from the massive amount of clutter introduced by workarounds, many
>>> of the pieces of the above are already available either in
>>> Boost.Preprocessor or in other parts of the VMD library. For example,
>>> BOOST_VMD_IS_EMPTY is essentially CHAOS_PP_IS_EMPTY_NON_FUNCTION.
>
> FYI
>
> The CHAOS_PP_IS_VARIADIC macro is equivalent to your
> BOOST_VMD_IS_BEGIN_TUPLE macro. The CHAOS_PP_REM macro is just a
> parentheses remover (e.g. #define REM(...) __VA_ARGS__). The
> CHAOS_PP_REM_CTOR macro is just a "constructed" parentheses remover
> (e.g. #define REM_CTOR(...) REM __VA_ARGS__). The CHAOS_PP_SPLIT macro
> is basically just a head/tail macro of a comma separated list.
>
>>> -- Implementation --
>>>
>>> I have not looked closely at the implementation, but I do know what a
>>> herculean effort it is to get this type of stuff working with VC++ in
>>> particular.
>>
>> Thanks ! VC++ is horrible to make it work "correctly". But you already
>> know that.
>
> Yes.
>
>>> -- Documentation --
>>>
>>> There are a few inaccuracies in the documentation related to motivations
>>> for various Boost.Preprocessor things, but those could be fixed.
>>
>> Please feel free to point them out and I will update the documentation
>> accordingly.
>
> They are minor. They have to do with the supposition that
> Boost.Preprocessor is designed to not produce erroneous results given
> erroneous inputs--which it isn't designed to do at all--

I don't know where in the doc I stated the above. I was trying to
emphasize that Boost PP emphasized safety over the sort of functionality
which VMD provides, which depends more on end-users following certain
constraints on VMD input. Obviously depending on detecting emptiness as
a core functionality of VMD, where the test for emptiness, largely taken
from your own Internet code, can never be perfect, is an example of the
sort of riskier use of macros which VMD represents.

A number of people and a few reviewers also wanted to know why VMD, if
accepted, should not just be an addition to Boost PP and I was
expressing in my remarks to them a difference I perceived in philosophy
between Boost PP and VMD.

> and the reason
> why the low-level detection macros (such as BOOST_PP_IS_NULLARY and
> BOOST_PP_IS_EMPTY) are non-public is not because of their constrained
> input requirements, but rather because using them frequently leads to
> the need for hacks and workarounds in the code surrounding their use.
> They are
> "non-public" because they cannot fully encapsulate preprocessor
> workarounds.

OK. My interpretation of why you did not document them publicly was
wrong. I really thought it was along the lines of "Because these macros
could be dangerous or erroneous given certain input I don't want to
document them for you." But if you are concerned with other issues
related to their use I understand.

>
>>> The documentation for BOOST_VMD_IS_EMPTY is decent except that
>>> essentially any input that ends with a function-like macro name will
>>> cause weird results or errors. So much so that the documentation should
>>> just disallow input that ends with a function-like macro name across the
>>> board.
>>
>> I don't think that input that ends with a function-like macro name, when
>> that macro takes 0 or 1 parameter, causes incorrect results on a C++
>> standard conforming compiler. Can you explain why it would cause an
>> incorrect result or an error ?
>
> Contrived but an easy demonstration:
>
> #define MACRO(x) CAT(+, x C)
> // valid input to MACRO is + or =
>
> MACRO(+) // ++ C
> MACRO() // error
>
> BOOST_VMD_IS_EMPTY(MACRO) // error

You think of things I haven't even considered <g>. Anyway I will also
document this sort of situation in my discussion of emptiness in VMD.
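
A sketch of the mechanism as I understand it, under the assumption that
the emptiness test at some point follows its input with '()'. The macros
FOLLOW_WITH_CALL, TWICE and STR and the function str_eq are hypothetical
and only serve the illustration; this is not the VMD implementation.

```cpp
// Stringize after expansion, plus a compile-time string comparison.
#define STR(...) STR_I(__VA_ARGS__)
#define STR_I(...) #__VA_ARGS__
constexpr bool str_eq(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || str_eq(a + 1, b + 1));
}

// Assumed core of the test: the input is followed by "()".
#define FOLLOW_WITH_CALL(...) __VA_ARGS__ ()

// A harmless function-like macro whose name ends the input.
#define TWICE(x) x x

// Ordinary input is left in place, with "()" after it.
static_assert(!str_eq(STR(FOLLOW_WITH_CALL(abc)), ""), "abc survives");

// A trailing function-like macro name picks up the "()" and is invoked
// with an empty argument. TWICE tolerates that and expands to nothing.
static_assert(str_eq(STR(FOLLOW_WITH_CALL(TWICE)), ""), "TWICE was invoked");
```

With a macro like the MACRO in the quoted example, whose expansion is
not valid for an empty argument, that same invocation is where the error
would come from.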

>
> You can get erroneous results too if the macro BOOST_VMD_BEGIN_TUPLE
> used in the BOOST_VMD_DETAIL_IS_EMPTY_PROCESS macro somehow causes
> another scan to be applied to its input.
>

I really did already understand that (I was bitten by it once when I
was programming VMD) but I do not see how that could happen unless I
specifically change the BOOST_VMD_DETAIL_IS_EMPTY_PROCESS and mess it
up. But thanks for alerting me about it.

>>> Other than that, the documentation needs some clean-up and organization
>>> (particularly of definitions), but does a reasonable job of documenting
>>> the library. I do feel that a lot of documentation complexity comes
>>> from the way that v-keys/v-identifiers (etc) are handled.
>>
>> I agree that if I use your methodology of registering v-identifiers and
>> v-numbers then the documentation, as well as the code, for parsing
>> v-sequences might be much simplified and I may be automatically able to
>> parse any v-sequence into its v-types. I will have to take a look at
>> this a bit more.
>
> I think it is possible. The main difficulty will be VC++ because you
> will have to bend over backwards to get those macro-generated commas to
> form an argument separator in a variety of situations (if I recall
> correctly).

I am on the alert. Thanks ! I do like the comma technique enormously if
I can get it to work with the usually horrid VC++isms. All that
recursive processing I was doing with v-sequences, along with the
complicated input that was needed to parse v-identifiers and v-numbers
in the v-sequences, was hard work but I was too stupid to realize your
much better way.

>
>>> I had my skull opened up and had "preprocessor metaprogramming" tattooed
>>> directly onto my brain.
>>
>> Who coined that term anyway <g> ?
>
> :) I don't know.

Likely story <g>.

