Boost logo

Boost :

Subject: Re: [boost] [MPL][vector] Is there interest in mpl::vector using variadic templates?
From: Bruno Dutra (brunocodutra_at_[hidden])
Date: 2017-03-02 19:54:10


On Thu, Mar 2, 2017 at 7:42 PM, Larry Evans via Boost <boost_at_[hidden]
> wrote:

> On 03/02/2017 11:54 AM, Larry Evans via Boost wrote:
>
>> On 03/02/2017 11:27 AM, Peter Dimov via Boost wrote:
>>
>>> Larry Evans wrote:
>>>
>>> One problem with the above cppljevans mpl is there's no at.hpp.
>>>> Instead, the non-variadic boost/mpl/at.hpp was used.
>>>> The reason no variadic at was created was because, AFAICT, there was
>>>> no non-recursive method for picking the I-th element from T... , and,
>>>> IIUC, recursive templates cause compile-time slow downs.
>>>>
>>>
>>> Have you read
>>>
>>> http://pdimov.com/cpp2/simple_cxx11_metaprogramming_2.html
>>>
>>> ?
>>>
>>
>> Nope. Thanks *very much* for the link. I'm impressed (especially
>> with the way you actually cited the parts of the standard to guide
>> your search for the best method!).
>>
>>
>>> Search for mp_at.
>>>
>>> Thanks for that tip.
>>
>> I'm a bit surprised that the mp_map_from_list was fastest.
>> I would have thought that large template classes mean slow
>> compile times, but I guess not.
>>
>
Since you seem to be interested on benchmarking compilation times, check
out http://metaben.ch.

> This:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4235.htm
>
> claims:
>
> template instantiations use memory that is never freed during the
> compilation process
>
> so, maybe mp_map_from_list should be used with caution. If many
> mp_map_from_list instantiations are used, I'm guessing the compiler
> might become pressed for memory. Maybe a benchmark showing
> compile time vs number of instantiations would show this.
> For example, instead of N=800, see what happens when 2
> instantiations with N=400 happen, and compare with a similar
> test for mp_repeat_c.

This is actually not the case, in fact the exact opposite is true. It turns
out mp_map_from_list is actually very cheap to the compiler in terms of
memory allocation, because it takes advantage of template memoization, so
if you retrieve every single element of a list using this trick, the
compiler needs to instantiate it only once. On the other hand, the trick
employing void pointers can't be expressed in a way that take advantage of
memoization, which means that, for every element retrieved from the list,
the entire pack of void pointers must be instantiated again.

I'm not just guessing here, in fact, benchmarks for the algorithm `at` on
metaben.ch prove it by plotting the time it takes to compile a metaprogram
that retrieves every element in a list as a function of its size, from 0 up
to 500 elements. Brigand and Meta (and likely also Hana, but I'm not sure)
rely on the void pointers trick, as you can verify in their source codes,
and as you can see on metaben.ch their benchmarks appear to vary with the
square of the list size. This is precisely because the entire variadic pack
of void pointers must be generated for each of the N elements retrieved. On
the other hand, the fastest implementation on all compilers(*) is that of
Metal, which seems to vary approximately linearly with the list size. That
is because it relies on a variant of the trick to retrieve an element from
a map and was carefully designed to take advantage of memoization, as you
can verify here
https://github.com/brunocodutra/metal/blob/master/include/metal/detail/lookup.hpp

(*) Specifically on Clang, Metal actually relies on a compiler intrinsic
called __type_pack_element to get the N-th type in a variadic pack. This is
arguably as fast as `at` can possibly get and still it is only slightly
faster than the implementation on GCC, which does not rely on any compiler
intrinsic.

Bruno


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk