Subject: Re: [boost] tuple benchmarks show marked differences from std::tuple(was Re: Interesting article on stack-based TMP
From: Larry Evans (cppljevans_at_[hidden])
Date: 2012-10-25 12:58:33
On 10/24/12 14:09, Eric Niebler wrote:
[snip]
> I presented at BoostCon my own benchmarks of tuple with and without
> preprocessing. The results were unambiguously and strongly in favor of
> unrolling with the preprocessor. Tested with gcc. The presentation is
> here:
>
> https://github.com/boostcon/cppnow_presentations_2012/blob/master/mon/trouble_with_tuples.pptx
Thanks. I took a look at it with:
http://www.viewdocsonline.com/document/
and saw the comparison chart on slide 12. That chart, as you say above,
unambiguously favors the unrolled tuple.
>
> The source code is here:
>
> https://github.com/ericniebler/home/tree/master/src/tuple
>
I downloaded that and AFAICT:
* The preprocessor method is in:
unrolled_tuple.hpp
and is roughly the same as the vertical tuple implementation here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/slim/test/tuple_impl_vertical.hpp
The main difference, AFAICT, is that unrolled uses aggregation
(via the member declaration:
tuple<Tail...> tail;
on line 133). In contrast, the vertical tuple uses inheritance:
struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK,
TUPLE_IMPL_TYPE_NAME), Others...>
: tuple_impl<Index+TUPLE_CHUNK, Others...>
as shown on line 42 of the .hpp file.
I'm still trying to understand how the get works. What's puzzling
to me is:
template<typename Tuple, int I>
static inline constexpr auto get_elem(Tuple &&that, int_<I>)
RETURN(
impl<I-I>::get_elem(static_cast<Tuple &&>(that).tail,
int_<I-UNROLL_MAX>())
)
Since I-I is always 0, why write impl<I-I> rather than impl<0>? Also,
the impl template parameter, J, is not used anywhere. I'm sure I could
figure out the reason eventually, but not yet :(. A brief
explanation would help. Also, it's not obvious to me why:
static_cast<Tuple &&>(that)
is needed because that has been declared as Tuple &&.
I have no idea what the pros and cons of the two
methods (unrolled vs. vertical) are.
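To make the aggregation vs. inheritance distinction concrete, here is a
minimal sketch of both layouts. Every name in it (agg_tuple, inh_tuple,
agg_impl, agg_get, inh_get, layouts_agree) is hypothetical; it is not the
code from unrolled_tuple.hpp or tuple_impl_vertical.hpp, it stores one
element per level instead of a chunk, and it ignores constexpr. The
aggregation get also shows one plausible reason for the cast: a named
parameter declared as Tuple && is itself an lvalue expression, so
static_cast<Tuple &&>(that) (which is what std::forward<Tuple> does) is
what restores the caller's value category before .tail is accessed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Aggregation: the remaining elements live in a data member.
template<typename... Ts> struct agg_tuple {};

template<typename Head, typename... Tail>
struct agg_tuple<Head, Tail...>
{
    Head head;
    agg_tuple<Tail...> tail;
};

// Inheritance: the remaining elements live in a base class.
template<int Index, typename... Ts> struct inh_tuple {};

template<int Index, typename Head, typename... Tail>
struct inh_tuple<Index, Head, Tail...> : inh_tuple<Index + 1, Tail...>
{
    Head head;
};

// Aggregation get: peel off one .tail per step, forwarding the tuple.
template<int I> struct agg_impl
{
    template<typename Tuple>
    static auto get(Tuple &&that)
        -> decltype(agg_impl<I - 1>::get(static_cast<Tuple &&>(that).tail))
    {
        return agg_impl<I - 1>::get(static_cast<Tuple &&>(that).tail);
    }
};

template<> struct agg_impl<0>
{
    // The extra parentheses make decltype yield Head& or Head&&.
    template<typename Tuple>
    static auto get(Tuple &&that)
        -> decltype((static_cast<Tuple &&>(that).head))
    {
        return static_cast<Tuple &&>(that).head;
    }
};

template<int I, typename Tuple>
auto agg_get(Tuple &&that)
    -> decltype(agg_impl<I>::get(static_cast<Tuple &&>(that)))
{
    return agg_impl<I>::get(static_cast<Tuple &&>(that));
}

// Inheritance get: deduction picks the base whose Index equals I.
template<int I, typename Head, typename... Tail>
Head &inh_get(inh_tuple<I, Head, Tail...> &that)
{
    return that.head;
}

// Fills both layouts with the same values and compares element-wise.
inline bool layouts_agree()
{
    agg_tuple<int, char, double> a{};
    a.head = 1; a.tail.head = 'x'; a.tail.tail.head = 2.5;

    inh_tuple<0, int, char, double> b{};
    inh_get<0>(b) = 1; inh_get<1>(b) = 'x'; inh_get<2>(b) = 2.5;

    // The cast preserves value category: lvalue in, lvalue out, etc.
    static_assert(std::is_same<decltype(agg_get<2>(a)),
                               double &>::value, "");
    static_assert(std::is_same<decltype(agg_get<2>(std::move(a))),
                               double &&>::value, "");

    return agg_get<0>(a) == inh_get<0>(b)
        && agg_get<1>(a) == inh_get<1>(b)
        && agg_get<2>(a) == inh_get<2>(b);
}
```

In this sketch the aggregation get needs one recursion step per level,
while the inheritance get is a single function whose argument deduction
selects the right base. Whether that matters for compile time is exactly
what a benchmark would have to show.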
* The variadic template method is in:
tuple.cpp
which is close to that here:
in that both methods use multiple inheritance with an int key type
paired with the tuple element type. In the case of tuple.cpp, the
pairing is done with:
template<int I, typename T>
struct tuple_elem
In tuple_impl_horizontal, pairing is done with:
template<typename Key, typename Value>
struct element;
template<int Key, typename Value>
struct element<int_key<Key>,Value>
{
Value value;
};
The get functions are essentially the same.
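For reference, here is a minimal sketch of the horizontal idea: one base
class per (int key, element type) pair, with get relying on template
argument deduction to select the matching base. The element and int_key
templates mirror the declarations quoted above; everything else (ints,
make_ints, h_tuple_impl, h_tuple, h_get, demo_second_element) is a
hypothetical name chosen for illustration and is not taken from tuple.cpp
or tuple_impl_horizontal.

```cpp
#include <cassert>

template<int Key> struct int_key {};

template<typename Key, typename Value>
struct element;

template<int Key, typename Value>
struct element<int_key<Key>, Value>
{
    Value value;
};

// Minimal C++11 integer sequence: make_ints<3>::type is ints<0, 1, 2>.
template<int... Is> struct ints {};

template<int N, int... Is>
struct make_ints : make_ints<N - 1, N - 1, Is...> {};

template<int... Is>
struct make_ints<0, Is...> { typedef ints<Is...> type; };

// Multiple inheritance: one element base per (key, type) pair.
template<typename Keys, typename... Ts> struct h_tuple_impl;

template<int... Is, typename... Ts>
struct h_tuple_impl<ints<Is...>, Ts...> : element<int_key<Is>, Ts>...
{};

template<typename... Ts>
struct h_tuple
    : h_tuple_impl<typename make_ints<sizeof...(Ts)>::type, Ts...>
{};

// Key is given explicitly; Value is deduced from the unique base
// element<int_key<Key>, Value>, so no recursion is needed.
template<int Key, typename Value>
Value &h_get(element<int_key<Key>, Value> &e)
{
    return e.value;
}

// Stores three values and reads the middle one back.
inline char demo_second_element()
{
    h_tuple<int, char, double> t;
    h_get<0>(t) = 1;
    h_get<1>(t) = 'x';
    h_get<2>(t) = 2.5;
    return h_get<1>(t);
}
```

Because the keys differ, this also works when two elements have the same
type, e.g. h_tuple<int, int>.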
After looking at the code (and Makefile) it's not clear how the
benchmark was done. The Makefile has nothing about timing in it,
and the readme.txt mentions nothing about timing. Looking at the
tuple.cpp code shows something with tree_builder in it, which sounds
like it might be the benchmark code; however, so does
unrolled_tuple.cpp. So, what is the benchmark used to produce the
chart on slide 12 of trouble_with_tuples.pptx?
>> I thought it also interesting that clang seems to do better than gcc,
>> as reported here:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54710#c10
>
> Interesting. I didn't test with clang.
>
I'll try testing your benchmark, if you provide the code, with both
clang and g++ and post the results.
-regards,
Larry