Subject: Re: [boost] tuple benchmarks show marked differences from std::tuple(was Re: Interesting article on stack-based TMP
From: Larry Evans (cppljevans_at_[hidden])
Date: 2012-10-25 12:58:33

On 10/24/12 14:09, Eric Niebler wrote:
> I presented at BoostCon my own benchmarks of tuple with and without
> preprocessing. The results were unambiguously and strongly in favor of
> unrolling with the preprocessor. Tested with gcc. The presentation is

Thanks. I took a look at it with:

and saw the comparison chart on slide 12. That chart, as you say above,
favors the unrolled tuple.

> The source code is here:

I downloaded that and AFAICT:

  * The preprocessor method is in:


    and is roughly the same as the vertical tuple implementation here:

    The main difference, AFAICT, is that unrolled uses aggregation
    (via the member declaration:

      tuple<Tail...> tail;

    on line 133). In contrast, the vertical tuple uses inheritance:

      struct tuple_impl<Index, BOOST_PP_ENUM_PARAMS(TUPLE_CHUNK,
      : tuple_impl<Index+TUPLE_CHUNK, Others...>

    as shown on line 42 of the .hpp file.

    I'm still trying to understand how the get works. What's puzzling
    to me is:

        template<typename Tuple, int I>
        static inline constexpr auto get_elem(Tuple &&that, int_<I>)
            impl<I-I>::get_elem(static_cast<Tuple &&>(that).tail,

    since I-I has got to be 0, why use impl<I-I>? Also, the impl
    template parameter, J, is not used anywhere. I'm sure I could
    figure the reason out eventually, but not yet :(. A brief
    explanation would help. Also, it's not obvious to me why:

      static_cast<Tuple &&>(that)

    is needed, since that has already been declared as Tuple &&.

    I've no idea what the pros and cons of the two methods
    (unrolled vs. vertical) are.
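
    A minimal sketch of the two layouts discussed above, not taken from
    either code base: the names agg_tuple, inh_tuple, tail_plain and
    tail_fwd are hypothetical. It also illustrates one likely reason for
    the cast: a named parameter declared as Tuple && is itself an lvalue
    expression, so static_cast<Tuple &&>(that), which is what
    std::forward<Tuple>(that) does, restores the value category the
    caller passed in.

```cpp
#include <type_traits>
#include <utility>

// Aggregation: the remaining elements live in a data member.
// (Hypothetical names; only the shape mirrors `tuple<Tail...> tail;`.)
template<typename... Ts> struct agg_tuple {};
template<typename Head, typename... Tail>
struct agg_tuple<Head, Tail...> {
    Head head;
    agg_tuple<Tail...> tail;
};

// Inheritance: the remaining elements live in a base class.
template<typename... Ts> struct inh_tuple {};
template<typename Head, typename... Tail>
struct inh_tuple<Head, Tail...> : inh_tuple<Tail...> {
    Head head;
};

// No cast: `that` is a named variable, hence an lvalue, so `that.tail`
// is always an lvalue even when the caller passed an rvalue.
template<typename Tuple>
auto tail_plain(Tuple&& that) -> decltype((that.tail)) {
    return that.tail;
}

// With the cast: an rvalue argument yields an rvalue (xvalue) tail,
// while an lvalue argument still yields an lvalue tail.
template<typename Tuple>
auto tail_fwd(Tuple&& that)
    -> decltype((static_cast<Tuple&&>(that).tail)) {
    return static_cast<Tuple&&>(that).tail;
}
```

    With an rvalue argument, tail_plain returns agg_tuple<...>& whereas
    tail_fwd returns agg_tuple<...>&&, so only the latter lets a
    recursive get move out of a temporary tuple.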

  * The variadic template method is in:


    which is close to that here:

    in that both methods use multiple inheritance with an int key type
    paired with the tuple element type. In the case of tuple.cpp, the
    pairing is done with:

      template<int I, typename T>
      struct tuple_elem

    In tuple_impl_horizontal, the pairing is done with:

      template<typename Key, typename Value>
      struct element
      template<int Key, typename Value>
      struct element<int_key<Key>,Value>
          Value value;

    The get functions are essentially the same.
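
    A minimal sketch of the multiple-inheritance ("horizontal")
    technique both files share, assuming C++14 for std::index_sequence.
    The names elem, htuple_impl, htuple and get_elem here are
    hypothetical and are not the ones used in tuple.cpp or
    tuple_impl_horizontal. Each element is stored in its own base class
    keyed by its index, and get relies on template argument deduction
    to pick the one base whose key matches.

```cpp
#include <cstddef>
#include <utility>

// Pairs an integer key with one stored element.
template<std::size_t I, typename T>
struct elem { T value; };

template<typename Indices, typename... Ts>
struct htuple_impl;

// One base class per (index, type) pair, all at the same depth.
template<std::size_t... Is, typename... Ts>
struct htuple_impl<std::index_sequence<Is...>, Ts...> : elem<Is, Ts>... {
    constexpr htuple_impl(Ts... args) : elem<Is, Ts>{args}... {}
};

template<typename... Ts>
using htuple = htuple_impl<std::index_sequence_for<Ts...>, Ts...>;

// I is given explicitly; T is deduced from the unique elem<I, T> base,
// so no recursion over the element list is needed.
template<std::size_t I, typename T>
constexpr T& get_elem(elem<I, T>& e) { return e.value; }

template<std::size_t I, typename T>
constexpr const T& get_elem(const elem<I, T>& e) { return e.value; }
```

    For example, given htuple<int, double, char> t(1, 2.5, 'c'), the
    call get_elem<1>(t) converts t to its elem<1, double> base and
    returns a reference to the stored double.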

  After looking at the code (and Makefile) it's not clear how the
  benchmark was done. The Makefile has nothing about timing in it,
  and the readme.txt mentions nothing about timing. Looking at the
  tuple.cpp code shows something with tree_builder in it, which sounds
  like it might be the benchmark code; however, so does
  unrolled_tuple.cpp. So, what is the benchmark used to produce the
  chart on slide 12 of trouble_with_tuples.pptx?

>> I thought it also interesting that clang seems to do better than gcc,
>> as reported here:
> Interesting. I didn't test with clang.

I'll try testing your benchmark, if you provide the code, with both
clang and g++ and post the results.