Subject: Re: [boost] [Fusion] port to c++0x
From: Larry Evans (cppljevans_at_[hidden])
Date: 2009-09-27 14:30:12
On 09/26/09 10:40, Christopher Schmidt wrote:
> Larry Evans schrieb:
>> There's no type_at in:
>> There is, however, a tuple<,>::at_type<I> nested struct. Is that what
>> you're referring to? If so, I don't understand why that would be
>> expensive since neither at_type or what it calls, at_elem, use
>> recursion. Could you explain why at_type would be expensive? Or maybe
>> you've run some timing tests on it. If so, could you provide the test
>> driver so I could test to see why it's so expensive?
> I am sorry to have kept you waiting so long. I have been busy with
> moving to a new flat.
No problem at all (I actually thought the response was pretty fast).
> Consider an instantiation of a tuple with 7 elements, the types foo,
> bar, int, char, blub, dub and mup.
> My fusion vector will directly evolve to
> : detail::vector_impl<0,foo,bar,int,char,blub,dub,mup>
> : detail::vector_impl<4,blub,dub,mup>
> As you elaborated, your tuple will evolve to
> tuple<char, foo,bar,int,char,blub,dub,mup>
> : _Tuple_impl<
> : _Tuple_element<mpl::integral_c<char, 0>, foo>
> , _Tuple_element<mpl::integral_c<char, 1>, bar>
> , _Tuple_element<mpl::integral_c<char, 2>, int>
> , _Tuple_element<mpl::integral_c<char, 3>, char>
> , _Tuple_element<mpl::integral_c<char, 4>, blub>
> , _Tuple_element<mpl::integral_c<char, 5>, dub>
> , _Tuple_element<mpl::integral_c<char, 6>, mup>
> Summed up, there are N/4+1 template instantiations for my vector and
> N+2+x for your tuple, with x being the amount of instantiations for
> mpl::package_range_forward. If mpl::package_range_forward<...> is not
> cached, x will be N+C.
> Also you need to consider the overhead of template expressions that
> are not instantiated but still are evaluated.
Good point. I wasn't aware of that.
> I think your at_type implementation is expensive as a complete template
> function invocation expression is evaluated. This involves a lot of
> steps in the lookup process that are not necessary in a meta-only
Good point. I had not considered that.
> Also the lookup process should scale commensurate to N.
Is that not true with my tuple? AFAICT, it's not dependent on N.
OOPS, wait, because the number of template instantiations depends on
N, and the lookup time is dependent on the number of template
instantiations, the lookup time is dependent on N, but AFAICT, it's
commensurate (if by that you mean linear time complexity). Is that
what you mean by commensurate?
> I cannot see how this should be faster than a controlled meta-recursion
> with a break condition.
What's "controlled meta-recursion" and what's "a break condition".
Is "a break condition" just the template recursion termination
condition? Is "controlled meta-recursion" the recursion every 4 (or
whatever) elements while for those 4 elements boost_pp is used to
generate the member variables and access one of those 4 elements?
> On top of that your implementation is limited to compiler with decltype
> support. I tried to avoid implementations in which certain c++0x
> features depend on others. A user, whose compiler does support variadic
> templates but not decltype, should still be able to enjoy the variadic
> template candy. This is not much of a problem in our specific case but
> in other cases an independent implementation is crucial.
I hadn't thought that was important because I'd assumed that by the
time variadic templates were adopted, decltype would surely be
> Either way, there are tons of pros and cons for both of our
> implementations. I really do like your implementation. It looks and
> feels native. Maybe it faster than mine. Maybe it is not. Who knows?
> Compiler are too different to let one state performance predictions a
> I judge compile-time performance of an expression by the number steps
> (and their orders of growth) involved into evaluating it, because I
> think, this is what comes closest to actual real-life performance in
> most cases. This led to my conclusion that an unrolled recursive
> implementation might be faster.
> I attached a test driver to this post. The testcase instantiates 25*25
> distinct tuples (either vertical or horizontal ones) with 0 to 10
> elements. Then optionally a type (meta-at) and/or object lookup (at) is
> done on each element.
Very nice! I love the way you've filtered out anything that's
"extraneous" both in the "vertical" (i.e. the boost_pp method) and
the "horizontal" (the tagged multiple inheritance method). This makes
it easier to see what's going on. However, I did have a hard time
understanding how the test cases were generated. I added several
comments which I hope make it clearer. I could upload this extra
commented version to the boost vault under the variadic_template
directory if you're interested.
> I used gcc 4.4.1 (TDM-1 mingw32) and ptime to measure performance. It is
> pretty interesting that the instantiation of the horizontal tuple type
> is just slightly slower.
gcc 4.5 is now using hashing to speed up the template lookup:
So, that may lessen the difference.
> The at-operations differ all the more surprisingly.
> === g++ -std=c++0x -c test.cpp ===
> Execution time: 5.274 s
> === g++ -std=c++0x -c test.cpp -DAT ===
> Execution time: 11.821 s
> === g++ -std=c++0x -c test.cpp -DMETA_AT ===
> Execution time: 12.161 s
> === g++ -std=c++0x -c test.cpp -DAT -DMETA_AT ===
> Execution time: 19.565 s
> === g++ -std=c++0x -c test.cpp -DVERTICAL ===
> Execution time: 4.884 s
> === g++ -std=c++0x -c test.cpp -DAT -DVERTICAL ===
> Execution time: 7.136 s
> === g++ -std=c++0x -c test.cpp -DMETA_AT -DVERTICAL ===
> Execution time: 5.835 s
> === g++ -std=c++0x -c test.cpp -DAT -DMETA_AT -DVERTICAL ===
> Execution time: 8.156 s
Putting this in a table for easier comparison:
container 4.884 5.274
at 7.136 11.821
meta_at 5.835 12.161
at;meta_at 8.156 19.565
> I hope I made sense.
Yes. Thanks very much. I would like to see this test case as part of
the fusion documentation so that others could see the advantage of
BOOST_PP vs the Tagged MI option. Maybe it should be relegated to a
separate directory for documentation of the implementation instead of
including it in the user docs. That way it would be available for
future maintainers but wouldn't distract current users.
> By the way, your variadic mpl implementation looks great.
> Do you have any plans to integrate your work into the trunk?
I'd be happy to if there's interest. IIUC, there was some work done
during BoostCon09 on that; however, as noted in another post:
there's no code in the sandbox to compare with :(
Also, I've a strong suspicion that other's will suggest the code be
modified, using maybe some BOOST_PP magic, to lessen the number of
template instantiations. In particular, I suspect this will be
suggested for where foldr_pack is used in the supertypes of the
sequences. In particular, I'd think the technique you used in your
test driver( in the VERTICAL part) could be used with some benefit
there. The interesting thing is, the original mpl didn't do
that, instead it used just plain template recursion. OTOH, maybe it
wouldn't really be much use, but who knows?
Thanks for your work, Christopher.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk