From: John Moeller (fishcorn_at_[hidden])
Date: 2008-05-23 12:14:30
Simonson, Lucanus J <lucanus.j.simonson <at> intel.com> writes:
> Bruno wrote:
> >> The gain is in going from O(N) template instantiations to O(log N),
> >> which helps at compile-time.
> >Even at runtime I don't really agree since the solution proposed here
> >was the one I had originally proposed. And when Joaquin corrected it
> >by proposing something that better optimizes the number of
> >multiplications, my tests shew that the gain at runtime was real and
> >that my compiler didn't do that by itself. This being said, I should
> >redo my tests to be sure I didn't miss anything.
> >Anyway, I think all this is a matter of compiler and the best way is
> >to take compiler's hand to force it to write the right code, even for
> >those compilers that would have been able to do it by themselves.
> >Luke, which compiler to you use?
> gcc 4.3.2 -O2
> I didn't mean we should use the O(N) algorithm I tested, I meant we
> shouldn't obsess about improving the O(log N) algorithm already checked
> into the trunk.
No, we shouldn't, but that's not really what we're doing here.
> Whether x^32 results in 8, 6, or 4 template
> instantiations probably doesn't matter as much as clarity or concise
> code. People should be able to use it as an example for how to write
> their own generic functions.
Because people should be able to use it as an example for how to write their
own generic functions is precisely the reason that we should worry about how
many template instantiations are performed. Template instantiations cost
development time, and just because they aren't felt at runtime, doesn't mean
they aren't important. If you can reduce them *in every case* (with a simple
realignment of templates), then you should. At the same time, if you can
reduce the number of excess temporaries in every case just by a wise choice of
parameter type, then you should.
It's not as though we're talking about creating compiler-specific code
specializations just to squeeze out an extra 5% speed out of pow<>. We're
talking about writing good generic code, and good generic code considers the
compile-time factor too.
> Yes, it is better to force the compiler's hand than trust it. I
> certainly won't trust the compiler to unroll a loop for me. For built
> in multiply operations the compiler can optimize the expression tree you
> give it, but for user defined multiply operators on user defined types
> it will give you N multiplies if that is what you ask for and it. I
> think we should get the algorithm right, for sure.
> Have you ever seen the compiler fail to inline pow?
We're just making suggestions here. Bruno's going to measure the results and
do the best thing.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk