Boost logo

Boost :

From: David Abrahams (david.abrahams_at_[hidden])
Date: 2002-04-16 08:45:48


----- Original Message -----
From: "Hamish Mackenzie" <hamish_at_[hidden]>

> On Tue, 2002-04-16 at 01:36, David Abrahams wrote:
> > Were you looking on the mpl_v2 branch?
>
> Ahh... Code looks a lot more like the doc now :-)
>
> > > Wont it still end up with N * fold_step in the message, which has
four
> > > template arguments (as opposed to two)?
> >
> > I don't think so. Why don't you try it and find out? Speculation is
a
> > waste of time when you can verify.
>
> I have had a crack. I have attached the source code and the results.
>
> Here are the commands used to invoke the test
> time g++ mpl_test.cpp -I ../boost/boost/ 2>test1.txt
> time g++ mpl_test.cpp -I ../boost/boost/ -DTEST2 2>test2.txt
> g++ -v
> Reading specs from
/usr/local/lib/gcc-lib/i686-pc-linux-gnu/3.0.3/specs
> Configured with: ../gcc-3.0.3/configure
> Thread model: single
> gcc version 3.0.3
>
> The top-level recursion had far simpler errors and was much faster to
> compile (3.21sec vs 5.26sec). The speed results were not affected by
> removing the BOOST_STATIC_ASSERT. I will try and see if these results
> scale to longer lists.

It doesn't surprise me too much for such a short list. These facts
contribute:

1. You haven't even reached the loop unrolling limit with 3 list
elements.
2. Looking at the implementation, it appears that the loop unrolling is
gone(!) Aleksey must've removed it temporarily for some reason. That is
going to hurt the compilation times mightily.
3. fold<> dispatches through two other algorithms to the underlying
iter_fold_if implementation. This is merely for code-reuse and
convenience. Aleksey always intended to write a fold which doesn't
dispatch to iter_fold_if for optimization reasons

However, you've got me thinking about what optimizations /could/ be done
for lists, and here's what I imagine:

1. If we provided specializations for lists up to N elements, it would
handle short lists quickly.
2. There's no point in recursing in these specializations: we know how
long the list is, so we might as well just unroll the entire
implementation. That would be likely to further improve compile times.
3. The unrolled version for longer sequences would just be a special
case of the short-sequence specializations

I also can't resist pointing out that without an algorithm library, you
can't do this kind of optimization experiment without rewriting lots of
user-level code ;-)

-Dave


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk