From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-07-14 10:21:07
On Sat, Jul 12, 2008 at 11:20 AM, Phil Endecott wrote:
> We want to move element A to the position currently occupied by element B.
> We could swap them, but moving B to A's position is probably not useful.
> So let's first move B to a good position for it, and then move A into its
> space. Of course some other element C will be occupying the position where
> we want to put B, so we have to keep going until we find an element that
> wants to be in the position where A was.
> Whether it actually improves performance will depend on quite how large the
> structs are. It certainly won't help with ints. So let's see your version
> that sorts structs!
This is an excellent idea. You need n + 2 copies to guarantee putting n
elements in their correct positions, and if you're lucky, n + 1 elements
end up correct. With a swap, n = 1, so you need 3 copies. Most of the
speedup will probably occur with a small n (n = 4 to 8?). I will try an
explicitly unrolled loop with a small number of elements to see how well
this works. I have the struct code commented out in sample.c, but there is
no code change in spreadsort.hpp required to handle larger structs.
I don't really need to swap the elements into position; what I need is the
fastest way to rearrange elements that are out of order, but whose correct
destinations I already know, in a minimum number of copies and checks for
the correct position. The prior optimization we discussed, where I posted
my new code for the swap loop, gave a net 16% speed improvement across my
regressions, to the point where my regressions ran in less than half the
time that std::sort took on them (for 4e7 elements), so we're getting
substantial improvement. I intend to try your version of that
change at some point, but I'll have to verify that it also works well on
Intel processors, and I'm skeptical from prior experience. Your latest
suggestion is more promising and novel, so I will try it first.
Your idea may even yield an improvement for integers, if I code it right:
most of the swapping time is spent doing roughly 512 swaps before an
element that belongs in the current position is found. If I can find a way
to parameterize the number of copies "n" based upon
sizeof(*RandomAccessIter), then I could tune this to the size of the data
type, but even without that, a small number of iterations of this approach
should give a sizeable speedup across all data types.
I had considered some ideas of this type in the past, but got stuck on the
extra memory required, and thought that might add overhead, but with a small
constant number of elements, the overhead should be minimal.
In effect, this placement algorithm (as improved with my last posting)
works in two stages:

First stage, repeated n/512 times: swap until the right element is in
position (roughly 512 swaps), then go to the next element. This stage
should dominate runtime now, and it's the one your latest suggestion
would improve.

Second stage: run along the bins, skipping to the first element that is
out of place, and do a few swaps for each out-of-place element until it
is in the correct position. This stage is now very fast.
I was working on fixing my worst-case calculation to be more accurate,
which matters most for worst-case performance, but it also gave a
consistent 0.1-0.2% improvement in my regressions, and gave LOG_CONST more
meaning. I'll include that fix with my next posting, but I intend to try
out your idea next time I work on integer_sort, and I expect to see a
substantial improvement.
Thanks again Phil. I think when we're done optimizing we'll have a much
faster algorithm.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk