Boost logo

Boost :

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-07-12 14:20:25


Steven Ross wrote:
> Are there any more suggestions for the core algorithm?

for each bin {
    for each element between this bin's current_position and its end {
      while (1) {
        determine the bin this element should be in
        if it's in the right bin { break }
        else {
          swap it with the element at the right bin's current_position
          increment that bin's current_position
        }
      }
    }
}

I'm considering the case where the type is relatively large and swap
takes time proportional to its size, e.g. a large struct. Currently
your code (and my version above) will, on average, swap each element
twice: once to what's probably the wrong bin and then into the right
bin. If swap itself uses a temporary then each element is copied on
average 3 times. Maybe the compiler can optimise some of that, and the
cache may make some of those copies faster than others. But I guess
that it can probably be improved to nearer one copy per element.
Here's the idea:

We want to move element A to the position currently occupied by element
B. We could swap them, but moving B to A's position is probably not
useful. So let's first move B to a good position for it, and then move
A into its space. Of course some other element C will be occupying the
position where we want to put B, so we have to keep going until we find
an element that wants to be in the position where A was.

It's relatively easy to code a recursive version of this:

void circulate(Iter i) {
   // Do an N-way circular swap that eventually puts the element at i in
its right place,
   // and puts an element that wants to be at i there.
   *i = circulate_rec(i,i);
}

T circulate_rec(Iter source, Iter current) {
   // Move the element at current to its right position, and return an element
   // that can be stored at source.
   // Find current's right position:
   Iter target = right_bin_for(*current).current_position++;
   // Base case: if the element at target would be happy at position
source, that's easy:
   if (right_bin_for(*target) == bin_of(source)) {
     tmp = *target;

   } else {
     // Otherwise, first move target to a good position
     tmp = circulate_rec(source,target);
   }

   // Now actually copy the element at current to the vacant target
   *target = *current;
   return tmp;
}

The problem with that is that you might recurse a lot and fill your
stack, so in practice you'd want to limit it to a few iterations and
then give up and put a "wrong" value back at the source. Also you need
to be certain that 'tmp' gets some sort of return value optimisation
and isn't copied during each return. An explicitly unrolled iterative
version would probably be best:

Iter a = ...;
Iter b = right_bin_for(*a).current_position++;
if (right_bin_for(*b) != bin_of(a)) {
   Iter c = right_bin_for(*b).current_position++;
   if (right_bin_for(*c) != bin_of(a)) {
     Iter d = right_bin_for(*c).current_position++;
     if (right_bin_for(*d != bin_of(a)) {
       Iter e = right_bin_for(*d);
       tmp = *e;
       *e = *d;
     } else {
       tmp = *d;
     }
     *d = *c;
   } else {
     tmp = *c;
   }
   *c = *b;
} else {
   tmp = *b;
}
*b = *a;
*a = tmp;

Whether it actually improves performance will depend on quite how large
the structs are. It certainly won't help with ints. So let's see your
version that sorts structs!

Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk