Boost logo

Boost :

From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-07-12 13:41:34


On Sat, Jul 12, 2008 at 8:48 AM, Phil Endecott <
spam_from_boost_dev_at_[hidden]> wrote:

> Steven Ross wrote:
>
>> Are there any more suggestions for the core algorithm?
>>
>
> Log2 can be done in O(log Nbits) steps, rather than O(Nbits).

I agree, but do you know a fast way to do so?

> I spent a while trying to understand this bit of code:
>
> //Swap into place
> //This dominates runtime, especially in the swap; std::sort calls are the
> other main time-user
> RandomAccessIter current = first;
> for(unsigned u = 0; current != last; ++u) {
> for(current_bin = (bins + ((*current >> log_divisor) - div_min));
> (current_bin->count > u);
> current_bin = bins + ((*current >> log_divisor) - div_min))
> std::swap(*current,
> *(current_bin->current_position++));
> //Now that we've found the item belonging in this position,
> increment the bucket count
> if(current_bin->current_position == current++)
> ++(current_bin->current_position);
> }
>
> You don't help understanding by changing the meaning of bin.count half way
> through :-(.

Would you like to

> Anyway, I think you're doing more work than necessary here. For each bin,
> the elements between the start of the bin and bin.current_position will be
> correct and can be skipped over; this will be on average half of the
> elements. This doesn't save any swaps but it does save the shifting and
> some comparisons. Pseudo-code:
>
> for each bin {
> for each element between this bin's current_position and its end {
> while (1) {
> determine the bin this element should be in
> if it's in the right bin { break }
> else {
> swap it with the element at the right bin's current_position
> increment that bin's current_position
> }
> }
> }
> }
>
>
> Have I understood this correctly?
> <http://lists.boost.org/mailman/listinfo.cgi/boost>
>

Good analysis.
Yes, you have, but there's a much faster way to do that. I already tried
your technique some time back, and it works well on my system, but because
of the extra nested loop does not optimize well on Intel processors (at
least the way I coded it).

There's a better way to do it, attached, and this time I ran my regression
tests, and it's about 11% faster (they're not quite done yet). Basically I
did what you're talking about, but without the extra loop over the bins,
just adding the else case:

for(unsigned u = 0; current != last;) {
        for(current_bin = (bins + ((*current >> log_divisor) - div_min));
(current_bin->count > u);
            current_bin = bins + ((*current >> log_divisor) - div_min))
                std::swap(*current, *(current_bin->current_position++));
        //Now that we've found the item belonging in this position,
increment the bucket count
        if(current_bin->current_position == current++) {
            ++(current_bin->current_position);
            ++u;
        }
        else {//skipping forward to the first unfinished element in the bin
            current = current_bin->current_position;
            u = current_bin->current_position - first;
        }
    }

I'll try your way again just to see if it's any faster than my newest
optimization.

Steve




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk