Boost logo

Boost :

Subject: Re: [boost] radix sort
From: Jeremy Murphy (jeremy.william.murphy_at_[hidden])
Date: 2013-07-22 01:05:05


On 22 July 2013 00:58, Steven Ross <spreadsort_at_[hidden]> wrote:

> It is MSD radix sort, which is recursive bucket sort, combined with
> switching to
> std::sort once the problem is cut up small enough that std::sort has better
> worst-case performance. Conventional MSD radix sorts use insertion sort
> for
> small problems instead, and switch at much smaller data sizes, which can be
> problematic for performance with data that is unevenly distributed
> (conventional MSD radix sorts take about 4X as long as std::sort in this
> case).
>

Ah, I apologize for the misunderstanding and thank you for clearing it up.

> Reverse sorting could be done by passing reverse iterators, though that
>
eliminates the possibility of using raw pointers, which are slightly faster
> with some of the compilers I tested. I'd be happy to remove the reverse
> calls if there is general agreement that the performance difference isn't a
> problem.

If performance is the only reason can you provide or link to some empirical
data? Presumably the same decision must have been broached with regard to
std::sort() and there is no std::reverse_sort(). This discussion on SO
suggests that with enough optimization all iterators are equal:
http://stackoverflow.com/questions/9025084/sorting-a-vector-in-descending-order
It also points out that reverse sorting can be achieved with forward
iterators by using the opposite sorting functor.

An LSD radix sort could make sense as an additional option, and counting
>
sort would make sense as a part of that. LSD is substantially different
> from
> MSD radix sort, so it seems like a separate project to me.
> LSD radix sort is often slower than std::sort on real problems because it
> can't
> take advantage of easy problems. It also uses more memory. and doesn't
> make sense on variable-length data types. All that said, it is useful for
> some
> types of problems, such as when stability is a requirement.

I certainly haven't done extensive testing, but so far the stable
counting-sort has been much quicker than std::sort(). It has even been
quicker than your code for small (< 1000) values of k (maximum value). :)
 The non-stable counting-sort (based on Rosetta Code) is faster still, so
I'm curious to see some extensive performance tests. My LSD radix-sort is
almost finished, I'll let you know when.

Cheers.

Jeremy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk