Boost logo

Boost :

Subject: Re: [boost] radix sort
From: Steven Ross (spreadsort_at_[hidden])
Date: 2013-07-21 10:58:29


Just to be clear, Steven's code is not actually radix sort, it's bucket
> sort, right? Which is great, but let's not get confused about what's what.
>
>
It is MSD radix sort, which is recursive bucket sort, combined with
switching to
std::sort once the problem is cut up small enough that std::sort has better
worst-case performance. Conventional MSD radix sorts use insertion sort
for
small problems instead, and switch at much smaller data sizes, which can be
problematic for performance with data that is unevenly distributed
(conventional MSD radix sorts take about 4X as long as std::sort in this
case).

> Steven, I gave it a quick test but it failed when trying to include
> <boost/static_warning.hpp>. Looks like Robert Ramey changed the names of
> things around 1.37.0. (I tested with 1.53.0.) Changing it
> to <boost/serialization/static_warning.hpp> fixed it up.
>
> Looks like <vector>, <cstring> and <limits> are unnecessary includes in
> some of the files?
>

Thanks, I'll clean those up.

>
> Otherwise, it worked, and was much faster than std::sort(), so well done.
> :)
>

Thanks.

> I'm not real sure about the interface, though: do they have to be different
> named functions (integer_, float_, string_)? Can't it be called
> bucket_sort() and specialized for different types? And reverse sorting,
> isn't that done by passing reverse iterators?
>
> Reverse sorting could be done by passing reverse iterators, though that
eliminates the possibility of using raw pointers, which are slightly faster
with some of the compilers I tested. I'd be happy to remove the reverse
calls if there is general agreement that the performance difference isn't a
problem. With regards to different types, this covers three different
cases:
sorting by an integer or integer-like key of finite length, sorting a float
correctly by casting it to an integer (tricky but fast), and sorting
strings of
variable length. Specializing by type isn't ideal as people often want to
sort
larger data types by a key, and the key could itself be a more complex data
type that isn't an integer, float, or string, but supports the necessary
operators.

> Out of curiosity, I decided to tackle the problem and wrote an
> implementation of counting sort, which as you may know, is one way of
> implementing radix sort. The code is not ready for a formal review but I
> would really appreciate feedback on it, as it is written with a view to
> inclusion in Boost.
>
> An LSD radix sort could make sense as an additional option, and counting
sort would make sense as a part of that. LSD is substantially different
from
MSD radix sort, so it seems like a separate project to me.
LSD radix sort is often slower than std::sort on real problems because it
can't
take advantage of easy problems. It also uses more memory. and doesn't
make sense on variable-length data types. All that said, it is useful for
some
types of problems, such as when stability is a requirement.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk