Boost logo

Boost :

Subject: [boost] Review Request: Algorithm.Sorting
From: Steven Ross (spreadsort_at_[hidden])
Date: 2009-04-27 10:32:23


I have uploaded algorithm_sorting.zip for a formal review to become
Boost.Algorithm.Sorting, if people feel that is appropriate.
It is 3 templated hybrid radix-based/comparison-based algorithms,
specialized to sort certain types of data more efficiently.
All 3 have superior worst-case and average-case performance to std::sort,
based upon my testing.
None of these algorithms use much additional memory.
All of them can sort key + data, and as they do fewer swaps, tend to do so
much faster than std::sort.
As a general rule, the speedup of these algorithms relative to std::sort
increases as data set size increases.
These algorithms tend to be faster relative to std::sort when comparisons
are slow, as they do less (or no) comparisons.

string_sort, which sorts strings in O(n*length) time, as compared to
O(n*length*log(n)) for pure comparison-based algorithms.
It is consistently faster than std::sort in string sorting, by a little more
than a factor of 2 based upon my testing.

float_sort, which sorts IEEE floating-point numbers as integers, but
attaining the same ordering as std::sort,
with the exception of -0.0 and NaNs, both of which cannot be ordered based
upon their comparison function.
float_sort is roughly twice as fast as std::sort on UNIX-type OSes I've
tested on, and roughly 7X as fast on Windows.
I believe this speedup to be because floating-point comparisons are slow on
Windows.

integer_sort, which sorts integers around 60% faster than std::sort on
UNIX-type OSes, and 25% faster on Windows.
On Windows with MSVC, std::sort sorts already-sorted and small range data
extremely quickly,
and integer_sort processes already-sorted data slower than it by as much as
a factor of 3X.
I believe this to be due to some optimization for already-sorted data,
as integer_sort takes substantially less absolute time on already-sorted
data than unsorted data.
So integer_sort on Windows has a tradeoff: better worst-case and
average-case performance,
but worse best-case performance.

integer_sort and float_sort share some tuning parameters.
I have set values that seem to work well on both UNIX-type and Windows OSes,

but people are welcome to try to see what values are best for their system
and problem type.
The ideal values for integers and floats may be different on some systems
(such as Windows),
but for simplicity they are kept the same.

A tuning/testing script named tune.pl is included,
for those who wish to see how fast these algorithms are on their system (and
verify correctness).
I don't recommend using it for tuning without the -large option, and that
takes days.
If you do use it for tuning, I also recommend the -verbose option, to see
what it's doing.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk