
Boost : 
Subject: [boost] Review Request: Algorithm.Sorting
From: Steven Ross (spreadsort_at_[hidden])
Date: 20090427 10:32:23
I have uploaded algorithm_sorting.zip for a formal review to become
Boost.Algorithm.Sorting, if people feel that is appropriate.
It is 3 templated hybrid radixbased/comparisonbased algorithms,
specialized to sort certain types of data more efficiently.
All 3 have superior worstcase and averagecase performance to std::sort,
based upon my testing.
None of these algorithms use much additional memory.
All of them can sort key + data, and as they do fewer swaps, tend to do so
much faster than std::sort.
As a general rule, the speedup of these algorithms relative to std::sort
increases as data set size increases.
These algorithms tend to be faster relative to std::sort when comparisons
are slow, as they do less (or no) comparisons.
string_sort, which sorts strings in O(n*length) time, as compared to
O(n*length*log(n)) for pure comparisonbased algorithms.
It is consistently faster than std::sort in string sorting, by a little more
than a factor of 2 based upon my testing.
float_sort, which sorts IEEE floatingpoint numbers as integers, but
attaining the same ordering as std::sort,
with the exception of 0.0 and NaNs, both of which cannot be ordered based
upon their comparison function.
float_sort is roughly twice as fast as std::sort on UNIXtype OSes I've
tested on, and roughly 7X as fast on Windows.
I believe this speedup to be because floatingpoint comparisons are slow on
Windows.
integer_sort, which sorts integers around 60% faster than std::sort on
UNIXtype OSes, and 25% faster on Windows.
On Windows with MSVC, std::sort sorts alreadysorted and small range data
extremely quickly,
and integer_sort processes alreadysorted data slower than it by as much as
a factor of 3X.
I believe this to be due to some optimization for alreadysorted data,
as integer_sort takes substantially less absolute time on alreadysorted
data than unsorted data.
So integer_sort on Windows has a tradeoff: better worstcase and
averagecase performance,
but worse bestcase performance.
integer_sort and float_sort share some tuning parameters.
I have set values that seem to work well on both UNIXtype and Windows OSes,
but people are welcome to try to see what values are best for their system
and problem type.
The ideal values for integers and floats may be different on some systems
(such as Windows),
but for simplicity they are kept the same.
A tuning/testing script named tune.pl is included,
for those who wish to see how fast these algorithms are on their system (and
verify correctness).
I don't recommend using it for tuning without the large option, and that
takes days.
If you do use it for tuning, I also recommend the verbose option, to see
what it's doing.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk