Subject: Re: [boost] sorting library proposal (Was: Review Wizard Status Report for June 2009)
From: Vladimir Prus (vladimir_at_[hidden])
Date: 2009-06-03 12:28:45


Steven Ross wrote:

> On Tue, Jun 2, 2009 at 11:02 PM, Jonathan Franklin <
> franklin.jonathan_at_[hidden]> wrote:
>>
>>
>> Radix sorting is indeed appropriate, IMO. However, there are many
>> "proven" radix sort algorithms, and (probably) many more "unproven"
>> algorithms. Seems to me that boost is a set of "generally useful"
>> libraries, and not a "proving ground" for new algorithms.
>>
>> The point in question is then: At what point is an algorithm
>> adequately "proven" for inclusion in a boost library?
>>
>
> There are multiple papers in the literature about Adaptive Left Radix, which
> is fast on random data (and like Spreadsort, it wasn't published a day ago;
> it's been over 6 years).

Can you give references to these papers and include them in your docs? I think
one of the references might be the link you previously posted:
http://www.nik.no/2002/Maus.pdf ?

> ALR has inferior worst-case performance because it
> uses insertion_sort as its fallback sort; worst-case performance is the top
> reason to use Spreadsort. Is insertion_sort to std::sort such a big
> complicated change that it can't be accepted?

If that's indeed the only change, then it's probably OK, as long as you explicitly
say that std::sort is guaranteed, by the C++ standard, to have the same or
better complexity than insertion sort. While insertion sort is O(n^2), std::sort
is only required to be O(n*log(n)) in the *average case*, so it is still
permitted to be O(n^2) in the worst case. Consequently, you are likely to get
better average performance, but better worst-case performance is not guaranteed,
unless you make some claims about the algorithm std::sort uses.
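
To make this concrete, here is a minimal sketch of the kind of structure under
discussion: an MSD radix sort that hands small bins to std::sort. This is
illustrative only, not Spreadsort's actual code; the function name and the
threshold are made up, and it assumes 32-bit unsigned ints.

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical cutoff; Spreadsort's real tuning constants differ.
const std::size_t fallback_threshold = 64;

// MSD radix sort of 32-bit unsigned values, one byte per pass.
// Small bins go to std::sort, which the standard requires to be
// O(n log n) on average; an insertion-sort fallback would be
// O(n^2) on its worst-case inputs.
void msd_radix_sort(std::vector<unsigned>& data, int shift = 24)
{
    if (data.size() < fallback_threshold) {
        std::sort(data.begin(), data.end());
        return;
    }
    // Distribute elements into 256 bins on the current byte.
    std::vector<std::vector<unsigned> > bins(256);
    for (std::size_t i = 0; i < data.size(); ++i)
        bins[(data[i] >> shift) & 0xFFu].push_back(data[i]);
    // Recurse on each non-empty bin, then concatenate in bin order.
    data.clear();
    for (int b = 0; b < 256; ++b) {
        if (bins[b].empty())
            continue;
        if (shift > 0) // bytes remain: keep radix-sorting this bin
            msd_radix_sort(bins[b], shift - 8);
        data.insert(data.end(), bins[b].begin(), bins[b].end());
    }
}

Swapping insertion sort for std::sort changes only that fallback path, which
is why the change itself looks small; the guarantee you gain from it is only
as strong as the standard's average-case requirement.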

> Going back to a higher level, it's easy to test a sorting algorithm for
> correctness, and radix sorting isn't that hard to understand; the general
> approach is not novel, just some optimizations and worst-case handling. It
> would get expensive to publish a paper for every little tweak I've stuck in
> a sorting algorithm, so depending on publications for every improvement is a
> great way to slow down progress, especially if you demand that people cite
> said publications.

> In closing, this has:
> insertion_sort -> std::sort
> a worst-case fallback check (integer_sort/float_sort), falling back to
> std::sort
> an optimization for long substrings (string_sort)
> float to integer cast (float_sort)
> assorted performance tweaks
>
> Are those changes really so dangerous?
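
On the float-to-integer cast in particular: the usual trick (sketched below as
a generic illustration, assuming 32-bit IEEE 754 floats with no NaNs; not
necessarily what float_sort actually does) is to reinterpret the float's bits
and flip them so that unsigned integer order matches floating-point order:

#include <cstring>
#include <boost/cstdint.hpp>

// Map a float's bit pattern to an unsigned integer whose numeric
// order matches the float's ordering. Assumes 32-bit IEEE 754
// floats and no NaNs in the input.
boost::uint32_t float_to_key(float f)
{
    boost::uint32_t u;
    std::memcpy(&u, &f, sizeof u); // well-defined way to read the bits
    if (u & 0x80000000u)
        return ~u;                 // negative: flip all bits
    return u | 0x80000000u;        // non-negative: flip sign bit only
}

Radix-sorting the resulting keys then orders the floats correctly while
letting the algorithm work on fixed-width integers.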

I don't understand most of these, and can't find a description of them. If you
give a paper describing the base algorithm, and then describe in detail the
refinements you made, that will enable reviewers and users to figure out
whether those refinements are safe enough and how they might affect
performance. Right now, the docs neither provide references nor explain your
refinements.

- Volodya
