Boost announcement mailing page: [Boost-announce] [review] [sort] Sort library review manager results

Subject: [Boost-announce] [review] [sort] Sort library review manager results
From: Edward Diener (eldiener_at_[hidden])
Date: 2014-11-26 23:57:16

This is the results from the recent review of the Sort library of Steven Ross.

First I would like to thank all those who made comments during the review, whether or not they officially gave a final Yes or No vote to whether the Sort library should be accepted as a Boost library. This list includes:

Niall Douglas, Julian Gonggrijp, Phil Endecott, Vladimir Prus, Mathias Gaunard, Jeremy Murphy, Peter Dimov, Robert Ramey, Adam Walling, Anthony Polukhin, Phil Endecott, Paul Bristow, Thijs (M.A.) van den Berg
Dupuis Etienne, and Frank Gennari

If I have missed anyone I do apologize.

Secondly I would like to thank Steven Ross for patiently answering all of the review comments to the best of his ability.

My tally of Yes and No votes for acceptance are:

Yes votes (5) :

Niall Douglas ( conditional ), Julian Gonggrijp ( conditional ),
Frank Gennari, Phil Endecott, Paul Bristow.

No votes (3) :

Vladimir Prus, Adam Walling ( conditional ), Anthony Polukhin

I believe the condtional Yes vote from Julian Gonggrijp was completely met in the discussions of issues with Steven Ross and the conditional Yes vote from Niall Douglas was almost completely met in the discussion with Steven Ross about the issues mentioned, and enough to maintain his Yes vote.

I want to also mention that the conditional No vote by Adam Walling has an implied Yes vote to it if the Sort library were one among other sort implementations.

Since my final decision is not entirely based on the Yes and No votes I want to adumbrate some of the major issues brought up by the review without necessarily focussing on every one of the people who brought them up initially, as well as my own reactions to them as Review Manager.

1) The first major issue was whether a library whose basic merit lies in its algorithm and its speed/space constraints needs better theoretical backing. A number of reviewers discussed this after it was brought up. I tend to agree with reviewers that while the best theoretical basis is always desirable, it is not necessary for a Boost library whose empirical evidence can be and has been measured by its implementor and can be measured by any user. Furthermore Steven Ross has provided an extensive discussion in his documentation, as well as his original paper, about the theoretical merits of his technique. While much of this discussion is probably beyond the understanding of any but sorting experts and afficionados enough of it adequately explains the basic ideas behind SpreadSort for those who understand and have knowledge of basic sorting ideas and popular sorting techniques.

2) An important issue is the need for the Sorting library to provide extensive timing charts/tables comparing SpreadSort to at least std::sort ( and possibly other popular mainstream sorts ) with both different numbers of sort keys and different initial unsorted distributions. Steven Ross has been providing these charts/tables in the 'develop' branch of the library.

3) Connected to the previous issue is that a number of reviewers asked for better documentation on the conditions in which SpreadSort will show a marked improvement in speed compared to std::sort, as well as any condition(s) where SpreadSort performs worse than std::sort. While Steven Ross has explained that generally SpreadSort will fall back to using std::sort for some cases ( low N ), I too feel that the documentation needs to be more extensive for the average user in attempting to delineate the conditions where SpreadSort works best.

4) A really important issue discussed by a number of reviewers involves the creation of a synthetic radix, where most users of sorting algorithm are used to comparison based sorting functors. While the documentation provides information on the various individual sorts, as well as examples, I also feel that a discussion in the documentation of how to use SpreadSort when sorting structs/classes with multiple keys of different types would be invaluable. In other words, what and if are the common techniques for using SpreadSort with complicated keys which, when using comparison-based sorting, can be represented fairly easily be a functor.

5) Niall Douglas brought up the issue of exception guarantees when using SpreadSort, and I concur. The documentation should be stronger specifying what happens during the sorting algorithms if an exception occurs and/or does the sorting algorithm itself ever throw any exceptions and, if so, under what conditions. Steven Ross mentioned that he is adding exception guarantee information to his docs.

6) Many reviewers did not like the idea that the library should be called Sort when it is basically implementing a single sorting algorithm called SpreadSort. Alternate suggestions are:

a) The library should be called SpreadSort.
b) The library can be called Sort only if other sorting algorithms are added to it.
c) The library can be called Sort with the proviso that it serves as a general library for sorting algorithms potentially added to it in the future.
d) The library should be part of the Boost Algorithm library.

Steven Ross mentioned other potential sorting implementations as part of the Sort library, under that general name, but all providing a good deal of extra work. These included a copy-based stable sorting algorithm, an LSD radix sorting, and an OpenMP based parallel implementation. Other reviewers mentioned some of the same types of sorting implementations and a few reviewers remarked that they also have created sorting implementations of their own. While I will give my own view at the end, I would like to mention here that I strongly believe that each sorting implementation needs to be reviewed separately for acceptance into Boost and never automatically added to a general library, whether it be called Sort or Algorithm or whatever.

7) Random issues brought up in comments I will mention, which have importance but which I do not believe are absolutely intrinsic yet still might be considered.

a) C++11 techniques in general and inline namespaces in particular.
b) A version that can work with less than random access.
c) Optimizations for small sets of data ( Steven Ross has already addressed this to an extent in the 'developer' branch of the library ).
d) Possible documentation of using SpreadSort as a subsort for parallel sorting techniques and possible discusion in the docs about SpreadSort and parallel sorts.
e) General O() information in docs.
f) SpreadSort examples hoisted directly into docs using Quickbook technique and updating the Spreadort docs to use Quickbook.

Review Result:

Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost.

My main reason for my decision, aside from the votes and comments and specific issues discussed, is twofold:

1) There is a general consensus that Steven Ross has an excellent understanding of sorting techniques and algorithms and will support and refine his library for the benefit of programmers.
2) The empirical tests show that SpreadSort offers an improvement over std::sort, significantly in the area of large data sets, and this could be a major benefit to programmers.

I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.

Boost Announcement :