|
Boost : |
Subject: [boost] [review] [sort] Sort library review manager results
From: Edward Diener (eldiener_at_[hidden])
Date: 2014-11-26 20:32:00
This is the results from the recent review of the Sort library of Steven
Ross.
First I would like to thank all those who made comments during the
review, whether or not they officially gave a final Yes or No vote to
whether the Sort library should be accepted as a Boost library. This
list includes:
Niall Douglas, Julian Gonggrijp, Phil Endecott, Vladimir Prus, Mathias
Gaunard, Jeremy Murphy, Peter Dimov, Robert Ramey, Adam Walling, Anthony
Polukhin, Phil Endecott, Paul Bristow, Thijs (M.A.) van den Berg
Dupuis Etienne, and Frank Gennari
If I have missed anyone I do apologize.
Secondly I would like to thank Steven Ross for patiently answering all
of the review comments to the best of his ability.
My tally of Yes and No votes for acceptance are:
Yes votes (5) :
Niall Douglas ( conditional ), Julian Gonggrijp ( conditional ),
Frank Gennari, Phil Endecott, Paul Bristow.
No votes (3) :
Vladimir Prus, Adam Walling ( conditional ), Anthony Polukhin
I believe the condtional Yes vote from Julian Gonggrijp was completely
met in the discussions of issues with Steven Ross and the conditional
Yes vote from Niall Douglass was almost completely met in the discussion
with Steven Ross about the issues mentioned, and enough to maintain his
Yes vote.
I want to also mention that the conditional No vote by Adam Walling has
an implied Yes vote to it if the Sort library were one among other sort
implementations.
Since my final decision is not entirely based on the Yes and No votes I
want to adumbrate some of the major issues brought up by the review
without necessarily focussing on every one of the people who brought
them up initially, as well as my own reactions to them as Review Manager.
1) The first major issue was whether a library whose basic merit lies in
its algorithm and its speed/space constraints needs better theoretical
backing. A number of reviewers discussed this after it was brought up. I
tend to agree with reviewers that while the best theoretical basis is
always desirable, it is not necessary for a Boost library whose
empirical evidence can be and has been measured by its implementor and
can be measured by any user. Furthermore Steven Ross has provided an
extensive discussion in his documentation, as well as his original
paper, about the theoretical merits of his technique. While much of this
discussion is probably beyond the understanding of any but sorting
experts and afficionados enough of it adequately explains the basic
ideas behind SpreadSort for those who understand and have knowledge of
basic sorting ideas and popular sorting techniques.
2) An important issue is the need for the Sorting library to provide
extensive timing charts/tables comparing SpreadSort to at least
std::sort ( and possibly other popular mainstream sorts ) with both
different numbers of sort keys and different initial unsorted
distributions. Steven Ross has been providing these charts/tables in the
'develop' branch of the library.
3) Connected to the previous issue is that a number of reviewers asked
for better documentation on the conditions in which SpreadSort will show
a marked improvement in speed compared to std::sort, as well as any
condition(s) where SpreadSort performs worse than std::sort. While
Steven Ross has explained that generally SpreadSort will fall back to
using std::sort for some cases ( low N ), I too feel that the
documentation needs to be more extensive for the average user in
attempting to delineate the conditions where SpreadSort works best.
4) A really important issue discussed by a number of reviewers involves
the creation of a synthetic radix, where most users of sorting algorithm
are used to comparison based sorting functors. While the documentation
provides information on the various individual sorts, as well as
examples, I also feel that a discussion in the documentation of how to
use SpreadSort when sorting structs/classes with multiple keys of
different types would be invaluable. In other words, what and if are the
common techniques for using SpreadSort with complicated keys which, when
using comparison-based sorting, can be represented fairly easily be a
functor.
5) Niall Douglass brought up the issue of exception guarantees when
using SpreadSort, and I concur. The documentation should be stronger
specifying what happens during the sorting algorithms if an exception
occurs and/or does the sorting algorithm itself ever throw any
exceptions and, if so, under what conditions. Steven Ross mentioned that
he is adding exception guarantee information to his docs.
6) Many reviewers did not like the idea that the library should be
called Sort when it is basically implementing a single sorting algorithm
called SpreadSort. Alternate suggestions are:
a) The library should be called SpreadSort.
b) The library can be called Sort only if other sorting algorithms are
added to it.
c) The library can be called Sort with the proviso that it serves as a
general library for sorting algorithms potentially added to it in the
future.
d) The library should be part of the Boost Algorithm library.
Steven Ross mentioned other potential sorting implementations as part of
the Sort library, under that general name, but all providing a good deal
of extra work. These included a copy-based stable sorting algorithm, an
LSD radix sorting, and an OpenMP based parallel implementation. Other
reviewers mentioned some of the same types of sorting implementations
and a few reviewers remarked that they also have created sorting
implementations of their own. While I will give my own view at the end,
I would like to mention here that I strongly believe that each sorting
implementation needs to be reviewed separately for acceptance into Boost
and never automatically added to a general library, whether it be called
Sort or Algorithm or whatever.
7) Random issues brought up in comments I will mention, which have
importance but which I do not believe are absolutely intrinsic yet still
might be considered.
a) C++11 techniques in general and inline namespaces in particular.
b) A version that can work with less than random access.
c) Optimizations for small sets of data ( Steven Ross has already
addressed this to an extent in the 'developer' branch of the library ).
d) Possible documentation of using SpreadSort as a subsort for parallel
sorting techniques and possible discusion in the docs about Spreadort
and parallel sorts.
e) General O() information in docs.
f) SpreadSort examples hoisted directly into docs using Quickbook
technique and updating the Spreadort docs to use Quickbook.
Review Result:
Based on the voting for acceptance or not, based on the discussions
during the review period, and based on Steven Ross's responses, I have
decided that Steven Ross's library should be accepted into Boost.
My main reason for my decision, aside from the votes and comments and
specific issues discussed, is twofold:
1) There is a general consensus that Steven Ross has an excellent
understanding of sorting techniques and algorithms and will support and
refine his library for the benefit of programmers.
2) The empirical tests show that SpreadSort offers an improvement over
std::sort, significantly in the area of large data sets, and this could
be a major benefit to programmers.
I do not know if the review manager has a say in this but based on the
remarks of the reviewers I would like to see the library as accepted be
named 'SpreadSort' rather than just 'Sort'. I do think that Boost can
have a library called 'Sort' but I agree with the general consensus that
a 'Sort' library needs more than one type of sorting algorithm. I would
like to see other people, who mentioned in the reviews/comments that
they have their own sorting implementation, also submit their own
implementations to Boost and, if this happens and they are accepted, I
can see combining them with 'SpreadSort' into a general Boost sorting
library called 'Sort' in the future.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk