|
Boost Announcement :
|
This is the results from the
recent review of the Sort library of Steven Ross.
First I would like to thank all those who made comments during the
review, whether or not they officially gave a final Yes or No vote
to whether the Sort library should be accepted as a Boost library.
This list includes:
Niall Douglas, Julian Gonggrijp, Phil Endecott, Vladimir Prus,
Mathias Gaunard, Jeremy Murphy, Peter Dimov, Robert Ramey, Adam
Walling, Anthony Polukhin, Phil Endecott, Paul Bristow, Thijs
(M.A.) van den Berg
Dupuis Etienne, and Frank Gennari
If I have missed anyone I do apologize.
Secondly I would like to thank Steven Ross for patiently answering
all of the review comments to the best of his ability.
My tally of Yes and No votes for acceptance are:
Yes votes (5) :
Niall Douglas ( conditional ), Julian Gonggrijp ( conditional ),
Frank Gennari, Phil Endecott, Paul Bristow.
No votes (3) :
Vladimir Prus, Adam Walling ( conditional ), Anthony Polukhin
I believe the condtional Yes vote from Julian Gonggrijp was
completely met in the discussions of issues with Steven Ross and
the conditional Yes vote from Niall Douglas was almost completely
met in the discussion with Steven Ross about the issues mentioned,
and enough to maintain his Yes vote.
I want to also mention that the conditional No vote by Adam
Walling has an implied Yes vote to it if the Sort library were one
among other sort implementations.
Since my final decision is not entirely based on the Yes and No
votes I want to adumbrate some of the major issues brought up by
the review without necessarily focussing on every one of the
people who brought them up initially, as well as my own reactions
to them as Review Manager.
1) The first major issue was whether a library whose basic merit
lies in its algorithm and its speed/space constraints needs better
theoretical backing. A number of reviewers discussed this after it
was brought up. I tend to agree with reviewers that while the
best theoretical basis is always desirable, it is not necessary
for a Boost library whose empirical evidence can be and has been
measured by its implementor and can be measured by any user.
Furthermore Steven Ross has provided an extensive discussion in
his documentation, as well as his original paper, about the
theoretical merits of his technique. While much of this discussion
is probably beyond the understanding of any but sorting experts
and afficionados enough of it adequately explains the basic ideas
behind SpreadSort for those who understand and have knowledge of
basic sorting ideas and popular sorting techniques.
2) An important issue is the need for the Sorting library to
provide extensive timing charts/tables comparing SpreadSort to at
least std::sort ( and possibly other popular mainstream sorts )
with both different numbers of sort keys and different initial
unsorted distributions. Steven Ross has been providing these
charts/tables in the 'develop' branch of the library.
3) Connected to the previous issue is that a number of reviewers
asked for better documentation on the conditions in which
SpreadSort will show a marked improvement in speed compared to
std::sort, as well as any condition(s) where SpreadSort performs
worse than std::sort. While Steven Ross has explained that
generally SpreadSort will fall back to using std::sort for some
cases ( low N ), I too feel that the documentation needs to be
more extensive for the average user in attempting to delineate the
conditions where SpreadSort works best.
4) A really important issue discussed by a number of reviewers
involves the creation of a synthetic radix, where most users of
sorting algorithm are used to comparison based sorting functors.
While the documentation provides information on the various
individual sorts, as well as examples, I also feel that a
discussion in the documentation of how to use SpreadSort when
sorting structs/classes with multiple keys of different types
would be invaluable. In other words, what and if are the common
techniques for using SpreadSort with complicated keys which, when
using comparison-based sorting, can be represented fairly easily
be a functor.
5) Niall Douglas brought up the issue of exception guarantees when
using SpreadSort, and I concur. The documentation should be
stronger specifying what happens during the sorting algorithms if
an exception occurs and/or does the sorting algorithm itself ever
throw any exceptions and, if so, under what conditions. Steven
Ross mentioned that he is adding exception guarantee information
to his docs.
6) Many reviewers did not like the idea that the library should be
called Sort when it is basically implementing a single sorting
algorithm called SpreadSort. Alternate suggestions are:
a) The library should be called SpreadSort.
b) The library can be called Sort only if other sorting algorithms
are added to it.
c) The library can be called Sort with the proviso that it serves
as a general library for sorting algorithms potentially added to
it in the future.
d) The library should be part of the Boost Algorithm library.
Steven Ross mentioned other potential sorting implementations as
part of the Sort library, under that general name, but all
providing a good deal of extra work. These included a copy-based
stable sorting algorithm, an LSD radix sorting, and an OpenMP
based parallel implementation. Other reviewers mentioned some of
the same types of sorting implementations and a few reviewers
remarked that they also have created sorting implementations of
their own. While I will give my own view at the end, I would like
to mention here that I strongly believe that each sorting
implementation needs to be reviewed separately for acceptance into
Boost and never automatically added to a general library, whether
it be called Sort or Algorithm or whatever.
7) Random issues brought up in comments I will mention, which have
importance but which I do not believe are absolutely intrinsic yet
still might be considered.
a) C++11 techniques in general and inline namespaces in
particular.
b) A version that can work with less than random access.
c) Optimizations for small sets of data ( Steven Ross has already
addressed this to an extent in the 'developer' branch of the
library ).
d) Possible documentation of using SpreadSort as a subsort for
parallel sorting techniques and possible discusion in the docs
about SpreadSort and parallel sorts.
e) General O() information in docs.
f) SpreadSort examples hoisted directly into docs using Quickbook
technique and updating the Spreadort docs to use Quickbook.
Review Result:
Based on the voting for acceptance or not, based on the
discussions during the review period, and based on Steven Ross's
responses, I have decided that Steven Ross's library should be
accepted into Boost.
My main reason for my decision, aside from the votes and comments
and specific issues discussed, is twofold:
1) There is a general consensus that Steven Ross has an excellent
understanding of sorting techniques and algorithms and will
support and refine his library for the benefit of programmers.
2) The empirical tests show that SpreadSort offers an improvement
over std::sort, significantly in the area of large data sets, and
this could be a major benefit to programmers.
I do not know if the review manager has a say in this but based on
the remarks of the reviewers I would like to see the library as
accepted be named 'SpreadSort' rather than just 'Sort'. I do think
that Boost can have a library called 'Sort' but I agree with the
general consensus that a 'Sort' library needs more than one type
of sorting algorithm. I would like to see other people, who
mentioned in the reviews/comments that they have their own sorting
implementation, also submit their own implementations to Boost
and, if this happens and they are accepted, I can see combining
them with 'SpreadSort' into a general Boost sorting library called
'Sort' in the future.
Boost-announce list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk