|
Boost : |
Subject: Re: [boost] [review] Formal review period for Sort library continuing through November 19
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2014-11-16 21:22:22
Dear list, dear Edward, dear Steven,
First of all I would like to congratulate Steven for having his library
finally reviewed. I have been aware of the library for years.
> - How much effort did you put into your evaluation? A glance? A
> quick reading? In-depth study?
When I first learned about the existence of the library (as I said,
several years ago), I read the code and the documentation closely.
Today, I have quickly scanned through the previous discussion, re-read
the introduction and the rationale from the documentation and made a
quick glance over the code to check for any important changes.
> - Are you knowledgeable about the problem domain?
Yes, quite. I have strong formal education in algorithm design and
analysis. I also have a passion for sorting algorithms and I have been in
the process of writing my own C++ sorting library, though I have not had
the chance to work on the latter since early 2012. My own library mostly
contains well-known comparison sorts but it also contains a few
interesting innovations of my own. I have done a lot of benchmarking and
analysis of my own algorithm implementations so Im well aware of the
issues that are relevant in comparing sorting algorithms.
Please note that I do not consider my own library a competitor for Steven
Ross, though. The aims and contents of the libraries seem to be
sufficiently different that there can be a place for both. Apart from
that, Steven Ross library is in a much more production-ready state
than mine.
> - What is your evaluation of the design?
Im not convinced that string_sort, float_sort and integer_sort need to
be separate interfaces. I believe that the function to extract digits
from the values for the MSD part of the algorithm can be captured in a
fully generic way (that should be applicable to all radix-like sorting
algorithms), coupled with two additional, optional parameters (pre-
transform and post-transform) specifying functions for doing the
temporary treat floats as integers trick.
Apart from that, I think the design is sensible.
> - What is your evaluation of the implementation?
Clever.
> - What is your evaluation of the documentation?
Poor.
The rationale, which in my opinion is the most important part of the
documentation and should be in the front, is too lengthy and misses its
target. It does discuss the theoretical advantages of radix sort as well
as the (technical) relative merits of MSD vs LSD and several hybrid
sorting algorithms, to great length, but it does not address the question
why somebody might want to use a hybrid radix/comparison sort in the
first place. It seems to be a defense of design decisions rather than a
justification of the very existence of the library.
The introduction severely and needlessly overstates the value of the
algorithm, starting from the very first sentence:
The Boost Sort library provides a generic implementation of high-speed
sorting algorithms that outperform those in the C++ standard in both
average and worst case performance.
strongly implying that spreadsort is fit for replacing introsort
(std::sort) in all possible situations, which it just isnt.
Then the introduction mentions three conditions under which spreadsort is
fastest relative to std::sort, but it should instead mention two
conditions under which a user may want to consider using it at all:
1. Value type can be treated as a sequence of digits/characters.
2. Many keys to sort and/or very long keys.
This would be more honest about the scope of application, hence more
helpful to the potential user. I believe this does not invalidate the
library. Under the two conditions mentioned above, spreadsort can in fact
offer significant performance improvements. I think the latter has been
sufficiently proven.
The title of the library is also too pretentious, as other reviewers have
already pointed out. I think it should be called Spreadsort rather than
Sort.
So to summarize this part of my review, I think that the documentation is
insufficient as long as it doesnt properly address the following points
in the following order:
1. Why and under which conditions the library may be useful.
2. How to use the library.
3. What is going on inside the library (and why in that way).
The good news is that parts 2 and 3 have already been provided for and
that part 1 should be very short. So I think relatively little work is
needed to fix the documentation.
> - What is your evaluation of the potential usefulness of the
> library?
As I explained above, I think the library has somewhat narrow
applications, but it can be highly valuable when applicable. This is true
of most other sorting algorithms as well. Spreadsort is definitely useful
enough for inclusion in Boost, especially as long as no generic
implementations of other radix/comparison hybrids are available.
> - Did you try to use the library? With what compiler? Did you
> have any problems?
Not this year. I may have tested the library when I first learned about
its existence, but I dont have any clear memory of the results.
>
> And finally, every review should attempt to answer this question:
>
> - Do you think the library should be accepted as a Boost library?
Yes, sort of. I think spreadsort should be added to Boost.Algorithm
(please!), but first the documentation should be improved as I discussed.
While I think that the interface should be more general, that is not a
condition for acceptance as far as Im concerned.
I do not agree with Niall Douglas that the library should add fallbacks
for ranges that dont offer random access. In general, I think that if
you want to sort a range then you should ensure from the outset that it
is a random access range, with the exception of very short ranges, for
which quadratic complexity does not hurt, or linked lists, which can be
efficiently sorted but only with intrusive sorting methods. Neither
situation is relevant for this implementation of spreadsort.
Kind regards,
Julian Gonggrijp
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk