Boost logo

Boost :

Subject: Re: [boost] [review] Formal review period for Sort library continuing through November 19
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-11-16 23:02:06


Thank you for your feedback Julian. I'll improve the documentation.

On Sun Nov 16 2014 at 9:22:36 PM Julian Gonggrijp <j.gonggrijp_at_[hidden]>
wrote:

>
> > - What is your evaluation of the design?
>
> I’m not convinced that string_sort, float_sort and integer_sort need to
> be separate interfaces. I believe that the function to extract “digits”
> from the values for the MSD part of the algorithm can be captured in a
> fully generic way (that should be applicable to all radix-like sorting
> algorithms), coupled with two additional, optional parameters (pre-
> transform and post-transform) specifying functions for doing the
> temporary “treat floats as integers” trick.
>
> string_sort is this fully generic approach. integer_sort and float_sort
can be seen as specializations that allow bit-level granularity of the
radix operations and work well for keys that fit into a machine word. If
you can think of some way to enable the sort-floats-as-ints trick as an
option without harming performance for non-floats, I'd be very interested.

> The rationale, which in my opinion is the most important part of the
> documentation and should be in the front, is too lengthy and misses its
> target. It does discuss the theoretical advantages of radix sort as well
> as the (technical) relative merits of MSD vs LSD and several hybrid
> sorting algorithms, to great length, but it does not address the question
> why somebody might want to use a hybrid radix/comparison sort in the
> first place. It seems to be a defense of design decisions rather than a
> justification of the very existence of the library.
>
> The introduction severely and needlessly overstates the value of the
> algorithm, starting from the very first sentence:
>
> “The Boost Sort library provides a generic implementation of high-speed
> sorting algorithms that outperform those in the C++ standard in both
> average and worst case performance.”
>
> strongly implying that spreadsort is fit for replacing introsort
> (std::sort) in all possible situations, which it just isn’t.
>
> Then the introduction mentions three conditions under which spreadsort is
> “fastest relative to std::sort”, but it should instead mention two
> conditions under which a user may want to consider using it at all:
>
> 1. Value type can be treated as a sequence of digits/characters.
>

Find me a type that you don't think this applies to, and I'll show you how
it can. If std::sort can sort it, so can string_sort if you use the
available functors properly. Maybe I should explain this in the
documentation?

I will agreed to the extent that if performance isn't very critical
relative to the code complexity of creating an efficient radix key
expansion equivalent to the comparison operation, then people probably are
better off with std::sort.

> 2. Many keys to sort and/or very long keys.
>
> I agree, if most of your sorting runtime is going to spent on lists of <
1000 elements, this library is of little use, though it should also do
little harm due to the fallback to std::sort. I'll edit the docs in
develop to make that more clear.

> So to summarize this part of my review, I think that the documentation is
> insufficient as long as it doesn’t properly address the following points
> in the following order:
>
> 1. Why and under which conditions the library may be useful.
> 2. How to use the library.
> 3. What is going on inside the library (and why in that way).
>
> The good news is that parts 2 and 3 have already been provided for and
> that part 1 should be very short. So I think relatively little work is
> needed to fix the documentation.
>
> Will do.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk