From: Malte Skarupke (malteskarupke_at_[hidden])
Date: 2020-01-30 02:14:13
> Francisco Tapia, fjtapia_at_[hidden] wrote:
> In addition to all this, I would like to express my discomfort at having to
> waste my time, which, as I said before, is very scarce, in repeating a
> test, which does not contribute anything, because it has not followed any
> of our recommendations.
I'm sorry as well that you had to waste your time, but I also don't know what I could have done here... I wasn't on the mailing list at the time and didn't see those recommendations until you just pointed them out to me. But a simple "hey, I think we already looked at this a year ago and here is a link to that discussion" would have been perfectly fine.
Alternatively, if you are having a discussion about one of my algorithms, include me in the discussion. I would have loved to have known about the discussion when it first happened.
So, having now finally heard your recommendations, here are my responses:
> It looks like ska_sort is a radix sort designed mostly for the best-case,
> and that if you feed it the alrbreaker or binaryalrbreaker distributions it
> will perform poorly (though not as poorly as radix implementations with no
> fallback or that fallback to insertionsort).
I can try to optimize more for these cases. I did of course handle those cases without degrading performance too much (and when sorting containers I skip shared prefixes to make this less likely) but I didn't try at all to optimize this. I didn't think that these were common cases, so I figured "just make sure performance doesn't degrade too much" would be enough. If you happen to remember, do you know what spreadsort does for cases that are traditionally really bad for radix sort? Maybe I can use the same optimization.
> It is not possible to pass a part of the library to C++17 and another part
> to remain with C++11. Passing it all to C++17 means leaving out many people
> and many companies, and that goes against the spirit of the library which
> is to reach as many users as possible.
Can you elaborate on that? Why is this not possible? Is there anything I can do to help? For example I could remove the ska_sort.hpp include from the common boost/sort/sort.hpp header. That way anyone who includes that header will continue to compile with C++11, and to get ska_sort you would have to directly include boost/sort/ska_sort/ska_sort.hpp.
> 2.- The algorithm has not implemented worst-case control, nor when all
> elements are the same or are ordered, as it was indicated in February.
It looks like spreadsort handles both of these by essentially doing std::is_sorted() on the input before doing any other work. I wouldn't want to do that in ska_sort because it slows down all the partially sorted cases to first have to do this check. But if you want, I can add an overload:
void ska_sort_check_sorted(It begin, It end)
if (!std::is_sorted(begin, end))
With that overload it would be up to the user to decide if they want this check or not. They could either call this overload or ska_sort directly. But I still would prefer to not do that. My philosophy on this follows iammilind's response on this stack overflow question:
> 3.- The algorithm is incomplete. It is to be expected from the algorithm
> that it orders, not only integers but also signed and unsigned integers, 32
> 64 and 128-bit floating-point numbers and strings.
It already sorts all kinds of integers as well as float, double and long double, strings, tuples, vectors, variants, optionals, lists, maps and other types that I'm forgetting. I can add support for 128 bit floating point numbers (it's exactly the same logic as for long doubles) but it doesn't seem that widely used. Microsoft doesn't even support it in their compiler. Also spreadsort does not support it.
>4.- The only argument of the algorithm is its speed. About this, I have
> several comments
> a) When the implementation incorporates worst-case control of repeated and
> equal elements, the speed may change
See discussion about why I don't do this above. I expect it to slow down if I call std::is_sorted() first and if the user already knows that it's not sorted, they don't want that overhead.
> b) The speed of an algorithm is a factor to be taken into account, but it
> is not the only one. For example, the Timsort algorithm is slower than any
> other algorithm with random data. But it is extremely fast when the data
> are almost ordered, a situation that occurs with much more frequency than
> we imagine. This algorithm is highly valued by programmers and companies,
> and even the Java language uses it for its sort function.
> Ska_sort is very fast with the random data, but it is quite mediocre when
> the data are almost ordered.
According to your own benchmark results from a year ago, ska_sort does better on almost ordered data than spreadsort and pdqsort. But I confess that it's not a case that I optimized for, mainly because I don't know a good way to make radix sorts behave well on almost sorted data. But if this is a requirement for all new sorting algorithms, I would probably look into wrapping ska_sort in vergesort:
However if I were to do that, it almost feels like it should be a separate algorithm. Just like pdqsort and spinsort are separate algorithms. Between those two, if users want something that works fast on almost sorted data, they can use spinsort, otherwise they can use pdqsort.
And if we follow that thinking, and we say that the version that works well on almost sorted data is a separate algorithm, then what's the benefit of doing that over just telling people to use spinsort on almost sorted data? (which is what the documentation currently does)
> 5) Having a hybrid algorithm like Spreadsort, having another hybrid
> algorithm is not very interesting, because with small data sets, the time
> differences, if any, will be small, and with large data sets, parallel
> algorithms are used, which provide much more speed than any single-stranded
I am also thinking about providing a parallel algorithm, but it's a large amount of work. These algorithms take me months to write. So before I do that, this submission of ska_sort is kind of my testing ground to see if this is something I'm interested in doing again. If this goes well, I may work on a parallel version in the future.
If you're worried about having to maintain two different sorting algorithms, I think that we might be able to deprecate spreadsort and to make it just call ska_sort internally. Since the ska_sort interface can sort more than the spreadsort interface, I think it should be possible to keep all old users working. (with the one complication that ska_sort uses C++17 of course...)
As for whether people care about single threaded performance: In my environment almost all sorts are single threaded. Most cores already have work allocated on them, so if you want to "go wide" for a piece of code, you need to know that there are gaps on other threads running in parallel. If you're not sure about that, just use a single threaded algorithm. Because if the other cores are busy, your parallel tasks get put at the end of the task queue and you've just increased your tail latency a lot. And the performance does really matter. We have used ska_sort to speed things up to run in 1.8 milliseconds instead of 3.3 milliseconds, and that was a big deal. In video games you only have 16ms to do all your work to get something on the screen, so a speedup by a whole millisecond is rare.
But there is a bigger point for why we should have a new algorithm: I believe that in the future everyone will use radix sort. There is a unique opportunity here for boost to define the interface that those sorting algorithms will use in the future. The spreadsort interface is certainly not the right interface for a generic radix sort, so we should use this opportunity to come up with one.
-- Malte Skarupke malteskarupke_at_[hidden]
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk