From: Steven Ross (spreadsort_at_[hidden])
Date: 2021-11-24 03:05:17
On Wed, Jan 29, 2020 at 9:21 PM Malte Skarupke via Boost
> > Francisco Tapia, fjtapia_at_[hidden] wrote:
> > In addition to all this, I would like to express my discomfort at having to
> > waste my time, which, as I said before, is very scarce, in repeating a
> > test, which does not contribute anything, because it has not followed any
> > of our recommendations.
> I'm sorry as well that you had to waste your time, but I also don't know what I could have done here... I wasn't on the mailing list at the time and didn't see those recommendations until you just pointed them out to me. But a simple "hey, I think we already looked at this a year ago and here is a link to that discussion" would have been perfectly fine.
> Alternatively, if you are having a discussion about one of my algorithms, include me in the discussion. I would have loved to have known about the discussion when it first happened.
> So, having now finally heard your recommendations, here are my responses:
> > It looks like ska_sort is a radix sort designed mostly for the best-case,
> > and that if you feed it the alrbreaker or binaryalrbreaker distributions it
> > will perform poorly (though not as poorly as radix implementations with no
> > fallback or that fallback to insertionsort).
> I can try to optimize more for these cases. I did of course handle those cases without degrading performance too much (and when sorting containers I skip shared prefixes to make this less likely) but I didn't try at all to optimize this. I didn't think that these were common cases, so I figured "just make sure performance doesn't degrade too much" would be enough. If you happen to remember, do you know what spreadsort does for cases that are traditionally really bad for radix sort? Maybe I can use the same optimization.
Sorry, this whole email chain was filtered by gmail for some reason,
until I made a related search.
What you're looking for is the get_min_count function in spreadsort,
the main part that makes it unique. The source is in here:
If a bucket has less than this size, it will use comparison-based
sorting (or skip it if size < 2), using std::sort or pdqsort, which
are very fast with N < 1000, and provide O(Nlog(N)) worst-case
performance. If it has more elements than this size, it will recurse
again with radix sorting. There is a special case if it can complete
with 1 final radix iteration.
The logic is explained here:
It is really easy to beat spreadsort's performance on evenly
distributed random data if you simplify this step, but that sets your
code up for doing much worse than comparison-based sorting on certain
distributions. In fact, you can do this with spreadsort by tweaking
its constants: https://github.com/boostorg/sort/blob/develop/include/boost/sort/spreadsort/detail/constants.hpp
Some particularly nasty distributions you might want to try:
I'd be happy to take enhancements to spreadsort itself if you can
demonstrate they improve the overall performance.
> > It is not possible to pass a part of the library to C++17 and another part
> > to remain with C++11. Passing it all to C++17 means leaving out many people
> > and many companies, and that goes against the spirit of the library which
> > is to reach as many users as possible.
> Can you elaborate on that? Why is this not possible? Is there anything I can do to help? For example I could remove the ska_sort.hpp include from the common boost/sort/sort.hpp header. That way anyone who includes that header will continue to compile with C++11, and to get ska_sort you would have to directly include boost/sort/ska_sort/ska_sort.hpp.
> > 2.- The algorithm has not implemented worst-case control, nor when all
> > elements are the same or are ordered, as it was indicated in February.
> It looks like spreadsort handles both of these by essentially doing std::is_sorted() on the input before doing any other work. I wouldn't want to do that in ska_sort because it slows down all the partially sorted cases to first have to do this check. But if you want, I can add an overload:
> template<typename It>
> void ska_sort_check_sorted(It begin, It end)
> if (!std::is_sorted(begin, end))
> ska_sort(begin, end);
> With that overload it would be up to the user to decide if they want this check or not. They could either call this overload or ska_sort directly. But I still would prefer to not do that. My philosophy on this follows iammilind's response on this stack overflow question:
It also uses that loop to find the min and max; which isn't much
faster if you skip the sorting check. Finding the min and max help
substantially to speed up radix sorting on distributions that aren't
random, and make it much harder to construct the worst-case
distribution (I have code to do generate it, but it's a pretty weird
distribution). Also, sorted (or partially sorted) input to a sort is
a common case I've seen repeatedly in industry.
> >4.- The only argument of the algorithm is its speed. About this, I have
> > several comments
> > a) When the implementation incorporates worst-case control of repeated and
> > equal elements, the speed may change
> See discussion about why I don't do this above. I expect it to slow down if I call std::is_sorted() first and if the user already knows that it's not sorted, they don't want that overhead.
> > b) The speed of an algorithm is a factor to be taken into account, but it
> > is not the only one. For example, the Timsort algorithm is slower than any
> > other algorithm with random data. But it is extremely fast when the data
> > are almost ordered, a situation that occurs with much more frequency than
> > we imagine. This algorithm is highly valued by programmers and companies,
> > and even the Java language uses it for its sort function.
> > Ska_sort is very fast with the random data, but it is quite mediocre when
> > the data are almost ordered.
> According to your own benchmark results from a year ago, ska_sort does better on almost ordered data than spreadsort and pdqsort. But I confess that it's not a case that I optimized for, mainly because I don't know a good way to make radix sorts behave well on almost sorted data. But if this is a requirement for all new sorting algorithms, I would probably look into wrapping ska_sort in vergesort:
> However if I were to do that, it almost feels like it should be a separate algorithm. Just like pdqsort and spinsort are separate algorithms. Between those two, if users want something that works fast on almost sorted data, they can use spinsort, otherwise they can use pdqsort.
> And if we follow that thinking, and we say that the version that works well on almost sorted data is a separate algorithm, then what's the benefit of doing that over just telling people to use spinsort on almost sorted data? (which is what the documentation currently does)
> > 5) Having a hybrid algorithm like Spreadsort, having another hybrid
> > algorithm is not very interesting, because with small data sets, the time
> > differences, if any, will be small, and with large data sets, parallel
> > algorithms are used, which provide much more speed than any single-stranded
> > algorithm
> I am also thinking about providing a parallel algorithm, but it's a large amount of work. These algorithms take me months to write. So before I do that, this submission of ska_sort is kind of my testing ground to see if this is something I'm interested in doing again. If this goes well, I may work on a parallel version in the future.
> If you're worried about having to maintain two different sorting algorithms, I think that we might be able to deprecate spreadsort and to make it just call ska_sort internally. Since the ska_sort interface can sort more than the spreadsort interface, I think it should be possible to keep all old users working. (with the one complication that ska_sort uses C++17 of course...)
C++17 is a serious issue; we try to be compatible with older versions of C++.
Are you saying ska_sort can sort more without using functors (which
seems possible though I'm curious if this is possible without C++17),
or that it can sort more period, which seems doubtful as stringsort
can sort anything with a strict weak ordering with functors.
> As for whether people care about single threaded performance: In my environment almost all sorts are single threaded. Most cores already have work allocated on them, so if you want to "go wide" for a piece of code, you need to know that there are gaps on other threads running in parallel. If you're not sure about that, just use a single threaded algorithm. Because if the other cores are busy, your parallel tasks get put at the end of the task queue and you've just increased your tail latency a lot. And the performance does really matter. We have used ska_sort to speed things up to run in 1.8 milliseconds instead of 3.3 milliseconds, and that was a big deal. In video games you only have 16ms to do all your work to get something on the screen, so a speedup by a whole millisecond is rare.
> But there is a bigger point for why we should have a new algorithm: I believe that in the future everyone will use radix sort. There is a unique opportunity here for boost to define the interface that those sorting algorithms will use in the future. The spreadsort interface is certainly not the right interface for a generic radix sort, so we should use this opportunity to come up with one.
> Malte Skarupke
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk