|
Boost : |
From: Malte Skarupke (malteskarupke_at_[hidden])
Date: 2019-10-20 21:22:25
Thanks for the replies. I finished a first pass of the documentation, tests, and other work of including my algorithm in boost and uploaded it here:
https://github.com/skarupke/sort/tree/master
I also re-ran all the single threaded benchmarks and updated the documentation.
For the big O complexity: I am actually still not quite sure what it is. I used similar notation to what spreadsort uses. I think O(n*key_length) is correct for the worst case, but for long keys (like when sorting long strings) the algorithm may fall back to comparison based sorting, so I used the same thing as spreadsort: min(O(n log n), O(n key_length)).
For speed: ska_sort should be the fastest sorting algorithm even for small numbers of items. I included three graphs of benchmarks, for sorting ints, floats and strings:
https://github.com/skarupke/sort/blob/master/doc/images/sorting_ints.png
https://github.com/skarupke/sort/blob/master/doc/images/sorting_strings.png
https://github.com/skarupke/sort/blob/master/doc/images/sorting_floats.png
For ints and strings ska_sort is always fastest, for floats pdqsort is briefly faster, but starting at N >= 128 ska_sort is faster. Part of the work of this new version of ska_sort was to make radix sort really fast even for small N. Before I would use std::sort for any small array of less than 128 items. Now that cutoff is at 32 items, so I skip straight to insertion sort and never do std::sort.
The biggest effect of this change is that it removes the "waves" from the benchmark graph, like we still see for spreadsort in the above picture. I used to also have those when I gave a talk about this algorithm. Now the graph is much more smooth. (this also confuses my answer to the question "what is the big O complexity of this algorithm?" Because the O(n*b) answer that I gave in my talk was justified by the presence of the waves)
The biggest questions I have at this point are around the interface. I reworked that completely so that even relatively advanced use cases are now pretty straightforward. I don't think that anyone else will ever need to use those advanced features, so I kept the documentation of them small, choosing instead to focus on the common use cases. And I made sure to give lots of examples of converting existing comparison-based sorting examples to my interface.
Oh and for the big point of a parallel version of this algorithm: At this point I have no idea how to do that. I haven't even begun to look into parallel algorithms. The current version was a lot of work already. I think that'll have to be a project for another time. (it's definitely appealing though since currently the sorted algorithms are only 2.4 times as fast as ska_sort on a machine with 8 cores/16 hyper threads in benchmark_numbers.cpp, so even a little bit of parallelism would beat the existing algorithms...)
Francisco, since you'll be busy in the near term it sounds like I can't expect too much response for another month, but I would really appreciate it if you could look at it a little just to see if anything obvious stands out to you that I can work on while I wait. (otherwise no biggie, I understand how hard it can be to find time for maintaining projects, since that is exactly one of the reasons why I want to get this into boost...)
Thanks,
Malte
-- Malte Skarupke malteskarupke_at_[hidden] > Date: Sun, 20 Oct 2019 13:24:06 +0200 > From: Francisco Jos? Tapia <fjtapia_at_[hidden]> > To: boost_at_[hidden] > Subject: Re: [boost] Adding my sorting algorithm, ska_sort to boost > Message-ID: > <CADBNGqQJpHux3AdRJON-nh76BaouRn1FRO4_LrMR76TywmCH+g_at_[hidden]> > Content-Type: text/plain; charset="UTF-8" > > Hi Malte, > I am Francisco Tapia, one of the co-authors of the Sort library. > We will examine your algorithm. But we must examine the internal parts of > the algorithm, and evaluate the correctness of the algorithm, how is > implemented the worst case and a couple of details. After, we check the > speed with the benchmarks used in the library, with several sets of data. > Since the near sorted to the fully unsorted, and from the single numbers to > structs with heavy comparison functions. > > Now, I am overburdened by work, and I can't revise until December. But we > will do and provide to you a reasoned response, and according to it, decide > the inclusion or not in the library. > As commented by Phil Endecott, the most interesting now is the parallel > algorithms. Today, we don't have any parallel algorithm based on radix sort > or any other hybrid algorithm. > Because with small sets of data the differences are insignificant. But when > you have big sets of data, use parallel algorithms, and then, the new > single thread algorithms are in the middle, in the no man's land. > > If you have any doubt or question, contact me, and I will try to answer you. > > El mar., 15 oct. 2019 a las 23:06, Kostas Savvidis via Boost (< > boost_at_[hidden]>) escribi?: > > > > > > On Oct 13, 2019, at 03:06, Malte Skarupke via Boost < > > boost_at_[hidden]> wrote: > > > > > > Hi, > > > a couple of years ago I wrote a generalized version of radix sort, > > called ska_sort. I gave a talk about it at CppNow, available here: > > > https://www.youtube.com/watch?v=zqs87a_7zxw > > > > > > I have a new version of that algorithm that is both faster and more > > general. I can now sort almost anything. For example when I gave the talk I > > couldn't sort a std::vector<std::list<int>> because std::list doesn't have > > a random access interface, but now that works. (and it's faster than > > sorting the same thing with std::sort) > > > > > > > This seems like a product of a large investment of effort. > > The big advantage over any previously available radix_sort > > is the ability to sort almost any datatype if a key maps to an integer > > somehow. > > Plus, an advantage in measured practical speed, and apparently no claims > > about Big O improvement. > > > > The average performance is O(n*b) where b is unknown and/or dependent on n > > ? > > > > Is the theoretical worst-case O(n^2) ? Why? Is it because of working > > in-place? > > (yes, I understand you fall back to std::sort if you hit this case) > > > > Is this software/algorithm fully documented? - I mean by another way than > > reading the source code. > > > > Maybe too many questions at once, but it was not clear after watching your > > talk at cppcon and the blog posts. > > > > Cheers, > > Kostas > > > > <https://www.youtube.com/watch?v=Nz1KZXbghj8> > > > > _______________________________________________ > > Unsubscribe & other changes: > > http://lists.boost.org/mailman/listinfo.cgi/boost > > > > > ------------------------------ > > Subject: Digest Footer >
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk