Performance isn't only time. It's also memory.
Hello Peter, I had a little bit of spare time to run some quick and simple benchmarks. I can push a branch to github, but in a nutshell: int main() { constexpr size_t n = 500000000; // 500M std::vector<double> v(n); std::mt19937 rng(42); std::uniform_real_distribution<double> d(0,1e6); std::generate(v.begin(),v.end(),[&]{return d(rng);}); // std::sort(v.begin(), v.end()); // boost::algorithm::statsort(v.begin(), v.end()); // boost::sort::spreadsort::spreadsort(v.begin(), v.end()); boost::sort::block_indirect_sort(v.begin(), v.end()); return 0; } Some results below. First some explanation. I'm interested primarily in large arrays, into the billions of elements. I'm interested in processing large data sets as interactively or responsively as possible. The data I have in mind is sensor data, it's unclear how suitable that is for statsort. The distribution varies and has significant variation in density (or clusters). I think the speed of statsort is interesting and noteworthy. (Impressive even!) However the memory overhead is significant. Multithreading would also be a big advantage. spreadsort and block_indirect_sort are known and reliable workhorses and two of my favourite "finds" in boost. (I didn't get fired for using them) I'm inclined to explore this for more complex distributions. And also from an overhead and multi-threading point of view. Fewer buckets? But for the memory reason I don't think it's production-ready, yet. Comments: * std::sort isn't the "gold standard", but it's a reasonable reference. * performance is about both time and space. * broadness of the applicability of statsort isn't so clear, yet. * I have some doubts about the attached memory graph, suggest a double-check. -------------------- std::sort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): *0:40.50* Maximum resident set size (kbytes): 3908704 boost::algorithm::statsort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.80 Maximum resident set size (kbytes): *11721376* boost::sort::spreadsort::spreadsort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.30 Maximum resident set size (kbytes): 3908704 ---> Similar speed but a third of the memory. boost::sort::block_indirect_sort (24 threads) $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): *0:05.00* Maximum resident set size (kbytes): 3916088 ---> AMD Ryzen 9 7900 (24 cores), *3x faster than statsort, 1/3 memory of statsort* no sort, just populate the vector. (for reference) $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52 Maximum resident set size (kbytes): *3908704*