is statsort ready to be part of Boost library?
Hello Peter Performance isn't only time. It's also memory. Are there numbers available for that too? - Nigel
Performance isn't only time. It's also memory.
Hello Peter, I had a little bit of spare time to run some quick and simple benchmarks. I can push a branch to github, but in a nutshell: int main() { constexpr size_t n = 500000000; // 500M std::vector<double> v(n); std::mt19937 rng(42); std::uniform_real_distribution<double> d(0,1e6); std::generate(v.begin(),v.end(),[&]{return d(rng);}); // std::sort(v.begin(), v.end()); // boost::algorithm::statsort(v.begin(), v.end()); // boost::sort::spreadsort::spreadsort(v.begin(), v.end()); boost::sort::block_indirect_sort(v.begin(), v.end()); return 0; } Some results below. First some explanation. I'm interested primarily in large arrays, into the billions of elements. I'm interested in processing large data sets as interactively or responsively as possible. The data I have in mind is sensor data, it's unclear how suitable that is for statsort. The distribution varies and has significant variation in density (or clusters). I think the speed of statsort is interesting and noteworthy. (Impressive even!) However the memory overhead is significant. Multithreading would also be a big advantage. spreadsort and block_indirect_sort are known and reliable workhorses and two of my favourite "finds" in boost. (I didn't get fired for using them) I'm inclined to explore this for more complex distributions. And also from an overhead and multi-threading point of view. Fewer buckets? But for the memory reason I don't think it's production-ready, yet. Comments: * std::sort isn't the "gold standard", but it's a reasonable reference. * performance is about both time and space. * broadness of the applicability of statsort isn't so clear, yet. * I have some doubts about the attached memory graph, suggest a double-check. -------------------- std::sort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): *0:40.50* Maximum resident set size (kbytes): 3908704 boost::algorithm::statsort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.80 Maximum resident set size (kbytes): *11721376* boost::sort::spreadsort::spreadsort $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.30 Maximum resident set size (kbytes): 3908704 ---> Similar speed but a third of the memory. boost::sort::block_indirect_sort (24 threads) $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): *0:05.00* Maximum resident set size (kbytes): 3916088 ---> AMD Ryzen 9 7900 (24 cores), *3x faster than statsort, 1/3 memory of statsort* no sort, just populate the vector. (for reference) $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52 Maximum resident set size (kbytes): *3908704*
500,000,000 >> 100,000,000
I can't afford more RAM memory. Sorry. So I am not going to speak about
this.
Best,
Peter
Sent: Thursday, May 21, 2026 at 4:27 AM
From: "Nigel Stewart via Boost" <boost@lists.boost.org>
To: "Boost developers' mailing list" <boost@lists.boost.org>
Cc: "Nigel Stewart" <nigels.com@gmail.com>
Subject: [boost] Re: is statsort ready to be part of Boost library?
> Performance isn't only time. It's also memory.
Hello Peter,
I had a little bit of spare time to run some quick and simple
benchmarks.
I can push a branch to github, but in a nutshell:
int main()
{
constexpr size_t n = 500000000; // 500M
std::vector<double> v(n);
std::mt19937 rng(42);
std::uniform_real_distribution<double> d(0,1e6);
std::generate(v.begin(),v.end(),[&]{return d(rng);});
// std::sort(v.begin(), v.end());
// boost::algorithm::statsort(v.begin(), v.end());
// boost::sort::spreadsort::spreadsort(v.begin(), v.end());
boost::sort::block_indirect_sort(v.begin(), v.end());
return 0;
}
Some results below.
First some explanation.
I'm interested primarily in large arrays, into the billions of
elements.
I'm interested in processing large data sets as interactively or
responsively as possible.
The data I have in mind is sensor data, it's unclear how suitable that
is
for statsort.
The distribution varies and has significant variation in density (or
clusters).
I think the speed of statsort is interesting and noteworthy.
(Impressive
even!)
However the memory overhead is significant.
Multithreading would also be a big advantage.
spreadsort and block_indirect_sort are known and reliable workhorses
and
two of my favourite "finds" in boost. (I didn't get fired for using
them)
I'm inclined to explore this for more complex distributions.
And also from an overhead and multi-threading point of view. Fewer
buckets?
But for the memory reason I don't think it's production-ready, yet.
Comments:
* std::sort isn't the "gold standard", but it's a reasonable reference.
* performance is about both time and space.
* broadness of the applicability of statsort isn't so clear, yet.
* I have some doubts about the attached memory graph, suggest a
double-check.
--------------------
std::sort
$ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
Elapsed (wall clock) time (h:mm:ss or m:ss): *0:40.50*
Maximum resident set size (kbytes): 3908704
boost::algorithm::statsort
$ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.80
Maximum resident set size (kbytes): *11721376*
boost::sort::spreadsort::spreadsort
$ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.30
Maximum resident set size (kbytes): 3908704
---> Similar speed but a third of the memory.
boost::sort::block_indirect_sort (24 threads)
$ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
Elapsed (wall clock) time (h:mm:ss or m:ss): *0:05.00*
Maximum resident set size (kbytes): 3916088
---> AMD Ryzen 9 7900 (24 cores), *3x faster than statsort, 1/3 memory
of
statsort*
no sort, just populate the vector. (for reference)
$ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52
Maximum resident set size (kbytes): *3908704*
_______________________________________________
Boost mailing list -- boost@lists.boost.org
To unsubscribe send an email to boost-leave@lists.boost.org
[1]https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
Archived at:
[2]https://lists.boost.org/archives/list/boost@lists.boost.org/message/
3POI7STJJ3KV3QQ7HVNTMQRFP2E325JX/
References
1. https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
2. https://lists.boost.org/archives/list/boost@lists.boost.org/message/3POI7STJJ3KV3QQ7HVNTMQRFP2E325JX/
I can't afford more RAM memory. Sorry. So I am not going to speak about this.
Hello Peter, Same observations apply for 100M if that's more practical for everyone. I've been tinkering some more and came across a nice improvement affecting both time and memory use. Would like to talk about that. These numbers are using size of 500M. (Apologies) There is a phase of statsort that seems concerned with detecting the case that all the items are in the same bucket. I think the hazard is that if a large number of items are the same, or nearly the same, statsort will keep recursing on that bucket, but not necessarily succeed at that. std::size_t nonempty = 0; for (std::size_t i = 0; i < m; ++i) if (cnt[i] > 0) ++nonempty; if (nonempty == 1) { std::sort(scratch, scratch + n); std::copy(scratch, scratch + n, data); return; } So using a distribution that is 25% uniform and 75% fixed value: constexpr size_t n = 500000000; // 500M std::vector<double> v(n); std::mt19937 rng(42); std::uniform_real_distribution<double> d(0, 1.0); std::generate(v.begin(),v.end(),[&]{return d(rng);}); std::fill(v.begin(), v.begin() + n/2 + n/4, 0.5); // 75% of values are the same statsort does handle this correctly. But it turns out this for loop isn't the only way. The last loop can check if the bucket size equals the number of items: if (bsize < n) { const double bmin = (b) * range / m + min; const double bmax = (b + 1) * range / m + min; statsort(scratch + bstart, bsize, bmin, bmax, data + bstart, sort); } else { const auto minmax = std::minmax_element(scratch, scratch + n); if (*minmax.first != *minmax.second) { statsort(scratch, n, *minmax.first, *minmax.second, data, sort); } break; } And the benefit of this seems significant for this situation. It doesn't seem worthwhile to do the std::minmax_element for every recursion. But in this specific "heavy bucket" situation it seems worthwhile indeed. Before: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.37 Maximum resident set size (kbytes): 11723140 After: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.28 Maximum resident set size (kbytes): 781727 Keeping in mind that skipping the sort takes 4s. $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.05 Maximum resident set size (kbytes): 7815532
Feel free to do pull request on codeberg.... -- Sent with [1]mail.com Mail app On 5/23/26, 3:52 PM Nigel Stewart via Boost <boost@[2]lists.boost.org> wrote: > > I can't afford more RAM memory. Sorry. So I am not going to speak about > this. > Hello Peter, Same observations apply for 100M if that's more practical for everyone. I've been tinkering some more and came across a nice improvement affecting both time and memory use. Would like to talk about that. These numbers are using size of 500M. (Apologies) There is a phase of statsort that seems concerned with detecting the case that all the items are in the same bucket. I think the hazard is that if a large number of items are the same, or nearly the same, statsort will keep recursing on that bucket, but not necessarily succeed at that. std::size_t nonempty = 0; for (std::size_t i = 0; i < m; ++i) if (cnt[i] > 0) ++nonempty; if (nonempty == 1) { std::sort(scratch, scratch + n); std::copy(scratch, scratch + n, data); return; } So using a distribution that is 25% uniform and 75% fixed value: constexpr size_t n = 500000000; // 500M std::vector<double> v(n); std::mt19937 rng(42); std::uniform_real_distribution<double> d(0, 1.0); std::generate(v.begin(),v.end(),[&]{return d(rng);}); std::fill(v.begin(), v.begin() + n/2 + n/4, 0.5); // 75% of values are the same statsort does handle this correctly. But it turns out this for loop isn't the only way. The last loop can check if the bucket size equals the number of items: if (bsize < n) { const double bmin = (b) * range / m + min; const double bmax = (b + 1) * range / m + min; statsort(scratch + bstart, bsize, bmin, bmax, data + bstart, sort); } else { const auto minmax = std::minmax_element(scratch, scratch + n); if (*minmax.first != *minmax.second) { statsort(scratch, n, *minmax.first, *minmax.second, data, sort); } break; } And the benefit of this seems significant for this situation. It doesn't seem worthwhile to do the std::minmax_element for every recursion. But in this specific "heavy bucket" situation it seems worthwhile indeed. Before: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.37 Maximum resident set size (kbytes): 11723140 After: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.28 Maximum resident set size (kbytes): 781727 Keeping in mind that skipping the sort takes 4s. $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.05 Maximum resident set size (kbytes): 7815532 _______________________________________________ Boost mailing list -- boost@[3]lists.boost.org To unsubscribe send an email to boost-leave@[4]lists.boost.org [5]https://lists.boost.org/mailman3/lists/boost.lists.boost.org/ Archived at: [6]https://lists.boost.org/archives/list/boost@lists.boost.org/messa ge/ILQDIB4MLTB5FWAI7UALQEBMZRLVFI7M/ References 1. http://mail.com/ 2. http://lists.boost.org/ 3. http://lists.boost.org/ 4. http://lists.boost.org/ 5. https://lists.boost.org/mailman3/lists/boost.lists.boost.org/ 6. https://lists.boost.org/archives/list/boost@lists.boost.org/message/ILQDIB4M...
participants (2)
-
Nigel Stewart -
Peter Taraba