is statsort ready to be part of Boost library?

15 May 2026

   500,000,000 >> 100,000,000

   I can't afford more RAM memory. Sorry. So I am not going to speak about
   this.

   Best,
   Peter

   Sent: Thursday, May 21, 2026 at 4:27 AM
   From: "Nigel Stewart via Boost" <boost@lists.boost.org>
   To: "Boost developers' mailing list" <boost@lists.boost.org>
   Cc: "Nigel Stewart" <nigels.com@gmail.com>
   Subject: [boost] Re: is statsort ready to be part of Boost library?
   > Performance isn't only time. It's also memory.
   Hello Peter,
   I had a little bit of spare time to run some quick and simple
   benchmarks.
   I can push a branch to github, but in a nutshell:
   int main()
   {
   constexpr size_t n = 500000000; // 500M
   std::vector<double> v(n);
   std::mt19937 rng(42);
   std::uniform_real_distribution<double> d(0,1e6);
   std::generate(v.begin(),v.end(),[&]{return d(rng);});
   // std::sort(v.begin(), v.end());
   // boost::algorithm::statsort(v.begin(), v.end());
   // boost::sort::spreadsort::spreadsort(v.begin(), v.end());
   boost::sort::block_indirect_sort(v.begin(), v.end());
   return 0;
   }
   Some results below.
   First some explanation.
   I'm interested primarily in large arrays, into the billions of
   elements.
   I'm interested in processing large data sets as interactively or
   responsively as possible.
   The data I have in mind is sensor data, it's unclear how suitable that
   is
   for statsort.
   The distribution varies and has significant variation in density (or
   clusters).
   I think the speed of statsort is interesting and noteworthy.
   (Impressive
   even!)
   However the memory overhead is significant.
   Multithreading would also be a big advantage.
   spreadsort and block_indirect_sort are known and reliable workhorses
   and
   two of my favourite "finds" in boost. (I didn't get fired for using
   them)
   I'm inclined to explore this for more complex distributions.
   And also from an overhead and multi-threading point of view. Fewer
   buckets?
   But for the memory reason I don't think it's production-ready, yet.
   Comments:
   * std::sort isn't the "gold standard", but it's a reasonable reference.
   * performance is about both time and space.
   * broadness of the applicability of statsort isn't so clear, yet.
   * I have some doubts about the attached memory graph, suggest a
   double-check.
   --------------------
   std::sort
   $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
   Elapsed (wall clock) time (h:mm:ss or m:ss): *0:40.50*
   Maximum resident set size (kbytes): 3908704
   boost::algorithm::statsort
   $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.80
   Maximum resident set size (kbytes): *11721376*
   boost::sort::spreadsort::spreadsort
   $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.30
   Maximum resident set size (kbytes): 3908704
   ---> Similar speed but a third of the memory.
   boost::sort::block_indirect_sort (24 threads)
   $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
   Elapsed (wall clock) time (h:mm:ss or m:ss): *0:05.00*
   Maximum resident set size (kbytes): 3916088
   ---> AMD Ryzen 9 7900 (24 cores), *3x faster than statsort, 1/3 memory
   of
   statsort*
   no sort, just populate the vector. (for reference)
   $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)'
   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52
   Maximum resident set size (kbytes): *3908704*
   _______________________________________________
   Boost mailing list -- boost@lists.boost.org
   To unsubscribe send an email to boost-leave@lists.boost.org
   [1]https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
   Archived at:
   [2]https://lists.boost.org/archives/list/boost@lists.boost.org/message/
   3POI7STJJ3KV3QQ7HVNTMQRFP2E325JX/

References

   1. https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
   2. https://lists.boost.org/archives/list/boost@lists.boost.org/message/3POI7STJJ3KV3QQ7HVNTMQRFP2E325JX/

Peter Taraba

Nigel Stewart

Peter Taraba

Peter Taraba

Nigel Stewart

Peter Taraba

Nigel Stewart

Peter Taraba

tags

participants (2)