I can't afford more RAM memory. Sorry. So I am not going to speak about this.
Hello Peter, Same observations apply for 100M if that's more practical for everyone. I've been tinkering some more and came across a nice improvement affecting both time and memory use. Would like to talk about that. These numbers are using size of 500M. (Apologies) There is a phase of statsort that seems concerned with detecting the case that all the items are in the same bucket. I think the hazard is that if a large number of items are the same, or nearly the same, statsort will keep recursing on that bucket, but not necessarily succeed at that. std::size_t nonempty = 0; for (std::size_t i = 0; i < m; ++i) if (cnt[i] > 0) ++nonempty; if (nonempty == 1) { std::sort(scratch, scratch + n); std::copy(scratch, scratch + n, data); return; } So using a distribution that is 25% uniform and 75% fixed value: constexpr size_t n = 500000000; // 500M std::vector<double> v(n); std::mt19937 rng(42); std::uniform_real_distribution<double> d(0, 1.0); std::generate(v.begin(),v.end(),[&]{return d(rng);}); std::fill(v.begin(), v.begin() + n/2 + n/4, 0.5); // 75% of values are the same statsort does handle this correctly. But it turns out this for loop isn't the only way. The last loop can check if the bucket size equals the number of items: if (bsize < n) { const double bmin = (b) * range / m + min; const double bmax = (b + 1) * range / m + min; statsort(scratch + bstart, bsize, bmin, bmax, data + bstart, sort); } else { const auto minmax = std::minmax_element(scratch, scratch + n); if (*minmax.first != *minmax.second) { statsort(scratch, n, *minmax.first, *minmax.second, data, sort); } break; } And the benefit of this seems significant for this situation. It doesn't seem worthwhile to do the std::minmax_element for every recursion. But in this specific "heavy bucket" situation it seems worthwhile indeed. Before: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.37 Maximum resident set size (kbytes): 11723140 After: $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.28 Maximum resident set size (kbytes): 781727 Keeping in mind that skipping the sort takes 4s. $ /usr/bin/time -v ./statsort_bench 2>&1 | egrep -E '(Elapsed|Maximum)' Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.05 Maximum resident set size (kbytes): 7815532