[boost] Re: Proposing statsort for Boost.Sort

20 May 2026

      Hello,

maybe I didn’t explain myself clearly about how the library works.

Once you add an algorithm to the library, you can’t remove it later, even
if it’s surpassed by newer algorithms, because there are software programs
and companies that are using it.

To add a new algorithm to the library, the criteria are that it must
provide some new functionality or clearly improve the functionality of the
algorithms in the library. For example:

   - - Stable or non-stable sorting
   - - Single-threaded or parallel execution
   - - That the auxiliary memory required for sorting is smaller
   - - Shorter execution times

The goal is not to have a catalog of algorithms, but to have the best
possible algorithms. Therefore, a new algorithm must clearly outperform
existing algorithms.

I have rerun the library benchmark, adding the newly downloaded version of
statsort. The results are

   - **************************************************************
   - **               B O O S T S O R T                          **
   - **          S I N G L E      T H R E A D                    **
   - **      I N T E G E R     B E N C H M A R K                 **
   - **                                                          **
   - ** SORT OF 100,000,000 64-BIT NUMBERS                       **
   - **                                                          **
   - ************************************************************
   -
   - [ 1 ] std::sort [ 2 ] pdqsort [ 3 ] std::stable_sort
   - [ 4 ] spinsort [ 5 ] flat_stable_sort [ 6 ] spreadsort [7] statsort
   -
   -                     |      |      |      |      |      |      |      |
   -                     | [ 1 ]| [ 2 ]| [ 3 ]| [ 4 ]| [ 5 ]| [ 6 ]| [ 7 ]|
   - --------------------+------+------+------+------+------+------+------+
   - random              | 7.96 | 3.56 |10.39 |10.65 |11.79 | 5.12 | 5.93 |
   -                     |      |      |      |      |      |      |      |
   - sorted              | 1.62 | 0.13 | 2.82 | 0.07 | 0.07 | 0.07 | 2.63 |
   - sorted + 0.1% end   | 5.76 | 2.58 | 2.80 | 0.55 | 0.36 | 3.92 | 2.97 |
   - sorted + 1% end     |10.24 | 3.02 | 2.91 | 0.73 | 0.50 | 4.16 | 3.02 |
   - sorted + 10% end    | 7.17 | 3.74 | 3.82 | 1.58 | 1.49 | 4.98 | 3.65 |
   -                     |      |      |      |      |      |      |      |
   - sorted + 0.1% mid   | 3.42 | 2.58 | 3.03 | 2.15 | 2.62 | 2.67 | 2.97 |
   - sorted + 1% mid     | 3.25 | 2.82 | 3.11 | 2.32 | 3.17 | 2.87 | 3.09 |
   - sorted + 10% mid    | 3.79 | 3.57 | 3.97 | 3.49 | 4.74 | 3.59 | 3.70 |
   -                     |      |      |      |      |      |      |      |
   - reverse sorted      | 1.16 | 0.25 | 3.04 | 0.13 | 0.13 | 1.67 | 2.84 |
   - rv sorted + 0.1% end| 6.52 | 2.63 | 3.10 | 0.66 | 0.41 | 3.87 | 3.15 |
   - rv sorted + 1% end  | 4.61 | 2.98 | 3.19 | 0.82 | 0.56 | 4.21 | 3.20 |
   - rv sorted + 10% end | 4.28 | 3.77 | 4.07 | 1.70 | 1.55 | 5.20 | 3.78 |
   -                     |      |      |      |      |      |      |      |
   - rv sorted + 0.1% mid| 6.02 | 2.63 | 3.30 | 2.66 | 3.20 | 2.49 | 3.15 |
   - rv sorted + 1% mid  | 3.78 | 2.88 | 3.45 | 2.86 | 3.81 | 2.76 | 3.27 |
   - rv sorted + 10% mid | 4.41 | 3.64 | 4.26 | 4.04 | 5.24 | 3.69 | 3.86 |
   - --------------------+------+------+------+------+------+------+------
   - Wed, May 20, 2026 12:35:56 AM CES
   - E N D

As you can see, statsort offers no significant advantage over the library
algorithms, being very similar to spreadsort.

Just in case, I ran the benchmark you have in your library, and the results
are

   - ------ benchmark -----
   - === Basic usage ==
   -
   - Sorted prices: 0.5 1.49 3.75 5 8.2 9.99
   - Sorted scores: 17 42 55 63 71 88 95
   - Sorted times: 35.9 36.6 37 38.1 39
   -
   - === Performance comparison ===
   -
   - | Distribution / N    | std::sort | statsort | spreadsort | pdqsort|
   flat_stable_sort| speedup |
   -
   ------------------------------------------------------------------------------------------
   - | Uniform n=10000     | 0.46 ms   |0.25 ms   |**0.19** ms |0.24 ms
   |0.59 ms          | 1.81x |
   - | Gaussian n=10000    | 0.46 ms   |0.26 ms   |**0.18** ms |0.24 ms
   |0.59 ms          | 1.74x |
   - | Exponential n=10000 | 0.47 ms   |0.26 ms   |**0.19** ms |0.25 ms
   |0.58 ms          | 1.81x |
   -
   |-----------------------|---------|--------|------------|--------|--------|-------|
   - | Uniform n=100000      | 5.53 ms |3.06 ms |**2.72** ms |2.74 ms |7.42
   ms | 1.81x |
   - | Gaussian n=100000     | 5.59 ms |3.01 ms |**2.71** ms |2.74 ms |7.32
   ms | 1.86x |
   - | Exponential n=100,000 | 5.66 ms |3.01 ms |**2.67** ms |2.72 ms |7.35
   ms | 1.88x |
   -
   |-----------------------|----------|---------|---------|-------------|---------|-------|
   - | Uniform n=1,000,000   | 67.11 ms |33.30 ms |31.87 ms |**31.18** ms
   |89.85 ms | 2.02x |
   - | Gaussian n=1000000    | 66.50 ms |32.35 ms |35.74 ms |**31.22** ms
   |89.96 ms | 2.06x |
   - | Exponential n=1000000 | 66.96 ms |32.75 ms |31.09 ms |**31.06** ms
   |89.54 ms | 2.04x |
   -
   |-------------------------|-----------|----------|----------|--------------|-----------|-------|
   - | Uniform n=10000000      | 784.89 ms |406.79 ms |389.40 ms
   |**353.94** ms |1064.88 ms | 1.93x |
   - | Gaussian n=10000000     | 784.90 ms |368.30 ms |456.99 ms
   |**351.65** ms |1063.12 ms | 2.13x |
   - | Exponential n=10000000  | 786.76 ms |367.39 ms |373.36 ms
   |**352.91** ms |1063.40 ms | 2.14x |
   -
   |-------------------------|-------------|------------|------------|----------------|---|---|
   - | Uniform n=100000000     | 9033.28 ms  |5041.56 ms  |4565.69 ms
   |**3959.85** ms  |12299.53 ms  | 1.79x |
   - | Gaussian n=100,000,000  | 8,980.69 ms |4,621.05 ms |4,848.83 ms
   |**3,918.20** ms |12,281.62 ms | 1.94x |
   - | Exponential n=100000000 | 9009.31 ms  | 4116.37 ms | 4419.91 ms |
   **3929.04** ms | 12283.25 ms | 2.19x |

In my opinion (I haven’t heard from my two colleagues at the library yet),
statsort is not suitable for inclusion in the Boost Sort library

Francisco José Tapia

fjtapia@gmail.com

El jue, 9 abr 2026 a las 20:20, Peter Taraba via Boost (<
boost@lists.boost.org>) escribió:
...
Hi Nigel,
thank you for the feedback and all the information. Highly appreciated.
Best,
Peter
*Sent:* Tuesday, April 07, 2026 at 4:16 AM
*From:* "Nigel Stewart" <nigels.com@gmail.com>
*To:* "Boost developers' mailing list" <boost@lists.boost.org>
*Cc:* "Peter Taraba" <taraba.peter@mail.com>
*Subject:* Re: [boost] Re: Proposing statsort for Boost.Sort
Hello Peter,
I took a fresh look just now, indeed the polishing work is evident.  I had
some thoughts I'd like to share.
One thing I think is great about boost::sort is the multi-threading.  For
some workloads sorting can definitely be a bottleneck and throwing more
cores at that is beneficial.  In rendering for example, knowing the order
of things towards or away from the current viewpoint, can be useful.  It
does seem like this algorithm is amenable to multi-threading, and think
would be a requirement for production use.
If I had a precomputed bucket per item, that would likely fit into a
16-bit unsigned integer.  And if evaluating that is costly, or not all the
items are changing, would be an additional performance advantage.  So
consider a variant of statsort accepting a seperate array of buckets,
rather than always "lazily" evaluated.  One other advantage of decoupling
the bucket evaluation is that non-uniform distributions can be dealt with
as a seperate concern.  (Such as percentile via CDF by some good enough
method)
I'm curious to know if sqrt(n) buckets is somehow optimal, or if the
performance has been evaluated for other powers such as 0.4 or 0.6?  One
reason I'm curious is if 5 billion items works fine for 16-bit bucket
index, to spare the memory expense of 32-bit bucket index.
One code suggestion is to nest statsort namespace inside detail, rather
than prefixing the function names there.
Best regards,
Nigel Stewart
_______________________________________________
Boost mailing list -- boost@lists.boost.org
To unsubscribe send an email to boost-leave@lists.boost.org
https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
Archived at:
https://lists.boost.org/archives/list/boost@lists.boost.org/message/UYEMTAK7...