Hello, maybe I didn’t explain myself clearly about how the library works. Once you add an algorithm to the library, you can’t remove it later, even if it’s surpassed by newer algorithms, because there are software programs and companies that are using it. To add a new algorithm to the library, the criteria are that it must provide some new functionality or clearly improve the functionality of the algorithms in the library. For example: - - Stable or non-stable sorting - - Single-threaded or parallel execution - - That the auxiliary memory required for sorting is smaller - - Shorter execution times The goal is not to have a catalog of algorithms, but to have the best possible algorithms. Therefore, a new algorithm must clearly outperform existing algorithms. I have rerun the library benchmark, adding the newly downloaded version of statsort. The results are - ************************************************************** - ** B O O S T S O R T ** - ** S I N G L E T H R E A D ** - ** I N T E G E R B E N C H M A R K ** - ** ** - ** SORT OF 100,000,000 64-BIT NUMBERS ** - ** ** - ************************************************************ - - [ 1 ] std::sort [ 2 ] pdqsort [ 3 ] std::stable_sort - [ 4 ] spinsort [ 5 ] flat_stable_sort [ 6 ] spreadsort [7] statsort - - | | | | | | | | - | [ 1 ]| [ 2 ]| [ 3 ]| [ 4 ]| [ 5 ]| [ 6 ]| [ 7 ]| - --------------------+------+------+------+------+------+------+------+ - random | 7.96 | 3.56 |10.39 |10.65 |11.79 | 5.12 | 5.93 | - | | | | | | | | - sorted | 1.62 | 0.13 | 2.82 | 0.07 | 0.07 | 0.07 | 2.63 | - sorted + 0.1% end | 5.76 | 2.58 | 2.80 | 0.55 | 0.36 | 3.92 | 2.97 | - sorted + 1% end |10.24 | 3.02 | 2.91 | 0.73 | 0.50 | 4.16 | 3.02 | - sorted + 10% end | 7.17 | 3.74 | 3.82 | 1.58 | 1.49 | 4.98 | 3.65 | - | | | | | | | | - sorted + 0.1% mid | 3.42 | 2.58 | 3.03 | 2.15 | 2.62 | 2.67 | 2.97 | - sorted + 1% mid | 3.25 | 2.82 | 3.11 | 2.32 | 3.17 | 2.87 | 3.09 | - sorted + 10% mid | 3.79 | 3.57 | 3.97 | 3.49 | 4.74 | 3.59 | 3.70 | - | | | | | | | | - reverse sorted | 1.16 | 0.25 | 3.04 | 0.13 | 0.13 | 1.67 | 2.84 | - rv sorted + 0.1% end| 6.52 | 2.63 | 3.10 | 0.66 | 0.41 | 3.87 | 3.15 | - rv sorted + 1% end | 4.61 | 2.98 | 3.19 | 0.82 | 0.56 | 4.21 | 3.20 | - rv sorted + 10% end | 4.28 | 3.77 | 4.07 | 1.70 | 1.55 | 5.20 | 3.78 | - | | | | | | | | - rv sorted + 0.1% mid| 6.02 | 2.63 | 3.30 | 2.66 | 3.20 | 2.49 | 3.15 | - rv sorted + 1% mid | 3.78 | 2.88 | 3.45 | 2.86 | 3.81 | 2.76 | 3.27 | - rv sorted + 10% mid | 4.41 | 3.64 | 4.26 | 4.04 | 5.24 | 3.69 | 3.86 | - --------------------+------+------+------+------+------+------+------ - Wed, May 20, 2026 12:35:56 AM CES - E N D As you can see, statsort offers no significant advantage over the library algorithms, being very similar to spreadsort. Just in case, I ran the benchmark you have in your library, and the results are - ------ benchmark ----- - === Basic usage == - - Sorted prices: 0.5 1.49 3.75 5 8.2 9.99 - Sorted scores: 17 42 55 63 71 88 95 - Sorted times: 35.9 36.6 37 38.1 39 - - === Performance comparison === - - | Distribution / N | std::sort | statsort | spreadsort | pdqsort| flat_stable_sort| speedup | - ------------------------------------------------------------------------------------------ - | Uniform n=10000 | 0.46 ms |0.25 ms |**0.19** ms |0.24 ms |0.59 ms | 1.81x | - | Gaussian n=10000 | 0.46 ms |0.26 ms |**0.18** ms |0.24 ms |0.59 ms | 1.74x | - | Exponential n=10000 | 0.47 ms |0.26 ms |**0.19** ms |0.25 ms |0.58 ms | 1.81x | - |-----------------------|---------|--------|------------|--------|--------|-------| - | Uniform n=100000 | 5.53 ms |3.06 ms |**2.72** ms |2.74 ms |7.42 ms | 1.81x | - | Gaussian n=100000 | 5.59 ms |3.01 ms |**2.71** ms |2.74 ms |7.32 ms | 1.86x | - | Exponential n=100,000 | 5.66 ms |3.01 ms |**2.67** ms |2.72 ms |7.35 ms | 1.88x | - |-----------------------|----------|---------|---------|-------------|---------|-------| - | Uniform n=1,000,000 | 67.11 ms |33.30 ms |31.87 ms |**31.18** ms |89.85 ms | 2.02x | - | Gaussian n=1000000 | 66.50 ms |32.35 ms |35.74 ms |**31.22** ms |89.96 ms | 2.06x | - | Exponential n=1000000 | 66.96 ms |32.75 ms |31.09 ms |**31.06** ms |89.54 ms | 2.04x | - |-------------------------|-----------|----------|----------|--------------|-----------|-------| - | Uniform n=10000000 | 784.89 ms |406.79 ms |389.40 ms |**353.94** ms |1064.88 ms | 1.93x | - | Gaussian n=10000000 | 784.90 ms |368.30 ms |456.99 ms |**351.65** ms |1063.12 ms | 2.13x | - | Exponential n=10000000 | 786.76 ms |367.39 ms |373.36 ms |**352.91** ms |1063.40 ms | 2.14x | - |-------------------------|-------------|------------|------------|----------------|---|---| - | Uniform n=100000000 | 9033.28 ms |5041.56 ms |4565.69 ms |**3959.85** ms |12299.53 ms | 1.79x | - | Gaussian n=100,000,000 | 8,980.69 ms |4,621.05 ms |4,848.83 ms |**3,918.20** ms |12,281.62 ms | 1.94x | - | Exponential n=100000000 | 9009.31 ms | 4116.37 ms | 4419.91 ms | **3929.04** ms | 12283.25 ms | 2.19x | In my opinion (I haven’t heard from my two colleagues at the library yet), statsort is not suitable for inclusion in the Boost Sort library Francisco José Tapia fjtapia@gmail.com El jue, 9 abr 2026 a las 20:20, Peter Taraba via Boost (< boost@lists.boost.org>) escribió:
Hi Nigel,
thank you for the feedback and all the information. Highly appreciated.
Best, Peter
*Sent:* Tuesday, April 07, 2026 at 4:16 AM *From:* "Nigel Stewart" <nigels.com@gmail.com> *To:* "Boost developers' mailing list" <boost@lists.boost.org> *Cc:* "Peter Taraba" <taraba.peter@mail.com> *Subject:* Re: [boost] Re: Proposing statsort for Boost.Sort Hello Peter,
I took a fresh look just now, indeed the polishing work is evident. I had some thoughts I'd like to share.
One thing I think is great about boost::sort is the multi-threading. For some workloads sorting can definitely be a bottleneck and throwing more cores at that is beneficial. In rendering for example, knowing the order of things towards or away from the current viewpoint, can be useful. It does seem like this algorithm is amenable to multi-threading, and think would be a requirement for production use.
If I had a precomputed bucket per item, that would likely fit into a 16-bit unsigned integer. And if evaluating that is costly, or not all the items are changing, would be an additional performance advantage. So consider a variant of statsort accepting a seperate array of buckets, rather than always "lazily" evaluated. One other advantage of decoupling the bucket evaluation is that non-uniform distributions can be dealt with as a seperate concern. (Such as percentile via CDF by some good enough method)
I'm curious to know if sqrt(n) buckets is somehow optimal, or if the performance has been evaluated for other powers such as 0.4 or 0.6? One reason I'm curious is if 5 billion items works fine for 16-bit bucket index, to spare the memory expense of 32-bit bucket index.
One code suggestion is to nest statsort namespace inside detail, rather than prefixing the function names there.
Best regards,
Nigel Stewart
_______________________________________________ Boost mailing list -- boost@lists.boost.org To unsubscribe send an email to boost-leave@lists.boost.org https://lists.boost.org/mailman3/lists/boost.lists.boost.org/ Archived at: https://lists.boost.org/archives/list/boost@lists.boost.org/message/UYEMTAK7...