Hi all,
Following the feedback on this thread, I've pushed two substantive updates to the statsort repository:
https://github.com/drpt78/statsort
1. Boost-convention directory layout
The repository is now structured as a proper Boost library:
include/boost/algorithm/statsort.hpp (canonical install path)
test/statsort_test.cpp
test/Jamfile.v2 (b2 / Boost CI support)
example/statsort_example.cpp
benchmark/statsort_bench.cpp
CMakeLists.txt
The Jamfile covers all test cases, so the library can now be validated through Boost's b2-based CI pipeline as well as CMake.
2. Imbalance fallback — now actually implemented
The README previously claimed that statsort falls back to std::sort on pathologically skewed inputs. That claim was not reflected in the code. It is now.
After the bucket-counting pass, both the plain and projection overloads check whether any single bucket received more than 50% of the elements. If so, std::sort is called on that subrange immediately and recursion stops. The threshold is exposed as a named constant (STATSORT_IMBALANCE_RATIO) in the detail namespace. This guarantees O(n log n) worst-case behaviour rather than O(n²) degradation on adversarial inputs such as spike distributions or tightly clustered data.
Four new test cases exercise this path directly: spike distribution, two-cluster input, geometric sequence, and the projection-overload variant of the spike case. All 27 tests pass.
3. Reproducible benchmarks
The benchmark source is now included under benchmark/statsort_bench.cpp. It covers uniform, Gaussian, exponential, nearly-sorted, and spike distributions, and uses median-of-7 timing to reduce noise. The spike row makes the fallback behaviour explicitly visible in the numbers. Build with -DSTATSORT_BUILD_BENCHMARKS=ON.
Happy to hear any further feedback.
Hope everyone is going to enjoy Albert's Pi day tomorrow and have a nice weekend.
Best,
Peter & Claude AI
https://orcid.org/0000-0002-8199-3723