Subject: Re: [boost] [review] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 19
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-11-10 10:48:14
> Implementation is not tuned. For example, take a look here and here:
> memory allocations occur at those points, yet they could easily be
> avoided (a typical case is the `size_bins` function). Using a
> stack-based buffer would significantly improve sort times on small
> datasets.
You have a good point that a stack-based buffer would eliminate memory
allocation for the bin_sizes (thanks!). On large data sets, there are
very few memory allocations, as it quickly hits the maximum size that
can be used. My optimization has concentrated on minimizing memory
overhead and optimizing speed on large data sets. Would you agree
that a fixed stack-based array of bin_sizes, passed down into the
recursive calls, would provide a good trade-off between speed and
memory usage?
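
Something like this sketch is what I have in mind (illustrative names
only, not the library's actual internals):

  // Sketch: a fixed-size bin_sizes buffer on the caller's stack,
  // reused by every level of recursion instead of a heap vector.
  #include <cstddef>

  enum { max_bins = 1 << 11 };  // upper bound on bins per radix pass

  template <class RandomAccessIter>
  void spread_sort_rec(RandomAccessIter first, RandomAccessIter last,
                       std::size_t *bin_sizes)  // caller-provided buffer
  {
    for (std::size_t u = 0; u < max_bins; ++u)
      bin_sizes[u] = 0;  // reset costs a scan, but no allocation
    // ... count elements per bin, swap elements into place, then
    // recurse into each bin reusing the same buffer (the parent no
    // longer needs its counts once partitioning is done):
    // spread_sort_rec(bin_first, bin_last, bin_sizes);
  }

  template <class RandomAccessIter>
  void spread_sort(RandomAccessIter first, RandomAccessIter last)
  {
    std::size_t bin_sizes[max_bins];  // on the stack, never resized
    spread_sort_rec(first, last, bin_sizes);
  }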
It will still need memory allocation for the bin_cache, unless it
computes the worst-case size based on the size of the input data type
and the maximum number of bins per iteration. That could end up being
a substantial chunk of RAM to put on the stack (tens of kilobytes!) for
the non-functor versions of integer_sort and float_sort. It would be
tricky to add for the functor versions. string_sort can recurse to an
incredible depth in the worst case, so it just doesn't seem practical
to put the bin_cache on the stack there. Do you think I should add
the bin_cache to the stack where feasible for integer_sort and
float_sort? It may add significant complication.
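
For a rough sense of the numbers (back-of-the-envelope only, assuming
the default of 11 bits per radix pass and 8-byte iterators):

  bins per pass:     2^11 = 2048
  one cache level:   2048 iterators * 8 bytes = 16 KB
  worst-case depth:  ceil(32 / 11) = 3 passes for 32-bit keys
  total:             3 * 16 KB = 48 KB on the stack
                     (roughly double that for 64-bit keys)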
> Not in its current state. And not as a separate library. This must be a
> part of Boost.Algorithm library.
That's how I originally planned to add it, but apparently that doesn't
work so well with modular Boost. I'm open to that possibility.
> This library must be accepted only after some good optimization. As a
> result, I'd like to see this library outperforming std::sort even on
> small datasets (~150 elements). At present, std::sort and the sort from
> the subject library differ by a constant factor (see
> graph/windows_integer_sort.htm or graph/windows_string_sort.htm), not
> O(n*log(n)) vs O(n*log(k/s+s)).
n * (k/s + s) is the worst case, which is rarely encountered by
integer_sort on normal data. What usually happens is that it runs
around log(n)/s expensive radix-based iterations, plus around s fast
comparison-based iterations.
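To put rough, illustrative numbers on that (n = 2^24 elements, s = 11):

  comparison sort:  log2(2^24) = 24 iterations over the data
  this approach:    log2(2^24)/11 ~= 2 expensive radix iterations
                    + ~11 fast comparison-based iterations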
So it works out to comparing log(n) against C1*log(n)/s + s. This ends
up looking like a constant factor, though it does get a bit faster as
the "+s" term gets proportionately smaller with large n (and the
asymptote in this regime is a constant). Eventually n gets close to
2^k and the worst-case guarantee kicks in. See the left side of
graph/osx_integer_sort.htm for examples of this faster bucket sorting.
Note that in the speed vs. size graphs, each increment in the typical
number of radix-based iterations causes a jump in relative
performance: this can be seen in windows_integer_sort between 2^23 and
2^24, and between 2^13 and 2^14. string_sort shows similar changes in
relative performance vs. size as the number of radix-sorted bytes
increases. I should note that for string_sort the worst case compares
much more favorably against comparison-based sorting, as each
comparison may run over the entire length of the string, whereas
radix-based sorting will only pass over each character once (or twice,
if you include the scan that checks whether all characters are
identical). I didn't use this worst case for comparison-based sorting
when making the graphs, but I could if you'd like to see spectacular
speedups.
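
For reference, this is the kind of adversarial input I mean; a
hypothetical generator, just to illustrate (not the code used for the
published graphs):

  #include <cstddef>
  #include <string>
  #include <vector>

  // Strings sharing a long identical prefix: every comparison in a
  // comparison-based sort must walk the whole prefix before it can
  // distinguish two strings, while radix-based sorting passes over
  // each character position only once.
  std::vector<std::string> common_prefix_strings(std::size_t count,
                                                 std::size_t prefix_len)
  {
    std::vector<std::string> result;
    result.reserve(count);
    std::string prefix(prefix_len, 'x');
    for (std::size_t i = 0; i < count; ++i)
      result.push_back(prefix + std::to_string(i));
    return result;
  }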