Boost logo

Boost :

Subject: Re: [boost] [review] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 19
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-11-10 23:21:08


I've added fasterbzip to the develop doc directory:
https://github.com/spreadsort/sort/blob/develop/doc/fasterbzip2-1.0.5.zip,
but haven't referenced it from any documentation. bzip has a bizarre,
highly optimized sort, and stuffing the Spreadsort algorithm into it was
non-trivial. Still, I see substantial speedup on data sets with high
compression levels (and little speedup on random or near-random data that
doesn't compress much).

Antony,

> Implementation is not tuned.
>
> For example, take a look
> here
>
https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/spread_sort_common.hpp#L110
> and here
>
https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/spread_sort_common.hpp#L116
> .
> Memory allocations occur at those points, however they could be easily
> avoided. Here's a typical use case of `size_bins` function:
>
https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/string_sort.hpp#L170
.
>
> Using a stack based buffer would significantly improve sort times on small
> datasets.

The develop branch now has the change you suggested, and it sped up sort
times (with the minimum threshold disabled) on 700 element arrays by 37%,
making 700 elements the new crossover point. I'm going to test the impact
on large arrays more, but so far I see no measurable impact. I could try
using the tricks used in "Engineering Radix Sort" to logarithmically bound
the bin_cache size, add additional stack when it is needed via
co-recursion, and put the bin_cache on the stack (or I could just allocate
the bin_cache on the stack every iteration). As int_min_split_count (min
s) is tuned to 9, I doubt that I'll be able to get the crossover point
below around 500 elements. Do you think bringing the crossover point down
to 500 or so is worth significant additional code complexity or stack
memory usage to avoid heap allocation completely?


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk