From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-06-30 20:26:14


On Mon, Jun 30, 2008 at 12:26 PM, Simonson, Lucanus J
<lucanus.j.simonson_at_[hidden]> wrote:

> Steven Ross wrote:
> >For larger n, the advantage vs. pure comparison-based algorithms
> increases.
> >In effect, Spreadsort smoothly transitions from pure MSB Radix sort to
> >std::sort
> >with its data cut up by a couple iterations of MSB Radix sort
> >(reducing the size of the log(n) factor),
> >depending upon the distribution and size of the data.
>
> The analysis seems a little too optimistic and seems to be based upon
> the assumption that the input is evenly distributed. To analyze the
> complexity you need to think like an adversary who knows your algorithm
> and intentionally creates an input that will make it perform poorly. In
> the worst case your algorithm would be slower than std::sort because a
> data set with a cluster of values that all fall within the same bin and
> a handful of outliers that give rise to the unfortunate binning would
> result in runtime for std::sort + runtime for one or more recursive
> stages of profitless binning. So, the algorithm should be expected to
> be faster than std::sort alone for some inputs, but slower for others.
>

The worst case is actually binning that only marginally splits up the data
on each iteration, in such a fashion that std::sort must be called on an
only slightly reduced list at the end. This corresponds to the situation of
square_root(bit_sizeof(T)) radix iterations plus a constant number of
std::sort iterations. Because each radix-based sorting iteration is
significantly slower than an introsort iteration, this situation may
actually be slightly slower than introsort (depending on the system
configuration), but the worst-case performance is still nicely bounded.
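
To make the structure concrete, here is a minimal sketch of the hybrid
approach being described (illustrative only, not the actual Boost
Spreadsort source; the constants and the simple size-based fallback here
are my assumptions, and the LOG_CONST heuristic is shown further below):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int MAX_SPLITS = 9;        // max bits consumed per radix pass
constexpr std::size_t MIN_BIN = 512; // below this, hand the bin to std::sort

// Hypothetical skeleton: bin on the top 'splits' bits of the remaining
// range, then recurse into each bin or fall back to std::sort.
void spread_sort_sketch(std::vector<std::uint64_t>& v, int high_bit = 64) {
    if (v.size() <= MIN_BIN || high_bit <= 0) {
        std::sort(v.begin(), v.end()); // comparison-based fallback
        return;
    }
    int splits = std::min(MAX_SPLITS, high_bit);
    int shift = high_bit - splits;
    std::vector<std::vector<std::uint64_t>> bins(std::size_t(1) << splits);
    for (std::uint64_t x : v)
        bins[(x >> shift) & ((std::uint64_t(1) << splits) - 1)].push_back(x);
    std::size_t pos = 0;
    for (auto& bin : bins) {
        spread_sort_sketch(bin, shift); // recurse on the remaining bits
        std::copy(bin.begin(), bin.end(), v.begin() + pos);
        pos += bin.size();
    }
}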
The situation you describe can actually perform quite well. In that
situation it takes bit_sizeof(T)/x iterations until the data is fully radix
sorted, where x is the lesser of MAX_SPLITS (usually 9) and log(n).
MAX_SPLITS is picked for performance reasons and is not limited in any
fundamental way, but even operating with 9 as the divisor, for 64-bit
integers the chunky data with outliers you describe will take 8 iterations
until the data is fully radix sorted. It's important to note that
Spreadsort will give up and call std::sort on the current bin, for
iterations after the first, if
log(remaining_range) * LOG_CONST > log(n_before_sort) * log(n_in_bin)
(LOG_CONST is normally 3). So for chunky data as you describe: if the data
is 64 bits long and we have just completed the first iteration, then
log(remaining_range) = 64 - 9 = 55, and with log(n_before_sort) = 16 and
log(n_in_bin) = 16 (very few outliers), the comparison is
55 * 3 > 16 * 16, i.e. 165 > 256, which is false, so it would continue
radix-sorting all the way to the end. If log(n) were 12 or less
(12 * 12 = 144 < 165), it would stop and call std::sort.
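
Written out as code, the give-up test reads like this (my rendering of the
rule above; the function and variable names are mine, not the library's):

#include <iostream>

constexpr int LOG_CONST = 3;

// All three arguments are base-2 logarithms (bit counts), not raw sizes.
// Returns true if Spreadsort should abandon radix binning and hand the
// current bin to std::sort.
bool give_up(int log_remaining_range, int log_n_before_sort, int log_n_in_bin) {
    return log_remaining_range * LOG_CONST > log_n_before_sort * log_n_in_bin;
}

int main() {
    // The worked example: after one 9-bit pass on 64-bit keys,
    // log(remaining_range) = 55 and log(n) = 16 both before and in the bin.
    std::cout << give_up(55, 16, 16) << '\n'; // 0: 165 > 256 is false, keep radix sorting
    std::cout << give_up(55, 12, 12) << '\n'; // 1: 165 > 144, fall back to std::sort
}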
If, on the other hand, log(n) only slowly diminished, starting with
log(n) = 16:
second iteration: 15 (16 x 15 = 240 > (55 * 3) = 165)
third iteration: 14 (15 x 14 = 210 > (46 * 3) = 138)
fourth iteration: 13 (14 x 13 = 182 > (37 * 3) = 111)
fifth iteration: 12 (13 x 12 = 156 > (28 * 3) = 84)
sixth iteration: 11 (12 x 11 = 132 > (19 * 3) = 57)
seventh iteration: 10 (11 x 10 = 110 > (10 * 3) = 30)
eighth iteration: hits the minimum sort size (log(n) = 9) and calls
std::sort on n=512 chunks.
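
For the curious, this small loop reproduces the table above under the
stated assumptions (9 bits consumed per pass, log(n) shrinking by 1 per
pass; again illustrative, not library code):

#include <iostream>

int main() {
    constexpr int LOG_CONST = 3, MAX_SPLITS = 9, MIN_LOG_SORT = 9;
    int log_range = 64 - MAX_SPLITS; // 55 after the first 9-bit pass
    int log_n = 16;                  // log2 of the elements in the bin
    for (int pass = 2; log_n - 1 > MIN_LOG_SORT; ++pass) {
        --log_n;                     // the bin shrinks only slightly each pass
        bool keep_radix = (log_n + 1) * log_n > log_range * LOG_CONST;
        std::cout << "iteration " << pass << ": " << (log_n + 1) << " x "
                  << log_n << " = " << (log_n + 1) * log_n << " vs "
                  << log_range * LOG_CONST
                  << (keep_radix ? " -> keep radix sorting\n"
                                 : " -> fall back to std::sort\n");
        log_range -= MAX_SPLITS;     // each pass consumes 9 more bits
    }
    std::cout << "iteration 8: log(n) hits 9, std::sort on 512-element chunks\n";
}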
That's the worst case. Note that if LOG_CONST were larger or n diminished
in size faster, std::sort would be called earlier. A LOG_CONST of 5 would
significantly improve worst-case performance relative to std::sort for
64-bit integers. Note also that for larger n, the worst-case runtime does
not increase.
Technically, bit_sizeof(T)/MAX_SPLITS and square_root(bit_sizeof(T))
iterations are both worst-case performance situations; which one is more
applicable as "worst" depends on the value of LOG_CONST and the size of n.
For large n, bit_sizeof(T)/MAX_SPLITS is accurate, but on the other hand,
MAX_SPLITS can be increased to any value up to log(n)/8 with reasonable
performance, which is why I don't like to use the bit_sizeof(T)/MAX_SPLITS
form. Using a smaller MAX_SPLITS avoids cache misses.
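
As a quick sanity check on those two bounds: for 64-bit keys with
MAX_SPLITS = 9 they happen to coincide (ceil(64/9) = 8 and sqrt(64) = 8),
matching the 8-iteration walkthrough above. A trivial computation, using
the names from this discussion:

#include <cmath>
#include <iostream>

int main() {
    const int key_bits = 64;  // bit_sizeof(T) for 64-bit integer keys
    const int max_splits = 9; // the usual MAX_SPLITS
    int bits_over_splits = (key_bits + max_splits - 1) / max_splits; // ceil(64/9) = 8
    int sqrt_bits = (int)std::ceil(std::sqrt((double)key_bits));     // sqrt(64) = 8
    std::cout << "bit_sizeof(T)/MAX_SPLITS  : " << bits_over_splits << " iterations\n"
              << "square_root(bit_sizeof(T)): " << sqrt_bits << " iterations\n";
}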

