Boost logo

Boost :

Subject: Re: [boost] [sorting] Implementation of histogram sort
From: Steven Ross (spreadsort_at_[hidden])
Date: 2009-05-12 17:51:10


The three included algorithms area a hybrid of in-place histogram bucket
sorting and std::sort. integer_sort and float_sort use the spreadsort
algorithm. string_sort has strong similarities to American flag sort; the
primary differences are:
1) it uses std::sort as its fallback comparison sort (instead of insertion
sort) and switches to comparison-based sorting at 256 elements, as opposed
to American flag sort's 16
2) It does not bother trying to determine the maximum and minimum buckets
(always uses 256 + empty), to avoid the overhead of a linear search through
all elements
3) An optimization for long identical substrings, and also a comparison
method that only looks at the portion of the string that is unsorted
Worst-case runtime and memory usage is of the same order as American flag
sort.

Maximum memory overhead for integer_sort/float_sort is on the order of (2 to
the power MAX_SPLITS) * (bit_length/MAX_SPLITS), which for MAX_SPLITS of 11
and 32-bit integers, works out to some kilobytes (roughly 64kB, on a 64-bit
system is my estimate, not measured). They share the same basic algorithm,
though float_sort has some additional code that handles casting floats to
integers for more efficient sorting. For integer_sort and float_sort you
may want to try different values of MAX_SPLITS to see which one works best
for your processor; the default value of 11 works reasonably well on various
systems, but the value of MAX_SPLITS can have a large performance impact.

On Mon, May 11, 2009 at 10:16 AM, Jeremiah Willcock <jewillco_at_[hidden]>wrote:

> Does the proposed Boost sorting library contain an implementation of
> histogram sort or some other in-place variant of bucket sort (American flag
> sort, etc)? I would like to use an integer sort for a problem I have but
> need to save as much memory as possible. Having one element of temporary
> storage per bucket is fine, though.
>
> -- Jeremiah Willcock
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk