Boost logo

Boost :

Subject: Re: [boost] [review] Formal review period for Sort library continuing through November 19
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2014-11-17 16:55:53


Steven Ross wrote:

> I updated the first paragraph in the introduction to make clearer what
> this library is NOT good for, and added the change to the develop
> branch. Here is the new first paragraph.

> "The Boost Sort library provides a generic implementation of
> high-speed sorting algorithms that outperform those in the C++
> standard in both average and worst case performance when there are
> over 1000 elements in the list to sort. They fall back to std::sort on
> small data sets. These algorithms only work on random access
> iterators. They are hybrids using both radix and comparison-based
> sorting, specialized to sorting common data types, such as integers,
> floats, and strings. These algorithms are encoded in a generic fashion
> and accept functors, enabling them to sort any object that can be
> processed like these basic data types. In the case of string_sort,
> this includes anything with a defined strict weak ordering that
> std::sort can sort, but writing efficient functors for some complex
> key types may not be worth the additional effort relative to just
> using std::sort, depending on how important speed is to your
> application. Sample usages are available in the samples directory.”

After my reply to your previous email, it may be clear that this is not
good enough to my taste, yet. Let me offer you a suggestion for a
different approach:

“The Boost Spreadsort library provides a generic implementation of
spreadsort, a hybrid radix/comparison sorting algorithm. The relation
between spreadsort and std::sort is comparable to the relation between
std::sort and insertion sort. In general, std::sort can be expected to be
much faster than insertion sort, but std::sort falls back on insertion
sort for small subranges because insertion sort beats std::sort at that
scale. Insertion sort also beats std::sort for nearly sorted ranges of
any size. Likewise, spreadsort can be expected to outperform std::sort in
general except for small ranges, in which it falls back on std::sort,
with the reservation that “small” here means something rather large!
There also exist situations in which spreadsort is never better than
std::sort, regardless of the size of the range.

Spreadsort is likely to make you a happier person if all of the following
criteria apply:

 - you are sorting very large sequences of data, possibly millions of
   keys or more;
 - your keys can be treated as strings of digits or charachters that can
   be lexicographically compared (fortunately this is true of many common
   types, including fixed-width floats and integers);
 - you do not need to preserve the order of equivalent keys (spreadsort
   is not stable);
 - your sequence is stored in a random access container.”

HTH,
Julian


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk