Boost logo

Boost :

From: David Abrahams (dave_at_[hidden])
Date: 2008-07-06 19:03:07


on Sun Jul 06 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:

>> > That may be true, but Introsort is significantly faster both on
>> > average and worst-case than Quicksort,
>>
>> As I have always understood it (and as Musser's paper describes it),
>> introsort *is* Quicksort until its behavior starts to stray from the
>> average case, when it switches to heapsort as a fallback. How could it
>> possibly have better average case complexity than Quicksort?
>
>
> Three reasons:
> 1) Mean performance across all testcases does not mean just Quicksort's
> best-case. Heapsort is a good general algorithm, and improves the average
> performance even if it doesn't change much on random data.

You're saying that by improving the worst cases, introsort improves the
average. I guess the worst cases for median-of-3 quicksort are more
common than I understood them to be? They would have to be pretty
common to make introsort *significantly* faster on average, wouldn't
they?

> 2) Introsort also uses insertion_sort, which is really fast on small
> data sets.

Yeah, but that's essentially an implementation detail optimization.
Many Quicksort implementations do the same thing. I don't believe that
changes the overall big-O behavior.

> I used to use Quicksort for recursive calls from Spreadsort, then
> found that for sizes up to 16 it was much better to use insertion_sort. So
> for the final sorting of very small data sets after it's finished cutting
> them up, introsort is faster.
> 3) In benchmark testing I find introsort to be about 3X as fast as the qsort
> library call. That should be the strongest argument of all.

Well that's no surprise at all. You have to pay for a function call
indirection with qsort.

>> > The first version of Spreadsort I wrote in 2001 allocated separate
>> > memory for each bin, and with a linked-list approach instead each
>> > sorting operation could be done in a single pass (as opposed to the
>> > three passes used in the version I'm turning into integer_sort),
>> > though without finding the maximum and minimum values. That would
>> > make more sense to use with a linked-list data set. It's not
>> > complicated to implement, but my testing shows that iterating
>> > through linked lists is slow because they often have cache misses,
>> > and besides, binary search only makes sense in a data structure
>> > that supports random access.
>>
>> You might consider why the STL provides binary search for forward
>> iterators, then ;-)
>
> That might save a little iteration over a search over every element, but
> wouldn't it still be O(n)?

It saves no iteration at all. It is O(n) iterator steps but O(log N)
comparisons. If comparison is expensive enough, it adds up.

> I'd think it's there just because it's possible, not because it has much
> use. I have no problem with providing general capabilities just because
> they're possible.
>
>
>> > Do you have a genuine desire for a real application to see a
>> > bidirectional (or unidirectional) iterator version of a hybrid sort?
>>
>> I'm just generally interested in algorithms.
>
> It would be at least a few hours of development plus testing to make sure it
> isn't broken. I'd be happy to tell you how to do it if you wish, but
> without a need, I don't see why I should bother. I have many things to do,
> and little time.

I'm not trying to get you to do it. I'm just pointing out that
std::sort is overconstrained and should at least be made to work on
bidirectional iterators (or have its worst-case complexity bound
tightened in the case of random access iterators, or both).

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk