
From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-07-07 12:35:17
> > Three reasons:
> > 1) Mean performance across all test cases does not mean just Quicksort's
> > best case. Heapsort is a good general algorithm, and improves the average
> > performance even if it doesn't change much on random data.
>
> You're saying that by improving the worst cases, introsort improves the
> average. I guess the worst cases for median-of-3 quicksort are more
> common than I understood them to be? They would have to be pretty
> common to make introsort *significantly* faster on average, wouldn't
> they?
>
The best case for median-of-3 quicksort assumes that the median-of-3 evenly
bisects the data. Depending upon the distribution, this is not a reasonable
assumption.
If it divides the data roughly 1/4 to 3/4, it'll take roughly twice as long.
I would expect such uneven divisions to be relatively common.
If it divides 1/8 to 7/8, it'll take roughly three times as long, and even
worse if the uneven split consistently falls on the same side.
These situations are far from the worst case, but they explain why, even on
average, Quicksort is somewhat worse than Introsort: some distributions are
uneven enough for Introsort to fall back on Heapsort.
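To make the fallback concrete, here is a minimal C++14 sketch of the introsort idea (after Musser); it follows the textbook structure rather than any particular library's implementation, and the names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <vector>

// Depth-limited quicksort: median-of-3 pivots, but once the recursion depth
// exceeds ~2*log2(n) -- the symptom of repeatedly uneven partitions -- the
// remaining subrange is heapsorted, capping the worst case at O(n log n).
template <typename RandomIt>
void introsort_impl(RandomIt first, RandomIt last, int depth_limit) {
    while (last - first > 1) {
        if (depth_limit == 0) {           // partitions went bad: heapsort
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_limit;
        RandomIt mid = first + (last - first) / 2;
        // median-of-3 pivot, taken by value so partitioning can move elements
        auto pivot = std::max(std::min(*first, *mid),
                              std::min(std::max(*first, *mid), *(last - 1)));
        RandomIt lo = std::partition(first, last,
                                     [&](const auto& v) { return v < pivot; });
        RandomIt hi = std::partition(lo, last,
                                     [&](const auto& v) { return !(pivot < v); });
        introsort_impl(hi, last, depth_limit);  // recurse on one side,
        last = lo;                              // loop on the other
    }
}

template <typename RandomIt>
void introsort(RandomIt first, RandomIt last) {
    if (last - first > 1)
        introsort_impl(first, last,
                       2 * static_cast<int>(std::log2(double(last - first))));
}
```

The depth limit of 2*log2(n) is the conventional choice: consistently uneven splits exhaust it and trigger the heapsort fallback, which is exactly the mechanism described above.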
> > 2) Introsort also uses insertion_sort, which is really fast on small
> > data sets.
>
> Yeah, but that's essentially an implementation detail optimization.
> Many Quicksort implementations do the same thing. I don't believe that
> changes the overall big-O behavior.
No, it doesn't, but on small subsets the overhead of finding the median
becomes a larger fraction of the total cost, so it is a significant
implementation detail.
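As a sketch of that implementation detail (the cutoff of 16 and the helper name are illustrative choices, not any particular library's values):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Below some cutoff, skip pivot selection entirely and insertion-sort the
// range: on small, cache-resident subsets, insertion sort's per-element
// overhead is smaller than the cost of choosing and applying a median-of-3.
template <typename RandomIt>
void small_range_sort(RandomIt first, RandomIt last) {
    if (last - first <= 16) {
        // insertion sort: locate each element's place in the sorted prefix
        // with upper_bound, then rotate it into position
        for (RandomIt i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, i + 1);
    } else {
        std::sort(first, last);  // stand-in for the recursive partition step
    }
}
```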
> > 3) In benchmark testing I find introsort to be about 3X as fast as the
> > qsort library call. That should be the strongest argument of all.
>
> Well that's no surprise at all. You have to pay for a function call
> indirection with qsort.
Then why hasn't the compiler/library provider inlined the qsort call so the
uninformed don't have such a huge speed penalty?
I'm not claiming Introsort is vastly superior to Quicksort; I'm just saying
it is enough faster to be significant.
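The indirection difference is visible in the two interfaces themselves; the helper names below are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort's comparator is called through a function pointer for every
// comparison, with elements passed as void* -- the indirection cost
// mentioned above, which the compiler generally cannot inline away.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

std::vector<int> sort_with_qsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

// std::sort's comparison (operator< here) is visible to the compiler as a
// template argument and is typically inlined -- one source of the gap
// reported above.
std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```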
> >> You might consider why the STL provides binary search for forward
> >> iterators, then ;)
> >
> > That might save a little iteration over a search over every element, but
> > wouldn't it still be O(n)?
>
> It saves no iteration at all. It is O(n) iterator steps but O(log N)
> comparisons. If comparison is expensive enough, it adds up.
That sounds like a corner case to me; somebody in that situation might want
to consider a radix sort, if their data can be radix-sorted, and a trie-like
structure to minimize comparisons would probably save some runtime. If
radix-based operations were cheap enough relative to comparisons, they might
want a pure-radix sort and radix-based lookup.
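For reference, std::lower_bound already accepts forward iterators and keeps the comparison count logarithmic even though the iterator steps are O(N); a small sketch that counts comparisons:

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>

// Binary search over a forward-iterator range: O(N) iterator increments,
// but only O(log N) calls to the comparison -- the trade-off being discussed.
// Returns how many comparisons were needed to locate `value`.
int comparisons_to_find(const std::forward_list<int>& sorted, int value) {
    int count = 0;
    std::lower_bound(sorted.begin(), sorted.end(), value,
                     [&count](int a, int b) { ++count; return a < b; });
    return count;
}
```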
> >> > Do you have a genuine desire for a real application to see a
> >> > bidirectional (or unidirectional) iterator version of a hybrid sort?
>
> I'm not trying to get you to do it. I'm just pointing out that
> std::sort is overconstrained and should at least be made to work on
> bidirectional iterators (or have its worst-case complexity bound
> tightened in the case of random access iterators, or both).
>
I thought about what would be required to make integer_sort work with
forward iterators:
1) I would have to either eliminate the check to see whether the data set is
small enough that std::sort would be better, require the user to provide a
count of the number of items being sorted, or be able to determine that the
iterator isn't random-access and thus skip the check.
2) I would have to eliminate recursive calls to std::sort, and instead use
an algorithm that works safely with the iterator I'm provided for recursive
comparison-based sorting calls.
3) I would need an inlined function call for advancing n places from an
iterator that takes constant time for a random-access iterator, and the
minimum number of operations necessary for a more limited forward iterator.
With that said, it would add an additional pass through the data for a
non-random-access iterator.
Given all that, I can make integer_sort work off a forward iterator without
impairing random-access iterator performance, while still getting decent
performance from a forward iterator.
So if you can provide me all of these:
1) A way to determine that an iterator is not random-access.
2) An efficient generic comparison-based recursive algorithm that takes
forward iterators and is competitive with Introsort in performance.
3) An efficient way to go forward n steps from an iterator that is just as
fast as + n for a random-access iterator, but also works for less capable
iterators.
Then I could make integer_sort work with a plain forward iterator. I'd
still have to be able to swap elements.
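Items 1 and 3 are arguably already covered by the standard library: std::iterator_traits exposes the iterator's category tag, and std::advance dispatches on it (constant time for random-access iterators, n increments otherwise). A sketch (std::is_same shown here is C++11; boost::is_same offered the same thing at the time):

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <type_traits>
#include <vector>

// Requirement 1: detect at compile time whether an iterator is random-access
// by inspecting its category tag from std::iterator_traits.
template <typename It>
bool is_random_access() {
    return std::is_same<
        typename std::iterator_traits<It>::iterator_category,
        std::random_access_iterator_tag>::value;
}

// Requirement 3 is std::advance itself: it uses += n for random-access
// iterators and falls back to repeated ++ for forward iterators.
// Requirement 2 -- a forward-iterator comparison sort competitive with
// Introsort -- remains the genuinely missing piece.
```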
For linked lists, a specialized version of Spreadsort would be significantly
faster.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk