|
Boost : |
From: David Abrahams (dave_at_[hidden])
Date: 2008-07-07 16:05:54
on Mon Jul 07 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:
>> > Three reasons:
>> > 1) Mean performance across all testcases does not mean just Quicksort's
>> > best-case. Heapsort is a good general algorithm, and improves the
>> average
>> > performance even if it doesn't change much on random data.
>>
>> You're saying that by improving the worst cases, introsort improves the
>> average. I guess the worst cases for median-of-3 quicksort are more
>> common than I understood them to be? They would have to be pretty
>> common to make introsort *significantly* faster on average, wouldn't
>> they?
>>
>
> The best-case for median-of-3 quicksort assumes that the median-of-3 evenly
> bisects the data. Depending upon the distribution, this is not a reasonable
> assumption.
> If it divides the data roughly 1/4 - 3/4, it'll take roughly twice as long.
> I would expect such uneven divisions to be relatively common.
> If it divides 1/8-7/8, it'll take roughly three times as long, but will be
> even worse if it always happens to one side.
> These situations are far from the worst-case, but explain why even on
> average, Quicksort is somewhat worse than Introsort, because some
> distributions are uneven enough for Introsort to call Heapsort.
I'm having a hard time believing that the average case for QuickSort is
as bad as you say. I have done a good deal of searching and I see quite
a few test results showing introsort and quicksort have approximately
identical average-case behaviors.
>> > 2) Introsort also uses insertion_sort, which is really fast on small
>> > data sets.
>>
>> Yeah, but that's essentially an implementation detail optimization.
>> Many Quicksort implementations do the same thing. I don't believe that
>> changes the overall big-O behavior.
>
> No, it doesn't, but the overhead of finding the median becomes more
> significant on overall performance on the smaller subsets, so it is a
> significant implementation detail.
It's not significant in the comparison with QS if QS implementations do
it too!
>
>> > 3) In benchmark testing I find introsort to be about 3X as fast as the
>> qsort
>> > library call. That should be the strongest argument of all.
>>
>> Well that's no surprise at all. You have to pay for a function call
>> indirection with qsort.
>
> Then why hasn't the compiler/library provider inlined the qsort call so the
> uninformed don't have such a huge speed penalty?
Uh... because qsort is a 'C' library routine?
> I'm not claiming Introsort is vastly superior to Quicksort; I'm just saying
> it is enough faster to be significant.
I don't believe that's true in the average case.
>> >> You might consider why the STL provides binary search for forward
>> >> iterators, then ;-)
>> >
>> > That might save a little iteration over a search over every element, but
>> > wouldn't it still be O(n)?
>>
>> It saves no iteration at all. It is O(n) iterator steps but O(log N)
>> comparisons. If comparison is expensive enough, it adds up.
>
> That sounds like a corner-case to me; somebody with a situation like that
> might want to consider a radix sort, if they can radix-sort their data, and
> a trie-like structure to minimize comparisons would probably save some
> runtime. If radix-based operations were cheap enough relative to
> comparisons, they might want a pure-radix sort and radix-based lookup.
Not all data can be radix sorting. And anyway, what does sorting have
to do with binary search?
>
>> >> > Do you have a genuine desire for a real application to see a
>> >> > bidirectional (or unidirectional) iterator version of a hybrid sort?
>>
>> I'm not trying to get you to do it. I'm just pointing out that
>> std::sort is overconstrained and should at least be made to work on
>> bidirectional iterators (or have its worst-case complexity bound
>> tightened in the case of random access iterators, or both).
>> <http://lists.boost.org/mailman/listinfo.cgi/boost>
>>
>
> I thought about what would be required to make integer_sort work with
> forward iterators:
> 1) I would have to either eliminate the check to see whether the data set is
> small enough that
> std::sort would be better, require the user to provide a count of the
> number of items being sorted, or be able to determine that the iterator
> isn't random-access and thus not do the check.
> 2) I would have to eliminate recursive calls to std::sort, and instead use
> an algorithm that works safely using the iterator I'm provided for recursive
> comparison-based sorting calls.
> 3) I would need an inlined function call for going forward n places from an
> iterator that takes constant time for a random access iterator, and the
> minimum number of operations necessary for a more limited forward iterator.
> With that said, it should add an additional pass through the data for a
> non-random-access iterator.
>
> Given all that, I can make integer_sort work off a forward iterator, without
> impairing randomaccess iterator performance, and still having decent
> performance for a forward iterator.
> So if you can provide me all of these:
> 1) A way to determine that an iterator is not random-access.
> 2) An efficient generic comparison-based recursive algorithm that takes
> forward iterators that is competitive with Introsort in performance.
> 3) An efficient way to go forward n steps from an iterator that is just as
> fast as + n for a random access iterator, but also works for normal
> iterators.
>
> Then I could make integer_sort work with a plain forward-iterator. I'd
> still have to be able to swap elements.
> For linked lists, a specialized version of Spreadsort would be significantly
> faster.
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>
-- Dave Abrahams BoostPro Computing http://www.boostpro.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk