Boost logo

Boost :

Subject: Re: [boost] [sort] Timsort review result
From: Steven Ross (spreadsort_at_[hidden])
Date: 2017-06-16 23:39:14


On Fri, Jun 16, 2017, 6:08 PM Soul Studios via Boost <boost_at_[hidden]>
wrote:

> Time complexity is not the only consideration when reviewing sorting
> techniques. Smoothsort is purported to have much better time complexity
> across a range of scenarios, but in reality it's cache performance is
> rubbish. That's where quicksort variants and timsort work well. I
> haven't investigated spreadsort personally.
>

Agreed, that's why spreadsort has to limit itself to dividing into a number
of buckets that will fit in L1 cache (its theoretical time complexity would
be better otherwise). It works out to between slightly faster than
std::sort to about 2X as fast on most data. With string or other multiple
piece keys the spreadsort speedup can be spectacular, as it will traverse
the full length of the key at most once during the radix portion of the
sort, where comparison sorts can do so for each comparison. When I
modified bzip to use spreadsort the overall compression runtime (more than
just sorting) dropped 40%.

Smoothsort has the same nlogn worst case as other comparison sorts, and it
has the cache issues of heapsort, which makes for poor performance on
average.

>
>
> On 13/06/2017 11:09 p.m., Steven Ross via Boost wrote:
> > After reviewing the responses, I have rejected Timsort. Here are the
> > reasons:
> >
> > 1. No author/maintainer responses to the questions on boost (except
> for
> > forwarding some documentation links). We need a maintainer for all
> our
> > source, and I'm not volunteering to maintain new functions as
> complex as
> > Timsort.
> > 2. There was no response on the Timsort bug reported during the
> review,
> > nor any tests for it.
> > 3. We need better tests.
> > 4. Francisco is including a more efficient stable sort as part of his
> > additions to the library, including handling special cases like
> Timsort
> > handles better than stable_sort does. Francisco will maintain his
> code.
> >
> > I'd like to note that many people THINK they are experts on sorting
> because
> > they covered it in class or may have read about it in a textbook, but
> most
> > do not understand hybrid sorts well. As a specific example, I saw
> > O(N*log(N)) mentioned as the worst-case speed limit of general-case
> sorts,
> > when in fact Spreadsort beats that as described in its docs:
> >
> http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.overview.performance
> >
> > Many people also don't carefully analyze for worst-case performance,
> which
> > can be tricky.
> >
> > With regards to future sorting library reviews, I'll get at least a
> > one-page doc describing the algorithm for the review in advance, but the
> > boost documentation formatting is an unreasonable request for a simple
> > sorting function, and I don't see the value it adds for these simple
> > reviews. It is useful for a full library, but we'll be integrating the
> > documentation into our full library after a new function is accepted.
> >
> > It isn't accurate to call it a mini-review (as described here:
> > http://www.boost.org/community/reviews.html#Fast-Track) because it's
> > reviewing completely new source we're talking about adding to the
> library.
> > That said, do you still want me to a call the upcoming one for pdqsort a
> > "mini-review"?
> >
> > _______________________________________________
> > Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
> >
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk