Subject: Re: [boost] [review] [sort] Sort library review manager results
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-12-03 20:13:02


Francisco,

On Mon Dec 01 2014 at 1:52:00 PM Francisco José Tapia <fjtapia_at_[hidden]>
wrote:

> The algorithms must be refined. The size limit for a thread is
> (total_size / (nthreads * 8)). The number 8 is arbitrary; it runs well, but I
> need more benchmarks to obtain the best value.
>

That's reasonable, but the constant (8) should be somewhere easy to find
and modify in case a different value is better on other systems.
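
For example, something along these lines (just a sketch; the names here are
mine, not from your code):

#include <algorithm>
#include <cstddef>

namespace parallel_sort_detail {

// Tunable: how many pieces per thread the input is split into.
// 8 matched your benchmarks; other systems may prefer another value.
constexpr std::size_t thread_size_divisor = 8;

inline std::size_t min_elements_per_thread(std::size_t total_size,
                                           std::size_t nthreads)
{
    nthreads = std::max<std::size_t>(nthreads, 1);
    return total_size / (nthreads * thread_size_divisor);
}

} // namespace parallel_sort_detail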

> The single-threaded algorithms must be improved, especially the merge_sort. The
> first solution could be to copy the algorithm used in GCC, but I would like to
> examine it in more depth.
>

Agreed. Don't be afraid to just fall back on the version in the standard
library if you can't improve on it. I recommend doing that for the
single-threaded part of the algorithm (unless you want to try my idea of
replacing insertionsort with a switch statement that selects between sorting
networks, which will require a more careful performance review, but might
be faster).
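
To make the sorting-network idea concrete, I mean roughly this shape (an
untested illustration, not a claim about what the final code should be):

#include <algorithm>

template <typename Iter, typename Compare>
inline void sort2(Iter a, Iter b, Compare comp)
{
    if (comp(*b, *a)) std::iter_swap(a, b);
}

// Dispatch on the (small) size and run a fixed compare-exchange network.
template <typename Iter, typename Compare>
void small_sort(Iter first, Iter last, Compare comp)
{
    switch (last - first) {
    case 0:
    case 1:
        return;
    case 2:
        sort2(first, first + 1, comp);
        return;
    case 3:  // 3 compare-exchanges
        sort2(first, first + 1, comp);
        sort2(first + 1, first + 2, comp);
        sort2(first, first + 1, comp);
        return;
    case 4:  // 5 compare-exchanges
        sort2(first, first + 1, comp);
        sort2(first + 2, first + 3, comp);
        sort2(first, first + 2, comp);
        sort2(first + 1, first + 3, comp);
        sort2(first + 1, first + 2, comp);
        return;
    default: // networks for 5..16 would go here; fall back otherwise
        std::sort(first, last, comp);
        return;
    }
}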

>
> Detecting sorted elements is easy in introsort with a counter of movements:
> when you detect zero movements from one side to the other, you can check
> whether the elements are ordered. In merge sort it is more difficult, but I
> will examine it.

That sounds reasonable.
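
If I follow the movement-counter idea, it would look roughly like this (purely
illustrative, with a deliberately naive pivot):

#include <algorithm>
#include <cstddef>

// Lomuto-style partition that counts how many elements it actually moved.
template <typename Iter, typename Compare>
Iter partition_counting(Iter first, Iter last, Compare comp, std::size_t& swaps)
{
    Iter pivot = last - 1;                 // naive pivot, for illustration only
    Iter store = first;
    for (Iter it = first; it != pivot; ++it) {
        if (comp(*it, *pivot)) {
            if (it != store) { std::iter_swap(it, store); ++swaps; }
            ++store;
        }
    }
    if (store != pivot) { std::iter_swap(store, pivot); ++swaps; }
    return store;
}

template <typename Iter, typename Compare>
void quicksort_with_sorted_check(Iter first, Iter last, Compare comp)
{
    if (last - first < 2) return;
    std::size_t swaps = 0;
    Iter mid = partition_counting(first, last, comp, swaps);
    // Zero movements: one O(N) scan decides whether we can stop here.
    if (swaps == 0 && std::is_sorted(first, last, comp)) return;
    quicksort_with_sorted_check(first, mid, comp);
    quicksort_with_sorted_check(mid + 1, last, comp);
}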

> The worst case in introsort is extremely difficult to obtain. I checked
> with a counter when the algorithm uses heap sort, and with hundreds of
> millions of elements and different data inputs, sometimes the counter only
> reaches 1, and the number of elements sorted that way is small.
>

What partitioning algorithm are you using? Can't you design an input
deliberately to mess up the partitioning, so only one or two items go into
one of the two halves? Introsort is optimized to still perform decently in
this worst case, and it'd be good to confirm that with worst-case data.
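
If it helps, one general way to manufacture such inputs is the adaptive
adversary from M. D. McIlroy's paper "A Killer Adversary for Quicksort".
A rough sketch of the idea (test code only, not meant for the library):

#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// Assigns values lazily so that whichever pivots the sort picks end up
// partitioning as badly as possible.
class quicksort_adversary {
public:
    explicit quicksort_adversary(std::size_t n)
        : val_(n, gas_), nsolid_(0), candidate_(0), ncmp_(0) {}

    // Strict-weak-ordering comparator over element indices 0..n-1.
    bool operator()(std::size_t x, std::size_t y)
    {
        ++ncmp_;
        if (val_[x] == gas_ && val_[y] == gas_) {
            if (x == candidate_) val_[x] = nsolid_++;
            else                 val_[y] = nsolid_++;
        }
        if (val_[x] == gas_)      candidate_ = x;
        else if (val_[y] == gas_) candidate_ = y;
        return val_[x] < val_[y];
    }

    // After the run, this holds a concrete permutation (freeze any leftover
    // gas_ entries) that reproduces the behaviour with a normal comparator.
    const std::vector<long>& values() const { return val_; }
    std::size_t comparisons() const { return ncmp_; }

private:
    const long gas_ = 1L << 30;   // marker: "value not decided yet"
    std::vector<long> val_;
    long nsolid_;
    std::size_t candidate_, ncmp_;
};

int main()
{
    const std::size_t n = 100000;
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = i;

    quicksort_adversary adv(n);
    std::sort(idx.begin(), idx.end(), std::ref(adv));  // the sort under test

    // Introsort's heapsort fallback should keep this near N log N;
    // a plain quicksort would be driven toward N^2 comparisons.
    std::cout << n << " elements, " << adv.comparisons() << " comparisons\n";
}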

> About the size, introsort doesn't use additional memory. Merge sort uses an
> array of half the size of the input data. This array is used both in the
> parallel and in the single-threaded function; each call has its range of
> elements and the related part of this array.
>

That sounds reasonable, though my understanding is that std::stable_sort is
capable of running with less RAM than your merge_sort.
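
Just to make sure we mean the same thing, I assume the half-size buffer works
like the classic scheme below (my sketch, not your code), where only the left
run is copied out before merging back in place:

#include <cstddef>
#include <vector>

// Merge [first, mid) and [mid, last) within v, using a scratch buffer that
// only needs to hold the left run, i.e. at most N/2 elements overall.
template <typename T, typename Compare>
void merge_with_half_buffer(std::vector<T>& v, std::size_t first,
                            std::size_t mid, std::size_t last,
                            std::vector<T>& buf, Compare comp)
{
    buf.assign(v.begin() + first, v.begin() + mid);   // copy left run out

    std::size_t i = 0, j = mid, out = first;
    while (i < buf.size() && j < last) {
        if (comp(v[j], buf[i])) v[out++] = v[j++];    // right element wins
        else                    v[out++] = buf[i++];  // ties taken from left (stable)
    }
    while (i < buf.size()) v[out++] = buf[i++];       // drain leftover left run
    // Any leftover of the right run is already in place.
}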

> I have done other things about sorting which may be interesting. I am
> testing indirect sort. When the size of the elements grows, the data bus
> is stressed moving the elements and the performance of the algorithms
> drops. If you create an array with pointers to the elements and sort this
> array, you have a penalty because you must access the elements through a
> pointer, but you only move pointers. At the end, with the pointers sorted, an
> O(N) method sorts the original elements. And as the size of the elements
> grows, this method becomes faster than the original.
>

Yes, indirect sorting is a well-known good technique. It's usually not
difficult, but if you can create a general wrapper that makes it even
easier to do efficiently, that would be interesting.
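
For instance, a generic wrapper could sort an array of indices (or pointers)
and then apply the resulting permutation with O(N) element moves by following
cycles; a rough sketch (illustrative, not tuned):

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Sort a permutation of indices by comparing the referenced elements, then
// apply that permutation to the original range with O(N) element moves.
// (Pointers work the same way; indices just avoid assuming contiguous storage.)
template <typename RandomIt, typename Compare>
void indirect_sort(RandomIt first, RandomIt last, Compare comp)
{
    using T = typename std::iterator_traits<RandomIt>::value_type;
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n < 2) return;

    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b)
              { return comp(*(first + a), *(first + b)); });

    // order[i] is the index of the element that must end up at position i.
    // Follow the cycles so each element is moved once, plus one temporary
    // per cycle.
    std::vector<bool> placed(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        if (placed[i] || order[i] == i) continue;
        T tmp = std::move(*(first + i));
        std::size_t hole = i;
        for (;;) {
            const std::size_t src = order[hole];
            placed[hole] = true;
            if (src == i) { *(first + hole) = std::move(tmp); break; }
            *(first + hole) = std::move(*(first + src));
            hole = src;
        }
    }
}

The extra dereference inside the comparator is the penalty you mention; the
win is that only indices (or pointers) move until the final O(N) pass.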

> I hope to have time this weekend to pack and send the code. I don't know
> whether to send it as a zip file or create a git repository. If I can help
> you, please tell me, and I will try. And if you want to use something, tell
> me so I can write the documentation, because for now I have only the code,
> test programs and several benchmarks.
>
If you can get the library fast enough that it's within 5% of the best
competitor on both Linux and Windows for random, sorted, reverse-sorted,
and mostly-sorted data (both integers and strings), and has reasonable
worst-case performance, then I'll definitely be interested in taking a look.
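
For what it's worth, the integer input patterns I have in mind can be generated
along these lines (the fixed seed and the 1% perturbation for "mostly sorted"
are arbitrary choices on my part):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

std::vector<std::int64_t> make_input(std::size_t n, const std::string& pattern)
{
    std::mt19937_64 rng(42);                        // fixed seed: repeatable runs
    std::vector<std::int64_t> v(n);
    for (auto& x : v) x = static_cast<std::int64_t>(rng());

    if (pattern == "random") return v;
    std::sort(v.begin(), v.end());                  // "sorted" stops here
    if (pattern == "reverse") {
        std::reverse(v.begin(), v.end());
    } else if (pattern == "mostly_sorted" && n > 1) {
        // Perturb roughly 1% of positions with random swaps.
        std::uniform_int_distribution<std::size_t> pick(0, n - 1);
        for (std::size_t i = 0; i < n / 100; ++i)
            std::swap(v[pick(rng)], v[pick(rng)]);
    }
    return v;
}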

