
Subject: Re: [boost] [sort] Parallel sorting sub-library mini-review. Performance tests.
From: Steven Ross (spreadsort_at_[hidden])
Date: 2016-11-21 21:42:08


On Sun, Nov 20, 2016 at 3:24 PM Christophe Henry <
christophe.j.henry_at_[hidden]> wrote:

> I have not yet managed to write a full review. One week is a bit short, so a
> few more days will be necessary. I am providing some performance tests first.
>

Thanks, Christophe. We look forward to incorporating your feedback to
improve the library.

>
> OTOH the single-threaded ones are interesting, especially the stable_sort
> and intro_sort for cases where usage of spreadsort is not possible.
>

I challenge you to find a sorting problem where it isn't possible to use
spreadsort. string_sort can sort anything that std::sort can; you just
need to convert the sort key comparison sequence into a sequence of bytes
and feed string_sort the correct functors. That said, it may not be
efficient to sort some problems with string_sort, and it'll probably be too
much work to be worth the effort for many problems.
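For example, here is a minimal sketch along the lines of the functor-based
string_sort example in the Boost.Sort documentation; the record struct and
the functor names are made up for illustration:

#include <boost/sort/spreadsort/string_sort.hpp>
#include <cstddef>
#include <string>
#include <vector>

struct record {
  std::string key;
  int payload;
};

// Expose the sort key one byte at a time...
struct get_char {
  unsigned char operator()(const record &r, std::size_t offset) const {
    return static_cast<unsigned char>(r.key[offset]);
  }
};
// ...report how many bytes the key has...
struct get_length {
  std::size_t operator()(const record &r) const { return r.key.size(); }
};
// ...and supply the equivalent comparison for the small ranges where
// string_sort falls back to comparison sorting.
struct less_key {
  bool operator()(const record &x, const record &y) const {
    return x.key < y.key;
  }
};

void sort_records(std::vector<record> &v) {
  boost::sort::spreadsort::string_sort(v.begin(), v.end(),
                                       get_char(), get_length(), less_key());
}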

>
> A short summary:
>
> 100000000 uint64_t elements already sorted:
> Boost parallel sort : 0.064965 secs
>
> Asynchronous parallel sort : 0.0167134 secs (same with other
> algorithms)
>
> Asynchronous provides a special optimization for this case.
>

This is a common case and your solution seems noticeably faster, so if the
optimization is practical to incorporate into Boost parallel sort, I would
like Francisco to consider it. That said, 0.06s isn't bad.

>
> I added this one:
> 100000000 uint64_t elements reverse sorted
>
> Boost parallel sort : 0.581381 secs
>
> Asynchronous parallel sort : 0.0478701 secs
>
> Asynchronous provides a special optimization for this case.
> I think the library should do it too. This case is pretty common and a
> popular DoS attack vector.
>

This optimization would be nice to have, if it's cheap.
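To make the suggestion concrete, here is a rough sketch of what such a
pre-pass could look like, covering both the already-sorted case above and the
reverse-sorted case here; this is my own illustration, not code from either
library under review:

#include <algorithm>
#include <vector>

// Hypothetical O(n) pre-pass run before dispatching the full parallel sort.
// Returns true if a trivial ordering was handled and no sort is needed.
template <class Iter, class Compare>
bool handle_trivial_orderings(Iter first, Iter last, Compare comp) {
  if (std::is_sorted(first, last, comp))
    return true;  // already sorted: nothing to do
  // Non-increasing input: reversing it in place yields a sorted sequence
  // (acceptable for a non-stable sort).
  auto reversed = [&comp](const auto &a, const auto &b) { return comp(b, a); };
  if (std::is_sorted(first, last, reversed)) {
    std::reverse(first, last);
    return true;
  }
  return false;  // fall through to the regular parallel sort
}

Both checks are one linear pass each, which is cheap next to an O(n log n)
sort whenever they hit.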

>
> 100000000 uint64_t elements randomly filled
>
> Asynchronous parallel quickspreadsort: 0.587432 secs
> Asynchronous quick_intro_sort : 0.728393 secs
>
> This mix of quicksort degrading into spreadsort works best here.

I find this algorithm interesting; have you considered using MSD radix-based
bucketing to break up the data for the threads in parallel? For example: take
T threads, have each thread take 1/T of the data and bucket it into N
MSD-based buckets (just like Spreadsort), where T < N <= ~2^11; then merge the
equivalent buckets across threads and, as each merged bucket is produced, hand
it off to another thread to sort. This will probably divide the work better if
you first find the min and max with a parallel min/max search and use them to
size the buckets.
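A rough sketch of that idea, assuming plain uint64_t keys and std::async for
the thread handoff (a real implementation would use a thread pool and pick
the bucket count the way Spreadsort does):

#include <algorithm>
#include <cstdint>
#include <future>
#include <vector>

// Illustration of the suggested scheme only, not code from either library.
void msd_bucket_parallel_sort(std::vector<std::uint64_t> &data,
                              unsigned num_threads) {
  if (data.size() < 2 || num_threads == 0) return;

  // Find min and max first so the buckets cover the actual key range
  // (done serially here; the suggestion is a parallel min/max search).
  auto mm = std::minmax_element(data.begin(), data.end());
  const std::uint64_t min_v = *mm.first, max_v = *mm.second;
  if (min_v == max_v) return;  // all keys equal

  const std::size_t num_buckets = std::size_t(1) << 11;  // N = 2^11, T < N
  const std::uint64_t width = (max_v - min_v) / num_buckets + 1;

  // Phase 1: each thread buckets 1/T of the data into private buckets.
  const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
  std::vector<std::future<std::vector<std::vector<std::uint64_t>>>> parts;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    const std::size_t end = std::min(begin + chunk, data.size());
    if (begin >= end) break;
    parts.push_back(std::async(std::launch::async, [&data, begin, end,
                                                    num_buckets, min_v, width] {
      std::vector<std::vector<std::uint64_t>> local(num_buckets);
      for (std::size_t i = begin; i < end; ++i)
        local[(data[i] - min_v) / width].push_back(data[i]);
      return local;
    }));
  }
  std::vector<std::vector<std::vector<std::uint64_t>>> per_thread;
  for (auto &f : parts) per_thread.push_back(f.get());

  // Phase 2: merge equivalent buckets across threads and hand each merged
  // bucket to another task to sort. Buckets cover disjoint key ranges, so
  // concatenating them in order produces the fully sorted result.
  std::vector<std::future<std::vector<std::uint64_t>>> sorted_buckets;
  for (std::size_t b = 0; b < num_buckets; ++b) {
    sorted_buckets.push_back(std::async(std::launch::async, [&per_thread, b] {
      std::vector<std::uint64_t> merged;
      for (const auto &local : per_thread)
        merged.insert(merged.end(), local[b].begin(), local[b].end());
      std::sort(merged.begin(), merged.end());
      return merged;
    }));
  }
  std::size_t out = 0;
  for (auto &f : sorted_buckets) {
    const auto bucket = f.get();
    std::copy(bucket.begin(), bucket.end(), data.begin() + out);
    out += bucket.size();
  }
}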

> 10000000 strings randomly filled
>
> OMP parallel sort : 1.05803 secs
> Boost parallel sort : 0.933055 secs
>
> OMP parallel stable sort : 1.12269 secs
> Boost sample sort : 0.889216 secs
> Boost parallel stable sort : 1.56564 secs
>
> Asynchronous parallel quickspreadsort: 0.788856 secs
> Asynchronous quick_intro_sort : 0.893652 secs
>
> Asynchronous parallel stable sort : 1.23495 secs
> Asynchronous boost::stable sort : 1.21817 secs
>
> Similar results.
>
>
> Let's move on to big objects:
>
> 1562500 elements of size 512 randomly filled
> H E A V Y C O M P A R I S O N
>
> OMP parallel sort : 0.308923 secs
> Boost parallel sort : 0.342992 secs
>
> Asynchronous parallel_quicksort : 0.246709 secs
> Asynchronous quick_intro_sort : 0.269666 secs
>
> Both quicksorts are best, with a slight advantage to intro_sort.
>
> The light comparison is similar.
>

Do we have your permission to copy and/or adapt your quick_intro_sort
algorithm into the Boost.Sort library? How is the worst-case performance
relative to Francisco's library?

