|
Boost : |
Subject: Re: [boost] [sort] Parallel sorting sub-library mini-review. Performance tests.
From: Christophe Henry (christophe.j.henry_at_[hidden])
Date: 2016-11-20 15:24:30
Hi Boost Community,
I did not manage yet to write a full review. One week is a bit short so a
few more days will be necessary. I provide some performance tests first.
To check the library performance, I pitted it against my own versions of a
parallel mergesort and quicksort provided by the asynchronous library (
https://github.com/henry-ch/asynchronous), which is now ready for review,
so we have a stable state for testing.
I also added a parallel version inside the asynchronous library using
Francisco's single-thread implementations to compare with his parallel ones.
I tested on a i7-5960X. Due to time limits, I could not test on more
interesting NUMA platforms (I did not get the reviewed library to compile
for my KNC) so the tests do not pretend to have value outside the i7
architecture.
I linked with tbb_malloc_proxy to limit memory influence.
In a nutshell.
I'm not a big fan of the parallel version of the algorithms. It seems to be
based on std::async, so that a lot of threads are started and joined at
every call. I would suggest using a threadpool.
OTOH the single-threaded ones are interesting, especially the stable_sort
and intro_sort for cases where usage of spreadsort is not possible.
Cheers,
Christophe
A short summary:
100000000 uint64_t elements already sorted:
OMP parallel sort : 0.390899 secs
Boost parallel sort : 0.064965 secs
OMP parallel stable sort : 1.06128 secs
Boost sample sort : 0.0695357 secs
Boost parallel stable sort : 0.0662401 secs
Asynchronous parallel sort : 0.0167134 secs (same with other
algorithms)
Asynchronous provides a special optimization for this case.
I added this one:
100000000 uint64_t elements reverse sorted
OMP parallel sort : 0.317039 secs
Boost parallel sort : 0.581381 secs
OMP parallel stable sort : 1.06448 secs
Boost sample sort : 0.524746 secs
Boost parallel stable sort : 0.73036 secs
Asynchronous parallel sort : 0.0478701 secs
Asynchronous provides a special optimization for this case.
I this the library should do it too. This one is pretty common and a
popular DOS attack.
100000000 uint64_t elements randomly filled
OMP parallel sort : 1.03594 secs
Boost parallel sort : 0.796447 secs
OMP parallel stable sort : 1.28601 secs
Boost sample sort : 0.818954 secs
Boost parallel stable sort : 1.13604 secs
Asynchronous parallel quickspreadsort: 0.587432 secs
Asynchronous quick_intro_sort : 0.728393 secs
This mixing of quicksort degrading into a spreadsort works here best. The
parallel adaptation of intro_sort is not bad either and best of other
library algorithms.
Asynchronous parallel stable sort : 1.26141 secs
Asynchronous boost::stable sort : 0.804814 secs
Interesting. The stable version of the library easily beats
std::stable_sort I used until now.
10000000 strings randomly filled
OMP parallel sort : 1.05803 secs
Boost parallel sort : 0.933055 secs
OMP parallel stable sort : 1.12269 secs
Boost sample sort : 0.889216 secs
Boost parallel stable sort : 1.56564 secs
Asynchronous parallel quickspreadsort: 0.788856 secs
Asynchronous quick_intro_sort : 0.893652 secs
Asynchronous parallel stable sort : 1.23495 secs
Asynchronous boost::stable sort : 1.21817 secs
Similar results.
Let's move on to big objects:
1562500 elements of size 512 randomly filled
H E A V Y C O M P A R I S O N
OMP parallel sort : 0.308923 secs
Boost parallel sort : 0.342992 secs
Asynchronous parallel_quicksort : 0.246709 secs
Asynchronous quick_intro_sort : 0.269666 secs
Both quicksorts are best, with a slight advantage to intro_sort.
The light comparison is similar.
My test code (the modified benchmark.cpp provided by the library):
https://github.com/henry-ch/asynchronous/blob/master/libs/asynchronous/test/perf/boost_sort/benchmark_sort.cpp
The full test results:
https://github.com/henry-ch/asynchronous/blob/master/libs/asynchronous/test/perf/boost_sort/Results_sort.txt
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk