Boost logo

Boost :

Subject: Re: [boost] [SORT] Parallel Algorithms
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-12-14 21:52:47


Franciso,

On Sun, Dec 14, 2014 at 1:03 PM, Francisco José Tapia <fjtapia_at_[hidden]> wrote:
> Sorry by the delay, As promised, the sorting parallel algorithms
>
> On my computer (Quad Core) the GCC std::sort and stable_sort are slight
> faster than my version of intro_sort and merge_sort (stable sort is a
> synonym of merge_sort). In other computer with an Intel I3, my algorithms
> are slight faster than in the GCC version.

An I3 is low-end hardware. I recommend testing on an i7 or faster system.

> This code is a preliminary version, done for my countertree library. If
> you consider useful, I will change the name space and document the code,
> the algorithms and the ideas obtained from the benchmarks.
>
> Please feel free for ask me any question or for to modify or change
> anything. The goal is to provide good sort methods to the users.
>
> In the zip file you have the code and a file with a description of each
> code.

First, thanks for sending this in a fairly easy to test format.
I suggest updating your docs to recommend single-line compilation (if
appropriate), which saves a step:
Sort/Modified/benchmark/GCC/util_sort$ g++ -std=c++11 -Wall
-fexceptions -fopenmp -fomit-frame-pointer -fexpensive-optimizations
-O3 -I../../../src benchmark_sort_03.cpp -fopenmp -s -lpthread -ltbb
-o benchmark_sort_03 -pthread

I was unable to run your indirect sort benchmark:
Sort/Modified/benchmark/GCC/util_sort$ g++ -std=c++11 -Wall
-fexceptions -fopenmp -fomit-frame-pointer -fexpensive-optimizations
-O3 -I../../../src -c benchmark_sort_indirect.cpp -o
benchmark_sort_indirect.o -pthread
Sort/Modified/benchmark/GCC/util_sort$ g++ -o benchmark_sort_indirect
benchmark_sort_indirect.o -pthread -fopenmp -s -lpthread -ltbb
Sort/Modified/benchmark/GCC/util_sort$ ./benchmark_sort_indirect
Size of element 8 Number of elements :100000000
terminate called after throwing an instance of 'std::system_error'
  what(): Enable multithreading to use std::thread: Operation not permitted
Aborted (core dumped)

I'm actually quite interested in the generalized indirect sort idea (I
looked at your two functions). Do you think you could package that up
so it's a simple wrapper, where the user passes in the sort function
they want to use, and it takes care of changing the input and
comparison (if possible) to be indirect, running the sort, and then
swapping into place based on the results of the indirect sort? If
that can be done without significant loss of efficiency, it would seem
quite useful (I've had do indirect sorts multiple times, and it's
always some code). Even your two functions look useful on their own.

I suggest using the boost random number generator (or if you stick
with the c++11 requirement, which I don't recommend, the std::
equivalent):
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>
random::mt19937 generator;
random::uniform_int_distribution<int> distribution;
And for each random number:
int result = distribution(generator)
You may want to reuse and modify some of my tests in
https://github.com/spreadsort/sort/tree/master/example

rand() is returning lots of duplicate elements, as you're sorting more
than RAND_MAX elements.

I'm not sure why you're putting so much effort into competing with
stable_sort. Some people use it, but I've never seen it in actual
production code.
std::stable_sort is not just merge sort, as based on the documentation
it can use less memory than merge sort:
http://www.cplusplus.com/reference/algorithm/stable_sort/
Complexity
If enough extra memory is available, linearithmic in the distance
between first and last: Performs up to N*log2(N) element comparisons
(where N is this distance), and up to that many element moves.
Otherwise, polyloglinear in that distance: Performs up to N*log22(N)
element comparisons, and up to that many element swaps.

The speed results I get on my system for your library are below. It
just doesn't seem competitive with tbb for speed on my system. If you
get it there (within 5% on random, sorted, and mostly-sorted data on
both Windows and Linux), or if you find someone else with a genuine
need to use this in production code, I'll be interested to take
another look. This is compiled using g++ (Ubuntu 4.8.2-19ubuntu1)
4.8.2 and run on a Intel(R) Core(TM) i7-3612QM CPU @ 2.10GHz:

max. OpenMP threads:8 (processors)8
Sort of 200000000 elements in a vector -------------------

Sorted elements -----------------------------------------------
STL std::sort :3.60897 secs
parallel::sort :1.24468 secs
tbb::parallel_sort :0.059404 secs
countertree::parallel_sort :1.25563 secs

std::stable_sort :10.3108 secs
parallel::stable_sort :3.1866 secs
countertree::parallel_merge_sort :3.71677 secs

Reverse sorted elements ---------------------------------------
STL std::sort :2.57237 secs
parallel::sort :1.02421 secs
tbb::parallel_sort :0.895415 secs
countertree::parallel_sort :1.30754 secs

std::stable_sort :10.4286 secs
parallel::stable_sort :3.21695 secs
countertree::parallel_merge_sort :4.08913 secs

Random elements, few repeated ( rand() )-----------------------
STL std::sort :19.7451 secs
parallel::sort :3.98043 secs
tbb::parallel_sort :4.28722 secs
countertree::parallel_sort :6.85041 secs

std::stable_sort :18.6291 secs
parallel::stable_sort :4.08416 secs
countertree::parallel_merge_sort :6.09388 secs

Random elements, quite repeated ( rand() % (NELEM/2) )---------
STL std::sort :19.5069 secs
parallel::sort :3.93207 secs
tbb::parallel_sort :4.66967 secs
countertree::parallel_sort :6.29049 secs

std::stable_sort :18.8519 secs
parallel::stable_sort :4.31173 secs
countertree::parallel_merge_sort :6.1165 secs

Random element many repeated (rand() % 10000)--------------------
STL std::sort :11.8256 secs
parallel::sort :2.65534 secs
tbb::parallel_sort :3.44815 secs
countertree::parallel_sort :3.79712 secs

std::stable_sort :18.7368 secs
parallel::stable_sort :3.70629 secs
countertree::parallel_merge_sort :6.09851 secs

Equal elements --------------------------------------------------
STL std::sort :4.50343 secs
parallel::sort :1.44262 secs
tbb::parallel_sort :0.0588514 secs
countertree::parallel_sort :1.53466 secs

std::stable_sort :10.382 secs
parallel::stable_sort :3.22236 secs
countertree::parallel_merge_sort :4.17672 secs


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk