Boost logo

Boost :

Subject: [boost] [SORT] Parallel Algorithms
From: Francisco José Tapia (fjtapia_at_[hidden])
Date: 2014-12-14 13:03:06

Sorry by the delay, As promised, the sorting parallel algorithms

 On my computer (Quad Core) the GCC std::sort and stable_sort are slight
faster than my version of intro_sort and merge_sort (stable sort is a
synonym of merge_sort). In other computer with an Intel I3, my algorithms
are slight faster than in the GCC version.

 Trying to optimize the parallel implementation. I modified the parallel
version for to use the std::sort and std::stable_sort. The parallel_sort
version response good, but the parallel_stable_sort increase the time
around a 60% compared with the version with
countertree::parallel_merge_sort .

 After many test, I discover the cause of the problem. The demonstration is
in the test_stress_sort_01.cpp.

In this program we have a big vector of 100 000 000 elements uint64_t. The
program make parts of the same size for each HW core. On my computer 4
parts and 4 cores. The program execute the part sequentially checking the
time, and after do the same but in parallel, each thread sort a part, and
check the time. This is done with the std::stable sort and
countertree::merge_sort. The data input is the same for all. The next lines
are the result show

 max. OpenMP threads: 4 (4 processors)

Sort of 100000000 elements in a vector ---------------------

 Random elements, few repeated ( rand() )-----------------------

Stable Sort

Part 0 secs :2.77955

Part 1 secs :2.75015

Part 2 secs :2.76068

Part 3 secs :2.76284

Parallel Sort secs :8.49796

Countertree stable sort

Part 0 secs :2.72802

Part 1 secs :2.71614

Part 2 secs :2.72074

Part 3 secs :2.722

Parallel Sort secs :4.80692

 Surprisingly the parallel sort with countertree::stable_sort is much more
faster than the sort with std::stable_sort in the GCC 4.8 and 4.9 compiler.
This is the secret of the speed of the countertree::parallel_merge_sort.

 This is very important, because when the size of the elements is small,
the GCC and the countertree algorithms have similar performance. But when
the size of the elements grow, the std::stable sort is slower than the
countertree::merge_sort. This difference is increased in the parallel
version. The next lines are part of benchmark_sort_05.cpp


Sort of N = 6250000 elements of size 128


Random elements, few repeated ( rand() )-----------------------

STL std::sort :2.9188 secs

countertree::intro_sort :2.87864 secs

parallel::sort :2.46113 secs

tbb::parallel_sort :1.86458 secs

countertree::parallel_sort:1.89263 secs

 std::stable_sort :8.77963 secs

countertree::merge_sort :5.70694 secs

parallel::stable_sort :8.03946 secs

countertree::parallel_merge_sort :4.90815 secs

 I have in mind a new version of the stable sort in order to improve the
speed, reducing the number of internal copy of elements. I hope to have
time in Christmas, but I can't guarantee. If the results are the expected I
will send you

 You have too, a version of indirect_sort, which sort pointers to the
elements, and at end sort the elements according the pointers. You can
replace any sort algorithm by their indirect version, obtaining the same
results, with the only difference of the time needed. You have a
benchmark_indirect_sort.cpp where see the comparison of the time needed of
several sort methods and their indirect version with elements of different

 This code is a preliminary version, done for my countertree library. If
you consider useful, I will change the name space and document the code,
the algorithms and the ideas obtained from the benchmarks.

 Please feel free for ask me any question or for to modify or change
anything. The goal is to provide good sort methods to the users.

In the zip file you have the code and a file with a description of each


Boost list run by bdawes at, gregod at, cpdaniel at, john at