|
Boost : |
Subject: [boost] [SORT] Parallel Algorithms
From: Francisco José Tapia (fjtapia_at_[hidden])
Date: 2014-12-14 13:03:06
Sorry by the delay, As promised, the sorting parallel algorithms
On my computer (Quad Core) the GCC std::sort and stable_sort are slight
faster than my version of intro_sort and merge_sort (stable sort is a
synonym of merge_sort). In other computer with an Intel I3, my algorithms
are slight faster than in the GCC version.
Trying to optimize the parallel implementation. I modified the parallel
version for to use the std::sort and std::stable_sort. The parallel_sort
version response good, but the parallel_stable_sort increase the time
around a 60% compared with the version with
countertree::parallel_merge_sort .
After many test, I discover the cause of the problem. The demonstration is
in the test_stress_sort_01.cpp.
In this program we have a big vector of 100 000 000 elements uint64_t. The
program make parts of the same size for each HW core. On my computer 4
parts and 4 cores. The program execute the part sequentially checking the
time, and after do the same but in parallel, each thread sort a part, and
check the time. This is done with the std::stable sort and
countertree::merge_sort. The data input is the same for all. The next lines
are the result show
max. OpenMP threads: 4 (4 processors)
Sort of 100000000 elements in a vector ---------------------
Random elements, few repeated ( rand() )-----------------------
Stable Sort
Part 0 secs :2.77955
Part 1 secs :2.75015
Part 2 secs :2.76068
Part 3 secs :2.76284
Parallel Sort secs :8.49796
Countertree stable sort
Part 0 secs :2.72802
Part 1 secs :2.71614
Part 2 secs :2.72074
Part 3 secs :2.722
Parallel Sort secs :4.80692
Surprisingly the parallel sort with countertree::stable_sort is much more
faster than the sort with std::stable_sort in the GCC 4.8 and 4.9 compiler.
This is the secret of the speed of the countertree::parallel_merge_sort.
This is very important, because when the size of the elements is small,
the GCC and the countertree algorithms have similar performance. But when
the size of the elements grow, the std::stable sort is slower than the
countertree::merge_sort. This difference is increased in the parallel
version. The next lines are part of benchmark_sort_05.cpp
***************************************************************
Sort of N = 6250000 elements of size 128
*****************************************************************
Random elements, few repeated ( rand() )-----------------------
STL std::sort :2.9188 secs
countertree::intro_sort :2.87864 secs
parallel::sort :2.46113 secs
tbb::parallel_sort :1.86458 secs
countertree::parallel_sort:1.89263 secs
std::stable_sort :8.77963 secs
countertree::merge_sort :5.70694 secs
parallel::stable_sort :8.03946 secs
countertree::parallel_merge_sort :4.90815 secs
I have in mind a new version of the stable sort in order to improve the
speed, reducing the number of internal copy of elements. I hope to have
time in Christmas, but I can't guarantee. If the results are the expected I
will send you
You have too, a version of indirect_sort, which sort pointers to the
elements, and at end sort the elements according the pointers. You can
replace any sort algorithm by their indirect version, obtaining the same
results, with the only difference of the time needed. You have a
benchmark_indirect_sort.cpp where see the comparison of the time needed of
several sort methods and their indirect version with elements of different
size.
This code is a preliminary version, done for my countertree library. If
you consider useful, I will change the name space and document the code,
the algorithms and the ideas obtained from the benchmarks.
Please feel free for ask me any question or for to modify or change
anything. The goal is to provide good sort methods to the users.
In the zip file you have the code and a file with a description of each
code.
Yours
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk