Subject: [boost] Fwd: [sort parallel] New Parallel Sort Algorithm
From: Francisco JosÃ© Tapia (fjtapia_at_[hidden])
Date: 2016-03-29 16:28:18
This message is for to announce the new version of the Boost Sort Parallel
algorithms. You can find in http://github.com/fjtapia/sort_parallel. In the
documentation, you can see the description of the algorithms, examples of
invocations, and the graphs of the benchmark of the algorithms.
The benchmark was not include in the library. I created other repository
https://github.com/fjtapia/sort_parallel_benchmark, where you can find the
code of the benchmark, the library, and the external code of the TBB
parallel stable sort. It was not logic, include external code inside the
The most significant of this version is the new parallel sort algorithm
(internally named Block Indirect Sort)
In the Parallel Sort Algorithms, we can find two groups.
Algorithms very fast with many threads (GCC Parallel Sort, Microsoft
Parallel Buffered Sort), but use an additional memory as the data.
Algorithms without additional memory, but slower with many threads (TBB
Parallel Sort, Microsoft Parallel Sort)
The new algorithm *invented, designed and implemented by the author,
provide the speed of GCC Parallel Sort and a memory usage similar to TBB. With
the additional advantage that don't need any library ( Open MP , TBB,
Microsoft PPL â¦). Only need a C++11 compiler. An example shows you this :
Sorting 100 000 000 random numbers of 64 bits. In a server with 32 threads.
The results obtained was
*Memory used (MB)*
*GCC Parallel Sort*
*TBB Parallel Sort*
*Block Indirect Sort*
In the repository with the benchmark code you can find the benchmark
results with several machines
these files, you can see the rival is GCC. TBB is clearly surpassed, due to
their internal algorithm.
GCC divide the data, sorted independently by all the threads. And after for
to merge do an easy process, and copy in an auxiliary memory, of the same
size than the used by the data. All the threads are fed, but need the
double of memory.
The new algorithm, do the division, and sort independently, but for the
merge, have a small buffer of 1024 elements, and when a part is moved to
the buffer, must calculate which part must be copied in that place. Do the
same but without the additional memory. Due this, need more CPU for to do
The hyper threading (HT) provide around a 30% more power to the processor,
and usually in the machines with HT activate, the new algorithm is a bit
faster than GCC, and with the HT deactivate is slower, and TBB is slower
except when very big objects. But then the bottleneck is not the algorithm,
is the data bus
The new algorithm is a collection of bad algorithms when you have only 1
thread. But they are easily subdivided. This permit parallelize all the
process and have fed all the threads.
The new algorithm is documented in two articles
block_indirect_sort_brief_en.*:* (Introduction and graph benchmarks)
block_indirect_sort_en.pdf: (Introduction, detailed description of the
algorithm and benchmarks.)
If you want to test the benchmark in your machine it's easy. In the
benchmark repository, you have the code used, and shell scripts for to
compile and run, saving the results in a Results.txt file, for Linux GCC
and CLANG and Windows Visual Studio 15
If you see any problem, mistake, something to improve or need any other
information, please, say me.
* Invented: The algorithm had been ideate, designed and implemented
beginning from zero from an old idea. After read hundreds of articles and
books, I didn't find any similar. If you know something, please say me.
But the important is not the author, is provide a fast, robust, and easy to
use algorithm to the community of programmers.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk