Boost logo

Boost :

Subject: [boost] Fwd: [sort parallel] New Parallel Sort Algorithm
From: Francisco José Tapia (fjtapia_at_[hidden])
Date: 2016-03-29 16:28:18

This message is for to announce the new version of the Boost Sort Parallel
algorithms. You can find in In the
documentation, you can see the description of the algorithms, examples of
invocations, and the graphs of the benchmark of the algorithms.

The benchmark was not include in the library. I created other repository, where you can find the
code of the benchmark, the library, and the external code of the TBB
parallel stable sort. It was not logic, include external code inside the

The most significant of this version is the new parallel sort algorithm
(internally named Block Indirect Sort)

In the Parallel Sort Algorithms, we can find two groups.


   Algorithms very fast with many threads (GCC Parallel Sort, Microsoft
   Parallel Buffered Sort), but use an additional memory as the data.

   Algorithms without additional memory, but slower with many threads (TBB
   Parallel Sort, Microsoft Parallel Sort)

The new algorithm *invented, designed and implemented by the author,
provide the speed of GCC Parallel Sort and a memory usage similar to TBB. With
the additional advantage that don't need any library ( Open MP , TBB,
Microsoft PPL …). Only need a C++11 compiler. An example shows you this :

Sorting 100 000 000 random numbers of 64 bits. In a server with 32 threads.
The results obtained was

*Time spent*

*Memory used (MB)*

*GCC Parallel Sort*

1.05 secs

1560 MB

*TBB Parallel Sort*

1.76 secs

783 MB

*Block Indirect Sort*

0.97 secs

790 MB

In the repository with the benchmark code you can find the benchmark
results with several machines
these files, you can see the rival is GCC. TBB is clearly surpassed, due to
their internal algorithm.

GCC divide the data, sorted independently by all the threads. And after for
to merge do an easy process, and copy in an auxiliary memory, of the same
size than the used by the data. All the threads are fed, but need the
double of memory.

The new algorithm, do the division, and sort independently, but for the
merge, have a small buffer of 1024 elements, and when a part is moved to
the buffer, must calculate which part must be copied in that place. Do the
same but without the additional memory. Due this, need more CPU for to do
the process.

The hyper threading (HT) provide around a 30% more power to the processor,
and usually in the machines with HT activate, the new algorithm is a bit
faster than GCC, and with the HT deactivate is slower, and TBB is slower
except when very big objects. But then the bottleneck is not the algorithm,
is the data bus

The new algorithm is a collection of bad algorithms when you have only 1
thread. But they are easily subdivided. This permit parallelize all the
process and have fed all the threads.

The new algorithm is documented in two articles


   block_indirect_sort_brief_en.*:* (Introduction and graph benchmarks)

   block_indirect_sort_en.pdf: (Introduction, detailed description of the
   algorithm and benchmarks.)

If you want to test the benchmark in your machine it's easy. In the
benchmark repository, you have the code used, and shell scripts for to
compile and run, saving the results in a Results.txt file, for Linux GCC
and CLANG and Windows Visual Studio 15

If you see any problem, mistake, something to improve or need any other
information, please, say me.



* Invented: The algorithm had been ideate, designed and implemented
beginning from zero from an old idea. After read hundreds of articles and
books, I didn't find any similar. If you know something, please say me.

But the important is not the author, is provide a fast, robust, and easy to
use algorithm to the community of programmers.

Boost list run by bdawes at, gregod at, cpdaniel at, john at