Subject: Re: [boost] [SORT] Parallel Algorithms
From: Steven Ross (spreadsort_at_[hidden])
Date: 2015-03-26 22:32:17
On Wed, Mar 25, 2015 at 3:57 PM Francisco JosÃ© Tapia <fjtapia_at_[hidden]>
> TEST OBJECTS
> The object used in the test ( Reburcio is a word of an exercise from my
> first pascal book, and don't have any sense) In the construction , copy
> construction and operator =, copy all the elements with a loop. The only
> one operation which use only the first element is the comparison.
> I had done a new comparison, and compare the sum of all the elements of the
> object. The time are similar, you must add 5% in the small objects to 15%
> in the big objects.
I'm not precisely sure of your meaning, but I suggested modification of the
test object to use default operators where possible, to randomize the
entire contents of the object (so that not all integers in the array are
the same), and to use the entire contents of the object for comparison. I
believe that these suggested changes make it more reflective of the
conventional large-object comparison case. Most of the time, only the
first integer will be compared, but in equal cases the entire array will be
compared. I don't like feeding overly simplified objects into benchmarks,
because compilers can make some surprising optimizations.
> The Tim Sort code is not mine.
Agreed. I think we can drop it from this comparison, as your algorithm is
clearly much faster. I mentioned it because it is used with other
languages, and responded when somebody else on the email group suggested I
allow it. I'm not asking you for an implementation.
Timsort uses N/2 additional memory.
> DESIGN PHASE
> Now , we are in the design phase. Nothing is fixed. We can comment and
> change anything.
> The specification is simple, provide sort algorithms, stable and not
> stable, for to run with 1 thread or with any number of threads, without any
> other external library ( as Open MP , TBB, CilkPlus..). Only with the C++
> If you want to discuss about this, we can do. We admit ideas, suggestions,
> opinions. Any help is welcome. The final decision about if the code is
> included or not in the boost sort library, depend of Steven, the creator of
> the library.
> After the design phase, with the decisions taken and the interfaces fixed.
> We prepare the final version, redesign a test system, test in deep and
> create the documentation.
I suggest deciding what you want to try to include. Getting a competitive
parallel unstable sort to tbb will be hard, and I'm skeptical of including
a library that just replicates what tbb provides for easy use.
> I tried to improve, I would like to have more time, more equipment, and
> better documentation on order to find the best options, but I don't have,
> and I do this in my free time after my work and my family.
I'm in the same situation with regards to available time. I think many
boost participants are.
> I designed 6 stable sort algorithms, for to select the fast (internally
> named smart_merge_sort, about 10% faster than the GCC version). I prepared
> a version of Sample Sort, only with threads and atomic variables. In the
> benchmarks have similar performance than the TBB version, use one half of
> the auxiliary memory used by TBB and is exception safe.
I like your parallel stable sorts. They look like the most promising part
of your proposed library additions. Indirect sorting is also interesting.
I just want reliable large-object and string comparison numbers for them
before diving in further.
Have you looked at integrating your testing with the approach in the Boost
Sort library? The randomgen binary generates files with different types of
random number distributions that can test a variety of scenarios, and the
tune.pl script controls automated performance testing.
> In my opinion, the parallel stable sort is reasonably good for to be
> included in the final version. The important case, which shows the quality
> of the algorithm, is when the data are unsorted. I will try to implement
> the detection of the sorted elements , like in TBB, but I am not sure about
> the results. Th internal loops of the algorithm are delicate, and a small
> change , can produce big changes in the performance, as happened in other
> Is there a way you can do a fast initial pass, checking for already-sorted
data? That's what Spreadsort does (at each level of the sort).
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk