Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] [parallel_sort] Proposal
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-02-02 12:41:58

Next message: John Salmon: "[boost] warnings from gcc-4.3.3 -Wall"
Previous message: Beman Dawes: "Re: [boost] [filesystem] Version 3 request for comments"
In reply to: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Next in thread: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Reply: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Reply: Phil Endecott: "Re: [boost] [parallel_sort] Proposal"

Edouard A wrote:
> For example, for 4 threads you have :
>
> parallel_sort<4> -> parallel_sort<2> -> parallel_sort<1>
> -> parallel_sort<1>
> -> parallel_sort<2> -> parallel_sort<1>
> -> parallel_sort<1>
>
> This code is generated at compile time.
>
> The one thread version use the fallback sorter which is by default
> std::sort.
>
> The default > 1 thread version merges the results of the underlying calls
> (there is a version with quicksort).

So basically it's something like this:

   thread t( sort(begin,mid) );
   sort(mid,end);
   t.join();
   merge(begin,mid,end); // [*]

The trouble with this is that one of the threads will terminate before
the other, e.g. because its data was easier to sort or was in
more-local memory, and it will then do nothing until the other thread
(and all its sub-threads) has also finished. It would be better to
keep all threads occupied for as much of the time as possible, e.g. by
dividing the problem up into chunks and giving the chunks to threads as
and when they are ready to handle them.

You mentioned the Intel implementation before. Have you benchmarked
that in comparison with your algorithm?

I guess that you may have benchmarked with random input data. This
will tend to favour your method as the input is uniformly difficult to
sort. You should also benchmark with input that is less homogeneous,
e.g. data that's mostly sorted but with local permutations, etc. Is
anyone aware of synthetic or real benchmark data sets for sorting?

[*] Of course a fundamental issue for this algorithm is that even if
the two sub-sorts are equally time-consuming, the final merge is still
a sequential operation. But I just wondered whether it would be
possible to have one thread start doing

in_place_merge(begin,mid,end);

while the other thread starts at the other end, going backwards:

in_place_merge(rbegin,reverse_iterator(mid),rend);

Of course they will crash into each other in the middle. But maybe
it's possible to do this in some sort of safe lock-free manner?

Phil.

Next message: John Salmon: "[boost] warnings from gcc-4.3.3 -Wall"
Previous message: Beman Dawes: "Re: [boost] [filesystem] Version 3 request for comments"
In reply to: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Next in thread: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Reply: Edouard A.: "Re: [boost] [parallel_sort] Proposal"
Reply: Phil Endecott: "Re: [boost] [parallel_sort] Proposal"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk