Boost logo

Boost :

Subject: Re: [boost] [parallel_sort] Proposal
From: Edouard A. (edouard_at_[hidden])
Date: 2009-02-02 14:41:39


 
> It seems fundamentally runtime to me, since different machines or just
> different runs will want different levels of concurrency. The
> overhead ought to just be a few compares and arithmetic operations,
> which would be swamped by the effort involved in the sorting.
>
> I think normally I'd want to just use
> parallel_sort(b, e, thread::hardware_concurrency())

Maybe you're right. That's not a major design change to make it runtime, I
can easily give it a try. We spend most of our time sorting, making an
additional check is no big deal.

As for TBB, I did a quick benchmark.

As expected, TBB is faster, but not a lot.

TBB is roughly 3 times faster than std::sort. Again, I didn't expect to get
a x 4 gain because of the design of the Q6600 CPU.

My implementation is 2.6 times faster than std::sort.
 
Although I don't slice the data, my synchronization mechanisms are extremely
simple, which explain why the difference isn't very big (15% advantage for
tbb).

I have however noticed that the behavior on sorted input is not very
satisfactory on my side, probably for the reasons described by Phil.
tbb::parallel_sort behavior is constant from what I can see.

I'm going to:

 * write a parallel_merge
 * see what I can do to optimize memory allocations
 * think about the slicing issue
 * probably play left4dead instead of all the above :(

-- 
EA

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk