Subject: Re: [boost] [parallel_sort] Proposal
From: Edouard A. (edouard_at_[hidden])
Date: 2009-02-02 05:08:05
On Mon, 02 Feb 2009 09:57:40 +0000, "Phil Endecott"
>> Current benchmarks on my Q6600 vs2008-64-bit default allocator:
>> 2 threads : 160 % faster
>> 4 threads : 260 % faster
> So you're getting a super-linear speedup going from 1 to 2 processors?
> That seems odd.
> What size input does the benchmark relate to? I'm curious to know
> whether you still get a worthwhile speedup for relatively small inputs.
No sorry, this is wrongly said, I get factor 1.6 and 2.6. I meant 160 %
speed, not 160 % faster.
For small input it's very simple, anything under 1000 is not parallelized
because of the thread overhead cost.
This would require some precise benchmarking to find the right value.
>> It can be heavily customized. I already offer the possibility to choose
>> between quick sorting and merge sorting, and for quick sorting I offer
>> pivot algorithms : fast or secure. Then you can of course specify a
>> predicate and a fallback sorting algorithm for when you run out of
> How about run-time selection between quicksort and mergesort, in the
> style of introsort? This seems hard to beat in uniprocessor
Not really needed for merge_sort, there is no problematic behaviour like
reason why you would want to use quicksort is that if the extra memory
allocation is an issue.
>> Nevertheless, I think it's time to open up the code and start getting
> Yes please. You haven't said anything about how it actually works...
Threads are specified as a template parameter. There are finite. If you
wish to have a mutlithreaded version that uses 2 threads you do
4 threads ?
What happens then is that you have a template recursive call that generates
the right amount of threading through recursion. That means apart from
thread creation and termination you have little overhead.
For example, for 4 threads you have :
parallel_sort<4> -> parallel_sort<2> -> parallel_sort<1>
-> parallel_sort<2> -> parallel_sort<1>
This code is generated at compile time.
The one thread version use the fallback sorter which is by default
The default > 1 thread version merges the results of the underlying calls
(there is a version with quicksort).
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk