
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-07-08 19:05:53


>To actually make it part of std::sort, I'd need to have float, integer, and
>string versions, check for which applies, and if none apply, call the
>current std::sort. That will require significant work.

Yes, that is the work I had in mind. But as I said, I'm not sure it
would be worthwhile because nobody stands to benefit from a faster
serial sorting algorithm for large data sets.
 
>2) The only fully serial part is finding the size of the bins to create in
>the first iteration. This could be made parallel by allocating external
>memory for bins, but at the cost of locking code, or merging bins from each
>processor, in addition to the memory allocation, which probably isn't worth
>it with a small number of processors. If you did insert into bins at this
>point, the algorithm would be fully parallel, and it could skip the next
>two steps.

I disagree with you here. You could easily split the input into kernels
of data, process each kernel as a task that computes the bin sizes required
by that kernel, and then join the results.

>3) Allocate bins (there are normally n/512 bins in the first iteration, so
>this is fast, even if serial)

Allocation of the bins would happen as a serial stage in your pipeline, yes,
but it would be very fast and would not harm performance.

>4) In the swap loop, as soon as a bin is filled, fire it off on a separate
>thread. As you cut the data into 512-or-so pieces in the first iteration,
>each successive iteration should now be running efficiently in parallel.
>How many swaps are required to fill the first bin is not reliable, so this
>step is not fully parallel.

You are thinking about threading in a thread-oriented instead of
task-oriented way. If you recursively create threads you will
oversubscribe your machine and the threads will contend for cores. 512
threads is way, way, way too many. That's not a recipe for success.
Instead of creating threads you want to create tasks and have a task
scheduler that assigns them to threads.
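For illustration, here is a hand-rolled sketch of what I mean by a task
scheduler: a fixed pool of worker threads, one per core, draining a queue of
tasks, so recursing into 512 bins enqueues 512 small tasks rather than
spawning 512 threads. In practice you would use an existing scheduler rather
than writing this yourself; the task_pool name and interface are purely
illustrative:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class task_pool {
public:
    explicit task_pool(unsigned n = std::thread::hardware_concurrency()) {
        if (n == 0)
            n = 1;
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~task_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_)
            w.join();
    }
    // Enqueue one unit of work; an idle worker thread will pick it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return;           // done_ is set and nothing is left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                   // e.g. sort one bin, possibly submitting sub-bin tasks
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool done_ = false;
};

The number of worker threads stays fixed at the core count no matter how
many tasks the recursion generates, which is the whole point of the
task-oriented approach.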

The first iteration of spreadsort can be fully parallel: if you cut the
original input into enough kernels and process them as tasks, you can keep
all your cores busy until all the bins are filled, and then proceed with
parallel recursion into each bin.
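Continuing the sketch, and assuming the task_pool above (or any real
scheduler), the recursion into bins is just a loop that submits one task per
filled bin. std::sort stands in here for the recursive per-bin spreadsort
pass, and the bin descriptor is a placeholder:

#include <algorithm>
#include <utility>
#include <vector>

// Placeholder: each filled bin is a contiguous range of the data.
typedef std::pair<int*, int*> bin_range;

void sort_bins_as_tasks(std::vector<bin_range>& bins, task_pool& pool)
{
    for (const bin_range& bin : bins)
        // One task per bin; std::sort stands in for the recursive spreadsort pass.
        pool.submit([bin] { std::sort(bin.first, bin.second); });
    // Waiting for the tasks to finish (futures, a counter, etc.) is omitted here.
}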

>As to std::sort being a straw man, I'd be happy to compare against a faster
>general-case serial sorting algorithm (if said algorithm exists).
>Improvements in serial sorting algorithms should mostly apply to parallel
>algorithms, and nothing stops you from running Spreadsort in parallel. I'd
>be happy to look into developing parallel versions once integer_sort,
>float_sort, and string_sort are in Boost, and hopefully I have enough money
>to buy a test machine with enough cores to be worth testing on.

You sort of missed my point about std::sort being a straw man. It is
the best general-case serial sorting algorithm. It isn't a good
benchmark to compare against, however, in a world where parallel sorting
algorithms exist and multi-core machines are quickly becoming ubiquitous.
Nobody should bother investigating a faster serial algorithm for sorting
large data sets when they would be much better served by applying a freely
available implementation of a parallel algorithm.

The thing stopping me from running spreadsort in parallel is that you
haven't released a threaded version of the algorithm with a nice, clean,
generic API. I would never parallelize it myself, because if I wanted a
parallel sorting algorithm I'd use one off the shelf. Also, it stands
to reason that there is at least a 50/50 chance that parallel spreadsort
would underperform existing parallel sorting implementations. Whether
it turns out to be better or worse than the best parallel implementation
is what I'm interested in finding out. Being marginally better than
std::sort for large n only tells me that you are at a good point to start
thinking about threading the algorithm, since you have bottomed out on
serial performance improvements.

Luke

