Boost logo

Boost :

Subject: Re: [boost] [SORT] Parallel Algorithms
From: Steven Ross (spreadsort_at_[hidden])
Date: 2015-03-24 22:32:35


Franciso,

Thanks for your detailed explanation.

I'll repeat what I've said before about your object benchmark:
I suggest changing the struct you use for benchmarking to this:
template <uint32_t NN>
struct reburcio
{
  uint64_t M[NN];
  reburcio () {
    init();
  }

  void init() {
    for (int i = 0; i < NN; ++i) {
      M[i] = my_rand(); // or something with less variability to test
identical substring performance
    }
  }

  bool operator < ( const reburcio &R) const{
    for (int i = 0; i < NN; ++i) {
      if (M[i] != R.M[i]) {
        return M[i] < R.M[i];
      }
    }
    return false;
  }
};

That way it's using the default copy and assignment operators, which copy
over the entire datastructure, and it initializes and compares all elements
of the array. It appears that the benchmark you sent me only copies over
and compares the first element in the array, which might have caused some
of the large-element differences in performance you were seeing.
Another alternative is just using a vector for the array of elements.

On Tue, Mar 24, 2015 at 4:56 PM Francisco José Tapia <fjtapia_at_[hidden]>
wrote:

>
> -
>
> GCC sort → M
> -
>
> boost sort → M
>
I don't think we'll ship this; this looks more like a proof-of-concept,
unless the indirect sorting benefits are demonstrable after fixing the
object you use in testing as described above.

   -
>
> GCC parallel_sort → M
> -
>
> boost parallel_sort → M
>

I see no advantage to this sort relative to tbb, and its performance on
already-sorted data needs to improve to be competitive with tbb.

> -
>
> TBB parallel_sort → M
>
>
> -
>
> GCC stable_sort → 2 M
> -
>
> boost stable_sort → 1.5 M
>
The combination of improved speed and lower memory usage looks impressive.
I'll try to verify this, but this looks promising.

> -
>
> TimSort -> 2M
> -
>
> GCC parallel_stable_sort → 2.5 M
> -
>
> TBB parallel_sort → 2 M
> -
>
> boost sample_sort → 2 M
> -
>
> boost parallel_stable_sort → 1.5 M
>
Both of these algorithms look good. I'm inclined towards
parallel_stable_sort because memory is becoming more of a bottleneck on
modern systems, but I'll have to do some testing.

Timsort seems fairly impressive, but specialized and already available for
free, hence not necessary to include in the library.

>
>
> If you found any problem , error or doubt, please , contact me for to
> resolve
>

Please fix your object benchmarks as described at the beginning of my
email; otherwise I don't trust them. I also would like to see string
sorting speeds. Once that is all together, I'd like to test and review
them carefully and look to see if there are any weaknesses missed by your
current tests. If they look good, and we can agree on which algorithms you
want to add to the library, we should move on to cleaning this up for
review.

Steve


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk