Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] [SORT] Parallel Algorithms
From: Francisco JosÃ© Tapia (fjtapia_at_[hidden])
Date: 2014-12-23 03:31:21

Next message: Andrey Semashev: "[boost] [compute] Some questions"
Previous message: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
In reply to: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
Next in thread: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
Reply: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"

We can do too a benchmark with variable lenght elements with strings.

If you prepare the operations for to do in that test, I can prepare an
additional benchmark with all the algorithms and your operations.

After all, we see and take a decision

Yours

Francisco

2014-12-23 4:34 GMT+01:00 Steven Ross <spreadsort_at_[hidden]>:

> Francisco,
>
> On Mon Dec 22 2014 at 5:04:17 PM Francisco JosÃ© Tapia <fjtapia_at_[hidden]>
> wrote:
>
> > > I had been working about the memory used by the different algorithms,
> > > specially the stable algorithms. I developed a low memory algorithm
> > > circular_stable_sort, which use only a 10% more of memory, and the
> speed
> > > decrease around 15% with small objects and with big objects is similar
> to
> > > the merge_sort
> >
> > Some people may find that useful. What is the worst-case computational
> complexity? Is it a well-known algorithm? The bar is going to be higher
> (may require a full review) for previously unknown algorithms.
>
>
> > > I was looking too, the TimSort algorithm. I found a C++ implementation
> > > in https://github.com/gfx/cpp-TimSort .
> > >
> > > I inserted this algorithm in the benchmark programs. With small
> objects,
> > > the speed is slower than the stable sort and merge_sort, and have a
> good
> > > performance with big objects, greater than 64 bits. It's logic the
> > decision
> > > of Java and Python, because they must order objects, usually non
> > contiguous
> > > and bigger than the naked number.
> > >
> > > For the measure of the memory used by the program I use the command
> > > /usr/bin/time -v program_to_execute. I take the benchmark_algorithm_04
> > > (100000000 uint64_t numbers), and comment all the invocations to the
> > > algorithms , except one, compile, and run in a terminal with the
> command
> > > /usr/bin/time -v benchmark_sort_04.
> > >
> > > I repeat this process with all the algorithms for to know the memory
> > > used by each algorithm.
> >
>
> I suggest changing the struct you use for benchmarking to this:
> template <uint32_t NN>
> struct reburcio
> {
> uint64_t M[NN];
> reburcio () {
> init();
> }
>
> void init() {
> for (int i = 0; i < NN; ++i) {
> M[i] = my_rand();
> }
> }
>
> bool operator < ( const reburcio &R) const{
> for (int i = 0; i < NN; ++i) {
> if (M[i] != R.M[i]) {
> return M[i] < R.M[i];
> }
> }
> return false;
> }
> };
>
> That way it's using the default copy and assignment operators, which copy
> over the entire datastructure, and it initializes and compares all elements
> of the array. It appears that the benchmark you sent me only copies over
> and compares the first element in the array, which might have caused some
> of the large-element differences in performance you were seeing.
>
> With this data structure, countertree intro_sort is competitive on my linux
> system with std::sort on fully random data, and countertree parallel_sort
> is competitive for 8, 16, and 256 bytes per element, but significantly
> underperforms (10-15%) tbb for 32 and 64 bytes.
>
>
> > > As you can see with a 256 bytes of size the indirect
> > > parallel_stable_sort is 3 times faster. And the indirect_sort is a 25%
> > > faster.
> >
>
> Yes, indirect sort should eventually be faster, given a large enough
> element, especially if indirect sort is done with the fastest underlying
> algorithm available.
>
>
> > >
> > > The final solution, if at end is taken, will be a hybrid algorithm
> > > depending of the size of the elements
> >
>
> That sounds reasonable for fixed-size elements. What about variable-length
> elements, like strings? Will this only be for fixed-size elements?
>
>
> > > I propose you the next. Let me 2 or 3 weeks and I will prepare a big
> > > benchmark with all the algorithms, including the indirect version of
> the
> > > stable and unstable sort. We will run the benchmarks over different
> > > machines, and according the results, take a decision about if we
> propose
> > > something to Boost, and which algorithm must be used in the hybrid
> > version.
> > >
> > > If, at end, we decide don't propose nothing, I only can say this had
> > > been a nice and exciting experience
> >
>
> I think indirect sorting is promising.
>
>
> > >
> > > Mi intention is not to impose my code, I want provide to the final
> users
> > > the best solution, if exist. With my code or with the code of other. I
> am
> > > the first to clap the winner.
> > >
> > > We are doing this because this parallel implementation don't need any
> > > other library, only the standard. There are compilers without OpenMP or
> > TBB
> > > support, or simply, the user don't want to load them for to make a
> sort.
> >
> > Agreed, but OpenMP is very common (even Android supports it)!, so systems
> without OpenMP are a very small niche in terms of total systems out there.
> TBB is supported on intel-compatible processors, and the vast majority of
> deployed processors where parallel sorting makes sense (won't drain the
> battery really quickly) are intel-compatible. If we're going to provide an
> alternative, it should cost very little in terms of performance, and
> ideally provide some kind of benefit besides eliminating a dependency.
>
>
> > > I have in mind, several improvements of the algorithms. I will codify
> > > and test, and if run well, I include in the algorithms. I want, too,
> > > simplify the interface of indirect_sort, for to be used in a easy way
> by
> > > any sort algorithm. If you are interested, I will prepare too a brief
> > > document with the explanation and the internal algorithm.
> >
> > I'd like to see that document.
> Along with testing a struct that is an array of ints, you should probably
> test variable-length string sorting, as string sorting is very common.
>
> Merry Christmas,
>
> Steve
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

Next message: Andrey Semashev: "[boost] [compute] Some questions"
Previous message: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
In reply to: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
Next in thread: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"
Reply: Steven Ross: "Re: [boost] [SORT] Parallel Algorithms"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk