
Subject: Re: [boost] Fwd: [sort][container] Challenge for sorting experts, an improved stable sort
From: Francisco José Tapia (fjtapia_at_[hidden])
Date: 2015-09-23 15:53:12


The problem of auxiliary memory is the inverse of the problem of speed: the less auxiliary memory you use, the slower the algorithm.

I didn't know about GrailSort and WikiSort. I will examine them when I finish
the new parallel_sort algorithm and, if you are interested, I will give you my
opinion.

Some time ago I developed a stable sort algorithm with a small auxiliary
memory. You can define the size of this auxiliary memory, and it runs with any
size: small size, small speed. With a size of 10% of the memory used by the
data, as I remember, it was around 30% slower than the fastest implementation.

If the Boost community finds it interesting, we can try to develop sorting and
merging algorithms with small auxiliary memory, or with none at all.

Another low-memory solution is the indirect sort. You create a pointer to each
element of the input data and sort the pointers. The elements of the auxiliary
memory used in the stable sort are then pointers, which are usually smaller
than the elements to sort. Using the smart_merge_sort you need only half of that.
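
For illustration only, the idea looks like this with the standard library
(just a sketch of the concept with names of my own, not the code of the library):

#include <algorithm>
#include <cstddef>
#include <vector>

// Indirect stable sort: sort pointers to the elements instead of the
// elements themselves. The auxiliary memory of the stable sort then holds
// pointers, which are usually smaller than the elements being sorted.
template <class T, class Compare>
std::vector<const T*> indirect_stable_sort(const std::vector<T>& data, Compare comp)
{
    std::vector<const T*> ptr(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        ptr[i] = &data[i];
    std::stable_sort(ptr.begin(), ptr.end(),
                     [&comp](const T* a, const T* b) { return comp(*a, *b); });
    return ptr; // the data is untouched; the pointers give the sorted order
}

A complete indirect sort would afterwards rearrange the data following the
sorted pointers; I omit that part here.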

You have the complete code of the indirect sort in the repository
(https://github.com/fjtapia/sort_parallel).

If you have any problem or question, contact me.

2015-09-23 16:10 GMT+02:00 Ion Gaztañaga <igaztanaga_at_[hidden]>:

> On 22/09/2015 6:58, Francisco José Tapia wrote:
>
>> Hi Ion,
>>
>> If the elements in the flat associative container are sorted, you need
>> only to sort the elements to insert and afterwards do a fusion (merge) of
>> the two parts.
>>
>
> Yes, I understand that merge has better complexity guarantees than sort;
> since the already stored elements are sorted, it's better to sort the new
> elements and merge.
>
>> If your container admits repeated elements, you can use std::sort; if
>> not, you must use a stable sort and delete the repeated elements.
>>
>
> I don't think I could use std::sort as it is unstable; if there are repeated
> elements, I should only keep the first occurrence, so I need stability
> while sorting. In any case, I can't use the standard algorithms as I'd like
> some traits support to customize basic operations (swap, move,
> uninitialized_move) in order to use Boost.Move or future destructive move
> operations.
>
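
(A small illustration of that point, using only the standard algorithms; the
data is invented just to show the behaviour:)

#include <algorithm>
#include <utility>
#include <vector>

// Why stability matters for "keep the first occurrence": with a stable sort
// equal keys keep their input order, so std::unique keeps the element that
// appeared first; with the unstable std::sort the survivor is unspecified.
int main()
{
    typedef std::pair<int, char> elem;
    std::vector<elem> v = { {2,'a'}, {1,'b'}, {2,'c'}, {1,'d'} };

    auto by_key   = [](const elem& x, const elem& y) { return x.first < y.first; };
    auto same_key = [](const elem& x, const elem& y) { return x.first == y.first; };

    std::stable_sort(v.begin(), v.end(), by_key);
    v.erase(std::unique(v.begin(), v.end(), same_key), v.end());
    // v is now { {1,'b'}, {2,'a'} }: 'b' and 'a' were the first occurrences
}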
>> You have several algorithms to do a stable sort. In the library we are
>> developing (https://github.com/fjtapia/sort_parallel) you have
>> smart_merge_sort, which is a very fast stable sort that uses only one half
>> of the memory used by the data, is exception safe and uses move semantics.
>>
>
> Very nice! Is the N/2 memory requirement achieved with techniques similar
> to those described here?
>
> "Fast mergesort implementation based on half-copying merge algorithm"
> http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf.
>
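
(For reference, the N/2 bound can be reached with something like the
following. This is only my illustrative sketch for random-access iterators,
not smart_merge_sort and not the exact half-copying variant of that paper:)

#include <algorithm>
#include <vector>

// Stable merge sort using an external buffer of only N/2 elements: each
// merge step moves the first half into the buffer and then merges the
// buffer with the second half back into the original range. The output
// position never overtakes the unread part of the second half, so merging
// in place like this is safe.
template <class It, class Buf>
void half_buffer_merge_sort(It first, It last, Buf buffer)
{
    const auto len = last - first;
    if (len < 2) return;
    It mid = first + len / 2;
    half_buffer_merge_sort(first, mid, buffer);
    half_buffer_merge_sort(mid, last, buffer);

    Buf b_first = buffer;
    Buf b_last  = std::move(first, mid, buffer); // move the first half out
    It out = first;
    It s   = mid;
    while (b_first != b_last && s != last)
        *out++ = (*s < *b_first) ? std::move(*s++) : std::move(*b_first++);
    std::move(b_first, b_last, out); // whatever is left in the buffer
    // anything left in [s, last) is already in its final position
}

// usage: std::vector<int> v = { ... };
//        std::vector<int> buf(v.size() / 2);
//        half_buffer_merge_sort(v.begin(), v.end(), buf.begin());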
> Just some comments on your implementation (I'm not a sorting expert,
> sorry if I'm mistaken), mainly related to "adaptability":
>
> Your implementation is not adaptive in the sense that a failure to
> allocate the N/2 of external memory triggers an exception. In contrast,
> many std::stable_sort implementations try to allocate N/4, N/8, etc. and
> gracefully degrade to an O(N(log N)^2) sort for the sorting steps that
> don't have enough external memory. E.g.: they use the available memory to
> apply mergesort in the "low level" phases (merging small ranges that fit
> in the external memory); only the final merge steps are done with the
> slower in-place merge. This can be seen in the SGI implementation:
>
> https://www.sgi.com/tech/stl/stl_algo.h
>
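
(The degradation policy described above could look roughly like this; a
hypothetical sketch, not the actual code of any STL implementation:)

#include <cstddef>
#include <new>
#include <utility>

// Ask for a scratch buffer of 'wanted' elements; on failure keep halving
// the request instead of throwing. The sort then uses whatever it got for
// the small merges and falls back to the slower in-place merge for the
// steps that do not fit in the buffer.
template <class T>
std::pair<T*, std::size_t> acquire_buffer(std::size_t wanted)
{
    while (wanted > 0) {
        void* p = ::operator new(wanted * sizeof(T), std::nothrow);
        if (p) return { static_cast<T*>(p), wanted };
        wanted /= 2; // degrade gracefully instead of failing
    }
    return { nullptr, 0 }; // no memory at all: fully in-place algorithm
}
// The memory is raw and uninitialized; it is released later with
// ::operator delete(p). Usage: auto buf = acquire_buffer<T>(n / 2);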
> "inplace_merge" can also be adaptive, achieving O(N) if enough memory is
> available and ONlogN if no memory is is available.
>
>> I didn't see any implementation of stable sort algorithms without
>> additional memory in any library (GCC, TBB, IPP …). I suppose it is
>> because they are slow.
>>
>
> GCC also degrades to a no-memory sort. See the method "__inplace_stable_sort"
> in include/bits/stl_algo.h. It uses "__merge_without_buffer" to implement
> mergesort. GCC's inplace_merge is also adaptive, using
> "__merge_without_buffer" when no temporary buffer can be allocated.
>
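
(The rotation-based technique behind such a buffer-free stable merge is
roughly the following; my own sketch for random-access iterators, not the
GCC code:)

#include <algorithm>

// Stable merge of [first, middle) and [middle, last) with no buffer: cut
// the larger side in half, binary-search the cut value in the other side,
// rotate the two inner pieces together and recurse. No allocation, at the
// price of extra moves and comparisons.
template <class It>
void merge_without_buffer_sketch(It first, It middle, It last)
{
    if (first == middle || middle == last) return;
    if (last - first == 2) {
        if (*middle < *first) std::iter_swap(first, middle);
        return;
    }
    It cut1, cut2;
    if (middle - first > last - middle) {
        cut1 = first + (middle - first) / 2;
        cut2 = std::lower_bound(middle, last, *cut1);
    } else {
        cut2 = middle + (last - middle) / 2;
        cut1 = std::upper_bound(first, middle, *cut2);
    }
    It new_middle = std::rotate(cut1, middle, cut2); // bring the pieces together
    merge_without_buffer_sketch(first, cut1, new_middle);
    merge_without_buffer_sketch(new_middle, cut2, last);
}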
> This "merge without buffer" is slower, no doubt. However, as I mentioned
> in other posts, there are some newer algorithms that claim similar
> performance to stable_sort without external memory. As a sort expert you
> might find them interesting, the implementation might be quite complicated:
>
> GrailSort: https://github.com/Mrrl/GrailSort, based on
> http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf
>
> Wikisort: https://github.com/BonzaiThePenguin/WikiSort. Based on paper
> https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf
>
> I don't know how these algorithms work with non-integer types, but they
> are at least interesting.
>
>> If you have 100 elements and want to insert 30 elements, you need to resize
>> the vector to 130 elements; the last 30 are empty.
>>
>> If you have another vector with the data to insert, you can use
>> smart_merge_sort, using the last 15 elements of the main vector as the
>> auxiliary memory. After sorting, delete the repeated elements. Then, with
>> the two vectors, you do a half_merge fusion, and all is done.
>>
>
> Sure. But I have no guarantee that the input data is in another vector; the
> container just receives a range of iterators that I can't modify, only
> read. So I should:
>
> -> Allocate a new buffer to store 130 elements
> -> Copy new elements to the back of the buffer (last 30 positions)
> -> Sort them using 15 elements of additional storage from that buffer.
> -> Merge the old buffer and the newly sorted elements in the first
> positions of the new buffer.
> -> Destroy last 30 positions of the new buffer.
>
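
(In code, those steps would be roughly the following; a sketch with
std::vector and the standard algorithms standing in for the library routines,
assuming at least forward iterators. The function name is only illustrative:)

#include <algorithm>
#include <iterator>
#include <vector>

// Sketch of the steps above: old_data holds the N already sorted, unique
// elements; [first, last) is the read-only input range of M elements.
// Everything is rebuilt into one freshly allocated buffer of N + M elements.
template <class T, class It>
std::vector<T> rebuild_with_insert(const std::vector<T>& old_data, It first, It last)
{
    std::vector<T> buf;
    buf.reserve(old_data.size() + std::distance(first, last)); // one N + M allocation
    buf.assign(old_data.begin(), old_data.end());              // old sorted elements
    buf.insert(buf.end(), first, last);                        // new elements at the back
    auto mid = buf.begin() + old_data.size();
    std::stable_sort(mid, buf.end());                // sort only the M new elements
    std::inplace_merge(buf.begin(), mid, buf.end()); // stable: old before equal new
    buf.erase(std::unique(buf.begin(), buf.end()), buf.end()); // keep first occurrence
    return buf;
}

This sketch copies the old elements first and lets std::inplace_merge do the
work, so it only shows the data flow; the steps above would merge directly
from the old buffer into the first positions of the new one, avoiding that
extra pass.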
> The problem is that we don't want to reallocate on every range insertion,
> only when the capacity is filled. The extra memory is not small when the
> input is bigger than the stored element count. E.g.: you have a vector of 30
> elements and the range to insert has 10000 elements; I'd need to allocate a
> buffer of 15000 elements, an extra 5000 elements only to do the initial
> sort. Obviously, your N/2 is much better than N.
>
> The ideal (not from a performance point of view, but from a "balanced"
> point of view) would be:
>
> Consider:
>
> N = size() //current flat_xxx size
> M = std::distance(first, last) //range to insert
> T = M+N
>
> If capacity() >= N + M (vector is NOT reallocated)
> ---------------------------------------------------
>
> -> No new memory is allocated
>
> -> new elements are copied at the end of the already inserted elements.
>
> -> stable_sort the new elements, passing the remaining capacity of the
> vector as a temporary buffer. stable_sort uses the extra capacity in some
> steps with O(M log M) complexity. In the last steps, when not enough
> memory is available, it uses a slower algorithm (the STL allows
> O(M(log M)^2)). If the remaining capacity is >= M/2, mergesort will always
> be optimal. If GrailSort's guarantees are true, O(M log M) could always be
> obtained if stable_sort degraded to GrailSort when no memory is available.
>
> -> A stable "inplace_merge" is used to merge the old elements (already
> sorted) with the newly sorted elements. Extra capacity is used in some
> steps. Complexity: O(T log T), or O(T) if enough free memory (T) is available.
>
> The whole process's complexity ranges (depending on the available extra
> capacity) from:
>
> -> Worst case: O(M(log M)^2) + O(T log T)
> -> Best case: O(T) + O(M log M)
>
> If capacity() < N + M
> ---------------------------------------------------
> A new buffer is allocated; new capacity: std::max(N, M/2) + M
> -> New elements are copied to the new buffer
> -> stable_sort is called using the extra memory of the new buffer.
> Complexity: best case O(M log M).
> -> A merge is performed with the old elements. Complexity: O(T)
>
> Total complexity: O(T) + O(M log M)
>
>
> Best,
>
> Ion
>

