
Subject: Re: [boost] Fwd: [sort][container] Challenge for sorting experts, an improved stable sort
From: Francisco José Tapia (fjtapia_at_[hidden])
Date: 2015-09-23 15:53:12


The problem of auxiliary memory is the inverse of the problem of speed: the less auxiliary memory you use, the slower the algorithm.

I didn't know about GrailSort and WikiSort. I will examine them when I finish
the new parallel_sort algorithm and, if you are interested, I will give you my
opinion.

Some time ago I developed a stable sort algorithm with a small auxiliary
memory. You can define the size of this auxiliary memory, and it runs with any
size: small size, small speed. With a size of 10% of the memory used by the
data, as I remember, it was around 30% slower than the fastest implementation.

If the Boost community finds it interesting, we can try to develop sorting and
merging algorithms with small auxiliary memory, or with none at all.

Another low-memory solution is the indirect sort. You create a pointer to each
element of the input data and sort the pointers. The elements of the auxiliary
memory used in the stable sort are then pointers, which are usually smaller
than the elements to sort. Using the smart_merge_sort you need only half of that.
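
For illustration only, the idea looks like this with the standard library
(just a sketch of the concept with names of my own, not the code of the library):

#include <algorithm>
#include <cstddef>
#include <vector>

// Indirect stable sort: sort pointers to the elements instead of the
// elements themselves. The auxiliary memory of the stable sort then holds
// pointers, which are usually smaller than the elements being sorted.
template <class T, class Compare>
std::vector<const T*> indirect_stable_sort(const std::vector<T>& data, Compare comp)
{
    std::vector<const T*> ptr(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        ptr[i] = &data[i];
    std::stable_sort(ptr.begin(), ptr.end(),
                     [&comp](const T* a, const T* b) { return comp(*a, *b); });
    return ptr; // the data is untouched; the pointers give the sorted order
}

A complete indirect sort would afterwards rearrange the data following the
sorted pointers; I omit that part here.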

You have the complete code of the indirect sort in the repository
(https://github.com/fjtapia/sort_parallel).

If you have any problem or question, contact me.

2015-09-23 16:10 GMT+02:00 Ion Gaztañaga <igaztanaga_at_[hidden]>:

> On 22/09/2015 6:58, Francisco José Tapia wrote:
>
>> Hi Ion,
>>
>> If the elements in the flat associative container are sorted, you need
>> only to sort the elements to insert and afterwards do a fusion (merge) of
>> the two parts.
>>
>
> Yes, I understand that merge has better complexity guarantees than sort;
> since the already stored elements are sorted, it's better to sort the new
> elements and merge.
>
>> If your container admits repeated elements, you can use std::sort; if
>> not, you must use a stable sort and delete the repeated elements.
>>
>
> I don't think I could use std::sort as it is unstable; if there are repeated
> elements, I should only keep the first occurrence, so I need stability
> while sorting. In any case, I can't use the standard algorithms as I'd like
> some traits support to customize basic operations (swap, move,
> uninitialized_move) in order to use Boost.Move or future destructive move
> operations.
>
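
(A small illustration of that point, using only the standard algorithms; the
data is invented just to show the behaviour:)

#include <algorithm>
#include <utility>
#include <vector>

// Why stability matters for "keep the first occurrence": with a stable sort
// equal keys keep their input order, so std::unique keeps the element that
// appeared first; with the unstable std::sort the survivor is unspecified.
int main()
{
    typedef std::pair<int, char> elem;
    std::vector<elem> v = { {2,'a'}, {1,'b'}, {2,'c'}, {1,'d'} };

    auto by_key   = [](const elem& x, const elem& y) { return x.first < y.first; };
    auto same_key = [](const elem& x, const elem& y) { return x.first == y.first; };

    std::stable_sort(v.begin(), v.end(), by_key);
    v.erase(std::unique(v.begin(), v.end(), same_key), v.end());
    // v is now { {1,'b'}, {2,'a'} }: 'b' and 'a' were the first occurrences
}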
>> You have several algorithms to do a stable sort. In the library we are
>> developing (https://github.com/fjtapia/sort_parallel) you have
>> smart_merge_sort, which is a very fast stable sort that uses only one half
>> of the memory used by the data, is exception safe and uses move semantics.
>>
>
> Very nice! Is the N/2 memory requirement achieved with techniques similar
> to those described here?
>
> "Fast mergesort implementation based on half-copying merge algorithm"
> http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf.
>
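
(For reference, the N/2 bound can be reached with something like the
following. This is only my illustrative sketch for random-access iterators,
not smart_merge_sort and not the exact half-copying variant of that paper:)

#include <algorithm>
#include <vector>

// Stable merge sort using an external buffer of only N/2 elements: each
// merge step moves the first half into the buffer and then merges the
// buffer with the second half back into the original range. The output
// position never overtakes the unread part of the second half, so merging
// in place like this is safe.
template <class It, class Buf>
void half_buffer_merge_sort(It first, It last, Buf buffer)
{
    const auto len = last - first;
    if (len < 2) return;
    It mid = first + len / 2;
    half_buffer_merge_sort(first, mid, buffer);
    half_buffer_merge_sort(mid, last, buffer);

    Buf b_first = buffer;
    Buf b_last  = std::move(first, mid, buffer); // move the first half out
    It out = first;
    It s   = mid;
    while (b_first != b_last && s != last)
        *out++ = (*s < *b_first) ? std::move(*s++) : std::move(*b_first++);
    std::move(b_first, b_last, out); // whatever is left in the buffer
    // anything left in [s, last) is already in its final position
}

// usage: std::vector<int> v = { ... };
//        std::vector<int> buf(v.size() / 2);
//        half_buffer_merge_sort(v.begin(), v.end(), buf.begin());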
> Just some comments on your implementation (I'm not a sorting expert,
> sorry if I'm mistaken), mainly related to "adaptability":
>
> Your implementation is not adaptive in the sense that a failure to
> allocate the N/2 of external memory triggers an exception. In contrast,
> many std::stable_sort implementations try to allocate N/4, N/8, etc. and
> gracefully degrade to an O(N(log N)^2) sort for the sorting steps that
> don't have enough external memory. E.g.: they use the available memory to
> apply mergesort in the "low level" phases (merging small ranges that fit
> in the external memory); only the final merge steps are done with the
> slower in-place merge. This can be seen in the SGI implementation:
>
> https://www.sgi.com/tech/stl/stl_algo.h
>
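
(The degradation policy described above could look roughly like this; a
hypothetical sketch, not the actual code of any STL implementation:)

#include <cstddef>
#include <new>
#include <utility>

// Ask for a scratch buffer of 'wanted' elements; on failure keep halving
// the request instead of throwing. The sort then uses whatever it got for
// the small merges and falls back to the slower in-place merge for the
// steps that do not fit in the buffer.
template <class T>
std::pair<T*, std::size_t> acquire_buffer(std::size_t wanted)
{
    while (wanted > 0) {
        void* p = ::operator new(wanted * sizeof(T), std::nothrow);
        if (p) return { static_cast<T*>(p), wanted };
        wanted /= 2; // degrade gracefully instead of failing
    }
    return { nullptr, 0 }; // no memory at all: fully in-place algorithm
}
// The memory is raw and uninitialized; it is released later with
// ::operator delete(p). Usage: auto buf = acquire_buffer<T>(n / 2);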
> "inplace_merge" can also be adaptive, achieving O(N) if enough memory is
> available and ONlogN if no memory is is available.
>
>> I didn't see any implementation of stable sort algorithms without
>> additional memory in any library (GCC, TBB, IPP …). I suppose it is
>> because they are slow.
>>
>
> GCC also degrades to a no-memory sort. See the method "__inplace_stable_sort"
> in include/bits/stl_algo.h. It uses "__merge_without_buffer" to implement
> mergesort. GCC's inplace_merge is also adaptive, using
> "__merge_without_buffer" when no temporary buffer can be allocated.
>
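
(The rotation-based technique behind such a buffer-free stable merge is
roughly the following; my own sketch for random-access iterators, not the
GCC code:)

#include <algorithm>

// Stable merge of [first, middle) and [middle, last) with no buffer: cut
// the larger side in half, binary-search the cut value in the other side,
// rotate the two inner pieces together and recurse. No allocation, at the
// price of extra moves and comparisons.
template <class It>
void merge_without_buffer_sketch(It first, It middle, It last)
{
    if (first == middle || middle == last) return;
    if (last - first == 2) {
        if (*middle < *first) std::iter_swap(first, middle);
        return;
    }
    It cut1, cut2;
    if (middle - first > last - middle) {
        cut1 = first + (middle - first) / 2;
        cut2 = std::lower_bound(middle, last, *cut1);
    } else {
        cut2 = middle + (last - middle) / 2;
        cut1 = std::upper_bound(first, middle, *cut2);
    }
    It new_middle = std::rotate(cut1, middle, cut2); // bring the pieces together
    merge_without_buffer_sketch(first, cut1, new_middle);
    merge_without_buffer_sketch(new_middle, cut2, last);
}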
> This "merge without buffer" is slower, no doubt. However, as I mentioned
> in other posts, there are some newer algorithms that claim similar
> performance to stable_sort without external memory. As a sort expert you
> might find them interesting, the implementation might be quite complicated:
>
> GrailSort: https://github.com/Mrrl/GrailSort, based on
> http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf
>
> Wikisort: https://github.com/BonzaiThePenguin/WikiSort. Based on paper
> https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf
>
> I don't know how these algorithms work with non-integer types, but they
> are at least interesting.
>
>> If you have 100 elements and want to insert 30 elements, you need to resize
>> the vector to 130 elements; the last 30 are empty.
>>
>> If you have another vector with the data to insert, you can use
>> smart_merge_sort, using the last 15 elements of the main vector as the
>> auxiliary memory. After sorting, delete the repeated elements. Then, with
>> the two vectors, you do a half_merge fusion, and all is done.
>>
>
> Sure. But I have no guarantee that the input data is in another vector; the
> container just receives a range of iterators that I can't modify, only
> read. So I should:
>
> -> Allocate a new buffer to store 130 elements
> -> Copy new elements to the back of the buffer (last 30 positions)
> -> Sort them using 15 elements of additional storage from that buffer.
> -> Merge the old buffer and the newly sorted elements in the first
> positions of the new buffer.
> -> Destroy last 30 positions of the new buffer.
>
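
(In code, those steps would be roughly the following; a sketch with
std::vector and the standard algorithms standing in for the library routines,
assuming at least forward iterators. The function name is only illustrative:)

#include <algorithm>
#include <iterator>
#include <vector>

// Sketch of the steps above: old_data holds the N already sorted, unique
// elements; [first, last) is the read-only input range of M elements.
// Everything is rebuilt into one freshly allocated buffer of N + M elements.
template <class T, class It>
std::vector<T> rebuild_with_insert(const std::vector<T>& old_data, It first, It last)
{
    std::vector<T> buf;
    buf.reserve(old_data.size() + std::distance(first, last)); // one N + M allocation
    buf.assign(old_data.begin(), old_data.end());              // old sorted elements
    buf.insert(buf.end(), first, last);                        // new elements at the back
    auto mid = buf.begin() + old_data.size();
    std::stable_sort(mid, buf.end());                // sort only the M new elements
    std::inplace_merge(buf.begin(), mid, buf.end()); // stable: old before equal new
    buf.erase(std::unique(buf.begin(), buf.end()), buf.end()); // keep first occurrence
    return buf;
}

This sketch copies the old elements first and lets std::inplace_merge do the
work, so it only shows the data flow; the steps above would merge directly
from the old buffer into the first positions of the new one, avoiding that
extra pass.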
> The problem is that we don't want to reallocate on every range insertion,
> only when the capacity is filled. The extra memory is not small when the
> input is bigger than the stored element count. E.g.: you have a vector of 30
> elements and the range to insert has 10000 elements; I'd need to allocate a
> buffer of 15000 elements, an extra 5000 elements only to do the initial
> sort. Obviously, your N/2 is much better than N.
>
> The ideal (not from a performance point of view, but from a "balanced"
> point of view) would be:
>
> Consider:
>
> N = size() //current flat_xxx size
> M = std::distance(first, last) //range to insert
> T = M+N
>
> If capacity() >= N + M (vector is NOT reallocated)
> ---------------------------------------------------
>
> -> No new memory is allocated
>
> -> new elements are copied at the end of the already inserted elements.
>
> -> stable_sort the new elements, passing the remaining capacity of the
> vector as a temporary buffer. stable_sort uses the extra capacity in some
> steps with O(M log M) complexity. In the last steps, when not enough
> memory is available, it uses a slower algorithm (the STL allows
> O(M(log M)^2)). If the remaining capacity is >= M/2, mergesort will always
> be optimal. If GrailSort's guarantees are true, O(M log M) could always be
> obtained if stable_sort degraded to GrailSort when no memory is available.
>
> -> A stable "inplace_merge" is used to merge the old elements (already
> sorted) with the newly sorted elements. Extra capacity is used in some
> steps. Complexity: O(T log T), or O(T) if enough free memory (T) is available.
>
> The whole process's complexity ranges (depending on the available extra
> capacity) from:
>
> -> Worst case: O(M(log M)^2) + O(T log T)
> -> Best case: O(T) + O(M log M)
>
> If capacity() < N + M
> ---------------------------------------------------
> A new buffer is allocated; new capacity: std::max(N, M/2) + M
> -> New elements are copied to the new buffer
> -> stable_sort is called using the extra memory of the new buffer.
> Complexity: best case O(M log M).
> -> A merge is performed with the old elements. Complexity: O(T)
>
> Total complexity: O(T) + O(M log M)
>
>
> Best,
>
> Ion
>

