Boost logo

Boost :

Subject: Re: [boost] Fwd: [sort][container] Challenge for sorting experts, an improved stable sort
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2015-09-23 10:10:15


On 22/09/2015 6:58, Francisco José Tapia wrote:
> Hi Jon,
>
> If the elements in the flat associative container are sorted, you need only
> to sort the elements to insert and after do a fusion of the two parts.

Yes, I understand that merge has better complexity guarantees than sort,
sincethe already stored elements are sorted, it's better to sort the new
elements and merge.

> If your container admit repeated elements, you can use the std::sort, if
> not, you must use an stable sort , and delete the repeated.

I don't think I could use std::sort as is unstable, if there are
repeated elements, I should only keep the first occurrence, so I need
stability while sorting. In any case, I can't use standard algorithms as
as I'd like some traits support to customize basic operations (swap,
move, uninitialized_move) in order to use Boost.Move or future
destructive move operations.

> You have several algorithms for to do a stable sort. In the library we are
> developing (https://github.com/fjtapia/sort_parallel ) you have a
> smart_merge_sort, which is a very fast stable sort and use only one half of
> the memory used by the data, is exception safe and use the move semantics

Very nice! Is the N/2 memory requirement achieved with techniques
similar to those described here?

"Fast mergesort implementation based on half-copying merge algorithm"
http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf.

Just some comments on your implementation, (I'm not a sorting expert,
sorry if I'm mistaken), mainly related to "adaptability":

Your implementation is not adaptive in the sense that a failure to
allocate N/2 external memory is triggered with an exception. In
contrast, many std::stable_sort implementations try to allocate
N/4,N/8,etc... and they gracefully degrade to a O((log(N)^2) sort for
sorting steps that have no enough external memory. Eg.: they use
available memory to speed up apply mergesort for the "low level" phases
(merging small ranges that could fit in the external memory). Only the
final merge steps are done O((log(N)^2). This can be seen in the SGI
implementation:

https://www.sgi.com/tech/stl/stl_algo.h

"inplace_merge" can also be adaptive, achieving O(N) if enough memory is
available and ONlogN if no memory is is available.

> I didn't see any implementation of the stable sort algorithms without
> additional memory, in any library GCC, TBB, IPP … I suppose because they
> are slow.

GCC also degrades to a no-memory sort. See method
"__inplace_stable_sort" in include/bits/stl_algo.h. It uses
"__merge_without_buffer" to implement mergesort. GCC's inplace_merge is
also adaptive, using "__merge_without_buffer" when

This "merge without buffer" is slower, no doubt. However, as I mentioned
in other posts, there are some newer algorithms that claim similar
performance to stable_sort without external memory. As a sort expert you
might find them interesting, the implementation might be quite complicated:

GrailSort: https://github.com/Mrrl/GrailSort, based on
http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf

Wikisort: https://github.com/BonzaiThePenguin/WikiSort. Based on paper
https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf

I don't know how these algorithms work with non-integer types, but they
are at least interesting.

> If you have 100 elements and want to insert 30 elements, need to resize the
> vector to 130 elements, the last 30 are empty.
>
> If you have other vector with the data to insert, you can use
> smart_merge_sort, and use as auxiliary memory the last 15 elements of the
> main vector. After sort, delete the repeated elements. And with the two
> vectors you must do a half_merge fusion , and all is done.

Sure. But I have no guarantee that input data is in another vector, the
container just receives just a range of iterators I can't modify, just
read. So I should:

-> allocate a new buffer to store 130 elements
-> Copy new elements to the back of the buffer (last 30 positions)
-> Sort them using 15 elements of additional storage from that buffer.
-> Merge the old buffer and the newly sorted elements in the first
positions of the new buffer.
-> Destroy last 30 positions of the new buffer.

The problem is that we don't want to reallocate in every range input,
only when capacity is filled. Extra memory is not small when the input
is bigger than the stored element count. E.g.: You have a vector of 30
elements and range to insert has 10000 elements, I'd need to allocate a
buffer of 15000 elements, extra 5000 elements only to do the initial
sort. Obviously, your N/2 is much better than N.

The ideal (not from a performance point of view, but from a "balanced"
point of view) would be:

Consider:

N = size() //current flat_xxx size
M = std::distance(first, last) //range to insert
T = M+N

If capacity() > N + M (vector is NOT reallocated)
---------------------------------------------------

-> No new memory is allocated

-> new elements are copied at the end of the already inserted elements.

-> stable_sort in new elements passing the remaining capacity of the
vector as a temporary buffer. stable_sort uses the extra capacity in
some steps with complexity O(MlogM) sort. In the last steps, when no
enough memory is available it uses a slower algorithm (the STL allows
O(M(logM)^2)). If remaining capacity is >= M/2 mergesort will always be
optimal. If GrailSort guarantees are true, O(MlogM) could be always
obtained if stable_sort degrades to grailsort when no memory is available.

-> A stable "inplace_merge" is used to merge the old elements (already
sorted) with the newly sorted elements). Extra capacity is used in some
steps. Complexity: T*log(T), O(T) if enough free memory (T) is available.

The whole process' complexity ranges (depending on the available extra
capacity) from:

-> Worst case: O(M(logM)^2) + T*log(T).
-> Best case: O(T) + O(MlogM)

If capacity() < N + M
---------------------------------------------------
A new buffer is allocated. new capacity: std::max(N,M/2) + M
-> new elements are copied to the new buffer
-> stable_sort is called using the extra memory of the new buffer.
Complexity: best: O(MlogM).
-> A merge is performed with the old elements. Complexity: O(T)

Complexity : O(T) + O(MlogM)

Best,

Ion


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk