|
Boost : |
Subject: [boost] [sort][container] Challenge for sorting experts, an improved stable sort
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2015-09-20 18:05:57
Hi,
I'm trying to improve insertion times for flat associative containers.
In range insertions one optimization is to insert the whole range in the
end of the container and sort the whole vector instead of trying a loop.
An additional optimization proposed in the mailing list is to have an
option to create externally the internal vector type of the flat
map/set, fill it and then try to move it into the flat container. That
requires sorting the moved vector (and removing duplicates if needed).
I could try to use std::sort/std::stable_sort but there are some problems:
-> Non-portable for C++03 compilers and Boost.Move, and Boost.Container
fully supports those platforms.
-> Additional O(N) temporary memory for stable sorting. Usually this
means a temporary N element array, although there are algorithm to use
only N/2 temporary elements. With big containers, this is a lot of memory.
So I started analyzing several papers and implementations to see if a
custom implementation was possible, but I felt Boost.Sort could be a
good place to have that implementation and the Boost community a good
place to discuss design alternatives.
My initial requirements:
1) Stable sort. A non-stable sort is faster and useful for unique keys
containers, but stable sort can be used initially in all flat containers
until the non-stable version is implemented.
2) Externally supplied auxiliary memory with optionally no auxiliary
memory at all. It seems that there are newer algorithms like Grailsort
(https://github.com/Mrrl/GrailSort) or WikiSort
(https://github.com/BonzaiThePenguin/WikiSort) that claim O(1) memory,
even no auxiliary buffer, and key comparison and data exchanges with
complexity O(n*log(n)). However I don't think those implementation
correctly handle the basic exception guarantee or non-trivially copyable
types.
Reason: I want to avoid allocating short-lived temporary memory: if it's
available I want to pass the extra capacity that flat containers'
internal vector holds for future insertions.
3) Customization points (without ADL, which IMHO is not customizable
enough). The sort algorithm should use a SortTraits template parameter
for basic operations like "move", "swap", "initialized_move". E.g:
template<class T>
struct sort_traits
{
static? void move(T &from, T &to);
static? void swap(T &l, T &r);
static? void uninitialized_move(T &from, void *to);
static? void destroy(T &from, void *to);
};
I don't know if this trait should be stateful or not (static members). I
could imagine the usefulness of a stateful implementation of sort_traits
that could store a reference to an allocator so that some special
"construct" version is used in some operations. Or just a version that
counts the number of each operation to ease debugging. In any case, a
stateless version would be enough for most cases.
Reason: In Boost.Container, the idea is to use boost::move +
boost::adl_move_swap in the initial implementation, maybe some
destructive move implementation in a second iteration if Boost.Move
implements it. In any case those operations should be customizable by
the user, independently from the type to be sorted. This also avoids
unnecessary dependencies in Boost.Sort.
-> Dependencies: the implementation should only rely on Boost.Core,
Boost.Config, maybe Boost.TypeTraits and no heavy STL dependencies
(optionally). Currently Boost.Sort depends on Serialization for
BOOST_STATIC_WARNING, but I think this is easily solvable, maybe this
static assert utility should go to the core/utility library.
Reason: Sort is a low-level algorithm that should avoid pulling unneeded
dependencies to other boost libraries.
* * *
I don't know if other Boost libraries could benefit from this new
stable_sort implementation. Maybe it's too special and I should try to
write it myself in Boost.Container.
I would love to hear comments and suggestions from sort experts, this
might be a common need that could enrich the Boost.Sort library.
Best,
Ion
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk