|
Boost : |
Subject: Re: [boost] [sort][container] Challenge for sorting experts, an improved stable sort
From: Steven Ross (spreadsort_at_[hidden])
Date: 2015-09-20 22:11:25
On Sun, Sep 20, 2015 at 6:09 PM Ion Gaztañaga <igaztanaga_at_[hidden]> wrote:
> Hi,
>
> I'm trying to improve insertion times for flat associative containers.
> In range insertions one optimization is to insert the whole range in the
> end of the container and sort the whole vector instead of trying a loop.
>
> I could try to use std::sort/std::stable_sort but there are some problems:
>
> -> Non-portable for C++03 compilers and Boost.Move, and Boost.Container
> fully supports those platforms.
>
Do you have example current users of your library who need this
portability? It would help me to understand an actual use case.
> So I started analyzing several papers and implementations to see if a
> custom implementation was possible, but I felt Boost.Sort could be a
> good place to have that implementation and the Boost community a good
> place to discuss design alternatives.
>
That's reasonable.
> My initial requirements:
>
> 1) Stable sort. A non-stable sort is faster and useful for unique keys
> containers, but stable sort can be used initially in all flat containers
> until the non-stable version is implemented.
>
> 2) Externally supplied auxiliary memory with optionally no auxiliary
> memory at all. It seems that there are newer algorithms like Grailsort
> (https://github.com/Mrrl/GrailSort) or WikiSort
> (https://github.com/BonzaiThePenguin/WikiSort) that claim O(1) memory,
> even no auxiliary buffer, and key comparison and data exchanges with
> complexity O(n*log(n)). However I don't think those implementation
> correctly handle the basic exception guarantee or non-trivially copyable
> types.
>
> Reason: I want to avoid allocating short-lived temporary memory: if it's
> available I want to pass the extra capacity that flat containers'
> internal vector holds for future insertions.
>
Please figure out your full set of requirements, state them, and then list
algorithms that fulfill them (if any). Spreadsort and other radix-based
sorts are capable of being stable if you give them enough RAM and structure
the computation properly, and are quite fast properly used. Would it be ok
to have a full copy of pointers to the elements?
>
> 3) Customization points (without ADL, which IMHO is not customizable
> enough). The sort algorithm should use a SortTraits template parameter
> for basic operations like "move", "swap", "initialized_move". E.g:
>
> template<class T>
> struct sort_traits
> {
> static? void move(T &from, T &to);
> static? void swap(T &l, T &r);
> static? void uninitialized_move(T &from, void *to);
> static? void destroy(T &from, void *to);
> };
>
> I don't know if this trait should be stateful or not (static members). I
> could imagine the usefulness of a stateful implementation of sort_traits
> that could store a reference to an allocator so that some special
> "construct" version is used in some operations. Or just a version that
> counts the number of each operation to ease debugging. In any case, a
> stateless version would be enough for most cases.
>
My inclination is to avoid adding options for users unless they are
absolutely necessary, but if you're confident you'll need this flexibility
in the future it seems reasonable.
>
> Reason: In Boost.Container, the idea is to use boost::move +
> boost::adl_move_swap in the initial implementation, maybe some
> destructive move implementation in a second iteration if Boost.Move
> implements it. In any case those operations should be customizable by
> the user, independently from the type to be sorted. This also avoids
> unnecessary dependencies in Boost.Sort.
>
> -> Dependencies: the implementation should only rely on Boost.Core,
> Boost.Config, maybe Boost.TypeTraits and no heavy STL dependencies
> (optionally). Currently Boost.Sort depends on Serialization for
> BOOST_STATIC_WARNING, but I think this is easily solvable, maybe this
> static assert utility should go to the core/utility library.
>
> Reason: Sort is a low-level algorithm that should avoid pulling unneeded
> dependencies to other boost libraries.
>
If you're willing to shift the critical dependencies of Boost.Sort into a
smaller library so it doesn't include the world, I'd love that.
>
> * * *
>
> I don't know if other Boost libraries could benefit from this new
> stable_sort implementation. Maybe it's too special and I should try to
> write it myself in Boost.Container.
>
> I would love to hear comments and suggestions from sort experts, this
> might be a common need that could enrich the Boost.Sort library.
>
> Have you talked with Francisco? He has a promising stable sort that we
were getting close to adding, but he hasn't fixed his test class yet to
eliminate some weirdness in it I've pointed out.
Note: I have very little bandwidth until late December to work on anything,
as I'm on a remote assignment.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk