Boost logo

Boost :

From: Marshall Clow (mclow.lists_at_[hidden])
Date: 2023-06-15 03:39:48


On Jun 14, 2023, at 7:25 PM, Gavin Lambert via Boost <boost_at_[hidden]> wrote:
>
> On 15/06/2023 13:42, Marshall Clow wrote:
>> At Boostcon last month, there was a discussion of indirect sorting algorithms to boost::algorithm.
>> An “Indirect sort” (called 'argsort' in numpy - https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) returns a sequence of indices into the collection describing how the collection can be rearranged in order to become sorted.
>> So I created https://github.com/boostorg/algorithm/pull/117 to start down this road.
>> (There’s clearly more that could be done here - stable_sort, partial_sort, nth_element, etc)
>> Comments welcomed.
>
> It invites an interesting question. A sequence of indexes is only particularly useful for a random-access collection.
>
> If it instead generated and then outputed a sorted sequence of iterators, it could potentially support other input collection types (such as std::list, or something else with forward-only iterators) as well.

Yes, but sorting is (in general) something we only do on random-access sequences.
But I get your point - if we were to return a sequence of iterators, we could sort non-RA sequences (at the cost of a single extra traversal at the beginning of the sort)

> But the utility of this depends on the subsequent usage. A collection of iterators is better than a collection of indexes (and frequently the same size, when not using checked iterators) if your purpose is to then iterate the collection in sorted order, or otherwise perform read-only operations.

For random-access collections, they’re essentially the same.

> A collection of iterators is less safe if you intend to perform mutations on the underlying list (such as trying to actually realize the sorted order), as this will sometimes invalidate them depending on the underlying collection type and the specific operations used. (Indexes can also be rendered similarly useless, but are somewhat easier to perform equivalent operations to keep in sync with less performance issues than trying to do the same with non-random-access iterators.)

— Marshall


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk