Subject: Re: [boost] [review][sort] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 19
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-11-16 18:19:36
On 15 Nov 2014 at 21:29, Steven Ross wrote:
> > Another example of what I mean by sorted_view. Hopefully it makes
> > sense now.
> Thanks for the explanation. Have you considered partial_sort? That
> provides the separation of the sort runtime overhead you're requesting (as
> do heaps).
They're not quite commensurate. A bit like a string_view, a
sorted_view lets you write code as if a full sort had taken place,
it's just it's delayed until point of need. A partial sort requires
> With regards to sorting data that won't fit in main memory,
> there are two standard approaches:
> k-way merge, and radix-based division of the problem. Radix-based division
> is usually faster (merging is simply concatenation), but k-way merge
> guarantees that it will split up the data small enough to fit in RAM in one
> pass (radix is usually 1, and can guarantee no more than 2 passes). Radix
> based division is compatible with the kind of piece at a time sorting you
> mention. Both approaches are efficient ways to sort in parallel and/or
> data on disk. I've written code for both in the past.
> Disk-based sorting requires writing and managing a bunch of temp files, for
> which the optimal approach may vary across OSes. Most people who want a
> scalable way to sort data that is too large to fit in a single system's
> memory are probably best off using something like MapReduce. I could write
> a radix-based division/map function to split up data into manageable chunks
> as you're requesting, but handling the output streams efficiently is more
> complicated than the trivial bucketing the function would provide. Have
> you considered just bucketing your data based on the top X bits (say 8-16),
> which allows you to finish the sort later for each bucket as needed?
I have no present need to sort out of memory items, nor expect to any
time soon. My need for a kernel memory allocator is actually to
enable true zero copy network packet handling whereby incoming and
outgoing UDP packets are never ever copied, not once, except where
other people's code gets in the way. We essentially need UDP packets
to be assembled into a form we can dispatch to the GPU via OpenCL and
process there with hashing and decryption/encryption before sending
data straight to the filing system. Local process code need never
actually touch any of the data moving between disk and the NIC, it
all remains either in the GPU or in the kernel.
> You are correct about the brute force approach with forward iterators: that
> is a shallow copy. Sorting data that won't fit in memory normally requires
> a deep copy.
If you can SHA512 the serialisation of an item, you can thereafter
work with 512 bit integers and not need to touch the original. With
AVX2, 512 bit integers are even a single register now, amazing
> I'm willing to code up the radix bucketing with a guaranteed maximum of 2
> bucketing passes, or the k-way merge step, but they can't wrap the entire
> functionality without adding a bunch of file operations, which can be
> tricky to port efficiently. They'd also need a bunch of parameters about
> things like maximum RAM to use, number of files, maximum numbers of files
> to have open simultaneously, etc.
There is a library in the Boost review queue which implements
portable asynchronous file i/o. I may have had something to do with
> I've added some basic documentation on exception safety along these lines,
> just like std::sort has, in the develop branch.
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk