Subject: Re: [boost] [sort] Re: [review] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 1
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-11-10 13:02:06


On 10 Nov 2014 at 8:18, Steven Ross wrote:

> > I want to see scaling benchmarks comparing this implementation with
> > the traditional STL containers for 0 to 10,000 items, for both single
> > threaded and concurrent usage (even if that is simply a spinlock
> > around the library).
> >
> > If these scaling benchmarks are not provided, I automatically vote
> > no.
>
> Niall, I'm not sure precisely what you're asking for, and would like
> clarification:
>
> Are you asking me to sort 0, 1, 2, 4, ... 2^14 elements, input in this form:
> vector<int>
> vector<float>
> vector<string>

I'd be happy with int and string only personally. Float is nice
though.

> And compare the speeds of std::sort vs integer_sort, float_sort, and
> string_sort, both sequentially, and doing multiple separate sorts in
> parallel?

Sounds good. You'd get a graph of CPU cycle scaling against N looking
like:

http://www.nedprod.com/programs/portable/nedtries/images/LogLogBitwiseTreesScaling.png

http://www.nedprod.com/programs/portable/nedtries/images/LogLogRedBlackTreesScaling.png

http://www.nedprod.com/programs/portable/nedtries/images/LogLogHashTableScaling.png

For your situation you just sort, so a combined graph comparing
against other algorithms makes much more sense.
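
Something like the rough sketch below is what I mean. It's only a
sketch: the commented-out include and the integer_sort call are my
guesses at your library's interface, so treat them as placeholders and
substitute the real names. Plot N against nanoseconds on a log-log
scale and you get graphs like the ones above.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>
// #include <boost/sort/spreadsort/spreadsort.hpp> // placeholder: use the real header

// Time one sort of a private copy of the data, in nanoseconds.
template <typename Sorter>
double time_sort(std::vector<int> data, Sorter sorter)
{
  auto begin = std::chrono::high_resolution_clock::now();
  sorter(data);
  auto end = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::nano>(end - begin).count();
}

int main()
{
  std::mt19937 gen(12345);
  std::uniform_int_distribution<int> dist;
  // Sizes 0, 1, 2, 4, ... 2^14.
  for (std::size_t n = 0; n <= (std::size_t(1) << 14); n = n ? n * 2 : 1)
  {
    std::vector<int> data(n);
    for (auto &v : data)
      v = dist(gen);
    double std_ns = time_sort(data, [](std::vector<int> &v) {
      std::sort(v.begin(), v.end());
    });
    // double lib_ns = time_sort(data, [](std::vector<int> &v) {
    //   integer_sort(v.begin(), v.end()); // placeholder for the library call
    // });
    std::printf("%zu\t%.0f\n", n, std_ns);
  }
}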

Regarding parallel sort, you don't necessarily need to sort the same
dataset. Rather, many parallel sorts show how well an algorithm
coexists with other work on the same CPU. As an example of what I
mean, a single thread doing red-black tree modification is fast, but
two threads doing red-black tree modification see a dramatic slowdown
if their cache lines touch. That makes std::map extremely unsuited to
multi-threaded use, even when protected by a spinlock.
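
Concretely, for the parallel case I mean nothing cleverer than the
sketch below: N threads each sorting their own private vector, so any
loss of throughput relative to one thread comes from how well the
algorithm coexists with itself (cache and memory bandwidth), not from
shared data. The thread count and per-thread size are arbitrary.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

int main()
{
  const std::size_t n = std::size_t(1) << 14; // elements per thread (arbitrary)
  const unsigned nthreads = 4;                // arbitrary thread count

  std::mt19937 gen(42);
  std::uniform_int_distribution<int> dist;
  std::vector<std::vector<int>> datasets(nthreads, std::vector<int>(n));
  for (auto &d : datasets)
    for (auto &v : d)
      v = dist(gen);

  auto begin = std::chrono::high_resolution_clock::now();
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nthreads; ++t)
    pool.emplace_back([&datasets, t] {
      // Each thread sorts only its own dataset; swap in the library's
      // sorter here to compare.
      std::sort(datasets[t].begin(), datasets[t].end());
    });
  for (auto &th : pool)
    th.join();
  auto end = std::chrono::high_resolution_clock::now();

  std::printf("%u threads: %.2f ms total\n", nthreads,
              std::chrono::duration<double, std::milli>(end - begin).count());
}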

Finally, and most importantly, these sorts of scaling benchmarks are
proof that you've thought about this sort of stuff. That stamps a
great big fat mark of quality on your library. Anyone can claim
anything about some bit of code of theirs. Not everyone can
empirically demonstrate the truth of those claims in a way that
others can easily replicate for themselves.

> What distribution do you want this to sort? Evenly distributed
> random, the variety of distributions (including evenly distributed
> random) used in tune.pl, or something else?

I think I remember seeing that your algorithm makes good use of
structure in the input? If so, totally random, partially sorted, and
nearly sorted are all good distributions.
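
By those I mean generators along these lines (the run length and the
fraction of perturbed elements are arbitrary choices of mine):

#include <algorithm>
#include <random>
#include <vector>

// Totally random integers.
std::vector<int> totally_random(std::size_t n, std::mt19937 &gen)
{
  std::uniform_int_distribution<int> dist;
  std::vector<int> v(n);
  for (auto &x : v)
    x = dist(gen);
  return v;
}

// Partially sorted: random data sorted in independent runs.
std::vector<int> partially_sorted(std::size_t n, std::mt19937 &gen)
{
  std::vector<int> v = totally_random(n, gen);
  const std::size_t run = 256; // arbitrary run length
  for (std::size_t i = 0; i < n; i += run)
    std::sort(v.begin() + i, v.begin() + std::min(i + run, n));
  return v;
}

// Nearly sorted: fully sorted data with ~1% of positions swapped.
std::vector<int> nearly_sorted(std::size_t n, std::mt19937 &gen)
{
  std::vector<int> v = totally_random(n, gen);
  std::sort(v.begin(), v.end());
  std::uniform_int_distribution<std::size_t> pos(0, n ? n - 1 : 0);
  for (std::size_t i = 0; i < n / 100; ++i)
    std::swap(v[pos(gen)], v[pos(gen)]);
  return v;
}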

> I'd like to note that if the input is <3000 elements, this library
> immediately falls back to std::sort, as that is the approximate
> crossover point at which hybrid radix sorting becomes faster.
> Differences for arrays smaller than that are likely to be due to the
> overhead of the size check + the increase in executable size, which is
> likely to be difficult to separate out from noise.

Now that is a vital piece of new information. I'll be honest in
saying that I am unsure how useful an algorithm that only kicks in
above 3000 elements can be.

Maybe if you implement a no-alloc core loop like Antony suggested you
could significantly improve on this? > 300 would be lovely.

I also have to admit a curiosity. My non-allocating bitwise tries
library nedtries could be viewed as a way of very quickly nearly
sorting items, even though I wrote it with the intention of it being
an indexing algorithm rather than a sorting algorithm. An interesting
side effect of the non-allocating in-place algorithm is that it is
highly amenable to being multithreaded, i.e. you can fire N threads
at it and get ~N speedup if your keys are very random.

Reading back that index would give you an almost sorted array, one
upon which bubble sort would surely perform excellently. As you are
proposing a full-on Boost.Sort library rather than just a single sort
algorithm, I thought I should raise this perhaps even better
performing sort algorithm, which can make excellent use of multiple
cores, something that is always hard to do for sort algorithms.
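
To be clear about the finishing pass I have in mind (nothing to do
with your library's internals, just an illustration): an early-exit
bubble sort stops as soon as a sweep makes no swaps, so it is cheap
when elements are only slightly out of place, and reading the trie
back and running something like this gives a full sort:

#include <utility>
#include <vector>

// Finish an almost sorted array: bubble sort with an early-exit flag
// stops after the first sweep that makes no swaps, so the cost stays
// low when few elements are displaced, and only slightly.
void bubble_finish(std::vector<int> &v)
{
  bool swapped = true;
  while (swapped)
  {
    swapped = false;
    for (std::size_t i = 1; i < v.size(); ++i)
    {
      if (v[i - 1] > v[i])
      {
        std::swap(v[i - 1], v[i]);
        swapped = true;
      }
    }
  }
}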

I sadly have no need for such algorithms in my present work, and
therefore can't allocate any time of my own on this sort of stuff. A
shame.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/


