|
Boost : |
Subject: Re: [boost] [sort] Re: [review] Formal review period for Sort library begins today, November 10, and ends Wednesday, November 19
From: Steven Ross (spreadsort_at_[hidden])
Date: 2014-11-10 06:46:43
>> - What is your evaluation of the potential usefulness of the
>> library?
>
>
> I am not sure the library is useful. For very few users, sorting 100kb
> arrays of integers is a performance
> bottleneck. And if it's truly a performance bottleneck, one would probably
> ponder cache architecture, or use of GPU, or
> a zillion other tricks. Like this paper from 2010 appears to do, for
> example:
>
> http://dl.acm.org/citation.cfm?doid=1807167.1807207
With regards to practicality: I applied the string_sort algorithm to
bzip and saw a 40% reduction in total compression time. Would you
like to see the source for that?
There are problems where people sort billions of strings, integers,
floats, or things that can be sorted like them, such as processing
integrated circuit designs, where time is money.
One quote from that article you cite: "Merge sort performs better than
radix sort for sorting keys of large sizes - such keys will be
required to accommodate the increasing cardinality of future
databases"
Merge sort is fundamentally less efficient than a smart hybrid-radix
approach when CPU sorting long keys because it has to compare the
entire length of the key up to the point of difference multiple times,
where the string_sort in this library only passes through each
character in the key once (or twice depending on how you count) during
the radix sorting portion, and then only compares what hasn't already
been passed through in the final comparison-sorting stage once the
problem has been cut down to less than a specific constant size.
Do you happen to have a copy of the full article? I'm happy to apply
any helpful concept to improve the library.
>> And finally, every review should attempt to answer this question:
>>
>> - Do you think the library should be accepted as a Boost library?
>
>
> I think the library should not be accepted. There's no reason to give Boost
> approval to implementation of algorithm that
> is neither accepted classic nor an obvious breakthrough.
With regards to classic algorithms, this library provides an
implementation similar to Adaptive Left Radix for integer_sort, which
came after it but was more widely available, with the addition of
better worst-case performance. It also provides an adaptation of
American Flag Sort (a classic algorithm) to C++ strings and with
better worst-case performance guarantees.
Did you read the overview section of the documentation that came with
the library?
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk