Boost logo

Boost :

Subject: Re: [boost] [review] Formal review period for Sort library continuing through November 19
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2014-11-20 13:03:53


Steven Ross wrote:

>> How is Get_char
>> supposed to ensure all by itself that all first characters of all book
>> titles are considered in the same radix pass, i.e. the one after the last
>> character of the longest author name?
>>
> By alternating between returning payload radix characters and a "does this
> field continue" byte. A 0 will specify an end of field, and sort lower in
> the order than a 1, thus making all shorter names sort lower than names
> that are equivalent to the same length but longer.
> Thus, Author: Jon Title: bat will have GetChar output these values:
> 1,'J',1,'o',1,'n',0 ,1,'b',1,'a',1,'t',0
>
> Author: Jonathan Title:(empty), will output these values:
> 1,'J',1,'o',1,'n',1,'a',1,'t',1,'h',1,'a',1,'n',0, 0
>
> The 0 vs 1 will cause Jon to appear before Jonathan in order, because it is
> shorter. Actually, the 0s and 1s aren't necessary for the last subkey (the
> title), but I included them to show this approach.
>
> There is also a simple GetLength functor that lets string_sort know when
> all characters are exhausted.
>
>>
>> By contrast, it is very clear how refining the comparison function would
>> be sufficient for a comparison sort. Just compare by author; in case of a
>> tie, compare by title.
>>
>
> That's exactly what the Get_char method I explained is doing, but it's less
> intuitive. The radix equivalent requires encoding the same information the
> comparison function uses, but sometimes in a different byte format to
> include factors such as sizes and signs. As I said, doable, but may not be
> worth the effort.
>
>
>> To the best of my knowledge, it is common wisdom that radix sorts are not
>> as general as comparison sorts. I can be wrong. I may have missed an
>> important (new?) insight. If that is the case, please explain to me how
>> refining the Get_char function is sufficient to deal with any strict weak
>> ordering (just repeating the claim does not help me understand).
>>
>> I agree with your statement that this is conventional wisdom.
> 2 major factors are included in this conventional wisdom:
> 1) Most Radix sorts perform poorly relative to comparison sorting for small
> N, large K, or variable-length data. Hybrid-radix sorting resolves this
> problem.
> 2) The mapping from a sorting requirement to a comparison function is more
> intuitive and generally simpler than the mapping to a Get_char-type radix
> implementation. With #1 limiting performance, and comparison-based sorts
> being fast enough for most problems, I think most people just didn't put in
> the effort to examine the possibility of generalized radix sorting.
> string_sort provides this capability. It won't be optimal for many types
> of problems, but it can be used for them and provides worst-case
> performance guarantees.

Thank you, Steven. It clicked. I guess I just needed a “explain like I’m
10” type of explanation.

Issue resolved, as far as I’m concerned.

Cheers,
Julian


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk