Subject: Re: [boost] Proposed templated integer_sort
From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-12-17 18:40:05
On Tue, Dec 16, 2008 at 3:42 PM, Scott McMurray <me22.ca+boost_at_[hidden]>wrote:
> As for the bit patterns, I can't think of any bit patterns that cannot
> be created by legal <<ing and ++ing, so I'm not convinced that invalid
> integer bit patterns exist (at least in unsigned types -- I don't know
> what ones-complement signed ints do with ~0, for example).
An IEEE floating-point number maps nicely to a signed int of the same size
for the purposes of sorting. The main issue is that negative floats
increase in the opposite order from the integer with the same bits, so I
sort them in reverse (if you look at the source). Positive floats sort in
the same order as the positive integer with the same bit representation. -0
ends up being less than 0 this way, though for std::sort they are equal.
NANs sort in a definite order during the radix-sorting portion, unlike
std::sort which dumps them arbitrarily throughout the resulting file. I
assume that developers won't mind these discrepancies for -0 and NANs.
This trick is commonly used by radix sorting implementations, and is
especially valuable because floating-point compares are inefficiently
implemented on some platforms.
I should note that I named the operation "casting_float_sort" because I was
concerned that not all theoretical platforms may support the cast. If
people are concerned about sorting some non-IEEE floating-point number, they
can use float_sort, which requires the user to specify the type they cast
the float to, or the functor versions, where the user defines the >>
operation (where the cast is). All versions of float_sort still assume that
negatives need to be sorted in reverse order; it's a method specialized to
sorting IEEE floating_point numbers as integers. Anything that can be
sorted in a normal fashion based upon an integer return from the >> operator
(or equivalent functor) can be sorted by integer_sort.
> That said, how is it used in practise? Aliasing allows reading
> through an unsigned char*, the size of which is always a factor of the
> size of any other type, so would it be better to implement everything
> to always go through that? I know naïve radix sorts will often go
> CHAR_BIT bits at a time to take advantage of that, but I figure the
> better implementation may well need something more complex.
I could do that, but I would often need one more iteration than is strictly
necessary, and I wouldn't be able to reuse integer_sort to sort the positive
floats. I have no reason to believe that going 8 bits at a time would be
any faster, unless it can save me a memory operation to work around a
casting restriction that isn't enforced on all (any?) platforms.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk