Subject: Re: [boost] sorting floats by casting to integer
From: Vladimir Prus (vladimir_at_[hidden])
Date: 2009-07-06 14:07:33
Steven Ross wrote:
> On Mon, Jul 6, 2009 at 8:37 AM, Vladimir Prus <vladimir_at_[hidden]>wrote:
>> > With 23-bit random data, all
>> the exponents are the same. With 32-bit random data, there will be 512
>> > different values for the exponents,
>> Are you sure? 32-bit IEEE format has 8 bits for the exponent.
> +1 for sign.
Technically, it's not part of the exponent. That is, the first bit is
not the sign of the exponent, it's the sign of the value.
>> > So not 120X, but still a substantial difference.
>> > Also worth noting is that when the element count increases, the speed
>> > difference increases, even on 32-bit random data.
>> > randomgen 16 16 200000000:
>> What real-world problems require sorting such huge arrays of floats?
> Circuit layouts, if someone wants to sort them for various purposes. They
> are gigabytes in size, so they take a long time to sort unless casting to
> integer is done.
OK, I know nothing about circuit layouts :-)
> Yes, I guess I was using denormalized numbers for the 23-bit case, but it's
> interesting that there is such a huge speed difference, and I doubt it's
> because they're denormalized; there is no reason to special-case
> denormalized sorting.
It is well possible that comparing denormalized numbers is slow.
> Notably, normalized numbers also show the same problem if their exponent
> range is small relative to the number of elements; this can be achieved by
> using a small number of different exponents, or simply a large number of
> floats (as shown with my large test). What do you consider a reasonable
> range? I can easily make a test that excludes denormalized numbers.
> Even if the full exponent range is in use (which isn't realistic), wouldn't
> it be better to use an algorithm that doesn't blow up when n gets much
> larger than 10 million elements, and is still faster with a smaller number
> of elements?
I don't quite follow the above point. My argument is (and always was), that
it's important to provide accurate measurements on the data that ordinary
humans will use. Sorting of 200'000'000 floats does not seem a typical use
case, for performance on that is probably less important. Just like sorting
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk