
Subject: Re: [boost] sorting floats by casting to integer
From: Vladimir Prus (vladimir_at_[hidden])
Date: 2009-07-06 11:37:53


Steven Ross wrote:

> On Sun, Jul 5, 2009 at 9:17 AM, Phil Endecott <
> spam_from_boost_dev_at_[hidden]> wrote:
>
>>
>> If the problem were simply that "x86 FP is very slow", then this problem
>> would be evident with std::sort as well as your algorithm. I've therefore
>> done a simple test with a program like the following:
>
>
> Agreed, Phil. I do see it with std::sort. Vladimir also saw a large
> runtime difference with 23-bit random data, though not quite as large.
>
>
>>
>> Note that I haven't actually checked if I get the right answers from this,
>> so please let me know if you can see any mistakes! Also note that I have
>> not done any fixup for negative floats, so some small additional time would
>> be needed for that.
>>
>> I've run variations of this code with the following results:
>>
>> vector<int>, std::sort(begin,end) : 3.71 s
>> vector<float>, std::sort(begin,end) : 9.12 s
>> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.19 s
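
(Phil's compare_floats_as_ints wasn't posted, so the sketch below is only a
guess at a comparator in that spirit; like his, it skips the negative-float
fixup and ignores NaNs.)

  #include <cstring>
  #include <stdint.h>

  // Compare two floats by their raw IEEE bit patterns. This orders
  // non-negative finite values correctly; negatives come out reversed
  // and NaNs are not handled.
  inline bool compare_floats_as_ints(float lhs, float rhs)
  {
      uint32_t l, r;
      std::memcpy(&l, &lhs, sizeof l);  // memcpy avoids aliasing problems
      std::memcpy(&r, &rhs, sizeof r);
      return l < r;
  }

  // usage: std::sort(v.begin(), v.end(), compare_floats_as_ints);
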
>
>
> My results, with 32-bit random data, which is what you appear to be using
> (you did not provide your random_fill function; I used randomgen 16 16
> 10000000 to get these results):
> vector<float>, sorting as integers (my approach) : 1.417s and data is
> correctly sorted
> vector<float>, std::sort(begin, end): 2.793s and data is correctly sorted
> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.291s and data
> is not correctly sorted (because negatives aren't flipped)
> I set NaNs (I get 3975 of them) to 0.0 in my testing so as to have a unique
> correct sorted output.
> I'm using a Core 2 Duo.
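
(For what it's worth, the usual fix for the negative half is to remap the bit
pattern before comparing; a sketch, assuming IEEE 754 single precision and
ignoring NaNs:)

  #include <cstring>
  #include <stdint.h>

  // Map a float's bit pattern to an unsigned key whose ordering matches
  // the float ordering: negatives get all bits inverted, non-negatives
  // just get the sign bit set.
  inline uint32_t float_sort_key(float f)
  {
      uint32_t u;
      std::memcpy(&u, &f, sizeof u);
      return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
  }

  inline bool compare_floats_with_sign(float lhs, float rhs)
  {
      return float_sort_key(lhs) < float_sort_key(rhs);
  }
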
>
> I think I've figured out what's going on with my 23-bit random data: float
> comparison on x86 seems to have some form of optimization that looks at the
> exponent first and then the coefficient, and the comparison becomes
> extremely slow when the exponents are the same.

Same, and equal to what? If your test ends up generating a pile of
denormalized numbers, then it's rather broken regardless of what
processor you are using.
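
(If the generator's 23 random bits simply end up as the low-order bits of the
float, which is my guess, then the exponent field is zero and every value is a
denormal or zero:)

  #include <cstdio>
  #include <cstring>
  #include <stdint.h>

  int main()
  {
      // Only the low 23 bits are set, so the 8-bit exponent field is
      // zero and the value is denormal.
      uint32_t bits = 0x007FFFFFu;      // largest 23-bit pattern
      float f;
      std::memcpy(&f, &bits, sizeof f);
      std::printf("%g\n", f);           // ~1.17549e-38, just below FLT_MIN
  }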

> With 23-bit random data, all
> the exponents are the same. With 32-bit random data, there will be 512
> different values for the exponents,

Are you sure? 32-bit IEEE format has 8 bits for the exponent.

> splitting the identical-exponent
> problems into much smaller pieces, to the point where they are close to the
> size of the cache (in-cache sorting is extremely fast).
> With realistic data, there will be multiple exponents, but not as many as
> there are data elements. More likely, there will be a small range of
> exponents, and for each of them, many elements with the same exponent.
> Also, a denial-of-service attack could easily be made against a system that
> sorts floats by just feeding it floats with the same exponent but different
> coefficients.
> When I test with 26-bit random data, where there are 8 different exponent
> values (a more realistic case), here are my results:
> sorting floats as integers (my approach): 1.473s
> std::sort 21.511s
> (I used randomgen 10 16 10000000 to generate the input data)
>
> So not 120X, but still a substantial difference.
>
> Also worth noting is that when the element count increases, the speed
> difference increases, even on 32-bit random data.
> randomgen 16 16 200000000:

What real-world problems require sorting such huge arrays of floats?

> I'll test with SSE and other compiler settings next. Is there an easy way
> to add these with bjam?

Because bjam is actually a low-level build engine, it is pretty much impossible
to do anything with it at all. Boost.Build allows you to pass any flags to the
compiler that you see fit; my previous post gives the actual syntax I have
used.
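
For example, something like this on the command line (just an illustration,
pick whatever flags you want to test):

  bjam cxxflags="-msse2 -mfpmath=sse" release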

- Volodya

