
Boost : 
Subject: Re: [boost] sorting floats by casting to integer
From: Vladimir Prus (vladimir_at_[hidden])
Date: 20090706 11:37:53
Steven Ross wrote:
> On Sun, Jul 5, 2009 at 9:17 AM, Phil Endecott <
> spam_from_boost_dev_at_[hidden]> wrote:
>
>>
>> If the problem were simply that "x86 FP is very slow", then this problem
>> would be evident with std::sort as well as your algorithm. I've therefore
>> done a simple test with a program like the following:
>
>
> Agreed Phil. I do see it with std::sort. Also Vladimir also saw a large
> runtime difference with 23bit random data, though not quite as large.
>
>
>>
>> Note that I haven't actually checked if I get the right answers from this,
>> so please let me know if you can see any mistakes! Also note that I have
>> not done any fixup for negative floats, so some small additional time would
>> be needed for that.
>>
>> I've run variations of this code with the following results:
>>
>> vector<int>, std::sort(begin,end) : 3.71 s
>> vector<float>, std::sort(begin,end) : 9.12 s
>> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.19 s
>
>
> My results, with 32bit random data, which is what you appear to be using
> (you did not provide your random_fill function; I used randomgen 16 16
> 10000000 to get these results):
> vector<float>, sorting as integers (my approach) : 1.417s and data is
> correctly sorted
> vector<float>, std::sort(begin, end): 2.793s and data is correctly sorted
> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.291s and data
> is not correctly sorted (because negatives aren't flipped)
> I set NaNs (I get 3975 of them) to 0.0 in my testing so as to have a unique
> correct sorted output.
> I'm using a Core 2 Duo.
>
> I think I've figured out what's going on with my 23bit random data: floats
> on x86 have some form of optimization to look at the exponent first, and
> then the coefficient afterwards. On x86 systems, the comparison is
> extremely slow if the exponents are the same.
Same, and equal to what? If your test ends up generating a pile of
denormalized numbers, then it's rather broken regardless of what
processor you are using.
> With 23bit random data, all
> the exponents are the same. With 32bit random data, there will be 512
> different values for the exponents,
Are you sure? 32bit IEEE format has 8 bits for the exponent.
> splitting the identicalexponent
> problems much smaller, to the point where they are close to the size of the
> cache (incache sorting is extremely fast).
> With realistic data, there will be multiple exponents, but not as many as
> there are data elements. More likely, there will be a small range of
> exponents, and for each of them, many elements with the same exponent.
> Also, a denialofservice attack could easily be made against a system that
> sort floats by just feeding it floats with the same exponent, but different
> coefficients.
> When I test with 26bit random data, where there are 8 different exponent
> values (a more realistic case), here are my results:
> sorting floats as integers (my approach): 1.473s
> std::sort 21.511s
> (I used randomgen 10 16 10000000 to generate the input data)
>
> So not 120X, but still a substantial difference.
>
> Also worth noting is that when the element count increases, the speed
> difference increases, even on 32bit random data.
> randomgen 16 16 200000000:
What realworld problems require sorting such huge arrays of floats?
> I'll test with SSE and other compiler settings next. Is there an easy way
> to add these with bjam?
Because bjam is actually a lowlevel build engine, it is pretty much impossible
to do anything with it at all. Boost.Build allows you to pass any flags
to compiler you see fit, my previous post gives the actual syntax I have
used.
 Volodya
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk