|
Boost : |
Subject: Re: [boost] Proposed templated integer_sort
From: Steven Ross (spreadsort_at_[hidden])
Date: 2009-01-05 11:08:03
On Tue, Dec 16, 2008 at 7:34 AM, Scott McMurray <me22.ca+boost_at_[hidden]>wrote:
> Empyrically, putting the following into the LLVM online demo
> (http://llvm.org/demo/) shows that, in llvm-g++ at least, the
> (intermediate) code generated from the memcpy is optimal:
>
> #include <string.h>
> int foo(float const y) {
> int x;
> memcpy(&x,&y,sizeof(int));
> return x;
> }
>
> define i32 @_Z3foof(float %y) nounwind readnone {
> entry:
> %0 = bitcast float %y to i32 ; <i32> [#uses=1]
> ret i32 %0
> }
>
> So since the memcpy is always safe and doesn't break the aliasing
> rules, it certainly seems the best approach.
>
Thanks, Scott, I've added you to the list of contributors. I made that
change after some string_sort improvements, and found:
1) float_sort now works on Cygwin
2) No measurable performance penalty relative to my prior casting approach
3) My Cygwin testcase that was crashing runs over 100X as fast with
float_sort than std::sort, generating exactly the same (correct) results.
My speculation for the cause of the incredible speedup, based upon some
other testcases where float_sort runs in 40% less time than std::sort, is
that floating-point comparisons are extremely inefficiently implemented on
my Cygwin test system; the testcase that is 100X as fast is purely
radix-sorted by float_sort, while the ones that are only modestly faster use
some recursive calls to std::sort, which uses comparisons. Based upon these
results, it looks to me like an LSB radix sort is the best approach to use
on Cygwin for sorting sets of floating-point numbers large than some basic
minimum size (1000 elements?), as long as the additional memory overhead for
a duplicate vector is acceptable.
It could also just be my old processor (refurbished windows system purchased
from Dell over 5 years ago); modern systems might not have this problem.
Does anyone have a competing theory as to why pure radix sorting is so much
faster for floating-point numbers on an old Cygwin system?
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk