From: Greg Chicares (chicares_at_[hidden])
Date: 2001-08-30 17:42:09
"George A. Heintzelman" wrote:
>
> I would like to enable y = x with y in centimeters and x in inches,
> however. If it is not going to be regularly optimized down to a single
> multiplication, though, these kinds of conversions get stickier,
> especially with higher-dimensionality (= more multiplications) units
> involved.
Agreed: mixing centimeters and inches shouldn't be forbidden, at least
not if it can be efficient. We shouldn't impose that decision on users.
> So, I tested this code (which would look very similar to what a
> compiler would need to do for an implicit conversion between inches and
> centimeters) with the two compilers readily available to me: g++-2.95
> and Sun CC 6u1, both on a Solaris 2.8 machine:
[snip code, and most of the analysis except this:]
> When the parentheses around (x/y) were removed, g++ did *not* optimize
> out the multiplication -- correctly, I believe, as IEEE
> multiplication/division are not associative, unlike the pure
> mathematical form. So we would have to take care that all floating
> point arithmetic with constants was carried out before applying it to
> the user's variable, if at all possible, to enable this optimization.
> That is probably also necessary to avoid unnecessary overflows, anyway.
>
> In sum, g++ gets full marks and Sun CC fails the test. I would be
> interested in knowing how well other compilers manage with this little
> test, but the fact that g++ successfully optimized it is encouraging;
> it certainly isn't beyond expecting compilers to do.
I tried this on Windows with gcc-2.95.2-1 and Borland C++ 5.5.1.
Here is my analysis, code, and disassembly.
Analysis:
                            ---gcc---   -borland-
                            #mul #div   #mul #div
inches * ( x / y );           1    0      1    1
inches * x / y ;              1    1      1    1
inches * (fx() / fy());       1    0      1    0
inches * fx() / fy() ;        1    1      2    0
As you noted, putting x and y inside parentheses helps.
The Borland compiler actually optimizes an inline function better than
a const double.
In the last case, the Borland compiler stores the reciprocal of fy()
so that it can multiply instead of divide (faster on Intel). I don't
think IEC 60559 allows that: cf. C99 standard [F.8.2].
Code:
#include <iostream>
inline double fx() {return 0.0254;}
inline double fy() {return 0.01 ;}
int main() {
const double x = 0.0254;
const double y = 0.01;
volatile double inches = 2.718281828459045;
double r0 = inches * ( x / y );
double r1 = inches * x / y ;
double r2 = inches * (fx() / fy());
double r3 = inches * fx() / fy() ;
std::cout << r0 << " " << r1 << " " << r2 << " " << r3 << "\n";
}
Disassembly:
g++ -O3 -ggdb heintzelman.cpp
14 double r0 = inches * ( x / y );
0x40124c <main+20>: fld QWORD PTR [%ebp-8]
0x401262 <main+42>: fld %ds:0x401230
0x401268 <main+48>: fmul %st(4),%st
15 double r1 = inches * x / y ;
0x40124f <main+23>: fld QWORD PTR [%ebp-8]
0x401252 <main+26>: fld %ds:0x401218
0x401258 <main+32>: fmul %st(1),%st
0x40125a <main+34>: fld %ds:0x401220
0x401260 <main+40>: fdivr %st(2),%st
16
17 double r2 = inches * (fx() / fy());
0x40126a <main+50>: fld QWORD PTR [%ebp-8]
0x40126d <main+53>: fmulp %st(1),%st
18 double r3 = inches * fx() / fy() ;
0x40126f <main+55>: fld QWORD PTR [%ebp-8]
0x401272 <main+58>: fmulp %st(3),%st
0x401274 <main+60>: fxch %st(2)
0x401276 <main+62>: fdivp %st(1),%st
Borland C++ 5.5.1
bcc32 -O2 -6 -v -vi -ff- heintzelman.cpp
-O2 optimize for speed
-6 pentium pro
-v debug info
-vi inline functions
-ff- standard-compliant floating-point math
#heintzelman#14: double r0 = inches * ( x / y );
:00401182 DD0424 fld qword ptr[esp]
:00401185 DC742408 fdiv qword ptr[esp+08]
:00401189 DC4C2410 fmul qword ptr[esp+10]
:0040118D DD5C2418 fstp qword ptr[esp+18]
:00401191 9B wait
#heintzelman#15: double r1 = inches * x / y ;
:00401192 DD442410 fld qword ptr[esp+10]
:00401196 DC0C24 fmul qword ptr[esp]
:00401199 DC742408 fdiv qword ptr[esp+08]
:0040119D DD5C2420 fstp qword ptr[esp+20]
:004011A1 9B wait
#heintzelman#17: double r2 = inches * (fx() / fy());
:004011A2 DD442410 fld qword ptr[esp+10]
:004011A6 DC0D50124000 fmul qword ptr[HEINTZELMAN.00401
:004011AC DD5C2428 fstp qword ptr[esp+28]
:004011B0 9B wait
#heintzelman#18: double r3 = inches * fx() / fy() ;
:004011B1 DD442410 fld qword ptr[esp+10]
:004011B5 DC0D58124000 fmul qword ptr[HEINTZELMAN.00401
:004011BB DB2D60124000 fld tbyte ptr[HEINTZELMAN.00401
:004011C1 DEC9 fmulp st(1),st
:004011C3 DD5C2430 fstp qword ptr[esp+30]
:004011C7 9B wait