 # Boost :

From: k.hagan_at_[hidden]
Date: 2001-05-18 04:47:50

Regarding "double rounding" (DR), Paul asks

> I have yet to be convinced of this - can someone produce some
> experimental evidence that it does happen??

It may be humble pie time. I've tried to produce an example,
and contemplated the reasons behind it all, and whilst I have
an example, I think it helps you more than me in the present
discussion.

Here's one example in base 10. Assume our long double can only
hold 6 digits and double only holds 5 digits, and let's multiply a
couple of 4-digit numbers.

/* long double or double */ a = 2.357;
double b = 2.357;
double c = a * b;

The exact answer is 2.357 * 2.357 = 5.555449. (I don't think it is
significant that the two factors are the same. My argument will apply
to any result that ends in the range ...445-...449.)

If "a" is a long double, the result is first rounded by the
hardware to a 6-digit result, 5.55545, and then rounded again
to fit into "c", giving 5.5555. If "a" is a double, the result
is rounded only once, to a 5-digit result, 5.5554. This
difference in the last digit is the DR effect.
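
To check the arithmetic, here is a minimal sketch of the same example
using scaled integers, so that no binary floating point gets in the
way. The names ten_to and drop_digits are made up for illustration,
and I am assuming that halves round up, which is what the figures
above imply.

```cpp
#include <cstdint>

// 10^digits, for small non-negative digit counts.
constexpr std::int64_t ten_to(int digits) {
    return digits == 0 ? 1 : 10 * ten_to(digits - 1);
}

// Drop the last `digits` decimal digits of n, rounding halves up.
// (Assumed rounding rule; n must be non-negative.)
constexpr std::int64_t drop_digits(std::int64_t n, int digits) {
    return (n + ten_to(digits) / 2) / ten_to(digits);
}

// 2.357 * 2.357 as scaled integers: 2357 * 2357 = 5555449 (5.555449).
constexpr std::int64_t exact = 2357 * 2357;

// Path 1: 7 digits -> 6 digits ("long double") -> 5 digits ("double").
constexpr std::int64_t rounded_twice = drop_digits(drop_digits(exact, 1), 1);

// Path 2: 7 digits -> 5 digits ("double") in a single step.
constexpr std::int64_t rounded_once = drop_digits(exact, 2);

static_assert(exact == 5555449, "exact product is 5.555449");
static_assert(rounded_twice == 55555, "two roundings give 5.5555");
static_assert(rounded_once == 55554, "one rounding gives 5.5554");
```

The static_asserts make the compiler confirm the three values, so the
sketch fails to compile if the arithmetic were wrong.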

If the initial values had been longer (say 2.35712324), then we
get an extra rounding by storing them in "a" to begin with, and
losing decimal digits here does worse things to the accuracy than
the DR phenomenon. Alternatively, if the exact result fits in a
long double, then we can't get DR.

Therefore, as far as I can see, the error can only arise in cases
where both arguments are exactly representable in both formats
AND the exact result is representable in neither. (Even then, it
is rare, happening in only 5 cases per thousand, but with a binary
radix it would happen whenever the result ended in 011, which is
1 case in eight.)

DR is most commonly mentioned in connection with the Intel and
68k floating point units which (left to their own devices) perform
operations at long double precision. Therefore, when I write
double c = a * b;
the hardware computes
double c = double( (long double)a * (long double)b );
and we are at risk of DR because even if a and b were computed from
higher precision intermediates, they are now clearly represented at
double precision.
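
For anyone who wants to hunt for real cases, the two computations can
be put side by side roughly as follows. This is only a sketch: the
function names are mine, and it assumes a target where plain double
arithmetic is rounded once (for example SSE2 code) and where long
double really is wider than double. Where long double is the same
type as double, the two paths always agree.

```cpp
// The product as double-precision hardware would deliver it:
// one rounding, straight to double.
constexpr double product_rounded_once(double a, double b) {
    return a * b;
}

// The product as described above for extended-precision hardware:
// rounded to long double first, then rounded again to double.
constexpr double product_via_long_double(double a, double b) {
    return static_cast<double>(static_cast<long double>(a) *
                               static_cast<long double>(b));
}

// True when the two paths disagree, i.e. a candidate DR case.
// (NaN results compare unequal and would need filtering out.)
constexpr bool shows_double_rounding(double a, double b) {
    return product_rounded_once(a, b) != product_via_long_double(a, b);
}
```

Feeding shows_double_rounding a large number of random operand pairs
and counting the hits would give the experimental evidence Paul asked
for, at least on hardware that meets the assumptions above.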

However, this argument also destroys my objection to you presenting
your constants at the higher precision, since none (?) of your
constants are exactly representable at double precision. All will
therefore benefit from the higher accuracy.

I still worry about having to write that cast, though...
double circ = double(2*pi*r); // :)
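
Spelled out, the pattern I have in mind looks something like this.
The constant name pi and the function circumference are placeholders
of mine, not anything taken from the proposed library.

```cpp
// Hypothetical constant, given to more digits than a double can hold.
constexpr long double pi = 3.141592653589793238462643383279502884L;

// The product is formed at long double precision and converted to
// double once, at the very end.
constexpr double circumference(double r) {
    return static_cast<double>(2 * pi * r);
}
```

Plain "double circ = 2*pi*r;" compiles as well, since the conversion
from long double to double is implicit, but some compilers can be
asked to warn about the narrowing, hence the explicit cast.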