Subject: Re: [boost] [review][constrained_value] Review of Constrained Value Library begins today
From: John Phillips (phillips_at_[hidden])
Date: 2008-12-09 17:40:17

Johan Råde wrote:
> Kim Barrett wrote:
>> At 2:23 PM +0100 12/6/08, Robert Kawulak wrote:
>>> ... maybe the problem could be somehow solved if we have a function
>>> float exact(float) that, given a floating point value (that may have
>>> greater precision because of caching in a register), returns a value
>>> that is truncated (has exactly the precision of float, not greater).
>> I think that something along the lines of the following will likely work:
>>   inline double exact(double x) {
>>     struct { volatile double x; } xx = { x };
>>     return xx.x;
>>   }
>> The idea is to force the value to make a round trip through a memory
>> location of the "correct" size. The use of volatile should prevent
>> the compiler from optimizing away the trip through memory.
> The following should work:
>   inline double exact(double x)
>   {
>     double y;
>     memcpy(&y, &x, sizeof(double));
>     return y;
>   }
> But I'm not sure the library should do this at all.
> It seems like forcing a policy upon the user.
> And it may be inefficient.
> The root of the problem is that Intel processors may store double values
> in 80-bit registers.
> These values may later be truncated to 64-bits.
> However, you can force doubles to always be stored with 64-bit precision
> by changing the processor floating point control word.
> On Visual Studio, this can be done through the command
> _controlfp(_PC_53, _MCW_PC)
> (53 = number of significand (mantissa) bits in the 64-bit format).
> --Johan Råde

   Ideas that force the floating point value out of the register and
into cache will, on average, slow the comparison by more than an order of
magnitude (forcing the value out to main memory is far, far worse).
This isn't acceptable for anyone who needs performance from the comparison.
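
   For concreteness, here is a minimal sketch of the two round trips
proposed above. The names exact_volatile and exact_memcpy are made up
for this illustration and are not part of the library under review.
Note that an optimizer is generally free to turn a small fixed-size
memcpy into a plain register move, so the second variant may not
truncate anything at all; only the volatile access obliges the compiler
to perform the store and the load.

```cpp
#include <cstring>

// Hypothetical helper: round trip through a volatile object of type double.
// The volatile qualifier obliges the compiler to perform the store and the
// following load, so the value passes through a 64-bit memory location.
inline double exact_volatile(double x) {
    volatile double stored = x;  // store: value leaves the register
    return stored;               // load: value comes back as a plain double
}

// Hypothetical helper: round trip through memcpy.
// This copies the object representation of x into y. An optimizing compiler
// may replace the call with a register move, in which case no trip through
// memory happens.
inline double exact_memcpy(double x) {
    double y;
    std::memcpy(&y, &x, sizeof y);
    return y;
}
```

   Both functions return their argument unchanged when it is already an
ordinary double; the cost described above comes from the forced store
and load, not from any arithmetic.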

   Reducing the number of mantissa bits kept in the registers can be far
worse. Most accuracy guarantees made for the calculations assume the
existence of guard bits in the register. Without them, the calculation
loses significance faster than expected. This is not acceptable for
anyone who needs high precision.
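
   Whether a given build evaluates intermediate results in a wider
format at all can be queried through the standard FLT_EVAL_METHOD macro
from <cfloat> (C++11 and later). The sketch below only reports what the
compiler assumes at compile time; it does not reflect later changes to
the processor control word. The function name is made up for this
illustration.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: describe how the compiler evaluates floating point
// expressions, based on the standard FLT_EVAL_METHOD macro.
inline std::string eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0:
            return "operations are evaluated in the precision of their type";
        case 1:
            return "float and double operations are evaluated as double";
        case 2:
            return "all operations are evaluated as long double";
        default:
            // -1 means indeterminable; other negative values are
            // implementation-defined.
            return "indeterminable or implementation-defined";
    }
}
```

   A value of 2 is what one would expect from a build that keeps
intermediates in the 80-bit registers discussed above.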

   In general, comparing close floating point numbers raises difficult
and subtle problems. The past is littered with examples of how not to do
it, and doing it well is very fiddly. It is possible to find all of the
details for this, but it should only be attempted if someone is willing
to put in the time and effort to find the best available methods. "This
is probably good enough" ideas almost never are in this setting.

