Boost logo

Boost :

From: Fernando Cacciola (fcacciola_at_[hidden])
Date: 2002-08-22 15:36:59


Hi, I've just recently read all this thread.

First of all, I agree with Maxim that a general core-level solution is
really
difficult, not only because it is difficult to define consistent semantics
for it, but also because of backward compatibility. Thus, I agree with him
that a library solution would be more applicable, at least for now.

The issue can be seen in general as this:

Given an arithmetic expression, *intermediate* rounding errors could
undermine the final result. It could be possible that the expression as a
whole would have a value which would fit in the range of the expression type
if those intermediate errors didn't occur.; but when they occur, the value
of the expression differs significantly from the exact result rounded to fit
into the type.

In an ideal architecture, it could be possible to define that an arithmetic
expression should be considered *atomic* [as indicated by some sort of
syntax] such that the only acceptable
rounding error should be at most the rounding error of the mathematically
exact result rounded to the p-digit format of the expression type (which
could
differ from the target type, in which case, an additional rounding could
occur).

Unfortunately, such a definition is not necessarily implementable, at least
not without excessive software support, so it couldn't be part of a language
standard. The reason is that in order to achieve such a expression-wide
roundoff, the program will have to access higher precision formats which
could not be available. The typical case is that I could use (2*p)-words to
internally multiply 2 p-words, but only
when p < 1/2P, where P is the maximum digits of the highest precision format
supported by the environment. If I use a P-digits type (say 'long long' or
'long double'), there's no way to
recover from intermediate errors without explicit hardware support in the
form of hidden guard-digits.

This issue of expression atomicity w.r.t to rounding errors has been
addressed in the past. There is one particular expression which arises
quite too often in numerical applications: (a*b)+c, called MAC
(Multiply-ACcumulate), which is already starting to be supported in hardware
as a "fused" instruction, that is, atomically. (hence its name Fused-MAC)

C99 uses a new #pragma FP_CONTRACT to control whether implementations are
allowed to 'contract' such arithmetic expressions if the environment support
fused operations (such as the fused-mac).

The C99 specification for 'contracted expressions' is not circumscribed to
the specific case of the MAC, therefore, it could in principle fuse (a*b/c)
also, if the implementation knows how to do it for a given target platform.
The C99 draft I have, on Annex F.6, says:
"A contracted expression should deliver the same value as its
uncontracted counterpart, else should be correctly rounded (once)."

I think that this clearly allows a contracted expression to internally use
some extra-precision format *if available* or else behave uncontracted, just
as it is now.
This semantic will not *guarantee* the accuracy of (a*b/c) but wil allow the
implementation to do better than today.
Therefore, I think that this issue could be addressed by C++0x at the core
level in a similar way to C99's contracted expressions.
Still, we'll need some field results to see how that turned out.

Going back to a library solution, I think there are two -not necessarily
compatible- possible goals: robustness or speed.

If what I want is to use hardware resources more efficiently, with increased
reliability as a likely side effect, then a reasonable library solution
would be a black-box 'muldiv()' inline overloaded function which would be
implemented platform-wise.(most compilers allows you to have inline asm
instructions either directly or by using some sort of __emit__)

If what I want is to increase the robustness of mul-div operations, even if
it comes with overhead w.r.t the ordinary evaluation, then I can use
something along these lines:

In my particular domain I'm particularly concerned about robustness a lot
more than speed, so I sometimes use the following:

template<class N>
struct algebra_kernel
{
  template<class T>
  static T muldiv ( T x, T y,T w)
    {
      return static_cast<T>( static_cast<N>(x) * static_cast<N>(y) /
static_cast<N>(w)) ;
    }

.... so on ...
} ;

typedef algebra_kernel<__int64> I64Kernel ;

int x, y, w ;
int r = I64Kernel::muldiv(x,y,w);

{This code is a only sketch to show the idea}

HTH,

Fernando Cacciola
Sierra s.r.l.
fcacciola_at_[hidden]
www.gosierra.com


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk