Subject: Re: [Boost-users] [Math] BOOST_FPU_EXCEPTION_GUARD on Linux
From: David Roberts (dave_at_[hidden])
Date: 2013-11-14 11:42:25
> As I recall this was a hardware issue on early AMD 64-bit CPU's, possibly a
> GLIBC issue too. In fact if I recall it was this issue:
> http://sourceware.org/bugzilla/show_bug.cgi?id=2445 which didn't get much
> help from the glibc guys
>
> But basically if any of the FPU exception flags are set prior to a call to
> pow or exp, you risk getting a garbage answer back. I'll add a link to the
> above in the code.
>
> HTH, John.
That Bugzilla link was extremely helpful - thanks!
I've done a bit more research and I'm pretty sure this WAS a bug in glibc. It has been fixed in the glibc shipped with recent Linux distributions, though.
I ran your test program from http://sourceware.org/bugzilla/show_bug.cgi?id=2445 on several Linux distributions and CPUs, with these results:
Distribution          Kernel                     glibc   g++    CPU           Result
SLES 10               2.6.16.21-0.8-smp          2.4     4.1.0  Xeon E5-2630  FAILS
CentOS 6.2            2.6.32-220.el6.x86_64      2.12    4.4.6  Xeon X3450    FAILS
Amazon Linux 2013.03  3.4.57-48.42.amzn1.x86_64  2.12    4.6.3  Xeon E5507    FAILS
OpenSuSE 12.1         3.1.10-1.16-ec2            2.14.1  4.6.2  Xeon E5-2650  WORKS
OpenSuSE 12.2         3.4.6-2.10-ec2             2.15    4.7.1  Xeon E5-2650  WORKS
Fedora 20 beta        3.11.6-300.fc20.x86_64     2.18    4.8.2  Xeon E5645    WORKS
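This shows that the problem definitely occurs on Intel CPUs as well as AMD. It also points to glibc rather than the compiler or the kernel: Amazon Linux 2013.03 (kernel 3.4, g++ 4.6.3, glibc 2.12) fails, while OpenSuSE 12.1 (older kernel 3.1, near-identical g++ 4.6.2, but glibc 2.14.1) works.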
The most likely change to the glibc code that fixed the problem is this one from May 2011:
http://repo.or.cz/w/glibc.git/commitdiff/8db736347c7aca3201f61e3f05b5f672bcdd5bd9
There was a place in powl() where 4500 decimal was used instead of 0x4500 hex. 4500 decimal is 0x1194 hex. This number was then tested against the floating point status word. The exception flags are in the low byte of this word (see http://www.nacad.ufrj.br/online/intel/Documentation/en_US/compiler_f/main_for/fpops/fortran/fpops_statw_f.htm for the values), and the low byte of 0x1194 is 0x94, which has the bits for FE_UNDERFLOW = 0x10 and FE_DIVBYZERO = 0x4 set. As a result of the typo, the presence of either of those flags causes the test to pass unintentionally.
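For anyone who wants to check the arithmetic, here is a minimal sketch. The constant and function names are made up for illustration and are not taken from glibc. The bit values are the x87 status-word exception bits, which glibc reuses for the FE_* macros on x86. The static_asserts only verify which exception bits each mask overlaps; they say nothing about the rest of powl().

```cpp
#include <cassert>

// x87 status-word exception bits; on x86, glibc uses the same values for FE_*.
constexpr unsigned kInvalid   = 0x01;  // FE_INVALID
constexpr unsigned kDivByZero = 0x04;  // FE_DIVBYZERO
constexpr unsigned kOverflow  = 0x08;  // FE_OVERFLOW
constexpr unsigned kUnderflow = 0x10;  // FE_UNDERFLOW
constexpr unsigned kInexact   = 0x20;  // FE_INEXACT
constexpr unsigned kAllFlags =
    kInvalid | kDivByZero | kOverflow | kUnderflow | kInexact;

constexpr unsigned kIntendedMask = 0x4500;  // the hex constant that was meant
constexpr unsigned kTypoMask     = 4500;    // the decimal constant that was written

// Hypothetical helper: which exception-flag bits does a mask overlap?
constexpr unsigned exception_bits_in(unsigned mask) {
    return mask & kAllFlags;
}

static_assert(kTypoMask == 0x1194, "4500 decimal is 0x1194 hex");
static_assert(exception_bits_in(kIntendedMask) == 0,
              "the intended mask ignores the exception flags");
static_assert(exception_bits_in(kTypoMask) == (kUnderflow | kDivByZero),
              "the typo mask overlaps FE_UNDERFLOW and FE_DIVBYZERO only");
```

FE_INEXACT, FE_OVERFLOW and FE_INVALID do not overlap the mistyped mask.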
More experiments with your test program on CentOS 6.2 (glibc 2.12) seem to confirm this - if only the flags FE_INEXACT, FE_OVERFLOW and/or FE_INVALID are set they DON'T cause the weird result to occur.
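The kind of per-flag experiment described above could be sketched roughly as follows. This is NOT the test program from the Bugzilla entry: the helper names are hypothetical, and the inputs (2 and 0.5) are arbitrary, so they may not exercise the affected code path on a glibc that still has the bug.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: raise `flags`, then evaluate pow() on long doubles
// (which resolves to powl). The volatile inputs stop the compiler from
// folding the call at compile time.
inline long double powl_with_flags(int flags) {
    volatile long double base = 2.0L;
    volatile long double exponent = 0.5L;
    std::feclearexcept(FE_ALL_EXCEPT);
    if (flags != 0) {
        std::feraiseexcept(flags);
    }
    const long double result = std::pow(base, exponent);
    std::feclearexcept(FE_ALL_EXCEPT);  // leave a clean state for the caller
    return result;
}

// True if having `flag` set beforehand does not change the result.
inline bool flag_is_harmless(int flag) {
    return powl_with_flags(flag) == powl_with_flags(0);
}
```

Calling flag_is_harmless() with each of FE_INEXACT, FE_OVERFLOW, FE_INVALID, FE_UNDERFLOW and FE_DIVBYZERO in turn gives one result per flag to compare against the observations above.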
Given the bug was in the x86_64-specific powl() implementation and it seems to be fixed in glibc version 2.14, I think you could get away with tightening up the criteria on your guard from:
#if ((defined(__linux__) && !defined(__UCLIBC__)) || defined(__QNX__) || defined(__IBMCPP__)) && !defined(BOOST_NO_FENV_H)
to:
#if ((defined(__linux__) && defined(__x86_64__) && !defined(__UCLIBC__) && (!defined(__GLIBC_PREREQ) || !__GLIBC_PREREQ(2,14))) || defined(__QNX__) || defined(__IBMCPP__)) && !defined(BOOST_NO_FENV_H)
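One caveat with the one-line condition above: the preprocessor parses the whole #if expression even when the left-hand side of || already decides it. If __GLIBC_PREREQ is not defined (for example on QNX or with IBM's compiler), __GLIBC_PREREQ(2,14) is replaced by 0(2,14), which is a syntax error. A nested form avoids that. The sketch below is only an illustration: EXAMPLE_NEEDS_FPU_GUARD and needs_fpu_guard() are hypothetical names, not Boost macros, the BOOST_NO_FENV_H part of the condition is left out, and it assumes a libc header has been included first so that glibc's <features.h> has had a chance to define __GLIBC_PREREQ.

```cpp
#include <cassert>
#include <cfenv>  // on glibc this also pulls in <features.h>

// Hypothetical macro, not part of Boost: 1 if the guard is still needed.
#if defined(__linux__) && defined(__x86_64__) && !defined(__UCLIBC__)
#  if defined(__GLIBC_PREREQ)
#    if __GLIBC_PREREQ(2, 14)
#      define EXAMPLE_NEEDS_FPU_GUARD 0  // glibc 2.14 or later
#    else
#      define EXAMPLE_NEEDS_FPU_GUARD 1  // glibc older than 2.14
#    endif
#  else
#    define EXAMPLE_NEEDS_FPU_GUARD 1    // unknown libc: stay conservative
#  endif
#elif defined(__QNX__) || defined(__IBMCPP__)
#  define EXAMPLE_NEEDS_FPU_GUARD 1
#else
#  define EXAMPLE_NEEDS_FPU_GUARD 0
#endif

constexpr bool needs_fpu_guard() { return EXAMPLE_NEEDS_FPU_GUARD != 0; }
```

The outcome for each platform is meant to match the one-line condition; only the evaluation order is made explicit.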
That still doesn't help people like me who have to work on current 64-bit enterprise Linux releases, but for the benefit of anyone else who stumbles across this having suffered the same performance problem, here's a hack that works OK provided that neither your code nor any third-party library you use cares about the floating point flags. In boost/math/tools/config.hpp change:
struct fpu_guard
{
   fpu_guard()
   {
      fegetexceptflag(&m_flags, FE_ALL_EXCEPT);
      feclearexcept(FE_ALL_EXCEPT);
   }
   ~fpu_guard()
   {
      fesetexceptflag(&m_flags, FE_ALL_EXCEPT);
   }
private:
   fexcept_t m_flags;
};
to:
struct fpu_guard
{
   fpu_guard()
   {
      if (fetestexcept(FE_UNDERFLOW | FE_DIVBYZERO) != 0)
      {
         feclearexcept(FE_ALL_EXCEPT);
      }
   }
};
This is beneficial for performance because the expensive calls are feclearexcept() and fesetexceptflag(). The (more-or-less identical on Linux) fegetexceptflag() and fetestexcept() functions are much quicker.
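If you want to check the relative cost on your own machine, a rough micro-benchmark along these lines might do. It is only a sketch: the helper names are hypothetical, the loop overhead is included in the figures, and the numbers will vary with glibc version, CPU and compiler flags.

```cpp
#include <cassert>
#include <cfenv>
#include <chrono>

// Hypothetical helper: average nanoseconds per call of `f` over `iterations` calls.
template <class F>
double ns_per_call(F f, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    const auto stop = clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

// Results are written here so the calls cannot be optimised away.
volatile int g_sink = 0;

inline double time_fetestexcept(int iterations) {
    return ns_per_call([] { g_sink = std::fetestexcept(FE_ALL_EXCEPT); }, iterations);
}

inline double time_feclearexcept(int iterations) {
    return ns_per_call([] { g_sink = std::feclearexcept(FE_ALL_EXCEPT); }, iterations);
}
```

Comparing time_fetestexcept(1000000) with time_feclearexcept(1000000) gives a first impression of the difference.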
Not bothering to reset the cleared flags saves one expensive call - obviously this relies on nothing downstream caring about them.
Only clearing the flags if one of the problematic ones is set often avoids the other expensive call. I've noticed that the FE_INEXACT flag is almost always set if you do any serious floating point calculation, but the other 4 flags are set much less frequently.
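To make the behaviour of the modified guard concrete, here is a self-contained sketch. The struct is renamed lazy_fpu_guard (a made-up name) so that it cannot clash with Boost's own fpu_guard, and flags_left_after_guard() is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cfenv>

// Same logic as the modified guard in this post, under a different name.
struct lazy_fpu_guard {
    lazy_fpu_guard() {
        // Only pay for feclearexcept() when a problematic flag is set.
        if (std::fetestexcept(FE_UNDERFLOW | FE_DIVBYZERO) != 0) {
            std::feclearexcept(FE_ALL_EXCEPT);
        }
    }
};

// Hypothetical helper: raise `raised`, construct the guard, and report
// which exception flags are still set afterwards.
inline int flags_left_after_guard(int raised) {
    std::feclearexcept(FE_ALL_EXCEPT);
    std::feraiseexcept(raised);
    lazy_fpu_guard guard;
    (void)guard;
    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

With only FE_INEXACT raised the guard leaves the flags alone; with FE_UNDERFLOW or FE_DIVBYZERO raised it clears everything and, unlike the original, never restores it.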
I'm not suggesting that you put this hack into the official Boost codeline, but just thought it might be useful to anyone in my situation who finds this post via a search engine.