These files are a collection of very
accurate mathematical constants for C ++ (and C) programs.
These are like to be useful even by programs just computing areas of circles!
Mathematical constants (like Archimedes' pi) are exceedingly unlikely to
change, whereas estimates of physical ‘constants‘ (like Planck's h) are certain to vary.
Physical constants usually have a known uncertainty, usually greater than even
the accuracy of float representations, whereas mathematical constants can be
calculated to an arbitrarily high precision.
Therefore to avoid confusion, mathematical and physical constants will never be
stored in the same header file.
The mathematical constants collection is convenient in that it does not require a search of reference sources and is useful because it ensures that the values are as accurate as possible and as portable as possible between different systems.
A special feature of this collection is that the constants values are a little more accurate than can be represented by current and foreseeable floating-point hardware. Upper and lower smallest interval values also provided for many popular floating-point formats: these will be useful when using interval arithmetic, for example to estimate computational uncertainty. Interval limit values are exactly representable to aid portability. Small differences in constants can lead to puzzling differences between computations. The values given are slightly, but significantly, more accurate than is usually achieved by using the compiler to calculate constants.
The constants have all been calculated using software working with 300-bit precision giving about 100 decimal digits. (The precision can be chosen and is limited only by compute time). The precision selected (40 decimal digits) exceeds the accuracy of foreseeable floating-point hardware.
The objective is to achieve the full accuracy possible for all real-life computations. This has no extra cost to the user, but reduces irritating, and often confusing and hard-to-trace effects, caused by the intrinsically limited precision of floating-point calculations. At least these show as a spurious least-significant digit; at worst slightly inaccurate constants cause algorithms to fail because internal comparisons just fail.
Different applications need constants in different formats. Six types of file are provided:
1. Files which contain C Macros which #define constants and matching #undefs. These are less likely to be useful for C++ programs, but may suit C programs or embedded systems where no inefficiency is using memory can be tolerated.
2. Files that contain float const constants. Useful if only float precision is required.
3. Files that contain double const constants. Most useful for most purposes.
4. Files that contain long double const constants. Useful where long double precision is needed.
5. Files that contain constants defined as functions, so called by pi(), to help compilers optimise for speed.
6. Files that contain class constant template function constants, but called by pi as well as pi(). This form is useful where more than one precision (for example, both float and double) is needed, and to facilitate optimisation for speed by compilers.
7. Files that contain exactly representable upper and lower limits of the smallest interval containing the constant. The limits differ for different floating-point formats depending on the radix or base, and number of significand bits. The limits are selected, by default, using std::numeric_limits<floating_point_type>::digits and std::numeric_limits<floating_point_type>::radix values, if available.
More than one header file may be used, but including two files in the same module will lead to confusing results. This might happen implicitly by including a header that in turn includes another constants header file. No protection against this is provided and may result in baffling compilation errors and/or unexpected selection of constant representation. You have been warned!
All files contain exactly the same constants and identical numeric values with 40 decimal digits (except interval values which have enough decimal digits to be exactly representable).
The choice of constants is necessarily arbitrary and inevitably contentious. The constants chosen are widely used in calculation of mathematical functions and include most of those used by several authors of mathematical function libraries or texts referenced below, for example Knuth, Cody, Morris, Hart and Moshier.
To avoid including an excessive number of unwanted constants, the constants are subdivided into several groups of files, whose exact distribution and content is yet to be finally agreed.
The first group contains the most fundamental constants that cannot be easily calculated, like pi and e.
The second group contains constants that are widely used. Although compilers may calculate them, some compilers may lose some accuracy for some constants.
Finally more obscure constants, and those that can probably be calculated without loss (for example, 2. * pi) but are commonly used and so are convenient. Note that upper and lower interval limits are especially at risk of being inaccurately calculated using limits of more fundamental constants, with multiplication and division by factors of the radix the only safe exception. Explicitly calculated exactly representable values are safer.
The file define_constants.h consists of a list of C style MACROs that define a collection of mathematical constants very accurately, for example:
#define BOOST_PI 3.14159265358979323846264338327950288L /* pi */
Individual MACRO value(s) can be used directly, for example:
long double const PI = BOOST_PI;
Or with casting to other types, for example:
double const PI = (long double)BOOST_PI;
(Note that to avoid any loss of precision for long double definitions, the MACRO constants are defined as long double by the suffix L, but this means that casting is needed or desirable to avoid errors or warnings for less precise types double or float.)
double const pi = (double)BOOST_PI; /* C style cast */
Or C++ static cast down to double and/or
float precisions.
long double const pi = static_cast<double>(BOOST_PI); // C++ static_cast
long float const pi_f = static_cast<float>(BOOST_PI); // C++
static_cast
To document helpfully, it is sensible to static_cast all definitions, for example:
long double const pi_f = static_cast<long double>(BOOST_PI);
(Note one needs two different names, pi and pi_f, if different precisions must coexist (unless one uses constant class template functions described below).
Unlike most header files, the macros file
can be used in many ways in C and C++ (and indeed in other languages).
(Note that is it deliberately a .h file and not the Boost convention of a .hpp
file. This is to aid it to be used in C programs, as well as C++ programs).
Corresponding files that contain matching #undef statements called undefine*.h are also provided to reduce namespace pollution and the risk of name clashes or confusion.
2 Files (of type .hpp) containing constants as const float, const double and const long double values can be #included
or, dare one suggest, individual literal value(s) strings can be copied and pasted into programs, for example:
float const pi = 3.14159265358979323846264338327950288F;
(Note that the suffix is F to avoid the need to static_cast long double to float).
long double const pi = 3.14159265358979323846264338327950288L;
(Note that the suffix is L because the literal should be a long double).
The header file constants.h can be included in a separate translation unit:
#include “constants.h”
extern long double const pi = static_cast<double>(BOOST_PI); // C++ static cast
extern long double const e = static_cast<double>(BOOST_E); // C++ static cast
will make only pi and e visible to other translation units through the linker. The names can be chosen to avoid name clashes.
4 The simpler scheme defining constants within a translation unit
long double const pi = static_cast<double>(BOOST_PI); // C++ static cast
may cause some name clashes, so a matching file undef_constants.h is provided to undefine ALL the macros defined by define_constants.h. This undefined file should be included immediately after the last reference to any BOOST macro constant.
5 When multiple precision constants are needed in the same program, (and namespace pollution is to be avoided) class constant template functions in files like function_constants.hpp can be used:
// Select an entire set of constants with a single using directive:
using namespace boost::math::double_constants;
// Or float constants...
// using namespace boost::math::float_constants;
// So naive users (mea culpa) can just say "pi",
// and get boost::math::double_constants::pi.
cout << "pi = " << pi << endl;
double r = 2.;
cout << "area = " << pi * r * r << "\n";
// Naughty users CAN'T write pi = 3.; !!! // - they get a cryptic error message:
// C2678: binary '=' : no operator defined which takes a left-hand operand of type
// 'const struct boost::math::constant<struct boost::math::pi_tag,double>'
// Or they can explicitly access fully specified constants at any time:
cout.precision(ceil(1 + std::numeric_limits<double>::digits*log10(2.)));
// Ensure all significant digits for double are displayed, or
cout << "double pi = " << boost::math::double_constants::pi << endl;
cout.precision(std::numeric_limits<double>::digits10);
// Ensure all guaranteed digits for double are displayed.
cout << "double pi = " << boost::math::double_constants::pi << endl;
cout.precision(ceil(1 +std::numeric_limits<long double>::digits) *log10(2.)) ;
cout << "long double pi = " << boost::math::long_double_constants::pi << "\n";
using namespace boost::math;
cout.precision(ceil(1 + long_double_significand_digits * log10(2.));
cout << "float pi = " << float_constants::pi << endl;
cout.precision(ceil(1+std::numeric_limits<double>::digits10)*log10(2.));
cout << "double pi = " << double_constants::pi << endl;
// But CANNOT switch to float constants.
// using namespace boost::math::float_constants;
// cout << "pi = " << pi << endl; // error C2872: 'pi' : ambiguous symbol
// This is probably a useful safeguard.
// For types that are not specialised, one could write:
const short pi_I = (short)boost::math::float_constants::pi;
cout << "short pi = " << pi_I << endl; // 3!
// Access via the templated function call may allow better optimisation.
float pi_float = boost::math::constant<pi_tag, float >();
// If a template is not specialised, then it fails at link time.
Michael Kenniston devised this scheme (after much discussion on Boost.org) for the following reasons:
Only need to write one interface (not counting macros-for-C, which is really a separate library from the C++ stuff)
1. No macros: all names are in namespaces.
2. Average users can switch "pi" between float and double by changing one "using" directive.
3. No "double rounding" (i.e. no multiple conversions).
4. Platform implementers can use whatever evil incantations they desire for maximum accuracy.
5. Everything is implemented with inline functions to ease constant folding.
6. It is (relatively) easy to extend with new constants and representation types.
7. Purists can access all constants via templated functions.
8. No "static-initialization-order" issues, because the static variables contain no data to initialise.
9. No partial template specialization, so it works on MSVC.
10. No non-type template parameters, so it should work on Borland.
(The same basic techniques will probably work for converting read access to any statically allocated class object to an inline function call, including in particular physical constants with units/dimensions).
The disadvantage of this method is that some compilers in strict mode seem to insist on explicit definition of the empty class constant and this leads to much clutter and potential code-bloat because a constructor is instantiated for each and every of the many constants. Some compilers are able to optimise these empty and unused constructors away, but may not do so in debug mode. Other compilers (and some in non-strict mode) do not require the definition of the empty constant class. The C++ Standard ISO 14882:1998 does not make clear whether this is a language ambiguity or an excessive compiler requirement. However, the net result is that this method may not be ideally efficient or portable.
6 C++ allows other methods of avoiding name collisions, for example using named and unnamed or anonymous namespaces.
Discussion about naming convention made very clear that it is not possible to please everyone!
In general, a Standard Library C++ style is
used, favouring longer and easier to read names over ease of typing, and all
lower case and underscore _ as separator. Of course, names cannot start with a
digit, so words, like two and half, are used instead. C99 names are used where
appropriate, but names are not in all capitals as used in many C programs using
the rationale that these names are variable names and not macro or #define
names. Some examples, and the rationale for them, should make this clearer than
an attempt to write a formal specification. Of course there are always problems
with more complex constants where brackets would be used.
pi // Prefer lower case only - use
uppercase only for class_names.
ln_pi // clearer than lnpi
sqrt_2 // shorter than sqrt_two and clearer than sqrt2.
// sqrt is commonly used and shorter than square_root_2.
cbrt_2 // ugly, but is C99 name.
ten // can’t be 10 so use word instead.
third // shorter than one_div_three.
one_div_sqrt_2 // one_div rather than reciprocal – short but clear.
// _div rather than _on or _upon.
half_pi // rather than pi_div_2.
pi_sqr // pi_squared rather long.
two_pow_three_halves // two_pow_three_div_two is ambiguous.
minus_ln_ln_2 // can't start with - so use word.
Explicit typed names
pi_f // float
pi_d // double
pi_l // long double
pi_i // integer
Interval limit static const constant variable names
pi // nearest to pi (usually either upper and/or lower interval limit)
pi_l // pi interval lower limit
pi_u // pi interval upper limit
pi_f_l // pi float lower limit
pi_l_l // pi long double lower limit
pi_l_u // pi long double upper limit
The advantages of pre-computed constants are in accuracy and precision, reliability, and most important in portability.
It may be worth using single constants rather than using the preprocessor and/or compiler to calculate a derived value. For example, it may be a little better to use third_pi rather than pi /3. This is because the single constants are produced with a higher precision than the floating-point hardware and/or software that the compiler is also certain to use for its computation. (The differences are especially significant for trig and pow functions!)
The constants are entirely fixed at compile time, so there is no code to call functions, for example, log, nor the need to include header files which define those functions, for example <cmath>. This may speed compilation and reduce file accesses and dependency, and facilitate optimisation. When debugging code (perhaps with less optimisation) the constant is never calculated at run-time. Behaviour when debugging is independent of compiler type, version and compile options.
There is no evaluation of the function at program start time, nor wasted evaluation if the constant is never used, so programs will be a bit quicker.
Portability is improved because as there is no compile or run-time evaluation of constants, there can be no difference caused by different versions of functions, for example in different version of C math libraries, nor in different version of hardware floating point evaluation. Although in reality, this is rather unlikely to be a problem, the effort and time wasted proving that it is not a problem is likely to be significant.
The most potentially serious loss of accuracy occurs with transcendental functions: most seriously, the exponentiation function x raised to power y, pow(x, y) for x**y or x^y where y may not be an integer.
Although compilers should be able to calculate constant expressions, for example 1. / 3., they may do so with varying accuracy. To achieve the accuracy of the pre-computed constants, the compiler would need to use similar high-precision software rather than the natural hardware built-in types like double. Few if any compilers yet do this, and doing this would significantly slow compilation. Compilers may also only do these calculations when optimisation is selected, giving confusingly slightly different results for debug and release versions. If new compilers do calculate more accurate values, then results from recompiled code may differ slightly, but mysteriously.
If the Standard C float.h is available, the macro FLT_MANT_DIG may provide the number of digits (usually radix 2) in the mantissa (significand) that limits precision (but not the maximum or minimum values that are constrained mainly by the maximum exponent). (Macro digits and mueric_limits<FPType>::digits include any hidden or implicit significand bits as these contribute to precision, so it is not sufficient to count the bits in the floating point layout.)
/* The number of base FLT_RADIX digits in the mantissa. */
#define _DBL_RADIX 2 /* exponent radix */
#define DBL_MANT_DIG 53 /* # of bits in mantissa */
#define LDBL_MANT_DIG 64 /* # of bits in mantissa */
The same values are available using Standard C++ <limits>, for example as:
int double_radix = std::numeric_limits< double>::radix; // usually 2 for IEEE FP
int double_mantissa_digits = std::numeric_limits<double>::digits;
// Usually 53 for IEEE 64-bit double.
int double_digits10 = std::numeric_limits<long double>::digits10;
// 15 decimal digits guaranteed output correctly for 64-bit double.
(But the 128-bit representation used by the AIX OS warns that that this value, defined as 106-bit for AIX long double, may be not tell the whole story – the actual number of bits of precision can vary. There are even worse difficulties in calculating interval values for non-standard floating point formats and values given may not be fully reliable.)
To ensure no loss of accuracy when reading in, all floating point constants should be specified with at least a few decimal digits MORE than the value of ISO Standard 14882:1998 C++ Library limits for std::numeric_limits<type>:: which are only the number guaranteed to be correct on output . For example std::numeric_limits<double>::digits10 (equivalent to Standard C Library float.h DBL_DIG) is 15 for double using IEEE 754/IEC559 standard 64-bit real, 11-bit exponent, 52+1 = 53 bit significand (mantissa) using Motorola 680X0 or Intel 8087 floating point coprocessor as incorporated in all Pentium chips. So double constants must be specified with at least 17 decimal digits. (Even many more decimal digits may be required to ensure that interval values are exactly representable – see below).
William Kahan in his report on the status of IEEE754 in 1996 (reference 23) gives two significant digits values for each format:
The lower value is at least
floor ((n - 1) * log10(2) significant digits, std::numeric_limits<
>::digits10)
and the upper,
ceil(1 + n * log10(2)) significant digits,
where n is number of significand bits (std::numeric_limits< >::digits).
For float, floor (24 –1) * 0.301 = 6 &
ceil(1 + 24 * 0.301) = 9,
& for double 15 and 17, extended (long double?) 18 & 21, quadruple 33 & 36). For example, for float:
If a decimal string with at most the lower significant digits (6) is converted to float precision and the converted back to the same number of significant digits (6), then the string should match the original 6 digits. This is
int digits10 = int(floor(-1) std::numeric_limits<float>::digits * log10(2.))
that should be the same as
std::numeric_limits<float>::digits10
and macro value FLT_DIG.
If a float value is converted to a decimal string with at least the upper number of significant digits (9 for float), and then converted back to float, then the final number must match the original float number. For example, using Standard C++
#include <iostream>
#include <limits>
double const log10Two =
0.3010299956639811952137388947; // log10(2.)
pi = 3.1415926535897932384626433832795F;
int sig_digits10 =
int( ceil(1 + std::numeric_limits<float>::digits * 0.3010299956639811952137388947));
std::cout.precision(decimal_digits);
std::cout <<"float pi = " << pi << " using
" <<
sig_digits10 << " decimal digits" << endl;
(But note that MACRO calculations only use integer arithmetic and cannot use floating-point functions ceil and floor. An approximation using only integer arithmetic is
2 + MANT_DIG * 30103 /100000
that gives the expected result for the significand digits listed below).
Note the asymmetry: the upper value is the number of digits that are significant on output and are required for accurate input, the lower value is the number guaranteed correct on output. The practically useful precision is given by bits/log10(2.)
Floating point Type |
Often used for C/C++ type |
Total |
Significand |
guaranteed digits10 |
significant |
IEEE single |
float |
32 |
23 + 1 = 24 |
6 |
9 |
VAX F |
float |
32 |
23 + 1 = 24 |
6 |
9 |
IBM G/390 short |
float |
32 |
24 |
6 |
9 |
IEEE single extended |
? |
>=43 |
>=32 |
7 |
11 |
IEEE double |
double |
64 |
52 + 1 = 53 |
15 |
17 |
VAX G |
double |
64 |
52 + 1 = 53 |
15 |
17 |
IBM G/390 long |
double |
64 |
56 |
15 |
17 |
VAX D |
long double |
64 |
56 |
15 |
17 |
IEEE double extended |
long double |
80 |
>=64 |
19 |
21 |
Sparc doubleextended (x86) |
long double |
80 |
64 |
18 |
21 |
AIX quad |
long double |
128 |
106 |
31 |
33 |
NTL quad |
‘quad’ |
128 |
106 |
31 |
33 |
IBM G/390 extended |
long double |
128 |
112 |
33 |
35 |
VAX H |
long double |
128 |
112 + 1 = 113 |
34 |
36 |
IEEE quadruple |
long double |
128 |
112 + 1 = 113 |
34 |
36 |
Sparc double extended |
long double |
128 |
112 + 1 = 113 |
34 |
36 |
signed fractional |
? |
127 |
128 |
38 |
40 |
unsigned fractional |
? |
128 |
128 |
38 |
40 |
unsigned fractional |
? |
128 |
128 + 1 = 129 |
38 |
40 |
Semi-exhaustive tests using MSVC++.NET (compiler version 7.0) with C++ float and double types confirm that these numbers of significant digits can be output and re-input as expected.
For example, the following code does not fail the assert test for any normalised float value.
float a = 1.F;
float aa = 0;
std::stringstream s;
s.precision(9);
s << scientific << a;
s >> aa;
assert (a != aa);
All the constants here are believed reliable to 40 decimal digits and so should meet all these floating point formats, including C++ long double extended precision if the compiler uses IEEE 80-bit real, 15-bit exponent, 64-bit significand (or mantissa) format for which constants will need 21 decimal digits, and will even meet the requirements for 128-bit real arithmetic providing significand (mantissa) accuracy of 113-bit used by the Sun Sharc processor, and the AIX OS 106-bit, and floating-point types which can be emulated in software using, for example, C++ NTL quad_float class by Victor Shoup (derived from the doubledouble class of Keith Briggs).
The Intel Itanium IA-64 128-bit quad_precision is planned to be implemented in hardware (with 112-bit significand, 15 exponent, and sign), slightly more precise than 36 decimal digit constants required by other formats above. See www.intel.com , ADAG.pdf for IA-64 Application Developers Architecture Guide.
If all 128 bits are used as a fraction or significand (perhaps without sign and with an implicit most significant bit), 40 decimal digits would still be sufficient.
Floating-point hardware is unlikely to provide higher precision for some time to come. (For applications that require a higher precision, several software solutions are available, for example NTL, MPFUM, GMP, LIDIA - see references below).
Sadly, but perhaps inevitably, the ISO C++ standard 14882:1998 [2.13.3.1] states:
" [...] If the scaled value is in the range of representable values for its type, the result is the scaled value if representable else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner. [...] "
The phrase “implementation-defined manner” implies that the compiler is not required to round to the nearest internal representation, although all compilers used so far appear to round as expected.
The effect of this is that if the number is not exactly representable (for example, if radix = 2 then 10. is not, but 2. is) then different implementations may convert the same decimal digits to binary values that differ by one least significant bit (or digit if radix is not two, for example ten or sixteen). The constants interval upper and lower limits are these two values, differing by at most one least significant bit (but the identical if the number, for example 2, is exactly representable). To ensure that they are converted from decimal digits to the internal format, these interval limits must be exactly representable. Many decimal digits may be required to achieve this – nearly the number of bits in the significand.
So with sufficient decimal digits, compilers should convert constants from decimal digit characters to their internal floating-point format within 1 least significant bit (or digit).
Not all compilers achieve this and so some programs, for example the Cephes mathematical functions library (reference 17), have provided constants (especially polynomial coefficients) in hexadecimal format (also proposed for future version of C and C++). This ensures that the values are stored exactly, but also requires different hexadecimal values for each processor and floating-point format.
The method of providing decimal character digits is more portable but it does rely on compilers making the conversion from decimal character digits to internal floating-point binary format conforming to the C++ Standard. In both cases, the results of storage and computation depend on the floating-point representation and so none are fully portable, as explained below.
(The unusually large number of decimal digits might cause overflow of non-compliant compilers that should accept, but ignore, unlimited numbers of digits. The only workaround for constant values is to round the values by hand or to use hexadecimal constants for the intervals. This package does not attempt to cater for this erroneous behaviour.)
Intel X86 floating-point hardware, for example, includes instructions for providing floating-point constants pi and log base 2 (10). These are accurate to 64 significand bits, including guard and sticky bits, which is higher than for any constant read into a register or memory by the compiler, even from a more precise constant decimal digit string.
Internally, the Intel X86 floating-point hardware holds all numbers in extended precision real format (64 significand bits compared to external double of 53 significand bits). Extra bits are mainly to support rounding mode correctly. Calculations performed using floating point register to register operations will be performed with this 64-bit accuracy. But operations that use external memory will have lower accuracy. The compiler makes the choice between register and memory, so expecting completely portable accuracy is probably too optimistic. Using accurate literal constants probably gives the best chance of portability (but not necessarily the best possible accuracy).
Alas the complexity of floating point calculations is such that calculations cannot be absolutely guaranteed to give the same result, even if both use hardware that conforms to the IEEE 754/IEC559 floating-point specification. This is most likely to be a significant problem when iterative calculations at the limit of accuracy are involved, perhaps leading to failure to converge, or converging to a different solution. These differences are most troublesome when trying to prove the equivalence of difference systems for computer validation. Because one can predict that differences may arise, one should not pessimistically assume that different output means that either systems is faulty, nor assume that the systems are not equivalent.
One method of ensuring that the IEEE extended double precision floating-point (typically the Intel X86/Pentium) is not used is to alter the appropriate bit in the processor “control word”. This can only be done in assembler. Extended precision significand is 64-bit, double is only 53-bit, and memory values are always 53-bit (plus sign and exponent to total 64 bits). (See www.intel.com 27064004.pdf for 80C187 80-bit Math Coprocessor datasheet).
The default value of the code word bit is NOT extended for Microsoft Windows, but IS extended for gcc/Linux.
The third way to fix the problem is to 'force' all intermediate floating point results into memory. This is not an 'ideal' fix, since it is not fully equivalent to 53-bit precision (because of double rounding), but it is reported to work (although a full proof of correctness in this case is not available). See X86ControlWord.cpp for an example. C99 promises to provide a portable way of setting the precision.
NTL's quad_float code ensures use of 53-bit significands on all platforms by storing intermediate results in local variables declared to be 'volatile'. This is the solution to the problem that NTL uses if it detects the problem and can't fix it using the GNU 'asm' hack mentioned above. This solution should work on any platform that faithfully implements 'volatile' according to the ANSI C standard. See www.shoup.net for more details.
Note that MSVC++ compilers up to version 7 implement long double exactly as double, IEEE 64-bit floating-point format. (The compiler does NOT use IEEE 80-bit extended precision available within the Intel Pentium or X86 floating point co-processor processor family instruction set).
C/C++ Pre-processor should calculate any long integers using 36 bits (P J Plauger, The Standard C Library, p 78) but none of the constants in constants.h are currently of integer type. (Some are floating point versions of integer values, for example: unity, zero, two, ten…).
For most values, Victor Shoup's NTL has been used to compute (and/or cross-check) the values in this file with a significand of 300 bits (about 100 decimal digits) and values are output as 40 decimal digit strings (or more digits if exactly representable interval limit values). There is no typing or editing to introduce errors.
The numeric string values in the file
constants.h are checked either one or both of two ways:
1 Checks that the values are consistent with values calculated by the compiler
preprocessor.
2 Comparison with values in references tables: for example in D. E. Knuth,
Appendix A (scanned in from the printed book and converted to a text file), or M. Abramowitz & Irene A Stegun.
Some values are be tested with straightforward checks against compiler preprocessor calculated values, for example comparing:
if (twoPi != 2. * pi) // may have a problem?
or C macro assert (which causes program termination if it fails - a more noticeable conclusion, but one that does not allow one to continue testing other constants).
assert(twoPi == 2. * pi)
but many naively formulated tests appear to fail to the extent that the two values differ by the machine least significant significand bit, so instead it is better to test the absolute relative difference between values thus:
if (fabs(twoPi - 2.L * pi)/ twoPi >= epsilon) // Do have a problem with accuracy!
See D E Knuth and Alberto Squassabia for more guidance on comparison of reals.
(Epsilon is the smallest floating-point representation of the least-significant significand bit. See below.)
The equivalent using the C macro assert is:
assert(fabs(sqrtTwo * sqrtTwo - 2.)/2. <= eps);
It is best to choose the constant to divide by to make the test relative to the constant value, rather than recomputing the check value.
Typical epsilon for type double for Intel X86 IEEE754/IEC559 hardware is 2.22e-016.
Minimum values specified by ISO 14882:1998
section 18.2.1 are defined as equivalent to the C Standard constants:
#define FLT_EPSILON <float rvalue <= 10^(-5)>
#define DBL_EPSILON <double rvalue <= 10^(-9)>
#define LDBL_EPSILON <long double rvalue <= 10^(-9)>
The macros yield the smallest X of type float,
double or long double such that 1.0 + X != 1.0.
C++ versions of these are provided by the
ISO 14882:1998 Standard numeric limits template, for example:
std::numeric_limits<float>::epsilon();
std::numeric_limits<double>::epsilon()
std::numeric_limits<long double>::epsilon();
The constants interval upper and lower limits should meet the following shown
as an example for pi:
static const double pi_d = 3.1415926535897932384626433832; // closest.
static const double pi_d_l = 3.141592502593994140625; // lower &
static const double pi_d_u = 3.1415927410125732421875; // upper limit.
(pi_d_l == pi_d) || (pi_d_u == pi_d) // must equal upper or
lower, or both
and
float ul_pi = pi_d_l + pi_d_u; // upper + lower
(ul_pi < 2.f * pi_d_u) && (ul_pi > 2.f * pi_d_l) // only one bit different
This is shown in the header file comments, and also includes a timestamp and the version of NTL used.
Paul A Bristow, pbristow@hetp.u-net.com, Oct 2002
Floating-point – real type stored as sign, exponent and significand. Unlike integer types, most floating-point values cannot be stored exactly.
Exponent –power of 2 by which to multiply the significand (usually binary but strictly radix or base returned by std::numeric_limits< floating_point_type>::radix). Always signed, using one bit, and often offset or biased for storage convenience.
Mantissa – see significand.
radix – base for exponent, usually 2, (decimal and hexadecimal systems exist, but are not yet supported by this package).
Significand – (previously called mantissa) binary fraction where most significant bit represents 0.5, next 0.25, then 0.125 … The most significant bit (0.5) is usually not stored and is implicitly assumed to be set, so the significand always represents a fraction greater than 0.5, (but just less than 1). std::numeric_limits<floating_point_type>::digits returns this, including any implicit bit.
least significant bit – of significand – also called “unit in last place” ulp. (decimal or hexadecimal if radix or base is 10 or 16).
functions nextafter (float double and long double versions) changes the least significant bit but adding or subtracting 1 to the binary significand.
The following
contributed to discussions on the BOOST.org forum:
K Hagan
Petr Kocmid
P J Plauger
Herve Bronnimann
John Maddock
Peter Schmittteckert
Jes Maurer
Peter Dimov
Beman Dawes
David Abrahams
Ed Bray
Matt Austern
Gabriel Dos Reis
Eric Ford
Michael Kenniston
John Hickin
Sylvain Pion
Guillaume Melquiond
Author: Paul A. Bristow, hetp Chromatography Quantitation,
Copyright 2002 Paul A Bristow and hetp,21 Oct 02 17:52 1 Apr 2002