Boost-Commit :

Subject: [Boost-commit] svn:boost r83561 - sandbox/precision/libs/precision/doc
From: e_float_at_[hidden]
Date: 2013-03-25 14:31:55

Author: christopher_kormanyos
Date: 2013-03-25 14:31:54 EDT (Mon, 25 Mar 2013)
New Revision: 83561
URL: http://svn.boost.org/trac/boost/changeset/83561

Log:
Big rework based on first rounds of comments from John and Paul.
Text files modified:
sandbox/precision/libs/precision/doc/precision.qbk | 627 ++++++++++++++++-----------------------
1 files changed, 261 insertions(+), 366 deletions(-)

Modified: sandbox/precision/libs/precision/doc/precision.qbk
==============================================================================
--- sandbox/precision/libs/precision/doc/precision.qbk (original)
+++ sandbox/precision/libs/precision/doc/precision.qbk 2013-03-25 14:31:54 EDT (Mon, 25 Mar 2013)
@@ -57,46 +57,50 @@

[section:abstract Abstract]

-It is proposed to add to the C++ standard several optional
-typedefs for floating-point types having specified width.
-In particular, the optional types include
-`float16_t`, `float32_t`, `float64_t`, `float128_t`, and their
-corresponding fast and least types. The optional floating-point
-types are to conform with the corresponding types
+It is proposed to add to the C++ standard
+optional floating-point `typedef`s having specified width.
+The optional `typedef`s include
+`float16_t`, `float32_t`, `float64_t`, `float128_t`,
+their corresponding fast and least types,
+and the corresponding maximum-width type.
+These are to conform with the corresponding specifications of
`binary16`, `binary32`, `binary64`, and `binary128`
-described in __IEEE_floating_point.
+in __IEEE_floating_point.

-The optional floating-point types having specified width
+The optional floating-point `typedef`s having specified width
are to be contained in a new standard library header `<cstdfloat>`.
-Any of the optional floating-point types having specified width
-included in the implementation should have full support for the
-functions in `<cmath>` and seamlessly interoperate with `<complex>`.
-Any of the optional floating-point types having specified width
-included in the implementation must template specializations
-of `std::numeric_limits` in `<limits>`.
-The proposed new floating-point types having specified width
-will be defined in the global and `std` namespaces.
-
-It is also proposed to provide additional suffix(es) to specify
-constants to suit precision lower than that of `float` and
-precision higher than that of `long double`.
-
-Floating-point types having specified width are expected to significantly
-improve clarity of code and portability of floating-point calculations.
-Analogous improvements for integer calculations were recently achieved
-via standardization of integer types having specified width
-such as `int8_t`, `int16_t`, `int32_t`, and `int64_t`.
+They will be defined in the `std` namespace.
+
+It is not proposed to make any changes to `<cmath>`, special functions,
+`<limits>`, or `<complex>`.
+Any of the optional floating-point `typedef`s having specified width
+that are `typedef`ed from the built-in types `float`, `double`, and `long double`
+should automatically be supported by
+the implementation's existing `<cmath>`, special functions,
+`<limits>`, and `<complex>`.
+Support for other `typedef`s is implementation-defined.
+
+New C-style macros are proposed to facilitate initialization
+of the optional floating-point `typedef`s having specified width
+from a floating-point literal constant.

The main objectives of this proposal are to:

-* Extend the range of floating-point precision.
-* Reduce errors in precision.
-* Improve clarity of coding.
* Improve portability, reliability and safety.
+* Reduce the risk of error in precision.
+* Improve clarity of coding.
+* Optionally extend the range of floating-point precision.

[endsect] [/section:abstract Abstract]

-[section:background Background]
+[section:introduction Introduction]
+
+Since the inceptions of C and C++, the built-in types
+`float`, `double`, and `long double` have provided a strong basis
+for floating-point calculations.
+Optional compiler conformance with __IEEE_floating_point has generally led
+to a relatively reliable and portable environment for floating-point
+calculations in the programming community.

Support for mathematical facilities and specialized number types
in C++ is progressing rapidly. Currently, C++11 supports floating-point
@@ -115,119 +119,65 @@
according to the ISO/IEC 80000-2:2009 standard
Document number: N3494 Version: 1.0 Date: 2012-12-19]

-The __Boost_Math library was accepted into __Boost several years ago.
-It implements many of the functions in both documents mentioned
-above and has become quite widely used.
-
-There is also progress in C++ in the area of multiprecision,
-including support of user-defined multiprecision floating-point numbers.
-In particular, the acceptance and release of __Boost_Multiprecision
-provides much higher precision than built-in `long double` with
-its __cpp_dec_float data type. __Boost_Multiprecision has a flexible
-front-end that employs a variety of backends to implement multiprecision
-floating-point types including the well-established __GMP and __MPFR libraries
-as well as a full open-license backend that originates
-from the __e_float library by Christopher Kormanyos and John Maddock.
-
-Since __Boost_Multiprecision and __Boost_Math work seamlessly,
-a `float_type typedef` can be used to switch from a built-in type
-to a multiprecision type with tens or even hundreds of decimal digits.
-This allows all the special functions and distributions in
-__Boost_Math to be used at any chosen precision.
-
-Other users and domains are finding the need and utility of
-[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3407.html decimal] and
-[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3352.html binary fixed-point].
-
-Of course, moving away from hardware supported types to software using
-C++ templates carries a small price at compile-time, and potentially a
-much bigger price at runtime.
-Nonetheless, the new numerical types have wide ranges of application
-required in numerous programming domains.
-
-All these development have made C++ much more attractive to the
-scientific and engineering community, especially those needing
-mathematical functions and higher (or lower) precision for some
-of their calculations. Previously these domains were predominantly
-covered by computer algebra systems.
-
-[endsect] [/section:background Background]
-
-[section:introduction Introduction]
-
-Since the inceptions of C and C++, the built-in types `float`, `double`, and `long double`
-have provided a strong basis for floating-point calculations.
-Optional compiler conformance with __IEEE_floating_point has generally led
-to a relatively reliable and portable environment for floating-point
-calculations in the programming community.
-
It is, however, emphasized that floating-point adherence
to __IEEE_floating_point is not mandated by the current C++ language standard.
-Nor does the standard specify the widths, precisions or lyaout of its built-in types
+Nor does the standard specify the width, precision or layout of its built-in types
`float`, `double`, and `long double`. This can lead to portability problems,
introduce poor efficiency on cost-sensitive microcontroller architectures,
and reduce reliability and safety.

-This situation reveals a need for a standard way to specify precision.
-It is also desirable to extend the precision of existing types to
-both lower and higher precisions. The extension to lower precision is expected
-to simplify and improve efficiency of floating-point implementations
-on cost-sensitive architectures such as small microcontrollers.
-The extension to higher precision is useful for large-scale high-performance
-numerical calculations and should ease the transition to multiprecision
-by providing built-in types with progressing precision of finer granularity.
-
-All of these improvements should improve portability, reliability, and safety
-of floating-point calculations in C++ by ensuring that the actual precision
-of a floating-point type can be exactly determined both at compile-time
-as well as during the run of a calculation.
-Strong interest in floating-point `typedefs` having specified width
-has, for example, recently been expressed on
-the [@http://lists.boost.org/Archives/boost/2013/03/201786.php Boost list discussion of precise floating-point types].
-
-Recent specification of integer typedefs having specified width
-in C99, C11, C++11, and [@http://open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3376.pdf C++ draft specification]
-has drastically improved integer algorithm portability and range.
-
-One example of how integer typedefs having specified width
-have proven to be essential is described by Robert Ramey
-[@http://lists.boost.org/Archives/boost/2002/11/40432.php
-Usefulness of fixed integer sizes in portability (for Boost serialization library).]
-
-The motivations to provide floating-point `typedef`s having specified width
-are analogous to those that led to the introduction of integers
-having specified width such as `int8_t`, `int16_t`, `int32_t`, and `int64_t`.
-The specification of floating-point `typedef`s having specified width
-and adherence to __IEEE_floating_point can potentially
-improve the C++ language significantly, especially in the
-scientific and engineering communities
-where other languages have found benefit from types that conform
-exactly to the __IEEE_floating_point.
-
-(Notes on jargon: Section 22.3 in the book "The C++ Standard Library Extensions",
-P. Becker, Addison Wesley 2007, ISBN 0-321-41299-0 is called "Fixed-Size Integer Types".
-Use of the descriptor ['fixed] has lead to some confusion.
-So the descriptor ['specific] in conjunction with width is here used to
-match the wording of C99 and C11 in the sections and subsections describing
-`stdint.h`.)
+This situation reveals a need for a standard way to specify
+floating-point precision in C++.
+
+It may also be desirable to extend floating-point precision to
+both lower and higher precisions. This can be done by including
+implementation-specific optional floating-point `typedef`s having specified width
+that are not derived from `float`, `double`, and `long double`.
+
+Providing optional floating-point `typedef`s having specified width
+is expected to significantly improve portability, reliability, and safety
+of floating-point calculations in C++.
+[footnote
+[Analogous improvements for integer calculations were recently achieved
+via standardization of integer types having specified width
+such as `int8_t`, `int16_t`, `int32_t`, and `int64_t`.]
+]

[endsect] [/section:introduction Introduction]

[section:thetypedefs The proposed typedefs and potential extensions]

-The core of this proposal is based on the `typedef`s `float16_t`, `float32_t`,
-`float64_t`, `float128_t`, and their corresponding least and fast types.
-These floating-point `typedef`s have specified widths and they are
-to conform with the corresponding types
+The core of this proposal is based on the
+optional floating-point `typedef`s `float16_t`, `float32_t`,
+`float64_t`, `float128_t`,
+their corresponding least and fast types,
+and the corresponding maximum-width type.
+
+For example,
+
+ // Sample partial synopsis of <cstdfloat>
+
+ namespace std
+ {
+ typedef float float32_t;
+ typedef double float64_t;
+ typedef long double float128_t;
+ typedef float128_t floatmax_t;
+
+ // ... and the corresponding least and fast types.
+ }
+
+These proposed optional floating-point `typedef`s are to conform with
+the corresponding specifications of
`binary16`, `binary32`, `binary64`, and `binary128`
-specified in __IEEE_floating_point.
+in __IEEE_floating_point.
In particular, `float16_t`, `float32_t`, `float64_t`, and `float128_t`
correspond to floating-point types with 11, 24, 53, and 113 binary significand digits,
respectively. These are defined in __IEEE_floating_point, and there are more detailed descriptions
of each type at __IEEE_Half, __IEEE_Single, __IEEE_Double, __IEEE_Quad, and __IEEE_Extended.

-One could envision two ways to name the proposed floating-point
-types having specified width:
+One could envision two ways to name the proposed
+optional floating-point `typedef`s having specified width:

* `float11_t, float24_t, float53_t, float113_t, ...`
* `float16_t, float32_t, float64_t, float128_t, ...`
@@ -238,38 +188,46 @@
is contained within the name of the data type.

On the other hand, the second set with the size of the ['whole type] contained within
-the name may be more intuitive to users. Here, we prefer this naming scheme.
+the name may be more intuitive to users. Here, we prefer the latter naming scheme.

No matter what naming scheme is used, the exact layout and number of significand
and exponent bits can be confirmed as IEEE754 by checking
`std::numeric_limits<type>::is_iec559 == true`, and the byte-order.
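The check described above can be sketched at compile time as follows. This is only an illustration: the helper name `is_ieee_binary` is hypothetical and is not part of the proposal, and, as the text notes, byte-order is not covered by `std::numeric_limits` and would have to be verified separately.

```cpp
#include <limits>

// Hypothetical helper (not part of the proposal): true if T reports
// IEEE 754 (IEC 559) conformance and has a binary significand of the
// given number of digits, for example 24 for binary32 or 53 for binary64.
template<typename T>
constexpr bool is_ieee_binary(int significand_digits)
{
  return std::numeric_limits<T>::is_iec559
      && std::numeric_limits<T>::radix == 2
      && std::numeric_limits<T>::digits == significand_digits;
}
```

An implementation whose `float` corresponds to `binary32` would then satisfy `is_ieee_binary<float>(24)`, which could guard a `typedef float float32_t;` with a `static_assert`.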

-We will now consider several examples showing how various implementations
-might include floating-point `typedef`s having specified width.
+We will now consider several examples showing how
+various implementations might introduce some of the
+optional floating-point `typedef`s having specified width
+into the `std` namespace.
+
+An implementation has `float` and `double` corresponding to
+IEEE754 `binary32`, `binary64`, respectively. This implementation
+could introduce `float32_t`, `float64_t`, and `floatmax_t`
+into the `std` namespace as shown below.

-An implementation that has `float` and `double` corresponding to
-IEEE 754 `binary32`, `binary64`, respectively, could introduce
-`float32_t` and `float64_t` into the `std` namespace as shown below.
+ // In <cstdfloat>

   namespace std
   {
- typedef float float32_t;
- typedef double float64_t;
+ typedef float float32_t;
+ typedef double float64_t;
+ typedef float64_t floatmax_t;
   }

There may be a need for octuple-precision float, in other words
-`float256_t` with about 240 binary significand digits of precision.
+an extension to `float256_t` with about 240 binary significand digits of precision.
In addition, a `float512_t` type with even more precision
may be considered as an option. Beyond these, there may be
potential extension to multiprecision types, even __arbitrary_precision, in the future.

Consider an implementation for a supercomputer. This platform might have
-`float`, `double`, and `long double` corresponding to IEEE 754
+`float`, `double`, and `long double` corresponding to IEEE754
`binary32`, `binary64`, and `binary128`, respectively. In addition, this
platform might have a user-defined type with octuple-precision.
The implementation for this supercomputer could introduce
-floating-point types having specified width into the `std` namespace
-as shown below.
+its optional floating-point `typedef`s having specified width
+into the `std` namespace as shown below.
+
+ // In <cstdfloat>

   namespace std
   {
@@ -277,140 +235,101 @@
     typedef double float64_t;
     typedef long double float128_t;
     typedef my_octuple_precision_type float256_t;
+ typedef float256_t floatmax_t;
   }

A cost-sensitive 8-bit microcontroller platform without an FPU
-does not have sufficient resources to support eight-byte
-`binary64` in a feasible fashion.
-An implementation on this platform can, however, support
+does not have sufficient resources to support the eight-byte, 64-bit
+`binary64` type in a feasible fashion.
+An implementation for this platform can, however, support
half-precision `float16_t` and single-precision `float32_t`.
-The implementation for this 8-bit microcontroller could introduce
-floating-point types having specified width into the `std` namespace
-as shown below.
+This implementation could introduce
+its optional floating-point `typedef`s having specified width
+into the `std` namespace as shown below.
+
+ // In <cstdfloat>

   namespace std
   {
     typedef my_half_precision_type float16_t;
     typedef float float32_t;
+ typedef float32_t floatmax_t;
   }

The popular [@http://gcc.gnu.org/wiki/x87note Intel X8087 chipset]
architecture supports a 10-byte floating-point format.
-So it may be useful to provide optional support for `float80_t`.
+It may be useful to extend the optional support to `float80_t`.
Several implementations using __IEEE_80_bit already exist in practice.

-An implementation that supports single-precision `float`,
-double-precision `double`, and 10-byte `long double`
-could introduce `float32_t`, `float64_t`, and `float80_t`
+Consider an implementation that supports single-precision `float`,
+double-precision `double`, and 10-byte `long double`.
+This implementation could introduce its optional `typedef`s
+`float32_t`, `float64_t`, `float80_t`, and `floatmax_t`
into the `std` namespace as shown below.

+ // In <cstdfloat>
+
   namespace std
   {
     typedef float float32_t;
     typedef double float64_t;
     typedef long double float80_t;
+ typedef float80_t floatmax_t;
   }

-We will now examine how to use floating-point literal constants in combination
-with floating-point types having specified width.
-
-At present, the only way to provide a floating-point literal constant
-value with precision exceeding the precision of `long double`
-is to use a character string in association with type conversion for
-a user-defined extended-precision type.
-For example, construction from a string as well as the `from_string`
-method are used for this purpose in
-__Boost_Math, __Boost_Multiprecision and __libquadmath.
-
-The sample below, for instance, uses the `cpp_dec_float_50` type
-from __Boost_Multiprecision to initialize the Euler-gamma
-constant with 50 decimal digits of precision.
-
- #include <boost/multiprecision/cpp_dec_float>
-
- typedef boost::multiprecision::cpp_dec_float_50 mp_type;
-
- const mp_type euler("0.577215664901532860606512090082402431042159335939924");
-
-Construction from string is inappropriate for the proposed
-floating-point types having specified width.
-These should be copy assignable and copy constructable from
-floating-point literal constants.
-This requires slight changes to the core language including the addition
-of new floating-point literal constant suffixes. For instance, the sample below
-uses a potential `Q` suffix is used to initialize the Euler-gamma constant
-stored in a `float128_t`.
-
- #include <cstdfloat>
-
- constexpr std::float128_t euler = 0.57721566490153286060651209008240243104216Q;
-
-Suffixes will be described in greater detail below.
+[endsect] [/section:thetypedefs The proposed types and potential extensions]

-It would also be useful to have a method of querying the size of types,
-similar to that provided by
-[@http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html GCC 3.7.2 Common Predefined Macros],
-for example, `__SIZEOF_LONG_DOUBLE__`.
-But similar macros are not defined for `__float128` nor for `__float80`.
+[section:literals Handling floating-point literals]

-[endsect] [/section:thetypedefs The proposed types and potential extensions]
+We will now examine how to use floating-point literal constants in combination
+with the optional floating-point `typedef`s having specified width.
+This will be done in a manner analogous to the mechanism
+specified for integer types having specified width,
+in other words using C-style macros.

-[section:suffixes How to define floating-point literal suffixes?]
+The header `<cstdfloat>` should contain all necessary
+C-style function macros in the form shown below.

-The standard specifies that the type of a floating-point literal is double unless
-explicitly specified by a suffix. The standard continues by specifying that
-the suffixes `f` and `F` specify `float`,
-and the suffixes `l` and `L` specify `long double`.
-
-Recent discussion on extended precision floating-point types in C++ has also
-raised the issue of how to specify constant values with a precision greater
-than `long double`, now signified by the suffix `L` or `l`.
-One possible way is to add `Q` or `q` suffixes to signify that
-floating-point literal has quadruple precision.
+ FLOAT{16 32 64 128 256 MAX}_C

-Code using the `Q` suffix scheme is shown in the sample below.
+The code below, for example, initializes a constant `float128_t`
+value using one of these macros.

   #include <cstdfloat>

- constexpr std::float128_t pi = 3.1415926535897932384626433832795028841972Q;
+ constexpr std::float128_t euler = FLOAT128_C(0.57721566490153286060651209008240243104216);

-For half-precision floating-point literals, the suffix `H` or `h` could be used.
-One potential suffix for octuple-precision floating-point literals is `O` or `o`.
-
-For example,
+The following code initializes a constant `float16_t`
+value using another one of these macros.

   #include <cstdfloat>

- constexpr std::float16_t euler = 0.577216H;
+ constexpr std::float16_t euler = FLOAT16_C(0.577216);

-Higher precisions also require construction from floating-point literals.
-As the list of available suffixes dwindles, however, available suffixes might run out
-and the myriad of suffixes may become confusing. Floating-point literals
-for precisions higher than quadruple precision, then, might be better
-served with construction from string literals.
-
-An alternative suffix scheme could use hybrid suffixes composed of, say,
-the letter `F` or `f` which stands for floating-point, to which the specified width
-of the type is appended, for example `F16`, `F32`, `F64`, `F128`, etc.
-Code using the `F128` suffix scheme is shown in the sample below.
+In addition, the header `<cstdfloat>` should contain all
+necessary macros of the form:

- #include <cstdfloat>
+ FLOAT_[FAST LEAST]{16 32 64 128 256}_MIN
+ FLOAT_[FAST LEAST]{16 32 64 128 256}_MAX
+ FLOATMAX_MIN
+ FLOATMAX_MAX

- constexpr std::float128_t pi = 3.1415926535897932384626433832795028841972F128;
+These macros can be used to query the ranges of
+the optional floating-point `typedef`s having specified width
+at compile-time. For example,

- constexpr std::float16_t euler = 0.577216F16;
+ #include <limits>
+ #include <cstdfloat>

-This suffix scheme is unequivocal and it can be easily extended to
-unlimited precision. On the other hand, it may be difficult for programmers
-to separate the character part of the suffix from its numerical part when
-analyzing source code. For example, it is particularly difficult to resolve
-the `F` suffix in the initialization of `euler` above.
+ static_assert(FLOATMAX_MAX > (std::numeric_limits<float>::max)(),
+ "The floating-point range is too small");
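How an implementation might define such macros is not specified here. The following is a minimal sketch, assuming a platform where `float` is `binary32` and `double` is `binary64`. The macro bodies are assumptions modelled on `INT32_C` and `INT64_MAX` from `<cstdint>`, and the namespace `sketch` stands in for `std`, because user code may not add names to `std`.

```cpp
#include <cfloat>
#include <limits>

// Assumption: float is binary32 and double is binary64 on this platform.
namespace sketch  // stand-in for std in this illustration
{
  typedef float     float32_t;
  typedef double    float64_t;
  typedef float64_t floatmax_t;
}

// Hypothetical macro definitions for this platform only.
#define FLOAT32_C(x)  (x ## F)       // paste the float suffix onto the literal
#define FLOAT64_C(x)  (x)            // unsuffixed literals already have type double
#define FLOATMAX_C(x) FLOAT64_C(x)   // floatmax_t is float64_t here
#define FLOATMAX_MAX  DBL_MAX        // largest finite value
#define FLOATMAX_MIN  DBL_MIN        // smallest positive normalized value

constexpr sketch::float32_t euler32 = FLOAT32_C(0.577216);
constexpr sketch::float64_t euler64 = FLOAT64_C(0.57721566490153286);

static_assert(FLOATMAX_MAX > (std::numeric_limits<float>::max)(),
              "The floating-point range is too small");
```

The token-pasting form lets the same source literal be given the right type on each platform without the programmer writing a suffix by hand.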

-[endsect] [/section:suffixes How to define floating-point literal suffixes?]
+[endsect] [/section:literals Handling floating-point literals]

[section:thestandard Place in the standard]

-The proper place for floating-point `typedef`s having specified width
+The proper place for defining the optional
+floating-point `typedef`s having specified width
should be oriented along the lines of the current standard.
Consider the existing specification of integer `typedef`s having
specified width in C++11. A partial synopsis is shown below.
@@ -428,172 +347,153 @@

   // ... and the corresponding least and fast types.

-It is not immediately obvious where the `typedef`s for floating-point types
-having specified width should reside. One potential place is `<cstdint>`.
+It is not immediately obvious where the
+optional floating-point `typedef`s having specified width
+should reside. One potential place is `<cstdint>`.
The `int`, however, implies integer types. Here, we prefer the
proposed new header `<cstdfloat>`.

We propose to add a new header `<cstdfloat>` to the standard library.
-The header `<cstdfloat>` should contain all floating-point
-`typedef`s having specified width in the implementation.
+The header `<cstdfloat>` should contain all
+optional floating-point `typedef`s having specified width
+included in the implementation and the corresponding C-style
+macros shown above.
+
Section 18.4 could be extended as shown below.

18.4? Integer and Floating-Point Types Having Specified Width
18.4.1 Header <cstdint> synopsis [cstdint.syn]
18.4.2? Header <cstdfloat> synopsis [cstdfloat.syn]

- namespace std {
- typedef signed floating-point type float16_t; // optional.
- typedef signed floating-point type float32_t; // optional.
- typedef signed floating-point type float64_t; // optional.
- typedef signed floating-point type float80_t; // optional.
- typedef signed floating-point type float128_t; // optional.
- typedef signed floating-point type float256_t; // optional.
- typedef signed floating-point type floatmax_t; // optional.
-
- typedef signed floating-point type float_least16_t; // optional.
- typedef signed floating-point type float_least32_t; // optional.
- typedef signed floating-point type float_least64_t; // optional.
- typedef signed floating-point type float_least80_t; // optional.
- typedef signed floating-point type float_least128_t; // optional.
- typedef signed floating-point type float_least256_t; // optional.
-
- typedef signed floating-point type float_fast16_t; // optional.
- typedef signed floating-point type float_fast32_t; // optional.
- typedef signed floating-point type float_fast64_t; // optional.
- typedef signed floating-point type float_fast80_t; // optional.
- typedef signed floating-point type float_fast128_t; // optional.
- typedef signed floating-point type float_fast256_t; // optional.
- } // namespace std
+ namespace std
+ {
+ typedef floating-point type float16_t; // optional.
+ typedef floating-point type float32_t; // optional.
+ typedef floating-point type float64_t; // optional.
+ typedef floating-point type float80_t; // optional.
+ typedef floating-point type float128_t; // optional.
+ typedef floating-point type float256_t; // optional.
+ typedef floating-point type floatmax_t; // optional.
+
+ typedef floating-point type float_least16_t; // optional.
+ typedef floating-point type float_least32_t; // optional.
+ typedef floating-point type float_least64_t; // optional.
+ typedef floating-point type float_least80_t; // optional.
+ typedef floating-point type float_least128_t; // optional.
+ typedef floating-point type float_least256_t; // optional.
+
+ typedef floating-point type float_fast16_t; // optional.
+ typedef floating-point type float_fast32_t; // optional.
+ typedef floating-point type float_fast64_t; // optional.
+ typedef floating-point type float_fast80_t; // optional.
+ typedef floating-point type float_fast128_t; // optional.
+ typedef floating-point type float_fast256_t; // optional.
+ }

[endsect] [/section:thestandard Place in the standard]

-[section:limitsinterop Interaction with <limits>]
-
-It is not proposed to make any change to `std::numeric_limits`.
-It is, however, mandatory to provide `std::numeric_limits` specializations
-for all floating-point types having specified width included
-in the implementation.
-
-This will ensure that programs can use the established
-`std::numeric_limits<>::is_iec559` member to determine
-if a floating-point type conforms with __IEEE_floating_point.
+[section:cmathinterop Interoperation with <cmath> and special functions]

-[endsect] [/section:limitsinterop Interaction with <limits>]
+It is not proposed to make any changes to `<cmath>` or special functions.

-[section:cmathinterop Interoperation with <cmath>]
+Any of the optional floating-point `typedef`s having specified width
+that are `typedef`ed from the built-in types `float`, `double`, and `long double`
+should automatically be supported by
+the implementation's existing `<cmath>` and special functions.

-Experience with __Boost_Math and __Boost_Multiprecision has shown that the normal set
-of elementary and transcendental functions (and possibly additional higher transcendental functions)
-is also essential to make the type useful in real-life computational regimes.
-Therefore, the implementation must provide support for the mathematical
-functions in the `std` namespace for each of the floating-point types having specified width
-included in the implementation.
+Implementation-specific optional floating-point `typedef`s having specified width
+that are not derived from `float`, `double`, and `long double` can optionally
+be supported by `<cmath>` and special functions.
+This is considered an implementation detail.

-<cmath> contains
+[note Support of elementary functions and possibly some special functions,
+even where only optional, can be quite useful for real-life computational regimes.]

-Trigonometric functions:
+[endsect] [/section:cmathinterop Interoperation with <cmath> and special functions]

- cos
- sin
- tan
- acos
- asin
- atan
- atan2
+[section:limitsinterop Interoperation with <limits>]

-Hyperbolic functions:
+It is not proposed to make any changes to `<limits>`.

- cosh
- sinh
- tanh
+Any of the optional floating-point `typedef`s having specified width
+that are `typedef`ed from the built-in types `float`, `double`, and `long double`
+should automatically be supported by
+the implementation's existing `<limits>`.

-Exponential and logarithmic functions:
+Implementation-specific optional floating-point `typedef`s having specified width
+that are not derived from `float`, `double`, and `long double` can optionally
+be supported by `<limits>`.
+This is considered an implementation detail.

- exp
- frexp
+[note Support for `<limits>`, even where optional, can be quite
+useful. This allows programs to query the floating-point limits
+and use, among other things, `std::numeric_limits<>::is_iec559`
+to determine if a floating-point type conforms with __IEEE_floating_point.]

- ldexp
- log
- log10
- modf
-
-Power functions
-
- pow
- sqrt
-
-Rounding, absolute value and remainder functions:
-
- ceil
- fabs
- floor
- fmod
-
-[endsect] [/section:cmathinterop Interoperation with <cmath>]
+[endsect] [/section:limitsinterop Interoperation with <limits>]

[section:complexinterop Interoperation with <complex>]

-TBD by Chris: Describe interoperation with <complex>.
-
-[endsect] [/section:complexinterop Interoperation with <complex>]
-
-[section:microfpu Improved efficiency and robustness for microcontrollers]
-
-TBD by Chris: Describe cost-sensitive floating-point regime.
-TBD by Chris - add reference to your book!
+It is not proposed to make any changes to `<complex>`.

-TBD by Chris: Describe recent confidential meetings with tier-one silicon suppliers and the relevant problems discussed therein.
+Any of the optional floating-point `typedef`s having specified width
+that are `typedef`ed from the built-in types `float`, `double`, and `long double`
+should automatically be supported by
+the implementation's existing `<complex>`.
+
+Implementation-specific optional floating-point `typedef`s having specified width
+that are not derived from `float`, `double`, and `long double` can optionally
+be supported by `<complex>`.
+This is considered an implementation detail.

-TBD: Cite these as personal communications.
-
-TBD by Chris: Explain how standards adherence and specified width can help to solve these problems by improving reliability and safety.
-
-TBD ba Chris: Add an example and remarks on functional safety and any relevant citations from to ISO/IEC 26262.
-
-[endsect] [/section:microfpu Improved efficiency and robustness for microcontrollers]
+[endsect] [/section:complexinterop Interoperation with <complex>]

-[section:context The context within existing implementations]
+[section:context The context among existing implementations]

Many existing implementations already support `float`, `double`, and `long double`.
In addition, some of these either are or strive to be compliant with __IEEE_floating_point.
In these cases, it will be straightforward to support (at least) a subset
-of the proposed floating-point `typedef`s having specified width using type definitions.
-This was discussed above.
+of the proposed optional floating-point `typedef`s having specified width
+by adding any desired optional type definitions and the corresponding
+macro definitions.

Some implementations for cost-sensitive microcontroller platforms support
`float`, `double`, and `long double`, and some of these are compliant with __IEEE_floating_point.
-It is not uncommon on microcontroller platforms to treat `double` exactly as `float`,
-and to even treat `long double` exactly as `double`.
+Some of these implementations treat `double` exactly as `float`,
+and even treat `long double` exactly as `double`.
This is permitted by the standard which does not prescribe the precision
for any floating-point (or integer) types, leaving them to be implementation-defined.
-On these platforms, the existing floating-point types could be type-defined to `float32_t`.
-Optional support for `float16_t` could provide a very useful high-performance
-floating-point type with half-precision.
+On these platforms, the existing floating-point types could optionally
+be type-defined to `float32_t`.
+Optional support for an extension to `float16_t` could provide a very useful
+and efficient floating-point type with half-precision, but reduced range.
+
+Some implementations for cost-sensitive microcontroller platforms
+also support a 24-bit floating-point type. Here, an extension
+of the optional floating-point `typedef`s with specified width
+could include `float24_t`. This would be equivalent to
+three-quarter precision floating-point, which is not
+specified in __IEEE_floating_point.

-On powerful desktop computers and workstations, `long double` has been treated
-in a variety of ways, and this has given rise to numerous portability problems.
-For example, suppose we wish to achieve a precision higher than the most common
-IEEE 64-bit floating-point type supported by the X86 chipsets normally used for double
-([@http://en.wikipedia.org/wiki/Double_precision double precision] providing a precision of between 15 and 17 decimal digits).
-
-The options for [@http://en.wikipedia.org/wiki/Long_double long double] are many.
-At least one popular compiler treats `long double` exactly as `double`.
-
-However the [@http://gcc.gnu.org/wiki/x87note Intel X8087 chipset] can do
-calculations using internal 80-bit registers, increasing the significand from 53 to 63 bits, and gaining about 3 decimal digits precision from 18 and 21.
-If we wish to ensure that we use all 80 bits available from Intel 8087 chips to calculate
-[@http://en.wikipedia.org/wiki/Extended_precision Extended precision]
-we would use a `typedef float80_t`, as shown above.
-If the compiler could not generate code this type directly,
-then it would substitute software emulation, perhaps using a
-Boost.Multiprecision type such as `cpp_dec_float_21` (or in future, `cpp_bin_float_21`).
+The [@http://gcc.gnu.org/wiki/x87note Intel X8087 chipset] is capable of performing
+calculations with internal 80-bit registers. This increases the width of the
+significand from 53 to 64 bits, thereby gaining about 3 decimal digits of precision
+and providing between 18 and 21 decimal digits. If an implementation has a type that uses
+all 80 bits from this chipset to calculate in
+[@http://en.wikipedia.org/wiki/Extended_precision extended precision],
+it could use an optional `typedef` of this type to `float80_t`.

Some hardware, for example [@http://en.wikipedia.org/wiki/SPARC Sparc],
-provides a full 128-bit quadruple precision floating-point chip.
-
-As of gcc 4.3, a quadruple precision is also supported on x86,
-but as the nonstandard type `__float128` rather than as a `long double`.
+provides a full 128-bit quadruple-precision floating-point chip.
+An implementation for this kind of architecture might already have
+a built-in type corresponding to `binary128`, and this type could be
+optionally `typedef`ed to `float128_t`.
+
+GCC has recently developed quadruple-precision support on a variety of
+platforms using __libquadmath. However, the implementation-specific
+type `__float128` is used rather than `long double`.
+These implementations could optionally `typedef` `__float128` to `float128_t`
+in addition to any other optional `typedef`s.

[@http://www.opensource.apple.com/source/gcc/gcc-5646/gcc/config/rs6000/darwin-ldouble.c Darwin]
`long double` uses a double-double format developed first by
@@ -601,21 +501,16 @@
This gives about 106 bits of precision (about 32 decimal digits) but has rather odd behavior
at the extremes making implementation of `std::numeric_limits<>::epsilon()` problematic.

-Clang uses a similar technique:
+[note On powerful PCs and workstations, `long double` has been treated
+in a variety of ways, and this has given rise to numerous portability problems.
+It may be useful if future implementations for powerful PCs and workstations
+strive to make `long double` equivalent to quadruple-precision (__IEEE_Quad)
+and to `typedef` this to `float128_t`. Some architectures have hardware support
+for this. Those lacking direct hardware support can use software emulation.]
+
+TBD by Chris: Question: Table of recommended precisions and float layouts?

- #ifdef __clang__
- typedef struct { long double x, y; } __float128;
- #endif
-
-as described in
-[@http://stackoverflow.com/questions/13525774/clang-and-float128-bug-error Clang float128].
-
-In the future, it may be useful on powerful desktop computers and workstations to strive
-to make `long double` equivalent to quadruple-precision (__IEEE_Quad) and to
-type define this to be `float128_t`. Some architectures have hardware support for this.
-Those lacking hardware support for `float128_t` can use software emulation to generate it.
-This could also be preliminarily delegated to a potential `cpp_bin_float_128` type,
-which is under development for __Boost_Multiprecision.
+TBD by Chris: Clearly state that only 16, 32, 64, 128 are portable, as only these are IEEE754.

[h4 Survey of existing extended precision types]

@@ -628,7 +523,7 @@

# With the availability of Boost.Multiprecision, C++ programmers can now easily switch to using floating-point types that give far more decimal digits of precision (hundreds) than the built-in types `float`, `double` and `long double`.

-[endsect] [/section:context The context within existing implementations]
+[endsect] [/section:context The context among existing implementations]

[section:references References]
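
As an illustration of the context section in the diff above, here is a minimal sketch of how an implementation might map its built-in types onto the optional `typedef`s. It is not part of the commit or of the proposal text. The namespace `sketch` and the helpers `is_binary_format`, `select_float`, `unsupported`, and `is_supported` are hypothetical names introduced only for this example. The proposal also speaks of accompanying macro definitions, which this sketch does not attempt to reproduce.

```cpp
#include <climits>
#include <cstddef>
#include <limits>
#include <type_traits>

namespace sketch {

// True when T is an IEEE 754 (IEC 559) binary type with the given
// storage width in bits and the given number of significand digits.
template <typename T, int Width, int Digits>
struct is_binary_format
  : std::integral_constant<bool,
        std::numeric_limits<T>::is_iec559
        && std::numeric_limits<T>::radix == 2
        && std::numeric_limits<T>::digits == Digits
        && sizeof(T) * CHAR_BIT == static_cast<std::size_t>(Width)> {};

// Placeholder meaning "no built-in type has this format here".
struct unsupported {};

// Pick the first of float, double, long double that has the format.
template <int Width, int Digits>
struct select_float {
  using type =
    typename std::conditional<is_binary_format<float, Width, Digits>::value, float,
    typename std::conditional<is_binary_format<double, Width, Digits>::value, double,
    typename std::conditional<is_binary_format<long double, Width, Digits>::value, long double,
    unsupported>::type>::type>::type;
};

// Hypothetical stand-ins for the proposed optional typedefs.
using float32_t  = select_float<32, 24>::type;    // binary32:  24-bit significand
using float64_t  = select_float<64, 53>::type;    // binary64:  53-bit significand
using float128_t = select_float<128, 113>::type;  // binary128: 113-bit significand

// Reports whether a typedef resolved to a real built-in type.
template <typename T>
constexpr bool is_supported() {
  return !std::is_same<T, unsupported>::value;
}

} // namespace sketch
```

On a target where `long double` is the 80-bit extended format, `float128_t` in this sketch would be expected to resolve to `unsupported`, since that type has a 64-bit significand rather than the 113 bits of `binary128`. An implementation could instead `typedef` an implementation-specific type such as `__float128`, as the text suggests. Any typedef that resolves to `float`, `double`, or `long double` can be used with the existing `<complex>` unchanged.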

