Subject: [boost] Exact Decimal library
From: Andrew Robb (ajrobb57_at_[hidden])
Date: 2012-06-23 14:06:28


Introduction

C++ needs support for UTF-8 text (Boost.Locale) and for exact decimal types.
Together they allow full processing of modern business text. The existing
binary exponent types (float, double and long double) are not suitable for
exact decimal numbers. The IEEE 754-2008 decimal exponent types should be
allowed in binary files. Most processors already use IEEE 754-1985 for
binary exponent types. IBM processors already support the IEEE 754-2008
decimal exponent types in hardware, and these can be used to optimize exact
decimal arithmetic. Other processors may use fixed-point exact decimal.

Exact Decimal

Exact decimal numbers indicate not only the value but also the precision.
Thus 0.3 and 0.30 indicate the same value but with different precisions.
With rounding, the value 0.3 indicates a value between 0.25 and 0.35, but
the value 0.30 indicates a value between 0.295 and 0.305. These ranges have
the same centre but different widths, so the two numbers must be considered
different because of their precisions.
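To make the distinction concrete, here is a minimal C++ sketch, assuming a hypothetical representation (not part of any existing library) that stores an integer significand together with its number of decimal places. The names ExactDecimal, same_value, identical, interval_low and interval_high are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: the value is significand / 10^places.
// Keeping the places means 0.3 and 0.30 remain distinct numbers.
struct ExactDecimal {
    std::int64_t significand;
    int places;
};

// 10^n for small non-negative n (no overflow checking in this sketch).
std::int64_t power_of_ten(int n) {
    assert(n >= 0);
    std::int64_t p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

// True when the two numbers denote the same value, whatever their precision.
bool same_value(ExactDecimal a, ExactDecimal b) {
    int places = a.places > b.places ? a.places : b.places;
    return a.significand * power_of_ten(places - a.places) ==
           b.significand * power_of_ten(places - b.places);
}

// True only when both the value and the precision agree.
bool identical(ExactDecimal a, ExactDecimal b) {
    return a.significand == b.significand && a.places == b.places;
}

// Rounding interval implied by the precision, with one extra decimal place:
// 0.3 -> 0.25..0.35 and 0.30 -> 0.295..0.305.
ExactDecimal interval_low(ExactDecimal d) {
    return {d.significand * 10 - 5, d.places + 1};
}
ExactDecimal interval_high(ExactDecimal d) {
    return {d.significand * 10 + 5, d.places + 1};
}
```

With this representation, ExactDecimal{3, 1} (0.3) and ExactDecimal{30, 2} (0.30) are equal by value but not identical, and their implied intervals are 0.25..0.35 and 0.295..0.305.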

Currency values are ‘exact’ as there is no legal or commercial meaning
between adjacent values at the smallest defined precision. Even when the GBP
went decimal with a halfpenny coin, most banks would only keep values to
the penny. A lot of normal commercial exact decimal values can be held in
32 bits as fixed-point decimal to 2 places (USD, EUR and GBP currencies),
up to about 21,000,000.00 (2^31 - 1 cents is 21,474,836.47), more than
enough for weekly groceries. As such, they are as quick as 32-bit integers
for addition and subtraction and nearly as fast for multiplication too.
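A minimal sketch of the 32-bit, 2-place case, assuming amounts are stored as a count of cents in std::int32_t; the helper names (Cents, checked_narrow, add_cents, times_quantity) are hypothetical. Each operation is an ordinary integer operation, widened to 64 bits only so that overflow can be detected.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// An amount with 2 fixed decimal places, held as a 32-bit count of cents.
using Cents = std::int32_t;

// Largest amount: 2147483647 cents, i.e. 21,474,836.47.
constexpr Cents kMaxCents = std::numeric_limits<Cents>::max();

// Narrow a 64-bit intermediate back to 32 bits, refusing to overflow.
Cents checked_narrow(std::int64_t wide) {
    if (wide > std::numeric_limits<Cents>::max() ||
        wide < std::numeric_limits<Cents>::min())
        throw std::overflow_error("amount out of 32-bit range");
    return static_cast<Cents>(wide);
}

// Addition is plain integer addition on the cents.
Cents add_cents(Cents a, Cents b) {
    return checked_narrow(static_cast<std::int64_t>(a) + b);
}

// Multiplying by an integer quantity keeps 2 decimal places.
Cents times_quantity(Cents price, std::int32_t quantity) {
    return checked_narrow(static_cast<std::int64_t>(price) * quantity);
}
```

For example, add_cents(1999, 250) is 2249 (19.99 + 2.50 = 22.49), and times_quantity(125, 3) is 375 (3 items at 1.25).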

Exact decimal processing would not require the IEEE 754-2008 decimal
exponent types and could use decimal fixed-point instead. Indeed, the
significand could have a larger range than that of the IEEE 754-2008
decimal exponent types (e.g. java.math.BigDecimal). Embedded processors can
support fast exact decimal in their own way (floating or fixed).

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf Extensions to
support embedded processors. This does not provide general-range decimal
fixed-point types.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1176.pdf Extension for the
programming language C to support decimal floating-point arithmetic. Again,
this comes from IBM and does not cover a decimal fixed-point type.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2849.pdf Extension
for the programming language C++ to support decimal floating-point
arithmetic. Unfortunately the paper looks only at the IEEE 754-2008
hardware implementation rather than a decimal fixed-point implementation.
The proposed solution might be enough for high-end database servers but
would not be helpful for x86- and ARM-type embedded processors, where a
fixed-point implementation would allow power savings, by turning off the
binary floating-point unit, as well as improved accuracy.

For the existing binary exponent types (float, double and long double), I
think there should be a new format string type (e.g. “-123U-12”). This has a
signed integer significand and a signed integer binary exponent. The
significand cannot have decimals (no decimal point). Where these types use
IEEE 754 formats (std::numeric_limits<T>::is_iec559), the format can also
indicate “Inf” and “NaN”. There is no binary-to-decimal conversion error of
the kind suffered currently.
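A sketch of how the “-123U-12” form could be produced for double using the standard <cmath> functions std::frexp and std::ldexp; the function name is hypothetical and the text does not specify this algorithm. Because every finite double is an integer multiplied by a power of two, the output is exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical formatter for the proposed form: value == significand * 2^exponent,
// written as "<significand>U<exponent>" with no decimal point anywhere.
std::string to_binary_exponent_string(double value) {
    if (std::isnan(value)) return "NaN";
    std::string sign = std::signbit(value) ? "-" : "";
    if (std::isinf(value)) return sign + "Inf";
    if (value == 0.0) return sign + "0U0";

    int exponent = 0;
    // frexp: |value| == fraction * 2^exponent, with fraction in [0.5, 1).
    double fraction = std::frexp(std::fabs(value), &exponent);
    // Scale the fraction to a 53-bit integer; exact for every double.
    const int digits = std::numeric_limits<double>::digits;
    std::int64_t significand =
        static_cast<std::int64_t>(std::ldexp(fraction, digits));
    exponent -= digits;
    // Strip trailing zero bits to get the shortest integer significand.
    while (significand % 2 == 0) {
        significand /= 2;
        ++exponent;
    }
    return sign + std::to_string(significand) + "U" + std::to_string(exponent);
}
```

For example, 1.5 becomes "3U-1" and -123.0 / 4096 becomes "-123U-12".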

The new decimal types use an opaque representation. This allows each
platform to use its own hardware to best effect. Each value is defined with
a fixed number of decimal places. The significand grows or shrinks opaquely
to take any value, and how it works can change for each type. The fixed
number of decimal places can run into the thousands, before or after the
decimal point. Thus the range of the decimal fixed-point types will be the
same as or more than that of the decimal exponent types (-6143..6144). The
significand has a maximum of at least 34 digits, again to match the decimal
exponent types, and probably at least 38 digits (log10(2^127) ≈ 38.23) for a
128-bit significand.

The way the significand is kept can change with its size and the processor.
For instance, a 32-bit binary significand is fast on 32-bit processors (and
a 64-bit one on 64-bit processors). Groups of 9 digits in 30 bits, or of 3
digits in 10 bits, can be chained to grow to thousands of digits (as with
BigDecimal). The IEEE 754-2008 decimal exponent types suit IBM processors
(they also use 10-bit groups of 3 digits) but are slow without hardware
support.
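As an illustration of a growable significand, the following sketch (hypothetical names, unsigned values only) stores 9 decimal digits per 32-bit limb, which fits in 30 bits since 10^9 < 2^30, and adds two such significands with carry. It is one possible layout, not the representation any particular library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Unsigned significand in base 10^9; limb 0 is the least significant.
using Limbs = std::vector<std::uint32_t>;
constexpr std::uint32_t kBase = 1000000000;  // 10^9 < 2^30

// Add two significands; the result grows by one limb on a final carry.
Limbs add(const Limbs& a, const Limbs& b) {
    Limbs result;
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry != 0; ++i) {
        std::uint64_t sum = carry;
        if (i < a.size()) sum += a[i];
        if (i < b.size()) sum += b[i];
        result.push_back(static_cast<std::uint32_t>(sum % kBase));
        carry = static_cast<std::uint32_t>(sum / kBase);
    }
    return result;
}

// Decimal digits: the top limb unpadded, every lower limb padded to 9 digits.
std::string limbs_to_string(const Limbs& x) {
    if (x.empty()) return "0";
    std::string s = std::to_string(x.back());
    for (std::size_t i = x.size() - 1; i-- > 0;) {
        std::string part = std::to_string(x[i]);
        s += std::string(9 - part.size(), '0') + part;
    }
    return s;
}
```

Adding 999999999 and 1 carries into a second limb, giving "1000000000".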

Much of the work on exact decimal is similar to the SQL-92 types NUMERIC
(variable decimal places) and DECIMAL (fixed-point decimal), which differ by
having a variable or a fixed decimal point. The NUMERIC type is similar to a
decimal exponent type. The decimal places of a fixed-point DECIMAL can be
set by template or by constructor; for NUMERIC only run-time decimal places
are available. We don't limit the number of decimal places, but the SQL-92
types limit the decimal places to between 0 and the number of digits (MySQL
5.0.3 supports 65 digits for NUMERIC and DECIMAL, and DB2 supports 31 digits).
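A sketch of the two ways of fixing decimal places, assuming hypothetical types FixedDecimal (places as a template parameter) and RuntimeDecimal (places set by the constructor). Because FixedDecimal<2> and FixedDecimal<3> are distinct types, a collection such as std::vector<FixedDecimal<2>> can only hold values with 2 places, and mixing places in + fails to compile.

```cpp
#include <cassert>
#include <cstdint>

// Places fixed at compile time: value == significand / 10^Places.
template <int Places>
struct FixedDecimal {
    static_assert(Places >= 0, "places must be non-negative");
    std::int64_t significand;
};

// Places chosen at run time by the constructor.
struct RuntimeDecimal {
    std::int64_t significand;
    int places;
    RuntimeDecimal(std::int64_t s, int p) : significand(s), places(p) {
        assert(p >= 0);
    }
};

// Only defined for equal Places, so the check happens at compile time.
template <int Places>
FixedDecimal<Places> operator+(FixedDecimal<Places> a, FixedDecimal<Places> b) {
    return {a.significand + b.significand};
}
```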

Accumulators

There should be an _Accum type that can opaquely contain several decimal
fixed-point accumulators of different sizes (all with the same fixed number
of decimal places). Thus small, efficient accumulators are used for small
values. Whereas a _Decimal has a union defining the value, an _Accum might
have a structure with similar members.

If an overflow occurs, the small accumulator's old value is added to a
slower, bigger accumulator and the small accumulator is reset. The small
accumulator then receives small values again. When all the values have been
accumulated, the small accumulators are combined into the largest
accumulator used, giving the final answer. Such a complicated arrangement
needs compiler support rather than plain C++ code. For instance, using the
overflow flag requires information only available to the compiler.
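The spill-over scheme can be approximated in portable C++, as in the sketch below. It tests for overflow before each addition because standard C++ gives no access to the processor's overflow flag, which is why the text asks for compiler support. SpillAccumulator is a hypothetical name; all values are assumed to share the same decimal places, so only integer significands are summed, and the 64-bit accumulator is not itself checked for overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

class SpillAccumulator {
public:
    void add(std::int32_t value) {
        // Would small_ + value overflow 32 bits? Test without overflowing.
        if ((value > 0 && small_ > std::numeric_limits<std::int32_t>::max() - value) ||
            (value < 0 && small_ < std::numeric_limits<std::int32_t>::min() - value)) {
            big_ += small_;  // move the old small total into the bigger accumulator
            small_ = 0;      // and restart the fast one
            ++spills_;
        }
        small_ += value;  // safe: small_ is either checked or just reset to 0
    }

    // Combine into the largest accumulator for the final answer.
    std::int64_t total() const { return big_ + small_; }
    int spills() const { return spills_; }

private:
    std::int32_t small_ = 0;  // fast 32-bit accumulator
    std::int64_t big_ = 0;    // slower, wider accumulator
    int spills_ = 0;          // how many times small_ was flushed
};
```

Adding 2,000,000,000 three times spills twice and still totals 6,000,000,000.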

Operations

We are only interested in (commercial) safe exact decimal operations for
the following types:

_Numeric: decimal exponent
_Decimal: decimal fixed-point
_Accum: several sized decimal fixed-point accumulators

Multiply

An exact decimal can be multiplied by other exact decimals and by integers.
The number of decimal places of the result is the sum of those of the
operands on the right-hand side.

lhs = one * two;
lhs *= integer; // or an exact decimal with 0 decimal places
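A minimal sketch of the multiplication rule, assuming a hypothetical Dec type holding an integer significand and its decimal places (overflow is not checked):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

// Significands multiply and decimal places add, so no digits are lost.
Dec operator*(Dec a, Dec b) {
    return {a.significand * b.significand, a.places + b.places};
}

// An integer has 0 decimal places, so lhs keeps its places.
Dec& operator*=(Dec& lhs, std::int64_t integer) {
    lhs.significand *= integer;
    return lhs;
}
```

For example, 1.25 * 0.3 gives significand 375 with 3 places (0.375), and multiplying that by the integer 4 gives 1.500, still with 3 places.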

Divide

We can only divide by integers (or exact decimals with 0 decimal places).
The result has the same number of decimal places as the dividend. A zero
divisor throws an exception (there is no infinity).

lhs = rhs / integer;
lhs /= integer;
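A sketch of division by an integer under the same hypothetical Dec type. It throws std::domain_error on a zero divisor and, because C++ integer division truncates toward zero, it corresponds to the “zero” truncation mode.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

// The quotient keeps the dividend's decimal places.
Dec operator/(Dec lhs, std::int64_t integer) {
    if (integer == 0) throw std::domain_error("division by zero");
    return {lhs.significand / integer, lhs.places};  // truncates toward zero
}

Dec& operator/=(Dec& lhs, std::int64_t integer) {
    lhs = lhs / integer;
    return lhs;
}
```

Thus 10.00 / 3 gives 3.33 and -10.00 / 3 gives -3.33, both keeping 2 decimal places.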

Modulo

A division leaves a remainder, again with the same number of decimal
places.

lhs = rhs % integer;
lhs %= integer;
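Continuing with the same hypothetical Dec type, the remainder keeps the dividend's places, so 10.00 % 3 gives 0.01 (since 3 * 3.33 = 9.99). C++'s built-in % takes the sign of the dividend, which matches quotients truncated toward zero.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

// Remainder at the dividend's decimal places.
Dec operator%(Dec lhs, std::int64_t integer) {
    if (integer == 0) throw std::domain_error("division by zero");
    return {lhs.significand % integer, lhs.places};
}

Dec& operator%=(Dec& lhs, std::int64_t integer) {
    lhs = lhs % integer;
    return lhs;
}
```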

Add and Subtract

We can add or subtract other exact decimals with the same number of
decimal places. Mixing values with different decimal places may cause
problems for exact values (different truncation, for instance).

lhs = rhs + val;
lhs = rhs - val;
lhs += val;
lhs -= val;
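A sketch that enforces the same-places rule at run time for the hypothetical Dec type, throwing std::invalid_argument when the places differ (a template on the number of places could catch this at compile time instead):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

// Shared check: mixing places must go through an explicit trunc() first.
void require_same_places(const Dec& a, const Dec& b) {
    if (a.places != b.places)
        throw std::invalid_argument("decimal places differ");
}

Dec operator+(Dec a, Dec b) {
    require_same_places(a, b);
    return {a.significand + b.significand, a.places};
}

Dec operator-(Dec a, Dec b) {
    require_same_places(a, b);
    return {a.significand - b.significand, a.places};
}

Dec& operator+=(Dec& lhs, Dec val) { lhs = lhs + val; return lhs; }
Dec& operator-=(Dec& lhs, Dec val) { lhs = lhs - val; return lhs; }
```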

Where values with different decimal places need to be combined, we
explicitly use trunc() to match the decimal places first. This can increase
as well as decrease the decimal places. It also allows errors to be tracked.

Comparison

The operators <, <=, ==, !=, >= and > are defined between exact decimals
with the same number of decimal places.
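The comparison rule can be sketched the same way: with equal places, comparing significands compares values. The Dec type and the exception on mismatched places are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

void require_same_places(const Dec& a, const Dec& b) {
    if (a.places != b.places)
        throw std::invalid_argument("decimal places differ");
}

// Primitive comparisons; the rest are derived from them.
bool operator<(Dec a, Dec b) {
    require_same_places(a, b);
    return a.significand < b.significand;
}
bool operator==(Dec a, Dec b) {
    require_same_places(a, b);
    return a.significand == b.significand;
}
bool operator!=(Dec a, Dec b) { return !(a == b); }
bool operator>(Dec a, Dec b) { return b < a; }
bool operator<=(Dec a, Dec b) { return !(b < a); }
bool operator>=(Dec a, Dec b) { return !(a < b); }
```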

Collections

Any _Decimal collection can be limited to values with the same number of
fixed decimal places.

Truncation

The binary exponent types float, double and long double have separate
approximate literals 1.3f, 1.3 and 1.3L for their different sizes. I propose
another, exact decimal literal, for example 1.300d for three decimal places,
or alternatively the decimal exponent form 1300e-3d (using an integer
significand).

Value                 Approximation
1.3f                  1.2999999523162841796875..
1.3                   1.3000000000000000444089209850062616169452..
1.3L                  1.3000000000000000000000000000000000385185..
1.300d or 1300e-3d    1.300 (exact)
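Standard C++ reserves literal suffixes that do not start with an underscore, so the proposed 1.300d cannot be defined in user code. A raw literal operator with a hypothetical _dec suffix can approximate it, since the compiler passes the literal's characters unconverted and no binary rounding takes place. The Dec type is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical exact decimal: value == significand / 10^places.
struct Dec {
    std::int64_t significand;
    int places;
};

// Raw literal operator: receives the literal's text, e.g. "1.300" or "1300e-3".
Dec operator""_dec(const char* text) {
    std::int64_t significand = 0;
    int places = 0;
    bool after_point = false;
    const char* p = text;
    // Digits (and at most one '.') up to an optional exponent.
    for (; *p != '\0' && *p != 'e' && *p != 'E'; ++p) {
        if (*p == '.') {
            after_point = true;
            continue;
        }
        significand = significand * 10 + (*p - '0');
        if (after_point) ++places;  // every digit after '.' is a decimal place
    }
    // A decimal exponent shifts the places: 1300e-3 has 3 places.
    if (*p == 'e' || *p == 'E') places -= std::atoi(p + 1);
    // A positive exponent leaves no places: scale the significand instead.
    for (; places < 0; ++places) significand *= 10;
    return {significand, places};
}
```

Here 1.300_dec and 1300e-3_dec both yield significand 1300 with 3 places, while 1.3_dec yields 13 with 1 place.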

Where a value does not have the required precision, we must expand or
truncate it to the required decimal places and should choose a method of
truncation (matched to IEEE 754-2008):

ceil - the most negative answer not less than the original
floor - the most positive answer not greater than the original
zero - the answer furthest from zero not further than the original (the
usual C method)
round - the answer nearest the original choosing the most positive in case
of a tie
bank - like round but choosing the even last digit in case of a tie
expand - no truncation (zero padding), so the error should be zero
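A sketch of these modes for rescaling an integer significand from one number of decimal places to another. It follows the definitions above, including “round” breaking ties toward the more positive answer. Trunc, Rescaled and rescale are hypothetical names, and the leftover is returned alongside the result so that the error can be tracked.

```cpp
#include <cassert>
#include <cstdint>

enum class Trunc { ceil, floor, zero, round, bank, expand };

struct Rescaled {
    std::int64_t significand;  // at the new number of places
    std::int64_t leftover;     // at the old places; 0 when expanding
};

std::int64_t power_of_ten(int n) {
    std::int64_t p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

// When truncating: significand == result.significand * 10^(from - to) + leftover.
Rescaled rescale(std::int64_t significand, int from, int to, Trunc mode) {
    if (to >= from) return {significand * power_of_ten(to - from), 0};  // zero padding
    assert(mode != Trunc::expand);  // expand can never drop digits
    const std::int64_t d = power_of_ten(from - to);
    std::int64_t q = significand / d;  // C++ division truncates toward zero
    std::int64_t r = significand % d;  // same sign as significand
    if (r != 0) {
        switch (mode) {
        case Trunc::zero: break;
        case Trunc::ceil: if (r > 0) ++q; break;
        case Trunc::floor: if (r < 0) --q; break;
        case Trunc::round:
        case Trunc::bank: {
            std::int64_t twice = 2 * (r < 0 ? -r : r);  // compare |r| with d / 2
            if (twice > d) {
                q += (r > 0 ? 1 : -1);  // nearer the next value away from zero
            } else if (twice == d) {
                if (mode == Trunc::round) {
                    if (r > 0) ++q;  // tie: take the more positive answer
                } else if (q % 2 != 0) {
                    q += (r > 0 ? 1 : -1);  // tie: take the even last digit
                }
            }
            break;
        }
        case Trunc::expand: break;
        }
    }
    return {q, significand - q * d};
}
```

For example, rounding 0.25 to 1 place with round gives 0.3 and a leftover of -0.05, while bank gives 0.2 and a leftover of 0.05.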

As well as the truncated (required) value, there will be a leftover,
similar to the % value that accompanies a /. By keeping a record of the
truncation we can assess the errors.

template <typename T>
_Decimal & _Decimal::trunc(TRUNC, T const &source, T *error = 0);

This will be a preferred way to convert binary exponent types into a
decimal value for printing.

Formatting

With 0 decimal places the format is like an integer; with 1 to 9 decimal
places the format is decimal and trailing zeros are significant. Otherwise
the default format is an exponent (matching the decimal places) with an
integer significand. These will resemble binary float formats, except that
they are exact and avoid a decimal point in the significand.

123 (0 decimal place so no decimal point)
-0.012340 (6 decimal places)
-12340e-12 (12 decimal places or decimal exponent -12)
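A sketch of this default format for an integer significand and a number of decimal places; format_decimal is a hypothetical name, and locale-specific output (grouping, decimal separator) is not attempted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 0 places     -> integer, e.g. 123
// 1..9 places  -> fixed decimal with trailing zeros kept, e.g. -0.012340
// otherwise    -> integer significand with decimal exponent, e.g. -12340e-12
std::string format_decimal(std::int64_t significand, int places) {
    const bool negative = significand < 0;
    // Use the magnitude as unsigned so INT64_MIN does not overflow.
    std::uint64_t magnitude = negative ? 0 - static_cast<std::uint64_t>(significand)
                                       : static_cast<std::uint64_t>(significand);
    const std::string sign = negative ? "-" : "";
    if (places == 0) return sign + std::to_string(magnitude);
    if (places >= 1 && places <= 9) {
        std::string digits = std::to_string(magnitude);
        // Left-pad with zeros so at least one digit precedes the point.
        if (digits.size() <= static_cast<std::size_t>(places))
            digits.insert(0, places + 1 - digits.size(), '0');
        digits.insert(digits.size() - places, ".");
        return sign + digits;
    }
    return sign + std::to_string(magnitude) + "e" + std::to_string(-places);
}
```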

Final number formatting will be up to Boost.Locale using Unicode.

