From: Matt Borland (matt_at_[hidden])
Date: 2025-01-22 22:04:20


On Wednesday, January 22nd, 2025 at 3:20 PM, Alan de Freitas via Boost <boost_at_[hidden]> wrote:

> Boost.Decimal Review
>
> I want to thank Matt Borland and Chris Kormanyos for their work.
>

> > Does this library bring real benefit to C++ developers for real world
> > use-case?
>
> Yes. It should be helpful in "human-centric calculations". It might be
> useful in financial applications, although I have some questions about this
> below.
>

> > Do you have an application for this library?
>
> Not at the moment.
>

> > Does the API match with current best practices?
>
> My biggest point of disagreement is the single-header design. The
> documentation says "The entire library should be accessed using the
> convenience header `<boost/decimal.hpp>`". Although providing a single
> header as an extra option is common practice for Boost libraries, I don't
> think it should be a recommendation (with the word "should"), let alone
> enforced as the only way to consume the library. The rest of the
> documentation always assumes the user will include `<boost/decimal.hpp>`.
>
> - The sections "cmath support", "cstdlib support", "format support", and so
> on don't even tell me what header I need to include for only that
> functionality.
> - The documentation implies `<boost/decimal.hpp>` includes way more than
> anyone needs, because even the C++ standard library splits that
> functionality into different headers while Boost.Decimal doesn't.
> `<boost/decimal.hpp>` forces users to unnecessarily include a big chunk
> of the standard library. And this functionality is so diverse that it's very
> unlikely the user really wants to use even 10% of what's being included.
> - A single-header design is not a way "to make things just work with no
> effort on the user's part" because it forces the user to use the library
> this way, rather than simply allowing them to do so like all other Boost
> libraries.
> - If a potential user says, "X seconds of compile time is a problem," it's
> unhelpful to reply with something like "I don't think X seconds of compile
> time is a problem for you," especially when you consider that it could be
> 3 seconds per translation unit. Most recent libraries are making an
> effort to reduce compile times by breaking up the headers and avoiding
> header-only libraries. This is a big complaint people have about Boost. The
> library is already header-only, and making no effort to break up the headers
> makes no sense to me.
> - Telling the user to rely on macros to disable optional functionality
> doesn't sound good either, because now users have to blacklist rather than
> whitelist what they want. Blacklisting what you don't want with macros is a
> huge workaround: it introduces a new convention to solve a design problem
> that has already been solved.
> - Implying this design matches the C++ standard library is also
> unreasonable because it brings in many different headers from the standard
> library. What would match the STL interface, almost by definition, is one
> header for cmath support, one header for cstdlib support, one header for
> charconv support, and so on. Even the documentation is structured by the
> different standard library headers it replicates the functionality of.
>

See: https://github.com/cppalliance/decimal/issues/804. There's an active branch `modular` where I am working on resolving the issue.
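
The goal on that branch is roughly what you describe: one header per
replicated standard library header, with `<boost/decimal.hpp>` kept as the
convenience include. Something like the following, where the individual
header names are illustrative and not final:

    // Everything, as today:
    #include <boost/decimal.hpp>

    // Hypothetical modular includes once the `modular` branch lands
    // (names are placeholders, not the final spelling):
    #include <boost/decimal/decimal64.hpp>  // just the type and arithmetic
    #include <boost/decimal/cmath.hpp>      // the <cmath> replacements
    #include <boost/decimal/charconv.hpp>   // to_chars/from_chars support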

> A minor point:
>
> - The public API probably shouldn't return `detail::uint128`. It can't be
> considered an implementation detail if the user can pass it around.
> It could be a "see-below" type, where the user should ignore information
> about the members but can still use it.
>

Yes, this change makes sense, and several others have brought it up. I tagged you in an issue to track it since you're the most recent one to mention it.
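
Something like a public alias would probably do it, so the name users see
lives outside `detail` while the members stay unspecified. A sketch only;
the final name and placement are open:

    namespace boost::decimal {

    // Public "see-below" integer type: users may pass it around and use
    // its operators, but its members are unspecified. (Hypothetical.)
    using uint128_t = detail::uint128;

    } // namespace boost::decimal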

> > Is the documentation helpful and clear?
>
> Yes. The main point I'd like to discuss is in terms of explaining the
> trade-offs. The documentation says "Decimal floating point numbers avoid
> this issue by storing the significand in base-10 (decimal)," and the
> "issue" mentioned in the previous sentence is "representation errors". This
> seems misleading. It makes it sound like `decimalX` is a variant of a
> bigint type that attempts to avoid all representation errors. But if I
> understand correctly, decimal numbers avoid a much more specific type of
> error: representation errors when the denominator has a prime factor that
> doesn't divide the base. It has the same representation errors as base-2
> floating point numbers when 1) the denominator has a prime factor that
> divides neither base (although base 10 has more prime factors) and 2) the
> exponent gets larger and larger, accumulating more and more rounding
> errors in the value. So the final trade-off is fewer representation errors
> (because base 10 has more prime factors) at the cost of performance
> ("ratio to double" between ~30 and ~150 for most operations). I might be
> wrong about these details, but I still believe that sentence in the
> documentation is misleading.
>
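
That trade-off is easy to demonstrate. A small example using the
significand/exponent constructor described in the docs (behavior as I
expect it; the exact spellings are in the reference):

    #include <boost/decimal.hpp>
    #include <iostream>

    int main()
    {
        using boost::decimal::decimal64;

        // Binary fails because 1/10 has a prime factor (5) that 2 lacks:
        std::cout << std::boolalpha
                  << (0.1 + 0.2 == 0.3) << '\n';                     // false

        // Base 10 represents these values exactly: significand * 10^exp
        decimal64 a {1, -1};  // 0.1
        decimal64 b {2, -1};  // 0.2
        decimal64 c {3, -1};  // 0.3
        std::cout << (a + b == c) << '\n';                           // true

        // But 1/3 still rounds, because 3 divides neither base 10 nor 2:
        decimal64 third = decimal64{1} / decimal64{3};
        std::cout << (third * decimal64{3} == decimal64{1}) << '\n'; // false
    }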

> There's also a technical question I still don't understand about 1) the
> relationship between decimal numbers and fixed-point real numbers and 2)
> the impact of this relationship on potential applications. For instance,
> Bitcoin uses integers (Satoshis) to represent "decimal" units of BTC. This
> representation is just an integer and a fixed/implicit scaling
> factor/exponent. The same representation can be used for dollars: the
> internal units are cents and the main unit is a dollar. When we compare
> that with decimal floating point numbers, the trade-off becomes 1) faster
> operations (even faster than native base-2 floating point types), 2) no
> precision errors / exact accuracy for numbers in the predefined scale,
> and 3) reliable precision regardless of the best exponent to represent the
> number; at the cost of a more limited range of values that can be
> represented.
>

> With that trade-off in mind, my intuition is that decimal floating point
> numbers are not so useful for financial applications (the only application
> discussed in the "Examples" section). With 32-bit fixed-point numbers, I
> could represent more dollars (or cents) or BTC (or satoshis) than will ever
> exist (the limited range of values is not a problem) while getting the
> performance and precision benefits of integers.

In dollars, a 32-bit unsigned integer can represent ~4.3 billion USD. The United States Government spends on the order of 10,000 billion USD a year.
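
For reference, the arithmetic:

    #include <cstdint>
    #include <iostream>
    #include <limits>

    int main()
    {
        // Whole dollars in 32 bits: 2^32 - 1 = 4,294,967,295 (~4.3e9 USD).
        // If the unit is cents instead, the ceiling drops to ~4.3e7 USD.
        std::cout << std::numeric_limits<std::uint32_t>::max() << '\n';
    }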

> If the number is larger
> than that maximum, that means either I made a mistake (for instance, in
> currency conversion) and I should definitely fix this mistake (I can't lose
> $1000000 and ignore it because my best exponent is now Y for some reason -
> $1000000 is still a lot, and I can't move on until this "precision error" is
> fixed). If this is not a mistake, then I'm in a subdomain where the
> precision errors don't matter anymore because I need those large exponents
> to represent something else. The problem is that the extra precision from
> base 10's additional prime factors is now not as relevant to me, and base-2
> numbers are probably fine (for instance, in some statistics about the data).
>

> I might be wrong about all of that, but if that's the case I assume other
> people will also be wrong about it. The docs could correct these mistakes
> or, if I'm wrong, provide other examples of useful applications. That's why
> I think the main application of the library is "human-centric calculations"
> and it's only useful in other fields when there's some intersection with
> that, like there is when reading data in base-10 format that might not fit
> a fixed-point representation. This discussion could also open possibilities
> for representations using integers in this library or other libraries, in
> the sense that base-10 floating-point is, in a way, a fixed-point number
> with a dynamic exponent.
>

I think including something like what Reuben is planning, where he synthesizes a variable fixed-point type, could be a useful addition.
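
For anyone following along, the core of such a type is tiny: an integer
plus a pinned scale. This is only my sketch, not Reuben's design:

    #include <cstdint>

    // A decimal with a fixed (compile-time) exponent: the value is
    // significand * 10^Exponent. Same-scale addition is exact integer math.
    template <int Exponent>
    struct fixed_decimal
    {
        std::int64_t significand;

        friend constexpr fixed_decimal operator+(fixed_decimal lhs,
                                                 fixed_decimal rhs)
        {
            return {lhs.significand + rhs.significand};
        }
    };

    using cents = fixed_decimal<-2>; // dollars stored as whole cents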

> Other minor points:
>
> - It seems like most code blocks start with some whitespace in the first
> line for some reason.
> - Constructing a decimal from floating point numbers leads to precision
> errors, and constructing it from independent integers makes it difficult to
> read. Since we can construct it from strings (Literals Support), this could
> be listed in "Basic Usage". I couldn't figure out how to construct
> these values without using the literals in the "Literals
> Support" section.

That's fair. I have also moved the literals into a `literals` namespace since the beginning of the review, as several people mentioned that as well.
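
Usage after that change looks like this (the suffix spelling is the one I
recall from the "Literals Support" section; double-check the docs):

    #include <boost/decimal.hpp>

    int main()
    {
        // Literals now require pulling in the namespace explicitly:
        using namespace boost::decimal::literals;
        auto price = 1.05_dd; // decimal64, constructed exactly from the text
    }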

> - The signatures could contain the explicit requires clauses and concepts
> instead of "Where types `T` and `T2` are integral types" in the exposition.
> - "Lastly the sign follows the convention of `signbit`": This could be
> linked to
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signbit
> or removed from the beginning of the exposition. The reader might think it
> is important information to understand the rest of the exposition and stop
> reading the documentation to google it. Also, `bool sign` only needs to be
> explained so much because it's unintuitive. If it were called `bool is_negative`, or something like that, there would be nothing to explain.

I'll add an explanation of `signbit` rather than assuming people know it, since the convention is counterintuitive.
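
For the docs, the counterintuitive part in one snippet:

    #include <cmath>
    #include <iostream>

    int main()
    {
        // signbit returns true when the sign bit is SET, i.e. for
        // negative values -- including negative zero:
        std::cout << std::boolalpha
                  << std::signbit(-1.0) << ' '   // true
                  << std::signbit(-0.0) << ' '   // true
                  << std::signbit(+1.0) << '\n'; // false
    }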

> - The "3.2.8 Note" could just be part of the reference documentation. This
> way the user would already have this information when looking at the
> function. It would also be useful to explain how that behavior compares
> with other numeric types.
> - Typo in "Integeral"
> - I missed a better explanation of the difference between the fast and
> regular types, like in Boost.Unordered where they explain the data
> structures and then the trade-offs.
> - I'm used to the API reference being at the end of the documentation. I
> was surprised to see there's a lot of exposition after the reference:
> Examples, Benchmarks, "Design Decisions", ...
> - "Comparisons" section: "between vec[i] and vec[i + 1]": where the numbers
> come from in the implementation is irrelevant. Only the number of
> replicates (20,000,000) is relevant. "This is repeated 5 times to generate
> stable results" doesn't add entropy to help stability. It just means you
> have 100,000,000 replicates with a bias. You could just generate
> 100,000,000 values. It's also not relevant to the experiment that the
> values are in a vector. In fact, unless there's a technical reason for
> that, generating the numbers for the tests and discarding them is much
> easier. In any case, 100,000,000 is way too much for any probability
> distribution. After much less than that you'll have a stable p-value.
> - Comparison results: The tables need some explaining so the reader doesn't
> have to look at each column and header to infer what's happening. It's
> usual to include something like "The second column means ____. Lower values
> are better.". It's also common practice to include the standard deviation
> in the table.
> - The tables are not sorted by the values. This is particularly confusing
> because `GCC _Decimal<X>` always beats `decimal32`. This is not discussed
> in the document. `GCC _Decimal<X>` isn't even defined or discussed.
>

I believe Peter Turcan brought up similar points. I'll add a link to the GCC docs at the top explaining what it is.
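
For reference while that link is pending: `GCC _Decimal<X>` refers to GCC's
built-in decimal floating point types from ISO/IEC TR 24732, which libstdc++
exposes to C++ as std::decimal::decimalX:

    #include <decimal/decimal>  // libstdc++ extension, GCC only

    int main()
    {
        // make_decimal64(coefficient, exponent): 1 * 10^-1 == 0.1
        std::decimal::decimal64 a = std::decimal::make_decimal64(1LL, -1);
        std::decimal::decimal64 b = std::decimal::make_decimal64(2LL, -1);
        std::decimal::decimal64 c = a + b; // exactly 0.3
        return !(c == std::decimal::make_decimal64(3LL, -1)); // 0 on success
    }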

> - There should be a conclusion about the benchmark results. Did anything
> change from one platform to the other, etc.? If I read the results
> correctly, the conclusion is that I should always use `decimalX_fast`, then
> `GCC _Decimal<X>` if I need other properties, then `decimalX` if not using GCC.
> - If section "Basic Operations" is about "+, -, *, /", then the first
> comparison subsection "Comparisons" should be renamed to reflect what
> operations it compares ">, >=, <, <=, ==, and !=" (relational operators).
>

> - Both columns in the table have the same information at a different
> scale. This means all benchmarks for a platform could be in a single
> bar plot where the x-axis has each number type and, for each number type,
> we have the ratio to double for each operation as a bar.
>

> > Did you try to use it? What problems or surprises did you encounter?
>
> Yes. I compiled the code and ran the examples.
>

> > What is your evaluation of the implementation?
>
> It seems like performance is still much worse than other implementations. I
> don't think this is a reason to reject the library; I just hope the authors
> will keep an eye on it, implementing improvements over time, if
> possible. Or maybe some other native types or implementations could be
> wrapped in the library.
>

Chris and I have been discussing offering a wrapper around the Intel library for those who have it and want to use it. It's the same model that Multiprecision uses.
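
Purely as a sketch of the model (nothing here exists yet; the macro and
type names are hypothetical): the front-end type stays the same and a build
flag selects the backend, much as Multiprecision does with GMP/MPFR:

    // Hypothetical configuration -- illustrative only:
    #if defined(BOOST_DECIMAL_USE_INTEL_BACKEND)
        // arithmetic forwards to Intel's decimal library
        using decimal64_impl = intel_bid64_backend;
    #else
        // the current self-contained implementation
        using decimal64_impl = builtin_decimal64_backend;
    #endif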

> > Please explicitly state that you either accept or reject the
> > inclusion of this library into boost.
>
> I recommend accepting the library.
>

> > Also please indicate the time & effort spent on the evaluation and give
> > the reasons for your decision.
>
> I read the documentation and the source code. I compiled the library and
> ran the examples.
>

> I spent time on and off over the last week. In total, I spent about 2 days
> evaluating the library.
>

Thank you for your review and comments.

Matt




