
Subject: Re: [boost] [xint] Boost.XInt formal review
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-03-06 19:25:51


On Sun, 06 Mar 2011 13:44:37 -0800
"Jeffrey Lee Hellrung, Jr." <jhellrung_at_[hidden]> wrote:

Thank you for the review.

>> 3. Please explicitly state in your review whether the library should
>> be accepted.
>
> No, XInt in its current incarnation should not be accepted. Yes, it
> would be great for boost to formally endorse a bigint library.
> However, it's not like the XInt library will be unavailable for
> others to use if not accepted into boost. [...]

I have to respectfully disagree. One company I work with is willing to
use Boost itself, but not libraries from the Boost sandbox. I myself am
leery of anything from the sandbox, for interface-stability reasons: if
upgrading a library may mean that I suddenly have compiler errors in
previously stable and working code, and have to spend more time
correcting it, I'd prefer to look elsewhere.

XInt's existing interface could be frozen at its current state with no
problem. Even the removal of the copy-on-write stuff would only result
in one function having an unused Boolean parameter. Any additional
functionality can easily be added in a way that doesn't break existing
client code.

> - is_odd, is_even, sign, hex_digits, is_nan: These are all member
> functions with no same-named free function...which makes me wonder
> why these were chosen to be member functions while the likes of
> getbit, setbit, is_prime, etc. were chosen to be strictly free
> functions.

Because n1692, the paper presented to the C++ Standards Committee in
2004 that I based XInt's interface on, has them there. I saw no point
in cluttering the interface by adding free functions that just call
those, although if that would satisfy this objection, it can easily be
done.
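
For example, such wrappers would be one-line forwarders along these
lines (an illustration of the idea, not code from the library):

    // Hypothetical non-member wrappers: each simply forwards to the
    // existing member function, so the class itself doesn't change.
    template <typename Int>
    bool is_odd(const Int& n)  { return n.is_odd(); }

    template <typename Int>
    bool is_even(const Int& n) { return n.is_even(); }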

> - pow2, factorial, nan: Same deal. I don't see a problem with making
> these free functions and specifying the desired return value as an
> explicit template parameter. It will look the same but make the
> interface more uniform:
> typedef integer_t< ... > integer_type;
> integer_type x = integer_type::pow2(n);
> vs
> typedef integer_t< ... > integer_type;
> integer_type x = pow2< integer_type >(n);

I could easily argue either side of that one. When I originally wrote
that code, the current method seemed ever-so-slightly more appealing;
now I think the way that you're advocating might be.

If anyone else is reading this, please weigh in: do you prefer the
current placement of these functions as static members, or moving them
to free functions? The sole difference is a slight change in the syntax
used to call them; they'd be equally efficient either way.
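
To spell out the difference: the free-function form would presumably be
nothing more than a forwarder to the existing static member, along
these lines (a sketch, not the library's actual code; the parameter
type is assumed):

    #include <cstddef>

    // Sketch of the free-function form being discussed: the return type
    // is named as an explicit template parameter, and the call simply
    // forwards to the existing static member function.
    template <typename Integer>
    Integer pow2(std::size_t exponent)
    {
        return Integer::pow2(exponent);
    }

    // typedef integer_t< ... > integer_type;
    // integer_type x = pow2<integer_type>(n);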

> The Boost.Parameter interface to the xint::integer_t template is
> fine, although Boost.Parameter *might* be overkill. I was trying to
> find in the source code where the template parameters fed to the
> xint::integer_t template are interpreted, but failed.

No surprise, the code is pretty confusing. It's through integer_t's
base classes, such as nan_functions, in detail/options.hpp.

> I can't seem to find where xint::detail::integer_t_data is defined,

Same file, detail/options.hpp.

> nor any of the other base templates for xint::integer_t. I would
> think they'd be in one of the first 5 includes (not including
> exceptions.hpp and random.hpp) of detail/internals.hpp. This gives me
> the impression that the implementation, from a source file
> perspective, is badly designed, but I might just be dense.

It's actually the *last* include, the lonely one at the bottom of
internals.hpp, because one of its classes needs the get_nan function
defined in that file.

> In any case, as far as I know (that is, Chad hasn't given me a
> specific example), XInt doesn't use the parameter deduction feature of
> Boost.Parameter,

integer_t_data should clear that up. If I understood your explanation
correctly, I don't think the class could accept an allocator in any
arbitrary position without that deduction feature, though I could well
be wrong.

> so one could consider using a Boost.MPL map to pack template
> parameters together. This gives a slightly clunkier interface, but it
> avoids the need for the user to set BOOST_PARAMETER_MAX_ARITY, so
> there's definitely a tradeoff.

One I would much prefer not to make. Clunky interfaces are what I was
trying to get away from with this library.
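
For anyone weighing in, here is roughly what the MPL-map version of the
interface would look like. The option tags and defaults below are
invented for illustration; they are not XInt's real ones.

    #include <boost/mpl/map.hpp>
    #include <boost/mpl/pair.hpp>
    #include <boost/mpl/at.hpp>
    #include <boost/mpl/has_key.hpp>
    #include <boost/mpl/eval_if.hpp>
    #include <boost/mpl/identity.hpp>
    #include <boost/mpl/bool.hpp>
    #include <memory>

    struct opt_threadsafe {};  // option keys are just tag types
    struct opt_allocator {};

    template <typename Options>
    struct basic_integer_sketch
    {
        // Look up an option in the map, falling back to a default when
        // the key is absent.
        typedef typename boost::mpl::eval_if<
            boost::mpl::has_key<Options, opt_threadsafe>,
            boost::mpl::at<Options, opt_threadsafe>,
            boost::mpl::identity<boost::mpl::true_>
        >::type threadsafe;

        typedef typename boost::mpl::eval_if<
            boost::mpl::has_key<Options, opt_allocator>,
            boost::mpl::at<Options, opt_allocator>,
            boost::mpl::identity<std::allocator<unsigned int> >
        >::type allocator_type;
    };

    // The clunky part falls on the user, who has to spell out the map:
    typedef basic_integer_sketch<
        boost::mpl::map<
            boost::mpl::pair<opt_threadsafe, boost::mpl::false_>,
            boost::mpl::pair<opt_allocator, std::allocator<unsigned int> >
        >
    > my_integer;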

> The random number stuff and the cryptographic-specific stuff should
> not be included within XInt. These should be left to other libraries
> (Boost.Random in the former case).

It is. Of the two random-number classes in the code, one is just a
wrapper around a fast Boost.Random class that provides a default value
for convenience, the other implements the same abilities as
Boost.Random's random_device class in a form that's slightly easier to
work with in the examples (Boost.Random couldn't handle that at all
under Windows when I originally wrote it). Client code must specify a
random number generator for the few (two, I think) functions that can
use one, and those functions work equally well with Boost.Random's
classes.

I'm not wedded to the default_random_generator. It's no longer needed
by any code in the library, and could be deleted without any other
changes. But the strong_random_generator is significantly more
convenient for the examples, and somewhat more convenient for casual
client-code use, as it doesn't require a compiled library the way
Boost.Random's does.

The random number functions in the library are there to make it easier
to generate large random numbers, which of course Boost.Random can't
natively support. That's a legitimate mathematical purpose, one that
would have to be reimplemented by a significant number of programs
using the library if it were removed.
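
To make that concrete, generating an N-bit random number boils down to
something like the following, whichever generator the client supplies.
The vector of 32-bit blocks here just stands in for a big-integer
magnitude; this is not XInt's actual function or internal layout.

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_int.hpp>
    #include <cstddef>
    #include <vector>

    // Sketch: assemble an N-bit random value from 32-bit blocks drawn
    // from any Boost.Random generator.
    std::vector<unsigned int> random_blocks(std::size_t bits)
    {
        boost::mt19937 gen;  // any UniformRandomNumberGenerator will do
        boost::uniform_int<unsigned int> dist(0u, 0xFFFFFFFFu);

        std::vector<unsigned int> blocks((bits + 31) / 32);
        for (std::size_t i = 0; i < blocks.size(); ++i)
            blocks[i] = dist(gen);

        // Mask off any excess bits in the top block so the result is at
        // most "bits" bits long.
        if (bits % 32 != 0)
            blocks.back() &= (1u << (bits % 32)) - 1u;

        return blocks;
    }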

> I think the scope of a bigint library is very clearly to provide
> arithmetic algorithms and data structures on which the algorithms
> operate. This does not include cryptographic functionality.

This might be pedantic, but there really *isn't* any cryptographic
functionality in the library. There are only mathematical functions.
is_prime (or its new name, is_probably_prime) is most useful in the
field of cryptography, because huge prime numbers are of particular
interest in that field, but prime numbers have a lot of uses outside of
cryptography as well. is_prime is no more a cryptographic function than
pow or gcd is.

If I had to remove functions that are primarily interesting to
cryptography, powmod would have to be on the list too, and it's of even
more general use than is_prime.

> As I've noted before, I'm disappointed that the algorithms Chad's
> used to implement the arithmetic operations are hidden behind the
> xint::integer_t class. Given the richness of these algorithms, I
> would consider it a priority to expose them to the public through a
> generic interface. Much has already been said about this both in the
> last week and even earlier, and if Chad would like some help in this
> regard, I would be willing to assist.

Thank you. I may take you up on that at some point, or at least ask
your opinions on various options, but I think I see how to do it in
general.

> Part (albeit a small part) of my reason for not accepting XInt is, by
> Chad's own admission, that he wants the interface to settle down
> before separating the algorithms out (if I'm misinterpreting your
> comments, Chad, please correct), suggesting the interface hasn't been
> finalized yet.

I may have misspoken and said interface at some point, but my intention
was to let the *internals* settle down before providing access to them.
The justification is that opening up access to them now would tie my
hands, because then I couldn't make changes to them without breaking
client code, which is the last thing I ever want to do.

> Also, I agree with Robert Stewart that the integer_t template should
> be renamed basic_integer, a la basic_string.

As do I, and it's already high on the to-do list.

> The fixed-size integers leave something to be desired, but as Chad
> himself has emphasized, they are second-class citizens. So-called
> second-class citizens probably should not be included within the
> library, and if they are, not without huge warnings saying something
> like "EXPERIMENTAL" or "SUBJECT TO CHANGE". [...]

Eh? Only the internals of them are subject to change. The current
interface can be preserved without any loss of functionality, and added
onto without breaking anything.

>> - What is your evaluation of the implementation?
>
> Chad already knows my feelings on COW, but to be explicit: I'm not
> necessarily against use of COW in general, but I have yet to be given
> a use case where COW would outperform move emulation in theory.

I've located two: abs and negate. I believe those were the functions
that originally prompted me to use it. I'm sorry that I didn't mention
them before; I knew I had a reason for going to all the trouble of using
it, but I didn't recall what that reason was until I saw those while
working with the code today.

Think about it: both involve changes to the sign bit only, and abs only
sometimes needs to alter even that. It would be foolish to deep-copy a
ten-thousand-bit or larger number just to flip the sign bit, and more
foolish still to copy it when nothing needs to change at all, as with
abs applied to an already-positive number.
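
Here's a stripped-down illustration of the point, with a shared_ptr
standing in for the reference-counted storage. This shows only the
idea; it is not XInt's actual internal representation, and zero
handling is omitted.

    #include <boost/shared_ptr.hpp>
    #include <vector>

    // The (possibly enormous) magnitude is shared; only the sign flag
    // differs between the original and the result.
    struct cow_int
    {
        boost::shared_ptr<std::vector<unsigned int> > magnitude;
        bool negative;
    };

    cow_int negate(const cow_int& n)
    {
        cow_int r = n;           // shares the magnitude; no deep copy
        r.negative = !r.negative;
        return r;
    }

    cow_int abs(const cow_int& n)
    {
        cow_int r = n;           // again, only the handle is copied
        r.negative = false;
        return r;
    }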

They may seem like minor-league players, and completely unworthy of
having such a huge influence on the design, but they're used heavily in
many low-level parts of the library (where, I might add, it couldn't
easily be layered on as an optional afterthought). Addition and
subtraction make heavy use of them, and they're integral parts of
division, the GCD algorithm, and the pow and powmod functions, and, if
memory serves, a lot of other functions (via unary operator-) that I
can't easily locate by a search.

Move emulation does little for those, because in many cases you're
*not* moving the magnitude. You're just passing that same magnitude
around with possibly-different signs. I haven't delved into the
functions that use them, but as I recall, at least some of them need
multiple representations of a number (like the original and absolute
values) in different parts. CoW provides a provable benefit there.

I haven't yet proven that CoW is justified. I'm not quite to the point
where I can test it yet, and it may well turn out that it provides only
a
negligible benefit in real-world uses. But at least in theory, there's
a very good reason for it, with or without move emulation.

> [...] It is disturbing to me that the COW infrastructure still exists
> (and even more disturbing that it is still used internally!) when COW
> is supposedly disabled.

You make it sound like I was trying to hide it. The behavior is clearly
explained on the "Copying and Argument-Passing" page of the
documentation. It should probably be on the "threadsafe vs.
copy_on_write" page too, but it didn't occur to me.

The only concrete complaint that people could give me about CoW was
that code using it couldn't be made thread-safe, which is true. So I
made a way to instruct the library to force all integer objects to have
unique storage before they're returned to user code, and made it the
default to prevent unpleasant client-code surprises. The internal code
is single-threaded, and it's completely under my control in any case,
so if I ever find it useful to make it multithreaded someday I can
handle that then. That answered the thread-safety complaint.
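
In terms of the little cow_int sketch earlier in this message, that
"force unique storage on the way out" step is nothing more exotic than
this (a hypothetical helper, not the library's actual function):

    // Before a result is handed back to client code, give it its own
    // storage so the caller never observes any sharing.
    void make_unique(cow_int& n)
    {
        if (!n.magnitude.unique())
            n.magnitude.reset(new std::vector<unsigned int>(*n.magnitude));
    }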

The only other complaints people had were that they didn't like it,
which is not a valid reason to change it, and that move emulation makes
it irrelevant, which it doesn't in this case.

> Based on others' reports on the implementation, there appears to be some
> amount of bloat with certain pieces of the data structure existing
> but remaining unused, depending on which policies are enabled. I
> consider this a weakness of the implementation.

Early in the review I indicated that it was unused, but I was wrong;
it's always used at present.

> Multiplication and square roots (and possibly some other algorithms)
> could likely be drastically improved from a complexity standpoint. I
> know the reasons for not doing this are covered in the documentation,
> but it is still a weakness in the implementation nonetheless.

Yes, though an easily corrected one. There are more important changes
to make, but I'll get to them as soon as I can.
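
For the curious, the usual first step on the multiplication front is
Karatsuba's trick, which replaces the four half-size multiplications of
the schoolbook method with three; applied recursively, it brings
multiplication from O(n^2) down to roughly O(n^1.585). Here is the
identity in toy form on 32-bit operands split into 16-bit halves, just
to show the shape of it (not XInt code; a real implementation recurses
on the digit arrays):

    #include <boost/cstdint.hpp>

    // x*y via three multiplications instead of four. With x = x1*2^16
    // + x0 and y = y1*2^16 + y0, the cross term x1*y0 + x0*y1 is
    // recovered as (x1+x0)*(y1+y0) - x1*y1 - x0*y0.
    boost::uint64_t karatsuba32(boost::uint32_t x, boost::uint32_t y)
    {
        boost::uint32_t x1 = x >> 16, x0 = x & 0xFFFFu;
        boost::uint32_t y1 = y >> 16, y0 = y & 0xFFFFu;

        boost::uint64_t z2 = boost::uint64_t(x1) * y1;  // high halves
        boost::uint64_t z0 = boost::uint64_t(x0) * y0;  // low halves
        boost::uint64_t z1 = boost::uint64_t(x1 + x0) * (y1 + y0) - z2 - z0;

        return (z2 << 32) + (z1 << 16) + z0;
    }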

> The "fast" claim on the primary documentation page is
> unsubstantiated, and, according to Chad, somewhat unsubstantiatable
> (assuming that's a word). I don't think any claims about performance
> should be made *at all* without some kind of benchmark or statistics
> to justify it. Any library will be fast on a sufficiently fast
> (perhaps idealized) computer.

As I've said before, I'm not claiming that it's "faster" or "fastest,"
which are legitimately questionable claims. I'm saying that it's fast,
as a way to conversationally lead in to the explanation that
portability is more important, while reassuring the reader that speed
is still an important design criterion.

In any case, thank you for the review and, though I disagree with many
of them, for the comments as well.

-- 
Chad Nelson
Oak Circle Software, Inc.


