|
Boost : |
Subject: Re: [boost] [xint] Boost.XInt formal review
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-03-05 12:57:56
(I apologize for getting to this message out of order, but it had a lot
of information and I wanted to give it the attention it deserved.)
On Thu, 3 Mar 2011 12:33:16 -0500
"Stewart, Robert" <Robert.Stewart_at_[hidden]> wrote:
>> 3. Please explicitly state in your review whether the library
>> should be accepted.
>
> There are many documentation issues to be addressed.
>
> There are design issues to address.
>
> My vote is a conditional yes. Details below.
Thank you for your review. I'll address your comments below.
>> - What is your evaluation of the design?
>
> I'm concerned about the application of Boost.Move and the continued
> reliance on COW. I don't know Boost.Move, so I cannot comment on
> whether it was employed correctly or completely, but enough concern
> has been expressed that I'm left uncomfortable.
Unless I'm forgetting something (which is entirely possible, given my
mental state over the last few days), no one has provided any concrete
suggestions or criticisms about my use of Boost.Move. The copy-on-write
stuff is a different story, of course.
> It seems that COW has encouraged more use of by-value semantics than
> would normally be the case. Performance tuning with COW disabled
> should reveal a lot. If restoring COW is still beneficial after
> addressing move semantics and other performance tuning, then it may
> be acceptable. At any rate, such tuning should be done before
> acceptance because it may affect the interface.
I fully agree, since the review has turned up at least one issue that
will likely make a big difference to the performance numbers.
> Various functions copy arguments quite inappropriately. Consider
> operator -=(). The argument should be a reference to const. It is
> only used to get the quantity to subtract from *this. A copy is not
> needed. This argument copying makes COW look more necessary than it
> otherwise would be. Such changes are necessary before acceptance
> because they affect the decision to keep COW.
Already on the to-do list.
> I like the direction Mathias Gaunard wants to go, but there's also
> great value in simply declaring an xint::integer and running with it,
> [...] IOW, xint::integer might be a type that the library supports,
> while permitting clients to use the algorithms with arbitrary types.
> [...]
Once the internals are stable, I plan to provide something like that.
Until then, it would simply make some improvements impossible without
breaking existing client code.
> The implementation focuses on portability which is of paramount
> concern in a Boost library. I fully expect that it will need to focus
> on performance soon after its initial release, however. It will be a
> ripe target for comparisons with faster C libraries, so it will need
> to be fast to be as useful as its potential. Custom assembly
> language, etc., will not detract from portability given conditional
> compilation and fallback on the current general purpose code. It will
> just make the implementation uglier.
And require time. XInt is still very young.
> The integer_t(const charT * str, std::size_t base = 10) and similar
> constructors seem out of place. Free functions that return an
> integer_t seem more appropriate and would be in keeping with the
> strtoX() functions.
I tried various forms of that in earlier incarnations. The results
varied between aesthetically displeasing and downright ugly.
Since I had to_string and to_binary, a set of to_integer functions would
have been ideal. But to *what* integer? The type is a template with many
options. The best I could come up with would have required client code
like this:
value = to_integer<integer>(...)
Making those constructors eliminated the need to specify a type
parameter to a function, allowing this instead:
value = integer(...)
Why force the person using the library to type both every time?
> There are several functions with bool parameters. These are obscure
> when used at the call site. Better is to use enumerated types. [...]
Noted. If COW survives, I'll do that.
> There are a number of implementation details brought into the public
> access section of integer_t, such as
> xint::detail::integer_t_data<BOOST_XINT_APARAMS>::data. Those details
> should be in the private access section.
If they were, I'd have to make a lot of other functions in the public
interface into friends. A publicly-available, but
obviously-for-internal-use-only _data function seems the lesser evil.
> All references to integer_t<BOOST_XINT_APARAMS> in the integer_t class
> template definition should be replaced with a typedef to improve
> clarity in the source.
Is that possible? I'm pretty sure I tried it, in an attempt to
de-uglify the documentation, and the compiler wouldn't accept any form
of it. The BOOST_XINT_APARAMS macro, and similar ones, were the best
compromise I could come up with. If you're referring to using a #define
instead, as I've done with some of the internal template classes, that
would cause Doxygen to show useless information.
> Is there another purpose for xint::binary_t than as a type from which
> to construct an integer_t? If not, there's no need for the converting
> constructor to be explicit.
Changed for the next release.
> Why is _make_unique() named as it is? The leading underscore suggests
> private/implementation detail, but it is clearly public, so the
> underscore is just odd and confusing.
Like _data, it is meant for use by library functions only, as a
less-evil alternative to making all of them friend functions.
> Why is factorial() a member function when abs(), square(), and others
> are not? If such functions need access to internals, that suggests
> the interface is deficient as others writing algorithms will not have
> the luxury of making their functions members.
For essentially the same reason that I'm using constructors instead of a
to_integer function: anyone using them would have to type out the
returned integer's type anyway. From the heading for that section of
functions:
Static Member Functions
These are functions that return an integer_t, but don't take one as
a parameter. To use them as free functions, you would have to
specify the return type anyway, so they're static member functions
of the type instead.
It's the difference between...
value = integer::factorial(...)
...and...
value = factorial<integer>(...)
The same amount of typing in this case, but I felt that the former
syntax was preferable.
> There is a great deal of duplication among the string-based
> constructors. Those functions should reuse a common implementation
> to avoid code bloat.
I'm sorry, I don't understand. The parts that can be factored out have
been, to the BOOST_XINT_RAWINT::from_string functions, where two of the
three overloads call the third to do the common work.
> The use of imperative programming to control what should be
> distinguished at compile time is disturbing. The runtime overhead may
> be provably insignificant, but I'd like to see that proved and
> documented.
The runtime overhead should be nonexistent -- see the next point.
> For example, most functions contain "if (Nothrow) { ...} else
> { ... }". That means there is a great deal of code carried in either
> case that never executes.
In the source, yes. In the executable code, I expect the compiler will
eliminate code that is trivially proven to be unused (via "dead code
elimination"). Every compiler I've ever looked at the assembly output
from, up to and including the $20 Mix Power C compiler I started
learning on in the late eighties, has had that capability; I doubt any
compiler could be sold today without it.
> The branches also mean there is more opportunity for bugs to creep
> into one variant and not the other.
That's why I've arranged them the way I have -- so I can see all the
code needed for each variant in one place, and not miss updating one
because I forgot to look for it.
> An application that only uses integer_t<options::nothrow> instances,
> there is a lot of code in those else branches that isn't needed. Far
> better would be for no_throw_integer to derive from integer_t.
I had that in previous iterations, and was told that since they were so
similar, I had to make them all a single class if I wanted the library
to be accepted.
> Code like the following appears in most member functions. Each such
> case should be factored out into a member function:
>
> if (Threadsafe == true) { r._make_unique(); data.make_unique(); }
If it were made into a member function, it couldn't be trivially
detected as dead code and eliminated in variants that don't use it, and
some compilers might not be smart enough to follow the function call.
That's a feature, not a bug. :-)
> is_odd() and similar member functions are unnecessarily complicated.
> Here's a simplification:
>
> template<BOOST_XINT_CLASS_APARAMS>
> bool integer_t<BOOST_XINT_APARAMS>::is_odd() const {
> if (Nothrow && is_nan()) return false;
> return data.is_odd();
> }
Agreed, for is_even and is_odd. They've been changed. I didn't see any
others in a quick scan of the code.
>> - What is your evaluation of the documentation?
>
> The documentation is the weakest part of this library. There is no
> helpful flow introducing the concepts in the library.
Unfortunately, the majority of concepts are either trivial to any
reader with a grade-school education (addition, multiplication,
etcetera), or far too complex to provide any sort of tutorial for in a
library (like prime number theory). If you can point out any in
between, I'll be happy to document them, I'm too close to the subject
to necessarily know what's not trivial.
> If one clicks through all of the links on the Main Page, one discovers
> a good deal, but too much of it comes out of order or simply requires
> user-directed discovery. For example, one of the early links is to
> the RSA Encryption example, but it relies on an understanding of many
> parts of the library not previously explained. The documentation
> should have a sequence of topics/pages to read that logically present
> the capabilities of the library to be acceptable.
If you'll provide a point-by-point outline of what you'd like to see,
I'll be happy to write it.
> The following are my comments regarding the various pages I studied.
> [...]
Thank you. I note that many of these comments are simply stylistic
choices, which are as personal as a developer's choice of text editor.
I'll comment on a few of your comments below. Please take the others as
noted, and if clear improvements over what's already there, acted on.
> Why would I use it?
>
> [...] Bullet 2 should probably link to performance comparison data
> (which I don't find in the documentation).
You don't find it because it's not there, deliberately. :-) As I said
earlier today, there isn't much point to comparing a child's speed to
that of an Olympic gold-medalist's.
> Bullet 3 deviates from the preceding text by using a staccato style
> with sentence fragments and self-describing the documentation as
> "complete and carefully maintained" is excessive. [...]
I spend a great deal of time on the documentation, and not to put too
fine a point on it, it's an order of magnitude better than that of even
a few accepted Boost libraries (not naming any names, I don't have the
time to contribute patches to them so I don't think complaining about
them is fair). "Complete and carefully maintained" is, in my biased
opinion, accurate, despite the current limitations you've identified
(and which I'll be remedying).
> Bullets 4 and 5 are unnecessary. Those "features" are common to all
> of Boost code.
Unnecessary once it's an official Boost library, but until then,
sometimes you've got to toot your own horn to show people that you've
got one.
> [...] You've done a very nice job of formatting Doxygen's output.
Thanks, I've tried to pay careful attention to it.
In the RSA example...
> The code indentation makes the access sections hard to spot.
I'm not sure what you mean by "the access sections," could you clarify?
> While on the matter of naming variables, "n," "d," and "e" are not
> descriptive names for data members. Contrast those names with
> "blocksize." I'll guess that "n" is for "numerator," "d" is for
> "denominator," and "e" is for "exponent," but the longer names would
> remove all doubt.
Unfortunately, those are the variable names traditionally used in most
RSA descriptions, so changing them would make it harder for someone to
follow the encryption side of it. I've added comments explaining their
purpose, as well as other general improvements to that example, most of
which were based on your comments.
If you're curious, 'n' is used as the modulus for the public and
private keys; 'd' is the decryption (i.e. private) key component; and
'e' is the encryption (i.e. public) component of the key.
> Exceptions and the nothrow_integer type
>
> [...] Para 1, Sent 2, avoid starting a sentence with a conjunction;
> you missed the inefficiency concern of exceptions, too. Suggestion:
> Exceptions may make your code harder to follow or write, and can
> certainly affect efficiency.
While that's true, XInt's nothrow_integer type does nothing about the
efficiency problem in the current iteration. It simply operates by
catching such exceptions internally. In a later version, if there's
enough call for it, I'll change that.
> Reading this page gives me the impression that the "Nothrow variants
> of integer_t" mentioned in the The Not-a-Number Value page are really
> "nothrow_integer." If so, please be consistent and link to this page
> from the The Not-a-Number Value page.
nothrow_integer is an example of a Nothrow variant, but not the only
possible example.
> integer_t(const integer_t<...> & b, bool force_thread_safety = false)
>
> The description is "This is an overloaded member function, provided
> for convenience. It differs from the above function only in what
> argument(s) it accepts." [...]
That wording comes from Doxygen, when you tell it that a function is an
overload via the \overload command.
> [...] a user that wishes to understand this particular constructor
> must find the other to which you refer in order to get a proper
> description. Please duplicate all such descriptions rather than rely
> on such indirection.
Sorry, but I'm confused yet again. All of the constructors already
include unique descriptions. The \overload command is there only to
draw attention to the fact that there are other functions that share
that name.
> integer_t::hex_digits(), is_even(), is_odd(), sign()
>
> The description states that, "The nothrow version returns zero instead
> of throwing." There's no mention of any exception being thrown.
> Under what circumstances might that happen? [...]
Not-a-number, for instance. And any object can throw an out-of-memory
exception.
-- Chad Nelson Oak Circle Software, Inc. * * *
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk