Boost logo

Boost :

Subject: Re: [boost] [xint] Boost.XInt formal review
From: Stewart, Robert (Robert.Stewart_at_[hidden])
Date: 2011-03-03 12:33:16


Vladimir Prus wrote:
>
> 3. Please explicitly state in your review whether the library
> should be accepted.

There are many documentation issues to be addressed.

There are design issues to address.

My vote is a conditional yes. Details below.

> - What is your evaluation of the design?

I'm concerned about the application of Boost.Move and the continued reliance on COW. I don't know Boost.Move, so I cannot comment on whether it was employed correctly or completely, but enough concern has been expressed that I'm left uncomfortable. It seems that COW has encouraged more use of by-value semantics than would normally be the case. Performance tuning with COW disabled should reveal a lot. If restoring COW is still beneficial after addressing move semantics and other performance tuning, then it may be acceptable. At any rate, such tuning should be done before acceptance because it may affect the interface.

Various functions copy arguments quite inappropriately. Consider operator -=(). The argument should be a reference to const. It is only used to get the quantity to subtract from *this. A copy is not needed. This argument copying makes COW look more necessary than it otherwise would be. Such changes are necessary before acceptance because they affect the decision to keep COW.

I like the direction Mathias Gaunard wants to go, but there's also great value in simply declaring an xint::integer and running with it, especially for those less comfortable with the more generic approach to things. A distinct advantage of Mathias' approach would appear to be the ability to parallelize computations. Perhaps the two directions can be included in one library. IOW, xint::integer might be a type that the library supports, while permitting clients to use the algorithms with arbitrary types. More concretely, perhaps operator +(xint::integer, xint::integer) can forward to the generic xint::add() algorithm thereby hiding the use of the algorithm behind the nice operator interface. That approach could be included in a second version of the library after its release, so I don't see it as preventing acceptance.

> - What is your evaluation of the implementation?

I have problems with some names (called out in my documentation comments below).

The implementation focuses on portability which is of paramount concern in a Boost library. I fully expect that it will need to focus on performance soon after its initial release, however. It will be a ripe target for comparisons with faster C libraries, so it will need to be fast to be as useful as its potential. Custom assembly language, etc., will not detract from portability given conditional compilation and fallback on the current general purpose code. It will just make the implementation uglier.

The integer_t(const charT * str, std::size_t base = 10) and similar constructors seem out of place. Free functions that return an integer_t seem more appropriate and would be in keeping with the strtoX() functions.

There are several functions with bool parameters. These are obscure when used at the call site. Better is to use enumerated types. For example, the integer_t constructor with a bool parameter named force_thread_safety, that parameter can be of the following enumerated type:

   enum storage_management { force_unique_storage, use_shared_storage };

Obviously, that latter change belies the use of COW and, ideally, COW will be eliminated from the implementation. Still, the parameter type is closely related to what the argument controls whereas "force_thread_safety" points to a side effect of the implementation detail selected. That extra argument will be moot if COW is removed, so this comment only applies if COW is retained.

There are a number of implementation details brought into the public access section of integer_t, such as xint::detail::integer_t_data<BOOST_XINT_APARAMS>::data. Those details should be in the private access section.

All references to integer_t<BOOST_XINT_APARAMS> in the integer_t class template definition should be replaced with a typedef to improve clarity in the source.

Is there another purpose for xint::binary_t than as a type from which to construct an integer_t? If not, there's no need for the converting constructor to be explicit.

Why is _make_unique() named as it is? The leading underscore suggests private/implementation detail, but it is clearly public, so the underscore is just odd and confusing.

Why is factorial() a member function when abs(), square(), and others are not? If such functions need access to internals, that suggests the interface is deficient as others writing algorithms will not have the luxury of making their functions members.

There is a great deal of duplication among the string-based constructors. Those functions should reuse a common implementation to avoid code bloat.

The use of imperative programming to control what should be distinguished at compile time is disturbing. The runtime overhead may be provably insignificant, but I'd like to see that proved and documented. For example, most functions contain "if (Nothrow) { ...} else { ... }". That means there is a great deal of code carried in either case that never executes. The branches also mean there is more opportunity for bugs to creep into one variant and not the other. An application that only uses integer_t<options::nothrow> instances, there is a lot of code in those else branches that isn't needed. Far better would be for no_throw_integer to derive from integer_t. Changing this aspect, if performance and binary size tests show it to be important, may affect the set of types or interfaces in the library, and must therefore be done before acceptance.

Code like the following appears in most member functions. Each such case should be factored out into a member function:

   if (Threadsafe == true) { r._make_unique(); data.make_unique(); }

is_odd() and similar member functions are unnecessarily complicated. Here's a simplification:

   template<BOOST_XINT_CLASS_APARAMS>
   bool integer_t<BOOST_XINT_APARAMS>::is_odd() const {
      if (Nothrow && is_nan()) return false;
      return data.is_odd();
   }

> - What is your evaluation of the documentation?

The documentation is the weakest part of this library. There is no helpful flow introducing the concepts in the library. If one clicks through all of the links on the Main Page, one discovers a good deal, but too much of it comes out of order or simply requires user-directed discovery. For example, one of the early links is to the RSA Encryption example, but it relies on an understanding of many parts of the library not previously explained. The documentation should have a sequence of topics/pages to read that logically present the capabilities of the library to be acceptable.

The following are my comments regarding the various pages I studied.

__________
Main Page

The page titles use title case, but the section headings do not. This looks inconsistent when title case links appear within the sentence case sections.

_____
Why would I use it?

The bullets would be better introduced with these shorter, bold-faced keys: Portable, Fast, Provides needed features, Open source, Closed source and commercial software friendly. That eliminates the redundant "Because it's" and otherwise shortens the key points of each bullet.

Bullet 1 is a little verbose. Suggestion: Written in modern C++ with many different operating systems, compilers, and processors in mind, XInt automatically adapts to the most efficient implementation available.

Bullet 2 could be a little less chatty. Suggestion: Portability is favored over performance, so there's no assembly language or other non-portable optimizations, yet performance is still high.

Bullet 2 should probably link to performance comparison data (which I don't find in the documentation).

Bullet 3 deviates from the preceding text by using a staccato style with sentence fragments and self-describing the documentation as "complete and carefully maintained" is excessive. Suggestion: XInt provides fixed and variable length integer types, modular arithmetic, cryptographically secure prime number generation, bit manipulation, and exception- or error-code-based error reporting, all within a friendly and intuitive interface. XInt's code is thread safe, but that can be disabled when not needed to maximize performance.

Bullets 4 and 5 are unnecessary. Those "features" are common to all of Boost code.

_____
How do I use it?

Para 1 is too chatty. Suggestion: Just include <boost/xint/xint.hpp>, declare a variable of type boost::xint::integer, and use that variable as you would an int, but without worrying about overflowing the range! Refer to A Very Simple Example to see this in action. If you need the more advanced features, refer to the reference section.

Why "Stand-Alone Examples" versus "Examples?" There are no other examples listed, so I think the latter is more appropriate.

The items linked in Detailed Usage Information should probably include explanatory text. For example, why do I need to know about the "Not-a-Number" value? If I want to understand how thread safety is implemented in XInt, is "The threadsafe Template Parameter" what I should read? The normal approach to Boost documentation is to provide some random access to topical sections and to provide a means for sequential navigation through the documentation. This lack makes the XInt documentation less consistent and less accessible.

The reference section, mentioned in para 1 of "How do I use it?" is not called out in the apparent table of contents at the bottom of the page. If someone skims the Main Page, it would be easy to miss that link.

You've done a very nice job of formatting Doxygen's output.

__________
A Very Simple Example

Sample program output would be a useful addition. Otherwise, the example is nice. This example should be linked from the first paragraph of the "How do I use it?" section.

__________
Fibonacci Numbers

While this doesn't illustrate much about the use of xint::integer, it is a nice way to show the value of a variable-length integer type. However, this example is marred by not showing the output. Obviously, it is necessary to know the size of unsigned long on the system producing the output, so the program should report the size and maximum value for unsigned long.

__________
RSA Encryption

There should be links to the documentation for fixed-length integers and the "secure" option in the introductory text. xopt::secure, in the example, is not linked to its documentation. The example also uses xopt::negative_modulus which isn't mentioned in the introduction or comments, and is not linked to that option's documentation.

The first BOOST_STATIC_ASSERT disagrees with the comment. The latter says "at least" while the former uses >.

For those unfamiliar with RSA encryption, some explanation of the algorithm, especially as it applies to the computation of SizeT and Bits, would be helpful.

It would be better to make the string formal parameters be references to const.

The code indentation makes the access sections hard to spot.

I found it surprising that the private constructor was implemented in the class definition while all other member functions were not. This was not even apparent at first because the function body was just an extension of the continued initializer list.

The lack of any differentiation in naming among formal parameters, data members, and local variables, makes the code a bit harder to follow. While you may choose another approach, I like using a leading underscore for formal parameters and a trailing underscore for data members. That makes the source of each variable easy to spot when it appears in an expression. This was driven home when I saw calculate_blocksize(n) at the end of the public constructor. I looked through the ctor body to find n's declaration and was surprised to find it to be a data member.

While on the matter of naming variables, "n," "d," and "e" are not descriptive names for data members. Contrast those names with "blocksize." I'll guess that "n" is for "numerator," "d" is for "denominator," and "e" is for "exponent," but the longer names would remove all doubt.

xint::binary_t and to_binary() are used in number_to_binary_string() and binary_string_to_number() with no explanation.

In the public constructor, if you test whether str.good() after the extractions into recordedbits, c1, c2, and c3, then there's no need to initialize those variables and you'd be sure the input was well formed.

encrypt() and decrypt() use xint::powmod() without explanation. BTW, following the powmod link, I found some issues with that function's documentation. The description references "e" and "m" but the arguments are named "exponent" and "modulus." The complexity description is confusing. What are n-sub-n and n-sub-e? I find the inclusion of a testing argument quite odd. Far better would be for powmod(n, exp, mod) to call powmod_impl(n, exp, mod, false), which leaves room to call an undocumented powmod_impl() with true. What's worse is that, in the Remarks, you backhandedly describe the no_monty argument. (If you keep that remark, s/internally, if/internally if/.)

generate() uses xint::strong_random_generator without its mention in the introduction. Add to that the use of the callback function, for which main() passes ::callback(), which is not explained, and this lack of description is disturbing.

generate() uses invmod() without explanation. I also followed the invmod() link and found the description lacking. No doubt if I understood the mathematics involved, I'd understand the description. As it is, I have no idea how n and modulus are used to produce the result.

The last complaint I can levy against this otherwise very nice example, is that you don't call attention to the value of XInt. For those that have implemented such things with built-in types, this would, I'm sure, be plain. However, your example is to illustrate the value of XInt and therefore you should illustrate alternatives or difficulties overcome.

__________
Fixed-Length Integers

This mentions "the integer_t template." This is the first I've noticed your naming convention. Apparently, you're using "_t" as a suffix to mean a class template. That suffix, in my experience, always denotes "type" and usually "typedef." I've never seen it used to indicate "template." My guess -- not having looked at enough of the documentation to know -- is that you wanted to reserve "integer" as a typedef for some specialization of your integer_t class template. I'd rather you used "basic_integer" as the class template name. There's at least precedent for the "basic_" prefix to indicate the class template that should be specialized to get something useful and basic_string/string exactly parallels basic_integer/integer.

s/memory-efficient/memory efficient/

"They also silently discard the upper bits on any value that is too big for them, which can bite the unwary" should be made more prominent. I'd suggest a warning:

WARNING: Fixed-length integers silently discard the most significant bits of any values that overflow the allocated memory.

_____
Note

Para 2 should start with, "By default, when calculating,..." or, "When not using the negative_modulus option with a signed, fixed-length integer...."

__________
The Not-a-Number Value

Bullet 2 is not quite right, I think: s/tries to throw/would otherwise throw/

Last para should probably be, "If you attempt to use a Not-a-Number in any other way, the result is another Not-a-Number, if the result is a Nothrow integer_t, or a not_a_number exception." If that is not correct or complete, please make it so, but don't leave things at "you will get a special value indicating failure."

__________
Zero and "Negative Zero"

Para 1, the last two sentences should be merged. Avoid starting sentences with conjunctions.

Para 2, needs a little rework. Suggestion: The XInt library has limited support for "negative zero" for use in a planned unlimited-precision floating point library to be built on it...."

Para 3, "the sign function" should not all be a link. Only "sign" should be a link.

Para 3, s/parameter true/parameter set to true/

__________
Generating Prime Numbers

Why isn't this in the Stand-Alone Examples section? This would be helpful before looking at the RSA Encryption example. Indeed, it would have explained the use of strong_random_generator and the callback function in the RSA example.

As before, sample output would be helpful.

__________
Exceptions and the nothrow_integer type

The title of this section is not in title case.

Para 1, Sent 2, avoid starting a sentence with a conjunction; you missed the inefficiency concern of exceptions, too. Suggestion: Exceptions may make your code harder to follow or write, and can certainly affect efficiency.

Para 2, references "nothrow_integer" and "xint::integer" suggesting that the former is in the global namespace.

Reading this page gives me the impression that the "Nothrow variants of integer_t" mentioned in the The Not-a-Number Value page are really "nothrow_integer." If so, please be consistent and link to this page from the The Not-a-Number Value page.

Para 2, uses that vague "special value indicating failure" which is "most often" NaN. This needs to be documented more specifically and clearly.

__________
The on_exception Function And -fno-exceptions Support

Para 2, s/record any error/record an error/

There are two features of the library confusingly conflated in one page. Suggestion:

"The on_exception Function

"The XInt library calls a function whenever it encounters an error that would normally throw an exception. By calling [on_exception], you can customize the behavior in such cases. The installed handler function is called with the file name and line number at which the error occurred, and the exception object that would normally be thrown. If the handler returns, then XInt will deal with the error normally. That means it will throw the exception or call abort() (see [GCC's -fno-exceptions Support] for details).

"GCC's -fno-exceptions Support

"The XInt library includes support for GCC's -fno-exceptions option. When compiled with that option, a program cannot use try, throw, or catch statements. XInt will call abort() instead of throwing an exception to indicate errors unless a user-supplied handler is installed via [The on_exception Function]. If the handler returns, XInt will call abort() anyway. (That means the handler can be used to report the error before XInt calls abort().)

"It is possible to use setjmp()/longjmp() to recover from errors when using this feature, but note that doing so will often result in memory leaks because no destructors or other cleanup code will run.

"To enable this feature, define the preprocessor macro, BOOST_XINT_NO_EXCEPTIONS, before including any XInt library header."

__________
The XInt Random Number System

The names of the various XInt types mentioned on this page should be linked to their documentation.

The linked titled, "the Prime Numbers page," is actually a reference to the Generating Prime Numbers example.

__________
The threadsafe Template Parameter

This page discusses "threadsafe" and "copy_on_write" so the title is misleading.

Para 2 could be simplified. Suggestion: If an application never uses an integer object or its copies outside the thread that created them, it is safe to use in a multithreaded application.

__________
Copying and Argument-Passing

This is a great description of the subject and provides a proper context for what's in The threadsafe Template Parameter. I suggest that the content of the latter be moved to this page.

__________
A Note on Algorithmic Complexity

The contents of this page need to be more visible. I'd hate to suggest that you link it from each function description with a complexity expression, but at the very least, the Main Page should probably contain this text since you don't know how early users will encounter such expressions following the other links. (See my comments above regarding confusing complexity expressions which this text would partly answer.)

__________
The Self-Testing Program

This should probably be called the "library validation program." However, the question is whether or when a user would run this program. Certainly there's value in the test program to show how to use the library, provided the code is well commented, but as a validation suite, I'd expect Boost's testing to be sufficient.

__________
Rationales

This should be called "Rationale."

_____
Then why didn't you follow those exactly?

What is intended by "those" and "them?" Presumably, this content is meant to answer part of the first question, "Why did you do X in the interface?" It would be far better included in that page. Still, the use of "those" and "them" is confusing. If I understand your intent, this would work better:

"Unfortunately, the XInt library was designed before I knew about n1744 and I adapted the library after the fact. This has lead to the following inconsistencies:

" - There are no mathematical primitive functions for long long, as suggested in n1744, partly because long long is not yet standard C++ but mostly because I don't see the need. The conversion constructors are efficient enough for smaller values that there wouldn't be much noticeable benefit from those functions.

" - There is no multiplication scheme with better performance than the "basecase" mentioned in §3.4 of n1692, primarily because of lack of time to research them all. Hopefully, few will use the library with numbers large enough to find the lack of more exotic algorithms overwhelming. I plan to add at least one alternative scheme in the future."

_____
Why use copy-on-write? I've heard that causes problems with multi-threaded code.

This is a better explanation than you provide in The threadsafe Template Parameter.

_____
What's with the nothrow_integer type?

Last para is phrased awkwardly. Suggestion: This design means only nothrow_integer code must check for the Not-a-Number value, so any speed penalty from doing so only applies to code using the nothrow_integer type. This also preserves the no-exceptions behavior...."

_____
I've got a great idea! Why don't you do X?

As phrased, this was probably good while you were developing XInt for Boost inclusion. Now that you're setting up XInt as a Boost library, this is no longer appropriate. The correct response is to submit a Trac ticket requesting the feature, possibly after asking for and discussing the feature on the Boost Users or Developers mailing lists.

__________
Revision History

I don't see the value of this in a Boost library's documentation. If XInt is accepted, such history will be part of Trac and will be excerpted into the Boost release notes.

__________
n/a

s/--/&mdash;/ throughout

__________
Class List

Note all classes have a description. Even the exception classes would do well to have a brief description.

_____
divide_by_zero

It appears you've written this class with a single constructor. I think providing a default constructor and one taking a non-default string can allow for more efficient code. If there are many cases of using the default value in the existing ctor, the compiler will insert string construction code throughout the library and, therefore, application. With a default ctor that constructs the std::invalid_argument base subject with "divide by zero error", the compiler may choose to not inline that ctor so each use will refer to a single implementation of the string construction code.

_____
integer_t

"This class implements the standard arbitrary-length integer type" suggests std::integer_t rather than boost::xint::integer_t. That is, the use of "standard" is confusing.

The description is incomplete in that it says that "most of [the template] parameters must be specified via helper classes." There are two problems here. First, the declaration shown is "template <...>" so the list of template parameters is hidden. Second, what are the template parameters that are not "specified via helper classes defined in the boost::xint::options namespace?"

In the string-based constructor descriptions, s/base-/base /.

_____
integer_t(const integer_t<...> & b, bool force_thread_safety = false)

The description is "This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts." Doxygen offers no guarantee about the order in which member functions are listed. Indeed, in this case, the constructor that appears above this one is the default constructor. Even were the documentation order as expected, a user that wishes to understand this particular constructor must find the other to which you refer in order to get a proper description. Please duplicate all such descriptions rather than rely on such indirection.

The name of the formal parameter is baffling. I often use "that" or "rhs" for copy constructors (though I also use a leading underscore), but even without going in those directions, how about "i?"

_____
integer_t(BOOST_XINT_RV_REF(type) b)

There is no description for this constructor. (The parameter name comment applies here, too, of course.)

_____
integer_t::hex_digits(), is_even(), is_odd(), sign()

The description states that, "The nothrow version returns zero instead of throwing." There's no mention of any exception being thrown. Under what circumstances might that happen? Does "the nothrow version" mean when integer_t is instantiated with xint::options::nothrow? It would be better to say so explicitly rather than leave the reader to wonder if "nothrow" might mean an empty throw specification.

__________
boost::xint::options Namespace

The classes with a brief description starting with "Make" should use "Makes" to be consistent with the rest. (See fixedlength, for example.)

_____
negative_absolute

This name is odd. I'd have expected "absolute_values" or something of that sort. It looks like you tried to create a family of options all named with the "negative_" prefix. I think negative_handling<option> would be a better way to handle that.

_____
negative_exception

The brief description is, I think, wrong: s/unsigned/negative/.

_____
negative_zero

"negative_to_zero" would be more descriptive.

_____
nothrow

Using this option doesn't disable exceptions from the library so much as disable exceptions when using an instance specified with that option. Therefore, change the brief description to something like, "Disables exceptions."

_____
threadsafe

The short description breaks the pattern. It should be more like, "Makes an integer type that is safe to use in multiple threads."

_____
Detailed Description

Why is integer_t's allocator being described in the namespace documentation?

> - What is your evaluation of the potential usefulness of the library?

Extremely useful.

> - Did you try to use the library? With what compiler? Did you have any problems?

No.

> - How much effort did you put into your evaluation?

I spent several hours digging through the documentation and looking through the code.

> - Are you knowledgeable about the problem domain?

No.

_____
Rob Stewart robert.stewart_at_[hidden]
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk