|
Boost : |
Subject: Re: [boost] [xint] Boost.XInt formal review
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-03-07 02:13:03
On Sun, 06 Mar 2011 18:35:47 -0800
"Jeffrey Lee Hellrung, Jr." <jhellrung_at_[hidden]> wrote:
>> XInt's existing interface could be frozen at its current state with
>> no problem. Even the removal of the copy-on-write stuff would only
>> result in one function having an unused Boolean parameter. Any
>> additional functionality can easily be added in a way that doesn't
>> break existing client code.
>
> The removal of the COW stuff would result in a huge interface change,
> as it implies a particular usage of xint::integer_t objects. [...]
It might be a change to how people use it, but that isn't a change to
the interface. Code that relies on the copy-on-write behavior may not
work as fast without it, but it *would* still compile and run with no
problem. That's an acceptable cost, in my opinion.
>>> - is_odd, is_even, sign, hex_digits, is_nan: These are all member
>>> functions with no same-named free function...which makes me wonder
>>> why these were chosen to be member functions while the likes of
>>> getbit, setbit, is_prime, etc. were chosen to be strictly free
>>> functions.
>>
>> Because n1692, the paper presented to the C++ Standards Committee in
>> 2004 that I based XInt's interface on, has them there. I saw no point
>> in cluttering the interface by adding free functions that just call
>> those, although if that would satisfy this objection, it can easily
>> be done.
>
> I see. But you understand that their presence as member functions
> makes one wonder about the asymmetry, right? I don't see why is_odd
> is any more fundamental than getbit; indeed, getbit would seem to be
> more fundamental than is_odd or is_even, no?
It's only fundamental because that paper specified it. I'm not
particularly attached to that decision; my only purpose in using that
interface was to have drop-in compatibility with any code that might
have already been written to use it. I have yet to hear of any such
code.
>>> In any case, as far as I know (that is, Chad hasn't given me a
>>> specific example), XInt doesn't use the parameter deduction feature
>>> of Boost.Parameter,
>>
>> integer_t_data should clear that up. If I understood your explanation
>> of what you meant, I don't think the class could accept an allocator
>> in any arbitrary position without it, though I could well be wrong.
>
> I see now. It doesn't look like you're using the keyword and tag
> structure outlined here:
>
> http://www.boost.org/doc/libs/1_46_0/libs/parameter/doc/html/index.html#parameter-enabled-class-templates
>
> Could you explain why not?
As I recall, I tried that. I don't remember why I decided to change it,
just that there was a compelling reason. I believe it had something to
do with the fact that most of the parameters are simple Booleans --
their mere existence conveys all the information they need, forcing
someone to specify true or false as well seemed pretty inelegant when
there was an easy way to avoid it.
> Using Boost.Parameter-style keywords and tags might obviate the need
> for most of the parameter deduction infrastructure you have at line
> 220 in options.hpp.
It might, but why would I want to? That's a powerful feature, and
requiring a slight change to a few pieces of code that might use a
different BOOST_PARAMETER_MAX_ARITY setting is a small price to pay for
it.
>>> The random number stuff and the cryptographic-specific stuff should
>>> not be included within XInt. These should be left to other libraries
>>> (Boost.Random in the former case).
>>
>> It is. Of the two random-number classes in the code, one is just a
>> wrapper around a fast Boost.Random class that provides a default
>> value for convenience, the other implements the same abilities as
>> Boost.Random's random_device class in a form that's slightly easier
>> to work with in the examples (Boost.Random couldn't handle that at
>> all under Windows when I originally wrote it). Client code must
>> specify a random number generator for the few (two, I think)
>> functions that can use one, and they work equally well with
>> Boost.Random's classes.
>
> Are these localized to the examples, then, and hence not included by
> default?
It can and probably should be. It isn't at present, because they used
to be needed as defaults and I hadn't though to remove them yet. I'll
put it on the to-do list.
>> I'm not wedded to the default_random_generator. It's no longer needed
>> by any code in the library, and could be deleted without any other
>> changes. But the strong_random_generator is significantly more
>> convenient for the examples, and somewhat more convenient for casual
>> client code use, as it doesn't require a compiled library to use as
>> Boost.Random's does.
>
> Perhaps you should suggest adding strong_random_generator to
> Boost.Random library, then.
It already has random_device for that purpose. I understand why Steve
Watanabe chose to implement it in a compiled library instead of the
header-only one, but it makes it harder to use it in another library's
example code.
>> The random number functions in the library are there to make it
>> easier to generate large random numbers, which of course
>> Boost.Random can't natively support. That's a legitimate
>> mathematical purpose, one that would have to be reimplemented by a
>> significant number of programs using the library if it were removed.
>
> Okay, I understand. Do you basically just need to fill a range of
> digits with random numbers?
That, setting the highest requested bit (often desired to get the exact
size the developer wants), and clearing any bits above that, are the
most important parts. Fiddly bit twiddling that you or I might be
comfortable with, but most casual developers aren't.
> Perhaps this is something that Boost.Random can make easier (although
> it isn't too hard at present).
It could possibly be modified to help with the first step, not really
with the rest unless Steve wants to add XInt-specific functions.
>>> The fixed-size integers leave something to be desired, but as Chad
>>> himself has emphasized, they are second-class citizens. So-called
>>> second-class citizens probably should not included within the
>>> library, and if they are, not without huge warnings saying something
>>> like "EXPERIMENTAL" or "SUBJECT TO CHANGE". [...]
>>
>> Eh? Only the internals of them are subject to change. The current
>> interface can be preserved without any loss of functionality, and
>> added onto without breaking anything.
>
> I guess I kind of expected a breaking change if you added a
> 2's-complement-based implementation, which would seem reasonable.
> [...]
As I envision it, two's-complement would be an added option. The
interface wouldn't need to change at all for existing code.
> Perhaps the fixed integer stuff *should* actually be separated into a
> different submodule altogether? I don't know, I seem to remember
> that's how you had it originally, but there were arguments in favor of
> merging everything into one integer_t class...?
Exactly.
>>> Chad already knows my feelings on COW, but to be explicit: I'm not
>>> necessarily against use of COW in general, but I have yet to be
>>> given a use case where COW would outperform move emulation in
>>> theory.
>>
>> I've located two: abs and negate. I believe those were the functions
>> that originally prompted me to use it. I'm sorry that I didn't
>> mention them before; I knew I had a reason for going to all the
>> trouble of using it, but I didn't recall what that reason was until
>> I saw those while working with the code today.
>>
>> Think about it. Both involve changes to the sign bit only, and one
>> only sometimes needs to alter it. It would be foolish to deep-copy a
>> ten-thousand-bit or larger number just to flip the sign bit. Even
>> worse, not to do anything to it at all, in the case of abs and
>> already-positive numbers.
>
> Great example. Abs and negate can just return appropriate views
> layered on top of the object at *less* cost than COW. [...]
If you go back and look at some of the earlier versions (I think you
can get them through the sandbox, otherwise I can provide them from
my own repository), you'll see that I've tried that too. I don't
remember why I abandoned the idea, but I do recall that the results
weren't pretty, which would certainly have contributed to the decision.
>> The only concrete complaint that people could give me about CoW was
>> that code using it couldn't be made thread-safe, which is true. So I
>> made a way to instruct the library to force all integer objects to
>> have unique storage before they're returned to user code, and made
>> it the default to prevent unpleasant client-code surprises. The
>> internal code is single-threaded, and completely under my control in
>> any case if I find it useful to make it multithreaded someday, so
>> that answered the thread-safety complaint.
>>
>> The only other complaints people had was that they didn't like it,
>> which is not a valid reason to change it, and that move-emulation
>> makes it irrelevant, which it doesn't in this case.
>
> It's not thread-safe without atomic operations (maybe adds overhead
> in single-threaded environments?);
If you can show me one place where the existing design isn't
thread-safe when used with the threadsafe option, I'll eat my keyboard,
without salt. I put a lot of careful thought into it, and barring an
implementation bug like me forgetting to call _make_unique in a
function I should, I'm willing to swear that it's bulletproof.
> it adds space and time overhead (probably negligible except on small
> integers);
True enough.
> it complicates the implementation (subjective);
True, but so long as I (or other developers hacking around on XInt,
rather than just using it) are the only ones dealing with that, I don't
think that should matter.
> as far as I've seen, it is *still* unnecessary, in the sense that it
> doesn't give any real performance benefits;
That's still undecided.
> and it encourages a style of programming incompatible with the
> standard containers (entirely subjective whether that's good or bad,
> I admit).
If you count std::string as a container, then it's not incompatible
with all implementations of them. :-)
> Further, even if COW has some benefits over a non-COW implementation,
> it should probably be built as another layer on top of a non-COW
> implementation, as you can't go the other way around. COW shouldn't
> be used at the lowest implementation level.
Possibly. If CoW survives the tests after I've finished updating the
code, we can debate that.
>>> The "fast" claim on the primary documentation page is
>>> unsubstantiated, and, according to Chad, somewhat unsubstantiatable
>>> (assuming that's a word). I don't think any claims about
>>> performance should be made *at all* without some kind of benchmark
>>> or statistics to justify it. Any library will be fast on a
>>> sufficiently fast (perhaps idealized) computer.
>>
>> As I've said before, I'm not claiming that it's "faster" or
>> "fastest," which are legitimately questionable claims. I'm saying
>> that it's fast, as a way to conversationally lead in to the
>> explanation that portability is more important, but reassure the
>> reader that speed is still an important design criteria.
>
> Simply put, it is a vacuous claim without further justification.
I've just realized why I keep butting heads with people here over that
line. Most of you guys probably work for companies that are big enough
that you never have to step outside of coding, or you're employed in
another field and program as a hobby. But for many years now I've
worked for a (very) small software company. We've never had more than
three people on the payroll at once, and most of the odd jobs fall to
me -- including marketing. It didn't come naturally to me, I was
dragged into learning it kicking and screaming and I don't claim to be
particularly good at it, but I've done it long enough that it's second
nature now.
Before you say that marketing doesn't belong in the documentation for a
Boost library, step back and think about it. When a developer needs a
library, what does he do? If he's like most, he has three steps:
* He compiles a list of the libraries that sound like they might do
what he's looking for.
* If there are more than a handful, he then glances at each one quickly
and throw out the ones that obviously aren't suitable. (This might
be combined with the first step.)
* He settles down to seriously evaluate the handful that survived that
winnowing.
It's essentially the same as hiring people for a job. If you're a
business owner and you advertise a position, you collect resumes. If
you get more than a handful, you quickly glance at each looking for
excuses to throw them away -- this guy doesn't know Java, that one
claims BASIC as his sole development qualification, etcetera -- getting
ever pickier, until you're down to a manageable level. Then you
interview those that survived. Candidates will stand or fall on their
own merits in the third step, the trick is surviving to get that far.
That whole introductory section is aimed at doing just that. The
keywords in it will catch the attention of search engines, which will
then be more likely to mention it to people searching for such
libraries. The first thing they'll do is glance through the index page
of the documentation, which will immediately show them what the library
is intended for, and hopefully put it on their list. During the
winnowing, it tells them everything they need at a glance; if it's not
suitable, they can quickly drop it, but if it is, it gives them enough
information to keep it in the running. The rest of the documentation is
for the third step.
My goals are to make the library both portable and fast, with
portability taking precedence where there's a conflict. That specific
line is a non-technical way to convey that goal to my target audience,
developers looking for a quick and easy way to do large-integer
calculations in a C++ program. Mostly casual developers, because that's
the vast majority of developers out there, but also professionals like
myself who need it for commercial projects.
Call it fluff if you'd like, but as much as we might wish otherwise,
marketing is necessary. Especially in a field like XInt's, where there
are tons of low-quality hobby libraries and only a few free,
high-quality ones. I want it to be used, and with that section it has a
fighting chance.
-- Chad Nelson Oak Circle Software, Inc. * * *
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk