
Subject: Re: [boost] [xint] Boost.XInt formal review
From: Jeffrey Lee Hellrung, Jr. (jhellrung_at_[hidden])
Date: 2011-03-04 04:18:53


On 3/3/2011 10:02 PM, Chad Nelson wrote:
> On Thu, 03 Mar 2011 09:07:12 -0800
> "Jeffrey Lee Hellrung, Jr."<jhellrung_at_[hidden]> wrote:
>
>> On 3/3/2011 6:57 AM, Chad Nelson wrote:
>>
>>> And I thought I had answered it, by moving almost all of the
>>> functions out of the integer_t class. Apparently I misunderstood
>>> your intent, though it's something I'd have wanted to do eventually
>>> anyway.
>>
>> What difference, from the point-of-view of the interface or from the
>> user, does it make if the algorithms are implemented in the integer_t
>> class scope or a detail namespace scope? It seems you misunderstood
>> the desire for decoupling the algorithms from the data structure.
>
> Yes, apparently I did. I wish you'd said something about it when I
> asked for the last preliminary review.

Well all I can say is...I'm not the only one that didn't say something...

In any case, I had thought the intent was clear, based on

http://lists.boost.org/Archives/boost/2010/05/166061.php

and

http://lists.boost.org/Archives/boost/2010/05/166068.php

among other posts.

> It's rather late to make huge
> changes to the library at this point.

I shouldn't think so. Do you feel this way because the library is
currently under review?

>> See also Mathias' email regarding "genericity"; I agree with his
>> evaluation in this regard and won't repeat it.
>
> 'Fraid I have to disagree, at least for now. I'll expand on this in
> replies to the later messages that I see waiting for me.

Okay, looking forward to it.

>> Is this decoupling the "something [you]'d have wanted to do
>> eventually anyway", or are you referring to something else?
>
> I was referring to moving almost everything out of the integer_t class.

Okay, well I guess you've done that, although I'm not sure what it has
afforded us.

>>>> 2) In a similar vein, I would like to see the cryptographic
>>>> functionality decoupled from the xint::integer_t data structure as
>>>> well, and released separately.
>>>
>>> That's one of the most common reasons that people want to use a
>>> large-integer library. Separating those functions from the library
>>> would make it that much harder for potential users to get everything
>>> working.
>>
>> Can you be more specific on *how* it would be "much harder", from the
>> user's perspective?
>
> Not "much harder," but "that much harder." And no, I can't at present,
> as it would depend on exactly how the separate functions are made
> available.

I contend it wouldn't be any harder than using, say, Boost.MPL together
with Boost.TypeTraits. That is, if you design each library with
interoperability with outside modules in mind, there should be little problem.

[...]
>>>> * http://www.oakcircle.com/xint_docs/namespaceboost_1_1xint.html
>>>>
>>>> integer_t<...> square (const integer_t<...> n)
>>>>
>>>> The complexity is just O(n^2). If you want to be more precise, don't
>>>> use O(...), just state the number of additions/subtractions and
>>>> multiplications.
>>>
>>> Are you sure about that complexity? That function doesn't just
>>> multiply a number by itself, it uses an algorithm explicitly
>>> designed for squaring a number, which is more efficient. I think the
>>> complexity on that one is correct.
>>
>> Ummm...yes, the stated complexity is correct, but O(n(n+1)/2) =
>> O(n^2).
>>
>> So asymptotically speaking, it's not any faster than using the naive
>> grade-school multiplication algorithm. Like I said, unless you want
>> to give a precise operation count, it's just O(n^2).
>
> Um... if you plug a concrete number into both equations, the answers
> are quite different.

Um... I'm not really sure how to respond to this. It's not a huge deal
as far as the documentation is concerned, but if you *really* think that
O(n^2) and O(n*(n+1)/2) are different guys, then I don't have a lot of
confidence in the rest of your (as-yet-unchecked) complexity claims...

(And, for what it's worth, Big-Theta statements would be more precise,
where they apply, but that's often how Big-Oh is practically interpreted
anyway.)

>>>> integer_t<...> pow (const integer_t<...> n, const integer_t<...>
>>>> e)
>>>>
>>>> I suspect this complexity is, really, O( # of bits in the result ),
>>>> as the last one or two multiplies are what dominate.
>>>
>>> I'm pretty sure the complexity on that one is correct as well.
>>
>> Yeah, I don't think I'm right, either, but let's analyze this...
>>
>> The asymptotic work in your loop in detail::pow is the same as the
>> following loop (we'll assume that e>>= 1 is a constant-time
>> operation):
>>
>> for(; e != 0; e>>= 1)
>> p = square(p);
>
> Okay so far.
>
>> Let p have np bits. Each successive loop iteration *doubles* the
>> number of bits in p, so the work for each successive iteration is
>> np^2, (2np)^2, (4np)^2, ..., ((2^ne)*np)^2 (there are ne loop
>> iterations, where ne is the number of bits in e).
>
> As far as I can tell, that's correct. ((2^ne)*np)^2 seems to describe
> the total amount of work done.

Well, in the context of the above, it's only the amount of work done for
the last iteration, but it's that last iteration that dominates.

>> The total work is then
>>
>> (1 + 4 + 4^2 + ... + 4^ne) * np^2 = O(4^ne * np^2) = O(e^2 * np^2)
>
> I followed that all the way to the last step, then I got lost.

Here I was just adding up the work done over all ne iterations. In any
case, like I said, the last iteration dominates.

> If...
>
> O(4^ne * np^2) == O(e^2 * np^2)
>
> ...then logically...
>
> 4^ne == e^2
>
> ...but if you plug in concrete numbers, that doesn't work out. If e is
> 0x800 (2048), then...
>
> ne == 12
> 4^12 == 16777216
> 2048^2 == 4194304
>
> ...and 16777216 is not equal to 4194304.

Oh, I see where you're confused. Fortunately, log2(2048) = 11, not 12
:) So that'll bump your 16777216 down by a factor of 4, which should be
just right.

By the way, ne here *must* be log2(e), so literally the number of bits
in e, not the number of chunks, because the number of iterations is
precisely log2(e). You can fudge it with additive constants (e.g., + or
- 1 doesn't matter), but not multiplicative constants (e.g., by using a
different base in the logarithm).

>> By my count, you're drastically underestimating the complexity ;)
>> Let me know if you agree with the above analysis, I could be wrong.
>
> I see that my complexity estimate is much lower than it should be. I
> didn't take into account that the number of bits it deals with doubles
> in each iteration. It looks like 4^ne should replace ne in my estimate,
> would you agree?

Uh, e^2 I think would be clearer, but otherwise, yes.

[...]
>>>> I believe there is an O(n) square_root algorithm that would be
>>>> asymptotically faster, but...I'd have to look it up...
>>>
>>> That's the best one that I was able to come up with in the time I
>>> had. If you can point me to a good description of a more efficient
>>> one, I'll be happy to implement it.
>>
>> I believe this is what I was thinking of, though the description is a
>> bit terse:
>>
>> http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Digit-by-digit_calculation
>
> I don't have time to dig into that in detail tonight, but it looks very
> much like a binary search algorithm, very similar to what I'm already
> using.

In binary, no search needed! See the snippet of code. I, too, don't
have time at the moment to study it in detail, but I was at one time
very familiar with it. And since it only requires bit-shifts and
additions (no multiplications), I wouldn't be surprised if it were
significantly faster than Newton iteration.
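For reference, here's the algorithm transcribed for built-in 32-bit
integers (a sketch only; the real thing would of course operate on
XInt's own digit storage, and isqrt is my name, not XInt's):

```cpp
#include <cstdint>

// Binary digit-by-digit integer square root, per the Wikipedia article:
// only shifts, additions/subtractions, and comparisons are needed.
uint32_t isqrt(uint32_t n) {
    uint32_t root = 0;
    // Highest power of 4 representable in 32 bits.
    uint32_t bit = 1u << 30;
    while (bit > n) bit >>= 2;   // scale down to n's magnitude
    while (bit != 0) {
        if (n >= root + bit) {   // this result bit is a 1
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {                 // this result bit is a 0
            root >>= 1;
        }
        bit >>= 2;               // next lower digit pair
    }
    return root;                 // floor(sqrt(original n))
}
```

Each loop iteration settles one bit of the result, most significant
first, so the iteration count is fixed by the operand's width rather
than by a search.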

>>>> int is_prime (const integer_t<...> n, callback_t
>>>> callback=no_callback)
>>>>
>>>> Given the documentation, I think a more descriptive name, such as
>>>> is_probably_prime_rabin_miller, would be better.
>>>
>>> Maybe, but a lot more typing for an arguable benefit. The
>>> documentation specifies its limitations, I believe that's sufficient.
>>
>> The "arguable" benefits are:
>> - the name is no longer misleading; is_prime doesn't actually check
>> if its input is prime; and
>> - it is clear in what sense the input is determined to be probably
>> prime.
>
> It *does* check whether the input is prime, with a vanishingly small
> chance that it is mistaken.

Lol...okay, I'm glad the Boost.TypeTraits functions don't have a similar
caveat.

>> Again, think about someone *reading* code that uses this is_prime
>> function.
>
> He would expect that it reports whether the input is prime or not,
> which is exactly the case.

Well, no, it reports "whether the input is prime, with a vanishingly
small chance that it is mistaken", which is different. To me. Sorry,
as a mathematician by training, prime and probably prime are distinct
properties and have very precise meanings. But I'm not a cryptographer,
so maybe that's just the lingo.

> With a name like
> is_probably_prime_rabin_miller, he would probably feel obliged to look
> up the Rabin-Miller primality test and figure out what "probably" means
> in this context (at least, I would). I think the shorter name is
> preferable for the reader, as it accurately conveys the author's intent.

Like I said, my crypto is pretty...well...non-existent... So this might
be a dumb question, but am I to assume that whenever one wants to know
that a number is prime, *probably* prime is perfectly okay?

[...]
>> Besides, COW isn't on by default...
>
> It's used internally regardless, as I control everything within the
> library and can ensure that it's used safely. The thread-safe option
> only prevents it for integers returned to the user.

Whoa, wait a second here. I can't help but see a red flag here, so
color me incredulous and possibly very naive. How can copy-on-write
help you internally where just moving the internal data isn't possible?
Can you give an example of the internals where you find copy-on-write
necessary?

>>>> I'm also not a fan of the integer_t::integer_t(const integer_t& b,
>>>> bool force_thread_safety) constructor; isn't there already an
>>>> "ensure_unique" member function already there (in the
>>>> implementation) that ensures an integer_t object has sole ownership
>>>> of its storage? Indeed, looking more at some source files, I would
>>>> guess this is _make_unique. I would consider such a member
>>>> function preferable to this constructor overload, especially given
>>>> how the mysterious boolean parameter looks in client code (see
>>>> integer_t::sign comment above).
>>>
>>> Please see the aforementioned "The threadsafe Template Parameter"
>>> page. That Boolean parameter will be used so rarely that I don't
>>> think it's unreasonable to ask the reader of the client code to look
>>> it up if he runs across it.
>>
>> That page, or your explanation here, doesn't justify why you think
>> this constructor overload is preferable to a make_unique member
>> function. Sorry, "because that's the way it is now" doesn't count ;)
>
> I had a publicly-available make_unique function in an earlier iteration
> of the library, and was criticized for it, so I removed it. I don't
> remember if it was you personally doing the criticizing, but
> you-the-Boost-community can't have it both ways. :-)

Well this tagged copy constructor pretty much amounts to the same thing,
doesn't it? Only less efficient, since it will actually perform a deep
copy in situations where it's not needed, e.g., when the object is
already in the thread you want it to be owned in. And more cryptic in
client code. By the way, is there ever any reason to set the boolean
force_thread_safety to false?

[...]

- Jeff


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk