
Subject: Re: [boost] [xint] Boost.XInt formal review
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-03-03 08:46:41


On 03/03/2011 02:28, Chad Nelson wrote:

> I don't recall the reason for passing by value offhand, but I believe
> Boost.Move was part of it. I'll research that before the next update.

You quote Dave's "Want Speed? Pass by value" article as motivation.

Yet it feels like the conclusion of that article, which can be summed up
as follows, wasn't understood:

If you intend on making a copy of an argument on automatic storage,
prefer letting the compiler do that by passing by value rather than
passing by const-reference and doing the copy yourself.
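
To make that concrete, here is a minimal sketch of the two styles,
written against a hypothetical 'integer' value type rather than XInt's
actual class:

#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical value type standing in for a big integer; not XInt's API.
struct integer {
    std::vector<std::uint32_t> digits;
};

// Const-reference plus an explicit copy: the deep copy is made
// unconditionally, even when the caller passes an rvalue that could
// have been moved from.
void assign_copy(integer& dest, const integer& src)
{
    integer tmp(src);         // always a deep copy
    dest = std::move(tmp);
}

// Pass by value: an lvalue argument is copied exactly once, by the
// compiler, and an rvalue argument is moved in, so no deep copy
// happens at all.
void assign_sink(integer& dest, integer src)
{
    dest = std::move(src);    // steal the buffer we already own
}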

> Eliminating sixteen bits of overhead, on an integer that is almost
> always going to be larger than 64 bits (and often *much* larger), seems
> to be a premature optimization at best.

Consider a language like Python. All integers can be arbitrarily large,
yet in practice, they rarely are.

But then, that's one application.
Can your library let me choose a memory layout that is optimal for my
application? Probably not. How about letting me choose my own layout
and interface with your algorithms?

>> - from what I can see, the statement
>> xint::integer b = a0 + a1 + ... + aN;
>> requires n copies of the underlying memory buffer, even though it is
>> possible to do zero copy without even using expression templates.
>
> It could be optimized for such statements, yes. Can you provide a
> real-world case where more than two or three integers would be summed
> like that

It's just a degenerate example that is easy to play with, since it
involves a single function and N+1 variables, and makes N-1 unneeded copies.

And yes, people don't simply do binary operations and store the results
into variables each time; they *combine* operations and create
temporaries.

> and is common enough to justify the extra work and code
> required for it?

There is no extra code required; on the contrary, it requires *less
code* than your current implementation.

As I said in a part you didn't quote, if you implement operator+ in
terms of operator+= using NRVO or move semantics, you can avoid the
copying altogether.
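
A minimal sketch of that idiom; the 'integer' type here is hypothetical
and its layout is not XInt's:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical big-integer type, just to illustrate the idiom.
struct integer {
    std::vector<std::uint32_t> digits;   // little-endian, base 2^32

    integer& operator+=(const integer& rhs)
    {
        if (digits.size() < rhs.digits.size())
            digits.resize(rhs.digits.size(), 0);
        std::uint64_t carry = 0;
        for (std::size_t i = 0; i < digits.size(); ++i) {
            carry += digits[i];
            if (i < rhs.digits.size())
                carry += rhs.digits[i];
            digits[i] = static_cast<std::uint32_t>(carry);
            carry >>= 32;
        }
        if (carry)
            digits.push_back(static_cast<std::uint32_t>(carry));
        return *this;
    }
};

// Left operand taken by value: an lvalue costs exactly one copy, an
// rvalue (such as the temporary produced by a previous '+') is moved in.
integer operator+(integer lhs, const integer& rhs)
{
    lhs += rhs;    // accumulate into the buffer we already own
    return lhs;    // returning a by-value parameter is treated as a move
}

With that in place, in b = a0 + a1 + ... + aN only the first '+' has to
copy a buffer; every later '+' moves the previous temporary in and
extends it in place.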

Now if you want to make it work with expression templates, that would be
even cooler.
This would solve certain issues with the design of algorithms that work
with arbitrary types too.
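
For what it's worth, here is a deliberately tiny sketch of the
expression-template direction; everything in it ('bigint', 'sum_expr')
is hypothetical and not XInt's code:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct bigint;

// Records "a + b" without computing anything. Chaining (a + b + c + ...)
// would need a further overload that nests these proxies; omitted here.
struct sum_expr {
    const bigint& lhs;
    const bigint& rhs;
};

struct bigint {
    std::vector<std::uint32_t> digits;   // little-endian, base 2^32

    bigint() = default;
    bigint(const sum_expr& e);           // the single evaluation point
    bigint& operator=(const sum_expr& e) { return *this = bigint(e); }
};

// Building the expression allocates nothing and adds nothing.
inline sum_expr operator+(const bigint& a, const bigint& b) { return {a, b}; }

// The sum is materialized directly into the destination's buffer.
bigint::bigint(const sum_expr& e)
{
    const auto& a = e.lhs.digits;
    const auto& b = e.rhs.digits;
    digits.assign(std::max(a.size(), b.size()), 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        carry += (i < a.size() ? a[i] : 0);
        carry += (i < b.size() ? b[i] : 0);
        digits[i] = static_cast<std::uint32_t>(carry);
        carry >>= 32;
    }
    if (carry)
        digits.push_back(static_cast<std::uint32_t>(carry));
}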

>> - unary operator+, operator++ and operator-- return a reference to
>> the object. I don't think that's a good idea, since regular integers
>> don't behave like this.
>
> <http://stackoverflow.com/questions/465517/overloaded-increments-return-value>

My bad, ++i with i an int does return an lvalue and not an rvalue.

>> - I would quite like to get an interface to iterate the integer digit
>> per digit or byte per byte.
>
> Why?

- So that the library is easily extendable
- So that the algorithms only require a range of digits as input if
   that's all they need, allowing them to work with arbitrary types
   of my choosing without having to copy or convert to your types

The main drawback of the library that I can see for now is a clear lack
of genericity.
For me to like it, it needs concepts, generic algorithms, and the
ability to interwork with other types through retroactive modeling.

There is too much coupling between the algorithms and the data, since I
have to use your data container. That forces me to use some specific
layout that may not be good for my use case, and to copy over the data
multiple times when converting from/to your types even though that might
be entirely unnecessary.

As far as I can see (I don't know too much about the algorithms behind
that kind of library), any range could be treated as a big integer, and
all your algorithms are intrinsically capable of dealing with all of them.
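
To make that concrete, here is a sketch (the name add_into and its
interface are mine, not the library's) of a digit-level add written
against plain iterator ranges:

#include <cstdint>
#include <iterator>
#include <vector>

// Adds the digits [first2, last2) into [first1, last1) in place and
// returns the carry out, leaving it to the caller to decide how to grow.
// Any range of 32-bit digits works: vector, plain array, user storage.
template <class InIt, class InOutIt>
std::uint32_t add_into(InOutIt first1, InOutIt last1, InIt first2, InIt last2)
{
    std::uint64_t carry = 0;
    for (; first1 != last1; ++first1) {
        carry += *first1;
        if (first2 != last2) {
            carry += *first2;
            ++first2;
        }
        *first1 = static_cast<std::uint32_t>(carry);
        carry >>= 32;
    }
    return static_cast<std::uint32_t>(carry);
}

int main()
{
    std::vector<std::uint32_t> a = {0xFFFFFFFFu, 1};   // user-owned layout
    std::uint32_t b[] = {1};
    std::uint32_t carry = add_into(a.begin(), a.end(),
                                   std::begin(b), std::end(b));
    // a is now {0, 2} and carry is 0; nothing was converted or copied
    // into a library-specific container first.
    (void)carry;
}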

> If you need it for exchanging information with another system or
> another program, the to_binary function should do the trick. If you're
> looking to make changes to the internals of an integer_t without going
> through its interface, it's not going to happen.

If it's not extendable, generic, or interoperable, what does it bring
that other existing bigint libraries do not?
The Boost logo and license?

There are big players in the field, and if Boost accepts a library that
does something like that, I think it should have significant advantages
over the competition.

Imagine something similar to the mpn component of the GMP library, but
with a generic and easy-to-use C++ interface.
Through genericity, you could make the algorithms accept ranges of
integers and have them work regardless of digit size, with the digit
chosen at the discretion of the user. If the user provides a SIMD type
as a digit, for example, you'll get optimized SIMD code for free, in a
completely externalized way.
Now *that* would be awesome.
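
To sketch what choosing the digit could look like (the names here are
mine, and the carry logic shown assumes an unsigned integral digit; a
SIMD digit would have to plug in its own primitive the same way):

#include <cstdint>

// Single-digit add primitive templated on the digit type, so the user
// picks the digit width.
template <class Digit>
Digit add_digits(Digit a, Digit b, Digit& carry)   // carry is 0 or 1
{
    Digit sum = a + b + carry;
    // unsigned arithmetic wrapped iff the result is smaller than an
    // operand (or equal to it when an incoming carry was consumed)
    carry = (sum < a || (sum == a && carry)) ? Digit(1) : Digit(0);
    return sum;
}

int main()
{
    std::uint32_t c = 0;
    std::uint32_t s = add_digits<std::uint32_t>(0xFFFFFFFFu, 1u, c);
    // s == 0, c == 1; the same code instantiates for std::uint64_t
    (void)s;
}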

Something that doesn't seem much better than the high-level C++
interface provided for the mpz component of GMP? Not so much.

>> - XInt cannot do SBO
>
> A large-integer math library would be able to take advantage of SBO
> only rarely, and every number object would have to carry around the
> overhead, at least in all the SBO designs I've ever seen. If you were
> complaining about sixteen bits of sometimes-unneeded overhead, I'd hate
> to see how you reacted to, for instance, 128.

"sometimes-unneeded"? COW is disabled by default, and should never be
used anyway, since it is a hack that only makes sense in a language
without move semantics.
And this is mostly a problem due to bad code design: there is no reason
why you couldn't get rid of those two words in the non-COW case, or of
the buffer size in the fixed-width case.

See, if data and algorithms were clearly separated, it would be much easier.

Using SBO can make sense, but making objects unnecessarily large with
data that is never used is just a bug.

>> - Fixed-width integers still store the size etc. Ideally,
>> xint::integer<fixed_size<32> > should have no more overhead than
>> int32_t.
>
> If the design were solely or even primarily fixed-width, I'd agree. As
> it's first and foremost a variable-width system, and the fixed-width
> behavior is secondary, I believe writing and maintaining code
> specifically to eliminate a few extra bits for fixed-width integers
> would not be justified.

Fixed-width integers of arbitrary size and big integers have enough
code in common that it would be a shame to design a modern library
without thinking about how both can play into its algorithms.
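
To sketch the kind of separation I mean (all names hypothetical), a
storage policy can carry exactly the bookkeeping its width requires,
and the algorithms can be written once against it:

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-width storage: the digit count is a compile-time constant, so
// no size or capacity field is stored per object.
template <std::size_t Bits>
struct fixed_storage {
    static constexpr std::size_t digit_count = (Bits + 31) / 32;
    std::array<std::uint32_t, digit_count> digits{};

    std::size_t size() const { return digit_count; }
};

// Variable-width storage: the size lives in the object, as it must.
struct variable_storage {
    std::vector<std::uint32_t> digits;

    std::size_t size() const { return digits.size(); }
};

// The algorithms are written once against the common interface.
template <class Storage>
struct basic_integer {
    Storage store;
};

// On typical ABIs sizeof(basic_integer<fixed_storage<32>>) is just
// sizeof(std::uint32_t): no length, no capacity, no COW bookkeeping.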

Something else: I went through the code a bit, and I saw virtual
inheritance. Just why do you need dynamic dispatch in a design where all
information is purely static?
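
If the point is only to share code between variants whose concrete
types are known at compile time, something like the CRTP gives the same
reuse with purely static dispatch; a sketch, with names of my own:

#include <cstddef>

// Shared operations injected into each concrete type; no vtable, no
// virtual bases, every call resolved at compile time.
template <class Derived>
struct integer_ops {
    Derived& self() { return static_cast<Derived&>(*this); }

    bool is_zero()
    {
        for (std::size_t i = 0; i < self().digit_count(); ++i)
            if (self().digit(i) != 0)
                return false;
        return true;
    }
};

struct my_fixed_int : integer_ops<my_fixed_int> {
    unsigned digits[4] = {};
    std::size_t digit_count() const { return 4; }
    unsigned digit(std::size_t i) const { return digits[i]; }
};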

