
Subject: Re: [boost] Large Integer Library
From: Michael Tryhorn (mjtryhorn_at_[hidden])
Date: 2012-07-02 15:54:37


Phil/everyone,

> It would have been great if you could have joined the discussion a couple
> of weeks ago.

I wish I had!

> The storage of these types is just the actual numeric data, so I should be
> able to do binary I/O on them etc., which is good. I would perhaps be a bit
> more confident of this if it were just an array of int32s or int64s, rather
> than the current recursive organisation.

I understand your trepidation, but really there should be no reason why
binary I/O would not work, provided that the bottom-level types are built-in
integers (or an equivalent) of the word size or greater. If smaller integer
types are mixed in, the compiler could add padding. There is also the
question of big- vs. little-endian byte order, which will affect the needed
declaration order.

> You're using 2's complement, which I believe is the right choice.

Agreed. It both makes the calculations simpler and makes large_int closer to
the built-in types.

> The simplicity of the code means it could be relatively bug-free, if we're
> lucky.

This is my goal in whatever I write, as frankly I don't like testing!
However, Boost.LargeInt has a complete test suite, and I have tried to
attain 100% coverage therein.

> Bitwise multiplication and division is completely inappropriate.

Agreed, but used for lack of knowledge of a better alternative. I'd be happy
to collaborate with anyone more experienced in this particular area.

> You have an idiom of working on a temporary and then copying the result to
> *this to "commit" it ...

Actually, hard to believe as it may seem, at high levels of optimisation
('-O3' in GCC) this sort of thing *can* produce faster code. My guess is
that it's to do with registers vs. the stack, but without picking through
the assembler I cannot be sure. It won't at lower optimisation levels,
however.

But I didn't add it for that. I added it for the improved exception safety
where non-built-in types are used as the high and low parts of a large_int.
With copy-and-commit, if any operator other than assignment throws an
exception, the value of the underlying large_int would not change. As
everyone here will no doubt know, built-in integers don't throw, so I may
well revert to a non-copy-and-commit style here.

> Consider a 256-bit int implemented using 64-bit built-in ints via
> intermediate 128-bit ints ... All of the second half of that is redundant
> once you know that the least significant 64 bits incremented to a non-zero
> value ... if the implementation were just an array of 4 (u)int64_ts, you
> would iterate with a for loop and break when non-zero.

This is another thing that would have to be tested. Unfortunately, it's
nigh-on impossible to know whether a version with or without the extra
branching ('if' or 'for') would be faster without trying it, and even then
the answer may differ across target architectures.

> It would be great to see "hooks" where assembler implementations
> could be inserted.

Nice idea. This could be done in a similar manner to Boost.Multiprecision's
front- and back-end split. However, it would have to be a templatized,
compile-time decision for the programmer in order to retain binary
compatibility with the built-in integer types.

> It would be interesting to know if your code could function as a "back end"
> to the proposed Boost.Multiprecision, using that library's expression
> templates to eliminate temporaries. There are two things that could be
> learnt from that: does that help performance, and does the Multiprecision
> front/backend split work?

I would have to study Boost.Multiprecision in detail for this.
But yes, interesting idea.

Thanks,

Mike

On 2 July 2012 19:43, Michael Tryhorn <mjtryhorn_at_[hidden]> wrote:

> John, Chris, and anyone else interested,
>
> Firstly, to answer a question asked by Marc Glisse, the 'l' prefix on all
> l[u]int types defined by large_int/def.hpp is included to distinguish the
> potentially large_int based types from those provided by cstdint or
> equivalent.
> So, these 'l' prefixed types could be imported into the current or global
> namespace without interfering with any current or future built-in
> equivalents.
> I'm not a fan of 'using namespace x::y::z;', but I know some people do use
> it.
>
> Now onto testing. I've tried the given example code upon my local test
> machine and can corroborate your results. I'm impressed - the performance
> of
> operator* at least is astounding (even for 1024bit integers). I have tried
> tweaking my classes here-and-there, and have even managed to halve the
> time it
> takes large_int::operator* to run (by adding special 'left_shift_by_one'
> and
> 'right_shift_by_one' private operations). I agree that what is needed is a
> replacement of the algorithms rather than their tweaking, as the growth rate
> against number-of-bits is rather high.
>
> It seems that other operations, like construction/destruction, operator+ and
> operator-, are comparable in speed. This is approximate, but I observed
> large_int being quicker on simple operations like '+', '-' and so on at
> 128bit,
> equivalent at 256bit and slower at 1024. But, considering the overall
> speed of
> both Boost.LargeInt and Boost.Multiprecision in this respect, I doubt that
> either would ever be seen as a bottleneck in any real application.
>
> However, I do now have a couple of genuine points of difference between
> Boost.Multiprecision and Boost.LargeInt as it is now:
>
> * Object sizes - you seem to be adding a pointer to all integers
> (presumably
> as a link between the front and back ends). I can understand the need
> for
> it, but this does increase the required memory to store arrays of them.
> As
> boost::large_int::large_int classes are fully concrete and only ever
> consist
> of their two internal parts, they have the same size as what would be
> their
> built-in equivalent. Meaning, sizeof( int128_t ) == sizeof( lint128_t ),
> unless the compiler decides to add class padding but I've never seen
> padding
> added to a large_int object.
>
> * This also leads onto forward compatibility and interoperability. If we
> were
> to serialize a large_int instance, it should have the same content as its
> equivalent built-in type. This aids transparency of use, although it is
> currently only true upon little-endian machines. Big-endian support
> could
> be added by swapping the internal m_hi and m_lo large_int data members
> and
> adding some 'ntohl'/'htonl' equivalent free-functions in the
> boost::large_int
> namespace.
>
> In conclusion, it is obvious that we are in agreement that support for
> arbitrary-sized integers for C++ is required. I still believe that
> Boost.LargeInt is a good contender. It has both the above points and an
> elegant design (even if I do say so myself), and would just need
> improvements
> to its integral division and multiplication. Unfortunately, that is
> beyond my
> expertise, hence the current "simple but it works" solution.
>
> If you have any suggestions on how this could be improved or merged into
> existing work, I would be more than happy to collaborate. Any further
> questions or comments would of course be appreciated.
>
> (Phil, I've seen your email and I will respond to it separately.)
>
> Thanks,
>
> Michael J. Tryhorn
>
> P.S. I couldn't get the above testing code to run on either of my local
> 64bit machines, and had to resort to an old 32bit AMD machine which is
> slowly
> gathering dust. This is after inserting Boost.Multiprecision into Boost
> v1.50.0.
>
> System: Ubuntu 11.10 (64-bit)
> Intel® Core™2 Quad CPU Q6600 @ 2.40GHz × 4
> Compiler: gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
> Command: g++ -Wall -ggdb -O3 -march=native -I<snip>/include -L<snip>/lib \
> -o lint_tst lint_tst.cpp -Wl,-Bstatic -lboost_chrono -lboost_system \
> -Wl,-Bdynamic -lrt
> Output:
> 32 bit time = 0.00177922 seconds
> 64 bit time = 0.00188884 seconds
> lint_tst: <snip>/cpp_int.hpp:1090: void
> boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits,
> Signed, Allocator>&, const
> boost::multiprecision::backends::cpp_int_backend<MinBits, Signed,
> Allocator>&, const
> boost::multiprecision::backends::cpp_int_backend<MinBits, Signed,
> Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true,
> Allocator = void]: Assertion
> `(std::numeric_limits<double_limb_type>::max)() - carry >
> static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed,
> Allocator>::max_limb_value) *
> static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed,
> Allocator>::max_limb_value)' failed.
>
> System: openSUSE 12.1 (64-bit)
> Intel® Xeon® CPU E5410 @ 2.33GHz × 4
> Compiler: gcc (SUSE Linux) 4.6.2
> Command: [same as above]
> Output:
> 32 bit time = 0.00183981 seconds
> 64 bit time = 0.00189349 seconds
> lint_tst: <snip>/cpp_int.hpp:1090: void
> boost::multiprecision::backends::eval_multiply(boost::multiprecision::backends::cpp_int_backend<MinBits,
> Signed, Allocator>&, const
> boost::multiprecision::backends::cpp_int_backend<MinBits, Signed,
> Allocator>&, const
> boost::multiprecision::backends::cpp_int_backend<MinBits, Signed,
> Allocator>&) [with unsigned int MinBits = 128u, bool Signed = true,
> Allocator = void]: Assertion
> `(std::numeric_limits<double_limb_type>::max)() - carry >
> static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed,
> Allocator>::max_limb_value) *
> static_cast<double_limb_type>(cpp_int_backend<MinBits, Signed,
> Allocator>::max_limb_value)' failed.
>
> (Same failure at other optimization levels, including O0, O1 and O2.)
>
> This failure does not occur when running the same test on 32bit openSUSE
> 12.1,
> and so may be integer-size related. Here are the sizes of the built-in
> types,
> for reference:
>
> openSUSE 32bit:
> sizeof( short ) == 2
> sizeof( int ) == 4
> sizeof( long ) == 4
> sizeof( long long ) == 8
> sizeof( float ) == 4
> sizeof( double ) == 8
> sizeof( long double ) == 12
> sizeof( mp_uint128_t ) == 20
> sizeof( mp_uint256_t ) == 36
> sizeof( mp_uint512_t ) == 68
> sizeof( mp_uint1024_t ) == 132
>
> openSUSE/Ubuntu 64bit:
> sizeof( short ) == 2
> sizeof( int ) == 4
> sizeof( long ) == 8
> sizeof( long long ) == 8
> sizeof( float ) == 4
> sizeof( double ) == 8
> sizeof( long double ) == 16
> sizeof( mp_uint128_t ) == 24
> sizeof( mp_uint256_t ) == 40
> sizeof( mp_uint512_t ) == 72
> sizeof( mp_uint1024_t ) == 136
>
>
> On 2 July 2012 18:38, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
>
>> Hi Michael,
>>
>>
>> Michael Tryhorn wrote:
>>
>>> I have written a Boost-compatible library for the representation and
>>> manipulation of integers of large or arbitrary size (including 128bit and
>>> above). I would like to make a preliminary submission of this library
>>> for
>>> your review and input, for potential inclusion into Boost v1.51.0 or
>>> later,
>>> as per the Boost Library Submission Process.
>>>
>>
>> It would have been great if you could have joined the discussion a couple
>> of weeks ago. Perhaps it is not too late for you to write a review of the
>> proposed Boost.Multiprecision, since you clearly have some experience of
>> the subject.
>>
>> I have had a quick look at your code and my initial thoughts are:
>>
>> - The storage of these types is just the actual numeric data, so I should
>> be able to do binary I/O on them etc., which is good. I would perhaps be a
>> bit more confident of this if it were just an array of int32s or int64s,
>> rather than the current recursive organisation.
>>
>> - You're using 2's complement, which I believe is the right choice.
>>
>> - The simplicity of the code means it could be relatively bug-free, if
>> we're lucky.
>>
>> - Bitwise multiplication and division is completely inappropriate.
>>
>> - You have an idiom of working on a temporary and then copying the result
>> to *this to "commit" it, e.g.
>>
>> template<class T, class U>
>> large_int<T, U>& large_int<T, U>::operator++ ()
>> {
>> // Take a copy of *this, to modify
>> this_type ret(*this);
>>
>> // Perform the increment
>> ++ret.m_lo;
>> if( ret.m_lo == low_part_type(0) )
>> {
>> ++ret.m_hi;
>> }
>>
>> // Commit the result
>> return( *this = ret );
>> }
>>
>> I don't see what the benefit of that is; it just makes ++ much slower
>> than it otherwise would be.
>>
>> This example, operator++, also illustrates what may be a weakness of your
>> recursive style. Consider a 256-bit int implemented using 64-bit built-in
>> ints via intermediate 128-bit ints. The increment does this:
>>
>> increment the 128-bit m_lo:
>> increment the 64-bit m_lo (builtin)
>> test the 64-bit m_lo for zero (builtin) (assume common-case, not
>> zero)
>> test the 128-bit m_lo for zero (not built-in, impl in traits.hpp)
>> test the 64-bit m_hi
>> test the 64-bit m_lo
>>
>> All of the second half of that is redundant once you know that the least
>> significant 64 bits incremented to a non-zero value, but it is far from
>> obvious to me that the compiler would successfully optimise it away. On
>> the other hand, if the implementation were just an array of 4 (u)int64_ts,
>> you would iterate with a for loop and break when non-zero.
>>
>> - It would be great to see "hooks" where assembler implementations could
>> be inserted.
>>
>> - It would be interesting to know if your code could function as a "back
>> end" to the proposed Boost.Multiprecision, using that library's expression
>> templates to eliminate temporaries. There are two things that could be
>> learnt from that: does that help performance, and does the Multiprecision
>> front/backend split work?
>>
>>
>> Regards, Phil.
>>
>>
>>
>>
>>
>>
>>
>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk