Boost logo

Boost :

Subject: Re: [boost] Algebraic abstractions related to (big) integers in C++
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2011-03-31 13:29:53


Phil Endecott wrote:
> pavel wrote:
>> taking into account luke's message about adc_pi, i think that it's a
>> perfect opportunity to abstract multiword addition (with carry) like
>> this:
>>
>> unsigned add_with_carry(unsigned *dest, const unsigned *src,
>> size_t size);
>>
>> the returned value is intended to be an overflow flag (read: carry of
>> the result of the most significant word addition)
>
> What you describe is the sort of thing I (and I presume others) had in
> mind when we suggested that arithmetic algorithms be decoupled from
> data structures, i.e. you would have
>
> template <typename ITER, typename CONST_ITER>
> carry_type add_with_carry(ITER dest_begin, ITER dest_end,
> CONST_ITER src_begin, CONST_ITER src_end);
>
> with a general-purpose implementation, and then specialise it:
>
> carry_type add_with_carry(uint32_t* dest_begin, uint32_t* dest_end,
> const uint32_t* src_begin, const uint32_t*
> src_end);
>
> and provide an asm implementation for the specialisation.
>
> However, this doesn't suit the SIMD instruction that Luke described.
> That does multiple adds in parallel, and has carry-in and carry-out to
> each, but those carries are NOT propagated from one adder to its
> neighbour in that same instruction. So to use that instruction
> efficiently, say you are adding 4 multi-word numbers together:
>
> R = A + B + C + D;
>
> You would do
>
> (tmp1, carries1) = A + B;
> (tmp2, carries2) = tmp1 + C + carries1;
> (tmp3, carries3) = tmp2 + D + carries2;
> R = tmp3 + carries3;
>
> Note that the +s in the first 3 lines use the SIMD instruction Luke
> described, and the temporaries are incomplete. In the last line, the
> carries are added in (with the carry potentially propagating across
> the
> whole width); this may need one instruction per word.
>
> In hardware this is called a "carry save adder"; I'm not sure if the
> SIMD people use the same terminology.
>
> Anyway, the point is that even a multi-word addition abstraction is
> not specialisable in a way that fully exploits SIMD instructions.
> Doing
> that probably needs expression templates.

Implementing the add_with_carry function to be the multiword add operation using intrinsics would probably be at least 200 lines of code once you handle all combinations of aligned and unaligned source and dest plus unrolling the loop plus handling partially populated vector in the last iteration of the loop plus shifting and adding the carry back in and looping on that if it itself results in carry, then propigating the last carry forward to the next iteration of the main loop as the carry in. If the function returns a carry you need to extend the result by one word.

Phil's forwarding of carries to the next add using the carry in argument is significantly better than calling add_with_carry function three times because not only do you save instructions for adding carries in you also save on load and store because you will read all four inputs simultaneously and store the final result only once, never reading it again. Implementing N argument addition/subtraction of multiword integers as a template using intrinsics and selecting that algorithm using an expression template is feasible, but now we are talking about thousands of lines of code. It would be a very fun project, though.

Regards,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk