From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-04-26 20:03:11


Maurizio Vitale wrote:
> "Phil Endecott" <spam_from_boost_dev_at_[hidden]> writes:
>> - My concerns about how to do, for example, a 32x32->64-bit multiply
>> efficiently without actually doing a 64x64->64-bit multiply.
>
> I don't see too many ways out. One thing people do is to shift both operands right by 16, do a
> 16x16->32 and then shift the result left by 32 (e.g. compute (x/2^16 * y/2^16)*2^32). This
> maintains at least the magnitude of the result, but whether it is acceptable or not depends on
> the application domain.

No, I'm not interested in anything with loss of precision.

> Btw, X86_64 gives you 64x64->128, I think.

That's interesting. Is there any way to use that from C++? Of course,
it also has a relatively efficient 32x32->64 because it has a single
instruction for 64x64->64. So any code for this problem would have to
be architecture-specific. But here is the sort of thing that I am
thinking about for machines that only have 32x32->32:

In these figures, each letter represents 16 bits.

If I want to do 32x32->64, I need to break up the two operands into two
16-bit portions and do this calculation:

        A B
    x   C D
    -------
        B*D
      A*D
      C*B
    A*C
    -------

So that's 4 multiplies. This compares to the naive method where, in
C++, I first cast to 64 bits and the compiler generates a 64x64->64
multiply like this:

      A B C D
  x   E F G H
  -----------
          D*H
        C*H
        G*D
      B*H
      C*G
      F*D
    A*H
    B*G
    F*C
    E*D
  -----------

That uses 10 multiplies. You don't need to do e.g. A*E as it falls off
the most significant end of the result. (Have I got that right?) If
you are lucky there might be some run-time zero detection that
simplifies it a bit (e.g. in gcc this is done in a library function).
But even then, detecting it at compile time by explicitly asking for a
32x32->64 multiplication would still be better.
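
To make that concrete, here is a minimal sketch of the 4-multiply
decomposition from the first figure, together with the 64x64->128
route for compilers that expose __int128 (gcc and clang on 64-bit
targets define __SIZEOF_INT128__). The names mul32x32to64 and
mul64x64to128 are illustrative only, not part of any existing
interface:

#include <boost/cstdint.hpp>

// 32x32->64 built from four 16x16->32 partial products, following the
// A B x C D figure above.  Each half-product fits in 32 bits, so only
// 32x32->32 hardware multiplies are needed; the recombination is done
// with shifts, adds and one explicit carry.
boost::uint64_t mul32x32to64(boost::uint32_t x, boost::uint32_t y)
{
    const boost::uint32_t a = x >> 16, b = x & 0xFFFFu;   // x = A:B
    const boost::uint32_t c = y >> 16, d = y & 0xFFFFu;   // y = C:D

    const boost::uint32_t bd = b * d;
    const boost::uint32_t ad = a * d;
    const boost::uint32_t cb = c * b;
    const boost::uint32_t ac = a * c;

    const boost::uint32_t mid   = ad + cb;              // may wrap
    const boost::uint32_t carry = (mid < ad) ? 1u : 0u; // carry out of the middle sum

    return (static_cast<boost::uint64_t>(ac)    << 32)
         + (static_cast<boost::uint64_t>(carry) << 48)
         + (static_cast<boost::uint64_t>(mid)   << 16)
         + bd;
}

#if defined(__SIZEOF_INT128__)
// gcc and clang on 64-bit targets accept unsigned __int128, which maps
// onto the hardware 64x64->128 multiply.
unsigned __int128 mul64x64to128(boost::uint64_t x, boost::uint64_t y)
{
    return static_cast<unsigned __int128>(x) * y;
}
#endif

Whether a compiler turns the cast-to-64-bit version into anything this
cheap is target-dependent, which is exactly why an explicit 32x32->64
operation seems worth having.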

(Is this sort of integer arithmetic of interest to people, other than
as the implementation of fixed point? I believe there was a big-int
SoC proposal; does that overlap with this?)

> I've seen in your code that you allocate one more bit than requested by the user in order to detect
> overflow. As you note, this is problematic when you hit the machine operand length. In those cases,
> besides checking the msbs before and after, you can also use the carry flag. It can be accessed through
> an inline asm instruction, or used implicitly via conditional branches and conditional moves.

Yes. Storing a 32-bit safe_int in 32 bits is significantly harder than
storing a 31-bit one, however you do it. I believe that most of the
time when we ask the compiler for an 'int' we don't really care
exactly how many bits it has, so making a 31-bit value the default
safe_int, letting it use the relatively easy guard-bit implementation,
would be a good enough solution.
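
For what it's worth, here is a minimal sketch of that guard-bit scheme
for a 31-bit default, assuming the value is stored in an ordinary
32-bit int; checked_add31 and the bounds are illustrative only:

#include <boost/cstdint.hpp>
#include <stdexcept>

// 31-bit signed range: -2^30 .. 2^30 - 1.  The spare top bit of the
// underlying 32-bit int absorbs the overflow of a single addition, so
// a simple range check afterwards detects it and no undefined signed
// overflow can occur.
const boost::int32_t max31 =  (boost::int32_t(1) << 30) - 1;
const boost::int32_t min31 = -(boost::int32_t(1) << 30);

boost::int32_t checked_add31(boost::int32_t a, boost::int32_t b)
{
    // Precondition: a and b are already within the 31-bit range.
    const boost::int32_t sum = a + b;
    if (sum > max31 || sum < min31)
        throw std::overflow_error("31-bit safe_int overflow");
    return sum;
}

A full 32-bit safe_int stored in 32 bits would instead need the
before-and-after sign checks or the carry-flag tricks mentioned above.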

Regards,

Phil.

