|
Boost : |
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-04-26 20:03:11
Maurizio Vitale wrote:
> "Phil Endecott" <spam_from_boost_dev_at_[hidden]> writes:
>> - My concerns about how to do, for example, a 32x32->64-bit multiply
>> efficiently without actually doing a 64x64->64-bit multiply.
>
> I don't see too many ways out. One thing people do is to shift right by 16 both operands, do a
> 16x16->32 and then shift the result left by 32. (e.g. compute (x/2^16 * y/2^16)*2^32). This
> maintains at least the magnitude of the result, but whether it is acceptable or not depends on
> the application domain.
No, I'm not interested in anything with loss of precision.
> Btw, X86_64 gives you 64x64->128, I think.
That's interesting. Is there any way to use that from C++? Of course,
it also has a relatively efficient 32x32->64 because it has a single
instruction for 64x64->64. So any code for this problem would have to
be architecture-specific. But here is the sort of thing that I am
thinking about for machines that only have 32x32->32:
In these figures, each letter represents 16 bits.
If I want to do 32x32->64, I need to break up the two operands into two
16-bit portions and do this calculation:
A B
x C D
---------
B*D
A*D
C*B
A*C
---------
So that's 4 multiplies. This compares to the naive method where, in
C++, I first cast to 64-bits and the compiler generates a 64x64->64
multiply like this:
A B C D
x E F G H
-----------------
D*H
C*H
G*D
B*H
C*G
F*D
A*H
B*G
F*C
E*D
------------------
That uses 10 multiplies. You don't need to do e.g. A*E as it falls off
the most significant end of the result. (Have I got that right?) If
you are lucky there might be some run-time zero detection that
simplifies it a bit (e.g. in gcc this is done in a library function).
But even then detecting it at compile time by explicitly asking fora
32x32->64 multiplication would still be better.
(Is this sort of integer arithmetic of interest to people, other than
as the implementation of fixed point? I believe there was a big-int
SoC proposal; does that overlap with this?)
> I've seen in your code that you allocate one more bit than requested by the user in order to detect
> overflow. As you note this is problematic when you hit the machine operand length. In those cases,
> besides checking the msbs before and after you can also use the carry flag. It can be accessed through
> inline asm instruction, or used implicitely via conditional branches and conditional moves.
Yes. Storing a 32-bit safe_int in 32 bits is significantly harder than
storing a 31-bit one, however you do it. I believe that most of the
time when we ask the compiler for an 'int', we don't really care
exactly how many bits it has and making a 31-bit value your default
safe_int so that it can use the relatively easy guard bit
implementation would be a good enough solution.
Regards,
Phil.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk