Subject: Re: [boost] Algebraic abstractions related to (big) integers in C++
From: Thomas Klimpel (Thomas.Klimpel_at_[hidden])
Date: 2011-03-29 15:57:43
Brent Spillner wrote:
> On 3/27/11 00:21 Thomas Klimpel wrote:
> > The goal of this message is to reflect on the two's complement
> > representation of negative integers, from a relatively abstract
> > algebraic perspective.
>
> I suspect that practical arguments may resonate more deeply with this
> list than theoretical ones.
Well, thanks for providing some practical arguments.
> Indeed, from a theoretical point of view
> it's trivial to convert operations on a sign bit + N-1 magnitude bit
> representation to/from operations on the N-bit 2's complement
> representation, and in fact there is one value representable in the
> two's complement representation that is not in the sign/magnitude
> representation of the same length, which seems like a (very minor)
> theoretical win for 2's complement.
I found out that this is actually a "theoretical" loss. The following program
int i = 0x80000000;
std::cout << i << std::endl;
std::cout << std::abs(i) << std::endl;
prints
-2147483648
-2147483648
So there exists an int value for which the assertion "assert(std::abs(i) >= 0);" fails. But when you convert the result of std::abs to unsigned, you at least get the correct value.
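A complete version of that little test, for anyone who wants to try it (a sketch assuming a 32-bit int on a two's complement platform; strictly speaking, std::abs on this value is undefined behavior, so other implementations may print something else):

#include <cstdlib>
#include <iostream>
#include <limits>

int main()
{
    int i = std::numeric_limits<int>::min(); // the value 0x80000000 from above
    std::cout << i << std::endl;             // -2147483648
    std::cout << std::abs(i) << std::endl;   // -2147483648 on typical platforms
    // Converting the result to unsigned recovers the correct magnitude:
    std::cout << static_cast<unsigned int>(std::abs(i)) << std::endl; // 2147483648
}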
> If the operands are the same size (even if that spans multiple machine
> words), I think 2's complement is clearly preferable to take advantage
> of the hardware instruction set. With the exception of the overflow
> flag, addition, subtraction, and the low-order result of a
> multiplication yield the same bitstring whether the operands (and
> result) are taken to be 2's complement or unsigned, which means that
> you can perform arithmetic on lower-order word-size chunks of a large
> integer without caring whether the highest-order chunk is positive or
> negative. Might as well take advantage of the instruction-level
> building blocks for 2's complement representation (with appropriate
> overflow, carry-in, etc. options) in this case.
So there are "practical arguments" for using two's complement representation for fixed-width integers. Very good.
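To illustrate that claim for a single machine word (a sketch assuming 32-bit words; the same bit patterns are added, once interpreted as two's complement and once as unsigned, and the resulting bits are identical):

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::int32_t a = -5, b = 7;
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua); // ua = 0xFFFFFFFB, same bits as a
    std::memcpy(&ub, &b, sizeof ub); // ub = 0x00000007, same bits as b

    std::uint32_t unsigned_sum = ua + ub; // wraps modulo 2^32

    std::int32_t s = a + b;               // ordinary signed addition
    std::uint32_t signed_sum;
    std::memcpy(&signed_sum, &s, sizeof signed_sum);

    std::cout << std::boolalpha << (unsigned_sum == signed_sum) << std::endl; // true
}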
> If you might be adding or subtracting large integers of different
> length, the sign-magnitude representation is arguably simpler than
> sign-extending the smaller operand (and potentially propagating
> carry-outs up to the highest-order chunk of the result.) This then
> lets you choose to represent negative numbers in the smallest possible
> number of words without worrying about an arithmetic performance
> tradeoff. On the other hand, multiword sign-magnitude arithmetic
> requires two code paths for every operation, with an initial
> test-and-branch to determine whether all your ADDs should be replaced
> by SUBs.
So even by "practical arguments", it's not easy to settle the question of whether two's complement representation should be used for variable-width integers.
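To make the sign-extension part concrete, here is a toy sketch of variable-width two's complement addition (words stored least significant first; the function name and layout are just made up for the example, not any existing library's API):

#include <cstdint>
#include <iostream>
#include <vector>

// Toy two's complement addition on vectors of 32-bit words. The shorter
// operand is sign-extended, and the carry is propagated up to the
// highest-order word of the longer operand.
std::vector<std::uint32_t> add(const std::vector<std::uint32_t>& a,
                               const std::vector<std::uint32_t>& b)
{
    const std::vector<std::uint32_t>& longer  = a.size() >= b.size() ? a : b;
    const std::vector<std::uint32_t>& shorter = a.size() >= b.size() ? b : a;
    std::uint32_t extension = (shorter.back() & 0x80000000u) ? 0xFFFFFFFFu : 0u;

    std::vector<std::uint32_t> result(longer.size());
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < longer.size(); ++i) {
        std::uint64_t sum = carry + longer[i]
                          + (i < shorter.size() ? shorter[i] : extension);
        result[i] = static_cast<std::uint32_t>(sum);
        carry = sum >> 32;
    }
    return result; // overflow beyond the highest word is ignored in this toy
}

int main()
{
    std::vector<std::uint32_t> minus_one = { 0xFFFFFFFFu }; // -1 in one word
    std::vector<std::uint32_t> five      = { 5u, 0u };      // +5 in two words
    std::vector<std::uint32_t> r = add(five, minus_one);
    std::cout << r[0] << " " << r[1] << std::endl;          // prints "4 0", i.e. +4
}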
> And given that 2's complement seems preferable for
> fixed-width arithmetic, I think I'd want a library supporting both
> fixed- and variable-width representations to adopt the same convention
> for both.
That was also my initial thought. But then I understood that floating-point numbers should not use the two's complement representation (their significand is a magnitude with a separate sign). So all algorithms must be implemented for unsigned integers anyway, and then it doesn't make such a big difference how we implement the algorithms for the signed integers in terms of these algorithms for the unsigned integers.
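As a toy sketch of that idea (single-word magnitude for brevity; a real variable-width integer would store a vector of words and reuse the unsigned routines that are needed anyway):

#include <cstdint>
#include <iostream>

// A sign-magnitude integer with a single-word magnitude, just to show how the
// signed operation reduces to unsigned operations plus one test-and-branch.
struct sm_int {
    bool negative;
    std::uint64_t magnitude;
};

sm_int add(sm_int a, sm_int b)
{
    if (a.negative == b.negative)
        return { a.negative, a.magnitude + b.magnitude }; // same sign: add magnitudes
    if (a.magnitude >= b.magnitude)
        return { a.negative, a.magnitude - b.magnitude }; // different sign: subtract
    return { b.negative, b.magnitude - a.magnitude };     // ... the other way round
}

int main()
{
    sm_int x = { true, 10 }; // -10
    sm_int y = { false, 3 }; // +3
    sm_int z = add(x, y);    // -7
    std::cout << (z.negative ? "-" : "") << z.magnitude << std::endl;
}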
> >"What I consider especially strange from an algebraic point of view,
> is to automatically promote a signed integer type to its corresponding
> unsigned integer type, when >an arithmetic expression contains both
> signed and unsigned integer types. Shouldn't it be the other way round?
> Or wouldn't it be even better, if there would be no >automatic
> conversion at all? At least from an algebraic point of view, any
> natural number is also an integer, not vice versa like in C."
>
> To be precise, this happens only when the rank of the unsigned type is
> greater than or equal to that of the signed type. In the case where
> both operands have non-negative values, this minimizes the risk of
> overflow during addition or multiplication. When performing an
> operation on an unsigned value and a signed value (or subtracting an
> unsigned value from a signed positive value of lesser magnitude), it
> is incumbent upon the programmer to have some notion of what he or she
> is doing. I believe the rationale may be that signed values are the
> default, so if you went out of your way to declare something as
> unsigned then you're accepting some responsibility for using it
> appropriately.
Well, the built-in sizeof() operator yields an unsigned integer value, as do many other functions in C. In C++, c.size() is unsigned, as are many other values. So I don't really have to go out of my way to end up with a mix of signed and unsigned numbers.

I also don't think that the reason behind the "signed to unsigned promotion" rule deserves to be called a "rationale". The problem is that literals like "1" and "23" have the concrete type "int", which forces the "signed to unsigned promotion" upon us, because the type system of C isn't powerful enough to handle this situation more appropriately. One way this could have been handled in C would have been to introduce a special type for "integer literals", or even "integer constants", that automatically takes care of the required conversions.
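A small example of the kind of surprise this mix produces (the exact value printed assumes a 64-bit size_t, but the effect is the same on any platform):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v; // empty, so v.size() == 0, and size() is unsigned

    // The int on the left is converted to the unsigned type of size(),
    // so -1 becomes a huge value and the comparison is false:
    std::cout << std::boolalpha << (-1 < v.size()) << std::endl; // false

    // Subtracting 1 from an unsigned zero wraps around instead of giving -1:
    std::cout << v.size() - 1 << std::endl; // 18446744073709551615 with 64-bit size_t
}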
> Note also that (at least in ANSI C) when performing
> arithmetic between an unsigned value and a signed value of higher
> rank, the promotion is to the signed type--- so the "promote to the
> 'larger' type" principle is consistently applied across the board,
> particularly if you consider unsigned types "larger" than the signed
> equivalent (seems reasonable to me given how uncommon negative values
> are in most integer code.)
You're kidding, right?
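(For reference, what the rule actually does in the mixed-rank case, assuming an LP64 platform where long is 64 bits and unsigned int is 32 bits:)

#include <iostream>

int main()
{
    unsigned int u = 1;
    long l = -2;
    // long has higher rank and can represent every unsigned int value,
    // so u is converted to long and the result is the expected -1:
    std::cout << u + l << std::endl;  // -1

    unsigned long ul = 1;
    // With equal rank the conversion goes to unsigned long instead:
    std::cout << ul + l << std::endl; // 18446744073709551615
}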
Regards,
Thomas