

From: dizzy (dizzy_at_[hidden])
Date: 2008-08-18 05:24:46


On Sunday 17 August 2008 09:32:40 Zeljko Vrba wrote:
> On Sat, Aug 16, 2008 at 11:50:05PM +0300, dizzy wrote:
> > I do not agree. They generally do their job fine (which is provide
> > portable support to work with native, non checked, platform integer
> > types). For any other needs you should probably use another tool.
>
> Well, they do *not* do their job fine: (-1U < 2) == false, which is a
> mathematical nonsense (more on that below).

That does not prove anything. Any good tool has bad use cases, and this one does
not show that the integer types fail at their intended purpose. You seem to want
more than that purpose.
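
For reference, the behaviour there is perfectly well defined once the usual
arithmetic conversions kick in; a minimal, self-contained illustration:

#include <cassert>

int main()
{
    // -1U is already unsigned: negating the literal 1U wraps around to
    // UINT_MAX, which is of course not less than 2.
    assert((-1U < 2) == false);

    // The same happens with a signed variable: the usual arithmetic
    // conversions convert 'a' to unsigned before the comparison, so the
    // result is well defined but surprising. Compilers typically warn here.
    int a = -1;
    unsigned b = 2;
    assert((a < b) == false);

    return 0;
}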

> Signed arithmetic overflow is undefined behavior, and some CPUs actually
> raise an exception on overflow (e.g. MIPS). Every 'a+b' expression, with a
> and b being signed integers, is potential UB. Some machines (e.g. x86) do
> not raise an exception but set an overflow flag which may be tested by a
> single instruction, yet I don't know of a compiler which offers run-time
> overflow checking as a code-generation option. Portable checks for
> overflow (relying on bit operations) incur immense overhead (certainly much
> greater than a single instruction).

Which is exactly why such checks were not introduced in the standard: they are
prohibitive or impossible on certain platforms. So? That just means the native
integer types inherit the limitations of the low-level CPU arithmetic operations
of the many platforms out there. That seems logical to me, since C++ tries to be
defined so that it has fast implementations on those native platforms instead of
being strict about its own virtual machine.
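
To see the cost involved, here is a minimal sketch of the kind of portable
pre-check that would have to run before every signed addition if the behavior
were always defined (the helper name is made up for the illustration):

#include <limits>
#include <stdexcept>

// Hypothetical helper: add two ints without ever triggering UB.
// Portable, but it turns a single CPU instruction into two comparisons
// and a branch on every addition.
int checked_add(int a, int b)
{
    if (b > 0 && a > std::numeric_limits<int>::max() - b)
        throw std::overflow_error("signed overflow in addition");
    if (b < 0 && a < std::numeric_limits<int>::min() - b)
        throw std::overflow_error("signed overflow in addition");
    return a + b;
}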

> Writing an extra set of parentheses is not visually intrusive or
> cluttering.

Says who?

> Writing static_cast<int>(expr), (int)(expr), or surrounding the
> offending code with #pragma _is_ cluttering and intrusive.

Come on, if we are going to argue about how clumsy it _looks_ to write additional
parentheses vs. a static_cast, I think we have too much time on our hands. There
are countless other compromises we make when programming in C++ just because, for
everything else, it works so well. (And no, I don't see that as a flaw in the
language, in the sense that one could have all the good things of C++ without the
bad things; I see it as a consequence of the good things of C++. Every language
feature has side effects, some of which we may not like in certain contexts, but
that does not make them flaws.)

> Yes, I want my
> code to be short, concise, and easily readable, in addition to being
> correct. So shoot me :-)

Well, that's you. C++ was also made to:
- not pay for anything you don't need (I think RTTI is an exception there)
- allow writing code that is very close to the native platform
- be source compatible with C as much as possible

> I have researched comp.lang.c++.moderated archives on this topic, and other
> sources, and found two advices:
>
> Peter van der Linden in "Expert C Programming: Deep C Secrets" writes:
>
> "Avoid unnecessary complexity by minimizing your use of unsigned types.
> Specifically, don't use an unsigned type to represent a quantity just
> because it will never be negative (e.g., "age" or "national_debt")."
>
> A quote of Bjarne Stroustrup: "The unsigned integer types are ideal for
> uses that treat storage as a bit array. Using an unsigned instead of an int
> to gain one more bit to represent positive integers is almost never a good
> idea. Attempts to ensure that some values are positive by declaring
> variables unsigned will typically be defeated by the implicit conversion
> rules."

I personally use "unsigned" for values that cannot be negative, and I am always
careful when comparing them with signed types and in similar cases. That is in
the low-level code, of course (and it works in layers: starting at some layer the
value becomes unsigned, after being properly checked for negative values, and
from that point on it is used only as unsigned internally, though the outside
world can still send signed values). In high-level code (which I am not writing,
since my work focuses on low-level code) I would use something like your proposed
library (which shows that C++ not only offers native types that satisfy _my_
needs; you can also build on them to satisfy _your_ needs, so they are not
"flawed").

> Yet, all of the STL uses an unsigned type for size_type and the return
> value of size().

It uses std::allocator<T>::size_type, which is size_t for obvious reasons. Why
size_t is unsigned is a question for the C people who made it so in C90, I
guess :) Either way, those are low-level enough uses that I won't complain about
the choice.

> As much as I'd like to use only signed ints, this becomes
> prohibitive (due to warnings) when mixing them with STL. An yes, I've been
> bitten several times in the past by implicit signed -> unsigned conversions
> in relational comparisons. The most sane thing would be to throw an
> exception at runtime if one compares a negative integer with some unsigned
> quantity, instead of getting false for '-1 < 2U', which is a mathematical
> nonsense.

That implies extra code to check for it, which violates the basic principle of
not paying for what you don't need.

> signedness of a type *should not* affect its numerical value in
> mathematical operations.

But it can't do that without a runtime cost. There are three things the compiler
could do about a < b (a being signed, b being unsigned):
- it can, as it does now, convert a to unsigned (well defined, but a negative a
turns into a huge value) and compare that; you find this wrong
- it can convert b to signed and compare with a; you may think this is right, but
it isn't, because it is exactly like the option above: there are cases in which
converting b to signed does not preserve its value (when the unsigned b is bigger
than what the signed type can represent, the result is implementation-defined);
would you still prefer this over the option above? maybe you would, but others
will not, since they need exactly the opposite (the current solution)
- it can promote both arguments to a signed integer type that can hold all values
of both (say long) and compare that; this may seem fine to you, but it has the
following problems:
        - it assumes there is a type that can represent both values; a and b
might well have been long and unsigned long to begin with, so that won't work;
and even if they are int, there is no guarantee that long can fit them both,
since long can have the same size as int (as it does on many platforms)
        - promoting to a type wider than int and comparing that may be more
expensive than comparing ints (which the standard defines as the natural integer
type of the platform, thus the "fastest" one); so we would be violating the
principle of not paying for what we don't need, for those people who already know
that a is not negative and for whom a < b does nothing wrong

Your solution to this problem?
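
For what it's worth, a value-correct comparison can be written without needing a
wider type, but it still pays for an extra test on every comparison, which is
exactly the kind of cost the language does not impose by default. A minimal
sketch (the function name is made up):

// Compare a signed and an unsigned value by their mathematical values.
inline bool less_by_value(int a, unsigned b)
{
    if (a < 0)
        return true;                      // any negative int is below any unsigned value
    return static_cast<unsigned>(a) < b;  // both non-negative: unsigned compare is exact
}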

> I *do* realize that. However, simple wrappers won't fix operators. I like
> having as many low-overhead sanity checks in my code as possible, but I
> don't want to litter the code with a bunch of assert()s in front of every
> arithmetic expression. Should I replace every 'if(a < b)' with 'if(less(a,
> b))' and provide a bunch of overloaded less() functions?

No, but why would you do that? Can't you know, at certain locations in your
program, whether something may be negative or not? That is actually _why_ I use
unsigned types: to signal that a value can't be negative from that moment on
(i.e. the user code sees an interface taking an "int", but the called function
converts it to unsigned after checking the precondition that the value is not
negative, and from that point on uses only unsigned types, so having unsigned
types actually helps you know that the variable can't be negative).

Or, to put it another way: how do you know, at every point of use in the program,
that a pointer can't be invalid? You can't. You can have invariants based on the
code flow up to that point, or, to help a bit with this, you can store the value
in something that moves the precondition earlier (like a reference: to initialize
it you have to dereference the pointer, and if the pointer is invalid the UB
happens right there). It's the same with receiving a user value as an int and
storing it in an unsigned to be used later as unsigned. The later code sees an
unsigned (just as the later pointer code sees the reference) and knows the value
is not negative (just as the later pointer code knows it has a valid reference).
It knows this because at the point of converting from int to unsigned (or from
pointer to reference) you performed the check.
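
A minimal sketch of the pattern I mean (the names are only illustrative):

#include <stdexcept>

// Internal layer: only ever sees the already-validated unsigned value,
// so its type documents the invariant "not negative".
void set_timeout_impl(unsigned seconds)
{
    // ... work with 'seconds' knowing it cannot be negative ...
    (void)seconds;
}

// Public interface: accepts whatever the outside world sends, checks the
// precondition exactly once, then hands over an unsigned value.
void set_timeout(int seconds)
{
    if (seconds < 0)
        throw std::invalid_argument("timeout cannot be negative");
    set_timeout_impl(static_cast<unsigned>(seconds));
}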

> Oh, the library would allow conversion from integer<32, 0, 4294967295> to
> integer<16, 0, 65535>, and insert an appropriate run-time range-check.

Sounds fine.

> So
> types integer<N1,L1,H1> and integer<N2, L2, H2> would be compatible if the
> intersection of [L1,H1] and [L2,H2] is not empty; the resulting type would
> be coerced to integer<max(N1,N2), min(L1, L2), max(H1, H2)> (though other
> bounds are necessary for other operations, e.g., addition), with a run-time
> check inserted (this makes it possible to maintain rather lax bounds at
> compile-time).
>
> ===
>
> Actually, I think I could simplify my requirements a bit:
>
> - define template class integer<T> with T == char, short, int, long, long
> long (no unsigned types allowed!)

So you can't know, at some point in the code, whether a value is non-negative
without either an assumption (which may be error prone) or a runtime check (which
is too costly). I think you should allow signalling syntactically that a type
holds only non-negative values; it is something often needed in code, to know
this for sure about some variable. It won't break your invariants, and it can be
made compatible with the rest of your library ideas.

> - allow initialization from _any_ integer<T> or primitive type, larger or
> smaller, signed or unsigned, provided it is in the range of T; throw
> exception otherwise

Initialization with implicit conversion too? If so, that might become tricky
together with the requirement below about conversion to integer types (in general
it's tricky to get implicit conversions right).
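
A small example of the kind of trickiness I mean, assuming the wrapper is both
implicitly constructible from int and implicitly convertible back to int (the
names are made up):

struct checked_int {
    checked_int(int v) : value(v) {}        // implicit construction from int
    operator int() const { return value; }  // implicit conversion back to int
    int value;
};

bool operator<(const checked_int& a, const checked_int& b)
{
    return a.value < b.value;
}

int main()
{
    checked_int w(1);
    // w < 2;  // ambiguous: convert 2 to checked_int and use the overloaded
    //         // operator<, or convert w to int and use the built-in one?
    return 0;
}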

> - allow conversion to _any_ underlying integer type, signed or unsigned,
> provided the value fits; otherwise throw an exception
> - allow mixed arithmetic between mixed integer<T> classes (e.g.
> integer<int> + integer<char> would be defined) as well as between
> integer<T> and any primitive type, subject to the conversion rules above
> - arithmetic would be checked for overflow

This, as you said, can be overly expensive on some platforms, can't it? But if
you need it, sure.

> - comparisons between integer<T> and unsigned types are allowed as long
> as integer<T> is positive or 0; otherwise an exception is thrown

As a nitpick, "positive or 0" is what mathematicians simply call non-negative ;)
On the serious side, why throw? Wouldn't it be better to still try to perform the
comparison somehow, and throw only if there is no underlying common type that can
represent all of their values and compare them? This of course comes with
associated costs, and if, as you say below, you want a flag that disables all the
associated runtime costs, then I understand why you need to throw here in the
debug version.

> - bit manipulation operators would behave as if T were unsigned; an
> additional method or function signed_shift(integer<T>) would be provided

But then you should state some stricter requirements. You can't assume a
particular negative-value encoding (such as two's complement), so you should not
allow bitwise operations to affect more than the bits forming the value, i.e.
everything except the sign bit. Otherwise you end up, just as the C++ standard
does, with implementation-dependent (or even undefined) results.
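
A couple of concrete cases of what I mean, assuming plain int (just an
illustration, not an exhaustive list):

#include <climits>

int main()
{
    int x = -1;

    // The bit pattern of a negative value depends on the representation
    // (two's complement, ones' complement, sign-and-magnitude), so the value
    // of a bitwise operation that touches the sign bit is not portable:
    int masked = x & 0x7F;   // 127 on two's complement, 126 on ones' complement

    // Right-shifting a negative value gives an implementation-defined result:
    int shifted = x >> 1;

    // Shifting a bit into (or out of) the sign position is not portable either:
    // int bad = INT_MAX << 1;

    (void)masked; (void)shifted;
    return 0;
}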

> - arrays of integer<T> must have the same low-level layout as arrays of T

I'm not sure how this can be accomplished. Sure, you can have the integer value
as the single non-static data member, but I don't think the standard says that a
struct A { int a; /* anything here except non-static data members */ }; has the
same binary layout as a struct A { int a; };. I know it would make sense, but I'm
not sure about the letter of the standard on this.
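
What one can at least do is assert the necessary (even if not officially
sufficient) conditions at compile time; a minimal sketch using Boost.StaticAssert:

#include <boost/static_assert.hpp>

template <typename T>
class integer_wrapper
{
public:
    explicit integer_wrapper(T v) : value_(v) {}
    // ... operators, checks, etc. ...
private:
    T value_;   // the only non-static data member; no virtuals, no bases
};

// Necessary conditions for array-layout compatibility; the standard does
// not actually promise that they are sufficient.
BOOST_STATIC_ASSERT(sizeof(integer_wrapper<int>) == sizeof(int));
BOOST_STATIC_ASSERT(sizeof(integer_wrapper<int>[10]) == sizeof(int[10]));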

Conclusion: it's a nice library. It's something I also suggested to some
colleagues of mine when they ran into some of those signed/unsigned warning
issues (in their high-level code). But notice that some of your checks may be too
expensive, more expensive than someone is willing to pay for those features, so
to make the library generic you will need some sort of compile-time
configuration, for example with policies (see the sketch below). You say you
would be using it only in debug mode, so you think of it just as a form of
assert(), using operator-overloading syntactic sugar to keep the ordinary integer
arithmetic syntax without assert()s polluting the code everywhere. But then you
would use the native types in release mode, so the C++ integer types can't be
that flawed if you would use them in release mode? Is your only complaint that
they are flawed for not having expensive runtime checks when used in debug
mode? :)
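
To make the policy idea concrete, here is a very rough sketch (all names are made
up; the real library would be far more elaborate). The checking behaviour is
selected at compile time, so the release configuration pays nothing:

#include <stdexcept>

// Two illustrative checking policies.
struct throw_on_error
{
    static void range_check(long value, long lo, long hi)
    {
        if (value < lo || value > hi)
            throw std::out_of_range("value outside allowed range");
    }
};

struct no_checks          // "release mode": zero overhead
{
    static void range_check(long, long, long) {}
};

template <long Lo, long Hi, class CheckPolicy>
class bounded_int
{
public:
    explicit bounded_int(long v) : v_(v) { CheckPolicy::range_check(v_, Lo, Hi); }
    long value() const { return v_; }
private:
    long v_;
};

// Usage: a typedef (or the build system) selects checked or unchecked behaviour.
typedef bounded_int<0, 65535, throw_on_error> checked_u16;
typedef bounded_int<0, 65535, no_checks>      fast_u16;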

> Example:
>
> integer<int> a;
> integer<short> b;
> short result = a + b + 7;
>
> This code would convert b to integer<int>, check for overflow before (or
> after[*]) adding it to a, compute a+b, check for overflow before adding 7,
> add 7 to the result, check that the result fits into short, done. If any
> check fails, an exception is thrown.
>
> [*] Overflow checks could be implemented in platform-specific assembler;
> e.g. x86 sets the overflow flag _after_ addition.
>
> This is the semantic that I'd like to have in debug versions; in "release"
> version of the program, everything would behave as ordinary arithmeic on
> primitive types. I'm studying numeric_cast<>, but.. it covers only
> conversions, not other items listed above.
>
> > There is a lot of possible dangerous code that can be done in C or C++
> > and some compilers tell you about it but that do not make it wrong code.
> > Warnings
>
> The problem is that every 'a+b', with a and b signed, is dangerous
> (potential UB), unless you either 1) formally *prove* that the expression
> won't overflow or 2) insert extra run-time checks (which clutter the code).
> :/

Sure, just as "*p" is potential UB unless you:
- formally prove that p is a valid pointer, or
- insert extra run-time checks (with pointers this is actually worse, since apart
from the null pointer value you have no way to determine whether a pointer is
invalid)

As I said, it's just normal C++: many places of potential danger, but that may be
fine.

> Something as common as simple addition should at least have an _option_ of
> *always*, under *all* circumstances having defined behavior.

Portably and without runtime cost, no, it has no such option. That is the reason
undefined behavior was introduced in C (and in C++): there are many cases where,
for technical and portability reasons, providing defined behavior would be either
impossible or too expensive, so the standard simply says the behavior is
undefined.

> The integer<>
> class proposed above is just one of possibilities; but that should have
> been included in the standard :/

You mean in the standard library? Well, I often work on open source projects, and
in the open source community we have a saying when someone complains about
something not already being available in software: "if it's not available, it
means it's not needed". To a certain degree this applies to the current problem
too. Search the Internet: how many std-like hash implementations do you find vs.
how many such range-checked integer libraries? That is a good estimate of the
need for such a library, IMO. Yes, it is useful, but it is not very, very useful.
Also, the very fact that it can be implemented in C++ shows that the C++ integer
types are not flawed and that they let you build what you need.

-- 
Dizzy
			"Linux is obsolete" -- AST
