
From: Lang Stefan (SLang_at_[hidden])
Date: 2008-08-18 06:06:08


Some thoughts:

1. Original problem
Assigning a size to an ID type appears, IMHO, to be completely unrelated
to the general problem with integral types. There are two possibilities:
A) You intend to use your ID later to reference some address or offset
(which is the same thing) in memory. If that is your intention, then you
should stick to the size type (i.e. std::size_t). Problem solved.
B) If you don't intend to ever use your ID in the future to refer to an
address in memory, then using an integral type might prove
short-sighted. Unless you have very precise knowledge of how many IDs
you will need in, say, 5 years, or maybe 10 years from now, it's
dangerous to limit yourself to a specific integral type for such a
value. The least I would do is a typedef to hide the true type of that
value and prevent anyone from making assumptions about its potential
size. Better yet, define a class for this ID (a minimal sketch follows
at the end of this part). If you do that you can write your own
conversions and mathematical operators exactly the way you want them to
be. Plus you have a central place to fix your conversions if you decide
to change your ID class in a couple of years. Problem solved - in a very
clean way. (IMHO)
Then again I might be overestimating the importance and scope of this ID
value. But in that case, why is 'unsigned long' the 'correct' type?
Regarding this question, see my comments in part 3 of this message.
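
To illustrate option B: a minimal sketch of such an ID class. The name
UserId and the choice of underlying type are my own assumptions for the
example, not anything prescribed by the original problem:

    // Minimal ID wrapper: the representation is hidden behind one
    // central typedef, so callers cannot make assumptions about it.
    class UserId {
    public:
        // Implementation detail; change it here (and only here) if the
        // ID space has to grow in a couple of years.
        typedef unsigned long rep_type;

        explicit UserId(rep_type value) : value_(value) {}

        // Only the operations an ID actually needs; no arithmetic, so
        // nobody can accidentally "add" two IDs.
        bool operator==(const UserId& rhs) const { return value_ == rhs.value_; }
        bool operator!=(const UserId& rhs) const { return value_ != rhs.value_; }
        bool operator<(const UserId& rhs) const { return value_ < rhs.value_; }

        // One explicit, named conversion instead of implicit casts.
        rep_type raw() const { return value_; }

    private:
        rep_type value_;
    };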

2. Size types (and difference types)
I disagree with your complaint about size types all being unsigned. The
quotation from B. Stroustrup explicitly made an exception for specific
cases, and as I understand it he *was* referring to size types. The
problem with size types is that they do need to be able to point to
every location within the addressable memory space. Unfortunately this
means they need every single bit of the biggest data entity a CPU can
handle at any one time. If size types were signed, they would need one
extra bit for the sign, which in practice pushes them up to the next
machine word and effectively doubles the storage such a type takes up.
Unfortunately 'difference types' technically should be both signed and
able to span every legal address in memory - which means one bit more
than the current size types! However, see below....
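
A small illustration of this tension between std::size_t and the signed
difference type std::ptrdiff_t (the function and variable names are
mine, purely for demonstration):

    #include <cstddef>
    #include <vector>

    void size_vs_difference(const std::vector<int>& a,
                            const std::vector<int>& b)
    {
        // std::size_t is unsigned: if b is larger than a, the result
        // does not go negative, it wraps to a huge positive value.
        std::size_t diff = a.size() - b.size();

        // std::ptrdiff_t is signed, but as argued above it cannot
        // represent the full range of std::size_t, so casting very
        // large sizes can itself overflow.
        std::ptrdiff_t sdiff = static_cast<std::ptrdiff_t>(a.size())
                             - static_cast<std::ptrdiff_t>(b.size());
        (void)diff;
        (void)sdiff;
    }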

3. Integral types situation
I agree with your general assessment of the integral types' situation.
For several reasons actually, some of them not pointed out yet:
A) Whenever signed and unsigned types are mixed, the compiler implicitly
converts some of the values or intermediate results. Unfortunately it's
often not obvious which values and terms are being converted, and
whether that can lead to problems. Most of the time (but not always, as
you have correctly pointed out) the compiler issues warnings in such
situations, but many compilers just point to the line, not to the exact
variable or term. (See the example after this list.)
B) What makes this problem even more difficult is the widespread use of
libraries, which may or may not use signed/unsigned values for specific
properties, and thus don't even leave it open to the developers to avoid
such combinations!
C) Built-in integral types are based on internal representations of
numbers. From the perspective of a designer this is just plain wrong! It
is a violation of the principle of information hiding! Types shouldn't
reveal anything about their internal representation, and after 20+ years
of object-oriented programming history I really don't understand why we
are still forced to use such types! When a developer selects an integral
type for a variable, he considers the preconditions and postconditions
and decides the range of values his variable could legally assume.
However, he shouldn't be forced to map this range to one of a few
predefined ranges just because these predefined ranges happen to fit the
compiler's preferred internal representation. If a variable can assume
values in the range [0..999999], then is 'long' the 'correct' type? Or
is it 'unsigned long'? My answer is: neither! If the developer chooses
either type, others looking at the code might not recognize that certain
values outside that range (but well within the limits of the chosen
type) will cause problems or might indicate an error elsewhere. (Again,
see the example below.)
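
Two quick illustrations of points A and C (self-contained snippets of my
own; nothing here comes from any code under discussion):

    #include <iostream>

    int main()
    {
        // Point A: the usual arithmetic conversions silently turn the
        // signed operand into a huge unsigned value, so this prints
        // "false" even though -1 < 1 mathematically.
        int i = -1;
        unsigned int u = 1;
        std::cout << std::boolalpha << (i < u) << '\n';

        // Point C: a variable that is only legal in [0..999999]. Both
        // 'long' and 'unsigned long' compile and silently accept
        // out-of-range values; neither type records the real constraint.
        long counter = 1500000; // out of range, accepted without complaint
        (void)counter;
        return 0;
    }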

4. Resolution
I like the suggestion of an integral type that is defined by a range of
valid values. An additional (optional) parameter could be provided to
define policies such as the behavior on overflow, conversion problems,
and the like. This would be in the spirit of information hiding. In the
case of size types or difference types, the address space could
reasonably be restricted to a range that allows for a sign bit without
compromising the maximum internal data size - in most cases, anyway. And
when it turns out an additional bit is still needed, then so be it! This
design would take away the need to think in internal representations,
and would instead allow developers to concentrate on the true
limitations of their variable types.
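
A minimal sketch of what such a type might look like. The names
ranged_int and throw_on_overflow are my own invention, and a real design
would need much more (promotion rules, more policies, compile-time
checks), so treat this purely as an illustration of the idea:

    #include <stdexcept>

    // Hypothetical policy: what to do when a value leaves the legal range.
    struct throw_on_overflow {
        static long check(long value, long min, long max) {
            if (value < min || value > max)
                throw std::out_of_range("value outside declared range");
            return value;
        }
    };

    // Hypothetical ranged integer: the declared range of valid values,
    // not the machine representation, is part of the type.
    template <long Min, long Max, class Policy = throw_on_overflow>
    class ranged_int {
    public:
        explicit ranged_int(long value)
            : value_(Policy::check(value, Min, Max)) {}

        // Arithmetic re-checks the range, so the policy fires on overflow.
        ranged_int operator+(const ranged_int& rhs) const {
            return ranged_int(value_ + rhs.value_);
        }

        long get() const { return value_; }

    private:
        long value_; // a compiler/library is free to pick smaller storage
    };

    // The [0..999999] example from part 3:
    typedef ranged_int<0, 999999> counter_type;

A compiler (or library) would then be free to choose the smallest
internal representation that fits [Min..Max] - which is exactly the
clustering idea in part 5 below.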

5. Advantages
A) Developers can set more precise limits on their variables and on the
behavior when these limits are violated.
B) Predefined behavior on violations at runtime.
C) Compiler developers can optimize the internally needed storage
through clustering mechanisms, depending on whatever memory access and
processing capabilities the available cores provide.
D) Compiler developers need to deal with just one basic integral class
instead of a dozen or more. Depending on processor architecture this may
or may not be very helpful (it might be necessary to redefine the
current integral types internally), but at least the use of these types
would be more secure and less prone to programming errors. (Of course
this implies the eventual retirement of the current integral types,
which won't happen for some time, even after a standard basic adaptable
integral type is provided.)

So these are my 5 cents, sorry for the lengthy post.

Stefan

