[Boost-users] The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

18 Aug 2008

      Have you ever read a thread in a programming newsgroup where some
newbie asks how to implement some wacky scheme? Several far-out
solutions come forth, then flame wars erupt over the pros & cons.
Then finally someone asks what the newbie really needs and gives a
completely different solution based on that response. Basically, the
newbie was trying to do something in a way s/he shouldn't even have
thought of, let alone considered good enough to try implementing.
This is one of those times.

On Aug 16, 2008, at 1:32 PM, Zeljko Vrba wrote:
...
> Integer types in C and C++ are a mess. For example, I have made a library
> where a task is identified by a unique unsigned integer. The extra
> information about the tasks is stored in a std::vector. When a new task
> is created, I use the size() method to get the next id, I assign it to
> the task and then push_back the (pointer to) task structure into the
> vector. Now, the task structure has also an "unsigned int id" field. In
> 64-bit mode, sizeof(unsigned) == 4, sizeof(std::vector::size_type) == 8.
> I get warnings about type truncation, and obviously, I don't like them.
> But I like explicit casts and turning off warnings even less. No, I don't
> want to tie together size_type and task id type (unsigned int). One
> reason is "aesthetic", another reason is that I don't want the task id
> type to be larger than necessary (heck, even a 16-bit type would have
> been enough), because the task IDs will be copied verbatim into another
> std::vector<unsigned> for further processing (edge lists of a graph).
> Doubling the size of an integer type shall have bad effects on CPU
> caches, and I don't want to do it.
> What to do? Encapsulate into "get_next_id()" function? Have a custom
> size() function/macro that just casts the result of vector::size and
> returns it?
Well, using a custom "size" function will shut the compiler up. But
the "get_next_id" function is better because you can change the
implementation of the ID and external code shouldn't have to change.
(The ID is a typedef and not a naked "unsigned," right?) Anyway, does
it really matter? This ID-generation code is only used during task
construction, right?
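A minimal sketch of the "get_next_id" approach, assuming the poster's vector-of-pointers layout and a 16-bit ID typedef (he said 16 bits would suffice). The names `task_id`, `task`, and `get_next_id` are illustrative, and throwing on exhaustion is just one possible policy. The point is that the narrowing cast happens in exactly one place, after a range check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical names: task_id, task, get_next_id.
// The ID is a typedef, not a naked "unsigned", so changing its width later
// only touches this line.
typedef std::uint16_t task_id;

struct task
{
    task_id id;
    // ...other per-task data
};

// The single place that turns vector::size_type into a task_id. The range
// check runs before the explicit cast, so the narrowing conversion cannot
// silently wrap around, and no other code needs a cast or a warning override.
task_id get_next_id( std::vector<task*> const &tasks )
{
    std::size_t const limit = std::numeric_limits<task_id>::max();
    if ( tasks.size() > limit )
        throw std::length_error( "get_next_id: task ID space exhausted" );
    return static_cast<task_id>( tasks.size() );
}
```

Callers assign `t->id = get_next_id(tasks);` just before the `push_back`, and widening the ID later means editing only the typedef.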

Actually, writing this response is hard. I've read 20+ responses
talking about how much built-in integers "suck." Then I decided to
look at the original post again, and something bugged me about it.
Why are you using a number to refer to a container element in the
first place? Then I realized that you can't use iterators, because a
vector's iterators can be invalidated when elements are added or
removed. Then I wondered, why are you using a vector in the second
place? Wouldn't a list be better? You can add or remove elements
without invalidating iterators to the other elements, which leaves
those iterators available to implement your ID type. And you don't
seem to need random access to the various task elements. (A deque is
unsuitable for the same reason as a vector.) Then I thought, these
tasks just store extra information and have no relation to each other
(that you've revealed). So why are you using any kind of container at
all? You have no compunctions about using dynamic memory, so just
allocate with shared pointers:

//================================================
#include <boost/shared_ptr.hpp>

class task
{
     struct task_data
     {
         // whatever...

         // Required by the deep comparison in task::operator== below;
         // compare the members you actually add (placeholder for now).
         bool  operator ==( task_data const & ) const  { return true; }
     };
     typedef boost::shared_ptr<task_data>  sp_type;

     sp_type  data_;

     // Hidden member-wise constructor
     explicit  task( sp_type d )  : data_( d )  {}

public:
     // Constructors of various configurations, possibly including
     // a default constructor; but use the automatically-defined
     // copy-constructor and destructor (copies share the same data)
     task(/*whatever*/)  : data_( new task_data(/*whatever*/) )  {}

     // Forced (deep) copy
     task  clone() const
     {
         sp_type  result_data( new task_data(*this->data_) );
         task     result( result_data );

         return result;
     }

     // Use automatically defined copy-assignment operator
     bool  operator ==( task const &o ) const
     {
         //return this->data_ == o.data_;  // shallow
         return *this->data_ == *o.data_;  // deep
     }
     bool  operator !=( task const &o ) const
     { return !this->operator ==( o ); }

     // Regular task functionality follows...
};
//================================================

(I was going to suggest using Boost's pointer containers, but then I
realized that you really don't need containment at all.) Now you'll
pass this class around instead of an integer type. The size may be
larger, though: two pointers (plus an 'int' in debug mode).
...
==
> Another example: an external library defines its interfaces with signed
> integer types, I work with unsigned types (why? to avoid even more
> warnings when comparing task IDs with vector::size() result, as in
> assert(task->id < tasks.size()), which are abundant in my code). Again,
> some warnings are unavoidable.
> What to do to have "clean" code?
Your task IDs are conceptually opaque, so why does any external code
want to mess with them? The external code shouldn't be doing anything
with the IDs besides comparing them with each other (only != and ==,
not ordering) and using them as keys to your task functions. This is
why the ID's implementation should be hidden in a wrapping class:
external code can't mess with the IDs by default, and you would have
to define every legal interaction explicitly. In other words, define
the core task functionality in member functions and friends of the
"task" class I suggested, and have any ancillary code call that core
code.

And if any code besides your invariant-testing function is doing those
asserts, especially functions outside the task class, you're doing
your wrong method wrong.
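A minimal sketch of such a wrapper. It assumes a 16-bit representation and an owning registry. `task_id`, `task_registry`, `add`, and `payload` are hypothetical names, and the `int` payload stands in for real task data. Outside code can copy IDs and compare them with `==` and `!=`, but it cannot create, order, or do arithmetic on them. The only `id < size()` check lives inside the class that owns the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical opaque ID: outside code may copy it and compare it with
// == and !=, but cannot construct one, order it, or do arithmetic on it.
class task_id
{
    std::uint16_t value_;  // private representation; free to change later

    explicit task_id( std::uint16_t v ) : value_( v ) {}
    friend class task_registry;  // the only code allowed to mint or read IDs

public:
    bool operator==( task_id const &o ) const { return value_ == o.value_; }
    bool operator!=( task_id const &o ) const { return !( *this == o ); }
};

// Hypothetical owner of the per-task data; IDs are only keys into it.
class task_registry
{
    std::vector<int> payloads_;  // stand-in for the real per-task data

public:
    task_id add( int payload )
    {
        std::size_t const index = payloads_.size();
        std::size_t const limit = std::numeric_limits<std::uint16_t>::max();
        if ( index > limit )
            throw std::length_error( "task_registry: ID space exhausted" );
        payloads_.push_back( payload );
        return task_id( static_cast<std::uint16_t>( index ) );
    }

    int payload( task_id id ) const
    {
        // The one "id < size()" check, inside the class that owns the data.
        assert( std::size_t( id.value_ ) < payloads_.size() );
        return payloads_[id.value_];
    }
};
```

A line such as `task_id bogus(3);` outside the registry fails to compile because the constructor is private, so client code has no reason to compare IDs against `size()` at all.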
...
==
> Does anyone know about an integer class that lets the user define the
> number of bits used for storage, lower allowed bound and upper allowed
> bound for the range? Like:
> template<int BITS, long low, long high> class Integer;
The integer library in Boost has this. Its various class templates
only support one of your parameters at a time, though. (Either bit
length, maximum, or minimum; not any two or all three.)
...
> BITS would be allowed to assume a value only equal to the one of the
> existing integer types (e.g. 8 for char, 16 for short, etc.), and the
> class would be constrained to hold values in range [low, high]
> (inclusive).
[SNIP rant with big ideas for integers he doesn't currently need]
You'll have to enforce the range constraint yourself. But Boost's
numeric-conversion library (e.g. boost::numeric_cast) can help you
there, too.
...
> Or should I just listen to the devil on my shoulder and turn off the
> appropriate warnings?
No, you should rethink your design and ask why you need integers in
the first place.

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com