 # Boost :

From: Dean Foster (foster_at_[hidden])
Date: 2001-10-01 08:10:22

The following idea kept me awake last night.

GOAL:

Construct a system would allow an unlimited number of different units
to be defined and still catch 99.9% of all type errors. It should
automatically generate new units when basic units are multiplied or
divided.

SUMMARY:

A units class is described that will have a single integer that
represents a "hash" of the unit involved. If two units (say foot and
lbs) are multiplied a hash of the product (foot_lbs) is created to act
as a signature of this new unit. If two units are added and their
hashes don't agree--the system complains at compile time. Type safety
is thus 99.999% assured. (there is a chance that two complex units
would have the same hash value and so mistakenly look the same.) The
hash function has the property that any way of generating foot_lbs
from other units will generate the same computed hash.

INTRODUCTION:

There are two problems the units library can solve: unit conversion
and type safety. If the first goal is totally dropped, I think we can
generate an elegant solution to the second problem. This would
suggest that we should have two libraries at the end of the day:
SIunits which does conversions (but ignores most type safety), and a
type safe library to be named latter (that does no unit conversions).

It seems that several of our candidate libraries look something like:

// feet
// | lbs
// | | seconds
// | | |
// v v v
//
typedef Unit< 0 > pure;
typedef Unit< 1 > foot;
typedef Unit< 0, 1 > pound;
typedef Unit< 0, 0, 1> second;
typedef Unit< 1, 1, 0> foot_lbs;
typedef Unit< 1, 0, -1> feet_per_second;

What makes this totally ugly is that if you want 10 different units,
you have 10 template integers. Not only is the code unreadable, but
to use 12 different units instead of 10 is almost impossible without
learning sed first!

Basically what is going on is that when two units are multiplied we
want to add a vector that represents the units involved. When two
units are divided, we subtract the vector of their units.
Mathematically this means we need a "ring" to represent the dimension
of our units. Currently all the systems that I've heard about use
didn't use basis elements, we wouldn't need such a large space to
represent each unit.

Each basic unit is given a hash value h(unit). Now if we multiply two
units

h(unit_A * unit_B) = h(unit_A) + h(unit_B)

if we divide two units:

h(unit_A * unit_B) = h(unit_A) - h(unit_B)

A complex transformation looks like:

h(unit_A^2 * unit_B / unit_C) = 2 * h(unit_A) + h(unit_B) - h(unit_C)

Using this scheme we can represent the previous units as:

typedef Unit< 0 > pure;
typedef Unit< h_foot > foot;
typedef Unit< h_pound > pound;
typedef Unit< h_second > second;
typedef Unit< h_foot+h_pound > foot_lbs;
typedef Unit< h_foot - h_second > feet_per_second;

where h_foot, h_pound and h_second are carefully chosen integers. I
laid it out so that it looks like the vector math we did above.

The cool thing about such a representation is that now we should be
able to replace the last two templates with the following:

Product_of_units<foot,pound> foot_lbs;
Ratio_of_units<foot,second> feet_per_second;

where

Product_of_units<unit_A,unit_B>

is interconvertable with a

Unit< h(unit_A) + h(unit_B) >

and similarly for Ratio_of_units. Thus our final code would look
something like:

typedef Unit< 0 > pure;
typedef Unit< h_foot > foot;
typedef Unit< h_pound > pound;
typedef Unit< h_second> second;
typedef Product_of_units<foot,pound> foot_lbs;
typedef Ratio_of_units<foot,second> feet_per_second;

Such a scheme would allow as many type safe doubles as desired.
Regardless of the number of types introduced almost all incorrect
assignments will be captured.

ISSUES AND PROBLEMS:

o It would be nice to use a proper hash function--say addition
modulo 2^32. This would allow larger hashes and better colosion
avoidance. Is there an easy way of doing modulo arithmetic in
templates? (I know they are Turing complete, but kinda slow!)

o Is there a way of automatically generating good hash values for the
basic units? Something close to random would be ideal. If we can't
do the modulo arithmetic they have to be kinda small though. (say
around 10-100 Million of a typical machine with 2^31 being the maximum
signed value.)

o I don't know how to do the conversions easilly between the
Product_of_units and the basic Units.

=============================================================================
Dean Foster dean_at_[hidden]
Statistics, Wharton, U. Penn 215 898 8233