
From: Eric Ford (eford_at_[hidden])
Date: 2001-10-01 12:52:25


> What makes this totally ugly is that if you want 10 different units,
> you have 10 template integers. Not only is the code unreadable, but
> to use 12 different units instead of 10 is almost impossible without
> learning sed first!

Well, even if you used a list of rationals rather than a hash, you
could increase code readability by using a typedef for common units
and compile time structs which have a typedef to provide the units of
products.

> Basically what is going on is that when two units are multiplied we
> want to add a vector that represents the units involved. When two
> units are divided, we subtract the vector of their units.
> Mathematically this means we need a "ring" to represent the dimension
> of our units. Currently all the systems that I've heard about use
> basis elements--that leads to the vector space. But if instead we
> didn't use basis elements, we wouldn't need such a large space to
> represent each unit.

What's wrong with having a large space? Is the concern with the list
of rationals about readability? About the size of the space itself?
About extensibility?

> The cool thing about such a representation is that now we should be
> able to replace the last two templates with the following:
>
> Product_of_units<foot,pound> foot_lbs;
> Ratio_of_units<foot,second> feet_per_second;

That could be done using the list of rationals as well.
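
As a rough illustration of that point, here is a minimal sketch of how
typedefs and compile-time product/ratio structs might sit on top of a list
of rationals. It assumes std::ratio for the exponents and only three base
dimensions, with the pound treated as a base unit for simplicity. The names
dim, product_of_units and ratio_of_units are made up for this sketch and are
not an existing library interface.

```cpp
#include <ratio>
#include <type_traits>

// Hypothetical dimension type: one rational exponent per base dimension.
template <class LengthExp, class MassExp, class TimeExp>
struct dim {
    typedef LengthExp length_exp;
    typedef MassExp mass_exp;
    typedef TimeExp time_exp;
};

typedef std::ratio<0> zero;
typedef std::ratio<1> one;

// Typedefs for common units hide the exponent list from user code.
typedef dim<one, zero, zero> foot;    // length
typedef dim<zero, one, zero> pound;   // treated as a base unit here
typedef dim<zero, zero, one> second;  // time

// Multiplying units adds the exponent lists term by term.
template <class A, class B>
struct product_of_units {
    typedef dim<
        typename std::ratio_add<typename A::length_exp, typename B::length_exp>::type,
        typename std::ratio_add<typename A::mass_exp, typename B::mass_exp>::type,
        typename std::ratio_add<typename A::time_exp, typename B::time_exp>::type>
        type;
};

// Dividing units subtracts the exponent lists term by term.
template <class A, class B>
struct ratio_of_units {
    typedef dim<
        typename std::ratio_subtract<typename A::length_exp, typename B::length_exp>::type,
        typename std::ratio_subtract<typename A::mass_exp, typename B::mass_exp>::type,
        typename std::ratio_subtract<typename A::time_exp, typename B::time_exp>::type>
        type;
};

typedef product_of_units<foot, pound>::type foot_lbs;
typedef ratio_of_units<foot, second>::type feet_per_second;
```

User code then only ever names foot_lbs or feet_per_second; the exponent
list stays inside the typedefs.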
 
> ISSUES AND PROBLEMS:
> o Is there a way of automatically generating good hash values for the
> basic units? Something close to random would be ideal. If we can't
> do the modulo arithmetic they have to be kinda small though. (say
> around 10-100 Million of a typical machine with 2^31 being the maximum
> signed value.)

You could write a routine to evaluate the quality of a set of hashes.
It would basically make a list of the hashes obtained from reasonably
likely combinations of all units and return the number of duplicates.
A more advanced version might weight more likely collisions as worse
(for a first guess, I'd assume small values are more likely for both
the numerator and the denominator). Then you could just try several
hashes (and several transformation functions to multiply/divide units).
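
A minimal sketch of such a quality check follows. It rests on two
assumptions that are not settled in this thread: multiplying units adds
their hashes (so unit^e contributes e * hash), and only integer exponents
in [-max_exp, max_exp] count as "reasonably likely". The function name
count_collisions is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Counts duplicate hashes over every combination of integer exponents in
// [-max_exp, max_exp] for the given base-unit hashes.
// Assumption: multiplying units adds their hashes, so unit^e adds e * hash.
std::size_t count_collisions(const std::vector<std::int64_t>& base_hashes,
                             int max_exp) {
    assert(max_exp >= 0);
    std::vector<int> exps(base_hashes.size(), -max_exp);
    std::set<std::int64_t> seen;
    std::size_t duplicates = 0;
    for (;;) {
        std::int64_t h = 0;
        for (std::size_t i = 0; i < exps.size(); ++i)
            h += exps[i] * base_hashes[i];
        if (!seen.insert(h).second)
            ++duplicates;  // this combined hash was already produced
        // Advance the exponent vector like an odometer.
        std::size_t i = 0;
        while (i < exps.size() && exps[i] == max_exp)
            exps[i++] = -max_exp;
        if (i == exps.size())
            break;  // wrapped around: every combination has been visited
        ++exps[i];
    }
    return duplicates;
}
```

Trying several candidate hash sets then amounts to calling this function on
each one and keeping the set with the fewest duplicates. A weighted version
would add a cost per collision instead of incrementing a counter.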

However, we already know the solution to a very similar problem. If
our goal is to avoid collisions for n+1 integers, a_i, each in [0,b),
then the solution is to write a function like z = a_0 + a_1*b +
a_2*b^2 + ... + a_n*b^n. But for b = numeric_limits<unsigned int>::max(),
this reduces to the same thing as a big list of unsigned ints, which
is basically what you were originally trying to avoid.
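
To make the positional encoding concrete, here is a small sketch (C++14 or
later, because of the loops inside constexpr functions) that packs digits
a_i in [0,b) into one integer and recovers them again. The names pack and
digit are illustrative only. The caller is assumed to keep the largest
power of b within 64 bits, which is the limit that pushes you back to a
list of integers once b gets large.

```cpp
#include <cstddef>
#include <cstdint>

// z = a[0] + a[1]*b + ... + a[N-1]*b^(N-1), evaluated by Horner's rule.
// Assumes every a[i] < b and that b^N fits in 64 bits.
template <std::size_t N>
constexpr std::uint64_t pack(const unsigned (&a)[N], std::uint64_t b) {
    std::uint64_t z = 0;
    for (std::size_t i = N; i-- > 0;)
        z = z * b + a[i];
    return z;
}

// Recovers a[i] from z. Because every digit can be read back, two
// different digit lists can never produce the same z.
constexpr unsigned digit(std::uint64_t z, std::uint64_t b, std::size_t i) {
    for (std::size_t k = 0; k < i; ++k)
        z /= b;
    return static_cast<unsigned>(z % b);
}
```

Signed exponents would need an offset added before packing so that every
digit lands in [0,b).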

So I think your idea may not be perfectly suited for the application
of an SIunits-like setup. We have a priori knowledge about this system
(we can easily make a basis set of units and know they will only occur
in rational combinations) which I suspect we should use to our
advantage.

However, for another system of units for which we don't have as much a
priori information, your hash idea might be a very good solution.
We've been tossing around examples of things like a liter of soda, a
pound of rice, etc. The "units" of such a system need to have two
parts: a basic unit (meter, second, etc.) and a tag identifying what's
being measured (gold, cpu time, etc.). While boosters will probably
write a very good bottom layer for handling the SI (and other
systems') basic units, the people writing the boost library won't know
what crazy tags the users will create. We do not have much a priori
knowledge of what kind of tags they will use. I'm thinking that the
top layer might be well suited to a hash function approach. A hash
could be performed on the name of their tag (most of the time they'll
want to create a string for printing alongside the value anyway).
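
A minimal sketch of hashing a tag name at compile time is below. It assumes
the 32-bit FNV-1a hash, picked only because it is short and well known;
nothing in this thread fixes a particular hash function. The names tag_hash
and tagged are hypothetical, and the code needs C++14 or later for the loop
in a constexpr function.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a null-terminated tag name.
constexpr std::uint32_t tag_hash(const char* name) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (; *name != '\0'; ++name) {
        h ^= static_cast<unsigned char>(*name);
        h *= 16777619u;             // FNV prime
    }
    return h;
}

// Hypothetical tagged value: the hash of the tag name is part of the type.
template <std::uint32_t Tag>
struct tagged {
    double value;
};

typedef tagged<tag_hash("gold")> gold_amount;
typedef tagged<tag_hash("cpu time")> cpu_time_amount;
```

The id depends only on the spelling of the tag, so it does not change when
other tags are added or reordered.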

I believe this functionality could also be obtained by using a linked
list type template parameter to pass along the dimensions of the
units. But if users create many units, this could lead to very long
compile times. I'm guessing that a fixed length template parameter to
handle the SI dimensions plus a hash to handle user defined units/tags
might have the same benefits and compile faster (just my guess).
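
One way to picture the fixed-length-plus-hash layout is sketched below. It
uses plain int exponents for three dimensions instead of a full list of
rationals, and the tag ids are placeholder constants standing in for hashes
of the tag names. The quantity template and every name in it are
assumptions made for illustration; the sketch says nothing about actual
compile times.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical quantity type: a fixed list of SI exponents (three shown)
// plus one integer identifying the user-defined tag.
template <int Length, int Mass, int Time, std::uint32_t Tag>
struct quantity {
    double value;
};

// Addition compiles only when both the dimensions and the tag agree.
template <int L, int M, int T, std::uint32_t Tag>
constexpr quantity<L, M, T, Tag> operator+(quantity<L, M, T, Tag> a,
                                           quantity<L, M, T, Tag> b) {
    return quantity<L, M, T, Tag>{a.value + b.value};
}

// Placeholder tag ids; in practice they would come from hashing the name.
constexpr std::uint32_t rice_tag = 1u;
constexpr std::uint32_t soda_tag = 2u;

typedef quantity<0, 1, 0, rice_tag> kg_of_rice;
typedef quantity<0, 1, 0, soda_tag> kg_of_soda;
```

Adding a kg_of_rice to a kg_of_soda fails to compile because template
argument deduction finds two different values for Tag.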

For user defined units/tags, the desired allowable operations are
different. For example, kg of liquid = kg of water + kg of vinegar
makes sense. I'm having a harder time coming up with a ring to
describe those kinds of relationships. I might handle this by giving
each unit/tag a list of hashes that it can convert to. So the
vinegar unit/tag might know the hashes of other units/tags such as
liquid, acid, and ingredient. Then when the compiler saw an addition
like the example above, it could go through the list of allowed
conversions and see if there was a match. I'm thinking it should
probably be recursive. E.g., a unit/tag like cider vinegar would only
need to contain the hash of vinegar, and the compiler would search
through vinegar's list of hashes as well.
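
The recursive search could look roughly like the sketch below. For brevity
it identifies tags by type rather than by hash value, which is a
simplification of the scheme described above, and all names (converts_to,
targets, can_convert) are hypothetical. It needs variadic templates, so
C++11 or later.

```cpp
#include <type_traits>

// List of the tags a given tag may convert to directly.
template <class... Tags>
struct converts_to {};

// Hypothetical tags; each names only its direct conversions.
struct liquid        { typedef converts_to<> targets; };
struct acid          { typedef converts_to<> targets; };
struct ingredient    { typedef converts_to<> targets; };
struct water         { typedef converts_to<liquid, ingredient> targets; };
struct vinegar       { typedef converts_to<liquid, acid, ingredient> targets; };
struct cider_vinegar { typedef converts_to<vinegar> targets; };

template <class From, class To>
struct can_convert;

// True if any tag in the list is, or recursively converts to, To.
template <class List, class To>
struct any_can_convert;

template <class To>
struct any_can_convert<converts_to<>, To> : std::false_type {};

template <class First, class... Rest, class To>
struct any_can_convert<converts_to<First, Rest...>, To>
    : std::integral_constant<bool,
          can_convert<First, To>::value ||
          any_can_convert<converts_to<Rest...>, To>::value> {};

// A tag converts to itself and to anything its targets convert to.
template <class From, class To>
struct can_convert
    : std::integral_constant<bool,
          std::is_same<From, To>::value ||
          any_can_convert<typename From::targets, To>::value> {};
```

An addition such as water + vinegar giving liquid could then be allowed
whenever can_convert holds from each operand's tag to the result's tag.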

One advantage of a hash over just an integer starting from 0 and being
incremented each time a user adds a unit is that the unit/tag ids
could be consistent across multiple versions or even programs with
different numbers and orders of user defined units. This might be
desirable for someone wanting to output and later input a data file
which included unit information.

Thanks,
E

