
Boost :

From: Michael Kenniston (Msk_at_[hidden])
Date: 2001-09-04 15:51:36


I've been out of town for a while (and I'm one of those misfits
who does /not/ read email at the beach :-), so I'll try to
respond to the last 40 or so messages on this subject all at
once by lumping together similar concerns and comments.

TOPIC: You need rational powers, not just integer powers.

I agree.
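Here is a minimal C++11 sketch of what rational powers require. It assumes a toy three-dimension system (length, mass, time) whose exponents are stored as std::ratio. The names dimension, quantity, and rational_pow are hypothetical; they do not come from the prototype library or from SIunits. With rational exponents, taking the square root of an area yields a length.

```cpp
#include <cassert>
#include <cmath>
#include <ratio>
#include <type_traits>

// Hypothetical sketch: each exponent is a std::ratio, so fractional
// powers such as m^(1/2) are representable.
template <class L, class M, class T>  // length, mass, time exponents
struct dimension {
    using length = L;
    using mass = M;
    using time = T;
};

// Raising a dimension to the rational power P multiplies every exponent by P.
// std::ratio_multiply yields a reduced ratio, so equal exponents compare equal.
template <class D, class P>
using dim_pow = dimension<std::ratio_multiply<typename D::length, P>,
                          std::ratio_multiply<typename D::mass, P>,
                          std::ratio_multiply<typename D::time, P>>;

template <class D>
struct quantity {
    double value;  // in some fixed internal unit
};

// Exponents are tracked at compile time; the value itself uses std::pow.
template <class P, class D>
quantity<dim_pow<D, P>> rational_pow(quantity<D> q) {
    return {std::pow(q.value, static_cast<double>(P::num) / P::den)};
}

using length_dim = dimension<std::ratio<1>, std::ratio<0>, std::ratio<0>>;
using area_dim   = dimension<std::ratio<2>, std::ratio<0>, std::ratio<0>>;
```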

TOPIC: Why not make plane_angle and solid_angle be their own
dimensions?

Because they aren't. (SI's definition, not mine.) The possibility
of confusing a "plain" number with an angle is only one example
of the ways in which two quantities can have the same dimensions
but mean different things. While it would be very handy to detect
angle-vs-number and degrees-vs-radians errors, I don't want to
add a special case and leave the rest of the problem untouched.
If I keep playing with the layered approach discussed below maybe
a good general solution will emerge.

As an aside, other examples of confusable quantities which have
the same dimensions are:

    energy-density / pressure
    absorbed-dose / dose-equivalent / specific-energy
    activity-of-a-nuclide / angular-velocity / frequency
    heat-density / surface-tension
    kinematic-viscosity / thermal-diffusivity
    heat-flux-density / irradiance
    energy / moment-of-force / torque
    heat-flow-rate / power / radiant-intensity

I'm no physicist, so some of these may just be synonyms, but some
are clearly different kinds of animals that should not be mixed up.
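The following hypothetical sketch makes the point concrete. A dimension here is nothing more than its exponents of (length, mass, time). Energy and torque both come out as force times length, and energy density comes out identical to pressure, so a check based only on dimensions accepts mixing them.

```cpp
#include <type_traits>

// Hypothetical sketch: a dimension is just its (length, mass, time) exponents.
template <int L, int M, int T>
struct dimension {};

template <class A, class B>
struct multiply;

template <int L1, int M1, int T1, int L2, int M2, int T2>
struct multiply<dimension<L1, M1, T1>, dimension<L2, M2, T2>> {
    using type = dimension<L1 + L2, M1 + M2, T1 + T2>;
};

using length     = dimension<1, 0, 0>;
using per_volume = dimension<-3, 0, 0>;
using force      = dimension<1, 1, -2>;
using pressure   = dimension<-1, 1, -2>;

using energy         = multiply<force, length>::type;  // force times displacement
using torque         = multiply<force, length>::type;  // force times lever arm
using energy_density = multiply<energy, per_volume>::type;

// A check based only on dimensions cannot tell these pairs apart.
static_assert(std::is_same<energy, torque>::value, "energy vs. torque");
static_assert(std::is_same<energy_density, pressure>::value, "energy density vs. pressure");
```

Telling such pairs apart would take information beyond the exponents.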

TOPIC: Dynamic range problems.

As I understand it, this is simply a problem of exponent overflow
in the underlying representation: i.e. if you are dealing with
both "megaparsecs and femtometres" you may exceed the capacity
of whatever floating point type you are using. There should be
no problem with significant digits; this is purely an exponent
issue.
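A small illustration follows. It assumes the femtometre is the unit that a stored 1.0 represents, and it uses an approximate value for the megaparsec. A single megaparsec still fits in a float. Squaring it overflows the exponent, even though no significant digits are at stake.

```cpp
#include <cassert>
#include <cmath>

// Illustrative sketch: choose the femtometre as the unit that 1.0 represents.
// One megaparsec is then about 3.1e37, which still fits in a float
// (maximum about 3.4e38), but its square (an area) is about 9.5e74.
constexpr double metres_per_megaparsec = 3.0857e22;  // approximate
constexpr double metres_per_femtometre = 1e-15;

float megaparsec_in_fm() {
    return static_cast<float>(metres_per_megaparsec / metres_per_femtometre);
}

float square_megaparsec_in_fm2() {
    float mpc = megaparsec_in_fm();
    return mpc * mpc;  // exponent overflow: the float result is +infinity
}

double square_megaparsec_in_fm2_as_double() {
    double mpc = metres_per_megaparsec / metres_per_femtometre;
    return mpc * mpc;  // fine: double's maximum is about 1.8e308
}
```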

One suggested solution is to parameterize the quantity types to allow
the user to specify what a "1.0" in the underlying representation
type should actually represent. I believe SIunits calls this
"calibration", and my prototype quantity library in the files
area includes a cheap macro hack to do something similar.
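Here is roughly what calibration looks like, as a hypothetical sketch; it is neither SIunits' interface nor the prototype's macro. A std::ratio parameter records how many metres a stored 1.0 stands for. Converting between two calibrations means multiplying by the ratio of the two scales.

```cpp
#include <cassert>
#include <cmath>
#include <ratio>

// Hypothetical sketch of "calibration": Scale states how many metres a stored
// 1.0 represents.
template <class Scale>
struct length {
    double value;  // in units of Scale metres
};

using joe_length   = length<std::femto>;  // Joe: 1.0 means 1 fm
using sally_length = length<std::tera>;   // Sally: 1.0 means 1e12 m

// Converting between calibrations.  The factor is computed in double because
// the exact std::ratio (1e27 for tera -> femto) would overflow intmax_t.
template <class To, class From>
length<To> recalibrate(length<From> x) {
    const double from_metres = static_cast<double>(From::num) / From::den;
    const double to_metres   = static_cast<double>(To::num) / To::den;
    return {x.value * from_metres / to_metres};
}
```

Joe's number for one of Sally's units is already 1e27. Squaring it would overflow a float.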

However, upon further consideration I now believe that, although the
problem it attempts to solve is a legitimate one, calibration
is the wrong approach. Here's my reasoning:

1) It is simply good design to hide the underlying representation
from the user. There are practical problems with this (overflow),
but there are even greater practical problems (complexity,
compatibility, maintenance) with exposing it.

2) Calibration does not fully solve the problem. Assume that Joe
writes a library that requires femtometers (not simply "uses," but
/requires/, since he presumably wouldn't have gone to the trouble
of recalibrating to femtometers unless he was running into overflow
problems). Sally writes a library that for similar reasons /requires/
megaparsecs. Now I want to use both, but I can't, at least not
easily. Even if conversion functions are defined, I will likely run
into overflow problems myself when I start mixing the two libraries.

3) If we insist that a quantity/dimensions/units library include a
calibration feature to avoid overflow problems, then for exactly
the same reasons we should logically insist that /every/ library
that deals with floating point numbers should also include
calibration. This would be a nightmare.

4) This same problem has been faced - and solved - before.
Think back to the 1950's or so (yes, I know we're all /much/ too
young to remember, so just pretend :-) when all we had were integers.
To do scientific or engineering calculations, you had to put an
implicit decimal (or binary) point into each number. Then you
had to keep track of where it was, and adjust it with shifts or
divides from time to time. It would clearly be quite an
understatement to characterize this process as "tedious and
error-prone," so we invented floating-point representations.
(If you strip away the details, all a floating-point number really
does is keep track of the decimal point for you.) There must
have been vigorous debate about efficiency vs. simplicity etc.,
but in the end the new, more sophisticated floating point data
type was the clear winner.
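For anyone who never had to do this by hand, here is a sketch of that bookkeeping. It uses a Q16.16 fixed-point format, one common convention chosen only for illustration. Every product must be shifted by hand to put the binary point back where it belongs.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of pre-floating-point bookkeeping: a Q16.16 value
// stores x * 2^16 in a 32-bit integer, and the programmer has to put the
// binary point back by hand after every multiplication.
using q16_16 = std::int32_t;
constexpr int frac_bits = 16;

// Conversion truncates toward zero; fine for the exactly representable inputs used here.
constexpr q16_16 to_fixed(double x) { return static_cast<q16_16>(x * (1 << frac_bits)); }
constexpr double to_double(q16_16 x) { return static_cast<double>(x) / (1 << frac_bits); }

constexpr q16_16 fixed_mul(q16_16 a, q16_16 b) {
    // The 64-bit product carries 32 fraction bits; shift 16 of them away.
    return static_cast<q16_16>((static_cast<std::int64_t>(a) * b) >> frac_bits);
}
```

A floating-point multiply does the same exponent bookkeeping automatically.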

It appears to me that we have a nearly identical problem with
quantities. We've merely encountered a problem domain where
the underlying representation type just isn't up to the job, so the
appropriate solution is to build a better representation type, i.e.
a floating point with a much larger exponent range. I don't know
enough to build one, but I'm sure someone here does, and the
"unlimited integer" classes provide ample precedent for this kind
of thing.
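To give a rough idea of what such a type could look like, here is a hypothetical sketch; the name wide_float and its layout are assumptions, not an existing library. It keeps a double mantissa normalized with std::frexp and carries the binary exponent separately in a 64-bit integer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch of an extended-range floating-point type: a double
// mantissa normalized to [0.5, 1) in magnitude (or zero) plus a separate
// 64-bit binary exponent.  Only * and / are shown; division by zero is
// not handled.
class wide_float {
public:
    wide_float(double x = 0.0) {  // implicit, so plain doubles mix in freely
        int e = 0;
        mant_ = std::frexp(x, &e);
        exp_ = e;
    }

    friend wide_float operator*(wide_float a, wide_float b) {
        return normalized(a.mant_ * b.mant_, a.exp_ + b.exp_);
    }
    friend wide_float operator/(wide_float a, wide_float b) {
        return normalized(a.mant_ / b.mant_, a.exp_ - b.exp_);
    }

    // Decimal exponent of the magnitude; meaningful even far outside double's range.
    double log10_abs() const {
        return std::log10(std::fabs(mant_)) + static_cast<double>(exp_) * std::log10(2.0);
    }

private:
    static wide_float normalized(double m, std::int64_t e) {
        wide_float r;
        int extra = 0;
        r.mant_ = std::frexp(m, &extra);
        r.exp_ = (r.mant_ == 0.0) ? 0 : e + extra;
        return r;
    }

    double mant_ = 0.0;
    std::int64_t exp_ = 0;
};
```

A usable version would also need addition, comparison, rounding control, and I/O. The sketch only shows that the exponent bookkeeping itself is simple.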

Once such an extended-range float type exists, it can be used as
the underlying representation type for /any/ class that needs it.
(I do think it reasonable to ask that libraries parameterize
on representation type.) It isn't even terribly surprising that
cosmologists who deal with quarks and galaxies simultaneously would
need a special floating point representation that can deal with both
/very/ large and /very/ small numbers far beyond the typical
requirements of accountants and engineers.
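The following hypothetical sketch shows what parameterizing on the representation type means in practice. The quantity code is written once against Rep. It requires from that type only the operations it actually uses, which here are + and *.

```cpp
#include <cassert>

// Hypothetical sketch: a user who needs more range can substitute a wider
// type (long double here; an extended-range class in principle) without any
// change to the quantity code itself.
template <class Rep>
class length {
public:
    constexpr explicit length(Rep v) : v_(v) {}
    constexpr Rep value() const { return v_; }

    friend constexpr length operator+(length a, length b) { return length(a.v_ + b.v_); }
    friend constexpr length operator*(Rep k, length a) { return length(k * a.v_); }

private:
    Rep v_;
};
```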

TOPIC: I still think the "meter()" syntax is ugly. My users
want to say "meter".

(Ok, nobody actually said this recently, but I wanted to answer it.)

I think it's ugly too, and I found Walter Brown's survey results
compelling. If we use the trick suggested for the arithmetic
constants library to make a function call look like a variable
reference, there is probably no reason not to use the more
natural "meter" syntax. In fact I'm starting to have doubts
about whether even that complexity is worth it; on today's machines
perhaps just making "meter" a vanilla variable and doing one extra
multiply whenever you say "* meter" is acceptable in real programs.
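Here is a minimal C++17 sketch of the plain-variable approach, with hypothetical names. meter is an ordinary constant, and 4 * meter is one multiplication that the compiler can evaluate at compile time when the constant is visible. The function-call spelling appears alongside for comparison. The arithmetic-constants trick itself is not reproduced here.

```cpp
#include <cassert>

// Hypothetical sketch: a unit is just a constant quantity.  The internal
// unit (metres here) is an implementation detail.
struct length {
    double internal;
};

constexpr length operator*(double k, length u) { return {k * u.internal}; }

// The function-call spelling ...
constexpr length meter_fn() { return {1.0}; }

// ... and the plain-variable spelling (a C++17 inline variable, so one
// definition is shared by every translation unit that includes the header).
inline constexpr length meter{1.0};
```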

TOPIC: The representation should be parameterized to match
the units that my users customarily use.

This is the "I put in 11.2 and now it says 11.1999" problem, but I
still don't see that it's really a problem. Anyone sophisticated
enough to be using a C++ quantity library should realize that all
real-world measurements are approximate, and they should be producing
output with the proper number of significant digits. In effect this
is the same problem as "I divided 2 feet by 10 and got 0.1999999 feet",
or "I typed in 1.2345678901234567890 and got back 1.2345679",
which is simply the way computer floating-point numbers work.
Not everyone has to be a numerical guru, but any programmer using
floating points needs at least this basic level of understanding
of their inexactness.
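The next sketch illustrates the point, assuming metres as the internal unit. 11.2 ft goes into metres and back, which may change the last bit. Printing with the three significant digits the measurement justifies gives 11.2 again.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative sketch of a round trip through the internal unit.
constexpr double metres_per_foot = 0.3048;  // exact by definition

double round_trip_feet(double feet) {
    double metres = feet * metres_per_foot;  // internal representation
    return metres / metres_per_foot;         // may differ from the input in the last bit
}

std::string with_sig_digits(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;  // default (general) format: significant digits
    return out.str();
}
```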

Overall, I think the value added by such parameterization would be
marginal, and that it's not worth the added complexity (see "zero
cost features" below). I would also emphasize again that users
/never/ see SI units unless they want to, and that the use of Imperial
units is no more complicated than the use of SI units, so this
really isn't a user-interface issue. I agree completely that it
must be very simple to use the library for simple tasks, and that
a clean design will not significantly favor one system over the other.

As far as "imposing a conversion", the users never see that -- they
just write code like "x = 4 * feet". (Granted the prototype
quantity library prints things out in SI units, but that's just
a default that I haven't bothered to extend, not an architectural
decision. SIunits actually does allow user-specification of
output units. Ideally this should probably be done with locales.)
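Here is a hypothetical sketch of user-chosen output units. Input is written as "4 * feet", and output divides by whatever unit constant the caller names. The internal unit appears only in the definitions of the constants. A locale-based scheme is not attempted.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: users multiply by a unit constant on input and divide
// by one on output, so the internal unit never appears in their code.
struct length {
    double internal;  // unspecified internal unit; user code should not read it
};

constexpr length operator*(double k, length u) { return {k * u.internal}; }
constexpr double operator/(length a, length b) { return a.internal / b.internal; }

// Unit constants: the only place where the internal unit shows up.
constexpr length meter{1.0};
constexpr length feet{0.3048};

// Output in whatever unit the caller chooses.
std::string to_string_in(length x, length unit, const std::string& unit_name) {
    std::ostringstream out;
    out << x / unit << ' ' << unit_name;
    return out.str();
}
```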

Some have claimed that doing conversions internally will introduce
error. Theoretically yes, but I fail to see how multiplying by a
float or a double will introduce an error anywhere near as
large as the one you introduced when you took the original
measurement by looking at a ruler really, really, really closely.
And if you actually do have instruments capable of 32-bit
resolution, then you should be using 64-bit calculations anyway.
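A small illustrative check puts numbers on this. It converts 11.2 ft (3.41376 m) to metres in float and compares the result against the same conversion in double. The relative error is on the order of float's epsilon, about 1.2e-7. By contrast, reading that length to the nearest millimetre is good only to about 3e-4.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative sketch: relative error of a float feet-to-metres conversion,
// measured against the same conversion done in double.
double relative_error_of_float_conversion(double feet) {
    const double reference = feet * 0.3048;
    const float converted = static_cast<float>(feet) * 0.3048f;
    return std::fabs(converted - reference) / reference;
}
```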

My inclination here is to always use the same unit internally
(simpler code, simpler file compatibility, etc.). My inclination
is also not to tell anyone what that unit is, since they shouldn't
have to know, and if they don't then they can't complain about not
liking it. :-)

TOPIC: I don't need joules and farads and newtons, I want to define
my own units.

I very much appreciate the input about units that don't fit into
the SI-framework, e.g. for financial calculations. I'm convinced
that there is a real need here for user-defined quantity-types,
and I like the suggestion to layer a dimensions library on top
of a lower-level foundation quantity library.
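Here is a hypothetical sketch of such a foundation layer, with no SI dimensions at all. Each quantity type is just a tag. A financial user could define usd and eur (illustrative names) and have mixed-currency addition rejected at compile time. The detection helper uses C++17 std::void_t.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical sketch: values tagged with a user-chosen quantity type.
// Only values with the same tag can be added.
template <class Tag>
struct amount {
    double value;
};

template <class Tag>
constexpr amount<Tag> operator+(amount<Tag> a, amount<Tag> b) { return {a.value + b.value}; }

template <class Tag>
constexpr amount<Tag> operator*(double k, amount<Tag> a) { return {k * a.value}; }

struct usd {};
struct eur {};

// Detects whether A + B is a valid expression.
template <class A, class B, class = void>
struct can_add : std::false_type {};

template <class A, class B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

static_assert(can_add<amount<usd>, amount<usd>>::value, "same quantity type");
static_assert(!can_add<amount<usd>, amount<eur>>::value, "different quantity types");
```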

In fact I now suspect there are at least three layers:
the definition of the basic framework, the definition of specific
systems of quantity-types and values (one instance of which is the
dimension-checking system for physical quantities), and the
definition of the actual units used (which are merely arbitrary
quantity-value constants). One catch here is that when we define
the dimension layer, we have to use templates to define a rather
large set of types and operations without specifying them all
individually.
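The following hypothetical C++11 sketch shows one way the three layers and the "many types, few definitions" requirement might fit together:

- The framework layer defines *, / and + once for every exponent list.
- The dimension layer just names particular exponent lists.
- The units layer is a set of constants.

```cpp
#include <type_traits>

// Layer 1 (framework): a value tagged with a compile-time list of exponents.
template <int... Exps>
struct quantity {
    double value;
};

template <int... A, int... B>
constexpr quantity<(A + B)...> operator*(quantity<A...> x, quantity<B...> y) {
    return {x.value * y.value};
}

template <int... A, int... B>
constexpr quantity<(A - B)...> operator/(quantity<A...> x, quantity<B...> y) {
    return {x.value / y.value};
}

template <int... E>
constexpr quantity<E...> operator+(quantity<E...> x, quantity<E...> y) {
    return {x.value + y.value};
}

template <int... E>
constexpr quantity<E...> operator*(double k, quantity<E...> x) {
    return {k * x.value};
}

// Layer 2 (one system of quantity types): physical dimensions (length, mass, time).
using length_q = quantity<1, 0, 0>;
using time_q   = quantity<0, 0, 1>;
using speed_q  = quantity<1, 0, -1>;

// Layer 3 (units): arbitrary constant values of those types.
constexpr length_q metre{1.0};
constexpr length_q kilometre{1000.0};
constexpr time_q second{1.0};
constexpr time_q hour{3600.0};
```

A financial system could reuse the same framework layer with its own exponent list, without ever seeing length, mass, or time.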

I don't quite see how this all fits together yet, especially the
kilogram-of-chocolate/kilogram-of-concrete distinctions, but I'll
keep trying to find some kind of "formalism of measurement"
(it sounds too grandiose, and is probably incorrect, to call it an
algebra or calculus) that will provide some guidance for
structuring such libraries. And as if this all weren't hard
enough, methinks times and dates should fit in here somewhere
as well, along with the "affine vs. linear" distinctions
mentioned much earlier.

There must be some prior art in the literature for the class-per-unit
with explicitly defined conversions approach. I'll see what I can dig
up, but if anyone knows of anything relevant please point me in
the right direction.

TOPIC: Some want their units to convert automatically, others don't.

This one is easy: The library should do automatic conversions only
where appropriate.

I'm not just being cute here; with a layered approach the layer
that defines the quantity types can also define which automatic
conversions are appropriate. If you don't like the way it does
it, just swap in a different implementation of that layer.
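Here is a hypothetical C++14 sketch of a layer that owns the conversion policy. A trait (implicit_ok, an assumed name) lists which conversions may happen implicitly. Every other conversion is still available but must be written explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

struct metres {};
struct feet {};

template <class U> struct metres_per;  // size of each unit in metres
template <> struct metres_per<metres> { static constexpr double value = 1.0; };
template <> struct metres_per<feet>   { static constexpr double value = 0.3048; };

// Policy of this particular layer: feet -> metres is implicit, nothing else is.
template <class From, class To> struct implicit_ok : std::false_type {};
template <> struct implicit_ok<feet, metres> : std::true_type {};

template <class Unit>
struct length {
    double value;

    constexpr explicit length(double v) : value(v) {}

    // Implicit where the policy allows it ...
    template <class Other, std::enable_if_t<implicit_ok<Other, Unit>::value, int> = 0>
    constexpr length(length<Other> o)
        : value(o.value * metres_per<Other>::value / metres_per<Unit>::value) {}

    // ... explicit everywhere else.
    template <class Other, std::enable_if_t<!implicit_ok<Other, Unit>::value, int> = 0>
    constexpr explicit length(length<Other> o)
        : value(o.value * metres_per<Other>::value / metres_per<Unit>::value) {}
};
```

Swapping in a different layer means changing the implicit_ok specializations, not the client code.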

TOPIC: "Zero cost" features.

A common statement is that feature X can be provided at no cost.
No runtime cost, perhaps, but let's not forget the cost of
compilation, and especially the cost of intellectual complexity.
If adding a feature makes the whole library harder to understand
and bogs down the compilation of large programs that use it, I
consider that a significant cost and a serious argument against
inclusion.

The one major exception (that I can see) is when you can layer the
library so that most users never see or use the complexity they
don't need. Then the cost of a feature really can approach zero.
I'd really like to figure out how to use this layering approach
to factor "multiple models of the universe" out of the base
library so that only the high-energy physicists have to pay for it.
It also looks very appealing to factor the whole dimension-checking
business out so the financial analysts don't have to pay for that.

TOPIC: Miscellaneous comments about "base vs. derived" units and
"pounds".

SI defines specific meanings for the terms "base" and "derived" units
which don't necessarily match the intuitive meanings. There is also
something we might call a "basic" unit, i.e. the thing for which
the underlying representation uses a "1.0". Any quantity library
(especially the documentation) must be careful to use standard terms
correctly, but its design need not be constrained by them, and in
particular the "basic" unit need not be the SI "base" unit.

The whole discussion about pounds all boils down to the fact that you
have to be careful about names. Even my little prototype quantity library
defines "pound_avdp", "pound_troy", "slug" (all units of mass), and
"pound_force" (i.e. weight). To avoid confusion there is quite
deliberately no unit named just "pound". You could easily add
"pound_german", "pound_Casseler", "pound_Berliner", "pound_Weiner",
etc. without introducing any ambiguity; they are all just arbitrary
constants.
Of course you /cannot/ convert pound_avdp to pound_force without
explicitly specifying the strength of the gravitational field.
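Here is a hypothetical sketch of the naming approach. Each pound is a separately named constant, and mass and force are distinct types. The only route from pound_avdp to a force goes through an explicitly named acceleration. The avoirdupois pound, the troy pound, and standard gravity are exact by definition; the slug and the pound-force are derived from them.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: there is no way to mix pound_avdp and pound_force
// without naming an acceleration.
struct mass { double kg; };
struct acceleration { double m_per_s2; };
struct force { double newtons; };

constexpr mass operator*(double k, mass m) { return {k * m.kg}; }
constexpr force operator*(mass m, acceleration a) { return {m.kg * a.m_per_s2}; }

constexpr mass pound_avdp{0.45359237};             // exact by definition
constexpr mass pound_troy{0.3732417216};           // 12 troy ounces, exact
constexpr acceleration standard_gravity{9.80665};  // exact by definition
// One pound-force is the weight of one avoirdupois pound under standard gravity.
constexpr force pound_force{0.45359237 * 9.80665};
// One slug is the mass that one pound-force accelerates at 1 ft/s^2.
constexpr mass slug{0.45359237 * 9.80665 / 0.3048};
```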

For an even better example of name proliferation look at the eight
different "tons" in the prototype headers.

TOPIC: NASA and Mars probes.

My understanding is that the error was mismatched units, not dimensions,
but any dimensions library would almost certainly have included checking
of units as well. It's tempting to make a snide remark here about the
USGovt (NASA) not talking to the USGovt (Fermi), but in all fairness I
must admit that at the time the software on the space probe was
developed, SIunits probably wasn't released yet.

TOPIC: In some kinds of numerical analysis it's useful to have
floating-point numbers of roughly unit magnitude.

This is completely new to me. Can it be explained in 1000 words
or less? Is it important to support directly in a library, or
would it be better for algorithms that need it to adjust their
inputs?

TOPIC: Some "constants" are experimental numbers with finite precision,
rather than defined values; they may have to be updated when experiments
get better.

There seem to be two choices: the values are either compiled-in or
read in at run-time. I find it rather interesting that a similar
issue arises with timezone calculations, where the rules that determine
daylight vs. standard time in any given locality may change
unpredictably at the whim of the legislature. In the last system I
worked on we decided this information should be read at runtime (also
the way *nix does it), but that leads to some configuration pain
because your executable is no longer as portable. It was tolerable in
our context, but I'd hate to have to drag around a special file just
to be able to use the speed of light or mass of an electron.
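Here is a hypothetical sketch of a middle road. Defined constants are compiled in. Measured constants carry compiled-in defaults that an optional "name value" text source can override, so no special file is required just to run. The defaults are the CODATA 2018 recommended values and serve only as examples.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

constexpr double speed_of_light = 299792458.0;  // m/s, exact by definition since 1983

struct measured_constants {
    double electron_mass = 9.1093837015e-31;      // kg, CODATA 2018 value
    double gravitational_constant = 6.67430e-11;  // m^3 kg^-1 s^-2, CODATA 2018 value
};

// Reads "name value" pairs; each recognized name overrides the compiled-in default.
measured_constants load_overrides(std::istream& in, measured_constants c = {}) {
    std::string name;
    double value = 0.0;
    while (in >> name >> value) {
        if (name == "electron_mass") c.electron_mass = value;
        else if (name == "gravitational_constant") c.gravitational_constant = value;
        // Unknown names are silently ignored in this sketch.
    }
    return c;
}
```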

--
- Michael Kenniston
  mkenniston_at_[hidden]
  msk_at_[hidden]    http://www.xnet.com/~msk/

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk