Boost logo

Boost :

From: conrado_at_[hidden]
Date: 2003-07-28 12:16:59


Dear Boost friends,

> was unwittingly working with apples and oranges. Correct me if I'm wrong,
> but it really feels like the two containers are a scalar data type (ignoring
> the attached associative data structures) and a metric (or "distance"
> function). "Metric" is already a much more generic term than "distance" or
> even "spatial," although related. I don't have a problem with the word
> "metric," (although I suspect it's a term more familiar to physicists or
> topologists than to programmers), but I'm damned if I can think of a good,
> simple term for the first data type. "Distance" is certainly one
> possibility, although it carries a lot of denotative freight that most
> programmers wouldn't be aware of; it has common meaning, which wouldn't
> cover all the uses of this data object, and yet is really a pretty abstract
> topological concept (again, as is "metric"). Anything mensurable can have a
> metric associated with it; what then, do you call the "distance" the metric
> is measuring? Especially if it isn't a "distance" in the conventional
> sense?
>

The discussion about terminology is really useful, because we already
thought a lot about it and we know there is room to improve it.

When we talk about spatial containers we think of containers whose elements have
keys which belong to the Cartesian product

D = D1 x D2 x D3 x ... x DK

for some fixed value K (the "dimension") and for totally ordered
domains D1, D2, ..., DK. The typical example of course is
D1 = D2 = ... = DK = R (the real numbers). Obviously you can easily
endow this space with a metric (= distance), but it is not clear how
to do it for other domains D. For instance, D = Employee with
D1 = name, D2 = salary, and D3 = age. For spatial containers, the
typical queries are orthogonal range queries and partial matches
(example: list all employees that are 23 years old, how many employees
earn more than 50,000 USD whose name starts with "A").

Indeed the name "spatial" is not very fortunate (as Reid points out, in the mathematical
sense a space has a norm (=metric) by definition).
The term "spatial containers" is perhaps confusing (and definitively
not rigorous!), but we decided on that name
because of Samet's well known textbook "The Design and Analysis
of Spatial Data Structures".

We use the term "metric container" when the elements belong to a type
T for which we have a distance function (well, you pass it via a
template parameter which is function object). The function must
satisfy the usual properties of distances, in particular, the
triangular inequality. This inequality is systematically exploited to
speed up nearest neighbor searches: given an object x, find the object
closest to x in the container according to the given distance.
The terms "metric" and "(dis)similarity measure" have been often used
in the literature about algorithms and data structures for the nearest
neighbor and proximity search, and we finally decided on that name.

Conclusion: Spatial containers support range queries and its elements
are K-tuples. Metric containers support nearest neighbor queries and
there is a measure of similarity or distance between elements. Of
course there are case for which both types of queries make sense (so
spatial containers can support nearest neighbor searches if a metric is
supplied; but the metric is not required at construction time, whereas
the metric is a must when you create a metric container).

Last but not least I must confess that "Spatial and Metric" made for
a nice acronym: SMTL ;-)

Conrado


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk