|
Boost : |
From: Reid Sweatman (drunkardswalk_at_[hidden])
Date: 2003-08-04 05:43:36
> -----Original Message-----
> From: boost-bounces_at_[hidden]
> [mailto:boost-bounces_at_[hidden]]On Behalf Of conrado_at_[hidden]
> Sent: Monday, July 28, 2003 11:17 AM
> To: boost_at_[hidden]
> Subject: [boost] Re: Boost Digest, Vol 445, Issue 1
>
>
> Dear Boost friends,
>
> > was unwittingly working with apples and oranges. Correct me if
> I'm wrong,
> > but it really feels like the two containers are a scalar data
> type (ignoring
> > the attached associative data structures) and a metric (or "distance"
> > function). "Metric" is already a much more generic term than
> "distance" or
> > even "spatial," although related. I don't have a problem with the word
> > "metric," (although I suspect it's a term more familiar to physicists or
> > topologists than to programmers), but I'm damned if I can think
> of a good,
> > simple term for the first data type. "Distance" is certainly one
> > possibility, although it carries a lot of denotative freight that most
> > programmers wouldn't be aware of; it has common meaning, which wouldn't
> > cover all the uses of this data object, and yet is really a
> pretty abstract
> > topological concept (again, as is "metric"). Anything
> mensurable can have a
> > metric associated with it; what then, do you call the
> "distance" the metric
> > is measuring? Especially if it isn't a "distance" in the conventional
> > sense?
> >
>
> The discussion about terminology is really useful, because we already
> thought a lot about it and we know there is room to improve it.
>
> When we talk about spatial containers we think of containers
> whose elements have
> keys which belong to the Cartesian product
>
> D = D1 x D2 x D3 x ... x DK
>
> for some fixed value K (the "dimension") and for totally ordered
> domains D1, D2, ..., DK. The typical example of course is
> D1 = D2 = ... = DK = R (the real numbers). Obviously you can easily
> endow this space with a metric (= distance), but it is not clear how
> to do it for other domains D. For instance, D = Employee with
> D1 = name, D2 = salary, and D3 = age. For spatial containers, the
> typical queries are orthogonal range queries and partial matches
> (example: list all employees that are 23 years old, how many employees
> earn more than 50,000 USD whose name starts with "A").
>
> Indeed the name "spatial" is not very fortunate (as Reid points
> out, in the mathematical
> sense a space has a norm (=metric) by definition).
> The term "spatial containers" is perhaps confusing (and definitively
> not rigorous!), but we decided on that name
> because of Samet's well known textbook "The Design and Analysis
> of Spatial Data Structures".
>
> We use the term "metric container" when the elements belong to a type
> T for which we have a distance function (well, you pass it via a
> template parameter which is function object). The function must
> satisfy the usual properties of distances, in particular, the
> triangular inequality. This inequality is systematically exploited to
> speed up nearest neighbor searches: given an object x, find the object
> closest to x in the container according to the given distance.
> The terms "metric" and "(dis)similarity measure" have been often used
> in the literature about algorithms and data structures for the nearest
> neighbor and proximity search, and we finally decided on that name.
>
> Conclusion: Spatial containers support range queries and its elements
> are K-tuples. Metric containers support nearest neighbor queries and
> there is a measure of similarity or distance between elements. Of
> course there are case for which both types of queries make sense (so
> spatial containers can support nearest neighbor searches if a metric is
> supplied; but the metric is not required at construction time, whereas
> the metric is a must when you create a metric container).
>
> Last but not least I must confess that "Spatial and Metric" made for
> a nice acronym: SMTL ;-)
Well, I'll just tag a final comment on here, no need for reply. I hadn't
been aware of the Samet work; my approach was (obviously) via the topology
of metrizable spaces, which is a pretty abstract animal. The cross-product
definition of "spatial containers" you cite is, of course, exactly the
definition from that branch of mathematics. The two data objects in your
library, though are two aspects of that definition: the "metric" is the
Cartesian relation itself, while the "spatial container" represents objects
in the range of the metric.
So while I still feel the names are ill-chosen, and could be made more
general and correct...given that they're well-accepted names in the branch
of CS you're working in, and that they result in such a nice FLA <g>, I
concede the issue and leave the opposition in possession of the field. And
I apologize for my penchant for obscurantist metaphors <g>. Gee, ain't I
magnanimous?
I'm guessing that the Samet text you mentioned is an application of metric
topology, likely restricted to the more common engineering cases, to
algorithmic design. Another book to add to the old "list of reading I can't
afford to buy and don't have time to read" <g>. Sounds interesting, though.
Post Scriptum...
Hmmmn. Sorry for this, but I feel the urge for a Parthian Shot, since it
just occurred to me. Is it possible that some of the confusion of types is
due to the fact that there are really two branches of CS combined here? By
this, I refer to distance (in the abstract) measure and database search. It
seems another of those function/object dichotomies that's at the heart of
OOP to begin with. Queries are the main point here, but there's a sense in
which any query is irredeemably defined in terms of some metric, no matter
how abstract. When you come down to it, the whole notion of measure is
really a set-theoretic concept (well, you can define all of math from set
theory as a base, and it's one common approach, but then, you can also
define all of mathematics from other bases, such as probability theory).
Okay, that's probably stretching everyone's tolerance (damn, but I want to
make the obvious pun there... <g>) for OT material, so I have hereby ceased
and desisted. Like I said, no need for reply, unless anything I said was
actually interesting or useful.
Reid Sweatman
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk