Subject: Re: [boost] Geometry and spatial indexes, my opinion
From: Patrick Mihelich (patrick.mihelich_at_[hidden])
Date: 2008-10-08 17:33:41


On Wed, Oct 8, 2008 at 7:51 AM, Brandon Kohn <blkohn_at_[hidden]> wrote:

> Ok, I take your point. I wasn't suggesting that a library not be able to
> handle more than 3 dimensions. Rather I would suggest that the population of
> developers who have such requirements must be very small. Considering that
> generalization on dimension is not trivial for many algorithms, this leads
> me to the conclusion that the specific algorithms will decide the
> requirements on their inputs. This is why I chose a traits based interface
> for the geometry library I've been implementing. There's nothing to stop one
> from creating a multi-dimensional Cartesian/polar/parametric access traits
> interface for any algorithm which requires it. So I suppose I agree, the
> geometry library should not constrain on the number of dimensions. But I
> don't agree with full agnosticism (to mean agnosticism of coordinate frame
> as well.) There should be a way to differentiate points further to align
> with the concrete traits of the dimension they describe.

This sounds reasonable. Certainly there are many 2D/3D algorithms for which
there is little point in generalizing to higher dimensions.
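
To make that concrete (purely illustrative, the names here are made up and not from any existing library), the access traits could carry a coordinate-system tag alongside the coordinate type and dimension, so algorithms can be constrained or dispatched on the frame:

struct cartesian_tag {};
struct polar_tag {};

// Primary template, specialized per user point type.
template <typename Point>
struct point_traits;

// Example user type and its traits specialization.
struct my_point { double x, y; };

template <>
struct point_traits<my_point>
{
    typedef cartesian_tag coordinate_system;  // which frame the coordinates live in
    typedef double        coordinate_type;
    static const int      dimensions = 2;
};

A 2D Cartesian-only algorithm can then require coordinate_system to be cartesian_tag, while a dimension-agnostic algorithm simply ignores the tag.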

> Yes, I have taken a slightly different route than the other libs I have
> found in the vault. In my library the algorithms require inputs conform to a
> specific access traits concept. I have been using get_x, get_y, get_z as the
> general interface for cartesian (and would have used get_r, get_theta,
> get_phi for the others.. if I had gotten to that), but I think now that I'm
> fairly well persuaded to investigate a compile time indexed mechanism like
> boost::tuple. Has anyone done a performance test on using such a mechanism
> in practical settings? While I'm sure it's constant, how big is that
> constant in practice? Is it worth paying in cases where an algorithm
> requires a constrained dimensionality? I'm guessing nobody has really
> addressed these questions... so perhaps I'll take a gander and see if it's
> worth providing both interfaces in my access traits.

Cool, compile-time indexing is very useful. But I'm not sure I see the need
for tuples: I can't offhand think of a use case where the dimensions have
heterogeneous types, so wouldn't a simple array work?
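
Something along these lines is what I have in mind - just a sketch to illustrate, not a proposal for a particular interface:

#include <boost/static_assert.hpp>
#include <cstddef>

// An array-backed point: all coordinates share one type, so no tuple needed.
template <typename T, std::size_t N>
struct point
{
    T coords[N];
};

// Compile-time indexed access, analogous to boost::tuple's get<I>().
// The index is a template parameter, so this should compile down to a
// direct member access, the same as a hand-written get_x()/get_y().
template <std::size_t I, typename T, std::size_t N>
inline T const& get(point<T, N> const& p)
{
    BOOST_STATIC_ASSERT(I < N);   // reject out-of-range coordinate indices
    return p.coords[I];
}

An algorithm constrained to 2D can still write get<0>(p) and get<1>(p) directly, while a dimension-generic algorithm can recurse over the index, so in principle the generality should cost nothing at runtime.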

>> Back to Mathias' criticisms - although I realize it is an alpha, I was
>> similarly a bit disappointed when looking at the Spatial Indexes library.
>> It is good code (I hope that Federico will not be discouraged by
>> criticism!), but it is very limited in what it can do currently. It has no
>> nearest neighbor search, nor even a traversal interface that would allow
>> the user to implement it himself. It only provides 2D structures when it
>> is very useful to support spaces of arbitrary dimension. I would like for
>> the library to be generic in the dimension, not just the (2D) point type.
>> Perhaps a good place to start is the quadtree - I do not see why the
>> difference between a quadtree and an octree should be more than an int
>> template parameter.
>>
>> I'm admittedly opening up a bit of a can of worms here - nearest neighbor
>> search in high dimensions is a Hard Problem. However, there are
>> straightforward approximate nearest neighbor algorithms that are efficient
>> and plenty good enough for many applications.
>
> I can't say that I have seriously considered this. Just out of curiosity,
> how do kd-trees scale memory-wise on average in high dimension?

They scale quite well memory-wise - each node splits on a single dimension,
so the size of the internal data is independent of the number of dimensions.
Of course the size of the data points themselves is linear in the number of
dimensions.
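
To illustrate (schematic only, not any particular implementation), an internal node needs to hold roughly this much, regardless of d:

struct kd_node
{
    int      split_dim;    // which coordinate this node splits on
    double   split_value;  // threshold along that coordinate
    kd_node* left;         // subtree with coordinate <  split_value
    kd_node* right;        // subtree with coordinate >= split_value
};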

The real problem with using kd-trees in high dimensions (for nearest
neighbor search) is the "Curse of Dimensionality." Nearest neighbor search
using kd-trees consists of (very succinctly) a depth-first search to the
node containing the query point, followed by a back-tracking
branch-and-bound search to ensure that we find the data point closest to the
query point (as they need not be in the same node). For low dimension this
tends to be quite efficient. For d > 10 or so, we often end up looking at
most of the nodes in the tree during the back-tracking search... which can
be slower than a naive linear scan of the data points. This is why we resort
to approximate methods in high dimensions. One popular approach is to expand
candidate branches in order of their nearness to the query point and simply
cut off the back-tracking search after examining a fixed number of nodes. In
the vision community this algorithm is known as Best Bin First.
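
Roughly, the idea looks like this - a deliberately simplified sketch, not any library's actual code, with points stored only at leaves and a per-branch lower bound taken from the last splitting plane alone:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Node
{
    int split_dim;           // coordinate split on; < 0 marks a leaf
    double split_value;      // threshold along split_dim (internal nodes only)
    Node* left;              // child with coordinate <  split_value
    Node* right;             // child with coordinate >= split_value
    std::vector<double> pt;  // stored point (leaves only, in this sketch)
};

double sq_dist(std::vector<double> const& a, std::vector<double> const& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Returns the squared distance to the best (approximate) neighbor found.
double best_bin_first(Node* root, std::vector<double> const& q,
                      std::size_t max_leaf_checks)
{
    typedef std::pair<double, Node*> Entry;              // (lower bound, branch)
    std::priority_queue<Entry, std::vector<Entry>,
                        std::greater<Entry> > branches;  // min-heap on the bound
    branches.push(Entry(0.0, root));

    double best = std::numeric_limits<double>::max();
    std::size_t checks = 0;

    while (!branches.empty() && checks < max_leaf_checks)
    {
        double bound = branches.top().first;
        Node*  node  = branches.top().second;
        branches.pop();
        if (bound >= best)              // branch cannot contain a closer point
            continue;

        // Descend toward the leaf nearest the query, queueing each branch we
        // skip together with a lower bound on its distance to the query.
        while (node->split_dim >= 0)
        {
            double diff = q[node->split_dim] - node->split_value;
            Node* near_side = diff < 0.0 ? node->left  : node->right;
            Node* far_side  = diff < 0.0 ? node->right : node->left;
            branches.push(Entry(diff * diff, far_side));
            node = near_side;
        }

        best = std::min(best, sq_dist(q, node->pt));     // examine the leaf point
        ++checks;                                        // count toward the cutoff
    }
    return best;
}

Setting max_leaf_checks to the number of leaves recovers the exact back-tracking search, since the queued bounds are true lower bounds; shrinking it trades accuracy for speed, which is the whole point in high dimensions.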

Cheers,
Patrick

