Hi,

Andrew Hundt wrote:
This also brings up another important question about R-tree, what are the dimensional limitations of this implementation, can simple geometries such as a point, box, sphere or ellipse be handled in 3 dimensions, 4 dimensions, or even 100 dimensions?

Yes, as far as your compiler can compile such deep recursive tamplates.

 
I'm not sure how performance would compare, and I haven't tested it. It may vary a lot based on how sparse/dense the data is and the number of entries. I would hypothetically expect the octomap to have an advantage as density increases since with fixed voxels repeated entries at the same location would affect confidence levels rather than increasing the total amount of data stored. Perhaps there would also be a similar way to elide data with R-trees as well and thus mitigate that possible source of performance differences.

Yes, as I wrote above, the R-tree could be used to always store Boxes of a grid of the lowest level of the quadtree - voxels. Since the R-tree is an index and its interface doesn't allow the modification of the stored values, voxels geometrical data (Box) should be stored with some ID, and voxels data should be stored in some container outside the R-tree. Or some custom updating visitor could be used for this, but this is not officially supported.

I see, this is one of the interesting details I was hoping to find out. So essentially this should be used as an index for fast lookups, rather than a container for the real data.

In my possible applications one of the challenges is that everything is (soft) real time, so new data is constantly arriving and the geometries are not static. For instance, a car, person, or arm will often move between time steps.


AFAIU this is different use-case than the one mentioned above, where samples are stored?

What are the limitations here of a spatial index? 
  • Could it eventually be used to store mutable data, or would that pose a fundamental problem?

It could but is not supported. Otherwise the user could modify the part of the stored Value keeping the geometrical data which would brake the index. This is similar thing to the std::set<> storing const Value and std::map<> storing const Key. And it's a spatial index, not container.

    • Perhaps there is more than one level at which data is "mutable" such as data/geometries that change location vs fixed data/geometries that change a stored value?

Initially I thought about implementing a Container or two Containers with the interface similar to std::set<> and std::map<> but decided to go into the direction of the Index capable to store arbitrary types.

We could of course return to the idea of Containers and think about what should be done in order to implement some based on the R-tree code. Should they be separate templates or could the rtree be modified? E.g. what if the rtree for passed Value = std::pair<Box, Id> always store std::pair<Box const, Id>. Would it brake users' code? Probably yes. Should we rather implement geometry::container::rtree_map<Indexable, Data> and always store std::pair<Indexable const, Data>? Etc.

    • What are the implications of a custom updating visitor, given that it is unsupported?
      • Would it definitely break things or probably work?

Custom, user-defined visitors would definietly work but might stop compiling with some future release of Boost.Geometry if something were changed internally.

      • Would it be extremely tricky or relatively straightforward?

It depends what you'd like to do. The visitors are quite simple to write, they're similar to Boost.Variant static visitors. E.g. take a look at https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/utilities/print.hpp line 132. It's a visitor recursively printing the contents of the R-tree.

      • Could a custom updating visitor update the locations of data or just the value?

Definietly it could not just update the stored value. It might naiively expand all of the ancestor nodes boxes but this would result in great overlap of the nodes. So instead this moving_visitor should search for the old value top-down remove it from the node (handle underflow locally), search for the place to insert the new one somewhere near the old node bottom-up and down again (handle overflow locally) and then handle underflow and overflow globally bottom-up. To implement it the inserting and removing visitors of all balancing algorithms should probably be searched for parts which could be reused, parts of the algorithms run per level. This in fact could be a nice addon to the R-tree and could be a part of the official interface.

It's very probable that in most of the cases the Value would be moved inside the same node or somewhere very close. But still we'd be forced to search for the old value which would be O(logN), however in the case of the R-tree it could be quite ok since the R-tree might have small number of levels. E.g. for Max = 16 to represent 1M Values one'd need 5 levels.

It would be possible to implement a container similar to std::map<> providing non-invalidated Iterators to Values. They could be passed to the Rtree to indicate a position of the removed Value (this would be O(1)). And they had to be not invalidated by the remove() because otherwise it would be not possible to remove more Values than one. To do this we'd be forced to add some pointers to parents to the R-tree nodes and to allocate the memory for each ValueNode separately but this could be done. On the other hand, going bottom-up we'd be forced to search the parent node for the child we came from (O(Max)). So we'd get the non-invalidated Iterators and (probably) faster remove but the queries would (probably) be slower.

The above are only my predictions since all of those things should really be implemented and the performance measured to be sure.

    • Could shared pointers to objects be stored?

Yes, but:
1. it could be not safe since one could modify the object pointed by the shared_ptr (and modify the Indexable data or in other words a Key),
2. it would probably be slower than storing the Geometry directly in the R-tree because dereferencing shared_ptrs<> would ruin cacheing of Values.
So I'd advice storing std::ptr<Box, ID> or std::ptr<Box, shared_ptr<>>, but one can use whatever he like.

  • Would it make sense to use the index to access or cross reference dense data such as pixels of a depth image and points from a laser range sensor, or is it best for sparse data sets like querying higher level objects such as a "ground region" or "people on the ground" after processing raw data?

It would make sense in both situations to use some kind of spatial indexing or partitioning technique. In some use-cases the R-tree could be better than in the others.

  • What would be the best way to index or track objects/data that move? 
    • Examples include an object representing a person walking around or joints moving on an arm.
    • Perhaps the index would need to be reconstructed, or objects removed and reinserted at each step to incorporate the new data?

Yes, currently the old value should be removed and the new one inserted.

    • Would one be more efficient than the other due to bulk loading?

Bulk loading would definietly be faster than inserting each value separately. I tested its speed and for Points it's as fast as the kd-tree created in the contigeous memory using a recursive algorithm.
The difference in the time of rebuilding the whole tree and reinserting values would depend on the number of values. If it were some big number it would probably be faster to create the container from scratch.

I know the answer to some of these questions may vary dramatically from one specific problem/implementation to the next, but it is helpful to at least get an idea of what is known, unknown, and what is highly variable. 

A reasonable approach would be to pick some container/library which is fairy simple to use and then check the performance, profile to find bootlenecks and if they were caused by the container, test some other one or think about an other algorithm.

Regards,
Adam