I have an rtree where each item in the tree has a group ( Colors perhaps:
red, green, blue, orange...)
I want to do a nearest neighbor search where the search returns the nearest
k-neighbors with at most
one object from each group.
For example, if we have
Object A: Red, distance 1
Object B: Red, distance 2
Object C: Blue, distance 3
Object D: Green, distance 4
and I do a nearest neighbor at most one from each group search with k = 2
I get object A and C. B is skipped since A is closer of the same group.
It looks like I could get this done by creating a specialization of
distance_query_result class and have the distance_query_result::store method
to check the groups/distances of already stored items and act accordingly.
Then create a specialization of distance query and maybe query_dispatch to
get distance_query to use the new distance_query_result.
Is there a better way to go about this?
No, there is no "official" (i.e. exposed in the interface) way of
doing it besides calling query N times for each group/category
(using nearest() && satisfies()) and after that merging the
results. So your approach is more optimal than that however it
relies on the internals of the R-tree so have in mind that they may
be changed some time in the future.
I'd suggest keeping N stacks (neighbors) one for each group and
track them separately.
You also have to somehow pass the number of groups into the
Furthermore you have to implement distance_query_result::greatest_comparable_distance()
and return the greatest distance from all groups but only if all
heaps contain at least one element each.
But I'm wondering, are the elements from all groups distributed
more or less in uniform fashion? I'm asking because otherwise the
query may have to traverse big chunks of the tree before finding
at least one element from each group. Consider the situation where
there is a clear separation between groups and groups are
spatially far from each other. In this case my guess is that it'd
be better to have separate R-tree for each group, perform knn
query on all of them and merge the results.
An alternative to the solution above (for groups spatially far
from each other) would be to have one R-tree but to store in the
nodes the information about the groups of objects sotred in
underlying nodes and custom query visitor taking into account
currently stored neaighbors and groups covered by analysed nodes.
So the idea is that the pruning of nodes would be done both using
the distance to the furthest neighbor already found as well as
groups of neighbors. This would not require to merge the result at
the end since the query woudl handle everything. But it would
require even more changes to the R-tree including introducing new
nodes (containing info about underlying groups), insertion visitor
(gathering the info about the groups from children and passing it
to the newly created node) and packing algorithm (if needed, the
same as insertion).