Subject: [ggl] rtree query interface
From: Adam Wulkiewicz (adam.wulkiewicz)
Date: 2011-08-07 14:53:30
Barend Gehrels wrote:
>> t | within(g) - get values within some geometry
>> t | intersects(g) - get values intersecting some geometry
> Basically I like the clean syntax. These range adaptors, having the same
> name, should be in another namespace and if users hoist both namespaces
> with the using-clause, it will be ambiguous. So namespace will be
> necessary here (or in the normal function), making it a bit longer.
Of course you're right. I assumed that everything is in the
geometry::index namespace. So some algorithms from geometry corresponds
to the filters in geometry::index.
> What if we don't do this and just use the "filtered" adaptor defined by
> t | filtered(within(g))
It's ok for me. There also may be some standard filters implemented:
index::nearest_filtered. intersects_filtered is implemented already as
Btw note that geometry::index filters should be the first ones in the
sequence of filters aplied to the container.
t | some_not_index_filter() | index::filtered(...)
> The advantage is that AND, OR and NOT are much easier to handle (if
> people write their own predicate - but maybe Phoenix or Lambda can be of
> help here, didn't check this now).
Or Proto if it couldn't be done with Phoenix.
> However, the basic requirement is that queries like this are using the
> spatial index. So I don't know how the filtered would use that index. I
> assume that "within" would do that (did you already did some work
> here?). So we indeed need "something which is a Boost.Range adaptor,
> uses the spatial index, and is capable to do user-defined queries at
> We might consider adding a "spatially_filtered" adaptor which acts like
spatial_filter<...> is implemented as mentioned before (intersects).
It's just generated by
Currently the querry is performed in the spatial_filter<...>'s
constructor by calling rtree<...>::find(...) method.
>> Note that some filters may refer to the geometry which is passed or to
>> the values stored in the rtree. Someone may want to get values within
>> some geometry or to get values containing some geometry. He may just
>> use previously described filters and connect them (described further)
>> but then e.g. 2 operations must be done instead of 1. And for some
>> queries this couldn't be possible (with operator| only). So there
>> could be:
>> - additional filters: t | outside(g)
> Outside is actually not (always) the same as ! within, because if a
> geometry partly intersects another, it is ! within and also ! disjoint.
> Outside might be renamed to "disjoint" here.
Got it. I forgot about disjoint. Any other algorithms which could be used?
>> - filters generated by some operator: t | ~within(g) - confusing
>> - object passed before in the sequence: t | inv | within(g) - even
>> more confusing
>> - object passed with filter: t | inv(within(g))
>> In addition to this every filter can have it's negation generated by
>> !within(g) == not_within(g)
>> !intersects(g) == not_intersects(g)
>> Eventually there could be additional filters e.g. "get objects with 0
>> coordinate greather than some other object" coords<0, more>(g) etc. It
>> would be nice to have e.g. "get points on the right of the linestring"
> Basically I like the idea of ! within(g), it is very readable.
> But it is, if I'm right, different from Boost.Range adaptors which
> cannot be negated like this. The Boost.Range adaptor overloads operator|
> to get its effect, you additionally need to overload operator! for a
> range adaptor as well. I didn't try but don't know if this is possible
> (probably it is).
> Besides that we have to be careful what is exactly the negation, see
> comment above.
In the case of using ranges, just another type of range would be
generated by operator! e.g. index::ranges::not_within. And the meaning
should be exactly as the meaning of !geometry::within(). Just like in
the case of using the loop, checking all elements and returning values
meeting !within(some_geometry) requirement.
>> 2. K nearest neighbour filters
>> nearest(g) - get nearest value to the geometry g
>> nearest(g, 5) - get nearest 5 values to the geometry g
> What will the first line return? All values in order of distance?
One value nearest to the geometry. If we have container of points and
the geometry is a point it would be the closest point. Internally there
would be algorithms from geometry used so if there was
comparable_distance()/within()/intersects()/etc. implemented for
Indexable and Geometry the query would compile, otherwise it wouldn't.
>> Folowing filters describes geometrical relationship but they'd
>> probably be used with NN filter.
>> further(g, comp_dist1)
>> closer(g, comp_dist2)
>> range(g, comp_dist1, comp_dist2)
>> In addition there could be k furthest neighbor filter:
>> furthest(g) == !nearest(g)
> The last line I don't understand... What if we have only 4 elements and
> we want the nearest(g, 5). It will return 4 values. The furthest will
> also return 4 values, so it is not the negation.
operator! was a suggestion. If we want to return sorted ranges for
nearest() and furthest() then nearest() will generate range sorted from
closest object and furthest() from furthest.
>> 3. Filters connection
>> 3.1. operator|
>> "get values within geometry g1 AND not intersecting geometry g2"
>> t | within(g1) | !intersects(g2)
> Concatenation is possible in Boost.Range adaptors and I like it. So it
> will work automatically (besides the ! syntax). It might not be the most
> efficient in all cases because they are filtered twice, so the case you
> describe (g1 AND not g2) is implemented differently (though the effect
> is the same). You mention that also below. And how do both make use of
> the spatial index?
> Therefore, the filtered predicate might be better, being able to do this
> in one step.
As you noticed, in order to work efficiently, filters from g::index
shouldn't be processed exactly like filters in Boost.Range -
transparently. They should be used only with the rtree object or other
filter from g::index. operators | should generate one compound object
(or an expression if you like) which would be passed to the
rtree::find() e.g. when the begin() method of this object (which is also
a range) is called.
>> "get 5 nearest to g2 values overlapping g1" which means
> I don't understand this example, because something overlapping something
> else does have a distance of 0.0, so nearest is more or less
> undertermined here. With the centroid below it is more clear.
I assumed that nearest just uses comparable_distance() so it will work
if it is implemented for geometries. Moreover I assumed that
comparable_distance() returns closest distance between e.g. geometries'
perimeters or perimeter and some point etc.
If we have a river(linestring) in a country (polygon) and rtree
containing some indexables(e.g. POI) on the entire continent this query
would be something like: "find 5 indexables nearest to the river inside
Btw distances between boxes and other geometries could be useful in the
>> "get values overlapping g1 AND 5 nearest to g2"
>> t | overlaps(g1) | nearest(g2, 5)
>> Using this operator filters may be processed sequentially from left to
>> right. Like in streams there may be manipulators used e.g. "get
>> objects which centroids are nearest to the geometry g1" could look
>> like this:
>> t | centroid | nearest(g1, 5)
> Do I understand here that the centroid is a range adaptor, which returns
> the centroids from all geometries present in the spatial index? In that
> case I agree with this syntax.
It would be a part of the expression but this syntax probably won't be used.
>> instead of e.g.
>> t | nearest<centroid_tag>(g1, 5) or
>> t | nearest(g1, 5, centroid) or
>> t | nearest(g1, 5, some_strategy)
>> t | centroid::nearest(g1, 5)
>> looks nice.
> Also possible but might be confusing - it mixes namespace with
> behaviour. In this case I would prefer centroid_nearest (which is IMO
> looking good)
>> But, queries with logical alternatives can't be described this way
>> e.g. "get objects within g1 OR intersecting g2". This kind of query
>> performed by one tree traversal could be faster than two separate
>> queries. Moreover one probably will have to check if there are
>> repeated values in both results.
> So here the spatially_filtered adaptor will be of great help.
The expression may be passed to some function or operator
t | index::filtered(Expr) | some_filter()
t | some_filter() | index::filtered(Expr) // Error
t | (Expr) | some_filter()
t | some_filter() | (Expr) // Error
t[Expr] | some_filter()
index::filter(t, Expr) | some_filter()
index::find(t, Expr) | some_filter()
>> Btw pipe operator| is used here as a conjunction and this is c++
>> binary alternative.
>> 3.2. expressions
>> By using more operators we lose sequential nature of the querry but we
>> get more powerful expressions which may be passed to the query e.g.
>> "get objects within g1 AND not intersecting g2"
>> t[within(g1) & !intersects(g2)]
>> "get objects within g1 OR intersecting g2"
>> t[within(g1) | intersects(g2)]
>> "get points closest to the linestring in the areas intersecting
>> geometry g1 or within geometry g2"
>> t[(intersects(g1) | within(g2)) & nearest(ls, 10)]
>> "get boxes which centroids are closest to the g3 in the areas
>> intersecting geometry g1 or within geometry g2" e.g.
>> t[(intersects(g1) | within(g2)) & centroid(nearest(g3, 10))] or
>> t[(intersects(g1) | within(g2)) & centroid::nearest(g3, 10)]
> Yes, it is looking nice, and I would suggest to verify if we can use
> Boost.Range filtered in combination with Phoenix to get this effect.
> Further the & should be &&, and the | should be ||, IMO.
> So something like t | filtered(within(_1) && ! intersects(_1))
Have you had in mind something like this
t | filtered(within(_1, g1) && ! intersects(_1, g2)) ?
It would be nice because one might write within(_1, g1) or within(g1,
_1). There would be no additional names.
Nodes and values must be handled differently while traversing. There
would be different checks performed for each but it should be doable.
>> But I don't know if all future expresions would be easy to implement
>> e.g. expressions with some number of KNN filters alternatives would
>> probably gather results in appropriate number of containers because
>> objects must be sorted by distance for each KNN filter. Then results
>> would probably be connected (checks for repeating objects). But this
>> is probably possible.
>> Or there could be only one KNN search per query but this isn't nice.
> So, in general, I like the idea of querying spatial indexes. I would
> suggest to see how Phoenix helps us by defining spatial query
> combinations, while making use of the index as well. It is very
> interesting and it would be great if it would all work clear and
> efficient like that.
Do you think that some simple queries are needed too? E.g.
t | intersects_filtered(g)
t | within_filtered(g)
t | nearest_filtered(p, 5)
What other types of queries besides geometry-related and knn would you
like to have?
Geometry list run by mateusz at loskot.net