Subject: [ggl] rtree query interface
From: Barend Gehrels (barend)
Date: 2011-08-01 01:55:43
Thanks for your proposal.
On 30-7-2011 13:45, Adam Wulkiewicz wrote:
> 1. Geometrical relationship filters
> Some filters may be taken directly from ggl algorithms:
> intersects(g) or intersected_by or intersecting
> overlaps(g) or overlapped_by or overlapping
> t | within(g) - get values within some geometry
> t | intersects(g) - get values intersecting some geometry
Basically I like the clean syntax. These range adaptors, having the same
name, should be in another namespace and if users hoist both namespaces
with the using-clause, it will be ambiguous. So namespace will be
necessary here (or in the normal function), making it a bit longer.
What if we don't do this and just use the "filtered" adaptor defined by
t | filtered(within(g))
The advantage is that AND, OR and NOT are much easier to handle (if
people write their own predicate - but maybe Phoenix or Lambda can be of
help here, didn't check this now).
However, the basic requirement is that queries like this are using the
spatial index. So I don't know how the filtered would use that index. I
assume that "within" would do that (did you already did some work
here?). So we indeed need "something which is a Boost.Range adaptor,
uses the spatial index, and is capable to do user-defined queries at
We might consider adding a "spatially_filtered" adaptor which acts like
> Note that some filters may refer to the geometry which is passed or to
> the values stored in the rtree. Someone may want to get values within
> some geometry or to get values containing some geometry. He may just
> use previously described filters and connect them (described further)
> but then e.g. 2 operations must be done instead of 1. And for some
> queries this couldn't be possible (with operator| only). So there
> could be:
> - additional filters: t | outside(g)
Outside is actually not (always) the same as ! within, because if a
geometry partly intersects another, it is ! within and also ! disjoint.
Outside might be renamed to "disjoint" here.
> - filters generated by some operator: t | ~within(g) - confusing
> - object passed before in the sequence: t | inv | within(g) - even
> more confusing
> - object passed with filter: t | inv(within(g))
> In addition to this every filter can have it's negation generated by
> !within(g) == not_within(g)
> !intersects(g) == not_intersects(g)
> Eventually there could be additional filters e.g. "get objects with 0
> coordinate greather than some other object" coords<0, more>(g) etc. It
> would be nice to have e.g. "get points on the right of the linestring"
Basically I like the idea of ! within(g), it is very readable.
But it is, if I'm right, different from Boost.Range adaptors which
cannot be negated like this. The Boost.Range adaptor overloads operator|
to get its effect, you additionally need to overload operator! for a
range adaptor as well. I didn't try but don't know if this is possible
(probably it is).
Besides that we have to be careful what is exactly the negation, see
> 2. K nearest neighbour filters
> nearest(g) - get nearest value to the geometry g
> nearest(g, 5) - get nearest 5 values to the geometry g
What will the first line return? All values in order of distance?
> Folowing filters describes geometrical relationship but they'd
> probably be used with NN filter.
> further(g, comp_dist1)
> closer(g, comp_dist2)
> range(g, comp_dist1, comp_dist2)
> In addition there could be k furthest neighbor filter:
> furthest(g) == !nearest(g)
The last line I don't understand... What if we have only 4 elements and
we want the nearest(g, 5). It will return 4 values. The furthest will
also return 4 values, so it is not the negation.
> 3. Filters connection
> 3.1. operator|
> "get values within geometry g1 AND not intersecting geometry g2"
> t | within(g1) | !intersects(g2)
Concatenation is possible in Boost.Range adaptors and I like it. So it
will work automatically (besides the ! syntax). It might not be the most
efficient in all cases because they are filtered twice, so the case you
describe (g1 AND not g2) is implemented differently (though the effect
is the same). You mention that also below. And how do both make use of
the spatial index?
Therefore, the filtered predicate might be better, being able to do this
in one step.
> "get 5 nearest to g2 values overlapping g1" which means
I don't understand this example, because something overlapping something
else does have a distance of 0.0, so nearest is more or less
undertermined here. With the centroid below it is more clear.
> "get values overlapping g1 AND 5 nearest to g2"
> t | overlaps(g1) | nearest(g2, 5)
> Using this operator filters may be processed sequentially from left to
> right. Like in streams there may be manipulators used e.g. "get
> objects which centroids are nearest to the geometry g1" could look
> like this:
> t | centroid | nearest(g1, 5)
Do I understand here that the centroid is a range adaptor, which returns
the centroids from all geometries present in the spatial index? In that
case I agree with this syntax.
> instead of e.g.
> t | nearest<centroid_tag>(g1, 5) or
> t | nearest(g1, 5, centroid) or
> t | nearest(g1, 5, some_strategy)
> t | centroid::nearest(g1, 5)
> looks nice.
Also possible but might be confusing - it mixes namespace with
behaviour. In this case I would prefer centroid_nearest (which is IMO
> But, queries with logical alternatives can't be described this way
> e.g. "get objects within g1 OR intersecting g2". This kind of query
> performed by one tree traversal could be faster than two separate
> queries. Moreover one probably will have to check if there are
> repeated values in both results.
So here the spatially_filtered adaptor will be of great help.
> Btw pipe operator| is used here as a conjunction and this is c++
> binary alternative.
> 3.2. expressions
> By using more operators we lose sequential nature of the querry but we
> get more powerful expressions which may be passed to the query e.g.
> "get objects within g1 AND not intersecting g2"
> t[within(g1) & !intersects(g2)]
> "get objects within g1 OR intersecting g2"
> t[within(g1) | intersects(g2)]
> "get points closest to the linestring in the areas intersecting
> geometry g1 or within geometry g2"
> t[(intersects(g1) | within(g2)) & nearest(ls, 10)]
> "get boxes which centroids are closest to the g3 in the areas
> intersecting geometry g1 or within geometry g2" e.g.
> t[(intersects(g1) | within(g2)) & centroid(nearest(g3, 10))] or
> t[(intersects(g1) | within(g2)) & centroid::nearest(g3, 10)]
Yes, it is looking nice, and I would suggest to verify if we can use
Boost.Range filtered in combination with Phoenix to get this effect.
Further the & should be &&, and the | should be ||, IMO.
So something like t | filtered(within(_1) && ! intersects(_1))
> But I don't know if all future expresions would be easy to implement
> e.g. expressions with some number of KNN filters alternatives would
> probably gather results in appropriate number of containers because
> objects must be sorted by distance for each KNN filter. Then results
> would probably be connected (checks for repeating objects). But this
> is probably possible.
> Or there could be only one KNN search per query but this isn't nice.
So, in general, I like the idea of querying spatial indexes. I would
suggest to see how Phoenix helps us by defining spatial query
combinations, while making use of the index as well. It is very
interesting and it would be great if it would all work clear and
efficient like that.
Geometry list run by mateusz at loskot.net