Subject: [ggl] rtree query expression
From: Simonson, Lucanus J (lucanus.j.simonson)
Date: 2011-08-22 22:57:20
Adam Wulkiewicz wrote:
> Adam Wulkiewicz wrote:
>> kNN in the next email.
> kNN (k nearest neighbours) is a specific example of a type of query
> I'll be calling sorting query. Consider queries:
> Get k nearest/furthest to some point objects
> Get k objects with smallest/greatest area
> Get k objects with smallest/greatest perimeter
> Some characteristics of this type of query
> 1. It make sense only if some number of elements is going to be
> returned. If k is infinite all elements are returned so there is no
> need to perform this type of query.
> 2. Number of returned elements k should describe whole query because
> we can't return e.g. 5 elements and 10 elements at the same time.
> 3. It doesn't make sense to perform some number of sorting queries at
> the same time e.g. object won't be closest to point1 and at the same
> time to point2 or won't be closest to some point and at the same time
> probably won't have smallest area, this may be taken as a secondary
> criterion (e.g. if distances are equal) but still, just the closest
> elements will be returned.
> 4. Still along with this sorting query, spatial predicates may be
> checked at the same time e.g. "find k objects closest to some point
> but further than 10 and objects should be within some geometry".
> 5. It doesn't make sense to return some number of elements using
> not-sorting query e.g. "get 5 elements within some geometry" because
> we don't know which one should be returned.
> It would be the best if grammar of the expression will take those
> points into account. It's obvious that sorting queries should be in
> some way separated from non-sorting query. Moreover it should be
> natural to ommit one of them.
Before we create an EDSL for this lets see if a simple design will work.
It seems to me that sorting queries are fundamentally different from query criteria, no need to mix them.
Why not have a regular query operation that is a fuction named query() that accepts query criteria such as inside/outside of query boxes. Then for each type of sorting query you have an alternative function call like nearest_neighbors() that accepts arguments specific to nearest neighbor query and then the normal arguments that query would accept. For predicates why do we need to combine them with operators? Why not just compute the query region and have a flag (or enum) whether to accept geometries that touch the query region versus are completely contained within it. For example: "find k objects closest to some point but further than 10 and objects should be within some geometry" would become:
neighbor_result = nearest_neighbors(query_tree, some_point, k, 10, some_geometry, contained_completely);
Why not use the boolean operations on polygons to generate the single geometry that is your query region. Use the bounding box of the query_treen NOT whatever keepout region to invert the logic of the query region. Only if you have a complex case such as give me objects completely inside box1 AND touching or completely inside box2 would you need something more, since different semantics apply to different areas of the query region. That is a pretty strange query, not at all the common case. I would find support for it surprising.
Geometry list run by mateusz at loskot.net