# Geometry :

Subject: [ggl] rtree query expression
Date: 2011-08-22 22:07:35

...
>
> kNN in the next email.

kNN (k nearest neighbours) is a specific example of a type of query I'll
be calling sorting query. Consider queries:

Get k nearest/furthest to some point objects
Get k objects with smallest/greatest area
Get k objects with smallest/greatest perimeter

Etc.

Some characteristics of this type of query

1. It make sense only if some number of elements is going to be
returned. If k is infinite all elements are returned so there is no need
to perform this type of query.

2. Number of returned elements k should describe whole query because we
can't return e.g. 5 elements and 10 elements at the same time.

3. It doesn't make sense to perform some number of sorting queries at
the same time e.g. object won't be closest to point1 and at the same
time to point2 or won't be closest to some point and at the same time
probably won't have smallest area, this may be taken as a secondary
criterion (e.g. if distances are equal) but still, just the closest
elements will be returned.

4. Still along with this sorting query, spatial predicates may be
checked at the same time e.g. "find k objects closest to some point but
further than 10 and objects should be within some geometry".

5. It doesn't make sense to return some number of elements using
not-sorting query e.g. "get 5 elements within some geometry" because we
don't know which one should be returned.

It would be the best if grammar of the expression will take those points
into account. It's obvious that sorting queries should be in some way
separated from non-sorting query. Moreover it should be natural to ommit
one of them.

Instead of using some number of different words in our expression to
describe different queries (nearest, furthest, with_smallest_area etc.)
we could have e.g. two functions smallest() and greatest() and use other
which return values. e.g.

smallest(comparable_distance(_,P))
greatest(distance(_,P))
smallest(comparable_distance(return_centroid(_),P))
smallest(area(_))
greatest(perimeter(_))

Lets consider queries:

a) "Return objects inside geometry G1 and not intersecting G2"
b) "Return 10 objects closest to point P"
c) "Return 10 objects which are inside geometry G1 and not intersecting
G2 and are closest to the point P"

Example possibilities:

1. Wrap the whole query inside smallest/greatest function:

a) within(_, G1) && !intersects(_, G2) // ok
b) smallest(distance(_, P), 10) // quite ok
c) smallest(distance(_, P), 10, within(_, G1) && !intersects(_, G2))

Not intuitive

2. Use wrapping function

a) same as above
b) same as above
c) query(
smallest(distance(_, P), 10)
, within(_, G1) && !intersects(_, G2)
)
OR
query(
within(_, G1) && !intersects(_, G2)
, smallest(distance(_, P), 10)
)

Not intuitive either, why not use && ?

b) smallest(distance(_, P)) | 10
OR
10 << smallest(distance(_, P))

With second form meaning "closest objects returned in number of 10".

Still not intuitive because query(,) is used.

4. Because && has small priority and there isn't many operators with
lower one left we may change logical operators in spatial query part to
bitwise and comma to logical and.

a) within(_, G1) & !intersects(_, G2)
b) 10 << smallest(distance(_, P))
c) 10 << smallest(distance(_, P)) && within(_, G1) & !intersects(_, G2)
OR
within(_, G1) & !intersects(_, G2) && 10 << smallest(distance(_, P))

But what if we allow the user for using more expressive language i nthe
spatial query? Replace all logical operators by bitwise in the spatial
query part? Except operator! ?

within(_, G1) & !intersects(_, G2) & !(something & something_else | foo)
| bar

Again not intuitive.

5. Force some sequence of &&

c) 10 << smallest(distance(_, P))
&& ( within(_, G1) && !intersects(_, G2) )
OR just
10 << smallest(distance(_, P))
&& within(_, G1) && !intersects(_, G2)

And use allways this sequence of query factors:
sorting_query
OR
spatial_query
OR
sorting_query && spatial_query

This isn't intuitive either because && should be homogenious and
commutative.

6. Don't force a sequence. Just check if there is only one sorting query
defined

c) smallest(distance(_, P), 10) && within(_, G1) && !intersects(_, G2)
OR
within(_, G1) && smallest(distance(_, P), 10) && !intersects(_, G2)
ETC.

6. Not commutative operator, bitwise shift operator may have different
meaning with streams and we can use similar operator either.

c) 10 <<= smallest(distance(_,P))
<<= within(_, G1) && !intersects(_, G2)

It would be good if the operator between queries was processed left to
right but it's ok I guess, the expression must be transformed anyway.

7. operator << but unfortunately with ()

c) 10 << smallest(distance(_,P))
<< ( within(_, G1) && !intersects(_, G2) )

7. Other?

Feel free to propose different expressions.

Thoughts?

Regards,