Boost logo

Geometry :

Subject: Re: [geometry] Changes in WKB (read) behaviour proposal
From: Mateusz Loskot (mateusz_at_[hidden])
Date: 2013-11-07 05:35:03

On 4 November 2013 20:31, Mats Taraldsvik <mats.taraldsvik_at_[hidden]> wrote:
> On 11/04/2013 03:26 PM, Mateusz Loskot wrote:
>> On 2 November 2013 19:27, Mats Taraldsvik <mats.taraldsvik_at_[hidden]> wrote:
>>> Therefore, I added a (more or less trivial) run-time check that compares
>>> the
>>> actual byte size and the expected byte size of the points, linestring,
>>> polygon etc. which will give a run-time error during parsing, if the WKB
>>> does not have the expected data/dimensions/size.
>> Right, that's the mixed run-time and compile-time test we need to
>> consider indeed.
>> I see there are a couple of possibilities:
>> 0) If type of WKB input and (geometry) type of target container are
>> different, then throw error and give up.
>> 1) WKB input w/ geometry of higher dimension than target container
>> a) throw error
>> b) flatten (or clamp) the coordinates. I think we can assume we always
>> have at least 2D target container with X and Y. So, we can simply ignore
>> Z and M dimensions in this case.
>> 2) WKB input w/ geometry of lower dimension than target container
>> a) throw error
>> b) ignore the higher dimension coordinates in the target geometry,
>> or set with zero. I think, this option makes little sense.
>> My opinion is that the 0) gives most consistent behaviour, but from
>> usability
>> point, we may prefer combination of 1)b) and 2)a)
>> What do you think?
> While testing the linestring, I tried assigning a 3D WKB Linestring to a 2D
> (boost::geometry) Linestring. The WKB Linestring (1 2 3, 4 5 6, 7 8 9)
> became the (boost::geometry) Linestring (1 2, 3 4, 5 6), which was a
> surprise.

Yes, that's what is happening now.
The reason being, if you look in parsing_assigner<P, I, N>, the values
of I (current dimension index) and N (total dimensions count) are
based on the output geometry.
In your example of parsing WKB into 2D geometry, I=[0,1] and N=2.
IOW, current implementation of WKB parser assumes the WKB stream
carries coordinates with the same number of dimensions as the output
geometry object.

So, for if we consider stream (think this is in binary form)
input = [1 2 3 4 5 6 7 8 9] // WKB 3D input of 3 points

Then, parser that is reading into 2D geometry will 'grab' it as follows:
1) take coordinates in twos (because for 2D N counts up to 2 dimensions)
2) take only 3 points, as the input WKB stream of that LineString says 3 points
So, from 1) and 2), we will get output with *first* 3 pairs of
numbers: [1 2 3 4 5 6]

That is considered to be a limitation of the current implementation
and it should be possible to work around it by changing the WKB's
parsing_assigner to consume dimensions *not* based on dimension of
output geometry, but the geometry type indicated by WKB.
So, from static compile-time looping based on I and N template parameters,
it should be changed to run-time iteration based on what WKB indicates.
Obviously, WKB-based dimension will have to be checked against output
geometry dimension, and here is where some of my 0) or 1) or 2) rules
can be applied.

I hope it makes sense :)

I'd also like to hear others opinion, especially Barend's as we may
need to reimplement the WKT parser in similar way.

> If this is equal to the 1b) alternative you propose, I'm not sure
> if I'd want that behaviour.

No, it is not what I meant. As I explained above, what you currently
see is a limitation/deficiency of the current WKB parser.

> Because of this, I implemented the check on byte size/ number of points,
> which I described above, which corresponds to your 1a and 2a check.

Right, and there is technically nothing wrong with your approach, I suppose.
My point was that we should try to rely on the metadata we get encoded
within the WKB stream - they will free us from calculating anything.

> I used to get the WKB, so it should be what PostGIS would
> output. It appears that the geometry type from PostGIS on a 3D Point is
> Point (1), not PointZ (1001), but perhaps that is EWKB, and not WKB.

You should use ST_AsBinary to get the real OGC WKB.
Without it, PostGIS will output internal byte stream format, as ST_AsEWKB does.

AFAIR, the confusing type numbers of this is a legacy of WKB 2.5D before
OGC adopted 3D geometries.

If you feed PostGIS 2.0+ with this SQL:

    ST_AsText('POINT(1 2 3)'::geometry)
    encode(ST_AsBinary('POINT(1 2 3)'::geometry), 'hex')

you will get both WKT and WKB in OGC form:

POINT Z (1 2 3)

and (I use pipe '|' to separate fields):


where e9030000 value is POINT Z type code: 1001

> However, as this might be easy to mix up, I'd vote for the length check I
> have implemented.

IMHO, we should try to keep the WKB parser conforming to the OGC spec.
Later, based on WKB work, we can make a separate parser for EWKB format
which would be similar but different.

So, also based on what I explained earlier in thsi post,
my advise would be to try to make use of the metadata in WKB.

But please don't feel like I'm trying to reject your efforts, which
n.b. I greatly appreciate. Please, treat my words as technical
discussion/advise only.
If your current implementation does the job for common cases, let's
accept it into Boost.Geometry. Then, we can (I can) make another
iteration improving it to rely on WKB metadata and to ensure it's OGC
WKB compliant.

> What about the 0) check you propose, which should always be an error, and 1a
> and 2a as an additional check that can be turned off with a template
> parameter?

I guess that's possible, similar to clockwise and closed configurable
properties of ring/polygon.

> What error should be thrown?

Something similar to read_wkt_exception, i.e. create read_wkb_exception

>>> The parser should support 3D as long as both the expected point
>>> dimensions
>>> and the wkb point dimensions match. I have tested this for 3D/Z. M is not
>>> supported by WKT yet, apparently, so I could not test this easily.
>> M is there too, see OGC specification in version 1.2.0.
> I meant boost::geometry's WKT reading. At least, I got an error when I tried
> POINTM(1 2 3), but that might be the wrong syntax.

OGC requires space between POINT and M, but I'm not sure BG's WKT can
handle that, would need to check the code.

Best regards,

Mateusz  Loskot,

Geometry list run by mateusz at