Boost logo

Geometry :

Subject: Re: [geometry] Changes in WKB (read) behaviour proposal
From: Mats Taraldsvik (mats.taraldsvik_at_[hidden])
Date: 2013-11-16 15:26:54

Hi Mateusz,

On 11/07/2013 11:35 AM, Mateusz Loskot wrote:
> On 4 November 2013 20:31, Mats Taraldsvik <mats.taraldsvik_at_[hidden]> wrote:
>> On 11/04/2013 03:26 PM, Mateusz Loskot wrote:
>>> On 2 November 2013 19:27, Mats Taraldsvik <mats.taraldsvik_at_[hidden]> wrote:
>>>> Therefore, I added a (more or less trivial) run-time check that compares
>>>> the
>>>> actual byte size and the expected byte size of the points, linestring,
>>>> polygon etc. which will give a run-time error during parsing, if the WKB
>>>> does not have the expected data/dimensions/size.
>>> Right, that's the mixed run-time and compile-time test we need to
>>> consider indeed.
>>> I see there are a couple of possibilities:
>>> 0) If type of WKB input and (geometry) type of target container are
>>> different, then throw error and give up.
>>> 1) WKB input w/ geometry of higher dimension than target container
>>> a) throw error
>>> b) flatten (or clamp) the coordinates. I think we can assume we always
>>> have at least 2D target container with X and Y. So, we can simply ignore
>>> Z and M dimensions in this case.
>>> 2) WKB input w/ geometry of lower dimension than target container
>>> a) throw error
>>> b) ignore the higher dimension coordinates in the target geometry,
>>> or set with zero. I think, this option makes little sense.
>>> My opinion is that the 0) gives most consistent behaviour, but from
>>> usability
>>> point, we may prefer combination of 1)b) and 2)a)
>>> What do you think?
>> While testing the linestring, I tried assigning a 3D WKB Linestring to a 2D
>> (boost::geometry) Linestring. The WKB Linestring (1 2 3, 4 5 6, 7 8 9)
>> became the (boost::geometry) Linestring (1 2, 3 4, 5 6), which was a
>> surprise.
> Yes, that's what is happening now.
> The reason being, if you look in parsing_assigner<P, I, N>, the values
> of I (current dimension index) and N (total dimensions count) are
> based on the output geometry.
> In your example of parsing WKB into 2D geometry, I=[0,1] and N=2.
> IOW, current implementation of WKB parser assumes the WKB stream
> carries coordinates with the same number of dimensions as the output
> geometry object.
> So, for if we consider stream (think this is in binary form)
> input = [1 2 3 4 5 6 7 8 9] // WKB 3D input of 3 points
> Then, parser that is reading into 2D geometry will 'grab' it as follows:
> 1) take coordinates in twos (because for 2D N counts up to 2 dimensions)
> 2) take only 3 points, as the input WKB stream of that LineString says 3 points
> So, from 1) and 2), we will get output with *first* 3 pairs of
> numbers: [1 2 3 4 5 6]
> That is considered to be a limitation of the current implementation
> and it should be possible to work around it by changing the WKB's
> parsing_assigner to consume dimensions *not* based on dimension of
> output geometry, but the geometry type indicated by WKB.
> So, from static compile-time looping based on I and N template parameters,
> it should be changed to run-time iteration based on what WKB indicates.
> Obviously, WKB-based dimension will have to be checked against output
> geometry dimension, and here is where some of my 0) or 1) or 2) rules
> can be applied.
> I hope it makes sense :)

Yes, thank you for the explanation :)

> I'd also like to hear others opinion, especially Barend's as we may
> need to reimplement the WKT parser in similar way.
>> If this is equal to the 1b) alternative you propose, I'm not sure
>> if I'd want that behaviour.
> No, it is not what I meant. As I explained above, what you currently
> see is a limitation/deficiency of the current WKB parser.
>> Because of this, I implemented the check on byte size/ number of points,
>> which I described above, which corresponds to your 1a and 2a check.
> Right, and there is technically nothing wrong with your approach, I suppose.
> My point was that we should try to rely on the metadata we get encoded
> within the WKB stream - they will free us from calculating anything.

I agree, and I have changed the checks to a simple assert that checks if
the bytestream is long enough to contain the number of points it should
contain, according to num_points in the WKB.

>> I used to get the WKB, so it should be what PostGIS would
>> output. It appears that the geometry type from PostGIS on a 3D Point is
>> Point (1), not PointZ (1001), but perhaps that is EWKB, and not WKB.
> You should use ST_AsBinary to get the real OGC WKB.
> Without it, PostGIS will output internal byte stream format, as ST_AsEWKB does.
> AFAIR, the confusing type numbers of this is a legacy of WKB 2.5D before
> OGC adopted 3D geometries.
> If you feed PostGIS 2.0+ with this SQL:
> ST_AsText('POINT(1 2 3)'::geometry)
> encode(ST_AsBinary('POINT(1 2 3)'::geometry), 'hex')
> you will get both WKT and WKB in OGC form:
> POINT Z (1 2 3)
> and (I use pipe '|' to separate fields):
> 01|e9030000|000000000000f03f|0000000000000040|0000000000000840
> where e9030000 value is POINT Z type code: 1001

This was my fault, the online PostGIS instance I used had version 1.5,
so I installed PostGIS locally. It outputs the correct WKB now :)

>> However, as this might be easy to mix up, I'd vote for the length check I
>> have implemented.
> IMHO, we should try to keep the WKB parser conforming to the OGC spec.
> Later, based on WKB work, we can make a separate parser for EWKB format
> which would be similar but different.
> So, also based on what I explained earlier in thsi post,
> my advise would be to try to make use of the metadata in WKB.
> But please don't feel like I'm trying to reject your efforts, which
> n.b. I greatly appreciate. Please, treat my words as technical
> discussion/advise only.
> If your current implementation does the job for common cases, let's
> accept it into Boost.Geometry. Then, we can (I can) make another
> iteration improving it to rely on WKB metadata and to ensure it's OGC
> WKB compliant.

As stated earlier in my reply, I have removed the proposed checks, to
rely on static information instead. I guess it will be more efficient,
and we want an efficient WKB parser :)

I have implemented tests and support for MultiLinestring and
MultiPolygon as well, and updated with a patch formatted
according to your instructions.

The next step might be to handle Z, M and ZM geometries.

We need to add the new types in ogc.hpp. The geometry_type_parser needs
to change. I already removed the multipolygon-check, but the current
bitwise AND trick does not work -- as far as I can tell -- with pointz
(1000) etc.

Also, all the geometry_type-checks in *_parser needs to handle multiple
types (point, pointz, pointm, pointzm).

Perhaps, after this, we add the feature to gracefully handle e.g. a 2d 
linestring read from a 3d linestring WKB.
>> What about the 0) check you propose, which should always be an error, and 1a
>> and 2a as an additional check that can be turned off with a template
>> parameter?
> I guess that's possible, similar to clockwise and closed configurable
> properties of ring/polygon.
>> What error should be thrown?
> Something similar to read_wkt_exception, i.e. create read_wkb_exception
>>>> The parser should support 3D as long as both the expected point
>>>> dimensions
>>>> and the wkb point dimensions match. I have tested this for 3D/Z. M is not
>>>> supported by WKT yet, apparently, so I could not test this easily.
>>> M is there too, see OGC specification in version 1.2.0.
>> I meant boost::geometry's WKT reading. At least, I got an error when I tried
>> POINTM(1 2 3), but that might be the wrong syntax.
> OGC requires space between POINT and M, but  I'm not sure BG's WKT can
> handle that, would need to check the code.
> Best regards,

Geometry list run by mateusz at