Boost logo

Boost :

Subject: Re: [boost] Formal Review: Boost.Polygon
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-09-14 12:37:02


Phil Endecott wrote:
> The first step was to add the "magic" required by the library to
> support my rectangle
> type. The examples don't seem to include a custom rectangle, and the
> decomposition
> of a rectangle into a pair of intervals rather than the more common
> pair of points or
> point and size, made this a bit tricky. More serious were the
> following problems:
>
> ** The library does not seem to use any sort of explicit concept
> checking; if types do
> not satisfy their concepts random errors from deep inside the library
> are emitted. The
> verbosity and incomprehensibility of errors from complex C++ code is,
> in my opinion,
> the greatest single failing of the language today. In the face of
> these ridiculous
> messages, some users will simply abandon trying to use the library (or
> the whole
> language, and go back to C) or limit themselves to only what can be
> done by copying
> examples. Fixing this needs effort from the language designers,
> compiler writers, and
> libraries. I am not certain that concept checking can totally cure
> this problem,
> but it can only help.

I believe that I should supply concept checking tests (similar to the examples in the docs) for all concepts so that users who try to map their object types to my concepts can check whether it is correct. I don't know how much I can do to make error messages more friendly as a library author. My intention was to provide a library that demonstrates what a C++ concepts based library would look and feel like. I've been successful in that. I think that the verbose error messages issue is something that is a problem many boost libraries have.
 
> ** The library spews out enormous quantities of warnings (gcc 4.1.2),
> I
> think mainly
> related to unused parameters. This has to be fixed (and I don't
> believe fixing it is
> hard).

I thought I had fixed it already. I made a few changes, compiled my unit tests without warnings with gcc 4.1.2, and checked those changes in during the review.

> ** The library failed to work with std::pair being used as the
> interval
> type, despite
> this being mentioned as a possibility in the documentation. The
> problem seems to be
> that the std namespace is brought into ADL and std::set is found
> rather
> than the
> library's own set function. I have not investigated how hard this
> would be to fix;
> instead I wrote my own pair template.

I was aware that ADL poses a problem for the functions I gave exceedingly common names to, not least the operators. I can guard useage of set() with (set)() throughout the library and add tests that std::pair can be adapted to both point and interval.

> Having completed the "magic", I started to adapt my marker drawing
> code. My first
> attempt was a minimally-invasive version that would check if each
> marker covered any
> new area and not plot it otherwise:
>
> if (!contains(markers,b)) {
> markers |= b;
> plot(b);
> }
>
> Unfortunately this failed since contains() does not seem to be defined
> for a
> polygon_90_set and a rectangle. This surprised me somewhat. A
> valuable addition to
> the documentation would be a matrix indicating which operations are
> supported between
> which types. Somebody will need every combination of operations and
> types, and it
> would be useful to see quickly which are provided, which are planned
> for a future
> version of the library, and which cannot be implemented due to some
> restriction of the
> library design.

It is possible to make a list of all free functions and all the concepts they accept. It was suggested by other reviewer. I could make an index section to the documentation.

> I could have tried to implement contains(polygon_90_set,rectangle) in
> terms of e.g.
> intersection but I was concerned that that would have much worse
> complexity than an
> optimised contains() algorithm could have. Which raises another
> issue:
>
> ** Algorithms should indicate their complexity guarantees, and (where
> different) their
> expected complexity for common inputs.

I can add this to the documentation.

> So I moved on to a more invasive modification where I accumulate all
> of
> the rectangles
> and use get_rectangles() at the end to extract a minimal set of areas
> to plot:
>
> for (...) {
> markers |= b;
> }
> get_rectangles(rects,markers);
> foreach(rect in rects) {
> plot(rect);
> }
>
> This works, and reduces the number of rectangles that are plotted from
> about 6,000 to
> about 900. Unfortunately get_rectangles() takes about 100 ms to run,
> which is
> slightly longer than plotting the redundant markers would have taken,
> so overall the
> frame rate is not improved.
>
> In that code, I was surprised to see that get_rectangles takes an
> output container by
> reference. In this case, I have no need to allocate memory and store
> these rectangles
> at all. I would have prefered some way to get the rectangles as they
> were generated
> and pass them directly to my plot function, e.g.
>
> for_each_rectangle(markers, &plot);
>
> An output iterator would be a more conventional (but in this case more
> verbose) way to
> achieve this.
>
> ** Unless there is some good reason for using output reference
> parameters, algorithms
> should use output iterators.

I expected this feedback to come up during the review. I can add the output iterator based inerfaces.

> I was also a little confused by the difference - if any - between
> "markers |= b" and
> "markers.insert(b)". I believe that I know what the former does, but
> I
> worry that
> unioning-in each rectangle in turn is inefficient and that it would be
> better to
> construct a set of rectangles and self-union in one go. Perhaps this
> is what insert()
> does - yet there is no self_union function, and in practice I seem to
> get the same
> results. Maybe get_rectangles() has done the self_union. Some
> clarification of this
> in the documentation would be useful.

markers |= b is implemented by a call to insert, get_rectangles does indeed to the self union. I have discussed this optimization during the review and it is described in the documentation as well.

> One issue that I noted while reading in the polygons is that the
> provided polygon_data
> type is immutable: to read from a file, I need either an exotic
> iterator that can be
> passed to its constructor, or a temporary. I'm unsure why
> polygon_data
> can't be more
> like a mutable std::container so that I can simply read points from a
> file and
> push_back() in a loop. Trying to use a std::vector<point_data>
> required more traits
> specialisation that I expected; can't this be pre-defined?
>
> ** The polygon_data type should be mutable (i.e. support things like
> push_back()), or a
> mutable version should be provided, or std::vector/list<point_data>
> should work with
> minimal traits specialisation.

The example custom_polygon.cpp shows the traits for a list<CPoint>. I could make all containers of points models of polygons_concept by default, but I wouldn't know which polygon concept the user prefers.

> I also found that polygon_data is specialised by the coordinate type,
> not the point
> type, so anyone using their own point type will have to provide their
> own polygon type.
> Is there some reason why I can't have polygon_data<my_point>? Again,
> making it easier
> to use std::container<my_point> would help.

Point type is a trait.

> I also found little mention of winding direction in the documentation.
> I haven't
> checked whether the data I have winds consistently; what does
> unknown_winding actually
> do?

Forces a runtime evaluation of the winding direction when that information is needed.

> As I guessed, contains(polygon,point) is very slow if done naively:
> the library's polygon concept (and the provided default
> implementation) have no way
> to determine
> whether a point lies within a polygon in better than O(N) time.
> Testing first with the
> bounding box makes it O(1) in the common case. So I tried to write a
> polygon type that
> would do this, something like this (but not exactly this!):
>
> struct my_polygon: polygon_data {
> rectangle bbox;
> template <typename ITER>
> my_polygon(ITER begin, ITER end):
> polygon_data(begin,end),
> bbox(extents(*this))
> {}
> };
>
> bool contains(const my_polygon& poly, point_t pt) {
> return contains(poly.bbox,pt) && contains(poly,pt);
> }
>
> Hmm, well that looks OK until the very last "contains"... no doubt
> someone will
> immediately spot the "right" way to do this. I ended up making
> contains() a member of
> my_polygon, and this worked reasonably well. Note that the extents()
> function does not
> work as written above: it takes a reference to a rectangle as an out
> parameter. center() does something similar. This seems wrong to me;
> returning a
> point or
> rectangle would be more conventional I think.

Since it is just as expensive to compute the bounding box of a polygon I left the bounding box optimization to the user.

> ** extents() and center() should return their results, rather than
> using an output
> reference.
 
Out parameters and equational reasoning have been discussed elsewhere in the review.

> I have a fixed point type that I use for GPS coordinates (32 bits is
> good for lat/lng
> GPS positions, but 24 bits isn't) and using it with this library
> worked
> well. The
> Coordinate Concept page doesn't say much about the type requirements
> ("expected to be
> integral") and spelling out in more detail what is needed would be
> useful. In practice
> I must have already implemented everything that it needed.

Interesting. I provide default traits for coordinate concept, if these are sufficient for you all you need to do is specialize geometry_concept for your coordinate type.
  
> However, I can't see any way to perform these boolean operations other
> than using the
> operators, and I think this is an ommission. If I were writing code
> that I expected
> someone else unfamiliar with the library to understand I would like
> the important calls to be conspicuous.
>
> ** The library should provide functions to perform the boolean
> operations, as well as
> operators, for the benefit of those who want to make their code more
> verbose.

The polygon_set_data objects have operator member functions that can be used instead, if you manually convert the inputs to polygon_set_data objects in user code.
 
> I dislike the use of operators with integers for the grow/shrink
> operations. My
> concern is that this meaning is non-obvious, and since these operators
> are also
> defined for boolean operations, there is potential for error:
>
> polygon_set a, b, i;
> ...
> for (int i = 0; i<10; ++i) {
> ...
> a -= b+i; // oops, forgot i is now an int
> }

I agree there is a potential for error. It is a judgement call and preference in this is very subjective.
 
> Isotropic Style
> ---------------
>
> I recall Luke describing his isotropic style on the list a long time
> ago, and the
> documentation would benefit from an explanation along those lines.
> One "FAQ" that it
> would need to address is the question of the performance of a run-time
> selection of X
> vs. Y compared to a compile-time selection in more conventional code.

I use the isotropic style extensively in the implementation details of algorithms in the library. If the user doesn't like it they generally have alternative APIs available to them that do not require it, and can ignore its existence except when setting up the point and rectangle traits.

> Barend has also raised questions about the worse-case performance of
> Luke's
> line intersection algorithm, and it seems that Luke accepts that it is
> not optimal.
> Achieving the best possible algorithmic complexity is something that I
> would consider
> very important or essential for a Boost library. At the very least,
> Luke needs to
> document the complexity of his current algorithm and explain why he
> cannot do any
> better. The views of experts should be sought if this cannot be
> resolved by the
> review manager.

If Barend, or anyone, implements a fast, robust line intersection algorithm and licenses it under the boost license I can easily replace my own line intersection algorithm with the faster one. Implementing fast, robust line intersection is extremely difficult and time consuming. Our internal performance requirements are met by what I implemented.
 
> In view of all this, I suggest that this library should be rejected
> for
> now. This
> will tell Barend that he still has an opportunity to present his
> library for review,
> and that it will be considered on a level playing field. If Barend's
> library is
> reviewed and found to be more complete, more performant and at least
> as
> usable as this
> library, then it should be accepted. On the other hand, if Barend's
> library is found
> to be deficient in some way (or is not submitted for review), then
> Luke
> will have an
> opportunity to resubmit an updated version of this library, which I
> anticipate should
> be accepted.

I don't believe it is necessary to position the decision as an either or GGL/Boost.Polygon proposition.

Thanks,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk