Boost logo

Boost :

Subject: Re: [boost] Formal Review: Boost.Polygon starts today August 24, 2009
From: Thomas Klimpel (Thomas.Klimpel_at_[hidden])
Date: 2009-08-30 21:34:44


Here comes my review of Boost.Polygon.

> Please always state in your review,
> whether you think the library should be accepted as a Boost library!

Boost.Polygon should be accepted as a Boost library. However, it should not be released as a Boost library before the most important issues raised in the "What is your evaluation of the implementation?" section are addressed.

Boost.Polygon should be accepted because it is a very useful library with a good design and a sufficiently good and complete documentation.

> - What is your evaluation of the design?

The design is good.

The library is focused on a well defined problem space and states this explicitly: "The boost polygon library provides algorithms focused on manipulating planar polygon geometry data."
This is good from the design perspective of a Boost library, because it helps the potential user to quickly find out whether the library can help him with his problem at hand or not.

The library is build on the foundation of concepts for the fundamental objects of its problem space. The design of the concepts and the corresponding type system is well done. The concepts and the type system are adequate for the problem space addressed by the library, and are well explained in the documentation.
However, the description of the concepts is focused on their type system. The role of the Interval concept remained a bit unexplained for me. It seems to be the 1D version of the rectangle concept, and only the rectangle concept seems to make use of the interval concept. However, at least in theory, an interval would also be the 1D version of a polygon, so I wonder why there is no Interval Set concept?

However, even so the concepts are really nice, I doubt that it was a good idea to organize the documentation and the code exclusively with respect to the concepts. The often cited "Data Structures and Algorithms" paradigm requests that both data structures and algorithms should get some attention. But since the concepts only correspond to "Data Structures", it looks as if the "Algorithms" didn't get enough attention in the documentation, and the source files implementing the concepts got overloaded with algorithms (or algorithm dispatch code).

> - What is your evaluation of the implementation?

The fundamentals of the implementation are well done. However, some work is still needed before the library is ready for release as a Boost library.

There is a huge amount of source code in Boost.Polygon. I was only able to review a fraction of it, but the code I reviewed was well readable and to the point. It would be interesting to know whether anybody but the author ever reviewed all the code in detail. However, the available (active, i.e. not commented out) unit-tests don't cover the full functionality of the code and are currently not in a form suitable for automatic regression testing. (I say this because the unit-tests successfully compiled with gcc-3.4.4, gcc-4.3.2 and msvc-9, but three of the examples from the documentation failed to compile with msvc-9, even so they successfully compiled with gcc-3.4.4 and gcc-4.3.2.)

- The unit-tests should be split into smaller files and use a nonzero exit value to indicate failure. They should be placed in libs/polygon/test.
- Failing unit-tests should not be commented out just because they fail to compile.
- The examples from the documentation should be available in libs/polygon/example. All mistakes in them (like "namespace gtl = boost::polygon; namespace gtl {...}" or "assert(gtl::extents(rect, poly)); //get bounding box of poly") should be fixed.
- The examples should also use polygons that are not rectangles. It might even be a good idea to show what happens for self-overlapping or self-intersecting polygons.
- Some of the source files contain tabs. You have to the tabs to 4 spaces in your editor in order to conveniently view these source files.
- The source files include other files via ' #include "isotropy.hpp" ' instead of ' #include <boost/polygon/isotropy.hpp> ' as is otherwise common for boost libraries. However, I admit that it will work nevertheless.
- Is seems like the code still has some trouble with msvc-9. This should be fixed.

- It could make sense to slightly change the organization of the code into source files. For example, I see no reason why coordinate_traits is defined in isotropy.hpp.
- It could make sense to check that the source files include all required headers.

- It is true that msvc-9 spits out many warnings. However, it is also true that "warning C4127: conditional expression is constant"
for a code like
  template <typename value_type, typename geometry_type_1, typename geometry_type_2, int op_type>
  void execute_boolean_op(value_type& output_, const geometry_type_1& lvalue_, const geometry_type_2& rvalue_,
...
      if(op_type < 2)
is nonsense, because op_type is a template parameter. So I have the impression that the only way to get rid of the msvc warnings is by disabling the nonsense warnings with appropriate pragmas.

- I'm not sure whether I should trust to code when used with floating point coordinates. It's just that I think it wasn't used often with floating point coordinates, and the behavior of floating point is sufficiently different from the behavior of integer to lead to enough surprises to make untested code fail.

> - What is your evaluation of the documentation?

The documentation is sufficiently good and complete.

- The documentation is easy to read, and seems to be complete. The "complete" refers to the fact that all undocumented features are really not meant to be used by the user.
- There is a huge amount of duplication in the documentation. Even the typos are often duplicated, which indicates to me that the problem with the duplication is no accident. It's probably the reason why some libraries have a well readable (incomplete) user documentation without much duplication and a terse but complete reference documentation with arbitrary amounts of duplication.
- As already stated in the evaluation of the design, I think it could make sense to describe some of the algorithms independent from the concept on which they operate.
- The documentation could be more explicit about the runtime complexity of certain operations. What is the runtime complexity of contains(polygon, point) or extends(rectange, polygon), for example?
- The documentation could be more explicit about the behavior of operations that can fail. What will happen when deconvolve(interval1, interval2) has no valid solution, for example? Or what will center(interval) return if the center is not an integer number?
- The documentation could be more explicit about how self-overlapping or self-intersecting polygons are interpreted/resolved.
- What about signed and unsigned coordinate types? Are both signed and unsigned allowed? Are there any possible surprises?
- The documentation could be more explicit about covered and uncovered problem space, and potential future extensions.
- There are still many typos left in the documentation, but these should be easy to address.

- The references to the paper and the presentation are very welcome, and really add value to the documentation. But what is meant by "Performance of GTL could be improved by up to 10X with further work on the arbitrary-angle Booleans"???

The documentation has more pictures that the GPC documentation
http://www.cs.man.ac.uk/~toby/alan/software/gpc.html
but less pictures than the boolean documentation
http://boolean.klaasholwerda.nl/algdoc/top.html

> - What is your evaluation of the potential usefulness of the library?

The library is very useful. I don't think that the focus on integer coordinate types is a serious restriction. The benchmarks by the author also indicate that the library is very efficient.

Not all provided query functions are efficient, contains(polygon, point) and extends(rectange, polygon) are examples for this. These also point out that the "planar polygon geometry data with integer coordinates" problem space is not yet fully covered by the library (-> efficient query functionality for planar polygon geometry data is missing, for example). But even without it the library already has a huge code base, so this should not be interpreted as a fault of the library.

I also notice that the library is not intended to directly handle real GDSII or OASIS layout data, which is typically stored in hierarchical form. (This hierarchical form means more or less that some basis shapes are directly defined as polygons, and the more complex shapes are defined by reference to basis shapes or already defined more complex shapes.)

> - Did you try to use the library?
> With what compiler? Did you have any problems?

I tried the library with gcc-3.4.4, gcc-4.3.2 and MSVC-9. MSVC-9 has problems with custom types for the concepts, i.e. "custom_point.cpp", "custom_polygon.cpp" and "custom_polygon_set.cpp" failed to compile with MSVC-9. The examples files has to be repaired first, because "namespace gtl = boost::polygon; namespace gtl {...}" is not valid C++. I stepped through the code in a debugger to learn how it works, which was quite interesting.

> - How much effort did you put into your evaluation?
> A glance? A quick reading? In-depth study?

An in-depth study. I tried to read the documentation from start to end and at least look superficially at all existing source code. However, I had to give up after many hours, because the code base is just to huge for a complete review. I tried to review the implementation of the scanline algorithm, but I didn't succeed. So I simply trust the author that the implementation is as reliable as he claims. At least I haven't found anything that contradicts the claims of the author while reviewing the code.

> - Are you knowledgeable about the problem domain?

I'm a normal user of this sort of library.

So I finally have to thank Lucanus Simonson and it's employer for writing this excellent library and proposing it as a boost library.

Regards,
Thomas


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk