Subject: [boost] [Polygon] performance
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-09-02 16:37:49
Performance has become a critical aspect of this review with the revelation that Barend claims significant performance advantage with his recent implementation. I would like to address this issue directly and, unfortunately, I must do so by addressing Barend's implementation and comparing it to my own, which I would have preferred to avoid. I realize that this comparison of the two algorithms may appear confrontational. My intention is to focus on technical issues to allow people to understand Barend's performance claims in the context of what the two algorithms have in common and how they differ. My understanding of Barend's algorithm is based solely on what he has told me, and he refused to tell me enough to allow me to implement it myself, so if I am in error in my description of it I trust he will promptly correct me.
First, I do not believe that Barend's implementation is numerically robust when instantiated with float or double. Successfully passing tests is not proof that his algorithm will not generate self intersecting polygons at its output. Specifically, it is my understanding that his algorithm snaps intersection points to the floating point grid at the end of computation when he writes out output polygons. This means that edges may move laterally a distance dependent on the exponent of the floating point value of the result. In floating point coordinates a vertex may lie arbitrarily close to a nearby edge. Even a very small lateral movement of that edge due to loss of precision in the floating point representation has the potential to cross a nearby vertex and interduce self intersection in the output polygons. Since "simple polygons" is a common precondition to many algorithms (including Barend's own) it is simply unacceptable for there to be any chance of this post condition not being satisfied. Note that infinite precision floating point calculations don't solve this problem if you convert the output to regular floating point at some later stage, which is the common way the algorithm's output would be consumed.
Second, Barend has stated that his algorithm does the same thing as mine. This is simply not true. Barend's algorithm imposes quite a few preconditions on its inputs (non-self intersecting being just one example) while mine imposes none. The two arguments of a boolean operation in my library may be any number of intersecting and overlapping polygons, Barend requires that all polygons be simple and non-overlapping. My algorithm accepts a broader range of inputs, which makes it more general and more powerful than Barend's algorithm and a better algorithm to use in a generic interface where preconditions become concept requirements for input geometry data. Furthermore, my algorithm has more output modes such as trapezoidization and keyholing as well as the flexibility to perform n-layer operations such as connectivity extraction and map overlay. Barend's algorithm does not. Comparing Barend's intersection algorithm to my own is not an apples to apples comparision, it is an apple to fruit basket which happens to contain an apple comparision.
For these two reasons I think we should take Barend's performance claims with a grain of salt. When compared to other libraries that are known to be numerically robust (GPC, CGAL, Geos) my library is sometimes slower, sometimes faster in Barend's own benchmarking. Only Barend's algorithm is always faster in his benchmarking. However, if Barend's library has the potential to produce self intersecting output it is not even in the same class as these others.
My benchmark data can be reproduced using the forcgal.cpp file in the sandbox/gtl directory. I provided it when Andreas asked.
I admit that my algorithm could be faster. For the benchmarks I performed I estimated it could be up to 10X faster. For Barend's benchmarks with many pointed star polygons which result in O(n^2) pathological performance of my admittedly suboptimal line segment intersection algorithm that number might be higher sill.
I would prefer that the focus of the review be the concept interfaces (function declarations) and how the typesystem is implemented so that we can decide whether it is a good model for a generic geometry API. Changes to eg. the line intersection algorithm or internals of polygon_set_data class can be made without breaking code that depends on the library API. If someone implements a true drop in replacement for my polygon intersection algorithm (or just for line segment intersection) and licenses it under the boost license we can drop it in if it is faster and all any user should notice is that the library is faster in a new release of boost. I don't think there are any decisions I made in the design or implementation of the concept type system or library interfaces that prevent or even make difficult such a change to its implementation details.
Overall, I don't think we should view this review as an either or Boost.Polygon now or GGL later decision. There is no reason both libraries cannot be accepted in due time. Barend has made quite a bit a progress since boostcon and I am genuinely happy for him for what he has achieved.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk