
Boost : 
Subject: Re: [boost] [Review] GGL review starts today, November 5th
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 20091113 13:02:07
Barend Gehrels wrote:
>> Your algorithm is: for all red montonic sections, inspect all blue
>> monotonic sections for bounding box overlap and if overlap exists
>> inspect segments of monotonic section pair for line intersection and
>> report said intersections. It has O(n^2) runtime complexity wrt.
>> monotonic sections. Monotonic sections are in the worst case
>> equivilent to the number of edges.
>>
> Sure. As I said, there is room for improvement there. But it is not
> necessary. See below.
>
>> My library is not relevant to the discussion.
> Was mentioned by you in terms of "My own implementation...", "I was
> hoping to upgrade my algorithm..." etc. But I agree, let's leave it
> out.
>
>> I don't understand what you mean my not convenient.
> That I did want to go to sleep.
I owe you an apology for this question, it was unfair, and I commend you for responding to it in a professional manner.
>> My concern about performance is that your benchmarking may not
>> expose performance problems. Your claims about performance are
>> based upon one benchmark (with sever minor variations of one of the
>> two input geometries) from which you generalize about performance.
>> The algorithm has a fatal weakness to
>>
> to ... everything?
>
> We've had this discussion several times before, offlist and onlist.
> You asked me to create a moreintersectiondense environment and I
> did.
> It *is* testing pathological cases. It *is* testing the diagonals you
> mentioned. You asked me to deliver cases and pictures where things
> went
> wrong in another library and I did. You asked me if I could create
> tests testing 90degree and 45degree polygons and I didn't, because
> I said
> you could create those tests yourself, they belong to your
> environment,
> but I didn't receive them. You're objecting against our interiorring
> approach and I will test it, but not now. You asked for more
> documentation and I did. All the things I do and write are returned by
> words like fear and fatal, but nothing more than such words. So I
> think
> it's useless to continue like that.
Yes, I plan to perform my own benchmarking of you implementation to verify whether my concerns about its scalability are warrented.
> Anyway, worst case tested:
>  19 minutes (GGL, best)
>  73 minutes (second best)
>  20 hours (worst)
>
> The last one is widely used within the GIS world and has variants in
> Java, C++ and C#. It's 60 times slower in pathological cases and ~10
> times slower in more normal cases, so I think that potential Boost
> users
> can safely use our library without taking the risk that it is too
> slow.
>
> Full info: http://trac.osgeo.org/ggl/wiki/IntersectionMore
> The tested polygons have 10000 vertices, have the form of a star,
> which points are located horizontally, vertically and diagonally
> across the
> input. A factor more, 100000 points didn't fit in computer memory,
> with
> all libraries tested. It *is* a pathological case
Your complexity is n*m wrt. red and blue monotonic sections. In your benchmark n is basically the average of the number of monotonic sections of all the countries in the world. I don't know what it is, but I expect the avage length of monotonic sections in a country border is pretty large, implying the number of sections is small. By varying m but not n you will not be able to observe quadratic behavior of your algorithm, it will appear to scale linearly. Furthermore, if n is sufficiently small on average your algorithm should be expected to perform nearly linear regardless of how large m is.
> The testsuite is open source and open for anyone to reproduce the
> results or want to extend it. The input can be choosen, it is a
> runtime parameter.
I plan to do so.
> And I repeat, if the first phase, which is quadratic wrt sections and
> so quadratic in worst cases, even then takes too long there is room
> for improvement. As said in our paper
> (http://www.boostcon.com/sitemedia/var/sphene/sphwiki/attachment/2009/05/05/GGL_Paper.pdf)
> the Priority RTree spatial index, optimal in worst case scenario's,
> might be useful.
>
> So you can repeat that our implementation is fatally weak, but it is
> not, on the contrary. That our tests are wrong and grains of salt but
> they are not, on the contrary.
Last night I redirected my admittedly unbecoming negative emotions that seeing your nested for loop brought to the surface to something positive. I enhanced my own line intersection algorithm with a 1D spatial indexing based divide and conquer. Let me quickly explain. Previously I sorted all line segments by their low x coordinate and then for each segment in sorted order scanned in the sorted sequence comparing all segment pairs until the low x of the segment found in the inner loop exeeds the high x of the segment in the outer loop. This gives the benefit of 1D spatial indexing in the x dimension. On top of that I add divide and conquer optimization in the y dimension. I construct a histogram of the number of line segments that overlap intervals in the y orientation (the same thing that Joachim's ITL does) then recursively subdivide the y dimension by finding the min horizontal cutline in the middle 1/3 of the histogram. I then bin edges into the horizontal strips and run the original line segment intersection algorithm on each strip. This provides algorithmic speedup and is effective at speeding up even the hand xor spiral test case that I have a picture of in my documentation. This should be equivilent to the speedup achievable by using a 2D spatial index (such as kdtree) but is only about 75 lines of code. I'd be willing to submit a patch for your line segment intersection algorithm that adds both the x and y spatial optimization on top of your montonic sections optimization. This would fix the majority of algorithmic weakness to large n and m. Similarly I can fix the algorithmic weakness to large numbers of holes. My polygon formation algorithm is independent of line intersection and boolean operation stages, and should be compatible with floating point coordinates. In the case that the number of holes in the output is large you can opt to call my polygon formation algorithm on all the polygon edges to assocate holes to outer shells in n log n time.
>>> Also here we're happy to provide it, it is not satisfying your
>>> needs, especially if this enhances interoperability we're certainly
>>> willing to adapt and finetune things. Actually I'm also glad to
>>> hear positive feedback about the way we designed it for linestring.
>>>
>>
>> I think fixing the polygon traits to be like the linestring/ring
>> traits would improve both consistency and genericity of the
>> interfaces and should be required before the library is accepted
>> into boost.
>>
>>
> Our concepts are completely clear and sound, documented and have
> examples. We're conforming to boost ranges and, for mutable, to
> stdlibrary. I mentioned adaption to increase interoperability with a
> recently accepted library, your library.
>
> Objecting acceptance on return of such an offer, adapting to a library
> accepted one hour after announcement of review of ours, is quite
> surprising.
You misunderstand my objection. I don't object because you can't adapt my polygon data type to your polygon concept. I object because the can't adapt a class of polygons to your polygon data type, but could easily do so with a simple change that makes your interfaces more consistent and simple. It is better that this change be made before the library is accepted to boost rather than after. As an aside, if I can adapt your polygon concept to my polygon concept that would make my polygon formation algorithm drop in compatible with your polgyon intersection algorithm output interfaces, which would make it easy for you to use my polygon formation algorithm to optimally associate holes to outer shells.
By the way, I think that relaxing the requirement on linestring that the iterator be random access would be a good idea. You can specialize the function that requires random access to call different code for forward and random access iterators if random access is needed for an important optimization.
Regards,
Luke
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk