Subject: Re: [Boost-users] [Polygon] Test among 3 libraries
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2010-09-09 13:35:01


Angus Johnson wrote:
>> That's a worthwhile goal to be sure. When I discussed scalability I
>> meant algorithmic complexity and ability to process large data
>> volumes in input geometry in a single test case.
> Ahh, thanks for the clarification. I haven't yet done any significant
> testing of processing very large data volumes, nor am I competent to
> determine an O value, so I'll defer to your estimation there.

What you have already done is significantly harder than determining O values. Don't sell yourself short. If you plot the number of vertices in input plus output on the x axis and runtime on the y axis in an Excel spreadsheet, you can have Excel fit a power trend line to the data and display the equation on the graph. The equation will look something like y = 2.79235 * x^1.34984. Your empirical scaling factor (empirical O value) is then 1.35. In this hypothetical example 1.35 is faster than O(n * sqrt n) but slower than O(n log n). You should be able to estimate your empirical scaling factor very quickly by generating progressively larger random test cases with this method. Read the experiment section of my BoostCon paper
http://www.boost.org/doc/libs/1_44_0/libs/polygon/doc/GTL_boostcon2009.pdf
to get an idea of how to structure the experiment and what the graph should look like (page 9). I can share the code for the test harness with you if you like. The tricky part is making sure that the geometry doesn't qualitatively change as you make successively larger test cases, because that can introduce a systematic bias in your empirical scaling factor. I'd be interested in seeing this kind of performance analysis on your Clipper implementation.
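
Just to make the idea concrete, here is a minimal sketch (mine, not code from the paper or its test harness) of extracting the exponent b from runtime ~= a * n^b by least-squares in log-log space, which is the same fit Excel's power trend line performs. The sample numbers are purely hypothetical.

// Estimate the empirical scaling exponent from (vertex count, runtime) samples.
#include <cmath>
#include <iostream>
#include <vector>

// Each sample: total vertices (input + output) and measured runtime in seconds.
struct Sample { double vertices; double seconds; };

double scaling_exponent(const std::vector<Sample>& samples) {
    // Fit log(t) = log(a) + b * log(n) by least squares; return the slope b.
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    const double m = static_cast<double>(samples.size());
    for (const Sample& s : samples) {
        const double x = std::log(s.vertices);
        const double y = std::log(s.seconds);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}

int main() {
    // Hypothetical measurements from progressively larger random test cases.
    std::vector<Sample> samples = {
        {1000, 0.004}, {10000, 0.09}, {100000, 2.0}, {1000000, 45.0}
    };
    std::cout << "empirical O(n^" << scaling_exponent(samples) << ")\n";
}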

If you are using a linked list as your sweepline data structure, your worst-case runtime complexity is O(n^2) with respect to input plus intersection vertices, and your expected runtime complexity is O(n^1.5) in the common case. In the worst case there are O(n) line segments in your sweepline at each sweepline "stop" (unique x coordinate), while in the common case there are O(n^0.5) line segments in your sweepline at each stop and O(n) stops.
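
To illustrate where those bounds come from, here is a minimal sketch (my illustration, not Clipper's actual code) contrasting the two sweepline status structures. With a linked list, finding the insertion position for a new edge is a linear scan over the edges currently crossing the sweepline; with a balanced tree it is logarithmic, which is what gets you to O(n log n) overall.

#include <list>
#include <set>

struct Edge { double x_at_sweepline; };  // x where the edge crosses the sweepline

struct ByX {
    bool operator()(const Edge& a, const Edge& b) const {
        return a.x_at_sweepline < b.x_at_sweepline;
    }
};

// Linked-list status: O(k) per insertion, k = edges currently in the sweepline.
// With O(sqrt n) edges per stop this gives roughly O(n^1.5) overall behaviour;
// with O(n) edges per stop it degrades to O(n^2).
void insert_list(std::list<Edge>& status, const Edge& e) {
    auto it = status.begin();
    while (it != status.end() && it->x_at_sweepline < e.x_at_sweepline) ++it;
    status.insert(it, e);
}

// Balanced-tree status: O(log k) per insertion.
void insert_tree(std::multiset<Edge, ByX>& status, const Edge& e) {
    status.insert(e);
}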

>> Also, I believe GPC does handle high order intersections because I'm
>> quite certain I was generating large numbers of them in the tests I
>> was running that GPC seemed to handle just fine. I could be wrong of
>> course, but I suggest you double check that GPC doesn't properly deal
> with complex intersections. I know it doesn't deal with overlapping or
>> self intersecting polygons at all, but that is a separate issue from
>> polygons that have vertices in common or polygons that abut along
>> edges but do not overlap. GPC is used quite extensively and paid for
>> by many people, we should be responsible about making claims about
> its shortcomings.
>
> I was careful to qualify my statements about GPC with "I don't think"
> and "I believe". However, I accept that this still isn't fair
> treatment and I should be sure before making these claims. Let me
> simply state that under my stress-testing, GPC fails about 4% of the
> time (and with extreme stress testing - rounding random vertices to
> nearest 100 instead of nearest 10 - it fails far more frequently). The
> full source code for my stress-tests can be downloaded here:
> http://www.angusj.com/delphi/clipper.php

What is GPC's failure mode when it fails your stress test? Does it crash, or does it just produce garbage output polygons? In my testing I found that, given self-intersecting or overlapping polygons in an input, GPC produced garbage output. I would prefer a crash, since it is as much work to check that the result of a boolean is correct as it is to perform the boolean. An algorithm that produces the wrong result and reports success is worse than an algorithm that crashes.

If you are substantially more robust than GPC then you are substantially robust. Epsilon methods won't get you to 100%, but there are other methods that will get you all the way, and you are asymptotically approaching them with your iterative epsilon idea. The really tough problem with robustness is that numerical error moves line segments laterally and can introduce spurious intersections in the output. These intersections become fatal errors for downstream processing of the output of your algorithm. There is a paper on floating point snap rounding by Milenkovic that could help you get to 100% numerical robustness while avoiding spurious intersections. Search for "Polygon Set Operation" and/or Milenkovic and use CiteSeer to walk the reference links. I can send you my archive of papers if you don't have access to an academic database.
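
For flavour, here is a minimal sketch (my simplification, not Milenkovic's full algorithm) of the core snap-rounding idea: every vertex, including computed intersection points, is rounded onto a fixed grid so downstream code sees exactly representable coordinates. Full snap rounding additionally reroutes segments that pass through a "hot" pixel containing a vertex, which is what prevents the spurious intersections mentioned above. The grid spacing here is an arbitrary assumption.

#include <cmath>

struct Point { double x, y; };

// Grid spacing chosen for illustration; a real implementation derives it from
// the coordinate range and the available floating point precision.
constexpr double kGrid = 1e-6;

Point snap_to_grid(const Point& p) {
    return { std::round(p.x / kGrid) * kGrid,
             std::round(p.y / kGrid) * kGrid };
}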

I greatly approve of your stress testing approach. Try turning off rounding of the random vertices and then we'll really see where epsilon methods fall short. It isn't really fair to round the inputs with your knowledge of which epsilon you use and then compare to someone else's algorithm. The kind of stress testing you have been doing is exactly the right way to stress test numerical robustness; I wish others would do the same. Perhaps you can include the Boost.Geometry clipping implementation in your scaling and stress testing. If your implementation is both faster and more reliable than what they have, I would think we would use yours to replace theirs in their library. I don't think users of the library would find that confusing. Also, Boost.Geometry is something of a test bed for developing better algorithms in addition to a vehicle for releasing them, and from that standpoint multiple algorithms that do the same thing are part and parcel of developing better algorithms.
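
In case it helps others reproduce this kind of test, here is a minimal sketch (my assumed structure, based on the description of the stress tests above, not the actual harness from the Clipper download) of generating random test polygons with vertex rounding made optional, so the same harness can also probe pure floating point robustness.

#include <cmath>
#include <random>
#include <vector>

struct Point { double x, y; };
using Polygon = std::vector<Point>;

// round_to = 0 disables rounding; round_to = 100 reproduces the "extreme"
// case of snapping random vertices to the nearest 100, which forces many
// coincident vertices and collinear or overlapping edges.
Polygon random_polygon(std::mt19937& rng, int vertex_count,
                       double extent, double round_to) {
    std::uniform_real_distribution<double> coord(0.0, extent);
    Polygon poly;
    poly.reserve(vertex_count);
    for (int i = 0; i < vertex_count; ++i) {
        double x = coord(rng), y = coord(rng);
        if (round_to > 0.0) {
            x = std::round(x / round_to) * round_to;
            y = std::round(y / round_to) * round_to;
        }
        poly.push_back({x, y});
    }
    return poly;  // deliberately allowed to self-intersect for stress testing
}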

Regards,
Luke

