Subject: Re: [boost] [geometry] robustness approaches
From: Fernando Cacciola (fernando.cacciola_at_[hidden])
Date: 2009-03-14 14:26:35


Hi Luke,

> Fernando Cacciola wrote:
>> The library proposed by Lucanus, OTOH, at least acknowledges the
>> problem appropriately and approaches a solution in the right
>> direction (there are still some important design-influencing
>> considerations, but they are not showstoppers)
>>
> Thank you, Fernando. I agree with you 100% that robustness needs to be
> designed in up front and not tacked onto an existing geometry code
> base as an afterthought. In particular, the robustness strategy
> needs to be comprehensive and not patchwork. I acknowledge that my
> library needs a lot of work in this area to make it robust for both
> integer and floating point, particularly since it was only designed
> with integer robustness in mind. I will have to redesign the low
> level stuff quite significantly to get both integer and floating
> point fully robust and sharing the same algorithm. I don't really
> have the knowledge I need to make my library floating point robust.
> I need to read some of the papers you mention.
>
I'll try to sketch the general ideas below to get you started.

> Making the coordinate data type a template parameter and "allowing"
> the user to specify infinite precision numerical data type if they
> want robustness is not a solution

Precisely :) life would be too easy if that were already fast enough.

>, because there are very simple and
> effective ways to make integer and floating point operations robust.
> I don't know if we need to design in points of customization to
> expose the ability to do that to the user, but it needs to be at
> least possible for the library developer to specify different
> strategies for different coordinate data types and different
> contexts.

Exactly... and the library as a service to other higher level libraries
should and can provide a general methodology and the corresponding tools
for doing that.

Ultimately users shouldn't be concerned with any of this at all, except
when there is a reasonable speed/robustness tradeoff that users would
like to tune themselves.

>
> I think Brandon's library is trying to handle both integer and
> floating point robustness, and I've noticed he is at least as
> interested in multi-precision numerical data types as I am, which is
> a good sign that he's doing the right things. I'm interested in
> learning more about his approach.
>
As I'll try to sketch below, bigfloats are part of the toolset, but there
is more to it.

> Performance is fine as a goal, but if the algorithms fail (crash,
> hang, drop polygons or "gracefully" throw an exception) even a small
> percentage of the time due to robustness issues then they are
> worthless, regardless of how fast.

Exactly.

> If we tried to use a non-robust
> library and it failed in a mission critical situation it costs us
> over one million dollars a day to fix the problem. Failure is not an
> option. When I say 100% integer robust I'm serious. It's all or
> nothing and the stakes for me are high.
>
And so they are for end users... it is the developers who think: "this
works in all cases I tried, so that's good enough".
But it isn't, and there is no need to compromise, since lots of
techniques exist, and while they don't cover all of the possible
algorithms, there is clearly little justification for not using them
where they are applicable.

=== LONG EXPLANATION BELOW ===

OK, I'll try to sketch the general idea behind floating point robustness
in geometric computing. There is a lot on the subject in the literature,
it is just difficult to find, so consider this just an introduction.

On the one hand, if an algorithm only needs to perform algebraic
operations, a "Bigfloat", i.e. a software-based adaptive unlimited
precision floating point type, delivers totally exact results. If
division is needed then you can use a rational of bigfloats. If other
operations are needed, such as sqrt() or transcendentals (say cos, sin),
then you can use a number type that tracks the expression trees and
avoids actually performing the non-exact operations as much as possible,
especially when you query for discrete properties such as its sign.
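
For illustration, here is a minimal sketch (the function name is mine,
not from any particular library) of an exact algebraic computation on
double input, assuming Boost.Multiprecision's cpp_rational, whose
conversion from a double is exact: every +, - and * on the rationals is
exact, so the sign it reports is the true sign of the determinant for
the given inputs, which plain double arithmetic cannot guarantee for
nearly degenerate inputs.

#include <boost/multiprecision/cpp_int.hpp>   // provides cpp_rational

// True sign of the 2x2 orientation determinant of three points given
// as doubles: the doubles are converted exactly into rationals and the
// whole expression is evaluated without any rounding.
int orientation_sign_exact( double ax, double ay,
                            double bx, double by,
                            double cx, double cy )
{
    using Q = boost::multiprecision::cpp_rational;
    Q det = (Q(bx) - Q(ax)) * (Q(cy) - Q(ay))
          - (Q(by) - Q(ay)) * (Q(cx) - Q(ax));
    return det > 0 ? 1 : ( det < 0 ? -1 : 0 );
}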

On the other hand, it is possible to design and implement any geometric
algorithm in terms of "predicates" and "constructions" and it is
possible to consider the output as having separate topological and
geometric parts.

Let's consider an example: the straight skeleton (and the voronoi
diagram in this case) of a square contains one and only one vertex in
the center and one and only one edge from that center node to each
square corner. That result has a topology (how vertices and skeleton
nodes are connected) and a geometry (the coordinates of the center node).

The topology might be correct even if the geometry is not: i.e. the
center node is not exactly in the center, yet they are connected correctly.

For many, many geometric applications, it is the topology that needs to
be correct, not the geometry. If you do clipping, for instance, it might
very well not matter at all (in drawing applications, say) if
the vertices are slightly off. But if an entire region is included in
the result when it should not be, then that is a disaster for just
about any application.

Since the input to an algorithm is error free no matter the
representation (or more precisely: there is nothing the algorithm can do
about its implicit error), it is easy to see that, if an algorithm is
designed in such a way that the topological construction is based
exclusively on the results of predicates called exclusively on the input
(and never ever on an intermediate result), then the topology will be
correct no matter what capabilities the underlying input arithmetic has,
provided the predicates can do whatever magic is necessary to produce
exact and correct results given input of any type.

That is, if there is a predicate that can take machine integer or
floating point coordinates, and, in whatever way, produce a guaranteed
error free result, then you can write the code of an algorithm totally
agnostic of the limitations in the type of input coordinates. That is,
the algorithm itself can just pretend the computation is performed on a
Real RAM.
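
Here is a minimal sketch of what that looks like in code (the names are
purely illustrative, not an existing interface): the algorithm makes
every decision through a black-box orientation predicate applied only to
the input points, so its correctness depends solely on the predicate,
not on the coordinate type.

// point_in_triangle never looks at coordinates or at intermediate
// results: all decisions go through the supplied predicate.
// orientation(u,v,w) is assumed to return the exact sign (+1, 0, -1)
// of the signed area of triangle (u,v,w).
template <class Point, class OrientationPredicate>
bool point_in_triangle( Point const& p,
                        Point const& a, Point const& b, Point const& c,
                        OrientationPredicate orientation )
{
    int s1 = orientation(a, b, p);
    int s2 = orientation(b, c, p);
    int s3 = orientation(c, a, p);
    // p is inside or on the boundary iff all non-zero signs agree,
    // regardless of whether (a,b,c) is given clockwise or not.
    return ( s1 >= 0 && s2 >= 0 && s3 >= 0 )
        || ( s1 <= 0 && s2 <= 0 && s3 <= 0 );
}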

Researchers spent a lot of time figuring out how to implement exact
predicates based on finite precision integer or floating-point input.
While the results are limited and somewhat complicated from an
engineering POV, they do allow for incredibly fast algorithms that take
simple native type coordinates (int or double) and produce 100% robust
code. But even then, the *concept* of a black box predicate is still
useful because there are other programming-specific techniques to
provide simple-to-program but fast predicates: the so-called "floating
point filters" which I'll explain below.

IMO, any geometric library should provide a toolbox of exact predicates
(which should be exact independently of the input type) and state as a
guideline that any algorithm should only make critical decisions based
on the result of predicates, and, unless totally impossible, the
predicates should never be called with intermediate results.

Here's a simple example:

   T d1 = distance(a,b);
   T d2 = distance(c,d);
   if ( d1 < d2 )
      blabla

that is fundamentally flawed beyond repair, because robustness becomes
the responsibility of T, and at the end of the day the poor user finds
himself wrapping his input into a fancy numeric type just because of
that flaw.

If OTOH that were written like so:

   if ( compare_distance(a,b,c,d) )
      blabla

then the input could still be of any type yet some basic library tool
could go to whatever extent is needed to give the correct answer.

And if you are like me and love to exploit C++'s unsurpassed expressive
power, you can even write that like so:

   distance_result d1 = distance(a,b);
   distance_result d2 = distance(c,d);
   if ( d1 < d2 )
      blabla

where the comparison between distance_result types does the magic of
producing an exact answer no matter what the type of the input
coordinates is. And this would even prevent the algorithms in the
library from mistakenly doing the wrong thing.
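
A minimal sketch of what such a distance_result might look like (the
names and the use of plain double are mine, for illustration only; an
exact type would be plugged in where full robustness is required):

#include <cmath>

struct point { double x, y; };

// distance_result remembers the two input points instead of committing
// to a rounded distance value, so the comparison never needs a sqrt at
// all: comparing squared distances gives the same ordering.
struct distance_result
{
    point a, b;

    double squared() const
    {
        double dx = b.x - a.x, dy = b.y - a.y;
        return dx * dx + dy * dy;
    }

    // The numeric value, computed only if somebody actually asks for it.
    double value() const { return std::sqrt( squared() ); }
};

inline distance_result distance( point a, point b ) { return { a, b }; }

inline bool operator<( distance_result const& lhs,
                       distance_result const& rhs )
{
    // No sqrt involved in the decision; for full exactness the squared
    // distances could be computed here in an exact number type instead.
    return lhs.squared() < rhs.squared();
}

The user-facing code reads exactly like the naive version,
if ( distance(a,b) < distance(c,d) ), yet the decision now goes through
the library and can be made as exact as needed.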

Following this methodology at large needs support from the bottom layer
of the library, i.e. it needs an exact predicates toolbox that can
operate on arbitrary input types and always yield correct results, but
it also needs a general methodology: i.e. predicates need to be used
instead of "direct" logic and predicates need to be given only the input
and never intermediate results (or, as I'll explain below, that must be
carefully considered and instrumented with its own special technique).
These guidelines can be enforced by the library given the right design
(i.e. having "distance" return "distance_result", which actually stores
the input).

All that above allows algorithms to be called on arbitrary input types
that don't need to carry the weight of providing robustness, i.e.
machine integer and floating point numbers can be used. Couldn't be any
simpler for the end user.

Constructions, OTOH, are similarly black boxes like predicates are, but
serve different purposes, since it is presumed that the algorithm is well
written and never uses the results from constructions to feed predicates
(or rather it does so only very carefully and by design, and only in
those cases when it was impossible to express the algorithm output as a
direct function of its input).

That is, while the library provides, say, Point p = centroid(a,b,c),
that 'p' is never used to drive the decisions of the algorithm (it is
never passed to a predicate), yet still that centroid can be a higher
level facility that *internally* uses whatever it needs to come up with
a reasonable approximated result.

Whenever you can express an algorithm in terms of predicates and
constructions such that predicates only operate on the input and never
on the result of constructions, then your algorithm is robust
automagically, without any requirement on the input type and without the
need to do anything at all to work around numerical issues within the
algorithm itself. That's the job of the predicate oracles.

If an algorithm just can't call predicates with only the input and needs
to step over intermediate results, things get more complicated. However,
if all those intermediate results that would be fed into predicates are
ONLY produced with "constructions", sound alternatives can be explored.

One alternative, the one used the most, is to have the constructions
produce higher level abstractions instead of actual numerical results.
For instance, they can return expression trees that refer to the
input, and have the predicates evaluate them lazily on demand (so the
predicate is still only really operating on the input).
Again, this needs clever support from the library at design time in
that, for example, the result from "centroid" wouldn't be a point but a
centroid_result, storing the input points as well as an approximated
result, which could be perfectly fed into predicates that could do the
right thing having the input at hand.
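
A minimal sketch of such a construction result (illustrative names only,
using plain double for the approximation):

struct point { double x, y; };

// The construction returns an object that keeps the exact defining
// input alongside a rounded approximation. Drawing code can use
// approx(); a predicate that receives a centroid_result still has the
// defining points at hand and can recompute whatever it needs exactly.
struct centroid_result
{
    point a, b, c;   // the exact defining input

    point approx() const
    {
        return { ( a.x + b.x + c.x ) / 3.0,
                 ( a.y + b.y + c.y ) / 3.0 };
    }
};

inline centroid_result centroid( point a, point b, point c )
{
    return { a, b, c };
}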

This works reasonably well given the inherent complexity but is still
the subject of active research. Most of the time the use of DAGs for
results is not really needed, so the ideal framework would be able to
mix both native input and lazy expressions in a coherent manner. One
starting idea (that I had many years ago but never had the time to
evolve) would be to have a hybrid geometric type that uses a dynamic
dual representation: input type or DAG.

Now a word on predicates.

In C++ it is actually somewhat trivial to have a blackbox predicate that
can produce correct results in spite of the input type, using the
following technique, known as a floating-point filter (and it would be
equally trivial *within* a virtual machine for any dynamic language FWIW):

template<class T>
bool pred( T a, T b )
{
   interval<T> ia = a ;
   interval<T> ib = b ;

   interval<T> r = /* some operations on ia and ib */ ;

   if ( certainly( r == 0 ) )
      return true ;
   else if ( certainly( r != 0 ) )
      return false ;
   else
      return pred<some_exact_type>(a,b); // filter failed: redo exactly
}

That exploits the capability of interval arithmetic to implement
certified logic. I won't explain how or why here, but "r == 0" doesn't
return bool but a tribool, so you can explicitly query for the "I don't
know" answer, and *only* in that case fall back to the exact solution.

All this machinery must ultimately use some mapping from the input type
to the interval and exact types needed. Designing that mapping needs to
consider different aspects. For instance, not all algorithms require the
same exact type: some might need an exact sqrt(), for example. So, the
mapping could need to be something like this:

    coordinate_traits<T>::template exact<requirements_tag>::type

Also, predicates and constructions serve different purposes, so they also
have different requirements on the exact types. Going back to the
previous straight skeleton example, while the topology must be right, the
coordinates of the center node need not be, BUT a user could reasonably
expect a black box construction to nevertheless come up with a decent
approximation, to avoid, for instance, points totally off because of
floating point overflow. In this case the algorithm should use a
"constructor" object that can somehow be told what exact type to use,
which might be different from the one used for predicates. Thus,
the traits might need to be more like this:

    coordinate_traits<T>
     ::template exact<for_predicate_tag, requirements_tag>::type
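
A minimal sketch of what such a traits mapping could look like (all the
names, coordinate_traits, field_tag, sqrt_tag, are purely illustrative,
not an existing interface):

#include <boost/multiprecision/cpp_int.hpp>   // cpp_rational

struct field_tag {};   // only +,-,*,/ need to be exact
struct sqrt_tag  {};   // sqrt() must be handled exactly/lazily as well

template <class T>
struct coordinate_traits;   // primary template left undefined

template <>
struct coordinate_traits<double>
{
    // map each requirements tag to a number type able to satisfy it
    template <class Requirements> struct exact;
};

template <>
struct coordinate_traits<double>::exact<field_tag>
{
    using type = boost::multiprecision::cpp_rational;
};

// A requirement involving sqrt() would instead map to a lazy,
// expression-tree based number type, and a second "purpose" tag
// (for_predicate_tag / for_construction_tag) could be added in the
// same way to keep the two choices independent.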

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com
