Boost logo

Boost Users :

From: Detlef Mages (mages_at_[hidden])
Date: 2006-02-08 14:37:44


On Wednesday 08 February 2006 15:56, Doug Gregor wrote:
> On Feb 8, 2006, at 5:13 AM, Detlef Mages wrote:
> > I am using astar_search from the BGL and have sometimes hangs due to an
> > infinite loop (no, I do not use negative costs or negative heuristic
> > values).
> >
> > Now here is the strange problem: when compiling the attached (see
> > below) test
> > program with compiler optimization enabled it runs into an infinite
> > loop.
> > Without optimization a solution is found.
> >
> > I need help, because I do not know what I should do next. Is this a
> > compiler
> > bug? Is there something boost can do (workaround)? Is there a known
> > issue
> > (could not find anything) with the relax function?
>
> We ran into a similar problem with Dijkstra's algorithm in the Parallel
> BGL, with the same strange symptoms. Once we saw that the problem occur
> on Intel x86 hardware but not our PowerPC-based Macs, we figured out
> what was going wrong. The floating point unit in x86 machines has 80
> bits of precision, whereas values of type "double" have 64 bits of
> precision. When relaxing occurs, it's possible that the A* search finds
> a new path that is just barely better than the old path, so the edge is
> relaxed. However, the "just barely" better requires more than 64 bits
> of precision to describe, so when the new distance is written back into
> memory the new distance is effectively the same as the old distance...
> so we end up looping, finding a better distance each time but having
> that better distance truncated back to the old value. This only occurs
> when optimizations are enabled because the optimizations are needed to
> turn all of the code in "relax" into efficient FPU operations, and it
> only happens on x86 because nobody else (that I know of) has more
> precision in the FPU than in memory. One fix is to force the compiler
> to write the relaxed distance value back into memory (to round it),
> then compare against the old distance (already rounded); if the values
> aren't equal we can continue.
>
> Thanks for the test case. I was able to duplicate the problem on x86
> Linux with GCC and I've committed a fix to Boost CVS. The updated
> astar_search.hpp is also attached. May we have your permission to put
> this test case under the Boost Software License (all versions) and
> include it in the BGL test suite?
>
> Doug

Thanks for the immediate fix. I tested the fixed astar_search.hpp and
everything seems to work now. I already had faint intuition about decimal
point precission, but discarded it due to my strong believe in the
correctness without side-effects of how compilers address hardware.

Attached you will find a cleaned up test case. I managed to reduce the graph
even more without loosing its sensibility to the issue. Please feel free to
put it in the BGL test suite.

Thanks again for your help.

Detlef




Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net