On Fri, May 1, 2009 at 12:34 PM, Bhaskara Marthi <bhaskara@gmail.com> wrote:

On Fri, May 1, 2009 at 11:34 AM, Thomas Klimpel <Thomas.Klimpel@synopsys.com> wrote:

Bhaskara Marthi wrote:
> > Try building in release mode? The debugging instrumentation in the BGL can
> > kill performance.
>

> I'm compiling my source files with -O3 using gcc on ubuntu.
> That should be sufficient, right?

For release mode, NDEBUG must be defined. However, a superficial glance at the sources doesn't reveal any debugging instrumentation depending on NDEBUG. Andrew, is there any?

Another question for Andrew: Why was the relaxed heap replaced with a 4-ary heap? In my experience, a normal array based binary heap is a nightmare for the cache, and I wonder whether the 4-ary heap will be much better in this respect. However, I don't know whether the cache behavior of a relaxed heap is much better.

Regards,
Thomas
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

I verified that I'm compiling with NDEBUG. Compiler flags are shown below:

/usr/bin/c++ -Dtopological_graph_EXPORTS -O3 -DNDEBUG -fPIC -I/u/bhaskara/ros/ros-pkg/world_models/topological_map/include -I/u/bhaskara/ros/ros-pkg/deprecated/deprecated_msgs/msg/cpp -I/u/bhaskara/ros/ros/core/rosconsole/include -I/opt/ros/include/boost-1_37 -I/u/bhaskara/ros/ros-pkg/common/tinyxml/include -I/u/bhaskara/ros/ros-pkg/motion_planning/sbpl/src -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/include -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/srv/cpp -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/msg/cpp -I/u/bhaskara/ros/ros-pkg/world_models/costmap_2d/include -I/u/bhaskara/ros/ros-pkg/common/laser_scan/include -I/u/bhaskara/ros/ros-pkg/common/laser_scan/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/filters/include -I/u/bhaskara/ros/ros-pkg/common/loki/build/loki-0.1.7/include -I/u/bhaskara/ros/ros-pkg/common/angles/include -I/u/bhaskara/ros/ros-pkg/common/robot_srvs/srv/cpp -I/u/bhaskara/ros/ros-pkg/common/tf/include -I/u/bhaskara/ros/ros-pkg/common/tf/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/tf/srv/cpp -I/u/bhaskara/ros/ros/core/roscpp/include -I/u/bhaskara/ros/ros/3rdparty/xmlrpc++/src -I/u/bhaskara/ros/ros-pkg/common/bullet/build/include -I/u/bhaskara/ros/ros-pkg/common/bullet/swig -I/u/bhaskara/ros/ros-pkg/visualization_core/visualization_msgs/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/robot_msgs/msg/cpp -I/u/bhaskara/ros/ros/std_msgs/msg/cpp -I/u/bhaskara/ros/ros/core/roslib/msg/cpp -I/u/bhaskara/ros/ros/core/roslib/include -I/u/bhaskara/ros/ros-pkg/3rdparty/eigen/build/eigen-2.0.0 -I/opt/ros/include -I/u/bhaskara/ros/ros/3rdparty/gtest/gtest/include -I/u/bhaskara/ros/ros-pkg/world_models/topological_map/srv/cpp -DROS_PACKAGE_NAME=\"topological_map\" -O3 -DBT_USE_DOUBLE_PRECISION -DBT_EULER_DEFAULT_ZYX -DBT_USE_DOUBLE_PRECISION -DBT_EULER_DEFAULT_ZYX -W -Wall -Wno-unused-parameter -fno-strict-aliasing -o CMakeFiles/topological_graph.dir/src/grid_graph.o -c /u/bhaskara/ros/ros-pkg/world_models/topological_map/src/grid_graph.cpp

I also tried using my own priority-queue based version shown below. On a graph with about 10^4 nodes and degree 4, my version takes about .05 seconds, while dijkstra_shortest_path takes ten times as long.
- Bhaskara

I came up with a potential explanation. So my graph is actually about 10^6 nodes in size, but the initial node is in a connected component of size 10^4. Now, my implementation only scales with the size of this connected component: nodes are added to the distance map only as and when they are visited, so all those nodes outside the connected component don't matter. But, I suspect that bgl's dijkstra_shortest_paths is using data structures that scale with the graph size rather than the connected component. If heap performance scales logarithmically, the 100-fold increase in size would make the algorithm several times slower.

Bgl experts: does this seem like the right explanation? If so, what should I do: are there algorithms that take advantage of the smaller connected component, or am I stuck rolling my own?

- Bhaskara

Also, I just checked what the time is being spent on in dijkstra. Apparently, about half the time is spent on the initialization phase of marking every node's distance as infinite. My algorithm avoids this. The graph I'm trying it on is actually a million-node graph, but the start node is in a connected component of size 10,000. So that explains part of the performance difference --- not including the initialization, my implementation is just four times faster than dijkstra_shortest_paths. Once the initialization is complete, though, are there other parts of the algorithm that scale with the overall graph size?