<br><br><div class="gmail_quote">On Fri, May 1, 2009 at 12:34 PM, Bhaskara Marthi <span dir="ltr">&lt;<a href="mailto:bhaskara@gmail.com">bhaskara@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="h5"><br><br><div class="gmail_quote">On Fri, May 1, 2009 at 11:34 AM, Thomas Klimpel <span dir="ltr">&lt;<a href="mailto:Thomas.Klimpel@synopsys.com" target="_blank">Thomas.Klimpel@synopsys.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>Bhaskara Marthi wrote:<br>
&gt; &gt; Try building in release mode? The debugging instrumentation in the BGL can<br>
&gt; &gt; kill performance.<br>
&gt;<br>
</div><div>&gt; I&#39;m compiling my source files with -O3 using gcc on ubuntu.<br>
&gt; That should be sufficient, right?<br>
<br>
</div>For release mode, NDEBUG must be defined. However, a superficial glance at the sources doesn&#39;t reveal any debugging instrumentation depending on NDEBUG. Andrew, is there any?<br>
<br>
Another question for Andrew: Why was the relaxed heap replaced with a 4-ary heap? In my experience, a normal array based binary heap is a nightmare for the cache, and I wonder whether the 4-ary heap will be much better in this respect. However, I don&#39;t know whether the cache behavior of a relaxed heap is much better.<br>


<br>
Regards,<br>
Thomas<br>
_______________________________________________<br>
Boost-users mailing list<br>
<a href="mailto:Boost-users@lists.boost.org" target="_blank">Boost-users@lists.boost.org</a><br>
<a href="http://lists.boost.org/mailman/listinfo.cgi/boost-users" target="_blank">http://lists.boost.org/mailman/listinfo.cgi/boost-users</a><br>
</blockquote></div><br></div></div>I verified that I&#39;m compiling with NDEBUG.� Compiler flags are shown below:<br><br>/usr/bin/c++�� -Dtopological_graph_EXPORTS�� -O3 -DNDEBUG -fPIC -I/u/bhaskara/ros/ros-pkg/world_models/topological_map/include -I/u/bhaskara/ros/ros-pkg/deprecated/deprecated_msgs/msg/cpp -I/u/bhaskara/ros/ros/core/rosconsole/include -I/opt/ros/include/boost-1_37 -I/u/bhaskara/ros/ros-pkg/common/tinyxml/include -I/u/bhaskara/ros/ros-pkg/motion_planning/sbpl/src -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/include -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/srv/cpp -I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/msg/cpp -I/u/bhaskara/ros/ros-pkg/world_models/costmap_2d/include -I/u/bhaskara/ros/ros-pkg/common/laser_scan/include -I/u/bhaskara/ros/ros-pkg/common/laser_scan/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/filters/include -I/u/bhaskara/ros/ros-pkg/common/loki/build/loki-0.1.7/include -I/u/bhaskara/ros/ros-pkg/common/angles/include -I/u/bhaskara/ros/ros-pkg/common/robot_srvs/srv/cpp -I/u/bhaskara/ros/ros-pkg/common/tf/include -I/u/bhaskara/ros/ros-pkg/common/tf/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/tf/srv/cpp -I/u/bhaskara/ros/ros/core/roscpp/include -I/u/bhaskara/ros/ros/3rdparty/xmlrpc++/src -I/u/bhaskara/ros/ros-pkg/common/bullet/build/include -I/u/bhaskara/ros/ros-pkg/common/bullet/swig -I/u/bhaskara/ros/ros-pkg/visualization_core/visualization_msgs/msg/cpp -I/u/bhaskara/ros/ros-pkg/common/robot_msgs/msg/cpp -I/u/bhaskara/ros/ros/std_msgs/msg/cpp -I/u/bhaskara/ros/ros/core/roslib/msg/cpp -I/u/bhaskara/ros/ros/core/roslib/include -I/u/bhaskara/ros/ros-pkg/3rdparty/eigen/build/eigen-2.0.0 -I/opt/ros/include -I/u/bhaskara/ros/ros/3rdparty/gtest/gtest/include -I/u/bhaskara/ros/ros-pkg/world_models/topological_map/srv/cpp�� -DROS_PACKAGE_NAME=\&quot;topological_map\&quot; -O3 -DBT_USE_DOUBLE_PRECISION -DBT_EULER_DEFAULT_ZYX -DBT_USE_DOUBLE_PRECISION -DBT_EULER_DEFAULT_ZYX -W -Wall -Wno-unused-parameter -fno-strict-aliasing -o CMakeFiles/topological_graph.dir/src/grid_graph.o -c /u/bhaskara/ros/ros-pkg/world_models/topological_map/src/grid_graph.cpp<br>

<br><br>I also tried using my own priority-queue based version shown below.� On a graph with about 10^4 nodes and degree 4, my version takes about .05 seconds, while dijkstra_shortest_path takes ten times as long.<br>- Bhaskara<br>

</blockquote><div><br>I came up with a potential explanation.� So my graph is actually about 10^6 nodes in size, but the initial node is in a connected component of size 10^4.� Now, my implementation only scales with the size of this connected component: nodes are added to the distance map only as and when they are visited, so all those nodes outside the connected component don&#39;t matter.� But, I suspect that bgl&#39;s dijkstra_shortest_paths is using data structures that scale with the graph size rather than the connected component.� If heap performance scales logarithmically, the 100-fold increase in size would make the algorithm several times slower. <br>
<br>Bgl experts: does this seem like the right explanation?� If so, what should I do: are there algorithms that take advantage of the smaller connected component, or am I stuck rolling my own?<br><br>- Bhaskara<br><br><br>
<br><br><br>Also, I just checked what the time is being spent on in dijkstra.� Apparently, about half the time is spent on the initialization phase of marking every node&#39;s distance as infinite.� My algorithm avoids this.� The graph I&#39;m trying it on is actually a million-node graph, but the start node is in a connected component of size 10,000.�� So that explains part of the performance difference --- not including the initialization, my implementation is just four times faster than dijkstra_shortest_paths.� Once the initialization is complete, though, are there other parts of the algorithm that scale with the overall graph size?� <br>
</div></div><br>