Boost logo

Boost Users :

Subject: [Boost-users] Boost Graph: preventing duplicate vertices when constructing graph incrementally
From: Trevor Harmon (Trevor.W.Harmon_at_[hidden])
Date: 2010-03-29 19:02:12


Hi,

The vast majority of Boost Graph's sample code has pre-defined source
data. For example, from city_visitor.cpp:

   enum { SanJose, SanFran, LA, SanDiego, Fresno, LasVegas, Reno,
          Sacramento, SaltLake, Phoenix, N };
   E edge_array[] = { E(Sacramento, Reno), E(Sacramento, SanFran),
                      E(Reno, SaltLake),
                      E(SanFran, SanJose),
                      E(SanJose, Fresno), E(SanJose, LA),
                      E(LA, LasVegas), E(LA, SanDiego),
                      E(LasVegas, Phoenix) };

Unfortunately, static source data with enums as the vertices isn't
very typical in real-world programs. It's much more common to
construct the graph incrementally from a streaming data set, such as a
file containing a list of edge pairs. But this presents a problem: As
the list of edges is read in, a vertex may be encountered more than
once, so add_vertex can't simply be called on every vertex found
because that will result in redundant vertices.

An obvious fix is to check whether a vertex exists before calling
add_vertex. If it is found, it can be used for the edge; otherwise
add_vertex can be called and the returned value used for the edge. The
only question is, How do I check whether a vertex exists? I could
iterate through all vertices in the graph, searching for one that
equals the one I'm trying to add, but this approach doesn't scale
well. It has a time complexity of O(V).

A faster solution to this problem appears to be in the
boost_web_graph.cpp sample. It deals with the same issue: It reads in
graph data from a file called boost_web.dat, and this file contains
the same vertex value in multiple places. Rather than call add_vertex
multiple times (creating redundancies), the code puts every vertex it
finds into a map. The map associates the vertex's unique value (a
string in this case) with its corresponding vertex descriptor. Then,
before calling add_vertex, the code checks the map for the unique
value; if found, it uses the corresponding vertex descriptor;
otherwise it calls add_vertex and adds the new ID/descriptor pair to
the map.

I assume that this is faster (compared to my naive approach of
searching linearly through the vertices) because std::map's insert
function is supposed to have logarithmic time complexity. It probably
achieves this with a red-black tree, so as long as the keys are
sortable, O(log(V)) complexity is guaranteed. Strings (used in the
sample code) are certainly sortable, although in my program I'm using
pointers as the vertex property. But I guess pointers are sortable
too, since they're just numbers really. My initial tests show that
this approach works properly with pointers as the values, though I've
not done any performance testing, only correctness testing.

Any feedback on this would be appreciated. Thanks,

Trevor


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net