Boost logo

Boost Users :

Subject: Re: [Boost-users] [Graph] Problem with large vertex ids
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2013-05-15 13:19:48


On Wed, 15 May 2013, Stephen Woodbridge wrote:

> Hi all,
>
> I'm trying to understand/fix some bugs in pgRouting that use the boost
> libraries. I know C not not C++, so I can understand most of the more C-ish
> parts. We have an old bug that I believe is caused when we add an edge and it
> has large vertex ids or when there is a large number between the min and max
> vertex numbers.
>
> We currently identify the min id number and down shift the ids by that, but
> that does not help if I add an edge (1)->(100000000). I could renumber all
> the nodes and then un-renumber them when done if I have to.

For adjacency lists with vecS as the vertex container type (the default),
there is an assumption that vertex numbers are in the range
0...num_vertices(g)-1 (inclusive). Thus, if you want to use a vertex
number 10000, your graph will need to have at least 10001 vertices (before
you add the edge). There are several data structures in the graph class
that whose sizes are proportional to the number of vertices, so using very
large vertex numbers can end up eating large amounts of memory. There
should not be any limits on vertex counts other than available memory; the
vertex index type is usually size_t or some other large-enough type if I
remember correctly.

> Questions:
>
> 1. is this a known limitiation in Boost graph?

We have run it with larger numbers of vertices before (but more often with
compressed_sparse_row_graph), so it should not be a limitation.

> 2. is there a different type of adjacency list mechanism that does not have
> this issue

Are you using widely separated vertex numbers (i.e., you use large numbers
as vertex IDs but don't actually have a large number of vertices)? If so,
you can try using labeled_graph. Otherwise,

> 3. currently the code crashes the database backend, ideally I would like to
> catch any issues in the C++ wrapper, free memory and return an error code to
> the C caller so we don't kill the server or leak lots of memory.

I suspect you are hitting a buffer overflow here, so that won't be easy to
trap. Compiling with _GLIBCXX_DEBUG defined (with GCC) will turn some of
those problems into assertion failures, but that still won't help you get
an error code. I think the best thing to try would be to fix any overflow
issues; you are likely to get an exception if you try to add more vertices
or edges than will fit into virtual memory (which probably won't happen,
since you will most likely run out of physical memory first, and it's up
to your OS what happens in that case).

> Here is a link to our boost wrapper code:
>
> https://github.com/pgRouting/pgrouting/blob/sew-devel-2_0/src/dijkstra/src/boost_wrapper.cpp
>
> We use that pattern for a lot of functions so if there is a better pattern, I
> might be able to update most(all?) of the functions that use this.

I see a few potential improvements:

1. If you have all of the edge properties up front, you can create the
adjacency list (or a compressed_sparse_row_graph, which will use less
memory) directly using your edges and their properties, rather than adding
them one-at-a-time. Even if you do need to use add_edge, you can add the
edge properties as a fourth argument to that function. The constructor to
adjacency_list to consider using is:

template <class EdgeIterator, class EdgePropertyIterator>
adjacency_list(EdgeIterator first, EdgeIterator last,
                EdgePropertyIterator ep_iter,
                vertices_size_type n,
                edges_size_type m = 0,
                const GraphProperty& p = GraphProperty())

2. Using a raw pointer as a property map (as on lines 115 and 117) often
breaks, especially on Visual C++. The recipe to use for that is:

boost::make_iterator_property_map
   (predecessors.begin(),
    get(boost::vertex_index, graph))

3. You can use edge_predecessor_recorder as a visitor to Dijkstra's
algorithm to get the edges in the path directly from the algorithm, rather
than needing to find them yourself. Use
http://www.boost.org/doc/libs/1_53_0/libs/graph/doc/bfs_visitor.html as a
template for what to do, replacing predecessor_recorder with
edge_predecessor_recorder and using a property map that stores
edge_descriptors rather than vertex_descriptors.

4. Including "using namespace boost" and "using namespace std" together is
likely to break in the future, since C++11's namespace std includes many
of the same names as namespace boost does. Visual C++ has C++11 mode
enabled by default in newer versions.

> I could really use some help resolving this issue.
>
> Meanwhile, I'm off to write a C main() to run this outside of the database
> and in gdb to see whats happening.

OK. You might be able to attach to the database server itself with gdb as
well, but Valgrind might be a more useful tool for seeing what the problem
is.

-- Jeremiah Willcock


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net