Boost logo

Boost :

Subject: Re: [boost] [BGL] Multithread problem inserting edges and exploring the same graph
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2009-09-23 14:45:29


On Wed, 23 Sep 2009, Cosimo Calabrese wrote:

>
>> It looks like (from your description that I snipped out) that you are
>> really creating a new graph for every thread that is a copy of the original
>> graph plus some new, thread-specific edges. I would not recommend having a
>> single graph for all of this because of the issues you are having. What
>> about the following:
>>
>
> Well, I've really a single, unique, huge istance of an adjacency_list in my
> application. I've called the initial form of that graph: 'original graph'.
> There aren't other graphs in memory. Every thread creates a filtered_graph
> object, that takes a reference of the original graph, but it's not a memory
> copy of it, it's only an adaptor. The thread adds the new edges and vertices
> using a reference of the original graph, and explores the filtered graph. The
> original graph has about 5 million of vertices and 10 million of edges. It
> takes about 1 Gb of RAM, I can't replicate it.

I know you were not doing that literally, but it seemed like the combined
effect of your graph addition, filtering, and deletion meant that each run
of Dijkstra's algorithm was in effect on a graph built this way.

>> 1. Create a union_graph adapter that takes two graphs with the same number
>> of vertices and merges their edge sets (concatenating the out_edges and
>> in_edges for each vertex implicitly). Properties would also need to be
>> handled (or indexes for external properties).
>>
>> 2. For every thread, create a new graph with just that thread's new edges
>> and run Dijkstra's algorithm on the union_graph of that piece and the
>> original graph.
>>
>> 3. Throw away the temporary, thread-specific graph when the thread is done.
>>
>
> It seems a very, very good idea. I like it very much. If it works, I always
> access to the 'original' graph in read-only modality, that is the only shared
> object; I don't need to add any vertex or edge to any existing graph, but
> each thread only needs to create another, completly new graph, which isn't
> shared with other threads. So the thread explore the union adaptor in
> read-only modality. Exciting.
>
> Now I must take a look on some 'graph arithmetic'... This adaptor needs only
> to model the traversal graph concepts. The two graph type in input to the
> union adaptor must be the same. So all the adaptor types required by the
> concepts are the same of the input graphs. Now I need a moment to think about
> the concept funcions (out_edges(), out_degree(), ... ). I'll tell you about
> the state of the adaptor very soon.

Those should be fairly simple, except that I do not believe Boost.Iterator
has a concatenation iterator like the old View Template Library had. If
you write a generic concatenate-two-sequences iterator, you might want to
consider contributing it to Boost.Iterator. Otherwise, you should just be
able to concatenate edge sets (in edge sets, out edge sets) and add
degrees together. Various properties and vertex/edge index computation
might be tougher.

>> That should remove all of your locking (except in the memory allocator),
>
> Yes.
>
>> and the filtered_graph,
>
> Mmmmh, no, not yet. There is one thing of my project I've not mentioned, it's
> difficult to explain all of it... The graph elements have many properties.
> Each thread needs to explore not all the graph, but only a particular view of
> it, that must change dependently by the specific request. So, I need the
> filter_adaptor. But if you think that I haven't understood something about
> this point, please... tell me.

OK. So you mean that you're filtering on more than just the thread ID?
In that case, you might still need a filter.

>> and greatly simplify (and probably speed up) the code. Do you think this
>> will work?
>
> Yes, I'm hopeful. I'll tell you soon. Thanks for the great idea.
>
> But beyond the Jeremiah's idea, no one can explain what happens in my
> application? I'm referring to the memory corruption. Surely it depends on the
> contemporary reading and writing by different thread at the same moment, but
> I would to know why it happens, even if I've locked all the functions.

I do not know -- adjacency_list I believe has out edge (and possibly in
edge) sets for each vertex, plus I think there's a total edge list. That
second one might be getting corrupted even if you change unrelated
vertices. Also, anytime the documentation says that some operation
corrupts iterators into the graph, that means it might move the location
of some internal data structure and thus cause conflicts with multiple
threads. I have not looked specifically at the behavior you are seeing,
though.

-- Jeremiah Willcock


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk