Subject: Re: [boost] [BGL] Multithread problem inserting edges and exploring the same graph
From: Nick Edmonds (ngedmond_at_[hidden])
Date: 2009-09-21 14:39:56
On Sep 21, 2009, at 10:59 AM, Cosimo Calabrese wrote:
>> Here are some ideas:
>> Do you need all of the properties in the explorer threads? maybe you
>> can construct a somewhat cheap copy of the graph of just vertices,
>> edges and weights?
> Unfortunately no, my graph has 5 million of vertices and 10 million
> of edges, and the properties RAM cost is very lower confronted to
> the graph structure. The graph structure takes approximately the
> same RAM quantity of the graph + properties.
> Moreover I've tried to make a graph copy, but it takes about 30
> seconds on my machine (Pentium D 2.8GHz, 3Gb RAM, WinXP); it's too
> much confronted to the Dijkstra's time (about 6-7 seconds).
>> When each explorer thread runs an instance of Dijkstra you might
>> consider a parallel version of Dijkstra [1,2] working on disjoint
>> graph regions? That might allow you to cut the graph into pieces and
>> assigning a 'local' version of the graph to each explorer thread and
>> joining them later on?
> These arguments are relatively new to me... do you think that's the
> I've take a look to PBGL and MTGL:
> Maybe I can use them to resolve my problem?
Having written significant portions of one and used/hacked on the
other I can say with some certainty that neither one of them will
solve your problems out-of-the-box (at least not in the way I
propose). Firstly, the problem seems somewhat ill defined so I've got
a few questions to nail down the particulars:
1. When does the shortest paths solution have to be correct (i.e.
match the graph exactly)?
2. Are there any constraints as to when your 'integrator' thread can
call add_edge() (this relates to 1. pretty directly)
Now the solution I propose, since you have a monotonically increasing
edge set you can actually compute an incremental solution to the SSSP
problem rather than completely re-running Dijkstra. Basically you can
patch up the SSSP solution when you do the edge addition in O(N) time
(expected time should be much lower but depends on parameters of your
graph that I don't have so I won't try to explain in detail). Either
way this is either somewhat better, or much better than O(N log N) to
re-run Dijkstra. All you do is push the new weight onto the Dijkstra
queue and run Dijkstra as normal.
The key question is does your SSSP solution have to be exact? If not
you can simply spin off the 'patching up the SSSP' work to the
'Explorer' thread and get a solution that's reasonably likely to be
accurate, otherwise it should be faster than completely re-running
Dijkstra to patch up the solution. If you really do need concurrent
access to the adjacency list then I'd just wrap a readers-writers lock
around each vertex's adjacencies.
I'm actually working on incremental algorithms in parallel and will
likely end up adding some sequential counterparts as well. Dijkstra
is a pretty simple one so if you get a hold of me with the particulars
of your problem and it fits with the stuff I was going to do anyway I
may be able to write what you need fairly quickly.
On the 'partitioning/distributing the graph' note, if you want to go
that route there's a lot of support for it in PBGL. You'd probably
want to rip out all the MPI stuff and put in a shared memory process
group. You could run MPI over SM, but if the preliminary data I've
run on the matter is any indication performance will be abysmal and
memory consumption will go way up. Also, you're still going to have
to deal with the concurrent access to the adjacency-list issue. I've
adopted the general rule that partitioning data structures is only
appropriate when 1. you don't have a shared address space, or 2. you
have hideous contention and no other way to alleviate it. That's just
my $0.02 though, and subject to change at any time.
> Cosimo Calabrese.
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost