
Boost Users :

Subject: Re: [Boost-users] Why does the creating of distributed adjacency list cost so much time?
From: Nick Edmonds (ngedmond_at_[hidden])
Date: 2011-10-13 10:59:05


On Oct 13, 2011, at 9:53 AM, Yan wrote:

> Hi all,
> I'm learning to use the distributed adjacency list and the distributed
> Dijkstra shortest-paths algorithm. My test data contains about
> 3,800,000 edges and 2,800,000 vertices. I pass the data to the
> distributed adjacency list through a pair of iterators, like this:
>
> Graph g(reader.begin(), reader.end(), reader.weight_begin(),
> reader.num_vertices(), pg, dist);
>
> "reader" is a user-defined class that supplies the required data
> (edges, vertices, weights). "pg" is defined as "mpi_process_group pg;".
> "dist" is supplied by the reader as "graph::metis_distribution dist =
> reader.distribution();"; it is based on a METIS partition file that
> matches the test data. I run 2 processes on my notebook computer
> (Intel Core 2 2.0 GHz, 2 GB RAM, Windows XP SP3).
>
> I also tested the same data with the sequential BGL Dijkstra
> algorithm on this computer.
>
> The results:
> Parallel BGL: creating the graph g costs *10080 seconds*; the
> Dijkstra computation costs 28.281 seconds.
> Sequential BGL: creating the graph g costs *4.563 seconds*; the
> Dijkstra computation costs 1.703 seconds.
>
> As you see, creating the graph as a distributed adjacency list takes
> an enormous amount of time. Is that normal?
>
> I have done some more comparisons between 2-process parallel Dijkstra
> on one Core 2 computer and sequential Dijkstra on the same computer,
> using data sets of different sizes. The sequential BGL Dijkstra always
> costs less time than the 2-process parallel Dijkstra. What's the
> problem? I have read the Boost document at
> "boost_1_46_1\libs\graph_parallel\doc\html\dijkstra_shortest_paths.html",
> whose charts show good performance and speedup. Doesn't that mean it
> should cost less time than the sequential program?
>
>
> Any advice would be appreciated.
>
> Thanks,
> Yan
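
For reference, the PBGL ships a METIS reader and distribution that perform this kind of construction directly, so a user-defined reader isn't strictly necessary. A minimal sketch, modeled on the PBGL examples; the file names, the `double` weight type, and the weighted-constructor overload used here are assumptions to adapt to your data:

```cpp
// Sketch: build a distributed adjacency list from a METIS graph file
// plus a matching METIS partition file. Requires an MPI installation
// and must be run under mpirun/mpiexec.
#include <boost/graph/use_mpi.hpp>
#include <boost/mpi.hpp>
#include <boost/graph/distributed/mpi_process_group.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
#include <boost/graph/metis.hpp>
#include <fstream>

using namespace boost;
using boost::graph::distributed::mpi_process_group;

typedef adjacency_list<vecS,
                       distributedS<mpi_process_group, vecS>,
                       undirectedS,
                       no_property,
                       property<edge_weight_t, double> > Graph;

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);
  mpi_process_group pg;

  std::ifstream in_graph("weighted_graph.met");       // METIS graph file (assumed name)
  std::ifstream in_partition("weighted_graph.met.2"); // METIS partition file (assumed name)
  graph::metis_reader reader(in_graph);
  graph::metis_distribution dist(in_partition, process_id(pg));

  // Mirrors the constructor call from the original message.
  Graph g(reader.begin(), reader.end(), reader.weight_begin(),
          reader.num_vertices(), pg, dist);
  return 0;
}
```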

While I can't offer categorical answers, my first reaction is that you
are very likely paging in the example you provide. With only 2 GB of
RAM, the memory consumed by the mpi_process_group (discussed in another
thread on this list; look there for a description of how to reduce it),
plus whatever else is running on your laptop, it's entirely possible
that you're running out of memory. Once the PBGL begins to page between
mpi_process_group buffers and graph data, the runtime becomes entirely
unpredictable, and if the degree of paging is high, very, very long.

It's also worth noting that the parallel speedup of the PBGL vs. the
sequential BGL depends on the graph structure and size; it is entirely
possible, and not uncommon, to find graphs for which no speedup, or
negative speedup, is observed. The distributed communication mechanisms
in the PBGL incur non-trivial overhead due to copying and
serialization. That serialization overhead is so high that I have
seldom found it beneficial to run multiple MPI ranks within a single
address space (as you are doing on your two-core laptop). There are a
number of ways to speed up serialization, for example defining
BOOST_MPI_HOMOGENEOUS to avoid MPI_Pack()/MPI_Unpack() overhead,
among other optimizations.

I considered rewriting the PBGL to make it possible to avoid
serialization entirely, and in fact the new prototype
active-message-based version does this, but it's quite a bit of work
and I likely won't have the time to make similar changes to the Boost
version of the library. The new version of the library also
incorporates parallelism within address spaces via threading or
(potentially) off-loading to accelerators, rather than running
multiple MPI processes.
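
Concretely, that macro is a compile-time switch rather than a runtime option. A sketch of how one would enable it (the surrounding includes are illustrative; the key point is that the define must come before any Boost.MPI header):

```cpp
// Sketch: enable Boost.MPI's homogeneous-cluster fast path, which skips
// MPI_Pack()/MPI_Unpack() for bitwise-serializable data. Define it before
// including any Boost.MPI or PBGL headers, consistently across all
// translation units and all ranks. It is only safe when every rank runs
// on the same architecture with the same data layout.
#define BOOST_MPI_HOMOGENEOUS
#include <boost/mpi.hpp>
#include <boost/graph/use_mpi.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
```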

Hope that answers your question and gives you some ideas regarding
where you can expect to find (known) performance issues in PBGL and
how to ameliorate them to some degree... as well as what you could do
to improve PBGL if you're so inclined!

Cheers,
Nick


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net