Boost Users :

Date view	Thread view	Subject view	Author view

Subject: Re: [Boost-users] [PBGL] Results of distributed BFS on multiple processors vs. on a single processor
From: Cedric Laczny (cedric.laczny_at_[hidden])
Date: 2011-05-04 02:08:57

Next message: Cedric Laczny: "Re: [Boost-users] distributed breadth first search doesn't work properly?"
Previous message: Vitaly Budovski: "[Boost-users] [Spirit] Generating from struct containing single std::vector member"
In reply to: Nick Edmonds: "Re: [Boost-users] [PBGL] Results of distributed BFS on multiple processors vs. on a single processor"

Hi,

On Wednesday, 4. May 2011 01:08:06 Nick Edmonds wrote:
> On Apr 23, 2011, at 5:09 AM, Cedric Laczny wrote:
> > On Friday, 22. April 2011 23:05:04 Nick Edmonds wrote:
> >> On Apr 22, 2011, at 11:01 AM, Cedric Laczny wrote:
> >>> On Friday, 22. April 2011 16:31:20 Steven Watanabe wrote:
> >>>> AMDG
> >>>>
> >>>> On 04/22/2011 05:59 AM, Cedric Laczny wrote:
> >>>>> it recently came to my mind again that the distributed BFS in the
> >>>>> PBGL seems to produce different results when applied to multiple
> >>>>> processors as to when applied to a single processor (s.
> >>>>> http://lists.boost.org/boost-
> >>>>> users/2010/12/64783.php)
> >>>>> If this is true, I really would like to know what the reasoning is
> >>>>> behind this and it should be noted in the documentation of course to
> >>>>> help prevent surprises as in the discussion above.
> >>>>> Unfortunately, at that time, no definite answer was given and this
> >>>>> question puzzled me again now...
> >>>>> Any ideas on that?
> >>>>
> >>>> Why would you expect the results to be exactly the
> >>>> same? Parallelism usually introduces a certain
> >>>> amount of non-determinism.
> >>>
> >>> I agree with you on that, generally. And it is good to note this
> >>> explicitly again.
> >>> However, maybe I just got a weird kind of perception/expectation here,
> >>> but especially for search-algorithms, I would expect parallel
> >>> algorithms to return the same results as otherwise the question of
> >>> correct interpretation of the parallel results arises to me.
> >>> I mean, if you get different distance-results depending on the number
> >>> of processors, what is then the use of distributed BFS, e.g. if you
> >>> want to know the BFS-distance from source to a specific vertex x?
> >>
> >> The current PBGL (and any other *correct* parallel BFS implementation)
> >> will explore one (of a family of valid) breadth-first vertex ordering.
> >> For a variety of reasons the sequential visitor semantics do not map
> >> well to parallel execution however. Confusion about the semantics of
> >> visitors in a parallel context in combination with the fact that the
> >> ordering of vertex discovery and edge traversal within levels is
> >> necessarily non-deterministic often leads to confusion by users about
> >> the results returned by their BFS visitors. A correctly written
> >> visitor will return the same distance labels as the sequential
> >> implementation, but may not explore the vertices/edges in the same
> >> order.
> >
> > Ok, that's interesting to know. IMHO, someone who simply wants to speed
> > up the computation of a BFS by using a parallelised version does
> > generally not care about the order of exploration but is interested only
> > in the result, like the distance labels, and for such a scenario that
> > should be equal to the sequential case. Good to know that this is in
> > fact the case.
> > However, this brings me to question, why the parallel BFS boost-example
> > then returns different distance labels, depending on the number of
> > involved processors? Or is this just a misconception of the output?
>
> Sorry it took me a second to get to this... The BFS example seems to have
> at least one major issue, the distance labels returned appear to be
> completely non-deterministic because they are assigned on tree_edge()
> events which may occur simultaneously on multiple ranks and may continue
> to occur after a vertex has already been discovered by the owning rank.
> The result of these assignments is determined by the reduction operation
> on the distance property map. The default reduction operation is to allow
> an arbitrary write to succeed which will result in incorrect distance
> labels when more than one process is involved. I have no idea how long
> this example has been this way, but judging by the fact that it still uses
> internal properties I'd hazard to guess a very long time. I'll take the
> blame for not catching that on the original release. The quick fix for
> this example would be setting the reduction operation on the distance map
> to choose the minimum distance label. I'll mak e that change and check it
> into the trunk.

Glad that you could make it again. Your advice is definitely needed here, so
thank you.

Do I get you correct on this one, that in the PBGL BFS example, the
  bfs_discovery_visitor(DistanceMap distance) : distance(distance)
  {
    set_property_map_role(vertex_distance, distance);
  }
is actually wrong?

Should the correct way be like in
http://osl.iu.edu/research/pbgl/documentation/distributed_property_map.html
with the "struct rank_accumulate_reducer" where the accumulated value is used?

>
> That approach should work, but there's also a more subtle issue. The
> correctness of this approach relies on channel ordering semantics, i.e.
> that remote operations are performed in the same order as they were
> generated by the receiver. In this case that means that on the target
> rank, the put() in the distance map has to happen before the corresponding
> vertex is removed from the queue, otherwise when its out-edges are
> explored its distance label will be incorrect and thus so will those
> assigned to the targets of its out edges. These data structures are
> independent, as are the sets of remote operations performed on them (i.e.
> the puts() to the property maps and the queue operations). The existing
> communication layer happens to enforce sequential ordering across all
> messages, but this is really just an implementation detail. Furthermore,
> this works as long as there is only one thread per process, but trying to
> parallelize these operations within a process would require brea king the
> underlying channel ordering semantics and result in incorrect behavior.
>
> This is a fundamental issue with the BSP-style programming model of the
> original PBGL. Control flow (via the queue in this case) and data
> dependencies (via property maps) are disjoint, and rely on fragile
> ordering semantics to work properly.
>
> To summarize, mea culpa on not catching this a few versions back and I'll
> check in a more correct version.
>

Great. Thank you again.

> Cheers,
> Nick
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users

Best,

Cedric

Next message: Cedric Laczny: "Re: [Boost-users] distributed breadth first search doesn't work properly?"
Previous message: Vitaly Budovski: "[Boost-users] [Spirit] Generating from struct containing single std::vector member"
In reply to: Nick Edmonds: "Re: [Boost-users] [PBGL] Results of distributed BFS on multiple processors vs. on a single processor"

Date view	Thread view	Subject view	Author view

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net