Boost logo

Boost :

Subject: Re: [boost] [BGL] Trying to get a correclty working Parallel BFS code
From: Sandeep Gupta (gupta.sandeep_at_[hidden])
Date: 2009-11-04 18:11:57


On Wed, Nov 4, 2009 at 12:18 PM, Nick Edmonds <ngedmond_at_[hidden]>wrote:

>
> On Nov 3, 2009, at 10:44 PM, Sandeep Gupta wrote:
>
> On Mon, Nov 2, 2009 at 12:43 PM, Nick Edmonds <ngedmond_at_[hidden]
>> >wrote:
>>
>>
>>> On Oct 31, 2009, at 1:00 AM, Sandeep Gupta wrote:
>>>
>>> On Fri, Oct 30, 2009 at 6:41 AM, Jeremiah Willcock <jewillco_at_[hidden]
>>>
>>>> wrote:
>>>>>
>>>>
>>>> On Thu, 29 Oct 2009, Sandeep Gupta wrote:
>>>>
>>>>>
>>>>> On Thu, Oct 29, 2009 at 1:12 PM, Jeremiah Willcock <
>>>>> jewillco_at_[hidden]
>>>>>
>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>> On Thu, 29 Oct 2009, Sandeep Gupta wrote:
>>>>>>
>>>>>>
>>>>>>> On Wed, Oct 28, 2009 at 5:49 PM, Sandeep Gupta <
>>>>>>> gupta.sandeep_at_[hidden]
>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2009 at 9:31 PM, Jeremiah Willcock <
>>>>>>>>
>>>>>>>>> jewillco_at_[hidden]
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 27 Oct 2009, Sandeep Gupta wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The error snippet regarding predecessor map:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> error: no match for call to
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>)
>>>>>>>>>>>> (const boost::detail::parallel::global_descriptor<unsigned
>>>>>>>>>>>> int>&)'
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ../../../boost/property_map/parallel/impl/distributed_property_map.ipp:141:
>>>>>>>>>>>> error: no match for call to
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>)
>>>>>>>>>>>> (const boost::detail::parallel::global_descriptor<unsigned
>>>>>>>>>>>> int>&,
>>>>>>>>>>>> boost::detail::error_property_not_found&, const
>>>>>>>>>>>> boost::detail::error_property_not_found&)'
>>>>>>>>>>>> ../../../boost/graph/parallel/properties.hpp:95: note:
>>>>>>>>>>>> candidates
>>>>>>>>>>>> are:
>>>>>>>>>>>> T
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T)
>>>>>>>>>>>> const [with T = boost::detail::error_property_not_found]
>>>>>>>>>>>> ../../../boost/graph/parallel/properties.hpp:96: note:
>>>>>>>>>>>> T
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T,
>>>>>>>>>>>> T, T) const [with T = boost::detail::error_property_not_found]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Jeremiah, Do you think that I should file a bug report for this.
>>>>>>>>>>>>
>>>>>>>>>>>> Although I was hoping (and actually needed quite urgently) that
>>>>>>>>>>>> it
>>>>>>>>>>>>
>>>>>>>>>>>> would
>>>>>>>>>>>>
>>>>>>>>>>> be
>>>>>>>>>>> minor issue and get fixed quickly.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You need to pass a predecessor map to the algorithm as one of the
>>>>>>>>>>>
>>>>>>>>>>> parameters
>>>>>>>>>> (or put it as a part of your graph but making an external property
>>>>>>>>>> map
>>>>>>>>>> to
>>>>>>>>>> give
>>>>>>>>>> to the algorithm is easier).
>>>>>>>>>>
>>>>>>>>>> Hi Jeremy,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It took me a while but I finally figure out how to pass the
>>>>>>>>>>
>>>>>>>>> predecessor
>>>>>>>>>
>>>>>>>>> map. Unfortunately it doesn't have any effect. Also, I might be
>>>>>>>>>
>>>>>>>>>> wrong
>>>>>>>>>> but I
>>>>>>>>>> don't see any logical reason why predecessor map should have any
>>>>>>>>>> bearing
>>>>>>>>>> on
>>>>>>>>>> the correctness of the depth. I have attached the new code. I am
>>>>>>>>>> not
>>>>>>>>>> able
>>>>>>>>>> to print out the predecessor because I am not able to figure out
>>>>>>>>>> its
>>>>>>>>>> type.
>>>>>>>>>> Maybe you could help me resolve this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It would be nice to have this code running. I need to profile
>>>>>>>>>> graph
>>>>>>>>>>
>>>>>>>>>> performance on a new machine by tomorrow. Again, thanks for you
>>>>>>>>> patience
>>>>>>>>> and
>>>>>>>>> time. I really appreciate you looking into this.
>>>>>>>>>
>>>>>>>>> Hi Jeremiah,
>>>>>>>>>
>>>>>>>>> Was just wondering if you had time to look into this or any
>>>>>>>>> suggestion
>>>>>>>>>
>>>>>>>>> on
>>>>>>>>>
>>>>>>>> how further proceed. I can get you the output of predecessor map if
>>>>>>>> that
>>>>>>>> would help.
>>>>>>>> Just that I haven't be been able to figure out what the is type of
>>>>>>>> the
>>>>>>>> property map returned by make_distributed_property_map.
>>>>>>>> Please let me know your thoughts.
>>>>>>>>
>>>>>>>>
>>>>>>>> The person who really knows this library is not around as far as I
>>>>>>>>
>>>>>>> know;
>>>>>>> he
>>>>>>> might be on vacation. The type returned from
>>>>>>> make_distributed_property_map
>>>>>>> is documented on <URL:
>>>>>>> http://www.osl.iu.edu/research/pbgl/documentation/graph/
>>>>>>> distributed_property_map.html>. One thing that you could do to debug
>>>>>>> things is to put in your own color map and look at what values it
>>>>>>> ends
>>>>>>> up
>>>>>>> with at the end of the algorithm, and possibly even to put in a local
>>>>>>> color
>>>>>>> map that prints out when elements are changed. That might be a lot
>>>>>>> of
>>>>>>> work
>>>>>>> for what could end up being an obvious problem (that I just don't
>>>>>>> know
>>>>>>> how
>>>>>>> to diagnose), though. BTW, does the example in
>>>>>>> libs/graph_parallel/example/breadth_first_search.cpp work for you?
>>>>>>> It
>>>>>>> appears to be doing the same thing you want to do. If that works, it
>>>>>>> could
>>>>>>> be a problem in your visitor.
>>>>>>>
>>>>>>>
>>>>>>> I derived this code from
>>>>>>> graph_parallel/example/breadth_first_search.cpp.
>>>>>>>
>>>>>>> The problem is that I don't understand the input and the output of
>>>>>>> the
>>>>>>>
>>>>>> example code. It would be great if I could get an explanation for the
>>>>>> format of the input file. Then I can transform my graph into that
>>>>>> format
>>>>>> and pass to the algorithm. That would be enough for my current
>>>>>> purpose.
>>>>>>
>>>>>> As time permits I would try out your suggestions and let know of
>>>>>> update.
>>>>>> In
>>>>>> the meantime I am hoping that I would get input from other relevant
>>>>>> authors.
>>>>>>
>>>>>>
>>>>>> This is what I could tell from the code.
>>>>>
>>>>> Input:
>>>>> http://people.sc.fsu.edu/~burkardt/data/metis_graph/>
>>>>> <
http://people.sc.fsu.edu/%7Eburkardt/data/metis_graph/>
>>>>>
>>>>> metis_graph.html<
>>>>> http://people.sc.fsu.edu/%7Eburkardt/data/metis_graph/metis_graph.html
>>>>> >
>>>>>
>>>>>
>>>>> Output:
>>>>> http://www.graphviz.org/doc/info/lang.html
>>>>>
>>>>>
>>>>> Thanks Jeremiah. The link was helpful. I understood the input graph
>>>>> format.
>>>>> Its for undirected graph only. However the BFS output is still coming
>>>>> out
>>>>> to
>>>>> be incorrect. I tried with a sample line graph of 4 nodes and it
>>>>> exhibits the same problem as before.
>>>>>
>>>>> The distances of nodes on process 2 is not getting updated. On the
>>>>> other
>>>>>
>>>> hand dijkstra shortest paths example is working correctly.
>>>>
>>>> Thanks
>>>> Sandeep
>>>>
>>>>
>>> I rewrote most of your code before I realized that you're outputting all
>>> your distances on process 0, even for vertices process 0 doesn't own.
>>> The
>>> large value you see is std::numeric_limits<distance_type>::max. When you
>>> call get(distance, vertex(i, g)) for some i not owned by the current
>>> process
>>> a ghost cell is created for the requested value and a request sent to the
>>> owner of that vertex for the correct value. get() then returns whatever
>>> the
>>> default value is for the distance property map, which in this case is
>>> infinity (i.e. std::numeric_limits<distance_type>::max.
>>>
>>> In some cases you may get the correct distance value because process 0
>>> has
>>> previously requested that value (and it may or may not be up to date).
>>> If
>>> process 0 has never requested that distance value and it doesn't own the
>>> vertex, then you'll get infinity.
>>>
>>> You need to gather all your data before you output it from a single node
>>> (i.e. issue a get() or request() which is a get with no return value, for
>>> each data element). If you want your code to be scalable you should
>>> output
>>> your data in a distributed fashion from all nodes at once. Remember that
>>> updated values won't be guaranteed to arrive until the next
>>> synchronization
>>> step.
>>>
>>> If you want the modified version of your code let me know and I'll finish
>>> it and ship it your way.
>>>
>>> -Nick
>>>
>>>
>>> Hi Nick,
>>>
>>> Thanks so much for looking into this. Sorry, I couldn't get to it
>> earlier
>> due to other concerns. I corrected the mistake you mentioned and I don't
>> get
>> numeric_limit<>::max() for any nodes anymore.
>>
>> I believe the graph/distributed/graphviz.hpp was outputting the distance
>> in
>> correct fashion.
>>
>> Unfortunately, the output is still coming out to be incorrect.
>> It was also the case with output from graphviz.hpp.
>>
>> For the test graph that we mentioned before, essentially
>> 0 --> 1 --> 2 --> 3,
>> 0 --> 4 --> 3
>>
>> the distance output is as follows:
>> global node-id : distance
>> 0 : 0
>> 1 : 1
>> 2 : 2
>> 3 : 1 //(wrong: should be 2)
>> 4 : 0 // should be 1
>>
>> If possible can you send me your version of the code. Let me know if you
>> need mine (the updated version).
>>
>
>
> Given the description of your problem above, and assuming you're using a
> block distribution with 8 vertices, it sounds like you're initializing the
> distance to the first vertex on *each* process to 0, as opposed to only the
> distance to the source. The vertex with global index 4 is actually local
> index 0 on process one using the above distribution. If you initialize the
> distances to the vertices with global indices 0 and 4 to 0, then your
> results make sense. Let me know if that's not the case, but I surmise that
> it is.
>
> Very much so :-). Thanks for getting me started with by catching these
minor but fatal mis-understandings with PBFS code. I get the feeling the I
haven't read some vital documentation, apart from the manual, on regarding
distributed graph.

Also there's a problem in your depth labeling visitor. In the case of
> cross-processor edges the process that owns the target of the edge doesn't
> necessarily have the distance to the source of the edge available locally.
> This will lead to incorrect distance labels even though tree and non-tree
> edges will still be correctly recognized since the BFS is
> level-synchronized. The fastest way to handle this is to modify the BFS
> queue to also pass distance data (which is how the shortest paths algorithms
> do it). You could also send the data separately and probe for it on the
> receiving side. Check out the distributed betweenness centrality code for
> how to create a distributed queue which contains tuples of data one element
> of which is a vertex (in order to know where to route the data).
> Integrating this type of vertex queue (i.e. one that contains a data
> payload) with BFS basically just requires handling the data payload on the
> receiving side.
>
> I would try this out. So basically the default call

 boost::breadth_first_search
     (g, start,
boost::visitor(boost::make_bfs_visitor(boost::record_distances(distance,
boost::on_tree_edge()))));

wouldn't work. I assumed that distributed BFS implementation did exactly
what you mentioned.
Although I would write my own, parallel visitor as well per your
suggestions.

thanks
Sandeep

> Cheers,
>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk