Subject: Re: [Boost-users] Using BGL for huge grid graphs
From: Nicholas Edmonds (ngedmond_at_[hidden])
Date: 2014-01-21 03:40:08


Hi Knut,

For the shortest-paths algorithm itself you’re probably looking at minutes, most likely a small number of them. The diameter of your graph, together with the algorithm parameters that control how much work is done in each phase, determines the number of supersteps required, and that count plays a large role in the overall runtime.
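
For reference, a minimal PBGL invocation looks roughly like the sketch below. This is a toy example resting on my own assumptions (a hand-written three-edge list, illustrative names), not a real run; a real job would generate the grid edges on each rank. The work-per-phase knob mentioned above corresponds to, for example, the delta parameter of delta_stepping_shortest_paths or the lookahead of the eager Dijkstra variant.

    #include <boost/graph/use_mpi.hpp>
    #include <boost/mpi.hpp>
    #include <boost/graph/distributed/mpi_process_group.hpp>
    #include <boost/graph/distributed/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <boost/property_map/property_map.hpp>
    #include <utility>
    #include <vector>

    typedef boost::graph::distributed::mpi_process_group ProcessGroup;
    typedef boost::adjacency_list<
        boost::vecS, boost::distributedS<ProcessGroup, boost::vecS>,
        boost::undirectedS, boost::no_property,
        boost::property<boost::edge_weight_t, double> > Graph;

    int main(int argc, char* argv[]) {
      boost::mpi::environment env(argc, argv);

      // Toy input; a real job would generate the grid edges per rank.
      typedef std::pair<std::size_t, std::size_t> E;
      E edge_list[] = { E(0, 1), E(1, 2), E(2, 3) };
      double weights[] = { 1.0, 2.0, 1.0 };
      Graph g(edge_list, edge_list + 3, weights, 4);

      // One distance slot per local vertex; the distributed property
      // map machinery forwards updates to remote owners.
      std::vector<double> distance(boost::num_vertices(g));
      boost::dijkstra_shortest_paths(
          g, boost::vertex(0, g),
          boost::distance_map(boost::make_iterator_property_map(
              distance.begin(), boost::get(boost::vertex_index, g))));
      return 0;
    }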

In my experience it takes longer to import the data and build the graph than to actually run a single instance of an algorithm; the performance of your distributed filesystem plays a big role here. Also, the CSR graph has special “in-place” constructors that let you build the graph without duplicating the edge list, which doubles the amount of graph data you can store on each node. Those constructors are probably worth looking into.
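
A minimal sketch of that in-place construction (the vertex count and the tiny edge arrays are made-up placeholders):

    #include <boost/graph/compressed_sparse_row_graph.hpp>
    #include <vector>

    typedef boost::compressed_sparse_row_graph<boost::directedS> Graph;

    int main() {
      // Source/target arrays as read from your input files. The in-place
      // constructor sorts these vectors directly instead of copying them,
      // so the edge list is never held in memory twice.
      std::vector<std::size_t> sources;
      std::vector<std::size_t> targets;
      sources.push_back(0); targets.push_back(1);
      sources.push_back(1); targets.push_back(2);

      std::size_t n = 3;  // total number of vertices
      Graph g(boost::construct_inplace_from_sources_and_targets,
              sources, targets, n);

      // The input vectors are left in an unspecified (permuted) state.
      return 0;
    }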

-Nick

On Jan 21, 2014, at 12:12 AM, Knut Krause <knut.krause_at_[hidden]> wrote:

> Hi Nick,
>
> thanks for your reply. Could you give a guess at how long path finding would
> take on your suggested setup? It doesn't have to be precise, just something
> like seconds, minutes, hours, days.
>
> Knut
>
> On Tuesday, 21 January 2014, 00:06:23, Nicholas Edmonds wrote:
>> On Jan 15, 2014, at 3:10 AM, Knut Krause <knut.krause_at_[hidden]> wrote:
>>> Hi,
>>>
>>> I'm planning on using the BGL on huge graphs that represent a grid where
>>> each vertex is a cell connected to its 8 neighbours. I think something
>>> around 2,500,000,000 vertices would be a good number to approach but it
>>> would be great if it could handle (a lot) more later.
>>>
>>> Given this graph, I have to find the shortest path between two vertices.
>>>
>>> Does anyone here have experiences with graphs that large? Is this even
>>> worth a try on "normal" hardware?
>>>
>>> The edge weights would be stored in the vertices. So if you want to go
>>> from (a) to (b) you'd have to pay the cost stored in (b). The graph is
>>> undirected.
>>>
>>> I read about the BGL's CSR graph
>>> (http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/compressed_sparse_row.html)
>>> but since I honestly have no clue how to use the BGL yet I thought I'd
>>> first hear your opinions.
>>
>> I know you asked about the sequential BGL, but as a data point, I’ve run the
>> Parallel BGL on graphs of this scale. ~100 (quite old) cluster nodes with
>> 4G each or 16-32 16G nodes, connected via a low-latency interconnect (e.g.,
>> not Ethernet/TCP), were sufficient depending on the algorithm. Strong
>> scaling improvements are certainly possible up to a point if you have more
>> machines available. The large surface-to-volume ratio of grid graphs
>> should minimize contention, though in the case of shortest paths the
>> distribution of your edge weights matters quite a bit as well. CSR is
>> essential for memory efficiency, and it is straightforward to substitute
>> for an adjacency list.
>>
>> I’m not sure what qualifies as “normal” hardware, but if a modest cluster
>> with Myrinet/InfiniBand is reasonable, then graphs of the scale you mention
>> are certainly reasonable. I suppose it’s possible to run the PBGL in one
>> of the public cloud environments, but the networking performance tends to
>> be so abysmal with most of them that I would expect significantly increased
>> runtimes.
>>
>> Hope that data point is helpful.
>>
>> Regards,
>> Nick
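
As an aside on the weighting scheme described in the quoted message: those per-vertex costs never need to be copied onto the edges. A read-only function property map that returns the cost of an edge's target vertex can be handed straight to Dijkstra as the weight map. A minimal sequential sketch, with a toy graph and made-up costs:

    #include <boost/graph/compressed_sparse_row_graph.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <boost/property_map/function_property_map.hpp>
    #include <boost/property_map/property_map.hpp>
    #include <utility>
    #include <vector>

    typedef boost::compressed_sparse_row_graph<boost::directedS> Graph;
    typedef boost::graph_traits<Graph>::edge_descriptor Edge;

    int main() {
      // The grid is undirected, so each edge is stored once per
      // direction in the directed CSR structure.
      std::pair<std::size_t, std::size_t> edge_list[] = {
          {0, 1}, {1, 0}, {1, 2}, {2, 1}};
      Graph g(boost::edges_are_unsorted_multi_pass,
              edge_list, edge_list + 4, 3);

      std::vector<double> cost = {1.0, 5.0, 2.0};  // per-vertex entry cost

      // Weight of an edge = cost stored at its target vertex, so
      // traversing (a)->(b) pays the cost attached to (b).
      auto weight =
          boost::make_function_property_map<Edge, double>(
              [&](Edge e) { return cost[boost::target(e, g)]; });

      std::vector<double> distance(boost::num_vertices(g));
      boost::dijkstra_shortest_paths(
          g, boost::vertex(0, g),
          boost::weight_map(weight).distance_map(
              boost::make_iterator_property_map(
                  distance.begin(), boost::get(boost::vertex_index, g))));
      return 0;
    }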

