|
Boost : |
Subject: Re: [boost] space complexity of push_relabel_max_flow
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2010-04-13 22:31:17
On Tue, 13 Apr 2010, Dan Jiang wrote:
>
> Thanks for the reply. Please see my reply inline.
>
>> Date: Tue, 13 Apr 2010 12:53:45 -0400
>> From: jewillco_at_[hidden]
>> To: boost_at_[hidden]
>> Subject: Re: [boost] space complexity of push_relabel_max_flow
>>
>> On Tue, 13 Apr 2010, Dan Jiang wrote:
>>
>>> Hi All, I am a new BOOST user and like it very much. I have a question
>>> regarding space complexity of push_relabel_max_flow. The
>>> push_relabel_max_flow is very efficient compared to Edmond's and Karp.
>>> In my application, I need to deal with half a million nodes and half a
>>> billion arcs. It requires lots of ram (>= 6gb). Two questions:
>>
>>> 1) How do I find out space used per node and per arc?
>>
>> I do not know off hand for adjacency_list; it is probably reasonable (one
>> integer per edge, some per vertex, plus properties). However, remember
>> that dynamically allocated vectors can have up to a 100% space overhead.
>> I would recommend using compressed_sparse_row_graph and then changing the
>> EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of
>> each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus
>> properties; accesses will also likely be faster than for adjacency_list.
>
> I'll try that later on.
OK. I saw your other email about that -- I'll answer it soon.
>>> 2) My network does not need any reverse edge (currently I set the
>>> capacity of reverse edge to 0), is there anyway to not supply reverse
>>> edges to save space?
>>
>> It does not appear so, because of the residual capacities that need to be
>> stored. If you can find an algorithm description that doesn't need
>> reverse edges, though, it might be possible to change the algorithm to
>> remove them.
>
> I just read the push_relabel_max_flow algorithm, it requires reverse edge list and their capacities.
It actually doesn't need the edges AFAICT. It just uses them to look
elements up in property maps, so it might be possible to work around that
requirement. However, there are other memory uses in the algorithm that
are likely to be bigger.
>>> 3) Does push_relabel_algorithm dynamically allocates lots of memory
>>> (besides the ram required by the input graph, i.e., adjacency_list)?
>>
>> It looks like there is a bunch of that. I see six vectors and a queue in
>> there, and one of those vectors is a vector of std::lists.
>>
>>> 4) Is it possible to implement custom file-based edge lists so that I
>>> could explicitly page to disk when not enough ram is available? How
>>> slower would it be?
>>
>> It would probably be quite a bit slower. If you need to go beyond your
>> system's memory, why not let the OS's swap mechanism do the disk accesses?
>> That would take care of all of the caching and such that you would need,
>> and I'm not sure writing your own would provide much benefit.
> I tried OS paging on 64-bit OS using my application. The performance
> decreases from 100 to 1000times and plus the paging happens only when
> available physical ram runs out. I'd like to be able to do partial
> paging on demand. I am looking into using STLdb to hold capacity
> properties, one for forward edge, another for reverse edge. But saving
> is just 2*4*NumOfArcs (I am using float for capacity), which might not
> be dramatic saving...
I'm surprised OS paging was so bad, but that can happen and the algorithm
doesn't seem to use a lot of locality between related data structures.
The 2*4*nedges thing is probably not worth doing too much work to get yet.
>> How much memory do you have, and what type are you using for your edge
>> capacities? I have some ideas for how to shrink things, but I would first
>> like to see a baseline of how bad things are now. If you switch to
>> compressed_sparse_row_graph and try to run the algorithm (possibly on
>> scaled-down graphs), how much memory does it use? Running that for a few
>> data sizes might give an idea of how much actual space per vertex and edge
>> the algorithm uses.
> I have 4gb physical ram, usually with 3.2gb free. I use float for edge
> capacity. I'll look into compressed_sparse_row_graph and see if there is
> much saving.
It will help you with the graph structure information and maybe the
properties. Let's see what the actual usage is now and how that compares
to what you have space for.
>> BTW, is there any regular structure in your graph? For example, is it a
>> grid graph like some computer vision algorithms use?
>
> My application is for mining. I construct a network from voxel model
> which is typically of size 200 * 200 * 125. For each "chosen" voxel (a
> voxel deemed to be used as a node in a graph), I create about 100 arcs
> to other relevant voxels. I my test case, I have 0.5million voxels and
> hence 500mil arcs. I need to find max flow from my source voxel to
> destination voxel. If I use STLdb, I can decide at run time when to page
> all property maps used in the push_relabel_max_flow algorithm. vertex
> and edge lists are not to be paged as STDdb supports only maps.
How do you define "relevant"? Are they local in terms of the grid of
voxels? Is there any regular structure at all, such as linking to all
neighbors within a certain distance that have a certain weight or
something? What are you able to say about the conditions for which edges
are put in? Can any of the data used to create edges be reused for
computing capacities?
-- Jeremiah Willcock
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk