Subject: Re: [boost] space complexity of push_relabel_max_flow
From: Dan Jiang (danjiang_at_[hidden])
Date: 2010-04-13 15:37:37
Thanks for the reply. Please see my reply inline.
> Date: Tue, 13 Apr 2010 12:53:45 -0400
> From: jewillco_at_[hidden]
> To: boost_at_[hidden]
> Subject: Re: [boost] space complexity of push_relabel_max_flow
> On Tue, 13 Apr 2010, Dan Jiang wrote:
> > Hi All, I am a new BOOST user and like it very much. I have a question
> > regarding space complexity of push_relabel_max_flow. The
> > push_relabel_max_flow is very efficient compared to Edmond's and Karp.
> > In my application, I need to deal with half a million nodes and half a
> > billion arcs. It requires lots of ram (>= 6gb). Two questions:
> > 1) How do I find out space used per node and per arc?
> I do not know off hand for adjacency_list; it is probably reasonable (one
> integer per edge, some per vertex, plus properties). However, remember
> that dynamically allocated vectors can have up to a 100% space overhead.
> I would recommend using compressed_sparse_row_graph and then changing the
> EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of
> each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus
> properties; accesses will also likely be faster than for adjacency_list.
I'll try that later on.
> > 2) My network does not need any reverse edge (currently I set the
> > capacity of reverse edge to 0), is there anyway to not supply reverse
> > edges to save space?
> It does not appear so, because of the residual capacities that need to be
> stored. If you can find an algorithm description that doesn't need
> reverse edges, though, it might be possible to change the algorithm to
> remove them.
I just read the push_relabel_max_flow algorithm, it requires reverse edge list and their capacities.
> > 3) Does push_relabel_algorithm dynamically allocates lots of memory
> > (besides the ram required by the input graph, i.e., adjacency_list)?
> It looks like there is a bunch of that. I see six vectors and a queue in
> there, and one of those vectors is a vector of std::lists.
> > 4) Is it possible to implement custom file-based edge lists so that I
> > could explicitly page to disk when not enough ram is available? How
> > slower would it be?
> It would probably be quite a bit slower. If you need to go beyond your
> system's memory, why not let the OS's swap mechanism do the disk accesses?
> That would take care of all of the caching and such that you would need,
> and I'm not sure writing your own would provide much benefit.
I tried OS paging on 64-bit OS using my application. The performance decreases from 100 to 1000times and plus the paging happens only when available physical ram runs out. I'd like to be able to do partial paging on demand. I am looking into using STLdb to hold capacity properties, one for forward edge, another for reverse edge. But saving is just 2*4*NumOfArcs (I am using float for capacity), which might not be dramatic saving...
> How much memory do you have, and what type are you using for your edge
> capacities? I have some ideas for how to shrink things, but I would first
> like to see a baseline of how bad things are now. If you switch to
> compressed_sparse_row_graph and try to run the algorithm (possibly on
> scaled-down graphs), how much memory does it use? Running that for a few
> data sizes might give an idea of how much actual space per vertex and edge
> the algorithm uses.
I have 4gb physical ram, usually with 3.2gb free. I use float for edge capacity. I'll look into compressed_sparse_row_graph and see if there is much saving.
> BTW, is there any regular structure in your graph? For example, is it a
> grid graph like some computer vision algorithms use?
My application is for mining. I construct a network from voxel model which is typically of size 200 * 200 * 125. For each "chosen" voxel (a voxel deemed to be used as a node in a graph), I create about 100 arcs to other relevant voxels. I my test case, I have 0.5million voxels and hence 500mil arcs. I need to find max flow from my source voxel to destination voxel.
If I use STLdb, I can decide at run time when to page all property maps used in the push_relabel_max_flow algorithm. vertex and edge lists are not to be paged as STDdb supports only maps.
-Your insights would be greatly appreciated.
Videos that have everyone talking! Now also in HD!
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk