Boost logo

Boost Users :

Subject: Re: [Boost-users] CSR graph and Dijkstra's to lower graph memory
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2012-10-08 18:53:55


On Mon, 8 Oct 2012, Jeremy Kawahara wrote:

> Hello,
>
> On Mon, Oct 8, 2012 at 1:55 PM, Jeremiah Willcock <jewillco_at_[hidden]> wrote:
> On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
>
> It seems to me that we need to store the following arrays: (here I assume that the total
> number of vertices is less than 2^32 (4 bytes) but the total number
> of edges will likely be greater than 2^32 (8 bytes).
>
> An "edge array" that stores the vertex indices.
> 2 vertices per edge at 4 bytes each
>
>
> Usually (for the directed case where incoming edges aren't stored) there is only one vertex (the target)
> stored per edge.
>
>
>
> I see. I was working from Leo Hidd's earlier example where we create a list of edges...
> *ptr_edge_list++        = Edge(i_neigh, i);
>
> and from the CSR documentation:
> E the_edges[] = { E(0, 1), E(0, 2), ...
>  
> I'm not quite sure how to change the edge list to store only the target vertex but I'll look into this some more as
> this would result in quite a bit more space savings.

I meant that the CSR graph representation stores only targets explicitly,
not that your input data should do that.

> Right, so I think this would be the memory needed to store the "edge cost array", which would equal
> the_number_of_edges x 4 bytes.

Yes.

> If so, I'm wondering if there is a way we can somehow lower this memory usage where we only
> store these structures once? I looked at using the
> "edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a
> significant difference.
>
>
> There is an in-place version (construct_inplace_from_sources_and_targets_t) that saves memory by
> requiring particular input layouts, but it is also slower.  You should always use the _multi_pass
> version when you can since the other one does a copy of the data first.
>
>
>
> Alright thanks! The construct_inplace_from _sources_and_targets_t sounds like it might do the trick.
>
>
> How much more memory are you looking to save?  Are you hitting an out-of-memory condition on graphs
> above a certain size?
>
>
> Yes on some of my graphs I'm running out of memory. I'm really looking at getting it down to as low as possible
> since the lower I get it the more data I can run my graphs on. For example, one graph this crashed on had:
>
> 37,044,000 nodes
> 6,328,936,480 edges
>
> I thought that if I can use the same memory to load the data and create the CSR, then I would approximately half my
> memory requirements. I think if I find this not be enough savings, then I am probably hitting some theoretical
> limits and would have to look at taking a different approach.

How are you generating/loading the data? Can you get the data in sorted
order by source? The best you can do without pre-sorting the data is 12
bytes per edge + 8 bytes per vertex (for in-place construction, you need
the lists of sources, targets, and edge properties, the second of which
will be destroyed after graph construction, plus the temporary array of
edge indices created by the CSR construction algorithm).

-- Jeremiah Willcock


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net