Boost logo

Boost Users :

Subject: Re: [Boost-users] CSR graph and Dijkstra's to lower graph memory
From: Jeremy Kawahara (jeremy.kawahara_at_[hidden])
Date: 2012-10-08 18:43:54


Hello,

On Mon, Oct 8, 2012 at 1:55 PM, Jeremiah Willcock <jewillco_at_[hidden]>wrote:

> On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
>
> It seems to me that we need to store the following arrays: (here I assume
>> that the total number of vertices is less than 2^32 (4 bytes) but the total
>> number
>> of edges will likely be greater than 2^32 (8 bytes).
>>
>> An "edge array" that stores the vertex indices.
>> 2 vertices per edge at 4 bytes each
>>
>
> Usually (for the directed case where incoming edges aren't stored) there
> is only one vertex (the target) stored per edge.
>

I see. I was working from Leo Hidd's earlier example where we create a list
of edges...
*ptr_edge_list++ = Edge(i_neigh, i);

and from the CSR
documentation<http://www.boost.org/doc/libs/1_51_0/libs/graph/example/csr-example.cpp>
:
E the_edges[] = { E(0, 1), E(0, 2), ...

I'm not quite sure how to change the edge list to store only the target
vertex but I'll look into this some more as this would result in quite a
bit more space savings.

> An "edge cost" array the stores the cost at each edge.
>> 1 edge cost per edge at 4 bytes (say we use a float) each
>>
>
> Yes.
>
> These arguments are than passed to the CSR constructor. I THINK the CSR
>> then sorts and copies these values into an internal structure (I'm basing
>> this off
>> the documentation which states that "The CSR format stores vertices and
>> edges in separate arrays" - but perhaps I misunderstood it).
>>
>
> The CSR graph takes roughly (nvertices + 1) * sizeof(EdgeIndex) + nedges *
> sizeof(VertexIndex) bytes for a directed graph, excluding properties.
>
> So I think we need to have enough memory to store the "edge array", the
>> "edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes +
>> the_number_of_nodes x 8 bytes) all at the same time.
>>
>> Is my understanding of this correct?
>>
>
> Yes, except that you need another copy of the edge properties that are
> sorted along with the edges.
>

Right, so I think this would be the memory needed to store the "edge cost
array", which would equal the_number_of_edges x 4 bytes.

>
> If so, I'm wondering if there is a way we can somehow lower this memory
>> usage where we only store these structures once? I looked at using the
>> "edges_are_unsorted_multi_**pass" parameter but the memory differences
>> did not seem to make a significant difference.
>>
>
> There is an in-place version (construct_inplace_from_**sources_and_targets_t)
> that saves memory by requiring particular input layouts, but it is also
> slower. You should always use the _multi_pass version when you can since
> the other one does a copy of the data first.
>

Alright thanks! The construct_inplace_from _sources_and_targets_t sounds
like it might do the trick.

> How much more memory are you looking to save? Are you hitting an
> out-of-memory condition on graphs above a certain size?
>

Yes on some of my graphs I'm running out of memory. I'm really looking at
getting it down to as low as possible since the lower I get it the more
data I can run my graphs on. For example, one graph this crashed on had:

37,044,000 nodes
6,328,936,480 edges

I thought that if I can use the same memory to load the data and create the
CSR, then I would approximately half my memory requirements. I think if I
find this not be enough savings, then I am probably hitting some
theoretical limits and would have to look at taking a different approach.

For the reference of others, I added the last "unsigned int" parameter to
>> the call below since I expect the number of nodes to be less than 2^32 but
>> the
>> number of edges will likely be more.
>>
>> typedef compressed_sparse_row_graph<**directedS, void,
>> Edge_Cost,boost::no_property, unsigned int> graph_t;
>>
>
> Yes, that is how you want to do it in that case. You might want to use
> uint32_t instead to get more control over the exact data size.
>

Alright thanks!
- Jeremy



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net