Hello,
On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
It seems to me that we need to store the following arrays: (here I assume that the total number of vertices is less than 2^32 (4 bytes) but the total number
of edges will likely be greater than 2^32 (8 bytes).
An "edge array" that stores the vertex indices.
2 vertices per edge at 4 bytes each
Usually (for the directed case where incoming edges aren't stored) there is only one vertex (the target) stored per edge.
An "edge cost" array the stores the cost at each edge.
1 edge cost per edge at 4 bytes (say we use a float) each
Yes.
These arguments are than passed to the CSR constructor. I THINK the CSR then sorts and copies these values into an internal structure (I'm basing this off
the documentation which states that "The CSR format stores vertices and edges in separate arrays" - but perhaps I misunderstood it).
The CSR graph takes roughly (nvertices + 1) * sizeof(EdgeIndex) + nedges * sizeof(VertexIndex) bytes for a directed graph, excluding properties.
So I think we need to have enough memory to store the "edge array", the "edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes +
the_number_of_nodes x 8 bytes) all at the same time.
Is my understanding of this correct?
Yes, except that you need another copy of the edge properties that are sorted along with the edges.
If so, I'm wondering if there is a way we can somehow lower this memory usage where we only store these structures once? I looked at using the
"edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a significant difference.
There is an in-place version (construct_inplace_from_sources_and_targets_t) that saves memory by requiring particular input layouts, but it is also slower. You should always use the _multi_pass version when you can since the other one does a copy of the data first.
How much more memory are you looking to save? Are you hitting an out-of-memory condition on graphs above a certain size?
For the reference of others, I added the last "unsigned int" parameter to the call below since I expect the number of nodes to be less than 2^32 but the
number of edges will likely be more.
typedef compressed_sparse_row_graph<directedS, void, Edge_Cost,boost::no_property, unsigned int> graph_t;
Yes, that is how you want to do it in that case. You might want to use uint32_t instead to get more control over the exact data size.