Boost logo

Boost Users :

Subject: Re: [Boost-users] CSR graph and Dijkstra's to lower graph memory
From: Jeremiah Willcock (jewillco_at_[hidden])
Date: 2012-10-08 16:55:35


On Mon, 8 Oct 2012, Jeremy Kawahara wrote:

> Hello,
>
> Sorry for such a late response - I was pulled to some other tasks. The suggestions to lower the needed memory helped immensely. Thank you :) I am wondering
> if there are still more optimizations I could look at to further lower the required memory.
>
> A better description can be found below...
>
> On Sun, Sep 23, 2012 at 12:29 PM, Leo Hidd <leohidd_at_[hidden]> wrote:
> and, since you are in a cluster, you could also try the distributed (parallel) version of SSSP. It will probably boost your running time.
>
>
>
> Yes thanks! This is a good point. I might try this later once I get all the space issues sorted out.
>
>  
> About your memory usage, let's see. You have ~580,000,000 edges, 2 fixed point of 4 bytes (vertex indices) plus one floating point of 8 bytes
> (edge cost) per edge, totalizing  ~9.3 GB. If your graph is undirected, but if you use a directed CSR, than you multiply this by 2 (to include
> both senses), totalizing  ~19GB. Now if your 35GB takes into account only the Boost graph size, it would seem a little too much, however if you
> also consider your own structure to load your information on RAM, it is just fine.
>
>
>
> Sorry I made a bit of an error in my original calculations here. However I'm curious about your estimate of the amount of needed space.
>
> It seems to me that we need to store the following arrays: (here I assume that the total number of vertices is less than 2^32 (4 bytes) but the total number
> of edges will likely be greater than 2^32 (8 bytes).
>
> An "edge array" that stores the vertex indices.
> 2 vertices per edge at 4 bytes each

Usually (for the directed case where incoming edges aren't stored) there
is only one vertex (the target) stored per edge.

> An "edge cost" array the stores the cost at each edge.
> 1 edge cost per edge at 4 bytes (say we use a float) each

Yes.

> These arguments are than passed to the CSR constructor. I THINK the CSR then sorts and copies these values into an internal structure (I'm basing this off
> the documentation which states that "The CSR format stores vertices and edges in separate arrays" - but perhaps I misunderstood it).

The CSR graph takes roughly (nvertices + 1) * sizeof(EdgeIndex) + nedges *
sizeof(VertexIndex) bytes for a directed graph, excluding properties.

> So I think we need to have enough memory to store the "edge array", the "edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes +
> the_number_of_nodes x 8 bytes) all at the same time.
>
> Is my understanding of this correct?

Yes, except that you need another copy of the edge properties that are
sorted along with the edges.

> If so, I'm wondering if there is a way we can somehow lower this memory usage where we only store these structures once? I looked at using the
> "edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a significant difference.

There is an in-place version
(construct_inplace_from_sources_and_targets_t) that saves memory by
requiring particular input layouts, but it is also slower. You should
always use the _multi_pass version when you can since the other one does a
copy of the data first.

How much more memory are you looking to save? Are you hitting an
out-of-memory condition on graphs above a certain size?

> Le 23/09/2012 18:23, Jeremiah Willcock a écrit :
> If you know that you are going to have fewer than 2^32 vertices and/or 2^32 edges, you can also turn down the sizes of the integers
> used as indices in the graph's representation; see the documentation for details. On a 64-bit machine, they both default to 64 bits,
> but you can save a lot of storage by reducing their sizes.
>
>
>
> Thanks! This did help quite a bit with the space requirements.
>
> For the reference of others, I added the last "unsigned int" parameter to the call below since I expect the number of nodes to be less than 2^32 but the
> number of edges will likely be more.
>
> typedef compressed_sparse_row_graph<directedS, void, Edge_Cost,boost::no_property, unsigned int> graph_t;

Yes, that is how you want to do it in that case. You might want to use
uint32_t instead to get more control over the exact data size.

-- Jeremiah Willcock


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net