Boost logo

Boost Users :

Subject: Re: [Boost-users] CSR graph and Dijkstra's to lower graph memory
From: Jeremy Kawahara (jeremy.kawahara_at_[hidden])
Date: 2012-10-08 16:40:27


Hello,

Sorry for such a late response - I was pulled to some other tasks. The
suggestions to lower the needed memory helped immensely. Thank you :) I am
wondering if there are still more optimizations I could look at to further
lower the required memory.

A better description can be found below...

On Sun, Sep 23, 2012 at 12:29 PM, Leo Hidd <leohidd_at_[hidden]> wrote:

> and, since you are in a cluster, you could also try the distributed
> (parallel) version of SSSP. It will probably boost your running time.
>

Yes thanks! This is a good point. I might try this later once I get all the
space issues sorted out.

> About your memory usage, let's see. You have ~580,000,000 edges, 2 fixed
> point of 4 bytes (vertex indices) plus one floating point of 8 bytes (edge
> cost) per edge, totalizing ~9.3 GB. If your graph is undirected, but if
> you use a directed CSR, than you multiply this by 2 (to include both
> senses), totalizing ~19GB. Now if your 35GB takes into account only the
> Boost graph size, it would seem a little too much, however if you also
> consider your own structure to load your information on RAM, it is just
> fine.
>

Sorry I made a bit of an error in my original calculations here. However
I'm curious about your estimate of the amount of needed space.

It seems to me that we need to store the following arrays: (here I assume
that the total number of vertices is less than 2^32 (4 bytes) but the total
number of edges will likely be greater than 2^32 (8 bytes).

An "edge array" that stores the vertex indices.
2 vertices per edge at 4 bytes each

An "edge cost" array the stores the cost at each edge.
1 edge cost per edge at 4 bytes (say we use a float) each

These arguments are than passed to the CSR constructor. I THINK the CSR
then sorts and copies these values into an internal structure (I'm basing
this off the documentation<http://www.boost.org/doc/libs/1_51_0/libs/graph/doc/compressed_sparse_row.html>which
states that "The CSR format stores vertices and edges in separate
arrays" - but perhaps I misunderstood it).

So I think we need to have enough memory to store the "edge array", the
"edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes
+ the_number_of_nodes x 8 bytes) all at the same time.

Is my understanding of this correct?

If so, I'm wondering if there is a way we can somehow lower this memory
usage where we only store these structures once? I looked at using the
"edges_are_unsorted_multi_pass" parameter but the memory differences did
not seem to make a significant difference.

> Le 23/09/2012 18:23, Jeremiah Willcock a écrit :
>
>> If you know that you are going to have fewer than 2^32 vertices and/or
>> 2^32 edges, you can also turn down the sizes of the integers used as
>> indices in the graph's representation; see the documentation for details.
>> On a 64-bit machine, they both default to 64 bits, but you can save a lot
>> of storage by reducing their sizes.
>>
>

Thanks! This did help quite a bit with the space requirements.

For the reference of others, I added the last "unsigned int" parameter to
the call below since I expect the number of nodes to be less than 2^32 but
the number of edges will likely be more.

typedef compressed_sparse_row_graph<directedS, void,
Edge_Cost,boost::no_property, unsigned int> graph_t;

-- Jeremy



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net