Subject: Re: [Boost-users] [BGL] Upper limits on graph size
From: Adam Spargo (aws_at_[hidden])
Date: 2010-05-20 11:16:56


Hi, thanks for your reply. The fact that you have run something with two
billion vertices tells me that it's worth looking into. I will probably
come back to this when I have done some runs. At the moment I'm mostly
writing the code that gathers the information needed to build my graph.

I have 500GB of RAM on a single node for prototyping, plus a cluster of
4000 nodes with 4GB to 64GB of RAM each, so it should be cool.
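
As a rough back-of-envelope check on the single-node figure, a sketch like
the one below estimates the footprint of a sparse graph. It assumes a
compressed sparse row (CSR) layout, which is only one of the layouts BGL
offers. The struct, the function names, and every size fed into them are
hypothetical placeholders, not measurements from the assembly code.

```cpp
#include <cassert>
#include <cstdint>

// Inputs for a rough CSR memory estimate. All fields are assumptions the
// caller supplies; nothing here is measured from a real graph.
struct GraphEstimate {
    std::uint64_t vertices;
    std::uint64_t edges;              // stored (directed) edges
    std::uint64_t offset_bytes;       // size of one per-vertex row offset
    std::uint64_t target_bytes;       // size of one stored target vertex index
    std::uint64_t vertex_prop_bytes;  // data attached to each vertex
    std::uint64_t edge_prop_bytes;    // data attached to each edge
};

// CSR keeps (vertices + 1) row offsets and one target index per edge;
// property data is counted on top of that.
inline std::uint64_t csr_bytes(const GraphEstimate& g) {
    assert(g.offset_bytes > 0 && g.target_bytes > 0);
    return (g.vertices + 1) * g.offset_bytes
         + g.edges * g.target_bytes
         + g.vertices * g.vertex_prop_bytes
         + g.edges * g.edge_prop_bytes;
}

// Convert a byte count to GiB for comparison against a node's RAM.
inline double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For instance, plugging in 1e9 vertices, 1e10 edges (an assumed average
degree of 10), 8-byte offsets, 4-byte targets, and 8 bytes of data per
vertex and per edge gives 136,000,000,008 bytes, a bit under 127 GiB. The
real number depends on the edge count and on how much data hangs off each
vertex and edge.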

I'm still very much in the proof of concept stage. I'm one of the first to
explicitly construct the overlap graph in a genome assembly program; most
have something like it but can't really play with it before they barf out
the answer. So hopefully the BGL will give me the power to explore my graph,
but like I said - early days.

Thanks again,

Adam.

--
Dr Adam Spargo
High Performance Assembly Group   email: aws_at_[hidden]
Wellcome Trust Sanger Institute   Tel: +44 (0)1223 834244 x7728
Hinxton, Cambridge CB10 1SA       Fax: +44 (0)1223 494919

On Thu, 20 May 2010, Jeremiah Willcock wrote:
> On Thu, 20 May 2010, Adam Spargo wrote:
>
>> Hi, I am working on genome assembly software and hope that the BGL can save 
>> me a lot of development time, but before I make the investment in learning 
>> the library can somebody advise me on whether it is appropriate.
>> 
>> My initial test sets will be quite small. However in the end I will want to 
>> scale up to on the order of a billion nodes, quite sparsely connected. We 
>> have the RAM and many CPUs, but will the code scale up this far?
>
> For this level of scalability, we have the Parallel BGL (mostly in 
> boost/graph/distributed and libs/graph_parallel; more info at 
> <URL:http://www.osl.iu.edu/research/pbgl/>) that runs on distributed-memory 
> systems using MPI.  We have successfully run tests up to two billion or so 
> vertices (16G undirected edges) on 96 machines (4GiB of memory each).  How 
> much RAM and how many CPUs do you have?  PBGL works on clusters or SMP 
> systems, but remember that RAM is the usual limit on how many vertices you 
> have on a single machine, not CPU speed.  How many edges do you have? 
> Directed or undirected?  How much data do you need to attach to each vertex 
> or edge?  What kinds of algorithms do you want to run?
>
> -- Jeremiah Willcock