Subject: Re: [boost] Parallel Tree Merge
From: Biddiscombe, John A. (biddisco_at_[hidden])
Date: 2009-11-03 08:18:33


Nick

I have a series of objects which are nominally represented as XML nodes (libxml's tree structures). They actually represent a hierarchy of file/group/dataset elements within an HDF5 file. The data would be something along the lines of this:

On P0
Filename/Process_0/Group_0/dataset_0
Filename/Process_0/Group_0/dataset_1
Filename/Process_0/Group_0/dataset_2
On P1
Filename/Process_1/Group_0/dataset_0
Filename/Process_1/Group_1/dataset_0
Filename/Process_1/Group_1/dataset_1

And so on. Each tree is stored on the process named by its Process_N component, but because parallel I/O with HDF5 requires groups and datasets to be created collectively, I need to do an AllGather of the 'descriptions' of the groups and datasets I wish to create. An important point is that the datasets may contain large amounts of data, but I only need to gather their descriptions: datatype, number of items, name and path, and maybe some simple hyperslab or extents details. So each leaf in the listing above would have a small number of members, which could live in a struct, but the paths need to be merged so that all processes can traverse the tree in a synchronized manner and create the full structure in parallel.
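To make the "description" idea concrete, here is a minimal C++ sketch of such a record. All names in it (DatasetDescription, split_path, the field names, the "float64" type string) are hypothetical placeholders, not anything from HDF5, libxml or Boost. It only carries metadata, never the bulk data.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one dataset: metadata only, never the bulk data.
struct DatasetDescription {
    std::string path;                    // e.g. "Filename/Process_0/Group_0/dataset_0"
    std::string datatype;                // placeholder type name, e.g. "float64"
    std::vector<std::uint64_t> extents;  // size of each dimension

    // Number of items implied by the extents (1 if there are no extents).
    std::uint64_t item_count() const {
        std::uint64_t n = 1;
        for (std::uint64_t e : extents) n *= e;
        return n;
    }
};

// Split "a/b/c" into {"a", "b", "c"}, skipping empty components.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/')) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}
```

A record like this is small enough that gathering one per dataset from every process should be cheap compared with the data itself.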

I'd like a struct/class template that stores my descriptions, and a tree object that lets me add these objects as leaf nodes at specified paths and then do an AllGather of all the sub-trees. Every process keeps its own data (which it holds on to and writes to disk), but it also receives descriptions of the data held on other processes, so it can participate in the collective operations.

What I am really hoping for is:
Iterate over data items
  Add item to my local tree
End
Exchange trees
Iterate over data items
  Collective create of all groups and datasets (synchronously)
  Write data I own (asynchronously)
End
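The steps above could be sketched in plain C++ roughly as follows. This is only an illustration under stated assumptions, not PBGL or HDF5 code: the types Description and Node and the functions insert, merge and walk are all made up for the example. The exchange step is simulated by a vector holding one description list per rank, which is the shape an all-gather would presumably deliver. The "create" and "write" steps are just recorded as path strings.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical metadata record; owner is the rank that holds the real data.
struct Description {
    std::string path;
    std::string datatype;
    int owner;
};

// Tree node. Children live in a std::map, so they are visited in sorted
// order and every rank holding the same merged tree walks it identically.
struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;
    bool is_dataset = false;
    Description desc;  // meaningful only when is_dataset is true
};

// Add one description as a leaf, creating intermediate nodes as needed.
void insert(Node& root, const Description& d) {
    Node* node = &root;
    std::size_t start = 0;
    while (start <= d.path.size()) {
        std::size_t end = d.path.find('/', start);
        if (end == std::string::npos) end = d.path.size();
        if (end > start) {
            auto& child = node->children[d.path.substr(start, end - start)];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        start = end + 1;
    }
    assert(node != &root && "path must have at least one component");
    node->is_dataset = true;
    node->desc = d;
}

// Stand-in for "Exchange trees": gathered[r] is the list contributed by
// rank r. Every rank builds the same merged tree from the same input.
Node merge(const std::vector<std::vector<Description>>& gathered) {
    Node root;
    for (const auto& per_rank : gathered)
        for (const auto& d : per_rank) insert(root, d);
    return root;
}

// Depth-first walk in sorted order. Every rank records every path in
// 'creates' (the collective step); only the owner records it in 'writes'.
void walk(const Node& node, const std::string& prefix, int rank,
          std::vector<std::string>& creates,
          std::vector<std::string>& writes) {
    for (const auto& kv : node.children) {
        const std::string path =
            prefix.empty() ? kv.first : prefix + "/" + kv.first;
        creates.push_back(path);
        if (kv.second->is_dataset && kv.second->desc.owner == rank)
            writes.push_back(path);
        walk(*kv.second, path, rank, creates, writes);
    }
}
```

The point of the sorted map is that the order of the collective calls depends only on the merged content, not on the order in which descriptions arrived. That ordering is what the synchronized traversal needs.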

It should be reasonably straightforward to implement, but I am hoping that PBGL might have something that already does most of what I'm after. I don't quite know where to start as I have not used BGL in anger - just looked at a few examples.

Thanks for any help or advice.

JB

> I'm unclear exactly what your merge operation/algorithm is (possibly
> because I never use XML which probably provides adequate context for
> other people). The p(BGL) graph classes should work fine for
> representing your tree. If you describe your algorithmic needs in
> more detail I can suggest whether PBGL or BGL with a bit of MPI for
> the data movement is more appropriate. I presume the total structure
> is small enough to represent on a single node since you seem to be
> doing an all_gather?
>
> Thanks,
> Nick