
From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-11-23 09:11:25


Matthias Troyer wrote:

> Oh yes, there can be a huge difference. Let me just give a few
> reasons:
>
> 1) In the applications we are talking about, we regularly have to
> send huge contiguous arrays of numbers (stored e.g. in a matrix,
> vector, valarray or multi_array) over the network. The typical size
> is 100 million numbers and up; I'll stick to 100 million as a typical
> number in the following. Storing these 100 million numbers already
> takes up 800 MBytes and nearly fills the memory of the machine, which
> causes problems:
>
> a) copying these numbers into a buffer using the serialization
> library needs another 800 MB of memory that might not be available
>
> b) creating MPI data types for each member separately means storing
> at least 12 bytes (4 bytes each for the address, type and count), for
> a total of 1200 MBytes, instead of just 12 bytes. Again we will have
> a memory problem
>
> But the main issue is speed. Serializing 100 million numbers one by
> one requires 100 million accesses to the network interface, while
> serializing the whole block at once requires only a single call, and
> the rest is done by the hardware. The reason we cannot afford this
> overhead is that on modern high-performance networks
>
> ** the network bandwidth is the same as the memory bandwidth **
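
For concreteness, a rough sketch of that contrast, assuming plain
point-to-point MPI calls (the helper names below are made up for
illustration only):

    #include <mpi.h>
    #include <vector>
    #include <cstddef>

    // One call: the contiguous buffer goes to the network hardware
    // directly, with no extra 800 MB copy and no per-element bookkeeping.
    void send_block(const std::vector<double>& v, int dest, MPI_Comm comm)
    {
        MPI_Send(&v[0], static_cast<int>(v.size()), MPI_DOUBLE,
                 dest, 0, comm);
    }

    // One send per number: 100 million separate accesses to the network
    // interface, which is the overhead being described above.
    void send_elementwise(const std::vector<double>& v, int dest,
                          MPI_Comm comm)
    {
        for (std::size_t i = 0; i < v.size(); ++i)
            MPI_Send(&v[i], 1, MPI_DOUBLE, dest, 0, comm);
    }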

This makes sense, thank you. I just want to note that contiguous arrays of
double are handled equally well by either approach under discussion; an
mpi_archive will obviously include an overload for double[]. I was
interested in the POD case. A large array of 3x3 matrices wrapped in
matrix3x3 structs would probably be a good example that illustrates your
point (c) above. Points (a) and (b) can be avoided by issuing multiple
MPI_Send calls for non-optimized sequence writes.
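
A rough sketch of that POD case, assuming a plain matrix3x3 aggregate with
no internal padding (the type and helper names below are hypothetical,
just mirroring the example above):

    #include <mpi.h>
    #include <vector>

    struct matrix3x3 { double m[3][3]; };   // POD: nine contiguous doubles

    void send_matrices(const std::vector<matrix3x3>& a, int dest,
                       MPI_Comm comm)
    {
        // Describe the element layout once (a handful of bytes), then
        // ship the whole array in a single call rather than one send
        // per matrix.
        MPI_Datatype mat_type;
        MPI_Type_contiguous(9, MPI_DOUBLE, &mat_type);
        MPI_Type_commit(&mat_type);

        MPI_Send(&a[0], static_cast<int>(a.size()), mat_type,
                 dest, 0, comm);

        MPI_Type_free(&mat_type);
    }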

