From: Ian McCulloch (ianmcc_at_[hidden])
Date: 2005-11-25 13:33:22


Robert Ramey wrote:

> Matthias Troyer wrote:
>>> Dave's design does not change anything in your archives or
>>> serialization functions, but only adds an additional binary archive
>>> using save_array and load_array.
>
> Hmm - that's not the way I read it. I've touched on this in another post.

As already explained elsewhere, you misread it. The archive was in a
sub-namespace. Perhaps it would have been clearer if Dave had used a
completely different namespace name, such as
boost::array_serialization_extensions or boost::not_the_serialization_namespace?

>
>>>> c) The premise that one will save a lot of coding
>>>> (see d) above) compared to the current method
>>>> of overloading based on the pair of archive/type
>>>> is overly optimistic.
>>>
>>> Actually I have implemented two new archive classes (MPI and XDR)
>>> which can profit from it, and it does save lots of code duplication.
>>> All of the serialization functions for types that can make use of
>>> such an optimization can be shared between all these archive types.
>>> In addition formats such as HDF5 and netCDF have been mentioned,
>>> which can reuse the *same* serialization function to achieve optimal
>>> performance.
>>>
>>> There is nothing "optimistic" here since we have the actual
>>> implementations, which show that code duplication can be avoided.
>
> OK - I can really only comment on that which I've seen.

Are we talking at cross-purposes here? Matthias is talking about sharing
*serialization* functions. That is, for each data type, there is only
*one* serialization function that calls load/save_array (or whatever the
array hook function is called...).

You seem to be disputing the code duplication issue by saying that different
*archives* will not typically(*) be able to share implementations of array
processing. I completely agree with this. But that is completely separate
from the number of *serialization* functions that need to be written.

Matthias, it might help if you show an example of a serialization function
for some vector type, and the implementation of the array processing for
the MPI and XDR archives, to demonstrate the orthogonality of the
serialization vs archive ideas.
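A minimal sketch of the kind of example meant here might look like the following. Everything in it is hypothetical: `my_vector`, the `save_size`/`save_array` member hooks, and the two archive classes are illustrative stand-ins, not the real MPI or XDR archives and not the Boost.Serialization interface. The point is only that there is one `save` function for the type, while each archive supplies its own array processing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical vector type with contiguous storage.
struct my_vector {
    std::vector<std::uint32_t> data;
};

// The one serialization function for my_vector. It is written once and
// works with any archive that offers save_size() and save_array() hooks
// (hypothetical names chosen for this sketch).
template <class Archive>
void save(Archive& ar, const my_vector& v) {
    ar.save_size(v.data.size());
    ar.save_array(v.data.data(), v.data.size());
}

// Stand-in for an MPI-style archive: the whole array is copied in its
// native representation with a single memcpy.
struct native_oarchive {
    std::vector<unsigned char> buf;

    void save_size(std::size_t n) { append(&n, sizeof n); }

    template <class T>
    void save_array(const T* p, std::size_t n) { append(p, n * sizeof(T)); }

private:
    void append(const void* p, std::size_t bytes) {
        if (bytes == 0) return;  // avoid memcpy with a possibly-null source
        std::size_t old = buf.size();
        buf.resize(old + bytes);
        std::memcpy(buf.data() + old, p, bytes);
    }
};

// Stand-in for an XDR-style archive: XDR stores 32-bit integers in
// big-endian order, so each element is converted individually.
struct bigendian_oarchive {
    std::vector<unsigned char> buf;

    // Lengths are written as 32-bit values, as XDR does.
    void save_size(std::size_t n) { put32(static_cast<std::uint32_t>(n)); }

    void save_array(const std::uint32_t* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) put32(p[i]);
    }

private:
    void put32(std::uint32_t x) {
        buf.push_back(static_cast<unsigned char>(x >> 24));
        buf.push_back(static_cast<unsigned char>(x >> 16));
        buf.push_back(static_cast<unsigned char>(x >> 8));
        buf.push_back(static_cast<unsigned char>(x));
    }
};
```

Calling `save(a, v)` with a `native_oarchive` and `save(b, v)` with a `bigendian_oarchive` goes through the same `save` template; only the archives' `save_array` bodies differ. The per-element byte swapping in the XDR-like archive is exactly the kind of array processing that different archives usually cannot share.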

(*) Of course there are some counter-examples. That is the idea behind
deriving one archive from another, is it not?
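For the counter-example case, a rough sketch of what deriving one archive from another buys: the derived archive inherits the base's array processing instead of reimplementing it. Both class names and the "format tag" behaviour are hypothetical, chosen only to show the reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical base archive: stores the native bytes of arrays with one memcpy.
class raw_binary_oarchive {
public:
    template <class T>
    void save_array(const T* p, std::size_t n) { append(p, n * sizeof(T)); }

    const std::vector<unsigned char>& buffer() const { return buf_; }

protected:
    void append(const void* p, std::size_t bytes) {
        if (bytes == 0) return;  // nothing to copy
        std::size_t old = buf_.size();
        buf_.resize(old + bytes);
        std::memcpy(buf_.data() + old, p, bytes);
    }

private:
    std::vector<unsigned char> buf_;
};

// Hypothetical derived archive: writes a one-byte format tag first, but
// reuses the base class's save_array unchanged, so no array code is duplicated.
class tagged_binary_oarchive : public raw_binary_oarchive {
public:
    explicit tagged_binary_oarchive(unsigned char tag) { append(&tag, 1); }
};
```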

[snip]

>>> As you can see the overhead of the serialization library (less than
>>> 2%) is insignificant compared to the cost of doing lots of individual
>>> insertion operations into the buffer instead of one big one. The
>>> bottleneck is thus clearly the many calls to save() instead of a
>>> single call to save_array().
>
> Well, this is interesting data. The call to save() resolves inline to
> a call to std::vector get element and stuffing the value into the buffer.
> I wonder how much of this is in std::vector and how much is in
> the save to the buffer?

As described here
http://lists.boost.org/Archives/boost/2005/11/97156.php
the effect of using a custom buffer versus a buffer based around
vector::push_back is exactly a factor of 2, irrespective of cache effects.
Matthias' benchmark showed that the time taken to serialize an array into a
vector buffer is almost the same as the time taken to push_back the array
in a loop (i.e. the serialization library itself introduces negligible
overhead in this case). Thus, a serialization archive based on the same
buffer I used in my benchmark should achieve the same factor 2 speedup.
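The two code paths being compared can be sketched roughly as follows: many individual save() calls into a vector-backed buffer behave like a push_back loop, whereas a single save_array() call can hand the whole block to the buffer at once. The function names are hypothetical, the custom buffer from the benchmark is not reproduced, and no timings are implied by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element path: roughly what many individual save() calls amount to
// when the archive's buffer is a std::vector. Each call does its own
// size/capacity bookkeeping.
template <class T>
void append_each(std::vector<T>& buf, const T* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) buf.push_back(p[i]);
}

// Bulk path: roughly what a single save_array() call can do instead.
// With pointer (random-access) iterators, the range insert knows the
// length up front, so it can grow the buffer once and copy the block.
template <class T>
void append_bulk(std::vector<T>& buf, const T* p, std::size_t n) {
    buf.insert(buf.end(), p, p + n);
}
```

Both functions produce the same buffer contents; only the number of per-call operations differs, which is what the save() versus save_array() comparison measures.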

Note that the speedup using save_array was of the order of 30, so even
with a factor 2 speedup from using an optimized buffer, save_array would
still be 15 times faster! (This is using the first set of data. The set
for small arrays would give only a modest factor of 3 improvement for
save_array versus a custom buffer archive.)
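Spelled out, with T denoting the time to serialize the first (large-array) data set using a vector-based buffer, a custom buffer, or save_array, the factor of 15 follows from the two ratios quoted above:

$$
\frac{T_{\text{vector}}}{T_{\text{save\_array}}} \approx 30,
\qquad
\frac{T_{\text{vector}}}{T_{\text{custom}}} \approx 2
\quad\Longrightarrow\quad
\frac{T_{\text{custom}}}{T_{\text{save\_array}}}
= \frac{T_{\text{vector}} / T_{\text{save\_array}}}{T_{\text{vector}} / T_{\text{custom}}}
\approx \frac{30}{2} = 15 .
$$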

Cheers,
Ian

