From: Matthias Troyer (troyer_at_[hidden])
Date: 2005-11-25 02:29:24


Hi Robert,

I'll let Dave comment on the parts where you review his proposal, and
will focus on the performance.

On Nov 24, 2005, at 6:59 PM, Robert Ramey wrote:

> a) It doesn't address the root cause of "slow" performance
> of binary archives.

I ran the benchmarks you asked for last night (see below), and they
indeed show that the root cause of the slow performance is the
individual writing of many small elements, instead of a "block write"
of the whole array in a single call to something like save_array.
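
To illustrate the difference, here is a minimal sketch of the two
write paths for a hypothetical buffer-based archive (the names save
and save_array are illustrative, not the actual interface):

  #include <cstddef>
  #include <vector>

  struct out_buffer {
      std::vector<char> data;

      // element-wise path: one call, and one append, per element
      template <class T>
      void save(T const& t) {
          char const* p = reinterpret_cast<char const*>(&t);
          data.insert(data.end(), p, p + sizeof(T));
      }

      // block path: a single append for the whole contiguous array
      template <class T>
      void save_array(T const* t, std::size_t n) {
          char const* p = reinterpret_cast<char const*>(t);
          data.insert(data.end(), p, p + n * sizeof(T));
      }
  };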

> b) re-implementation of binary_archive in such a way so as not
> to break existing archives would be an error-prone process.
> The switching between new and old method "should" result
> in exactly the same byte sequence. But it could easily
> occur that a small subtle change might render archives
> created under the previous binary_archive unreadable.

Dave's design does not change anything in your archives or
serialization functions, but only adds an additional binary archive
using save_array and load_array.

> c) The premise that one will save a lot of coding
> (see d) above) compared to the current method
> of overloading based on the pair of archive/type
> is overly optimistic.

Actually, I have implemented two new archive classes (MPI and XDR)
that profit from it, and it does save a lot of code duplication.
All of the serialization functions for types that can make use of
such an optimization can be shared between all these archive types.
In addition, formats such as HDF5 and netCDF have been mentioned,
which could reuse the *same* serialization functions to achieve
optimal performance.

There is nothing "optimistic" here since we have the actual
implementations, which show that code duplication can be avoided.
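
To make the sharing concrete, here is a minimal sketch of the idea,
written in modern C++ for brevity; has_save_array and save_array are
illustrative names for the proposed hook, not an existing
Boost.Serialization interface:

  #include <cstddef>
  #include <type_traits>
  #include <utility>
  #include <vector>

  // illustrative detection of a save_array member
  template <class A, class T, class = void>
  struct has_save_array : std::false_type {};
  template <class A, class T>
  struct has_save_array<A, T, std::void_t<
      decltype(std::declval<A&>().save_array(
          std::declval<T const*>(), std::size_t()))> >
      : std::true_type {};

  // one shared function serves every archive type: archives that
  // provide save_array get the block write, all others fall back
  // to element-wise saving
  template <class Archive, class T>
  void save_vector(Archive& ar, std::vector<T> const& v) {
      std::size_t const n = v.size();
      ar << n;
      if constexpr (has_save_array<Archive, T>::value)
          ar.save_array(v.data(), n);           // MPI, XDR, binary, ...
      else
          for (std::size_t i = 0; i != n; ++i)  // text, XML, ...
              ar << v[i];
  }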

> Conclusions
> ===========
> a) The proposal suffers from "premature optimization".
> A large amount of design effort has been expended on
> areas which are likely not the source of observed
> performance bottlenecks.

As Dave pointed out, one main reason for a save_array/load_array or
save_sequence/load_sequence hook is to utilize existing APIs for
serialization (including message passing) that provide optimized
functions for arrays of contiguous data. Examples include MPI, PVM,
XDR, and HDF5. There is a well-established reason why all of these
libraries have special functions for contiguous arrays: they all hit
the same bottlenecks. These bottlenecks have been well known in high
performance computing for decades, and they are what caused all of
these APIs to include special support for contiguous arrays of data.
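
For illustration, these are the kinds of single-call array primitives
involved; the fragment below uses real MPI and Sun XDR signatures
(error handling omitted):

  #include <mpi.h>
  #include <rpc/xdr.h>

  void send_block(double* a, int n) {
      // MPI: one call transmits the whole contiguous array,
      // instead of n per-element sends
      MPI_Send(a, n, MPI_DOUBLE, /*dest*/ 1, /*tag*/ 0, MPI_COMM_WORLD);
  }

  bool_t encode_block(XDR* xdrs, double* a, unsigned n) {
      // XDR: one call encodes the whole array
      return xdr_vector(xdrs, reinterpret_cast<char*>(a), n,
                        sizeof(double),
                        reinterpret_cast<xdrproc_t>(xdr_double));
  }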

> b) The proposal suffers from "over generalization". The
> attempt to generalize results in a much more complex
> system. Such a system will result in a net loss of
> conceptual integrity and implementation transparency.
> The claim that this generalization will actually result in a
> reduction of code is not convincing.

I'm confused by your statement. The implementations of the fast
binary, MPI, and XDR archives do share common serialization
functions, and this does indeed reduce code and avoid duplication.

> c) by re-implementing a currently existing and used
> archive, it risks creating a maintenance headache
> for no real benefit.

To avoid any such potential problems, Dave proposed to add a new
archive in an array sub-namespace. I guess that alleviates your
concerns? Also, a 10x speedup might not be a benefit for you and your
applications, but as you can see from the postings here, it is a
concern for many others.

> Suggestions
> ===========
>
> a) Do more work in finding the speed bottlenecks. Run
> a profiler. Make a buffer based non-stream based archive
> and re-run your tests.

I have attached a benchmark using such an archive class and ran it
for std::vector<char> serialization. Here are the numbers (using
gcc 4 on a PowerBook G4):

Time using serialization library: 13.37
Time using direct calls to save in a loop: 13.12
Time using direct call to save_array: 0.4

In this case the buffer initially had size 0 and needed to be resized
during the insertions. Here are the numbers for the case where enough
memory has been reserved up front with reserve():

Time using serialization library: 12.61
Time using direct calls to save in a loop: 12.31
Time using direct call to save_array: 0.35

And here are the numbers for std::vector<double>, using a vector of
1/8th the size:

Time using serialization library: 1.95
Time using direct calls to save in a loop: 1.93
Time using direct call to save_array: 0.37

Since there are fewer calls for this larger type it looks slightly
better, but even now there is a more than 5x difference in this
benchmark.

As you can see the overhead of the serialization library (less than
2%) is insignificant compared to the cost of doing lots of individual
insertion operations into the buffer instead of one big one. The
bottleneck is thus clearly the many calls to save() instead of a
single call to save_array().
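
For reference, here is a minimal self-contained sketch of the kind of
comparison made here (not the attached benchmark itself), timing
per-element appends against one block insert into a reserved buffer:

  #include <cstddef>
  #include <cstdio>
  #include <ctime>
  #include <vector>

  int main() {
      std::vector<char> v(64 * 1024 * 1024, 'x');
      std::vector<char> buf;
      buf.reserve(v.size());

      std::clock_t t0 = std::clock();
      for (std::size_t i = 0; i != v.size(); ++i)  // one append per element
          buf.push_back(v[i]);
      std::clock_t t1 = std::clock();

      buf.clear();
      buf.insert(buf.end(), v.begin(), v.end());   // one block write
      std::clock_t t2 = std::clock();

      std::printf("loop: %.2fs  block: %.2fs\n",
                  double(t1 - t0) / CLOCKS_PER_SEC,
                  double(t2 - t1) / CLOCKS_PER_SEC);
      return 0;
  }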

> b) Make your MPI, XDR and whatever archives. Determine
> how much opportunity for code sharing is really
> available.

This has been done, and it is the reason for the proposal to
introduce something like the save_array/load_array functions. I have
coded an XDR archive and two different types of MPI archives (one
using a buffer, the other not). A single serialization function for
std::valarray, written in terms of the load_array hook, suffices to
use the optimized APIs in MPI and XDR as well as a faster binary
archive, and the same is true for other types.
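
A minimal sketch of what such a single shared function looks like;
load_array is the proposed hook (an illustrative name, not an
existing Boost.Serialization API), and each archive implements it
with its own fast primitive (an MPI receive, xdr_vector, a block
read, ...):

  #include <cstddef>
  #include <valarray>

  template <class Archive, class T>
  void load_valarray(Archive& ar, std::valarray<T>& v) {
      std::size_t n;
      ar >> n;
      v.resize(n);
      if (n != 0)
          ar.load_array(&v[0], n);  // one call reads the whole block
  }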

> c) If you still believe your proposal has merit, make
> your own "optimized binary archive". Don't derive
> from binary_archive but rather from common_?archive
> or perhaps basic_binary_archive. In this way you
> will have a totally free hand and won't have to
> achieve consensus with the rest of us which will
> save us all a huge amount of time.

I'm confused. I realize that one should not derive from
binary_iarchive, but why should one not derive from
binary_iarchive_impl?

Also, following Dave's proposal, none of your archives is touched;
instead, additional faster ones are provided.

Matthias



