
Boost :

From: Ian McCulloch (ianmcc_at_[hidden])
Date: 2005-11-24 17:16:20

Robert Ramey wrote:

> Felipe Magno de Almeida wrote:
>> Wouldn't creating a "buffered_archive" require copying? And as Matthias
>> has already pointed out, that is an unacceptable overhead. I'm having some
>> ideas about a 'fake_buffered'_archive.
> LOL - then have your archive send the data wherever you want
> it to. The point is that the 10x speed-up demonstration on which this
> whole thread rests is based on using the current binary_?archive, which
> (apparently) won't be the mechanism that is eventually used.

? It doesn't matter what back-end buffer is used; there will always be a
substantial difference between buffering a bulk array copy and buffering
element by element in a loop. If that buffer is going to be written to
disk, the difference doesn't matter so much because the disk I/O will be
the bottleneck. But if it is going to a fast network interface, the
buffering is critical. Besides, is the Boost.Iostreams library really much
slower than a hand-coded buffer?
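A minimal sketch of the two buffering strategies, assuming a plain `std::vector<char>` stands in for whatever back-end buffer is used; `buffer_elementwise` and `buffer_bulk` are hypothetical names. Both produce identical bytes, but the loop performs one small append per element while the bulk version does a single `std::memcpy` over the contiguous array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: append each element separately, as a
// per-element serialization loop would.
std::vector<char> buffer_elementwise(const std::vector<double>& v) {
    std::vector<char> buf;
    for (double x : v) {
        const char* p = reinterpret_cast<const char*>(&x);
        buf.insert(buf.end(), p, p + sizeof x);  // one small append per element
    }
    assert(buf.size() == v.size() * sizeof(double));
    return buf;
}

// Hypothetical helper: copy the whole contiguous array in one operation.
std::vector<char> buffer_bulk(const std::vector<double>& v) {
    std::vector<char> buf(v.size() * sizeof(double));
    if (!v.empty()) {
        std::memcpy(buf.data(), v.data(), buf.size());  // single bulk copy
    }
    return buf;
}
```

The bulk version relies on `double` being trivially copyable and on `std::vector` storing its elements contiguously; for a type without those guarantees, the element-by-element path is the only option.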

Anyway, this is a side issue. The main point is:

David Abrahams wrote:
> ,----
> | For many archive formats and common datatypes there exist APIs
> | that can quickly read or write contiguous sequences of those types
> | all at once (**). Reading or writing such a sequence by
> | separately reading or writing each element (as the serialization
> | library currently does) can be an order of magnitude more
> | expensive.
> `----
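The quoted point can be illustrated with a toy archive. This is a minimal sketch under stated assumptions: `CountingSink`, `SketchArchive`, `save_elementwise` and `save_array` are hypothetical names, not the Boost.Serialization interface, and the sink simply counts write requests. The per-element path issues one request per item, while the array hook forwards a contiguous run of arithmetic values as a single request, the kind of call an array-aware back end could map onto its own bulk API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical back end that records bytes and counts write requests.
// It stands in for any "write n bytes" primitive (stream, socket, ...).
struct CountingSink {
    std::vector<char> bytes;
    std::size_t writes = 0;

    void write(const void* p, std::size_t n) {
        assert(n == 0 || p != nullptr);
        const char* c = static_cast<const char*>(p);
        bytes.insert(bytes.end(), c, c + n);
        ++writes;
    }
};

// Hypothetical archive (not the Boost.Serialization API).
class SketchArchive {
public:
    explicit SketchArchive(CountingSink& sink) : sink_(sink) {}

    // Save a single arithmetic value.
    template <class T>
    void save(const T& x) {
        static_assert(std::is_arithmetic<T>::value, "sketch handles arithmetic types only");
        sink_.write(&x, sizeof x);
    }

    // Per-element approach: one save, hence one back-end request, per item.
    template <class T>
    void save_elementwise(const T* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) save(p[i]);
    }

    // Array hook: the whole contiguous sequence in one back-end request.
    template <class T>
    void save_array(const T* p, std::size_t n) {
        static_assert(std::is_arithmetic<T>::value, "sketch handles arithmetic types only");
        sink_.write(p, n * sizeof(T));
    }

private:
    CountingSink& sink_;
};
```

Both paths leave identical bytes in the sink; only the number of requests reaching the back end differs (n versus 1).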

The operative phrase here is "archive formats". To pick a random example,
from the netCDF Users' Guide:

The Network Common Data Form, or netCDF, is an interface to a library of
data access functions for storing and retrieving data in the form of
arrays. An array is an n-dimensional (where n is 0, 1, 2, ...) rectangular
structure containing items which all have the same data type (e.g. 8-bit
character, 32-bit integer). A scalar (simple single value) is a
0-dimensional array.

If there is to be any possibility of targeting an archive to this format,
then array support is crucial.
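In that model the unit of storage is a whole array described by its shape, not a stream of individual items. A minimal sketch, assuming a hypothetical `NdArray` type (not part of netCDF or Boost), of such a variable held contiguously in row-major order; an archive targeting this format would need to hand over `shape` and `data` together, which a purely per-element interface cannot express.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical in-memory variable modelled on the netCDF description:
// an n-dimensional rectangular array whose items all share one type.
template <class T>
struct NdArray {
    std::vector<std::size_t> shape;  // one extent per dimension; empty => scalar
    std::vector<T> data;             // items stored contiguously, row-major

    explicit NdArray(std::vector<std::size_t> dims)
        : shape(std::move(dims)), data(element_count(shape)) {}

    // Item count is the product of the extents. The empty product is 1,
    // so a 0-dimensional array holds exactly one value (a scalar).
    static std::size_t element_count(const std::vector<std::size_t>& dims) {
        return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                               std::multiplies<std::size_t>());
    }

    // Row-major offset of an index tuple (no bounds checking in this sketch).
    std::size_t offset(const std::vector<std::size_t>& idx) const {
        assert(idx.size() == shape.size());
        std::size_t off = 0;
        for (std::size_t d = 0; d < shape.size(); ++d) off = off * shape[d] + idx[d];
        return off;
    }
};
```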

Similarly, the basic message-passing interface in MPI is:

int MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest,
              int tag, MPI_Comm comm )

The 'count' argument there is the array length. Again, without array
support it is not possible to take full advantage of MPI.

Maybe you don't care about these applications, but if that is the case then
you should substantially narrow your description of the library, which
misleadingly suggests that such applications would fall within its scope:

Here, we use the term "serialization" to mean the reversible deconstruction
of an arbitrary set of C++ data structures to a sequence of bytes. Such a
system can be used to reconstitute an equivalent structure in another
program context. Depending on the context, this might be used to implement
object persistence, remote parameter passing or some other facility. In this
system we use the term "archive" to refer to a specific rendering of this
stream of bytes. This could be a file of binary data, text data, XML, or some
other format created by the user of this library.

Ian McCulloch

Boost list run by bdawes at, gregod at, cpdaniel at, john at