From: Robert Ramey (ramey_at_[hidden])
Date: 2005-10-09 11:44:24


I only took a very quick look at the diff file. I have a couple of
questions:

It looks like, for certain types (C++ arrays, vector<int>, etc.), we want
to use save_binary/load_binary to take advantage of the fact that, in
certain situations, we can assume that storage is contiguous.

Note that there is an example in the package, demo_fast_archive, which does
exactly this for C++ arrays. It could easily be extended to cover any other
desired types. I believe that using this as a basis would achieve all you
desire and more with a much smaller investment of effort. Also, it would not
require changing the serialization library in any way.
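
To illustrate the idea of specializing array handling in a derived archive rather than in the library, here is a minimal sketch. It is not the code of demo_fast_archive and does not use Boost: toy_binary_oarchive, toy_fast_oarchive, and the calls counter are hypothetical names invented for this example. Only the technique (one raw write for a whole C++ array instead of one write per element) is taken from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a binary output archive (not the Boost class).
class toy_binary_oarchive {
public:
    std::vector<char> buffer;  // bytes written so far
    std::size_t calls = 0;     // number of raw writes, kept for illustration

    // Append `size` raw bytes starting at `address`.
    void save_binary(const void* address, std::size_t size) {
        const char* p = static_cast<const char*>(address);
        buffer.insert(buffer.end(), p, p + size);
        ++calls;
    }

    // Generic path: one raw write per object.
    template <class T>
    toy_binary_oarchive& operator<<(const T& t) {
        save_binary(&t, sizeof(T));
        return *this;
    }

    // Generic array path: one raw write per element.
    template <class T, std::size_t N>
    toy_binary_oarchive& operator<<(const T (&a)[N]) {
        for (std::size_t i = 0; i < N; ++i) *this << a[i];
        return *this;
    }
};

// Derived archive: only the array overload changes; the base is untouched.
class toy_fast_oarchive : public toy_binary_oarchive {
public:
    // Non-array objects are forwarded to the base class.
    template <class T>
    toy_fast_oarchive& operator<<(const T& t) {
        toy_binary_oarchive::operator<<(t);
        return *this;
    }

    // C++ arrays are contiguous, so the whole array goes out in one write.
    template <class T, std::size_t N>
    toy_fast_oarchive& operator<<(const T (&a)[N]) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copies are only valid for trivially copyable types");
        save_binary(a, sizeof(a));
        return *this;
    }
};
```

Writing a double[4] through both classes yields identical bytes, but the derived archive issues a single raw write where the base issues four.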

Robert Ramey

Matthias Troyer wrote:
>> Hi Robert,
>>
>> Over the past week I got around to doing what I wanted to do for a
>> long time, and implemented an improved serialization of contiguous
>> arrays of fundamental types. The motivation was two-fold:
>>
>> i) to speed up the serialization of large data sets by factors of up
>> to 10
>> ii) to allow implementation of serialization by MPI
>>
>> The problem with the serialization of large contiguous arrays of
>> fundamental types (be it a C-array, a std::vector, a std::valarray,
>> boost::multi_array, ...) is that the serialization function is called
>> for each (of possibly billions of) elements, instead of only once for
>> the whole array. I have attached a suite of three benchmark programs
>> and timings I ran on a PowerBook G4 using Apple's version of the gcc-4
>> compiler. The benchmarks are
>>
>> a) vectortime: reading and writing a std::vector<double> with 10^7
>> elements to/from a file
>> b) arraytime: reading and writing 1000 arrays double[10000] to/
>> from a file
>> c) vectortime_memory: reading and writing a std::vector<double>
>> with 10^7 elements to/from a memory buffer
>>
>> The short summary of the benchmarks is that Boost.Serialization is
>> 5-10 times slower than direct reading or writing!
>>
>> With the fast array serialization modifications, discussed below,
>> this slowdown is removed. Note that the codes were compiled with -O2.
>> Without -O2 I have observed another factor of 10 in slowdown in some
>> cases.
>>
>>
>> In order to implement the fast array serialization, I made the
>> following changes to the serialization library:
>>
>> i) a new traits class
>>
>> template <class Archive, class Type>
>> struct has_fast_array_serialization;
>>
>> which specifies whether an Archive has fast array serialization for a
>> Type. The default implementation for this traits class is false, so
>> that no change is needed for existing archives.
>>
>> ii) output archives supporting fast array serialization for a given
>> Type T provide an additional member function
>>
>> save_array(T const * address, std::size_t length);
>>
>> to save a contiguous array of Ts, containing length elements starting
>> at the given address, and a similar function
>>
>> load_array(T * address, std::size_t length);
>>
>> for input archives.
>>
>> iii) serialization of C-arrays and std::vector<T> was changed to use
>> fast array serialization for those archives and types where it is
>> supported. I'm still working on serialization for std::valarray and
>> boost::multi_array using the same features.
>>
>> iv) in addition, to support an MPI serialization archive (which is
>> essentially done but still being tested), and to improve portability
>> of archives, I introduced a new "strong" type
>>
>> BOOST_STRONG_TYPEDEF(std::size_t, container_size_type)
>>
>> for the serialization of the size of a container. The current
>> implementation uses an unsigned int to store the size, which is
>> problematic on machines with 32-bit int but 64-bit size_type. To
>> stay compatible with old archives, the serialization into binary
>> archives converts the size to an unsigned int, but this should be
>> changed to another type, and the file version number bumped up to
>> allow containers with more than 2^32 elements.
>>
>> The second motivation was MPI serialization, for which I need the
>> size type of containers to be a type distinct from any other integer.
>> The explanation is lengthy and I will provide the reason once the MPI
>> archives are finished.
>>
>> v) the polymorphic archives were also changed, by adding save_array
>> and load_array functions. Even for archives not supporting fast array
>> serialization per se this should improve performance, since now only
>> a single virtual function call is required for arrays, instead of one
>> per element.
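
The trait and dispatch described in items i) to iii) can be sketched roughly as follows. This is an illustration only, not the code on the branch: it uses std::enable_if in place of boost::enable_if, and plain_oarchive, fast_oarchive, save_vector, and the call counters are hypothetical names made up for the example. Only has_fast_array_serialization and save_array are names from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Trait named as in the proposal; the default is "no fast path",
// so existing archives need no change.
template <class Archive, class Type>
struct has_fast_array_serialization : std::false_type {};

// Hypothetical archive without a fast path: it only counts element saves.
struct plain_oarchive {
    std::size_t element_calls = 0;
    template <class T> void save(const T&) { ++element_calls; }
};

// Hypothetical archive that offers save_array for double.
struct fast_oarchive {
    std::size_t element_calls = 0;
    std::size_t array_calls = 0;
    std::vector<double> data;
    template <class T> void save(const T&) { ++element_calls; }
    void save_array(const double* address, std::size_t length) {
        data.insert(data.end(), address, address + length);
        ++array_calls;
    }
};

// Opt in for exactly one (archive, type) pair.
template <>
struct has_fast_array_serialization<fast_oarchive, double> : std::true_type {};

// Standard path: one save call per element.
template <class Archive, class T>
typename std::enable_if<!has_fast_array_serialization<Archive, T>::value>::type
save_vector(Archive& ar, const std::vector<T>& v) {
    for (const T& x : v) ar.save(x);
}

// Fast path: one save_array call for the whole contiguous block.
template <class Archive, class T>
typename std::enable_if<has_fast_array_serialization<Archive, T>::value>::type
save_vector(Archive& ar, const std::vector<T>& v) {
    ar.save_array(v.data(), v.size());
}
```

With a vector of 1000 doubles, plain_oarchive sees 1000 element saves, while fast_oarchive sees a single save_array call. A std::vector<int> written to fast_oarchive still takes the element-wise path, because the trait was only specialized for double.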
>>
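
The point of the strong typedef in item iv) is that an archive can tell a container size apart from an ordinary integer and pick its own on-disk representation for it. The sketch below is a hand-written stand-in, not the expansion of BOOST_STRONG_TYPEDEF; toy_oarchive and its save overloads are hypothetical, and throwing on sizes that do not fit in an unsigned int is an assumption of this example rather than something stated in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hand-written stand-in for a strong typedef of std::size_t:
// a distinct type that still converts back to the underlying integer.
struct container_size_type {
    std::size_t value;
    explicit container_size_type(std::size_t v = 0) : value(v) {}
    operator std::size_t() const { return value; }
};

// Hypothetical binary archive with separate overloads for the two types.
struct toy_oarchive {
    std::vector<unsigned char> bytes;

    // Ordinary integers are written at their native width.
    void save(std::size_t x) { append(&x, sizeof x); }

    // Container sizes are narrowed to unsigned int, mirroring the
    // "stay compatible with old archives" behaviour described above.
    void save(container_size_type n) {
        if (n.value > std::numeric_limits<unsigned int>::max())
            throw std::length_error("container too large for this archive format");
        unsigned int narrow = static_cast<unsigned int>(n.value);
        append(&narrow, sizeof narrow);
    }

private:
    void append(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes.insert(bytes.end(), b, b + n);
    }
};
```

Because the constructor is explicit, a plain std::size_t never silently selects the container-size overload, so changing the stored width later only requires touching that one overload.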
>>
>> The modifications are on the branch tagged
>> "fast_array_serialization", and I have attached the diffs with
>> respect to the main trunk. I have performed regression tests under
>> darwin, using Apple's version of gcc-4. None of the changes should
>> lead to any incompatibility with archives written with the current
>> version of the serialization library, nor should it break any
>> existing archive implementation.
>>
>> Regarding compatibility with non-conforming compilers the only issue
>> I see is that I have used boost::enable_if to dispatch to either the
>> standard or fast array serialization. We should discuss what to do
>> for compilers that do not support SFINAE. My preferred solution would
>> be to just disable fast array serialization for these compilers, to
>> keep the changes to the code minimal. The other option would be to
>> add another level of indirection and implement the dispatch without
>> using SFINAE.
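
One way the "extra level of indirection" could look is tag dispatch: the public function forwards to an implementation function overloaded on the trait's type, so no SFINAE is involved. This is only a sketch under assumptions. It uses std::true_type and std::false_type as tags, which the compilers in question would not have had (something like a Boost.MPL boolean type would take their place), and counting_oarchive, save_contiguous, and detail::save_array_impl are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Trait named as in the proposal; defaults to "no fast path".
template <class Archive, class Type>
struct has_fast_array_serialization : std::false_type {};

// Hypothetical archive that counts which path was taken.
struct counting_oarchive {
    std::size_t element_calls = 0;
    std::size_t array_calls = 0;
    template <class T> void save(const T&) { ++element_calls; }
    template <class T> void save_array(const T*, std::size_t) { ++array_calls; }
};

// Enable the fast path for double only.
template <>
struct has_fast_array_serialization<counting_oarchive, double> : std::true_type {};

namespace detail {

// Selected when the trait is false: element-by-element.
template <class Archive, class T>
void save_array_impl(Archive& ar, const T* p, std::size_t n, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) ar.save(p[i]);
}

// Selected when the trait is true: one call for the whole block.
template <class Archive, class T>
void save_array_impl(Archive& ar, const T* p, std::size_t n, std::true_type) {
    ar.save_array(p, n);
}

}  // namespace detail

// Public entry point: ordinary overload resolution on the tag object
// picks the implementation.
template <class Archive, class T>
void save_contiguous(Archive& ar, const T* p, std::size_t n) {
    typedef typename has_fast_array_serialization<Archive, T>::type tag;
    detail::save_array_impl(ar, p, n, tag());
}
```

Only the body of the selected overload is instantiated, so an archive without save_array still compiles as long as its trait stays false.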
>>
>> Robert, could you take a look at the modifications, and would it be
>> possible to merge these modifications with the main trunk once you
>> have finished your work for the 1.33.1 release?
>>
>> Best regards
>>
>> Matthias