From: Matthias Troyer (troyer_at_[hidden])
Date: 2005-11-13 03:03:24

Next message: Matthias Troyer: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Previous message: Gennadiy Rozental: "Re: [boost] Boost Test - yet more mostly spurious warnings."
In reply to: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Next in thread: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Reply: Robert Ramey: "Re: [boost] [serialization] fast array serialization (10x speedup)"
Reply: Paul A Bristow: "Re: [boost] [serialization] fast array serialization (10x speedup)"

On Nov 12, 2005, at 9:33 PM, Robert Ramey wrote:

> I've been perusing the files you checked, your example, and this list.
>
> Summary
> =======
> First of all, a little more complete narrative description as to what
> the submission was intended to accomplish and how it would change
> the way the user uses the library would have been helpful. I'm going
> to summarize here what I think I understand about this. Please
> correct
> me if I get something wrong.
>
> a) a new trait is created.
>
> template <class Archive, class Type>
> struct has_fast_array_serialization
> : public mpl::bool_<false>
> {};

Yes, I wrote that in my original e-mail.
>
>
> b) new functions save_array and load_array are implemented in those
> archives which have the above trait set to true. In this case the
> following is added to the binary_iarchive.hpp file. The effect
> is that this trait will return true when a fundamental type is
> to be saved/loaded to a binary_iarchive.
>
> // specialize has_fast_array_serialization
> // the binary archive provides fast array serialization for all
> // fundamental types
> template <class Type>
> struct has_fast_array_serialization<binary_iarchive,Type>
> : public is_fundamental<Type>
> {};

This is just the example for binary archives. The set of types for
which direct serialization of arrays is possible is different from
archive to archive. E.g. MPI archives support array serialization for
all PODs that are not pointers and do not contain pointer members.
>
>
> Some Observations
> =================
> Immediately the following come to mind.
>
> a) I'm not sure about the portability of enable_if. Would this not
> break
> the whole serialization system for those compilers which don't
> support it?

I mentioned this issue in my initial e-mail, and if there are
compilers that are supported by the serialization library but do not
support enable_if, we can replace it by tag dispatching.
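
To make the tag-dispatching fallback concrete, here is a minimal sketch. It assumes C++11 standard type traits in place of Boost.MPL, and the two archive classes as well as save_array_impl and save_array_dispatch are hypothetical names invented for illustration; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Default: no archive offers fast array serialization for any type.
template <class Archive, class Type>
struct has_fast_array_serialization : std::false_type {};

// Hypothetical binary archive that logs which path was taken.
struct toy_binary_oarchive {
    std::string log;
    template <class T>
    void save_array(const T*, std::size_t n) {
        log += "array(" + std::to_string(n) + ")";
    }
    template <class T>
    void save(const T&) { log += "."; }
};

// Hypothetical text archive with no save_array member at all.
struct toy_text_oarchive {
    std::string log;
    template <class T>
    void save(const T&) { log += "."; }
};

// The binary archive opts in for all fundamental types.
template <class Type>
struct has_fast_array_serialization<toy_binary_oarchive, Type>
    : std::is_fundamental<Type> {};

// Slow path: chosen when the trait is false; saves element by element.
template <class Archive, class T>
void save_array_impl(Archive& ar, const T* p, std::size_t n, std::false_type) {
    assert(p != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) ar.save(p[i]);
}

// Fast path: chosen when the trait is true; hands the whole block over.
template <class Archive, class T>
void save_array_impl(Archive& ar, const T* p, std::size_t n, std::true_type) {
    assert(p != nullptr || n == 0);
    ar.save_array(p, n);
}

// The tag is an ordinary function argument, so no enable_if is needed.
template <class Archive, class T>
void save_array_dispatch(Archive& ar, const T* p, std::size_t n) {
    typedef typename has_fast_array_serialization<Archive, T>::type tag;
    save_array_impl(ar, p, n, tag());
}
```

Because the fast overload is only instantiated when the tag is std::true_type, an archive without save_array (like the text archive here) still compiles and simply takes the element-wise path.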

> b) what is the point of save_array? why not just invoke
> save_binary
> directly?

Because we might want to do different things than save_binary. Look
back at the thread. I gave four different examples.
>
> c) The same could be said for built-in arrays - just invoke save_binary

same as above.
>
> d) There is no provision for NVP in the non-binary version above while
> in the binary version there is NVP around count. Presumably, these
> are oversights.

The count is not saved by save_array, but separately, and there the
same code as in your version is used. Hence, the count is also stored
as an NVP.

>
> e) The whole thing isn't obvious and it's hard to follow. It couples
> the implementation code in i/o serializer.hpp to a specific kind of
> archive
> adding another dimension to be considered while understanding this
> thing.

The real problem is that you implement the serialization of arrays in
i/o serializer.hpp now. That's why I patched it there. The best
solution would be to move array serialization to a separate header.
>
> f) What about bitwise serializable types which aren't fundamental?
> That is
> structures which don't have things like pointers in them. They have
> the
> same opportunity but aren't addressed. If this is a good idea for
> fundamental
> types, someone is going to want to do them as well - which would
> open up
> some
> new problems.

I mentioned above that this is just what we do for MPI archives now.
This mechanism can easily be extended to binary archives: First you
introduce a new traits class

template <class Type>
struct is_bitwise_serializable
: public is_fundamental< Type >
{};

and then use this trait in the definition of

template <class Type>
struct has_fast_array_serialization<binary_iarchive,Type>
: public is_bitwise_serializable<Type>
{};
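
As a rough illustration of how a user might then opt a non-fundamental type in, here is a sketch. It uses C++11 standard traits instead of Boost.MPL, and toy_binary_iarchive, point3d and node are hypothetical stand-ins chosen for the example, not library types.

```cpp
#include <cassert>
#include <type_traits>

// Trait from the message above: fundamental types qualify by default.
template <class Type>
struct is_bitwise_serializable : std::is_fundamental<Type> {};

// Hypothetical stand-in for binary_iarchive.
struct toy_binary_iarchive {};

// Default: no fast array serialization.
template <class Archive, class Type>
struct has_fast_array_serialization : std::false_type {};

// The archive defers to is_bitwise_serializable.
template <class Type>
struct has_fast_array_serialization<toy_binary_iarchive, Type>
    : is_bitwise_serializable<Type> {};

// Hypothetical user type with no pointer members.
struct point3d { double x, y, z; };

// The user opts in explicitly by specializing the trait.
template <>
struct is_bitwise_serializable<point3d> : std::true_type {};

// Hypothetical type holding a pointer: deliberately not opted in.
struct node { int value; node* next; };

static_assert(has_fast_array_serialization<toy_binary_iarchive, double>::value,
              "fundamental types take the fast path");
static_assert(has_fast_array_serialization<toy_binary_iarchive, point3d>::value,
              "opted-in structs take the fast path");
static_assert(!has_fast_array_serialization<toy_binary_iarchive, node>::value,
              "types with pointer members stay on the element-wise path");
```

The opt-in is explicit in this sketch: nothing here inspects a struct for pointer members automatically, so the user remains responsible for specializing the trait only where a bitwise copy is valid.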

>
> g) I don't see endianness addressed anywhere. I believe that protocols
> such as XDR and MPI are designed to transmit binary data between
> heterogeneous machines. Suppose I save an array of ints as a sequence
> of raw bits on an Intel-type machine. Then I use load_binary to reload
> the same sequence of bits on a SPARC-based machine. I won't get back
> the same data values. So either the method will have to be limited to
> collections of bytes, or some extra machinery would have to be added
> to conditionally do the endian translation depending on whether the
> source and target machines match.

That is EXACTLY the reason why I propose to call save_array instead
of save_binary. In a portable binary archive, save_array and
load_array will take care of the endianness issue. XDR, CDR, MPI,
PVM, HDF and other libraries do it just like that.
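
A small sketch of what this could look like for one element type. The class toy_portable_archive and its fixed big-endian layout are assumptions made for illustration; they do not describe how any of the libraries named above actually implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical portable archive: every 32-bit value is stored big-endian,
// whatever the byte order of the machine that writes or reads it.
struct toy_portable_archive {
    std::vector<unsigned char> bytes;
    std::size_t read_pos = 0;

    // Shifts operate on values, not on memory, so the byte sequence
    // produced is the same on little- and big-endian hosts.
    void save_array(const std::uint32_t* p, std::size_t n) {
        assert(p != nullptr || n == 0);
        bytes.reserve(bytes.size() + 4 * n);
        for (std::size_t i = 0; i < n; ++i) {
            bytes.push_back(static_cast<unsigned char>(p[i] >> 24));
            bytes.push_back(static_cast<unsigned char>(p[i] >> 16));
            bytes.push_back(static_cast<unsigned char>(p[i] >> 8));
            bytes.push_back(static_cast<unsigned char>(p[i]));
        }
    }

    // Reassemble each value from its four stored bytes.
    void load_array(std::uint32_t* p, std::size_t n) {
        assert(read_pos + 4 * n <= bytes.size());
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned char* b = &bytes[read_pos];
            p[i] = (static_cast<std::uint32_t>(b[0]) << 24) |
                   (static_cast<std::uint32_t>(b[1]) << 16) |
                   (static_cast<std::uint32_t>(b[2]) << 8) |
                   static_cast<std::uint32_t>(b[3]);
            read_pos += 4;
        }
    }
};
```

A plain save_binary would have copied the host's in-memory bytes unchanged. Here the archive still receives the whole array in one call, so it could also choose a bulk byte swap, or no swap at all when host and wire order agree.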

>
> f) Similar issues confront bitwise serialization of floats and
> doubles. I believe the "canonical" format for floats/doubles is IEEE
> 80 bit. (I think that's what XDR uses - I could be wrong.) I believe
> that many machines store floats as 32-bit words and doubles as 64-bit
> words. I doubt they are all guaranteed to have the same format as far
> as exponent, sign and representation of value. So that's something
> else to be addressed. Of course endianness plays into this as well.

Same answer as above. IEEE has 32 and 64 bit floating point types,
and they are used also by XDR and CDR. As far as I know the 80 bit
type is an Intel extension. Again you see that save_binary and
load_binary will not do the trick. That's why we need save_array and
load_array.
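
For the floating point case, a portable save_array could start from something like the following sketch. The helper names double_to_bits and bits_to_double are hypothetical, and the sketch assumes a host whose float and double are the IEEE 32 and 64 bit formats; the static_asserts make that assumption explicit rather than silent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile on hosts whose float/double are not IEEE 32/64 bit.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "float is expected to be IEEE 32 bit");
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "double is expected to be IEEE 64 bit");

// Hypothetical helper: reinterpret a double as its 64-bit pattern.
// memcpy avoids the aliasing problems of a pointer cast.
inline std::uint64_t double_to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Inverse of double_to_bits.
inline double bits_to_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

Once the value is available as an integer bit pattern, the archive can emit it in a fixed byte order exactly as it would for an integer array. This relies on the further assumption that the host stores floating point and integer values in the same byte order.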

> g) I looked at the "benchmark" results. I notice that they are run
> with -O2 on the gcc compiler. Documentation for the gcc compiler
> command line specifies that this optimization level does not enable
> automatic inlining for small functions. This is a crucial optimization
> for the serialization library to be effective. The library is written
> with the view that compilers will collapse inline code when possible.
> But this happens only in the gcc compiler when the -O3 optimization
> switch is used. Furthermore, with this compiler, it might be necessary
> to also specify the max-inline-insns-recursive-auto switch to gain
> maximum performance on boost-type code. This latter is still under
> investigation.

You can drop the double quotes around the "benchmark". I have been
involved in benchmarking of high performance computers for 15 years,
and know what I'm doing. I have also run the codes under -O3, with
the same results.

Regarding the inlining: -O2 inlines all the functions that are
declared as inline. -O3 in addition attempts to inline small
functions that are not declared inline. I surely hope that all such
small functions in the library are declared inline, and the fact that
there is no significant difference in performance between -O2 and -O3
suggests that they are.

>
> h) my own rudimentary benchmark (which was posted on this list)
> used 1000
> instances of a structure which contained all C++ primitive data types
> plus an std::string made up of random characters. It was compiled as
> a boost test and built with bjam so it used the standard boost options
> for release mode. It compared timings against using raw stream i/o.
> Timings for binary_archive and standard stream i/o were comparable.
> I'm still working on this test. The problem is that standard
> stream i/o
> uses text output/input. Of course no one for whom performance is
> an issue
> would do this so I have to alter my timing test to use binary i/o to
> the standard stream as a comparison. But for now, I'm comfortable
> in asserting that there is not a large performance penalty using
> serialization
> as opposed to "rolling your own". As an aside, the test executable
> doing
> the same test for 3 different types of archives and all primitive data
> types only came to 238K. So there isn't a significant code bloat
> issue
> either.

Nobody who cares for performance would use text based I/O. All your
benchmark shows is that the overhead of the serialization library is
comparable to that of text-based I/O onto a hard disk. For this
purpose you are right, the overhead can be ignored. On the other
hand, my benchmark used binary I/O into files and into memory
buffers, and that's where the overhead of the serialization library
really hurts. A 10x slowdown is horrible and makes the library
unusable for high performance applications.
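
For readers who want to try this kind of comparison themselves, here is a sketch of the two ways of writing an array of doubles into a memory buffer. It is not the benchmark referred to above; the function names and the timing helper are hypothetical, and no particular speedup should be read into it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Element-wise: one append per value, as a generic per-element save does.
inline void save_elementwise(std::vector<char>& buf, const double* p,
                             std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const char* b = reinterpret_cast<const char*>(&p[i]);
        buf.insert(buf.end(), b, b + sizeof(double));
    }
}

// Block: grow the buffer once and copy the whole array in one call.
inline void save_block(std::vector<char>& buf, const double* p,
                       std::size_t n) {
    assert(p != nullptr || n == 0);
    const std::size_t old_size = buf.size();
    buf.resize(old_size + n * sizeof(double));
    if (n != 0) std::memcpy(buf.data() + old_size, p, n * sizeof(double));
}

// Hypothetical timing helper: wall-clock seconds taken by one call of f.
template <class F>
double time_seconds(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Both functions produce the same bytes, so any timing difference between them comes from how the data is moved, not from what is stored. Meaningful numbers need large arrays, repeated runs and an optimized build.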

>
> i) somehow I doubt that this archive type has been tested with all
> the serialization test suite. Instructions for doing so are in the
> documentation and the serialization/test directory includes batch files
> for doing this with one's own archives. Was this done? What were the
> results? With which compiler? It costs nothing to do this.

Just ask if you had a doubt. The short answer is "I have done this".
After adding the fast array serialization to the binary and
polymorphic archives, I ran all your regression tests, without any
problem (using gcc 4 under MacOS X).

>
> end of observations
> ===================
>
> Admittedly, this is only a cursory examination. But it's more than
> enough to make me skeptical of the whole idea. If you want, I could
> expand upon my reasons for this view, but I think they should be
> obvious.

I will stop this e-mail here since as you can see there is nothing to
be skeptical about. Actually I had already replied to all these
issues before.

I would appreciate it if you read my replies instead of making the same
statements over and over again without considering my arguments. The
endianness issue you raise above is, as you can see from my reply,
not a problem in my approach, but rather a fatal argument against your
proposal to use save_binary instead.

I will reply to your alternative proposal in a second e-mail.

Matthias

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk