From: Robert Ramey (ramey_at_[hidden])
Date: 2005-10-15 16:37:21


Matthias Troyer wrote:
> On Oct 11, 2005, at 6:45 PM, Robert Ramey wrote:
>
> Actually the way I see it you deal with the serialization of
> pointers, the class versioning and object tracking there. All of
> these get serialized through special types, created by strong
> typedefs, and the archives can then override the serialization of
> these classes. Please correct me if I'm wrong, but the built-in types
> like int, double, etc. are actually all dispatched to the archive,
> and not serialized by the [io]serializer.

correct

>The only exception seem to
> be pointers (which should be handled by the serialization library),
> and C-arrays.

and enums

> It would thus make sense to put the serialization of
> arrays into a separate header, just as you have done for std::vector,
> and as I will do soon for other classes.

disagree - default serialization of C arrays is included here since a C
array is a built-in type and it does have a universal default
implementation.

> However there are also other options as you pointed out: the archive
> classes could override the serialization of arrays. As long as this
> stays limited to arrays there will be no MxN scaling problem, but
> there is still a problem of code duplication, since each archive type
> implementing fast array serialization has to override the
> serialization of arrays.

Disagree - all overrides can be placed in an archive adaptor class which
takes the adaptee class as a template argument. This is written once, as
in my attachment above, and can then be applied to any working archive.
Using something like my attachment above (see the sketch after this list):
a) substitute your improved special overrides for vectors and arrays.
b) if possible, add a compile-time assertion that traps the cases
where the adaptor is applied to a base class that isn't appropriate.
c) by hand, make mini class declarations equivalent to templated typedefs
(since C++ doesn't have them) for all the combinations that you now
know will work.
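
A minimal sketch of the pattern I have in mind - the names here are
hypothetical, and this is only the shape of the attachment, not its
actual contents:

    #include <cstddef>

    // layers save_array onto any working output archive passed as
    // the template argument
    template<class BaseArchive>
    class fast_array_oarchive_adaptor : public BaseArchive {
    public:
        // forward the stream to the adaptee
        template<class Stream>
        explicit fast_array_oarchive_adaptor(Stream & s) :
            BaseArchive(s)
        {}
        // the one extra primitive the adaptor supplies; assumes the
        // adaptee provides save_binary, as the binary archives do
        template<class T>
        void save_array(const T * p, std::size_t count) {
            this->save_binary(p, count * sizeof(T));
        }
    };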

>This is also error-prone since we have to
> tell the implementors of archives supporting fast array serialization
> that they should not forget overriding serialization of built-in
> arrays.

nope, they can do one of the following:
a) use one of your "templated typedef" classes above
b) apply the fast_archive_adaptor to any other archive class they
want. Ideally, it would trap if the application isn't appropriate,
but that may not be worth implementing. It's just boilerplate
code for any combination of adaptor/adaptee - you could
even make it a macro if you wanted to (see the sketch below).
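
For example, point a)'s "templated typedef" can just be a mini derived
class (binary_oarchive is used only for concreteness; a real Boost
archive may require its *_impl machinery to be derived from, so treat
this purely as a sketch):

    #include <ostream>
    #include <boost/archive/binary_oarchive.hpp>

    // a premade combination, standing in for the template alias
    // that C++ doesn't have
    class fast_binary_oarchive :
        public fast_array_oarchive_adaptor<
            boost::archive::binary_oarchive>
    {
        typedef fast_array_oarchive_adaptor<
            boost::archive::binary_oarchive> base;
    public:
        explicit fast_binary_oarchive(std::ostream & os) : base(os) {}
    };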

>>> 2. boost/serialization/vector.hpp is also modified to dispatch to
>>> save_array and load_array where possible. I don't think that this is
>>> a problem?

I do. It buries knowledge about the archive in the serialization of a type.
What happens when someone comes along with "checked_archive" or
"thread_safe_archive"? Are we going to decorate the implementation
of all the serializations for each one?

>> That would mean that users are including save/load_array even if they
>> don't want them or want to use their own versions. Oh - then
>> documentation has to be enhanced to explain all this internal behavior.
>
> Actually the cost is minimal if the archive does not support fast
> save/load_array. The has_fast_array_serialization.hpp header only
> consists of the default traits:

One still has to include the header. This violates the boost principle -
don't pay for what you don't use. Actually the boost principle would be -
don't even know about what you don't use - which is better anyway.

> template <class Archive, class Type>
> struct has_fast_array_serialization : public mpl::bool_<false> {};
>
> and the serialization of a std:vector only contains this minimal
> extension:

> template<class Archive, class U, class Allocator>
> inline void save(Archive & ar, const std::vector<U, Allocator> & t,
>                  const unsigned int /* version */,
>                  typename boost::enable_if<
>                      boost::archive::has_fast_array_serialization<Archive, U>
>                  >::type * = 0)
> {
>     const boost::archive::container_size_type count(t.size());
>     ar << BOOST_SERIALIZATION_NVP(count);
>     if (count)
>         ar.save_array(boost::detail::get_data(t), t.size());
> }
>
> The cost of parsing these few lines is negligible compared to the
> rest of the serialization library.

All this can easily be moved to the fast_array_archive_adaptor class,
with no loss of generality, convenience, or efficiency.
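
That is, the adaptor's own header (a hypothetical
fast_array_oarchive_adaptor.hpp) can carry both the enable_if overload
above and the trait specialization that switches it on, so plain
archives never see either. A sketch - a real version would also
restrict T to bitwise-serializable types:

    namespace boost { namespace archive {

    // only the adapted archive advertises fast array support
    template<class Base, class T>
    struct has_fast_array_serialization<
        fast_array_oarchive_adaptor<Base>, T
    > : mpl::bool_<true> {};

    }} // namespace boost::archive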

>> I would prefer
>> something like the following:
>>
>> class my_class {
>>     std::vector<int> m_vi;
>> ...
>> };
...

> I find this proposal unacceptable for the following reasons
...

OK, a bad idea - I take it back.

> Now to address your issues:
>
> a) keeping the STL portion small: I don't see this as a valid point
> since, as you can see above, it increases the size of the STL
> serialization code only by a few lines.

It could conflict with someone else's extension/specialization. There
is no downside to including it in the fast_...archive_adaptor class.

> b) "leave the user in control of what's going on": actually this is
> breaking the orthogonality. The user should not influence the
> archives internal behavior. The archive class should decide how to
> serialize the array, not the user. The user can pick between fast and
> slow array serialization by choosing a different archive class.

Just fine by me. Either he chooses the original plain vanilla one or he
chooses one to which your fast_...archive_adaptor has been applied.
He can apply it himself or use your premade "templated typedef"
classes if he's in a hurry (usage sketched below).
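
Usage with the hypothetical fast_binary_oarchive sketched above would
look like any other archive (modulo the caveats already noted):

    #include <fstream>
    #include <vector>
    #include <boost/serialization/vector.hpp>

    int main() {
        std::ofstream ofs("data.bin", std::ios::binary);
        fast_binary_oarchive oa(ofs);         // the adapted archive
        std::vector<double> v(1000000, 3.14);
        oa << v;                              // dispatches to save_array
        return 0;
    }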

> c) development on an independent track: the only interference we have
> is this one file vector.hpp.

again, no downside in factoring your special features into your own
special adaptor.

Note that using the adaptor approach has another huge benefit. Suppose
someone else comes up with another adaptor - a checked_archive adaptor
which quadruple-checks the save/load by trapping NaN for floats,
and who knows what else.

One could then apply either or both adaptors to create a new archive
with all the features - all without anyone writing any new code.
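
For instance (names hypothetical again):

    // both features layered onto one archive - nobody wrote new code
    typedef checked_oarchive_adaptor<
                fast_array_oarchive_adaptor<
                    boost::archive::binary_oarchive> >
        checked_fast_binary_oarchive;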

> Why should anyone
> still use the other version? To save the compile time for 5 lines of
> code?

LOL - believe me, someone will want to do it differently. I can't say
how or why - but believe me it will happen. The adaptor approach
lets everyone add their own thing and lets everyone else pick and
choose which combination of things they want to add.

Hate to tell you, Matthias, but someone, somewhere isn't going to like
your changes for some reason. You can either debate the issue with them
or you can factor your improvements so that they are optional. Believe
me - the latter is going to save you a lot of time.

>> b) should save/load array be incorporated into stl collection
>> serialization to make its usage obligatory? I say no, you say yes.
>
> This point b) is where I will not budge, for the reasons explained in
> earlier e-mails. While I could maybe live with the fact that I have
> to override the C-style array serialization in all archives
> supporting fast array serialization, I will never do that for other
> classes, since this again opens the can of worms discussed
> previously. Let me outline it again:
>
> if the vector.hpp serialization stays unchanged, I will have to
> override it in the archive.
>
> Next we'll implement the std::valarray serialization. What should we
> do? Support fast array serialization out of the box, or leave it to
> the archive implementor to override? We'll probably follow the
> example of std::vector and not support it. Now the archive also
> has to provide overrides for std::valarray, which can still be done.
>
> After that we'll implement serialization of ublas matrices. Following
> the above examples we will again not implement support for fast array
> serialization directly, to save a few lines of code. The consequence
> is even worse now: the archive implementation has to override the
> serialization of all ublas matrices, and will either be inefficient,
> or has to have knowledge of implementation details of the ublas
> matrices.

All of these should be in either one fast...adaptor or in separate
adaptors, according to your taste.

> We would be back at both an MxN problem, and will have tight coupling
> between archives and the implementation details of the classes to be
> serialized. We should avoid this at all cost!

Nope, we have at most one adaptor for each "special" type. The
same adaptor applies to all archives (present and future) with which
it is compatible.

> So the real question here is:
>
> "Shall we recommend that the serialization of array-like data
> structures uses fast array serialization by calling save/load_array
> when possible?"
>
> My clear answer is yes, and I will not budge on that. The
> serialization library is useless to me with a 10x performance hit.

your adaptor will fix that for you.

> And many people I talked to do not use Boost.Serialization but their
> own (otherwise inferior) solutions for that reason. I just want to
> mention that vectors with billions of elements are typical sizes for
> many of our problems.

and your adaptor will fix it for them as well.

> The real question is where to draw the line between using fast array
> serialization and not using it?

> - I think we can agree that classes like multi_array or ublas
> matrices and vectors should be recommended to use it wherever possible

the user will want to decide which adaptors, if any, to use.

> - The same should be true for std::valarray.

yep

> - To stay consistent we should then also use it for std::vector
> - What about C-arrays? Since I rarely actually use them in raw form
> in my code, and never for large sizes, I have no strong personal
> preference. It would just be consistent and speed up serialization at
> negligible compile time cost to also use the fast option there, but
> if you veto it I could live with it.

you can include it in or exclude it from your adaptor as you wish.

> Actually it might be an adaptor only in the case of the binary
> archive. Other archives I mentioned (such as the MPI archive) will
> have to support fast array serialization directly to have any chance
> of being usable.

I would disagree with that. An MPI archive might have it built in, but
it could just as well use the adaptor. All the magic happens at
compile time - there is no run-time overhead. So the only considerations
are design and flexibility.

> Regarding the binary archives: if your concern is that it will make
> it harder for you to maintain, then I could, if you want, propose to
> submit the fast array version as a replacement for the existing one
> and take over its maintenance. That will make your task smaller and
> in the review of the submission we can hear if someone wants to keep
> the existing version.

In general, I want no more coupling than is absolutely necessary. I don't
think it's necessary here. You can get everything you want and more by
using an archive adaptor.

Robert Ramey

