Subject: [boost] [serialization] proposed improvements: forward-compatibility of serialization
From: Andrzej Horoszczak (ahoroszczak_at_[hidden])
Date: 2014-04-14 05:15:11


Hello Everybody,

(It seems that my previous post contained only the title, so I repeat the full posting below.)

We started (again) using boost::serialization in our multi-platform distributed application about one year ago. However, we have hit a roadblock with forward compatibility of serialization, which is of major importance to us since we cannot force all users to upgrade to the same or a newer version of the application at the same time.

Firstly, the question: would you, or someone you know, be interested in doing such improvement work? We are a small company racing against some die-or-prosper deadlines, and we simply do not have enough resources to do it on our own in the time available, but we could sponsor such work.

Secondly, the problem. To avoid ambiguity I have put together the following description and our own suggestion for a solution.

The project requirement is centred on improving multi-version class compatibility in binary archives, i.e. more flexibility when we read or transfer a class that has been modified in the C++ code. We need to go beyond what the current object versioning in Boost offers:

http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/tutorial.html#versioning

Specifically, we need to add the ability to skip reading unknown or unexpected fields in the non-XML archives we are parsing.
Specific example. The version of the application (the newer one) that saved the archive has the following code:

    struct extensionClass
    {
        std::string moreInfo;
        time_t date;

        template<class Archive>
        void serialize(Archive & ar, const unsigned int version)
        {
            ar & moreInfo;
            ar & date;
        }
    };

    class gps_position
    {
    public:
        template<class Archive>
        void serialize(Archive & ar, const unsigned int version)
        {
            ar & degrees;
            ar & minutes;
            ar & seconds;
            ar & new_field;
        }
        int degrees;
        int minutes;
        float seconds;
        extensionClass new_field;
        gps_position() {}
    };

The version of the software that needs to read the archive is an older application that implements the gps_position class in the following way:

    class gps_position
    {
    public:
        template<class Archive>
        void serialize(Archive & ar, const unsigned int version)
        {
            ar & degrees;
            ar & minutes;
            ar & seconds;
        }
        int degrees;
        int minutes;
        float seconds;
        gps_position() {}
    };

The purpose of the project is to allow serialization in the older version of the application to continue reading the archive gracefully, by simply omitting the information related to extensionClass new_field, i.e. by properly advancing the read position to the beginning of the next class in the archive.

It is desirable to expose in the programming interface the information that an incompatibility was detected, so that it can be escalated either to the user interface or to higher-level code logic that might decide, for instance, to change the communication protocol.

It is critically important for us to keep the archive size small while adding this capability. We prefer to use eos::portable_iarchive, which provides varinteger support for multi-OS compatibility (size of int, little vs. big endian), so we prefer a solution that would work with any archive; however, we will be OK with a solution for the standard Boost binary archive.
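To make the failure mode above concrete, here is a minimal sketch using only the standard library. It assumes a toy byte format that stores field values back to back with no size information; this is not the actual layout of a Boost binary archive, and Buffer, PositionV1, PositionV2, save_v2, load_v1 and old_reader_sees are hypothetical names invented for the illustration. It shows how an older reader gets the first object right and then starts the second object in the middle of data it does not know about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy byte buffer; values are copied in host byte order (not Boost's format).
struct Buffer {
    std::vector<unsigned char> bytes;
    std::size_t pos = 0;

    template <class T> void put(const T& v) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }
    template <class T> T get() {
        assert(pos + sizeof(T) <= bytes.size());  // stay inside the buffer
        T v;
        std::memcpy(&v, bytes.data() + pos, sizeof(T));
        pos += sizeof(T);
        return v;
    }
};

// Hypothetical stand-ins for the newer and older gps_position layouts.
struct PositionV2 {
    std::int32_t degrees, minutes;
    float seconds;
    std::string moreInfo;  // part of the added extension field
    std::int64_t date;     // part of the added extension field
};
struct PositionV1 {
    std::int32_t degrees, minutes;
    float seconds;
};

// Newer writer: known fields first, then the extension data.
void save_v2(Buffer& b, const PositionV2& p) {
    b.put(p.degrees);
    b.put(p.minutes);
    b.put(p.seconds);
    b.put(static_cast<std::uint32_t>(p.moreInfo.size()));
    for (char c : p.moreInfo) b.put(c);
    b.put(p.date);
}

// Older reader: knows only the three original fields and cannot skip the rest.
PositionV1 load_v1(Buffer& b) {
    PositionV1 p;
    p.degrees = b.get<std::int32_t>();
    p.minutes = b.get<std::int32_t>();
    p.seconds = b.get<float>();
    return p;
}

// Write two new-style records, then read them back with the old reader.
std::vector<PositionV1> old_reader_sees() {
    Buffer b;
    save_v2(b, PositionV2{35, 59, 24.5f, "hi", 1000});
    save_v2(b, PositionV2{10, 20, 30.0f, "ok", 2000});
    std::vector<PositionV1> seen;
    seen.push_back(load_v1(b));  // correct: the known fields come first
    seen.push_back(load_v1(b));  // wrong: starts inside the first record's extension data
    return seen;
}
```

The second call to load_v1 begins at the string length prefix of the first record rather than at the second record, which is exactly the desynchronisation that a skip mechanism would have to prevent.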
Thirdly, the proposal. The most promising solution that we envisage is to add recursive size information associated with the number identifying the class (already placed in front of every data object). By recursive size information I mean size not in terms of bytes of raw data, but in terms of the number of fields/objects in the class. The size information would then be expanded until it can be expressed in terms of the C++11 standard predefined POD sizes.

For instance, for a simple class:

    class SimplePODs
    {
        uint32_t firstField;
        uint64_t secondField;
        float thirdField;
    };

The associated (leading) size information for the SimplePODs class is 3. Then, within the class itself, we already have the identifiers of each field's type. Since they are PODs, we can maintain a global (and per-archive) dictionary of sizes for each identifier, in this case 4, then 8, then 4.

And then it follows:

    class AggregateClass
    {
        SimplePODs first_field;
        char second_field;
    };

In this case the size information (in the global dictionary) for the identifier associated with AggregateClass is 2. Then, within the class itself, the size for first_field would be 3 and for second_field it is of course 1.

I believe this can be done as part of the version and object tracking process, so the performance would still be high and, most importantly, the incremental size overhead both in memory and in the archive would be relatively small (only one size entry per type of object contained in the archive, and no size overhead for POD fields).

By the way, this approach would also increase multi-OS compatibility. For instance, a class on the source computer might be a 32-bit compilation of:

    class CrossPlatform
    {
        int idont_care_about_size;
    };

During saving, the field idont_care_about_size would actually be saved and identified as uint32_t. The same CrossPlatform class on the destination machine would be compiled as a 64-bit application, but since the target int would be larger than the source we can do silent promotion.
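The dictionary-driven skipping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: every object in the stream is preceded by a one-byte type identifier, the dictionary has already been read from the archive, and all names (TypeId, TypeInfo, Dictionary, skip_object, put_pod, sample_aggregate, demo_skip) are hypothetical rather than part of Boost.Serialization. Variable-length data such as strings is not covered by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical type identifiers; a real archive would assign these itself.
enum TypeId : std::uint8_t { U32 = 1, U64 = 2, F32 = 3, CHAR8 = 4,
                             SIMPLE_PODS = 10, AGGREGATE = 11 };

// One dictionary entry per type: byte size for PODs, field count for classes.
struct TypeInfo {
    bool is_pod;
    std::size_t size_or_count;
};
using Dictionary = std::map<std::uint8_t, TypeInfo>;

Dictionary make_dictionary() {
    return {
        {U32, {true, 4}}, {U64, {true, 8}}, {F32, {true, 4}}, {CHAR8, {true, 1}},
        {SIMPLE_PODS, {false, 3}},  // three POD fields
        {AGGREGATE, {false, 2}},    // a SimplePODs field and a char field
    };
}

// Advance pos past one tagged object without knowing its C++ type.
void skip_object(const std::vector<std::uint8_t>& data, std::size_t& pos,
                 const Dictionary& dict) {
    if (pos >= data.size()) throw std::runtime_error("truncated archive");
    const auto it = dict.find(data[pos++]);
    if (it == dict.end()) throw std::runtime_error("unknown type id");
    const TypeInfo& info = it->second;
    if (info.is_pod) {
        if (data.size() - pos < info.size_or_count)
            throw std::runtime_error("truncated archive");
        pos += info.size_or_count;  // PODs: jump over the raw bytes
    } else {
        for (std::size_t i = 0; i < info.size_or_count; ++i)
            skip_object(data, pos, dict);  // classes: skip each field in turn
    }
}

// Append one tagged POD whose payload bytes are irrelevant for skipping.
void put_pod(std::vector<std::uint8_t>& out, TypeId id, std::size_t size) {
    out.push_back(id);
    out.insert(out.end(), size, std::uint8_t{0});
}

// Byte stream for one AggregateClass { SimplePODs; char } object.
std::vector<std::uint8_t> sample_aggregate() {
    std::vector<std::uint8_t> out;
    out.push_back(AGGREGATE);
    out.push_back(SIMPLE_PODS);
    put_pod(out, U32, 4);
    put_pod(out, U64, 8);
    put_pod(out, F32, 4);
    put_pod(out, CHAR8, 1);
    return out;
}

// Usage: skip a whole AggregateClass record and land on the next object.
std::size_t demo_skip() {
    std::vector<std::uint8_t> data = sample_aggregate();
    const std::size_t aggregate_bytes = data.size();
    put_pod(data, U32, 4);  // the next object in the archive
    std::size_t pos = 0;
    skip_object(data, pos, make_dictionary());
    assert(pos == aggregate_bytes);
    return pos;
}
```

An older reader that knows fewer fields than the dictionary reports for a class could call skip_object once per extra field, and the number of skipped fields could serve as the signal that an incompatibility was detected.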
Of course, the other way around, we can throw an exception if the actual value contained in idont_care_about_size exceeds what the target's POD size can hold.

OK, I hope that my explanation is clear, and I would appreciate any feedback that you might have.

And finally, let me state that we greatly appreciate (and use) a lot of Boost work, and in particular we consider the boost.serialization approach to be probably the best possible under the current constraints of the language.

Best regards,
Andrew Horoszczak
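The size conversions described in this message (silent promotion when the target type is wider, an exception when a stored value does not fit) could look roughly like the sketch below. It assumes integer types of the same signedness on both sides, and convert_archived, narrowing_is_rejected and demo_conversions are hypothetical names, not an existing Boost function.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Convert an integer read from the archive (From) to the local field type (To).
// Widening always succeeds; narrowing throws when the value is out of range.
template <class To, class From>
To convert_archived(From value) {
    static_assert(std::numeric_limits<To>::is_integer &&
                  std::numeric_limits<From>::is_integer,
                  "this sketch handles integer types only");
    static_assert(std::numeric_limits<To>::is_signed ==
                  std::numeric_limits<From>::is_signed,
                  "this sketch assumes the same signedness on both sides");
    if (value < std::numeric_limits<To>::lowest() ||
        value > std::numeric_limits<To>::max())
        throw std::range_error("archived value does not fit the target type");
    return static_cast<To>(value);
}

// A 64-bit value above the 32-bit range must be rejected by a 32-bit reader.
bool narrowing_is_rejected() {
    try {
        (void)convert_archived<std::int32_t>(std::int64_t{1} << 40);
    } catch (const std::range_error&) {
        return true;
    }
    return false;
}

// Usage: promotion is silent, in-range narrowing works, overflow throws.
void demo_conversions() {
    assert(convert_archived<std::int64_t>(std::int32_t{-5}) == -5);
    assert(convert_archived<std::int32_t>(std::int64_t{1000}) == 1000);
    assert(narrowing_is_rejected());
}
```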