Subject: Re: [Boost-users] Serialization: BOOST_CLASS_EXPORT changes between 1.38 and 1.52
From: Nathan Whitehorn (nwhitehorn_at_[hidden])
Date: 2013-02-07 12:11:47


On 02/07/13 10:36, Robert Ramey wrote:
> Nathan Whitehorn wrote:

[trimmed]

>> So my questions:
>> 1. Is it possible for things that would return GUIDs of NULL to try
>> harder and look in a global registry instead of silently breaking
>> things? This kind of global lookup was how 1.38 always worked and it
>> seems considerably less fragile.
>
> The old method did instantiation by default so it worked then in
> some cases where the current one won't. So it seems "less fragile".
> But I think that's sort of an illusion. It does so at a cost of gratuitous
> instantiations which often are harmless - though non-optimal. But
> the real problem is that it left this out of the hands of the programmer.
> This could lead to silent and surprising behavior. Now we have the
> situation where this behavior can't happen - we have to explicitly
> plan for it. I believe that this leads to less surprising programs - albeit
> at the cost of some surprising behavior at build time.

Thanks for the explanation! What I'm running into is surprising behavior
at *run time* despite a build that apparently works. We've managed to
avoid all the compile and link time redefinition issues so far by some
combination of planning and luck.

>> 2. Is there a way to handle BOOST_CLASS_EXPORT_KEY() sanely in the
>> case of templates without the risk of silent serialization failures
>> -- in all instances of that class -- that depend on global
>> initialization order?
>
> I believe that the best way to do this is to just do an explicit
> instantiation in a cpp file which includes the header containing
> BOOST_CLASS_EXPORT_KEY() and invokes BOOST_CLASS_EXPORT_IMPLEMENT().
> Once compiled, this can be added to a library or DLL. This will result
> in one and only one instance of the class serialization existing in the
> program rather than multiple ones (in the case of DLLs). That means less
> code and, better yet, it eliminates the possibility that the mainline
> module and the DLL have different versions of the code, which would be
> agony of the worst type to track down.
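As a Boost-free illustration of the "declare in the header, instantiate once in a cpp file" pattern described above, here is a minimal sketch using plain C++ explicit instantiation. The `Record` template and its `describe()` member are hypothetical names invented for this example; in the real setup the export key macro would sit next to the `extern template` line and the implementation macro next to the explicit instantiation.

```cpp
#include <cassert>
#include <string>

// ---- what would live in the header ----
template <typename T>
struct Record {
    T value;
    std::string describe() const;  // declared here, defined only in the .cpp part
};

// Explicit instantiation declaration: translation units that include the
// header must not instantiate Record<int> themselves.
// (With Boost.Serialization, the export key macro for the type would go here.)
extern template struct Record<int>;

// ---- what would live in exactly one .cpp file ----
template <typename T>
std::string Record<T>::describe() const {
    return "Record(" + std::to_string(value) + ")";
}

// Explicit instantiation definition: the single copy that ends up in the
// library or DLL.
template struct Record<int>;
```

Any other translation unit can then call `Record<int>{42}.describe()` and link against the one instantiation, rather than generating its own copy.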

This is basically what we were already doing (there were never
BOOST_CLASS_EXPORT() or -- with the exception of some templated things
-- inlineable serialize() routines in header files for the reasons you
mention). What I'm concerned about is a situation like this:

Main library:
Header:
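template<typename T>
class I3Vector : public std::vector<T>, public I3FrameObject /* our base class */ {
private:
  friend class boost::serialization::access;
  template<class Archive> void serialize(Archive& ar, unsigned version);
};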

BOOST_CLASS_EXPORT_KEY(I3Vector<T>) for a variety of T

Implementation:
<Instantiate serialize and BOOST_CLASS_EXPORT_IMPLEMENT(I3Vector<T>) for the
same variety of T>

Second library (a reasonably standard part of the software):
Header:
BOOST_CLASS_EXPORT_KEY(I3Vector<A>) for some other type A

Implementation:
<Instantiate serialize and BOOST_CLASS_EXPORT_IMPLEMENT(I3Vector<A>) >

Third library (written by someone else as an addon):
Implementation:
Serialize an I3Vector<A> *without* including the second library's header

What happens here is that library #3 will usually appear to work -- and
certainly compile and link without issue -- because all the
serialization instantiation was done in library #2. *However*, if
library 3 is loaded before library 2, attempts by both library 3 *and*
library 2 to serialize I3Vector<A> will start failing with an unregistered
class exception. This is because extended_type_info_typeid<I3Vector<A> > is a
singleton and the first instance of it (as well as the now competing
definitions of the classes if not fully inlined) came from library 3
where the GUID template specialization had not happened.

This is an awful problem to have to debug, especially when libraries 2
and 3 are written by unrelated parties and the behavior happens to
depend on the user's choice of load order (the libraries come in through
RTLD at runtime). It means that all possible I3Vector<T> need to have
keys exported in a common header somewhere that you can't possibly avoid
including if you ever use an I3Vector<T>. Otherwise, everything can
break everywhere if that header was not included in whichever library
happens to contain the first occurrence of the type from the
perspective of the (potentially dynamic) linker.
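To make the load-order dependence concrete, here is a toy model of a per-type singleton whose key is fixed by whoever instantiates it first. It is not Boost's implementation; `ToyRegistry`, `load_library2`, `load_library3` and `save_works` are hypothetical names, and a null pointer stands in for "this library never saw the export key".

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

// Toy stand-in for a per-type singleton table: the first caller to touch a
// type fixes its key, and later callers get the existing entry unchanged.
class ToyRegistry {
public:
    // key may be nullptr, meaning "the caller never saw an export key".
    template <typename T>
    void instantiate(const char* key_seen_by_caller) {
        // emplace does nothing if the type is already present.
        entries_.emplace(std::type_index(typeid(T)), key_seen_by_caller);
    }

    template <typename T>
    std::string key_for_saving() const {
        auto it = entries_.find(std::type_index(typeid(T)));
        if (it == entries_.end() || it->second == nullptr)
            throw std::runtime_error("unregistered class");
        return it->second;
    }

private:
    std::map<std::type_index, const char*> entries_;
};

struct A {};
template <typename T> struct I3Vector {};

// Library 2 included the header with the export key; library 3 did not.
inline void load_library2(ToyRegistry& r) { r.instantiate<I3Vector<A>>("I3Vector<A>"); }
inline void load_library3(ToyRegistry& r) { r.instantiate<I3Vector<A>>(nullptr); }

// Returns true if saving an I3Vector<A> succeeds after loading in this order.
inline bool save_works(bool library3_first) {
    ToyRegistry r;
    if (library3_first) { load_library3(r); load_library2(r); }
    else                { load_library2(r); load_library3(r); }
    try { r.key_for_saving<I3Vector<A>>(); return true; }
    catch (const std::runtime_error&) { return false; }
}
```

In this model `save_works(false)` succeeds and `save_works(true)` fails for every user of the type, which mirrors the symptom: nothing changes at compile or link time, only the order in which the libraries are loaded.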

>> 3. Is it possible to change the GUID set in the extended type info
>> object of a pointer_[i/o]serializer at runtime after the class has
>> been added to the export registry?
>
> I have never considered this. I don't see what this would be used for.
> The singleton class table is never modified after it is constructed
> (before main is called). This is necessary for the serialization library
> to be thread-safe.

What I was hoping to do is to replace (still before main is called) any
possible NULL GUIDs for a class with a non-NULL one if the relevant
extended_type_info ever gets instantiated with a non-NULL GUID (which
would require some changes to how the extended_type_info_typeid
constructor works, but that's a separate issue). The idea would be that
the GUID attached to a BOOST_CLASS_EXPORT(), when it runs, would be the
final word on the matter instead of a leftover NULL from an
instantiation in some place that didn't know about the GUID.
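A rough sketch of that idea, again as a toy model rather than a patch to Boost: a NULL key is treated as a placeholder that a later non-NULL key may overwrite. `UpgradingRegistry` and `final_key` are hypothetical names, and the sketch assumes all registration happens single-threaded before main, as in the discussion above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Toy registry in which a NULL key never blocks a later real key.
class UpgradingRegistry {
public:
    template <typename T>
    void instantiate(const char* key) {
        // operator[] creates the slot as nullptr if the type is new.
        const char*& slot = entries_[std::type_index(typeid(T))];
        if (slot == nullptr) slot = key;  // the first non-NULL key is final
    }

    // Returns the registered key, or an empty string if there is none.
    template <typename T>
    std::string key() const {
        auto it = entries_.find(std::type_index(typeid(T)));
        if (it == entries_.end() || it->second == nullptr) return std::string();
        return it->second;
    }

private:
    std::map<std::type_index, const char*> entries_;
};

struct A {};
template <typename T> struct I3Vector {};

// Registers the type once with and once without a key, in either order.
inline std::string final_key(bool null_first) {
    UpgradingRegistry r;
    if (null_first) {
        r.instantiate<I3Vector<A>>(nullptr);
        r.instantiate<I3Vector<A>>("I3Vector<A>");
    } else {
        r.instantiate<I3Vector<A>>("I3Vector<A>");
        r.instantiate<I3Vector<A>>(nullptr);
    }
    return r.key<I3Vector<A>>();
}
```

With this policy the result no longer depends on which library touched the type first: both orders end with the key "I3Vector<A>".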

>> 4. Are there any suggested mechanisms for local hacks, given that we
>> control the archive implementation, to implement 1-3 without changes
>> to boost?
>
> To re-summarize my suggestion above.
>
> a) change all the headers to use BOOST_CLASS_EXPORT_KEY()
> b) make a small *.cpp file for each header which includes the header
> and invokes BOOST_CLASS_EXPORT_IMPLEMENT().
> c) add your small *.cpp file to your library - either static library or dll.
> d) while you're at it, you might want to consider adding the serialize,
> save, and load functions for the class to the *.cpp file and not making
> them inline. This will eliminate any code bloat generated by the
> serialization library. If your DLLs are dynamically loadable, they will
> only occupy memory when the classes they refer to are actually being
> used at runtime.
> (just don't load/unload the DLLs while multi-threading - use a mutex!)
>
> It seems you've touched upon the issue regarding serialization of
> template classes. This was also touched upon in a previous email.
> Currently we have to explicitly instantiate any templates we want
> to serialize. Automatic instantiation of template-generated classes
> using some combination of enable_if, partial specialization and who
> knows what else is interesting to consider, but likely much trickier
> than first meets the eye. Also our "guid" is a string which can only be
> processed at runtime. Replacing this with a "guid" generated at
> compile time from the class name might make some things possible
> which weren't before. This is sort of irrelevant to your current
> situation, but I like to keep the pot boiling.
>
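For what a compile-time "guid" derived from the class name might look like, here is one possible sketch using a constexpr FNV-1a hash of the name string. This is only an assumption about the general shape of such a scheme, not anything Boost provides; `fnv1a`, `compile_time_guid`, `Foo` and `Bar` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit FNV-1a hash, written recursively so it is a valid C++11 constexpr.
constexpr std::uint64_t fnv1a(const char* s,
                              std::uint64_t h = 14695981039346656037ULL) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 1099511628211ULL);
}

// Hypothetical trait: no primary definition, so using a type without a key
// is a compile error instead of a silent NULL at run time.
template <typename T> struct compile_time_guid;

struct Foo {};
struct Bar {};

template <> struct compile_time_guid<Foo> {
    static constexpr std::uint64_t value = fnv1a("Foo");
};
template <> struct compile_time_guid<Bar> {
    static constexpr std::uint64_t value = fnv1a("Bar");
};

// The ids are ordinary constant expressions, usable in static_assert,
// template arguments or switch labels.
static_assert(compile_time_guid<Foo>::value != compile_time_guid<Bar>::value,
              "distinct names should hash to distinct ids");
```

A hash is not collision-free in general, so a real scheme would still need some way to detect two class names mapping to the same id.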

Thanks for the suggestions. They mostly reflect what we were already
doing -- people get beaten with a stick if they try to instantiate
serialize methods in header files or multiple times and they are usually
kept in implementation files for that reason. We've mostly given up on
automatic instantiation, but kept it easy to extend if you want to (see
the problem with I3Vector above) -- although it would be amazing if you
figured out how to do it.
-Nathan

