Subject: Re: [boost] [serialization] Dealing with any tainted types.
From: Robert Ramey (ramey_at_[hidden])
Date: 2011-01-13 16:38:50


David Sankel wrote:
> The serialization library allows for the painless serialization of
> most data types and even polymorphic types like variants and vectors.
>
> There is one type, however, that doesn't have a serialize function
> that makes sense. That type is any. It is easy to see that if we make
> an any serialize function, we're going to need to make some
> assumptions at least about which subset of types it might contain.
>
> I have a complex algebraic datatype, lets call it C, that has a
> member (that has a member...) of type T. T is "tainted" with a member
> of type any. I'm looking at the possible ways to serialize C with
> boost.serialization. We can know, at runtime, which subset of types
> the any member of T can contain and how to serialize each of those
> types. Lets call the dictionary type that has that information D.
>
> Here are the options I've come up with so far...
>
> Option 1:
>
> Make some global variable of type D, called d. Before calling any
> serialization of C, I ensure d has the correct value. The serialize
> function for T uses d to figure out how to serialize the any member.
>
> This option would certainly work, but I'm using a global variable as a
> workaround of the fact that I cannot add arguments to T's serialize
> function. No points for beauty here.

Take a look at how shared_ptr is serialized. It seems to me a similar
problem. This was handled by adding a "helper" class just for the
shared_ptr type. Such a "helper" could hold the otherwise "global"
variable just for that archive instance, thus maintaining the
thread-safe characteristic of the library.
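
To make the idea concrete, here is a minimal sketch that does not use the
real library at all. ToyOArchive, AnyDict, T and save_with are all made-up
names for illustration, and the "any" member is reduced to a tag plus a
string payload. The only point is that the dictionary lives inside the
archive instance rather than in a global.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the dictionary "D": maps a type tag to the
// name of the routine that would write that type.
struct AnyDict {
    std::map<std::string, std::string> writers;
};

// Hypothetical toy archive. The helper state is a member of the archive,
// so two archives (possibly on two threads) never share it.
class ToyOArchive {
public:
    explicit ToyOArchive(const AnyDict& d) : dict_(d) {}
    const AnyDict& dict() const { return dict_; }
    std::ostringstream out;
private:
    AnyDict dict_;  // per-archive copy instead of a global variable
};

// Hypothetical "tainted" type: tag says what the any-like member holds.
struct T {
    std::string tag;
    std::string payload;
};

// serialize consults the dictionary carried by the archive it was given.
inline void serialize(ToyOArchive& ar, const T& t) {
    auto it = ar.dict().writers.find(t.tag);
    if (it == ar.dict().writers.end())
        ar.out << "unknown:" << t.tag;
    else
        ar.out << it->second << ":" << t.payload;
}

// Convenience wrapper: build an archive around d, save t, return the text.
inline std::string save_with(const AnyDict& d, const T& t) {
    ToyOArchive ar(d);
    serialize(ar, t);
    return ar.out.str();
}
```

Two archives built from different dictionaries give different results for
the same T, and neither one touches shared state.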

> Option 2:
>
> Instead of serializing a value of type C, serialize a value of type
> "struct CWithDict { C c; D d; }". In this serialization function I
> can use d whenever I need.
>
> Unfortunately the contents of this serialize function would need to
> duplicate most of the functionality of boost.serialize in the first
> place since T is buried deep within C's structure. Although this
> option works, it requires rewriting a bunch of serialize which isn't
> attractive.

Take a look at "extended type info". This extends the RTTI system
to handle types identified by a string at runtime. This is the basis
for the "export" functionality.
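
The general idea of identifying types by a string can be sketched as
below. This is not how the library implements extended type info or
export. TypeRegistry, export_type and create are hypothetical names, and
the sketch only shows a string key selecting a concrete type at runtime.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string name() const = 0;
};
struct Circle : Base { std::string name() const override { return "Circle"; } };
struct Square : Base { std::string name() const override { return "Square"; } };

// Hypothetical registry: external string key -> factory for that type.
class TypeRegistry {
public:
    // Associate a string key with a concrete type.
    template <class Derived>
    void export_type(const std::string& key) {
        factories_[key] = [] { return std::unique_ptr<Base>(new Derived()); };
    }

    // Build an object from its key; returns nullptr for an unknown key.
    std::unique_ptr<Base> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end())
            return nullptr;
        return it->second();
    }

private:
    std::map<std::string, std::function<std::unique_ptr<Base>()>> factories_;
};
```

A loader that reads the key "Circle" from an archive can then ask the
registry for the matching object without knowing the type at compile time.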

> Option 3:
>
> Make an archive wrapper:
>
> template< typename Archive >
> struct DArchive
> {
> Archive a;
> D d;
> };
>
> DArchive would model the Archive concept by forwarding functionality
> to a. However, the T serialize function could access DArchive's d
> member for serialization of the any.
>
> This would solve the problem at the expense of extending the meaning
> of Archive a bit. It seems pretty elegant to me.

This seems similar to the "helper" described above. That is, there is the
concept of a "naked_text_iarchive". text_iarchive looks something like:

class text_iarchive : public naked_text_iarchive, public shared_ptr_helper
{
...
};

This seems similar to what you want to do.
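
A toy version of the same shape, with no library code involved, might
look like this. naked_toy_oarchive, any_helper and toy_oarchive are
hypothetical names, and the helper here just holds the set of tags the
any-like member is allowed to contain.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical "naked" archive: knows how to write strings, nothing else.
class naked_toy_oarchive {
public:
    void write(const std::string& s) { out_ << s << ';'; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Hypothetical helper mixed in statically: the tags the any-like
// member may hold.
class any_helper {
public:
    void allow(const std::string& tag) { allowed_.insert(tag); }
    bool is_allowed(const std::string& tag) const {
        return allowed_.count(tag) != 0;
    }
private:
    std::set<std::string> allowed_;
};

// The archive users see: the naked archive plus the helper.
class toy_oarchive : public naked_toy_oarchive, public any_helper {};

struct T {
    std::string tag;
    std::string payload;
};

// Compiles only for archive types that also provide the helper interface.
template <class Archive>
void serialize(Archive& ar, const T& t) {
    if (ar.is_allowed(t.tag))
        ar.write(t.tag + "=" + t.payload);
    else
        ar.write(std::string("rejected"));
}
```

Note the cost: serialize for T no longer compiles against the naked
archive, so T is tied to archives that carry this particular helper.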

There are a couple of problems with this.

It's become clear that the "Archive Concept" is currently ambiguous
and this needs improvement to support the construction of robust
archives by other users. The current efforts to do this have been
successful. But this ambiguity makes these much more fragile than
they should be.

Some time ago, the concept of a dynamic "helper" was built into the
archive base class. This permitted the attachment at runtime of code
by types which otherwise would not be serializable. The only type
that needed this at the time was shared_ptr. I didn't document it as
I saw it as a carbuncle on the face of my otherwise pristine library.
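
For readers who have not seen such a thing, a runtime helper slot can be
sketched roughly as follows. This is not the code that was in the
library. toy_archive_base, get_helper, helper_count and seen_counter are
hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical archive base with one runtime helper slot per helper type.
class toy_archive_base {
public:
    // Return the helper of type H attached to this archive, creating it
    // on first use. The archive itself knows nothing about H.
    template <class H>
    H& get_helper() {
        std::shared_ptr<void>& slot = helpers_[std::type_index(typeid(H))];
        if (!slot)
            slot = std::make_shared<H>();
        return *static_cast<H*>(slot.get());
    }

    std::size_t helper_count() const { return helpers_.size(); }

private:
    std::map<std::type_index, std::shared_ptr<void>> helpers_;
};

// Hypothetical helper that some type's serialization code might attach:
// it counts how many objects of that type have been seen.
struct seen_counter {
    int count = 0;
};
```

Each archive instance gets its own helper objects, and a type that needs
extra state simply asks the archive for it from inside its serialize
function.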

No other type ever needed that "hack". Then I took that code out,
and added on the specific helper for the shared_ptr type which exists
to this day.

Of course you might guess what happened.

Soon after I took that code out and changed to the "statically"
added shared_ptr_helper, a new type appeared (flyweight) which
was not serializable without similar functionality. There was talk
about going back to the old system - but I couldn't face doing
the extra work and it never got done. Of course this would require
some more documentation and concepts, etc., etc. Also even though
such a facility is almost never necessary, people would start to use
it and then there would be a whole 'nother source of questions to support.

So, ........

To really do this right, I see the following as necessary

a) Clarify and simplify the current archive concept. I've thought about
this a lot and know what I want to do - but I'm not excited enough to
do it.

b) Go back to the original runtime helper and update the documentation
accordingly.

c) tweak the shared_ptr serialization to use the runtime helper

d) and redefine ?_iarchive as what naked_?_iarchive is now.

It might not seem like a huge amount of work. But it's enough to dissuade
me from starting. It would also mean eliminating the word "naked"
from my naming - which I've grown fond of.

> So, that's what I've come up with. I'm interested in comments. Does
> anyone know of a better way to do this? Could this possibly lead to a
> general mechanism for similar problems?

I think that if the archive concept were "fixed" it would permit things like
you suggest - better extension through derivation. Also it might permit
copying of one archive type to another to permit these extensions to be
dynamic. E.g.

template<class Archive>
void serialize(Archive &ar, T &t, const unsigned int version){
    // ArchiveWithNoTracking is hypothetical: a view of ar with tracking off
    ArchiveWithNoTracking arnot(ar);
    arnot & t;
}
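
A self-contained toy version of that idea, to show what "copying one
archive type to another" could mean in practice. tracking_oarchive and
no_tracking_view are hypothetical, and "tracking" here is reduced to
remembering object addresses and writing a marker for repeats.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical archive that tracks object addresses and writes each
// object only once; repeats are replaced by a marker.
class tracking_oarchive {
public:
    void save(const std::string& s) {
        if (seen_.insert(&s).second)
            write_raw(s);
        else
            out_ << "@dup;";
    }
    void write_raw(const std::string& s) { out_ << s << ';'; }
    std::string str() const { return out_.str(); }
private:
    std::set<const std::string*> seen_;
    std::ostringstream out_;
};

// Hypothetical wrapper built from an existing archive: it writes to the
// same output but bypasses the tracking step.
class no_tracking_view {
public:
    explicit no_tracking_view(tracking_oarchive& ar) : ar_(ar) {}
    void save(const std::string& s) { ar_.write_raw(s); }
private:
    tracking_oarchive& ar_;
};
```

Saving the same string twice through the tracking archive writes it once
plus a marker, while saving it through the view writes it in full again.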

Food for thought.

Robert Ramey

