
From: JOAQUIN LOPEZ MUÑOZ (joaquin_at_[hidden])
Date: 2007-09-28 15:58:59


----- Original Message -----
From: Robert Ramey <ramey_at_[hidden]>
Date: Friday, September 28, 2007 6:06 pm
Subject: Re: [boost] [serialization] Proposal for an extension API to the Archive concept
To: boost_at_[hidden]

>
> > I've tried to describe the options as dispassionately as I
> > could, so as to lay a common ground for further discussion.
> > Do you see any error in the description? Are you satisfied
> > with this rendering of the available options?
>
> A very good summary and explanation.

Great, I'm glad we've got the discussion grounds agreed upon.

> > But an important one (and more are coming, e.g.
> > std::tr1::shared_ptr). You are of course free to avoid expanding
> > the Archive concept as in E) and F), but the logical implication
> > of this is that shared_ptr is technically not serializable.
> > Nothing wrong with that, if that's your declared intention, but
> > users should know.
>
> OK - now I see better your concern. I would say:
>
> shared_ptr as conceived and implemented by boost does not provide
> sufficient exposure to support the concept of Serializable Type
> as defined by Boost Serialization:
> [...]
> shared_ptr is the only type which has come up in several years
> which has this problem.
>
> Some application-specific types might have this issue, but in those
> cases, I would expect that the ability to attach an application-specific
> helper to the archive, thereby expanding the Archive Concept
> for just that application, would be acceptable.

OK, I understand your position on the grounds that you deem
shared_ptr (and other potential types in the same vein) as
pathological cases. I will try in the lines below to convince
you (or at least instill some drops of doubt in you) that this
is not necessarily the case.

[...]
> The power of any software (or mathematical/logical) module resides
> in its logical coherence. Arbitrarily extending it in tangent
> directions requires that a user of the module or concept
> consider a bunch of special cases every time he uses it, even for
> simple cases. Such types of extensions result in a net reduction
> in the utility of the library.

Some remarks on this: I cannot but wholeheartedly agree with
you that logical coherence must drive the extension of any
firmly grounded design. But from my point of view, the helper API
is a perfectly sound addition to B.S, because it fulfills
a very general need in a very general fashion:

  The helper API is just about keeping state information
  associated to the serialization process of objects of
  a given type.

Think about it: as it's currently modeled by B.S, serialization
is essentially a stateless process: when an object t of type
T is about to be serialized, the assumption is that its serialization
won't rely on other T instances which have been previously serialized.
Is this assumption reasonable? Well, I contend that in some
non-pathological situations, stateful serialization is needed.
An almost insultingly obvious case: object tracking. If you
give me the helper API and an archive without object tracking,
I can implement object tracking myself without relying on
the archive implementation or B.S facilities, and it's
very easy to do, if you think about it. How's that for
applicability of the helper API?
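To make the object-tracking claim concrete, here is a minimal sketch under stated assumptions: the archive is a hypothetical list of string tokens, and `tracking_save_helper` / `tracking_load_helper` are hypothetical per-archive state objects standing in for whatever the helper API would attach. None of these names are Boost.Serialization API. The save side remembers the addresses it has already written and emits a back-reference for repeats; the load side keeps objects in load order, so a back-reference resolves to the very same object.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy archives: a flat list of string tokens.
// Not the Boost.Serialization API.
struct toy_oarchive {
    std::vector<std::string> tokens;
};

struct toy_iarchive {
    explicit toy_iarchive(std::vector<std::string> t) : tokens(std::move(t)) {}
    std::string next() { return tokens.at(pos++); }
    std::vector<std::string> tokens;
    std::size_t pos = 0;
};

// Hypothetical helper state for one save pass: address -> object id.
struct tracking_save_helper {
    std::map<const std::string*, std::size_t> ids;
};

// Hypothetical helper state for one load pass: object id -> loaded object.
struct tracking_load_helper {
    std::vector<std::shared_ptr<std::string>> objects;
};

void save_tracked(toy_oarchive& ar, tracking_save_helper& h,
                  const std::shared_ptr<std::string>& p) {
    auto it = h.ids.find(p.get());
    if (it != h.ids.end()) {
        // Already saved in this pass: write only a back-reference.
        ar.tokens.push_back("ref");
        ar.tokens.push_back(std::to_string(it->second));
    } else {
        // First occurrence: assign the next id and write the value.
        const std::size_t id = h.ids.size();
        h.ids.emplace(p.get(), id);
        ar.tokens.push_back("obj");
        ar.tokens.push_back(*p);
    }
}

std::shared_ptr<std::string> load_tracked(toy_iarchive& ar,
                                          tracking_load_helper& h) {
    const std::string tag = ar.next();
    if (tag == "ref") {
        // Resolve the back-reference to the object loaded earlier.
        return h.objects.at(std::stoul(ar.next()));
    }
    auto p = std::make_shared<std::string>(ar.next());
    h.objects.push_back(p);  // its index equals the id assigned on save
    return p;
}
```

Saving `a, b, a` yields the tokens `obj hello obj world ref 0`, and loading them back gives two distinct objects with the first and third pointers equal. All the tracking state lives in the helpers, none in the archive.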

Further below I also explain the type of mine that
originated the discussion, as another example of use of
the helper API.
  
[...]
> Remember, this whole issue has come about because shared_ptr
> (unlike any other type so far) has been written in a manner that
> it is effectively closed for extension. Maybe you want to direct
> some observations in that direction.

That's fair criticism. But consider that there are cases where
the type to be serialized is closed and beyond the user's control.
And my example below is open, yet not serializable without the
helper API.
 
> I've just come across a very relevant instance of this. The 1.35
> version of the binary
>
> In order to support optimization of a couple of collection types,
> extra machinery was added to basic_binary_?archive (in 1.35).
> This was the subject of a fairly acrimonious discussion last year.

Can you provide me with a link to that thread? Thanks!

[...]
> > In my particular case, I can assure you that the need for a
> > helper API is a genuine one: I cannot serialize my type
> > efficiently without that API, no matter how much I want to
> > expose the guts of my type for serialization purposes. Please
> > believe me on that, this is not a case of me wanting
> > to keep my type encapsulated, "pure", or anything. It's sheer
> > impossibility to do it otherwise.
>
> Hmm - on one hand, I don't doubt your sincerity or your assessment.
> On the other hand, I've yet to see such a type myself. The original
> serialization of shared_ptr was implemented by exposing more of
> the shared_ptr internals, so in that one case it is possible. Of
> course, whether that solution is desirable would be a matter for
> discussion.

I wanted to keep this discussion untied from my particular problem,
but you're entitled to see the case and evaluate it yourself, so
here it goes. The following is a simplified description of
the type, to keep things short.

My type implements a sort of flyweight idiom, by which objects
with the same value internally keep a pointer to the same
representation, so as to avoid duplication of data and excessive
memory consumption:

  flyweight<string> fw("hello");
  flyweight<string> fw2("hello");
  // fw and fw2 internally have pointers to the same string object.

A crude approximation to the implementation of flyweight is:

  template<typename T, template <typename> class Container=...>
  class flyweight
  {
  private:
    typedef Container<T> factory_type; // used to keep value objects
    static factory_type factory; // global value factory

    // A flyweight maintains an iterator to the associated value
    typedef typename factory_type::iterator handle_type;
    handle_type h;

  public:
    flyweight(const T& t) // ctor #1
    {
      // retrieve an iterator to an equivalent value or else
      // insert a new one if no equivalent value is found
      h=factory.insert(t).first;
    }
    flyweight(const flyweight& x) // ctor #2
    {
      // point to the same value as x
      h=x.h;
    }
    ...
  };
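A compilable version of the approximation above, assuming `std::set` as the factory (the original leaves the default container unspecified). The template template parameter is declared variadic here so that `std::set`, which has extra defaulted parameters, binds to it under C++11/14. The accessors `get`, `address` and `factory_size` are hypothetical additions, present only to make the sharing observable.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <cstddef>
#include <set>
#include <string>

// Sketch only: std::set stands in for the unspecified default Container.
template<typename T, template<typename...> class Container = std::set>
class flyweight {
    typedef Container<T> factory_type;                   // keeps value objects
    static factory_type factory;                         // global value factory
    typedef typename factory_type::iterator handle_type; // points into factory
    handle_type h;

public:
    // ctor #1: find an equivalent value or insert a new one (factory lookup).
    explicit flyweight(const T& t) : h(factory.insert(t).first) {}
    // ctor #2: share the value of x (no lookup).
    flyweight(const flyweight& x) : h(x.h) {}
    flyweight& operator=(const flyweight& x) { h = x.h; return *this; }

    const T& get() const { return *h; }
    // Hypothetical accessors, only to observe sharing in this sketch.
    const T* address() const { return &*h; }
    static std::size_t factory_size() { return factory.size(); }
};

template<typename T, template<typename...> class Container>
typename flyweight<T, Container>::factory_type flyweight<T, Container>::factory;
```

Constructing two flyweights from `"hello"` and one from `"world"` leaves two entries in the factory, and the two `"hello"` flyweights report the same address.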

Now, I want to serialize flyweight *efficiently*. The first
naive approach is this one:

  template<class Archive,class T,...>
  void save(Archive& ar,const flyweight<T,...> & fw,const unsigned int)
  {
    ar<<*(fw.h); // serialize the associated value
  }

  template<class Archive,class T,...>
  void load(Archive& ar,flyweight<T,...> & fw,const unsigned int)
  {
    T t;
    ar>>t;
    fw=flyweight<T,...>(t);
  }

but this is not efficient because, at load time, duplicate
values are created through ctor #1, which incurs a factory
lookup, when I'd rather use ctor #2 (direct copy from a
previously loaded equivalent flyweight object).

So that's the problem. In the particular case where handle_type
is a pointer, this can be done with B.S object tracking,
but as the type is an unspecified iterator I cannot do that.
To me, this is a clear example of serialization needing state
info. With the helper API, serializing flyweight<T> can be implemented
so efficiently and beautifully that it almost hurts :)
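A self-contained sketch contrasting the naive scheme with a stateful one, under these assumptions: `fw_string` is a hypothetical, non-templated stand-in for `flyweight<std::string>` with a lookup counter added for observation; `token_archive` is a hypothetical archive made of string tokens; and `fw_save_helper` / `fw_load_helper` play the role of the per-archive state a helper API would provide. It is not the author's actual implementation. On save, a value already written is replaced by its index; on load, a back-reference is resolved by copying a previously loaded flyweight (ctor #2), so the factory is consulted once per distinct value rather than once per element.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for flyweight<std::string>.
class fw_string {
    static std::set<std::string>& factory() {
        static std::set<std::string> f;
        return f;
    }
    std::set<std::string>::const_iterator h;

public:
    static std::size_t lookups;  // counts factory lookups (ctor #1 calls)
    explicit fw_string(const std::string& s) : h(factory().insert(s).first) {
        ++lookups;  // ctor #1
    }
    fw_string(const fw_string& x) = default;  // ctor #2: copies handle only
    const std::string& get() const { return *h; }
    const std::string* key() const { return &*h; }
};
std::size_t fw_string::lookups = 0;

typedef std::vector<std::string> token_archive;  // hypothetical archive

// Naive scheme: write every value, rebuild every element through ctor #1.
void save_naive(token_archive& ar, const std::vector<fw_string>& v) {
    for (const fw_string& f : v) ar.push_back(f.get());
}
std::vector<fw_string> load_naive(const token_archive& ar) {
    std::vector<fw_string> out;
    for (const std::string& s : ar) out.push_back(fw_string(s));
    return out;
}

// Hypothetical per-archive helper state.
struct fw_save_helper { std::map<const std::string*, std::size_t> index; };
struct fw_load_helper { std::vector<fw_string> loaded; };

void save_with_helper(token_archive& ar, fw_save_helper& h, const fw_string& f) {
    auto it = h.index.find(f.key());
    if (it != h.index.end()) {
        ar.push_back("r");  // seen before: write its index
        ar.push_back(std::to_string(it->second));
    } else {
        const std::size_t id = h.index.size();
        h.index.emplace(f.key(), id);
        ar.push_back("v");  // first occurrence: write the value
        ar.push_back(f.get());
    }
}

fw_string load_with_helper(const token_archive& ar, std::size_t& pos,
                           fw_load_helper& h) {
    const std::string& tag = ar.at(pos++);
    const std::string& payload = ar.at(pos++);
    if (tag == "r") return h.loaded.at(std::stoul(payload));  // ctor #2
    fw_string f(payload);                                      // ctor #1
    h.loaded.push_back(f);
    return f;
}
```

For the sequence `a b a a b`, the naive load performs five factory lookups, whereas the helper-based load performs two.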

[...]
> So the current situation is:
>
> a) We have an Archive Concept and a Serializable Concept which are
> fairly coherent. They are currently only broken by
> basic_binary_?archive.
> b) The classes in the library common_?archive, basic_?archive, are
> implementation features. It is not required to derive from any of
> these classes to implement the concepts. Of course it's convenient
> to do so, as they implement common aspects of the Archive Concept, but
> they are not required to.
>
> c) Some types (so far only shared_ptr and your ?) do not conform
> to the concept of a serializable type as they stand. My position is
>
> i) they are very infrequent.
> ii) the Archive Concept can be extended for these special cases
> through composition (inheritance) to provide ad hoc solutions.

I hope I've been able to cast some doubt on your commitment
to i). Also, I'd like to add that, from my experience as a library
maintainer, functionality usually predates usage: it is not
until you provide some new stuff that people begin seeing
application scenarios for it, not the other way around. Even more
so if you maintain an advanced library, as Boost libraries are held
to be, where contributors of new ideas are scarcer.
 
> That's what 1.34 did for shared_ptr and I'm willing to continue
> doing that. The only thing I want to do is to move the helper
> API out of the basic_?archive where it pollutes the Archive
> concept (which is why I didn't document it)
> and package it as a mix-in which is used with naked_?archive to
> produce the "shared_ptr" friendly archive classes.
>
> What do you see wrong with that?
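One way the mix-in proposed above could look, as a minimal sketch: `helper_support` is a hypothetical class template that layers per-type helper storage (keyed by `std::type_index`) on top of any archive class, and `naked_oarchive` is a hypothetical stand-in for a naked archive. Neither reflects the actual Boost.Serialization classes or member names.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical mix-in: adds per-type helper storage to any archive Base.
template<class Base>
class helper_support : public Base {
    // One type-erased helper object per helper type.
    std::unordered_map<std::type_index, std::shared_ptr<void>> helpers_;

public:
    // Returns the Helper attached to this archive, default-constructing it
    // on first use. shared_ptr<void> keeps the correct deleter for Helper.
    template<class Helper>
    Helper& get_helper() {
        std::shared_ptr<void>& slot = helpers_[std::type_index(typeid(Helper))];
        if (!slot) slot = std::make_shared<Helper>();
        return *static_cast<Helper*>(slot.get());
    }
};

// Hypothetical stand-in for a "naked" archive with no helper machinery.
struct naked_oarchive {
    int bytes_written = 0;
};

// The composed, helper-aware archive type.
typedef helper_support<naked_oarchive> helper_oarchive;
```

Under this design the plain archive stays free of helper machinery, and only code that needs stateful serialization would use the composed `helper_oarchive` type.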

About this last point of yours: according to your proposal,
which archive types would provide the helper API? You say
basic_?archive won't provide it? Was it this way in 1.34?
If not, won't you be breaking shared_ptr serialization
code then?

I look forward to your opinions about my position on the
generality of the helper API and about the flyweight case.
In general, I understand your position and I think it's a
reasonable one, given the particular weights you assign to
the factors involved, which differ from mine. At this point
the matter is not about hard facts but opinions, but I hope
I'll be able to pile some more arguments onto my side of
the balance.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

