From: Joaquín Mª López Muñoz (joaquin_at_[hidden])
Date: 2005-09-22 10:07:55

Next message: Martin Bonner: "Re: [boost] Errors compiling multi_index hashed.cpp"
Previous message: Aschwin Gopalan: "[boost] integer_mask for long long"
In reply to: David Abrahams: "Re: [boost] [serialization docs] Ping?"
Next in thread: David Abrahams: "Re: [boost] [serialization docs] Ping?"
Reply: David Abrahams: "Re: [boost] [serialization docs] Ping?"
Reply: Robert Ramey: "Re: [boost] [serialization docs] Ping?"

David Abrahams wrote:

> Joaquin M Lopez Munoz <joaquin_at_[hidden]> writes:
>
> > David Abrahams <dave <at> boost-consulting.com> writes:
> >
> > [...]
> >
> >> As I've said before, you will need eventually to describe the
> >> relationship between compatible loading and saving archives. It will
> >> be something like
> >>
> >> T x, y;
> >> // arbitrary operations on x to set its state
> >> sar & x;
> >> lar & y;
> >>
> >> Postcondition: y is equivalent to x
> >
> > [...]
> >
> >> Joaquin's post takes an "innovative" approach to the problem of
> >> specifying semantics but it isn't at all clear to me that it holds
> >> water.
> >
> > I can do little to argue against that criticism. If you have
> > specific concerns about the approach please do bring them here.
>
> It's not intended to be a criticism, and no offense was intended.
> Probably the quotation marks were misplaced, so, sorry for that. I
> didn't have time to evaluate what you wrote, so I really have no idea
> whether it holds water. That said, instinct tells me there's no way
> to formulate this issue that really avoids the issue that no two
> distinct objects in C++ are truly equivalent.
>
> >> The reason that "equivalent" is a fuzzy term in C++ comes down
> >> to the fact that two distinct objects always have detectably distinct
> >> addresses, so no two distinct objects can _truly_ be equivalent.
> >
> > As far as I know, the only definitions for (object) equivalence
> > in the standard are given in connection with strict weak orderings
> > induced by comparison functors. Besides that, I failed to find
> > any reference about what two objects being "equivalent" means.
>
> Correct, it is not defined. You're expected to understand it because
> it's a plain english word.
>
> >> Leaving aside that language corner, the idea of equivalence works
> >> perfectly well.
> >
> > For the sake of the discussion, let's assume that "a and b
> > are equivalent" is somehow defined as / related to "a==b".
>
> Well, that would be one convenient definition, but since lots of types
> aren't even syntactically EqualityComparable, that isn't much help.
> "Assuming" it basically skirts the equivalence problem.
>
> > My thesis is that there are serious objections against this
> > definition of equivalence in the context of serialization:
> >
> > 1. A serializable type need not be equality comparable.
>
> Hey, sounds familiar!
>
> > 2. "a==b" is a C++ expression, so implying that a and b are
> > objects living inside the same program. If I save an object a
> > on my PC, pass the file to you and you load it a year later as
> > b on your Linux box, what is "a==b" supposed to mean?
>
> Exactly.
>
> > 3. A serializable type can be implemented without observing
> > the "a==b" rule: for instance, a list-like container can
> > load the elements in reverse order --I understand this is
> > a perfectly legitimate implementation that shouldn't be banned
> > because of the "a==b" restriction.
>
> I'm not sure it should be considered legit under any Archive concept
> that will be defined by the library. Is it a useful semantics?
> Beware premature generalization!

In my serialization stuff for Boost.MultiIndex I actually have a serializable
type that does not conform to the equivalence rule. Its layout looks roughly
like this:

template<typename Value>
struct node
{
  Value v;

  template<class Archive>
  void serialize(Archive& ar,const unsigned int)
  {
    // do nothing: v is neither written to nor read from the archive
  }
};

I use this weird construct to make node trackable, but no content
information is dumped to the archive (that is taken care of somewhere
else in the program). In case you're curious, this arises in connection
with serialization of iterators.
So, yes, there are actual uses of serialization that do not conform to
the equivalence rule.
I guess one can also come up with other scenarios that break the
equivalence rule, for instance a struct where some fields are
serialized whereas others are local.
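
A minimal sketch of that last scenario. It does not use Boost.Serialization
at all: std::ostream and std::istream merely stand in for the output and input
archives, and the type `connection` with its members `port` and `retries` is
made up for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Hypothetical type: only `port` is persistent; `retries` is local state.
struct connection
{
  int port;
  int retries;

  // std::ostream/std::istream stand in for output/input archives here.
  void save(std::ostream& ar)const{ar<<port<<' ';}
  void load(std::istream& ar){ar>>port;}
};

// Saves x, then loads the data into a fresh object and returns it.
connection round_trip(const connection& x)
{
  std::stringstream ss;
  x.save(ss);

  connection y={0,0};
  y.load(ss);
  return y;
}
```

Here save() and load() issue matching sequences of operations, yet the
restored object is not equivalent to the original in any memberwise sense,
since `retries` keeps whatever value the loading side gave it.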

>
>
> > One can argue that (1) and (2) can be overcome with a
> > "fuzzier" definition of equivalence relying on the reader's
> > intuition about this relationship, but (3), IMHO, breaks
> > down any hope of attaching equivalence to serialization
> > semantics
>
> Only if you think (3) is important. And if you do, as I wrote
> elsewhere, you can always make a weaker concept than Archive, that
> allows (3).
>
> > ultimately, archives are not responsible for holding the equivalence
> > rule,
>
> It doesn't matter whether they're _ultimately_ responsible, if
> Serializable also gives sensible guarantees.

This can lead to circular definitions; see my last paragraph in this post.

>
>
> > as they relay to user provided serialize() functions.
>
> But that's not what Robert is saying; he's saying they don't have to
> even do that!

IMHO an archive should guarantee that loading/saving a UDT executes
the associated load/save functions. Failing to do so would deprive the
Archive concept of most of its useful purposes.
A do-nothing archive (e.g. the logging example) could be covered by
a more relaxed concept, if someone finds that useful.

> > So, from my point of view, the real task of an input/output
> > archive pair is to ensure that, when a T::serialize function is
> > invoked on loading, the input context (i.e, permissible >> ops
> > on the input archive) is a replica of the output sequence.
> >
> > This rule recursively descends to primitive (in the serialization
> > sense) types, where an equivalence rule can actually be provided.
> > My (sketchy) proposal is merely a formalization of this
> > idea.
>
> That's an interesting rule. So essentially you are saying that the
> output archive needs to record enough structure to ensure that the
> input archive can read the same sequence of types?

Yes.

> What if the user serializes an aggregate struct X containing two ints?
> Is the corresponding input archive required to be able to read two
> ints as part of reading an X?

Not only that: X::load is actually *required* to load those two ints. Consider
the following sample:

#include <boost/config.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/split_member.hpp>
#include <iostream>
#include <sstream>

struct foo
{
  foo(int a=0,int b=0):a(a),b(b){}

  int a,b;

  // dispatch serialize() to the separate save()/load() members below
  BOOST_SERIALIZATION_SPLIT_MEMBER()

  template<class Archive>
  void save(Archive& ar,const unsigned int)const
  {
    ar<<a;
    ar<<b;
  }

  template<class Archive>
  void load(Archive& ar,const unsigned int)
  {
    ar>>a;
    // we do not load b!!
  }
};

int main()
{
  const foo x0(1,2),x1(3,4);

  // save both objects; the archive is closed at the end of the block
  std::ostringstream oss;
  {
    boost::archive::text_oarchive oa(oss);
    oa<<x0;
    oa<<x1;
  }

  foo y0,y1;

  // load them back from the same character sequence
  std::istringstream iss(oss.str());
  boost::archive::text_iarchive ia(iss);
  ia>>y0;
  ia>>y1;

  std::cout<<"y0.a="<<y0.a<<std::endl;
  std::cout<<"y1.a="<<y1.a<<std::endl;

  return 0;
}

Note that foo::load only loads the first int. The program outputs

y0.a=1
y1.a=2

which is incorrect (y1.a should be 3), so serialization of foo is not
correctly implemented. For XML archive types my hunch is that the
program would throw.
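
The misalignment can be reproduced without Boost. The following is only a
sketch under that assumption: a std::stringstream stands in for the text
archives, and `toy_foo`, its `SkipB` parameter and `second_a` are hypothetical
names introduced for the illustration.

```cpp
#include <cassert>
#include <sstream>

// Same shape as foo, but plain streams stand in for the archives.
// SkipB selects the faulty load() that does not read b.
template<bool SkipB>
struct toy_foo
{
  int a,b;

  void save(std::ostream& ar)const{ar<<a<<' '<<b<<' ';}
  void load(std::istream& ar)
  {
    ar>>a;
    if(!SkipB)ar>>b;
  }
};

// Saves (1,2) and (3,4), loads two objects back and
// returns the a member of the second loaded object.
template<bool SkipB>
int second_a()
{
  std::stringstream ss;
  const toy_foo<SkipB> x0={1,2},x1={3,4};
  x0.save(ss);
  x1.save(ss);

  toy_foo<SkipB> y0={0,0},y1={0,0};
  y0.load(ss);
  y1.load(ss);
  return y1.a;
}
```

With the faulty load(), the unread b of the first object is still in the
stream when the second object is loaded, so second_a<true>() yields 2;
second_a<false>() reads every saved value and yields 3.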

> >> I suggest you use that, and the established conventions from the
> >> literature, to describe semantics. You have, essentially, an
> >> emergency on your hands -- this is not the time to try untested
> >> approaches. First plug the dyke and then, if you have time, think
> >> about a rewrite.
> >
> > Without wanting to sound harsh, I think that what you propose as
> > established conventions for describing serialization semantics hold
> > little real information and, worse yet, can mislead readers to
> > assume that Boost.Serialization is constrained by the equivalence
> > rule when it is not (cf. point 3. above.) The current docs are
> > better in this respect since at least they don't assert false
> > semantic rules.
>
> I guess it depends whether you want something useful or just something
> minimally restrictive. Equivalence, even if not defined in the
> standard, is still useful. If we took out requirements such as
>
> If a==b and (a,b) is in the domain of == then *a is equivalent to *b.
>
> from the input iterator requirements, and
>
> t = u T& t is equivalent to u
>
> from the assignable requirements, I assert that input iterators and
> the algorithms would be much less useful.

Well, of course users of Boost.Serialization (especially if they do not
write any serialize function of their own but merely use the serialization
capabilities of 3rd party types) expect this fuzzy equivalence rule
to hold. My point is that meeting that expectation is up to
each serializable type implementer, and shouldn't be enforced by
the concepts section.

If Robert does not have the time/will to pursue a more formal approach,
I think the equivalence rule could be relaxed to something like:

  T x, y;
  // arbitrary operations on x to set its state
  sar & x;
  lar & y;

  Postconditions:
    * For primitive serializable types, y is equivalent to x.
    * For pointer types, bla bla
    * Other types are expected to implement serialization
      in such a manner that y is equivalent to x, but this is not
      guaranteed.

As an "intuition oriented" guide this does no harm, but please note that
the above does not define what a compatible input/output archive pair
is: archive compatibility is defined in terms of object equivalence,
and object equivalence relies on archive compatibility, so we have a
circularity here. The recursively descending approach breaks that
vicious circle, if I'm not missing something. To sum it up, what
I propose, in nonformalese, is:

* An input archive iar is compatible with an output archive oar if
  1. iar allows a sequence of >> ops matching the corresponding << ops
  made upon oar (matching defined in terms of types involved and
  nesting depth of the call.)
  2. For primitive serialization types, the restored copies are equivalent
  to their originals (expand on this, especially with respect to pointers.)
* A type T is serializable if it is primitive serializable or else it defines
  the appropriate serialize (load/save) function such that the sequence
  of >> ops in load() match the << ops in save().
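
One way to picture "matching defined in terms of types involved and nesting
depth" is an archive that stores no data and only records the shape of the
calls made on it. The sketch below is an assumption-laden illustration, not
part of Boost.Serialization: `trace_archive`, `save_X`, `load_X`,
`load_X_short` and `sequences_match` are all invented names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One recorded operation: the type involved and the nesting depth of the call.
typedef std::pair<std::string,int> op;

// Hypothetical tracing archive: it stores no data, only the shape of the calls.
struct trace_archive
{
  trace_archive():depth(0){}

  void primitive(const std::string& type){ops.push_back(op(type,depth));}
  void enter(const std::string& type){ops.push_back(op(type,depth));++depth;}
  void leave(){--depth;}

  std::vector<op> ops;
  int depth;
};

// Shape of save() for a made-up struct X holding two ints.
void save_X(trace_archive& ar)
{
  ar.enter("X");
  ar.primitive("int");
  ar.primitive("int");
  ar.leave();
}

// Shape of a load() that mirrors save_X.
void load_X(trace_archive& ar)
{
  ar.enter("X");
  ar.primitive("int");
  ar.primitive("int");
  ar.leave();
}

// Shape of a load() that forgets the second int.
void load_X_short(trace_archive& ar)
{
  ar.enter("X");
  ar.primitive("int");
  ar.leave();
}

typedef void (*serialize_fn)(trace_archive&);

// True if two save calls and two load calls leave identical traces.
bool sequences_match(serialize_fn save,serialize_fn load)
{
  trace_archive out,in;
  save(out);save(out);
  load(in);load(in);
  return out.ops==in.ops;
}
```

Under this model sequences_match(save_X,load_X) holds, while
sequences_match(save_X,load_X_short) does not, which is the condition the
second bullet places on a serializable type.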

[This is not a requirement] For each serializable type, the implementor
can define "equivalence" in terms of its constituent types. For instance,
for std::vector:

Given a std::vector<T> out, where T is serializable, and a restored copy in,
then in.size()==out.size() and each in[j] is a restored copy of
out[j].
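
A sketch of how such a per-type guarantee follows from the element-wise rule,
again with plain streams standing in for archives. The helper names
`save_vector`, `load_vector` and `round_trip` are hypothetical, and T is
assumed to be readable and writable with the standard stream operators.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// Writes the element count first, then each element in order.
template<typename T>
void save_vector(std::ostream& ar,const std::vector<T>& v)
{
  ar<<v.size()<<' ';
  for(std::size_t j=0;j<v.size();++j)ar<<v[j]<<' ';
}

// Reads the count, resizes, then reads each element in the same order.
template<typename T>
void load_vector(std::istream& ar,std::vector<T>& v)
{
  std::size_t n=0;
  ar>>n;
  v.resize(n);
  for(std::size_t j=0;j<n;++j)ar>>v[j];
}

// Saves out and returns the vector restored from the same stream.
std::vector<int> round_trip(const std::vector<int>& out)
{
  std::stringstream ss;
  save_vector(ss,out);

  std::vector<int> in;
  load_vector(ss,in);
  return in;
}
```

Because load_vector mirrors save_vector operation by operation, the size
equality and the element-by-element correspondence stated above follow
directly from the guarantee for the element type.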

Sorry for the long post. Best,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

> That said, I find your approach interesting. My instinct about the
> need for equivalence here might be wrong, although I would still need
> almost every Serializable type to provide an equivalence guarantee.
>
> --
> Dave Abrahams
> Boost Consulting
> www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk