Boost logo

Boost :

From: Robert Ramey (ramey_at_[hidden])
Date: 2005-05-11 12:33:19


>> OK let me rephrase. This code conflicts with the description of the
>> description of the functioning of the operator << as described in the
>> documentation.
>
> Can you give me specific doc URL? I can't find the relevant part in
> TOC.

Serialization/Reference/Special Considerations/Object tracking

explains this.

> That isan object
>
> What's "isan"?

That is, an

>> can be passed as const
>> referene and still be modified during the process of serialization.
>> So this is not bullet proof - but I believe it is very helpful.

> Let me draw an analogy: there's std::ostream. For years, we save
> objects to ostreams and we're not required to save "const" object.
> The fact that serialization acts in a different way is inconsistent
> and confusing. I
> think you'll agree that users should not read documentation just to
> save an object.

std::ostream doesn't need to track the objects saved. So passing objects
which change while in the course of creating an output file doesn't create a
problem, while in the serialization system it DOES create an error.

>
> And let me clarify again -- is this indended to stack only stack
> allocated objects?
>

No. Its for any object whose contents might change while in the course of
creating an archive.

Example.

    X * x;
    ..
    ar << x;
    ....
    *x = y;
    ...
    ar << x; // uh - oh bug introduced since x is being tracked.

The static assert will flag this as an error at compile time.

if this is what one really wants to do then X should be marked untracked in
which case no error will be issued.

>>>> Note that the documentation suggests that he above be reformulated
>>>> as For(...
>>>> ar << a[i]; //
>>>>
>>>> Enforcing const-ness also has the effect of preventing
>>>> serialization from altering the state of the object being
>>>> serialized - another almost impossible to find bug.
>>>
>>
>>> Can you explain:
>>> 1. How this bug can happen
>>
>> Its very easy to write
>> for(...{
>> X x = *it; // create a copy of
>> ar << x
>> }
>>
>> all the x's are at the same address so if they happen to be tracked
>> because a pointer to some X in serialized somewhere in the program,
>> then the
>> subsequent copies would be supprsed by tracking.
>

> So, you're saying that the behaviour of the above code will be
> different, depending on whether some pointer to X is saved somewhere
> else? This is very strange -- here, saving is not done via pointer, so I
expect
> that no tracking is ever done. I'd argue this is a bug in
> serialization.

Hey - that's not a bug - its a feature !

tracking is only done if its necessary to guarentee that the objects can be
loaded correctly.

If you believe this is a bug, you can use serialization traits to mark the
particular class "track_always"

>
> The only problematic case is
>
> for(...{
> X x = *it; // create a copy of
> ar << x
> }
>
> X* x = new X:
> ar << x;
>
> where address of newed 'x' is the same as address of saved 'x'. But
> this can never happen, because heap memory and stack memory are
> distinct.
>

In the loop, a new value for x is set every iteration through the loop. But
the address of x (on the stack) is the same every time. If X is tracked,
only the first one will be saved. When the archive is loaded, all the x
values will be the same - not what is probably intended. So, the question
is what is really intended here and trapping as an error requires that this
question be considered.

>> With the above
>> formulation this cannot happen. This error is detected a compile
>> time - very convenient. Of course the following would work as well
>>
>> for(...{
>> const & X x = *it; // save a const reference to the orgiinal
>> ar << x
>> }
>
> Will that indeed save a const reference? How will you read const
> reference from an archive?

This works as one would expect - that is all the objects are saved and
tracked separtly.
The const reference doesn't have its own address - taking the address of it
returns the address of the object being refered to - just what we want for
tracking.

>
>> while the following would create an error that would go undetected.
>>
>> for(...{
>> const X x = *it; // create a copy of
>> ar << x
>> }
>
> Probably the right way to fix this is just don't track 'x' here.
> Again, no heap allocated object will have the same address as 'x'.

in this case x is being allocated on the stack - NOT on the heap. And all
the x's have the same stack address.

>
>>> 2. Why the "watch" command on gdb is not a reliable way to catch
>>> such bugs?
>>
>> I'm not sure what the gdb watch command does but I'm sure it doesn't
>> detect compile time errors.
>
> It allows to find all places where a specific memory location is
> modified. So, if you have a bug where a value of an object is
> modified by serialization, it's easy to track this down to specific
> code line.

Well, that would be helpful in finding such an error. But its not as good
as making the error impossible to occur in the first place.

>>>>>> ofstream ofs(av[2]);
>>>>>> boost::archive::binary_oarchive oa(ofs);
>>>>>> oa << *data_builder.finish();
>>

> What kind of context should I provide? This is top-level code, it
> reads some files, passes them to 'data_builder' and then calls
> 'finish' that builds
> the proper data which is then saved.

>>>>>> Basically, it's some heuritic to prevent saving
>>>>>> on-stack object
>>
>> That's one case I would like trap.
>
> Any others?

Well since you bring it up - the & operator totally blows a way this const
checking. It has to do this in order to effectively provide the
functionality for which its intended. So the usage of & would fail to trap
the bugs that << might detect. On the other hand this is not such a problem
as the & operator lends itsefl mostly to situation where the save/load are
exactly symetric and its much harder to (maybe immpossible) to create the
kind of situtation we have here.

Also there are situations in which I could not really enforce the const-ness
as I wanted. Most notable is the case where once uses ar <<
BOOST_SERIALIZATION_NVP(x) . So had you wrapped your x above in this macro
to support xml archives, you would not ahve noticed the issue.

So I can't always catch it, but I catch it where I can.

>> When finish returns something that is not a const it suggests that
>> its returning a pointer to a mutable object. The serialization
>> library presumes
>> that objects don't change in the course of serialization. So
>> passing a non-const conflicts with one of the assumptions made in
>> the implementation of the library.
>

> Ehm... the standard way to solve this is to declare "operator<<" with
> the "const T&" type, just like iostreams do.

We're doing this now. that is

Archive::operator<<(const T & ...)

 But using const indicates that the callee won't mutate the object. It
doesn't require that the object being passed be imutable - which is what
we're trying to check for here. In other words even though a paramter is
declared const - that doesn't require tha the object passed be const. In
fact, the compiler is permitted to create a copy of an non-const and pass it
to a const paramter. I haven't seen a compiler actually do this - but it is
permitted. It would create havoc with the tracking system which is
essential to re-creating archived pointers. The new code follows compilier
rules more carefully and should work even if some compiler makes a copy to
when converting a non-const paramter to a const.

>> If its a temporary you could
>> just
>> as well return an auto_ptr to a const T. But really without more
>> context its hard for me to commment.
>
> 1. I don't want to change my design due to serialization library. It's
> intrusive.

This trap is telling you that maybe something in your design conflicts
serialization in a fundamental way and this should be checked.

> 2. Inside 'finish', I need to work with auto_ptr<T> -- because I
> modify the data.

That's it. when you call finish from with an archive save, you're telling
me that the process of saving is changing the data. The serialization
library implementation (tracking) assume that won't happen. So if your
design is such that this is what you really want to do, then T should be
marked track_never. This will inhibit tracking and suppress the error
message.

Without this checking, this issue would go undetected until runtime - and
during a load at that. And be very difficult to find.

Of course, in the course of reviewing the question, you might conclude that
rather than marking the type "track_never" you might want to make a
different kind of change. Which is fine of course.

My intention is to make it easier to write code that works correctly with
the serialization library.

The interface of the serializtion libary is very simpile - deceptively so.
The intention of including all this checking is to make it harder to
introduce hard to find bugs. This is just one of many examples.

>And conversion from auto_ptr<T> to auto_ptr<const
> T> does not work for me on gcc.

As noted above, there's really more to it than changing a data type. I
raises the question as to what one thinks he wants to conflicts with the
serializiaton implementation. Its the case whenever we use a library.
Sometimes it has requirements which are too onerous and we don't use it.
Other times we make compromises to be able to make use of it.

>>> 1. Why std::string is 'tracked type'?
>>
>> so that storage space isn't wasted storing repeated strings

Also so that serializaton of a pointer to a std::string would function as
expected.

>>> 2. How do you suggest me to fix the above?
>>
>> I did suggest using & instead. But I'm curious to see more context.
>> I'm curious to see more.
>
> Here's the enclosing method:
>
> template<class Archive>
> void save(Archive & ar, unsigned int version) const
> {
> // Serialization has problems with shared_ptr, so use strings.
> ar << boost::lexical_cast<std::string>(*this);
> }
>

LOL - well I would disagree that serialization has problems with
shared_ptr - My view is that shared_ptr has problems with serialization.
The next version will reconcile differing points of view on this subject.
That aside I would expect to use something like the following

template<class Archive>
void save(Archive & ar, unsigned int version) const
{
     // Serialization has problems with shared_ptr, so use strings.
    const std::string s = boost::lexical_cast<std::string>(*this);
     ar << s
}

Maybe the above might be
ar << boost::lexical_cast<const std::string>(*this);

template<class Archive>
void load(Archive & ar, unsigned int version) // note no "const" here
{
     // Serialization has problems with shared_ptr, so use strings.
    std::string s
    ar >> s;
   *this = s; // whatever means
}

Now this could create a tracking problem. So if ths is really what needs to
be done, I would create an wrapper:

struct untracked_string : public std::string {
...
};

and set untracked_string to "track_never"

BTW in the case above we're not serialization a shared_ptr as far as I can
tell so I would expect
to just have

serialize(Archive &ar, unsigned int version){
    ar & ... members
}

Its really odd to me to see something like

    ar << *this

or

  ar << *ptr

which means we're circumventing and maybe re-implementing part of the
serialization library.

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk