Boost :

Date view	Thread view	Subject view	Author view

From: Robert Ramey (ramey_at_[hidden])
Date: 2005-05-12 11:19:12

Next message: Larry Evans: "[boost] Re: [policy_ptr]source, sink policies proposal"
Previous message: Andrei Alexandrescu (See Website For Email): "[boost] Re: flex_string license"
In reply to: Vladimir Prus: "[boost] Re: [serialization] Regressions?"
Next in thread: Jeff Flinn: "[boost] Re: [serialization] Regressions?"
Reply: Jeff Flinn: "[boost] Re: [serialization] Regressions?"
Reply: Vladimir Prus: "Re: [boost] [serialization] Regressions?"

Vladimir Prus wrote:

> I'll comment on the error later. But the important point is that
> difference from iostream, no matter how much explanation you put in
> docs, will be confusing. Moreover, as soon as users get into habit of
> const_casting serialized things, your protection no longer works.
>
> for(;;) {
> A x = a[i];
> ar << x;
> }
>
> User gets error above changes this to
>
> for(;;) {
> const A x = a[i];
> ar << x; // Oops, error here, change this to
> }
>
> and circumvents your protection. If there's no way to serialize
> non-const object, users will just start using const.

I do mention specifically somewhere in the docs that Archives are NOT
streams. However, I do concede that its natural to make the analogy. In my
view, that supports my idea that the compiler should be used if possible to
detect those cases where the difference between Archives and streams is
important. This is one of those cases. Without this check, a user who
identifies Archives and streames will never, ever find his bug because he
will be looking in the wrong place.

I'm aware that many programmers avoid using "const" because its seems to
slow down development. Using const on a regular basis generates compiler
complaints similar to this one on a regular basis and many just minimize use
of const to avoid this annoyance. In my view this is misguided. By using
const everywhere one can, one is comunicating to the compiler ones view that
a particular object is not expected to be changed by a particular operation.
When the compiler traps an error there are two reactions

a) The compiler is being overly picky - I'll just remove the const.
b) Hmmm - what's going on here - I'm trying to modify something that that I
wouldn't expect to be modified.

picking a) is easier because addressing b) often starts a chain reaction of
"const" problems. I've come to the view that choosing b) makes for better
code with less bugs. I believe also that in the future this will come to be
even more important when we start to use multi-threading more. Also, its a
fact that some of STL algortihms and collections are a little "const"
unfriendly so this means sometimes I have to use a const_cast. But all in
all I've come to strongly believe that picking b) is a much better choice
and my handling of this issue in the serialization library reflects that.

> Ok, at least I understand your intentions now. But again note that if
> this static check triggers in situation users consider save, they'll
> quickly learn to use casts. Everywhere.

Of course your correct on this. But lot's of people drive without seat
belts too. That doesn't mean that the rest of us should be prohibited from
using them.

> I understand the need for tracking. What I don't understand is why
> tracking is enabled when I'm saving *non pointer*.
>
> Say I'm saving object of class X:
>
> X x;
> ar << x;
>
> Then, for reasons I've explained in the previous email, no heap
> allocated object will have the same address as 'x', so no tracking is
> needed at all.

Default trait is non-primitive objects is "track_selectivly" This means
that objects will be tracked if and only anywhere in the source, some object
of this class is serialized through a pointer. So when I'm compiling one
module above and checking at compile time I realy don't know that the object
will in fact be tracked. So I trap unless the serialization trait is set to
"track never". To reiterate, in this case the object won't be tracked
unless somewhere else its serialized as a pointer.

As an aside one might want to track object never serialized as pointers.
That's why there is a serialization trait "track_always". This might occur
where objects might be the objects of references from several instances of
another class:

class a (
X & m_x;
....
};

tracking would guarentee that only one copy of the same X would be written
to the archive - thus saving space. This would be an unusual case but its
supported if necessary.

>
>>> The only problematic case is
>>>
>>> for(...{
>>> X x = *it; // create a copy of
>>> ar << x
>>> }
>>>
>>> X* x = new X:
>>> ar << x;
>>>
>>> where address of newed 'x' is the same as address of saved 'x'. But
>>> this can never happen, because heap memory and stack memory are
>>> distinct.
>>>
>>
>> In the loop, a new value for x is set every iteration through the
>> loop. But
>> the address of x (on the stack) is the same every time. If X is
>> tracked, only the first one will be saved. When the archive is
>> loaded, all the x values will be the same - not what is probably
>> intended. So, the question is what is really intended here and
>> trapping as an error requires that this question be considered.

> Exactly. As I've said above, I believe saves of 'x' inside the loop
> should not do tracking since we're not saving via a pointer.

How do we know that one of the x's saved in the loop is not serialized as a
pointer somewhere else? We have to track ALL x's because we don't know
which ones if any are being tracked somewhere else. It could even be in a
different module.

> This behaviour is strange. In C++ reference almost always acts like
> non-reference type, and it's common to use referece to create
> convenient local alias.

> I'd expect saving of "const &X" to works exactly as saving of "const X".

saving it does. The problem is the

const X x = some_x;

is quite different than

const X & x = some_x

That is creating a reference is altogether different from making a copy of
an object. The strong analogy between these operations and the automatic
invokation of the copy operation constitutes one of the main features of C++
which can be considered both a blessing and curse. On one hand it makes the
language expressive by hiding the natural copies while on the other hand
these hidden copies are a major source of hard to find bugs. In any case,
its out of my hands.

> Yes, they have the same stack address, but it does not matter,
> because we're not saving them via pointer. If we wrote something like:
>
X x; // ramey
> ar << &x;
>
> then the object would have to be tracked. But saving pointer to stack
> object is problematic on its own.

This would definitatly be an error. I did spend a significant amount of
time tweaking the code to detect this but I don't think I was successful.
( I don't remember at this point). That is, in my world I want to trap

X * x_ptr;
ar << x_ptr

while permiting

const X * x_ptr;

to pass unmolested. I don't exactly remember right now but I think I was
inhibited from implementing this at least in all cases.

>>
>> Archive::operator<<(const T & ...)
>>
>> But using const indicates that the callee won't mutate the object.
>> It doesn't require that the object being passed be imutable - which
>> is what we're trying to check for here. In other words even though
>> a paramter is declared const - that doesn't require tha the object
>> passed be const. In fact, the compiler is permitted to create a
>> copy of an non-const and pass it to a const paramter. I haven't seen
>> a compiler actually do this - but it is
>> permitted.
>
> The only case I know about is binding rvalue to const reference.

which is what we're doing here.

>
>> It would create havoc with the tracking system which is
>> essential to re-creating archived pointers. The new code follows
>> compilier rules more carefully and should work even if some compiler
>> makes a copy to when converting a non-const paramter to a const.
>
> Again, when saving non-pointer no tracking should be necessary.
>

see above

>>>> If its a temporary you could
>>>> just
>>>> as well return an auto_ptr to a const T. But really without more
>>>> context its hard for me to commment.
>>>
>>> 1. I don't want to change my design due to serialization library.
>>> It's intrusive.
>>
>> This trap is telling you that maybe something in your design
>> conflicts serialization in a fundamental way and this should be
>> checked.
>
> But I've checked and nothing's wrong. So I either have to modify my
> design -- which I don't want, or add very strange-looking cast.

You have three other options:
a) use & operator instead of <<
b) set the tracking trait to "track_never"
c) tweak your code so the trap is never invoked. (hypothetical only)

By the way the const_cast is a good choice for another reason. Its
specifically flags a case which should be checked if the program has
surprising behavior. Suppose you've checked everything and its what you want
to do so you put in a const_cast to avoid the trap. Then months later you
add a module to your program which serializes a pointer to X. Now your code
is broken in a surprising way. When you start debugging you'll see the
"const_cast" and it might draw your attention so something that should be
checked.

Of course if you had tweaked your code without using the const_cast, then
adding a module wouldn't break the code in any case - but its your decision.

>>> 2. Inside 'finish', I need to work with auto_ptr<T> -- because I
>>> modify the data.
>>
>> That's it. when you call finish from with an archive save, you're
>> telling me that the process of saving is changing the data. The
>> serialization library implementation (tracking) assume that won't
>> happen. So if your design is such that this is what you really want
>> to do, then T should be marked track_never. This will inhibit
>> tracking and suppress the error message.
>>
>> Without this checking, this issue would go undetected until runtime
>> - and during a load at that. And be very difficult to find.
>
> Sorry, probably I did not elaborate enough. The finish code looks
> like:
>
> auto_ptr<Data> finish()
> {
> auto_ptr<Data> result;
> // modify result
> return result;
> }
>
> I modify 'result' inside 'finish', not inside any serialization code.
> And I can't change return type to auto_ptr<const Data> because the
> code won't compile. And
>
> return auto_ptr<const Data>(result.release())
>
> is scary.

Is this any more scary than

const Data * result:

which we are quite comfortable with. In fact I believe

         auto_ptr<const Data> finish()
         {
                 auto_ptr<Data> result;
                 // modify result
                 return result;
         }

expresses your intention quite well. That you're returning an auto_ptr to
an object that you don't expect should be changed by anyone who gets the
pointer this way.

But it also makes me question: What is // modify result doing here? It
seems that result might change every time we call it. This would violate
the assumptions underlying assumptions of the serialization library. So I
would say that we should be using:

         auto_ptr<const Data> some_class::finish() const
         {
                 auto_ptr<Data> result;
                 // modify result
                 return result;
         }

I'm still suspicious of the // modify result. But at least I know that the
class being serialized isn't being modified by the save operation.

> So, there's no bug in my code yet ;-)

That we know about.

>>>>> 2. How do you suggest me to fix the above?
>>>>
>> That aside I would expect to use something like the following
>>
>> template<class Archive>
>> void save(Archive & ar, unsigned int version) const
>> {
>> // Serialization has problems with shared_ptr, so use strings.
>> const std::string s = boost::lexical_cast<std::string>(*this);
>
> I don't like the copy.
>
>> ar << s
>> }
>>
>> Maybe the above might be
>> ar << boost::lexical_cast<const std::string>(*this);
>
> I think this wont work because loading from stream to "const
> std::string" won't compile.

I believe we're doing the opposite here. creating a const std::string (on
the stack) from a non-const one. This is exactly equivalent to the above

const std::string s = boost::lexical_cast<std::string>(*this);
ar << s;

its just that the copy is hidden.

>> template<class Archive>
>> void load(Archive & ar, unsigned int version) // note no "const"
>> here {
>> // Serialization has problems with shared_ptr, so use strings.
>> std::string s
>> ar >> s;
>> *this = s; // whatever means
>> }
>>
>> Now this could create a tracking problem.
>
> Tracking problem here? I'm just saving a string!

note that std::string is unusual in that it is classified as a "primitive"
like int so it is by default "track_never". So the only tracking problem
would be serialization of pointers to strings won't be loaded correctly and
the error will be undetected.

So it won't trap as an error - which could be a problem.

>> So if ths is really what needs
>> to be done, I would create an wrapper:
>>
>> struct untracked_string : public std::string {
>> ...
>> };
>>
>> and set untracked_string to "track_never"
>
> I hope you'll agree that this solution is rather inconvenient.

I do. That's why I prefer one of the other ones above. But it does point
to an interesting issue. The seralization traits to primitives are set to
"track_never". Its sort of an easily remembered hueristic in that usuallly
we wouldn't want to track all ints even if someone want's to serialize a
pointer to an int somewhere in the program.

However, if someone does serialize a pointer to an int, then it won't be
tracked and that could create an error. One option would be to set tracking
for int as "track_selectively". However this would track ALL the serialized
ints which for sure are all over the place - potentially a big efficiency
issue. A better option would be to make a wrapper arond int class
trackable_int which can can set as "track_selectively) which would be in
always the same as int except that it would be tracked. This might be
automated along the lines of BOOST_STRONG_TYPEDEF. In practice this hasn't
come up as an issue though.

>> Its really odd to me to see something like
>>
>> ar << *this
>>
>> or
>>
>> ar << *ptr
>>
>> which means we're circumventing and maybe re-implementing part of the
>> serialization library.
>
> I think that for a split 'save/load' you'll always have to use
> operator<<
> and operator>> of the archive.

not true- insied save/load one can just as well use &

What is odd about the above in my mind is the ptr dereferencing. If we're
already inside the object, why not just serialize the members? doesn't ar
<< *this just make a recurrsive call itself?

I'm curious if anyone else is following this thread. Its getting pretty
deep in the details of the serializaiton library.

Robert Ramey

Next message: Larry Evans: "[boost] Re: [policy_ptr]source, sink policies proposal"
Previous message: Andrei Alexandrescu (See Website For Email): "[boost] Re: flex_string license"
In reply to: Vladimir Prus: "[boost] Re: [serialization] Regressions?"
Next in thread: Jeff Flinn: "[boost] Re: [serialization] Regressions?"
Reply: Jeff Flinn: "[boost] Re: [serialization] Regressions?"
Reply: Vladimir Prus: "Re: [boost] [serialization] Regressions?"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk