Boost logo

Boost :

From: Robert Ramey (ramey_at_[hidden])
Date: 2002-07-16 23:39:10


Dear Vahan,

As I promised, I have considered your original criticism more carefully. Here is my answer. Afterwards you will find your original email that I have included for reference.

Basically, this problem occurs when de-serialization creates a pointer, and an exception is thrown before that pointer can be stored in the de-serialized structure. The results in a memory leak. The one and only place in the serialization library where memory is allocated is in the following code

template<class T>
inline void serialization<T>::load_ptr(
        boost::basic_iarchive & ar, T * &t, boost::version_type file_version
){
        T * t = new T();
        // if an exception occurs before the following returns
        // memory allocated by the above new will be leaked.
        ar >> *t;
}

I propose to replace the above code with the following

template<class T>
inline void serialization<T>::load_ptr(
        boost::basic_iarchive & ar, T * &t, boost::version_type file_version
){
        std::auto_ptr<T> tx(new T());
        // if an exception occurs before the following completes
        // tx will be destroyed by the exception mechanism. This
        // will in turn destroy the element created by the new above
        ar >> *tx;
        // if no exception has occured, save the pointer
        t = tx.get();
        // and set the auto_ptr<T> to NULL
        tx.release();
}

This will guarentee that no memory leaks occur.

Note in order for this to be effective, the constructor ot class T must not leave any contained member pointers uninitialized. That is, any contained member pointers must be initialized to NULL. Example

class T
{
        U *u
        T() : u(NULL){}
        ~T(){
                delete u;
        }
};

In this manner any objects created but not yet deserialized can be destroyed without problem.

I submit that this makes the de-serialization process (i.e.loading) exception-safe. I have considered it in light of your examples below and believe that it addresses all know issues. It requires no extra effort on the part of the user and no change in the serialization library interface.

Please review this and comment. If this is acceptable. I will remove the delete_created_pointers function and add the above change.

Robert Ramey

Date: Wed, 22 May 2002 20:28:22 +0500
From: "Vahan Margaryan" <vahan_at_[hidden]>
To: <boost_at_[hidden]>
Subject: Re: [boost] Serialization Draft Submission # 3 - Final Draft
Message-ID: <001101c201a5$4f3dff90$2709a8c0_at_[hidden]>
References: <01C1FCD0.0A5982C0_at_[hidden]>

Hello Robert,
Have you considered solving the exception-safety implications of the pointer
serialization mechanism? There is a problem when an exception occurs during
loading of objects that are pointed to from several locations.

The problem.
Consider this, I have a class A that holds a single pointer to another
object of class A:

class A{ ...
 A* other_;
};

A doesn't own that pointer - it is passed to it via constructor and is not
deleted in the destructor.

Then I have a class AManager. AManager holds a vector of A*. In one of the
member functions, AManager creates 100 A objects, links them via their
other_ members and pushes them into the vector. In the destructor, AManager
destroys the elements of the vector (thus, it owns them).

What happens when I restore an object of the type AManager and something
wrong happens? It goes like this:
1. AManager tries to load a vector. It tries to read in the first element of
the vector.
2. Loading the first element means loading the other_ member of the first
element.
3. Since all the A objects are linked together, the process loads all the
other A objects too, while still in the context of loading the first object.
All the A objects are allocated.
4. Then the loading function of the first object fails with an exception
(corrupted file, end-of-file, or something else).
5. AManager attempts to destroy the loaded objects by destroying the
elements of the vector. But the vector contains no elements.
6. All the objects allocated while loading AManager are left in memory,
memory leaks result.

A possible solution.
Generally speaking, there's no quick fix for this, AFAIK. The way I solved
it was by indicating whether the pointer was owned by the object.
When an object owns its member pointer, it says so to the loading function:

load(ar, ptr, owns( )); //owns is a type

When the object doesn't own its member pointer, it just says:

load(ar, ptr);

(I use overloading, because the ownership status is usually known at compile
time.)

Now the lookup table that stores pointers that have been new-ed (to check
for multiple references to the same object) also holds a boolean value for
each pointer, that indicates whether the ownership of the given pointer has
been taken.

When an exception occurs, the system iterates over the items in the lookup
table and deletes all the items that don't have the boolean value set (the
others will be erased by their owning objects).

Container issues.
The situation is slightly more complicated for standard containers. Given a
vector<A*>, should the system assume the vector owns its elements or not?
There's no universal answer. Therefore, when loading/saving containers, the
user needs to specify the ownership (except for the non-owning default).
Containers can be several levels deep, like in vector<vector<A*> * >, so the
ownership indicator can be fairly complex.

I would suggest using a typelist as a parameter to load, with elements
owns( ) and noown( ). At each level, the loading function pops off the
topmost type and based on that, performs either owning or non-owning load.
The tail of the typelist is passed further.

The function load(T) calls load(T, noown( ) ).

Aside from this, the container loading functions must be modified to make
sure the currently read object isn't leaked. Thus, instead of:

for( int i = 0; i < size; ++i)
{
    T t;
    load(ar, t, own);
    vector.push_back(t);
}

the function should go:

for(int i = 0; i < size; ++i)
{
    vector.push_back(T( ));
    load(ar, vector.back( ), own);
}

For set/hashset/map/hashmap this is not possible, so a try/catch is needed.

Since passing extra parameters is needed, the >> operator isn't convenient,
so I used load.

Unfortunately, I can't submit any real code implementing the approach (for
legal reasons), but it has been implemented and it works.

Hope that reading this loooooong mail wasn't wasted time :)
Regards,
-Vahan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk