From: David Abrahams (abrahams_at_[hidden])
Date: 2001-03-24 19:44:27

A number of things have come at once which argue for a rework of part of the
Boost.Python library. Since these changes would affect the existing library
interface, I'd like to discuss possible approaches here before doing anything.

A Missing Feature [Ullrich and Ralf are already familiar with this part]

One issue is the need to be able to return references and pointers to the
internals of classes exposed to Python. For example, if the following classes
are exported to Python:

    struct inner {
      void set_x(int x) { this->x = x; }
      int get_x() const { return x; }
      int x;
    };

    struct outer {
       inner& get_inner() { return b; }
       inner b;
    };

We should be able to do the following:

>>> o = outer()
>>> o.get_inner().set_x(1)
>>> assert o.get_inner().get_x() == 1

[This ought to work identically if outer::get_inner() returned an inner*
instead of an inner&]

It seems rather obvious at first that this should work, but getting it to
work safely is problematic. The problem occurs in wrapping outer::get_inner():
we need to return an object which acts just like an inner instance created
from Python

>>> i = inner()

but which contains only a reference to the inner embedded in outer instead
of a new inner instance. Although we have the technology to do this, it is
dangerous because the lifetime of the inner instance is tied to the lifetime
of the outer instance:

>>> o = outer()
>>> i = o.get_inner()
>>> del o # its inner is gone, now, too
>>> i.get_x() # crash!!

The obvious fix for this case is to have i manage a reference-count on o. In
other words, the reference-count on o is incremented when get_inner() is
called, and it is decremented when the return value is finally destroyed.
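
As a rough C++-only illustration of that idea (not Boost.Python code), the
sketch below uses the aliasing constructor of std::shared_ptr so that a handle
to the embedded inner shares ownership of the enclosing outer. The helper name
get_inner_keeping_owner is hypothetical, and std::shared_ptr merely stands in
for Python's reference counting.

```cpp
#include <cassert>
#include <memory>

struct inner {
    void set_x(int v) { x = v; }
    int get_x() const { return x; }
    int x = 0;
};

struct outer {
    inner& get_inner() { return b; }
    inner b;
};

// Hypothetical helper: the returned handle points at o->b but shares the
// reference count of *o, so the outer object stays alive as long as the
// handle does.
std::shared_ptr<inner> get_inner_keeping_owner(const std::shared_ptr<outer>& o) {
    return std::shared_ptr<inner>(o, &o->get_inner());
}
```

With this arrangement, dropping the last direct reference to the outer object
no longer invalidates the handle; the outer is destroyed only when the handle
itself goes away.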

It is tempting to think we should automatically manage a reference-count on
the "self" object whenever a function returning a reference is wrapped, but this
approach doesn't generalize well:

1. A member function may return a reference into one of its arguments:

   // unlikely but plausible example
   struct outer {
      inner& get_other_inner(outer& other)
        { return other.b; }
      inner b;
   };

2. Functions at namespace scope can also return references:

   inner& get_inner2(outer& x) { return x.get_inner(); }

So we need a way to tell Boost.Python how a function wrapper should manage
its return value. I suggest the following interface, which I have already
discussed a bit with Ullrich and Ralf:

                           &outer::get_inner, "get_inner");

Since the "self" parameter is implicitly passed as argument zero, this will
cause a reference count on the self parameter to be managed by the resulting
object.

This usage suggests a generalization wherein an optional initial parameter to
def() could be used to indicate that a pointer return value should be
managed by Python, allowing us to conveniently wrap factory functions as
recently requested by Karl Bellve:

    inner* make_inner() { return new inner; }
                          make_inner, "make_inner");

An Existing Bug

How should we convert the return values of wrapped functions to python?

* For types that have an immutable analogue in Python (int, std::string) the
  answer is simple: values and const references should be converted to the
  corresponding Python type, but an attempt to wrap a function returning a
  nonconst reference should not compile (presumably, the intention of such a
  function is to eventually allow the caller to modify the referent).

* For class types with an accessible copy constructor that have been exposed
  to Python with a class_builder, we can convert returned values by copying
  them into a new extension class instance. The existing Boost.Python strategy
  is to convert const references the same way as values. This strategy has two
  problems:

  1. The referent is copied unnecessarily.

  2. It is not exactly equivalent to returning a value; even though the
     reference is const, another function is free to modify the value of the
     referent:

     struct inner2 {
        inner2() : x(0) {}
        int get_x() const { return this->x; }
        int x;
     };

     struct outer2 {
        const inner2& get_inner() { return this->y; }
        void set_x(int x) { this->y.x = x; }
        inner2 y;
     };

>>> o = outer2()
>>> i = o.get_inner() # copies o.y into a new extension instance
>>> o.set_x(1) # affects o.y, but not i
>>> i.get_x() # expecting to see 1 here!

  With the approach outlined above under "A Missing Feature", both of the
  problems could be avoided. The downside is that existing code wrapping
  functions that return references would have to change to use the
  owned_by_arg(n) syntax. My opinion is that it's worth it, but I would like
  to hear if there are any objections.

* Boost.Python was intended to prevent the wrapping of functions returning
  non-const references to class types, for reasons already discussed. This
  is where the bug occurs.

  The mechanism that converts return values of wrapped functions to python is
  simple: the function's return value is passed to the overloaded to_python
  function, and the resulting PyObject* is passed on to Python. The problem
  with this strategy is that if you allow the conversion of values or const
  references to python, you also allow the conversion of non-const
  references. This occurs due to the ordinary C++ type conversion rules. In
  other words:

  PyObject* to_python(inner);
  PyObject* to_python(const outer&);
  void f(inner& i, outer& o)
  {
     PyObject* p1 = to_python(i); // if inner is copyable, this compiles!
     PyObject* p2 = to_python(o); // compiles no matter what!
  }

  As Charlie Barrows discovered, the result is that a returned non-const
  reference is silently treated just like a returned value or const
  reference (the referent is copied into a new object).
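
  The binding behaviour behind this bug can be reproduced without any Python
  machinery. In the sketch below, convert_by_value and convert_by_const_ref
  are hypothetical stand-ins for the two to_python overloads above, and
  tracked is an illustrative type that counts its own copies; none of these
  names come from Boost.Python.

```cpp
#include <cassert>

// Illustrative type that records how many times it has been copied.
struct tracked {
    tracked() {}
    tracked(const tracked&) { ++copies; }
    static int copies;
};
int tracked::copies = 0;

// Stand-in for to_python(inner): takes its argument by value.
int convert_by_value(tracked) { return 1; }

// Stand-in for to_python(const outer&): takes a const reference.
int convert_by_const_ref(const tracked&) { return 2; }

// A non-const reference is accepted by both overload styles; the by-value
// call silently copies the referent, the const-reference call does not.
int call_with_nonconst_ref(tracked& t) {
    return convert_by_value(t) + convert_by_const_ref(t);
}
```

  Both calls compile, which is exactly why a wrapped function returning a
  non-const reference is not rejected.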

Possible Fixes

The most conservative approach I can think of (short of not fixing the bug!)
goes like this:

  1. Change return-value handling so that to_python is passed an additional
     parameter of type boost::type<R>, where R is the return type of the
     function. By carefully generating only to_python overloads for the
     types we want to implicitly convert to python (i.e. without using
     we can control which types get converted. This works because
     boost::type<R&> is not implicitly converted to boost::type<R> or
     boost::type<R const&>. We would then have a nice symmetry between the
     functions used for conversion:
        from_python(PyObject*, type<T>)
        to_python(const T&, type<T>)

        from_python(PyObject*, type<const T&>)
        to_python(const T&, type<const T&>)

  2. Retain the single-argument version of to_python() as a template
     function. Why? Because the availability of the single-argument version
     has already been exposed to users, who may be using it for a variety of
     things. It would look something like this:

     template <class T>
     PyObject* to_python(const T& x)
     {
       return to_python(x, boost::type<T>());
     }

     [Lots of gory detail omitted here, since we need additional
     to deal with noncopyable types as described in the declaration of
     python_extension_class_converters in

  This fix breaks any user code that supplies customized single-argument
  to_python() functions, since the function return mechanism will now be
  searching for a two-argument version.
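
  To make the tag idea concrete, here is a minimal sketch under some loud
  assumptions: type_tag stands in for boost::type, to_python_sketch returns a
  std::string instead of a PyObject*, and widget and has_return_conversion are
  purely illustrative names. It shows that a return type of widget& finds no
  matching overload, while widget and const widget& do.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T> struct type_tag {};  // stand-in for boost::type<T>

struct widget { int v; };

// Overloads exist only for the return types we choose to convert.
std::string to_python_sketch(const widget& w, type_tag<widget>) {
    return "copy of " + std::to_string(w.v);
}
std::string to_python_sketch(const widget& w, type_tag<const widget&>) {
    return "copy of " + std::to_string(w.v);
}

// Single-argument version retained for existing callers; it forwards to
// the tagged overload for plain T.
template <class T>
std::string to_python_sketch(const T& x) {
    return to_python_sketch(x, type_tag<T>());
}

// Compile-time check: is there a tagged overload for return type R?
template <class R, class = void>
struct has_return_conversion : std::false_type {};

template <class R>
struct has_return_conversion<
    R, decltype(void(to_python_sketch(std::declval<R>(), type_tag<R>())))>
    : std::true_type {};
```

  Because type_tag<widget&> is unrelated to type_tag<widget> and
  type_tag<const widget&>, the ordinary reference-binding rules no longer let
  a non-const reference slip through.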

A more radical possibility would be to completely change the conversion
mechanism to use converter objects instead of functions:

  Ullrich proposed this approach months ago, and even coded up a sample
  implementation (though as I recall it requires partial specialization). It
  would allow us to stop using C++ exception-handling to deal with argument
  errors and overload resolution. There are two disadvantages to using EH for
  this purpose:

    1. Using EH for overload resolution causes exceptions to be handled when
       there's really no error, potentially slowing down function calls.

    2. Some compilers (e.g. GCC 2.95.2) still have EH bugs which can cause
       or, presumably, worse problems.

  On the other hand, I see several potential downsides:

    1. Complexity: Each converter object would have to contain
       uninitialized storage for an instance of the type being converted
       from_python in addition to a pointer.

    2. Code size: It looks to me like it would generate a lot more inline
       code than the current approach, since we would have to deal with explicit
       conversion error checking in addition to any exceptions thrown by the
       wrapped function which we'd have to handle anyway.

    3. Incompatibility: as a more radical change, it would probably break
       user code - customized from_python functions would no longer work.
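
  For readers who have not seen the converter-object idea, the following is a
  rough sketch of the general shape only, not Ullrich's implementation. All
  names here (py_value_sketch, from_python_converter, call_describe) are
  hypothetical, and py_value_sketch is a toy stand-in for a Python object. The
  point is that each candidate overload is tested with convertible() rather
  than by throwing and catching an exception.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a Python object; not a real Python API.
struct py_value_sketch {
    enum kind_t { integer, text, other } kind;
    long i;
    std::string s;
};

template <class T> struct from_python_converter;  // primary left undefined

template <> struct from_python_converter<long> {
    explicit from_python_converter(const py_value_sketch& v) : src(&v) {}
    bool convertible() const { return src->kind == py_value_sketch::integer; }
    long operator()() const { return src->i; }
    const py_value_sketch* src;
};

template <> struct from_python_converter<std::string> {
    explicit from_python_converter(const py_value_sketch& v) : src(&v) {}
    bool convertible() const { return src->kind == py_value_sketch::text; }
    std::string operator()() const { return src->s; }
    const py_value_sketch* src;
};

// Two overloads of a wrapped function.
std::string describe(long n) { return "int:" + std::to_string(n); }
std::string describe(const std::string& s) { return "str:" + s; }

// Overload resolution by explicit checks: returns false when no overload
// accepts the argument, without throwing.
bool call_describe(const py_value_sketch& arg, std::string& result) {
    from_python_converter<long> as_long(arg);
    if (as_long.convertible()) { result = describe(as_long()); return true; }
    from_python_converter<std::string> as_text(arg);
    if (as_text.convertible()) { result = describe(as_text()); return true; }
    return false;
}
```

  The explicit checks in call_describe also hint at the code-size concern
  above: every wrapped call site carries its own error-checking branches.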

I'd like to get some feedback from the user and developer communities about
these approaches. If there are alternatives, I'd love to hear them.

