From: David Abrahams (abrahams_at_[hidden])
Date: 2001-03-24 19:44:27


A number of things have come at once which argue for a rework of part of the
Boost.Python library. Since these changes would affect the existing library
interface, I'd like to discuss possible approaches here before doing
anything.

A Missing Feature [Ullrich and Ralf are already familiar with this part]
=================

One issue is the need to be able to return references and pointers to the
internals of classes exposed to Python. For example, if the following classes
are exported to Python:

    struct inner {
      void set_x(int x) { this->x = x; }
      int get_x() const { return x; }
      int x;
    };

    struct outer {
       inner& get_inner() { return b; }
       inner b;
    };

We should be able to do the following:

>>> o = outer()
>>> o.get_inner().set_x(1)
>>> assert o.get_inner().get_x() == 1

[This ought to work identically if outer::get_inner() returned an inner*
instead of an inner&]

It seems rather obvious at first that this should work, but getting it to work
safely is problematic. The problem occurs in wrapping outer::get_inner(): we
need to return an object which acts just like an inner instance created from
Python via

>>> i = inner()

but which contains only a reference to the inner embedded in outer instead of
a new inner instance. Although we have the technology to do this, it is
problematic because the lifetime of the inner instance is tied to the lifetime
of the outer instance:

>>> o = outer()
>>> i = o.get_inner()
>>> del o # its inner is gone, now, too
>>> i.get_x() # crash!!

The obvious fix for this case is to have i manage a reference-count on o. In
other words, the reference-count on o is incremented when get_inner() returns,
and it is decremented when the return value is finally destroyed.
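
The same lifetime relationship can be sketched in plain C++. This is only an
analogy, not how Boost.Python is implemented: it assumes std::shared_ptr
(which postdates this message) as the reference-counting mechanism, and the
helper names get_inner_keeping_owner_alive and use_inner_after_dropping_outer
are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct inner {
  void set_x(int v) { x = v; }
  int get_x() const { return x; }
  int x = 0;
};

struct outer {
  inner& get_inner() { return b; }
  inner b;
};

// Hypothetical helper: returns a handle to o->b that also holds a reference
// count on *o. The aliasing constructor of std::shared_ptr shares ownership
// with `o` while pointing at the embedded member.
std::shared_ptr<inner> get_inner_keeping_owner_alive(
    const std::shared_ptr<outer>& o) {
  return std::shared_ptr<inner>(o, &o->get_inner());
}

// Mirrors the Python session above: drop `o`, then keep using `i`.
int use_inner_after_dropping_outer() {
  std::shared_ptr<outer> o = std::make_shared<outer>();
  std::shared_ptr<inner> i = get_inner_keeping_owner_alive(o);
  o.reset();    // like `del o`; the outer survives because i still owns it
  i->set_x(1);  // safe: the referent is still alive
  return i->get_x();
}
```

The outer object is destroyed only when the last handle goes away, which is
the behaviour described for i and o.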

It is tempting to think we should automatically manage a reference-count on
the "self" object whenever a function returning a reference is wrapped, but
this approach doesn't generalize well:

1. A member function may return a reference into one of its arguments:

   // unlikely but plausible example
   struct outer {
      inner& get_other_inner(outer& other)
        { return other.b; }
   };

2. Functions at namespace scope can also return references:

   inner& get_inner2(outer& x) { return x.get_inner(); }

So we need a way to tell Boost.Python how a function wrapper should manage its
return value. I suggest the following interface, which I have already
discussed a bit with Ullrich and Ralf:

   outer_class_builder.def(boost::python::owned_by_arg(0),
                           &outer::get_inner, "get_inner");

Since the "self" parameter is implicitly passed as argument zero, this will
cause a reference count on the self parameter to be managed by the resulting
object.
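
To make the role of the argument index concrete, here is a minimal sketch of
the get_other_inner case, where the right owner is argument 1 rather than
"self". It is an illustration under assumptions: std::shared_ptr stands in
for Python reference counting, and wrapped_get_other_inner is a hypothetical
hand-written wrapper, not something owned_by_arg is known to generate.

```cpp
#include <cassert>
#include <memory>

struct inner {
  int x = 0;
};

struct outer {
  // Returns a reference into its argument, not into *this.
  inner& get_other_inner(outer& other) { return other.b; }
  inner b;
};

// Hypothetical wrapper: the owner is chosen explicitly (here argument 1,
// `other`), which is the decision owned_by_arg(n) is meant to express.
std::shared_ptr<inner> wrapped_get_other_inner(
    const std::shared_ptr<outer>& self,
    const std::shared_ptr<outer>& other) {
  inner& r = self->get_other_inner(*other);
  return std::shared_ptr<inner>(other, &r);  // share ownership with `other`
}
```

Tying the result to "self" here would leave it dangling as soon as the other
object was released, which is why the index has to be selectable.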

This usage suggests a generalization wherein an optional initial parameter to
def() could be used to indicate that a pointer return value should be
completely managed by Python, allowing us to conveniently wrap factory
functions as recently requested by Karl Bellve:

    inner* make_inner() { return new inner; }
    ...
    my_module_builder.def(boost::python::factory_function,
                          make_inner, "make_inner");
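
What "completely managed" means for a factory's return value can be sketched
as follows. The sketch assumes std::unique_ptr as the owning handle in place
of a Python object; call_factory and live_after_handle_dies are hypothetical
names, and the live counter exists only to make the destruction observable.

```cpp
#include <cassert>
#include <memory>

struct inner {
  inner() { ++live; }
  ~inner() { --live; }
  static int live;  // number of inner objects currently alive
};
int inner::live = 0;

inner* make_inner() { return new inner; }

// Hypothetical helper: calls the factory and immediately hands the raw
// pointer to an owning handle, so the caller never has to delete it.
template <class T>
std::unique_ptr<T> call_factory(T* (*factory)()) {
  return std::unique_ptr<T>(factory());
}

// Creates one object through the factory, lets the handle die, and reports
// how many objects are still alive afterwards.
int live_after_handle_dies() {
  {
    std::unique_ptr<inner> handle = call_factory(make_inner);
    assert(inner::live == 1);
  }
  return inner::live;
}
```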

An Existing Bug
===============

How should we convert the return values of wrapped functions to python?

* For types that have an immutable analogue in Python (int, std::string) the
  answer is simple: values and const references should be converted to the
  corresponding Python type, but an attempt to wrap a function returning a
  nonconst reference should not compile (presumably, the intention of such a
  function is to eventually allow the caller to modify the referent).

* For class types with an accessible copy constructor that have been exposed
  to Python with a class_builder, we can convert returned values by copying
  them into a new extension class instance. The existing Boost.Python strategy
  is to convert const references the same way as values. This strategy has two
  problems:

  1. The referent is copied unnecessarily.

  2. It is not exactly equivalent to returning a value; even though the
     returned reference is const, another function is free to modify the value
     of the referent:

     struct inner2 {
        inner2() : x(0) {}
        int get_x() const { return this->x; }
        int x;
     };

     struct outer2 {
        const inner2& get_inner() { return this->y; }
        void set_x(int x) { this->y.x = x; }
        inner2 y;
     };

>>> o = outer2()
>>> i = o.get_inner() # copies o.y into a new extension instance
>>> o.set_x(1) # affects o.y, but not i
>>> i.get_x() # expecting to see 1 here!
     0

  With the approach outlined above under "A Missing Feature", both of the
  above problems could be avoided. The downside is that existing code wrapping
  functions that return references would have to change to use the
  owned_by_arg(n) syntax. My opinion is that it's worth it, but I would like to
  hear if there are any objections.

* Boost.Python was intended to prevent the wrapping of functions returning
  non-const references to class types, for reasons already discussed. This is
  where the bug occurs.

  The mechanism that converts return values of wrapped functions to python is
  simple: the function's return value is passed to the overloaded to_python
  function, and the resulting PyObject* is passed on to Python. The problem
  with this strategy is that if you allow the conversion of values or const
  references to python, you also allow the conversion of non-const
  references. This occurs due to the ordinary C++ type conversion rules. In
  other words:

  PyObject* to_python(inner);
  PyObject* to_python(const outer&);
  void f(inner& i, outer& o)
  {
     PyObject* p1 = to_python(i); // if inner is copyable, this compiles!
     PyObject* p2 = to_python(o); // compiles no matter what!
  }

  As Charlie Barrows discovered in
  http://groups.yahoo.com/group/boost/message/10024, the result is that a
  returned non-const reference is silently treated just like a returned value
  or const reference (the referent is copied into a new object).
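
The silent copy can be reproduced without Python at all. In this minimal
sketch, py_copy is a hypothetical stand-in for a new extension instance,
to_python_sketch stands in for a to_python overload taking a const reference,
and get_inner() is changed to return a non-const reference so that the
binding rule is visible.

```cpp
#include <cassert>

struct inner2 {
  inner2() : x(0) {}
  int get_x() const { return x; }
  int x;
};

struct outer2 {
  inner2& get_inner() { return y; }  // non-const reference on purpose
  void set_x(int x) { y.x = x; }
  inner2 y;
};

// Hypothetical stand-in for a new extension instance holding its own copy.
struct py_copy {
  inner2 held;
};

// Stand-in for a to_python overload: copies the referent into a new object.
py_copy to_python_sketch(const inner2& v) {
  py_copy p = {v};
  return p;
}

// Returns true if the converted object is a stale snapshot of o.y.
bool copy_is_stale() {
  outer2 o;
  // inner2& binds to const inner2& without any diagnostic.
  py_copy i = to_python_sketch(o.get_inner());
  o.set_x(1);  // changes o.y, but not the copy held by i
  return i.held.get_x() == 0 && o.get_inner().get_x() == 1;
}
```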

Possible Fixes
==============

The most conservative approach I can think of (short of not fixing the bug!)
goes like this:

  1. Change return-value handling so that to_python is passed an additional
     parameter of type boost::type<R>, where R is the return type of the
     wrapped function. By carefully generating only to_python overloads for
     the types we want to implicitly convert to python (i.e. without using
     owned_by_arg(n)), we can control which types get converted. This works
     because boost::type<R&> is not implicitly converted to boost::type<R> or
     boost::type<R const&>. We would then have a nice symmetry between the
     functions used for conversion:
        from_python(PyObject*, type<T>)
        to_python(const T&, type<T>)

        from_python(PyObject*, type<const T&>)
        to_python(const T&, type<const T&>)

  2. Retain the single-argument version of to_python() as a template
     function. Why? Because the availability of the single-argument to_python
     has already been exposed to users, who may be using it for a variety of
     things. It would look something like this:

     template <class T>
     PyObject* to_python(const T& x)
     {
       return to_python(x, boost::type<T>());
     }

     [Lots of gory detail omitted here, since we need additional
     metaprogramming to deal with noncopyable types as described in the
     declaration of python_extension_class_converters in
     boost/python/detail/extension_class.hpp]

  This fix breaks any user code that supplies customized single-argument
  to_python() functions, since the function return mechanism will now be
  searching for a two-argument version.
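
The two steps above can be sketched in a self-contained form. Everything
here is a stand-in chosen for illustration: type_tag plays the role of
boost::type, to_python_sketch returns a std::string naming the chosen
overload instead of a PyObject*, and has_return_conversion is a hypothetical
detection helper used only to show which return types are accepted.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for boost::type<T>: an empty tag whose only job is to name a type.
template <class T> struct type_tag {};

struct inner { int x; };

// Stand-ins for the two-argument to_python overloads.
inline std::string to_python_sketch(const inner&, type_tag<inner>) {
  return "value";
}
inline std::string to_python_sketch(const inner&, type_tag<const inner&>) {
  return "const reference";
}
// Deliberately no overload taking type_tag<inner&>.

// Single-argument version retained for existing callers; it forwards with
// the deduced (non-reference) type, so it always selects the value overload.
template <class T>
std::string to_python_sketch(const T& x) {
  return to_python_sketch(x, type_tag<T>());
}

template <class T> struct always_void { typedef void type; };

// Hypothetical detection helper: true if a wrapped function returning R
// would find a matching two-argument overload.
template <class R, class = void>
struct has_return_conversion : std::false_type {};

template <class R>
struct has_return_conversion<
    R, typename always_void<decltype(to_python_sketch(
           std::declval<const inner&>(), type_tag<R>()))>::type>
    : std::true_type {};
```

Because type_tag<inner&> converts to neither of the other two tags, a return
type of inner& finds no overload, which is the compile-time rejection the
fix relies on.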

A more radical possibility would be to completely change the conversion
mechanism to use converter objects instead of functions:

  Ullrich proposed this approach months ago, and even coded up a sample
  implementation (though as I recall it requires partial specialization). It
  would allow us to stop using C++ exception-handling to deal with argument
  type errors and overload resolution. There are two disadvantages to using EH
  for this purpose:

    1. Using EH for overload resolution causes exceptions to be handled when
       there's really no error, potentially slowing down function dispatching.

    2. Some compilers (e.g. GCC 2.95.2) still have EH bugs which can cause
       leaks or, presumably, worse problems.
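
The first point can be illustrated with a small sketch that contrasts the two
dispatch styles. All names are hypothetical and nothing here is
Boost.Python's real dispatch code: py_arg is a string standing in for a
Python argument, f_int and f_str stand in for two overloads of one wrapped
function, and exceptions_thrown merely counts how often the throwing
converter fails.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Stand-in for a Python argument: text that may or may not hold an integer.
typedef std::string py_arg;

int exceptions_thrown = 0;  // counts failed throwing conversions

bool looks_like_int(const py_arg& a) {
  if (a.empty()) return false;
  for (char c : a)
    if (c < '0' || c > '9') return false;
  return true;
}

// Style 1: the converter reports a type mismatch by throwing.
int from_python_throwing(const py_arg& a) {
  if (!looks_like_int(a)) {
    ++exceptions_thrown;
    throw std::invalid_argument("not an int");
  }
  return std::atoi(a.c_str());
}

// Two overloads of one wrapped function.
std::string f_int(int) { return "f(int)"; }
std::string f_str(const std::string&) { return "f(string)"; }

// EH-based resolution: try the first overload, fall through on exception.
std::string dispatch_with_exceptions(const py_arg& a) {
  try {
    return f_int(from_python_throwing(a));
  } catch (const std::invalid_argument&) {
    return f_str(a);
  }
}

// Check-based resolution: ask first, convert only when it will succeed.
std::string dispatch_with_checks(const py_arg& a) {
  if (looks_like_int(a)) return f_int(std::atoi(a.c_str()));
  return f_str(a);
}
```

A perfectly valid call such as passing "abc" costs one throw and catch in the
first style and none in the second.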

  On the other hand, I see several potential downsides:

    1. Complexity: Each converter object would have to contain
       suitably-aligned uninitialized storage for an instance of the type
       being converted from_python in addition to a pointer.

    2. Code size: It looks to me like it would generate a lot more inline code
       than the current approach, since we would have to deal with explicit
       conversion error checking in addition to any exceptions thrown by the
       wrapped function which we'd have to handle anyway.

    3. Incompatibility: as a more radical change, it would probably break more
       user code - customized from_python functions would no longer work
       either.
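
For a sense of what the complexity point involves, here is a minimal sketch
of one such converter object. It is not Ullrich's implementation, which is
not shown in this message: celsius, celsius_converter and the use of a
std::string as the source object are all assumptions made for illustration,
and alignas requires C++11.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Target type with no default constructor, so it cannot be built eagerly.
struct celsius {
  explicit celsius(int d) : degrees(d) {}
  int degrees;
};

// Hypothetical converter object: a pointer to the source plus suitably
// aligned, initially uninitialized storage for the converted value.
class celsius_converter {
 public:
  explicit celsius_converter(const std::string& source)
      : source_(&source), result_(nullptr) {}
  ~celsius_converter() {
    if (result_) result_->~celsius();  // destroy only if constructed
  }
  celsius_converter(const celsius_converter&) = delete;
  celsius_converter& operator=(const celsius_converter&) = delete;

  // Explicit error check: reports convertibility without throwing.
  bool convertible() const {
    if (source_->empty()) return false;
    for (char c : *source_)
      if (c < '0' || c > '9') return false;
    return true;
  }

  // Precondition: convertible(). Constructs the value in place on first use.
  celsius& get() {
    if (!result_)
      result_ = new (storage_) celsius(std::atoi(source_->c_str()));
    return *result_;
  }

 private:
  const std::string* source_;
  celsius* result_;
  alignas(celsius) unsigned char storage_[sizeof(celsius)];
};
```

Every call site then needs an explicit convertible() test before get(), which
is where the additional inline code mentioned in the second point comes from.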

I'd like to get some feedback from the user and developer communities about
these approaches. If there are alternatives, I'd love to hear them.

Regards,
Dave

