From: David Abrahams (abrahams_at_[hidden])
Date: 2001-03-24 19:44:27
A number of things have come at once which argue for a rework of part of the
Boost.Python library. Since these changes would affect the existing library
interface, I'd like to discuss possible approaches here before doing
anything.
A Missing Feature [Ullrich and Ralf are already familiar with this part]
=================
One issue is the need to be able to return references and pointers to the
internals of classes exposed to Python. For example, if the following classes
are exported to Python:
struct inner {
    void set_x(int x) { this->x = x; }
    int get_x() const { return x; }
    int x;
};
struct outer {
    inner& get_inner() { return b; }
    inner b;
};
We should be able to do the following:
>>> o = outer()
>>> o.get_inner().set_x(1)
>>> assert o.get_inner().get_x() == 1
[This ought to work identically if outer::get_inner() returned an inner*
instead of an inner&]
It seems rather obvious at first that this should work, but getting it to work
safely is problematic. The problem occurs in wrapping outer::get_inner(): we
need to return an object which acts just like an inner instance created from
Python via
>>> i = inner()
but which contains only a reference to the inner embedded in outer instead of a
new inner instance. Although we have the technology to do this, it is
problematic because the lifetime of the inner instance is tied to the lifetime
of the outer instance:
>>> o = outer()
>>> i = o.get_inner()
>>> del o # its inner is gone, now, too
>>> i.get_x() # crash!!
The obvious fix for this case is to have i manage a reference-count on o. In
other words, the reference-count on o is incremented when get_inner() returns,
and it is decremented when the return value is finally destroyed.
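As a rough analogy only (this is not Boost.Python code), the same lifetime
rule can be sketched in modern C++ with std::shared_ptr standing in for the
Python reference count. The helper name get_inner_keeping_owner_alive is
hypothetical; it reuses the inner and outer structs from the example above.

```cpp
#include <cassert>
#include <memory>

struct inner {
    void set_x(int x) { this->x = x; }
    int get_x() const { return x; }
    int x = 0;
};

struct outer {
    inner& get_inner() { return b; }
    inner b;
};

// Hypothetical helper: returns a handle to o->b that shares o's count.
// std::shared_ptr is only a stand-in for Python's reference counting.
std::shared_ptr<inner>
get_inner_keeping_owner_alive(const std::shared_ptr<outer>& o)
{
    // Aliasing constructor: the result points at o->b but owns *o.
    return std::shared_ptr<inner>(o, &o->get_inner());
}

// Mirrors the Python session above; returns the value read after "del o".
int read_after_owner_released()
{
    std::shared_ptr<outer> o = std::make_shared<outer>();
    o->get_inner().set_x(1);
    std::shared_ptr<inner> i = get_inner_keeping_owner_alive(o);
    assert(o.use_count() == 2);  // i holds a count on the outer
    o.reset();                   // like "del o"; the outer survives through i
    return i->get_x();           // safe: the outer is destroyed only with i
}
```

Here the count on the outer is incremented when the handle is created and
decremented when the handle is destroyed, which is the behaviour proposed for
the object returned by get_inner().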
It is tempting to think we should automatically manage a reference-count on the
"self" object whenever a function returning a reference is wrapped, but this
approach doesn't generalize well:
1. A member function may return a reference into one of its arguments:
   // unlikely but plausible example
   struct outer {
       inner& get_other_inner(outer& other)
       { return other.b; }
   };
2. Functions at namespace scope can also return references:
   inner& get_inner2(outer& x) { return x.get_inner(); }
So we need a way to tell Boost.Python how a function wrapper should manage its
return value. I suggest the following interface, which I have already discussed
a bit with Ullrich and Ralf:
outer_class_builder.def(boost::python::owned_by_arg(0),
&outer::get_inner, "get_inner");
Since the "self" parameter is implicitly passed as argument zero, this will
cause a reference count on the self parameter to be managed by the resulting
object.
This usage suggests a generalization wherein an optional initial parameter to
def() could be used to indicate that a pointer return value should be
completely managed by Python, allowing us to conveniently wrap factory
functions as recently requested by Karl Bellve:
inner* make_inner() { return new inner; }
...
my_module_builder.def(boost::python::factory_function,
make_inner, "make_inner");
An Existing Bug
===============
How should we convert the return values of wrapped functions to python?
* For types that have an immutable analogue in Python (int, std::string) the
  answer is simple: values and const references should be converted to the
  corresponding Python type, but an attempt to wrap a function returning a
  non-const reference should not compile (presumably, the intention of such a
  function is to eventually allow the caller to modify the referent).
* For class types with an accessible copy constructor that have been exposed to
  Python with a class_builder, we can convert returned values by copying them
  into a new extension class instance. The existing Boost.Python strategy is to
  convert const references the same way as values. This strategy has two
  problems:
  1. The referent is copied unnecessarily.
  2. It is not exactly equivalent to returning a value; even though the
     returned reference is const, another function is free to modify the value
     of the referent:
struct inner2 {
    inner2() : x(0) {}
    int get_x() const { return this->x; }
    int x;
};
struct outer2 {
    const inner2& get_inner() { return this->y; }
    void set_x(int x) { this->y.x = x; }
    inner2 y;
};
>>> o = outer2()
>>> i = o.get_inner() # copies o.y into a new extension instance
>>> o.set_x(1) # affects o.y, but not i
>>> i.get_x() # expecting to see 1 here!
0
  With the approach outlined above under "A Missing Feature", both of the above
  problems could be avoided. The downside is that existing code wrapping
  functions that return references would have to change to use the
  owned_by_arg(n) syntax. My opinion is that it's worth it, but I would like to
  hear if there are any objections.
* Boost.Python was intended to prevent the wrapping of functions returning
  non-const references to class types, for reasons already discussed. This is
  where the bug occurs.
  The mechanism that converts return values of wrapped functions to python is
  simple: the function's return value is passed to the overloaded to_python
  function, and the resulting PyObject* is passed on to Python. The problem
  with this strategy is that if you allow the conversion of values or const
  references to python, you also allow the conversion of non-const references.
  This occurs due to the ordinary C++ type conversion rules. In other words:
PyObject* to_python(inner);
PyObject* to_python(const outer&);
void f(inner& i, outer& o)
{
    PyObject* p1 = to_python(i); // if inner is copyable, this compiles!
    PyObject* p2 = to_python(o); // compiles no matter what!
}
  As Charlie Barrows discovered in
  http://groups.yahoo.com/group/boost/message/10024, the result is that a
  returned non-const reference is silently treated just like a returned value
  or const reference (the referent is copied into a new object).
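The silent copy can be reproduced outside the library. The following is a
minimal sketch, not Boost.Python code: PyObject is an empty stand-in so that it
compiles without Python.h, the copy counter and the function
copies_made_for_nonconst_reference are hypothetical additions for illustration,
and it uses C++11 features for the compile-time checks.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct PyObject {};            // stand-in; the real type comes from Python.h
static PyObject placeholder;   // dummy result for the fake conversions

int copy_count = 0;            // instrumentation, for illustration only

struct inner {
    inner() : x(0) {}
    inner(const inner& other) : x(other.x) { ++copy_count; }
    int x;
};

struct outer {
    outer() {}
    outer(const outer&) = delete;  // noncopyable on purpose
    inner b;
};

PyObject* to_python(inner) { return &placeholder; }         // by value
PyObject* to_python(const outer&) { return &placeholder; }  // by const ref

// Both calls are well-formed when the argument is a non-const reference.
static_assert(std::is_same<decltype(to_python(std::declval<inner&>())),
                           PyObject*>::value,
              "inner& converts to inner by copying");
static_assert(std::is_same<decltype(to_python(std::declval<outer&>())),
                           PyObject*>::value,
              "outer& binds to const outer&");

// Counts the copies made when a non-const reference reaches to_python.
int copies_made_for_nonconst_reference()
{
    inner i;
    inner& ref = i;
    copy_count = 0;
    to_python(ref);   // overload resolution accepts this without complaint
    return copy_count;
}
```

Nothing in overload resolution distinguishes the non-const reference from a
value, so the wrapper cannot reject it with this interface.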
Possible Fixes
==============
The most conservative approach I can think of (short of not fixing the bug!)
goes like this:
1. Change return-value handling so that to_python is passed an additional
   parameter of type boost::type<R>, where R is the return type of the wrapped
   function. By carefully generating only to_python overloads for the types we
   want to implicitly convert to python (i.e. without using owned_by_arg(n)),
   we can control which types get converted. This works because
   boost::type<R&> is not implicitly converted to boost::type<R> or
   boost::type<R const&>. We would then have a nice symmetry between the
   functions used for conversion:
   from_python(PyObject*, type<T>)
   to_python(const T&, type<T>)
   from_python(PyObject*, type<const T&>)
   to_python(const T&, type<const T&>)
2. Retain the single-argument version of to_python() as a template function.
   Why? Because the availability of the single-argument to_python has already
   been exposed to users, who may be using it for a variety of things. It would
   look something like this:
   template <class T>
   PyObject* to_python(const T& x)
   {
       return to_python(x, boost::type<T>());
   }
   [Lots of gory detail omitted here, since we need additional metaprogramming
   to deal with noncopyable types as described in the declaration of
   python_extension_class_converters in
   boost/python/detail/extension_class.hpp]
This fix breaks any user code that supplies customized single-argument
to_python() functions, since the function return mechanism will now be
searching for a two-argument version.
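To make the two steps above concrete, here is a minimal self-contained sketch.
It is not the proposed library code: type<T> is a local stand-in for
boost::type<T>, PyObject is an empty stand-in for the real Python type, and
the is_returnable trait is a hypothetical C++11 helper that only exists to
show which return types would be accepted.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct PyObject {};            // stand-in; the real type comes from Python.h
static PyObject placeholder;   // dummy result for the fake conversions

template <class T> struct type {};  // local stand-in for boost::type<T>

struct inner { int x; };

// Step 1: overloads exist only for the return types we want to allow.
PyObject* to_python(const inner&, type<inner>) { return &placeholder; }
PyObject* to_python(const inner&, type<const inner&>) { return &placeholder; }
// Deliberately no overload taking type<inner&>.

// Step 2: the single-argument form forwards to the two-argument form.
template <class T>
PyObject* to_python(const T& x)
{
    return to_python(x, type<T>());
}

// Hypothetical trait: true when a function returning R could be wrapped,
// i.e. when to_python(value, type<R>()) is a well-formed call.
template <class R, class = void>
struct is_returnable : std::false_type {};

template <class R>
struct is_returnable<
    R, decltype(void(to_python(std::declval<R>(), type<R>())))>
    : std::true_type {};

static_assert(is_returnable<inner>::value, "values convert");
static_assert(is_returnable<const inner&>::value, "const references convert");
static_assert(!is_returnable<inner&>::value,
              "non-const references are rejected");

// Existing user code calling the one-argument form keeps working.
bool single_argument_forwarding_works()
{
    inner i = {1};
    return to_python(i) == &placeholder;
}
```

The rejection works because type<inner&> is a distinct, unrelated type: no
implicit conversion turns it into type<inner> or type<const inner&>, so the
usual reference-binding rules no longer let a non-const reference slip through.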
A more radical possibility would be to completely change the conversion
mechanism to use converter objects instead of functions:
Ullrich proposed this approach months ago, and even coded up a sample
implementation (though as I recall it requires partial specialization). It
would allow us to stop using C++ exception-handling to deal with argument type
errors and overload resolution. There are two disadvantages to using EH for
this purpose:
1. Using EH for overload resolution causes exceptions to be handled when
   there's really no error, potentially slowing down function dispatching.
2. Some compilers (e.g. GCC 2.95.2) still have EH bugs which can cause leaks
   or, presumably, worse problems.
On the other hand, I see several potential downsides:
1. Complexity: Each converter object would have to contain suitably-aligned
   uninitialized storage for an instance of the type being converted
   from_python in addition to a pointer.
2. Code size: It looks to me like it would generate a lot more inline code
   than the current approach, since we would have to deal with explicit
   conversion error checking in addition to any exceptions thrown by the
   wrapped function which we'd have to handle anyway.
3. Incompatibility: as a more radical change, it would probably break more
   user code - customized from_python functions would no longer work either.
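For readers who have not seen the idea, the following is a rough sketch of
what a converter object might look like. It is not Ullrich's implementation
and not a proposed interface: fake_object stands in for a Python object, and
from_python_converter, construct, tag and dispatch are hypothetical names. It
uses C++11 alignas for the storage mentioned under "Complexity".

```cpp
#include <cassert>
#include <new>
#include <string>

// Stand-in for a Python object; a real converter would inspect a PyObject*.
struct fake_object {
    enum kind_t { none, integer, text } kind;
    long i;
    const char* s;
};

fake_object make_none() { fake_object o = { fake_object::none, 0, "" }; return o; }
fake_object make_int(long v) { fake_object o = { fake_object::integer, v, "" }; return o; }
fake_object make_text(const char* s) { fake_object o = { fake_object::text, 0, s }; return o; }

template <class T> struct tag {};

// Build a T in 'where' if the object has the right kind; otherwise return null.
int* construct(void* where, const fake_object& o, tag<int>)
{
    return o.kind == fake_object::integer
        ? new (where) int(static_cast<int>(o.i)) : nullptr;
}
std::string* construct(void* where, const fake_object& o, tag<std::string>)
{
    return o.kind == fake_object::text
        ? new (where) std::string(o.s) : nullptr;
}

// Converter object: aligned uninitialized storage plus a pointer.
// A failed conversion is reported by convertible(), not by throwing.
template <class T>
class from_python_converter {
public:
    explicit from_python_converter(const fake_object& o)
        : ptr_(construct(storage_, o, tag<T>())) {}
    ~from_python_converter() { if (ptr_) ptr_->~T(); }
    from_python_converter(const from_python_converter&) = delete;
    from_python_converter& operator=(const from_python_converter&) = delete;

    bool convertible() const { return ptr_ != nullptr; }
    const T& get() const { return *ptr_; }

private:
    alignas(T) unsigned char storage_[sizeof(T)];
    T* ptr_;
};

int twice(int x) { return 2 * x; }
int length_of(const std::string& s) { return static_cast<int>(s.size()); }

// Overload resolution by explicit checks; returns -1 when nothing matches.
int dispatch(const fake_object& arg)
{
    from_python_converter<int> as_int(arg);
    if (as_int.convertible()) return twice(as_int.get());
    from_python_converter<std::string> as_string(arg);
    if (as_string.convertible()) return length_of(as_string.get());
    return -1;
}
```

Even in this toy form the trade-off is visible: no exception is thrown when an
overload fails to match, but every call site carries an explicit convertible()
check and each converter carries its own storage.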
I'd like to get some feedback from the user and developer communities about
these approaches. If there are alternatives, I'd love to hear them.
Regards,
Dave