From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2005-03-06 20:29:09
Over the past few years, the style of our abstract interfaces for
system-dependent features has bothered me more and more. Here's my
challenge to those happy with the status quo: Find me an interface in
Boost or the C++ standard that represents an observable inter-program
interface, such that, for any popular operating system, I can't find a
feature of the operating system that is impossible to express using that
interface without relying on undefined behavior, undocumented behavior,
or circumventing access specifiers.
First, a realistic example is in order. We all know and love C++
standard iostreams (iostreams library,
<http://www.cplusplus.com/ref/iostream>), which is almost certainly the
most widespread operating system abstraction in the C++ world. So we
create a std::ifstream, and use it to open some file. Now we want to
map this file to a region of memory. Do what?! Suddenly, everyone
breaks out in laughter, with tears rolling down their cheeks, little
globules of spittle escaping from the corners of their mouths,
ridiculing the very notion of any standard library being involved in
such a reasonable and common operation.
But why? The operating system can do this, and it's not incompatible in
the slightest with the concept or interface of std::ifstream. In fact,
if we had some way of stealing the secret file descriptor stashed away
inside the filebuf, we could do it, but programmers, deep down, feel that
this sort of access just isn't right. Grab some popcorn and watch
programmers sweat and squirm in their own cognitive dissonance as they
try to find an elegant solution to this unsolvable problem ("Legalize
access to file descriptors now!",
<http://gcc.gnu.org/ml/libstdc++/2005-02/msg00090.html>).
Let me get straight to the heart of the problem: we love our concrete
interfaces. Every programmer loves the ability to manipulate
a class, knowing its complete interface, exactly what sort of animal it
is, and exactly what it's going to do, and to know all of this before the
program ever runs for the first time.
I hope I'm not bursting anyone's bubble here, but when we're talking
about components that abstract system interfaces, this just isn't
the way the world works. It's time we grow up, get out of the playpen,
and realize that there really is a big scary world of polymorphism out
there.
My core belief is this. We don't fully know the capabilities of the
system components we manipulate, not at compile time, not at load time,
and perhaps not even by the time the program halts. Any interface that
pretends we do is a liar, and most of the time, a pretty bad liar at that.
Here's another one of my favorite examples, in Boost.Threads
(Boost.Threads, <http://www.boost.org/doc/html/threads.html>).
boost::thread lacks a method to forcefully terminate a thread, despite
the fact that many threading systems have one. However, we can't add
one, because there exists at least one threading system that doesn't
have this feature. Well, we *could*, but then we'd be playing a game of
chicken with the user, daring them to call a function that has completely
undecidable behavior. Now let's say the user was really determined to get
this feature, and so she decided to write her own thread class. Nope,
she loses again! Because her class is not named boost::thread, the
class is incompatible with all of the rest of the thread manipulation
functions, and so is entirely unusable with Boost.Threads.
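To make the complaint concrete, here is a minimal sketch of how a user-written thread class could stay compatible if the library's functions were written against an abstract base. Everything in it is hypothetical: abstract_thread, join_all, and my_thread are names invented for illustration, none of them exist in Boost.Threads, and no real operating system thread is created.

```cpp
#include <vector>

// Hypothetical abstract interface; Boost.Threads does not define this.
class abstract_thread {
public:
    virtual ~abstract_thread() = default;
    virtual void join() = 0;
};

// Stand-in for a library function written against the abstract interface
// rather than against one concrete class. Returns how many were joined.
inline int join_all(const std::vector<abstract_thread*>& threads) {
    int joined = 0;
    for (abstract_thread* t : threads) {
        t->join();
        ++joined;
    }
    return joined;
}

// A user-written class that adds forced termination. The bodies only
// record the calls; a real version would call the platform's thread API.
class my_thread : public abstract_thread {
public:
    void join() override { joined_ = true; }
    void terminate() { terminated_ = true; }  // extra capability
    bool joined() const { return joined_; }
    bool terminated() const { return terminated_; }
private:
    bool joined_ = false;
    bool terminated_ = false;
};
```

Under this assumption, her class gains terminate() and still works with every function that accepts an abstract_thread.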
Do you like scary movies? I have an idea for one. It's about a future
C++ standard that includes a threading library that I can't use if I
want forced termination semantics, or any other feature that any
operating system has that the library lacks.
Besides the inability of our concrete interfaces to support a capability
set that varies, and the lack of ability to reimplement these interfaces
compatibly, concrete schemes also can't represent what I call
/multi-interface systems/. Some environments, such as Cygwin (Cygwin
Information and Installation, <http://www.cygwin.com>), have two (or
more) entirely distinct sets of system interfaces that may be used as
underlying primitives for a particular concept. In the case of Cygwin
and threads, the environment supports both the POSIX and Windows thread
interfaces. Both may be used simultaneously, and as each system has its
own unique characteristics, it may be entirely reasonable to do so.
But Boost.Threads can't do this, and really couldn't possibly be
expected to, in its current form.
Note that the total set of capabilities is not decidable at compile
time, or even load time. For instance, let's say we're using a process
class on a System V-style system. We're implementing a debugger, and so
we'd like to get access to the process's core memory through the /proc
interface. However, up until we actually try to do so, we really have
no way of knowing whether this is supported, as the /proc filesystem may
well not be mounted.
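A rough sketch of what "finding out only when we try" looks like in code follows. The helper names can_open and proc_mem_path are invented here, and the path layout (mount point, then process id, then "mem") is an assumption for illustration only; the actual /proc layout differs between systems.

```cpp
#include <fstream>
#include <string>

// Returns true if the path can be opened for reading right now.
// The answer can change between calls, e.g. if /proc gets mounted later.
inline bool can_open(const std::string& path) {
    std::ifstream probe(path.c_str(), std::ios::in | std::ios::binary);
    return probe.is_open();
}

// Hypothetical helper: builds the path of a process's memory file under
// a /proc-style mount point. The layout is assumed, not standardized.
inline std::string proc_mem_path(const std::string& mount, long pid) {
    return mount + "/" + std::to_string(pid) + "/mem";
}
```

A debugger would call can_open(proc_mem_path("/proc", pid)) at the moment it needs the memory, and fall back to another mechanism if the probe fails.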
I'm calling for polymorphism. These interfaces really are conceptually
polymorphic; let's reflect that in our language. Let's give the user
the tools she needs to be able to write her own compatible classes when
the ones we write prove insufficient.
For the sake of exposition, let me propose a sketch of a possible design
for a process class. A generic process is represented by an abstract
base class. Derived from it are classes for the major types of
processes: POSIX, Windows, DCE, and whatever else. Derived from each of
those are specific variants of these types, with additional or extended
capabilities. For example, a child of the POSIX class might be a class
implementing the ptrace() process debugging interface.
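The hierarchy described above might be sketched roughly as follows. The class names and members are placeholders of my own choosing, only the POSIX branch is shown, and nothing here actually calls ptrace().

```cpp
#include <string>

// Generic process: an abstract base class.
class process {
public:
    virtual ~process() = default;
    virtual std::string kind() const = 0;
};

// One of the major process types.
class posix_process : public process {
public:
    explicit posix_process(long pid) : pid_(pid) {}
    std::string kind() const override { return "posix"; }
    long pid() const { return pid_; }
private:
    long pid_;
};

// A specific variant with an extended capability.
class ptrace_process : public posix_process {
public:
    explicit ptrace_process(long pid) : posix_process(pid) {}
    std::string kind() const override { return "posix+ptrace"; }
    // Placeholder for a ptrace()-backed operation; no system call is made.
    bool can_debug() const { return true; }
};
```

Code that only needs a generic process takes a process&, while code that needs the debugging capability asks for a ptrace_process.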
Naturally, we'd instantiate objects of these classes with some factory.
Since we might not know at creation time exactly what capabilities the
system has, or what capabilities are needed, we need a copying mechanism
to construct a new process from an old one. This copy might, for
example, copy the POSIX base, so as to get the process identifier, and
slice the rest off, as unneeded. Clearly there would need to be
significant design effort put into this area. This and other
implementation issues are mostly tangential to my primary concerns.
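One possible shape for such a copying mechanism, purely as a sketch: the function copy_posix_base and all class names are hypothetical, and a real design would need to decide what happens to resources owned by the part that is sliced off.

```cpp
#include <memory>

class process {
public:
    virtual ~process() = default;
};

class posix_process : public process {
public:
    explicit posix_process(long pid) : pid_(pid) {}
    long pid() const { return pid_; }
private:
    long pid_;
};

class ptrace_process : public posix_process {
public:
    explicit ptrace_process(long pid) : posix_process(pid) {}
    bool attached = false;  // extra state that the copy below drops
};

class windows_process : public process {};

// Copies only the POSIX base of the source (a deliberate slice).
// Returns null if the source has no POSIX base at all.
inline std::unique_ptr<posix_process> copy_posix_base(const process& source) {
    if (const auto* posix = dynamic_cast<const posix_process*>(&source)) {
        return std::unique_ptr<posix_process>(new posix_process(*posix));
    }
    return nullptr;
}
```

The copy keeps the process identifier and nothing else, which is the "slice the rest off" behavior described above.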
Finally, pointers are a pain, so we can wrap pointers to a process in
your favorite smart pointer. This smart pointer might have additional
associated mechanics to enable it to automatically select the best kind
of process creation machinery as it is able. I could see this basic
process class (which is practically concrete) as having a very pleasant
and desirable syntax.
Lastly, performance needs to be addressed. I'm not at all worried about
making operations that would be normal function calls into virtual
function calls, and you shouldn't be, either. An indirect call will
often have a cost an order of magnitude less than the cost of the actual
underlying system operation, which might involve many more indirect
calls, context switches, and synchronization. A more significant
concern is the additional machinery needed to support virtual
inheritance. RTTI may also be necessary to fully exercise the class
hierarchy's capabilities. However, when using the subset of features
that would be available to an equivalent concrete implementation, RTTI
and similar should not be needed, so a user shouldn't have to pay (too
much) for what she's not using.
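A small sketch of where each cost would show up, assuming a hypothetical hierarchy in which a capability interface and a platform class share one process base through virtual inheritance. The names are illustrative and attach() is a placeholder that makes no system call.

```cpp
class process {
public:
    virtual ~process() = default;
    virtual long id() const = 0;
};

// Capability interface; shares the single process base.
class debuggable_process : public virtual process {
public:
    virtual bool attach() = 0;
};

class posix_process : public virtual process {
public:
    explicit posix_process(long pid) : pid_(pid) {}
    long id() const override { return pid_; }
private:
    long pid_;
};

class ptrace_process : public posix_process, public debuggable_process {
public:
    explicit ptrace_process(long pid) : posix_process(pid) {}
    bool attach() override { return true; }  // placeholder only
};

// Basic use: one virtual call, no RTTI involved.
inline long plain_use(const process& p) { return p.id(); }

// Extended use: RTTI is needed only here, to discover the capability.
inline bool try_attach(process& p) {
    auto* d = dynamic_cast<debuggable_process*>(&p);
    return d != nullptr && d->attach();
}
```

Callers that stay within plain_use-style operations pay for a virtual call and the virtual-base layout, while the dynamic_cast cost is confined to code that asks for more.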
So here's my question to the Boost community. How many people have
similar concerns and experiences? How often, in real code, do concrete
classes prove insufficient? Who here has to entirely reimplement
libraries like Boost.Threads for relatively silly reasons? Are there
any alternate solutions for the problem I describe? What unforeseen
problems might there be with this polymorphic style? Would you use a
library such as Boost.Threads if it had been rewritten in this manner?
Submitted with Love,
Aaron W. LaFramboise