|
Boost : |
Subject: Re: [boost] [function] "paranoia" vs efficiency
From: Domagoj Saric (dsaritz_at_[hidden])
Date: 2009-10-28 18:34:29
...
... ok ... the fourth attempt in two days ... now i'll try without an
attachment (rar and tgz did not pass) and put the source in the boostpro vault
( home / function objects / new_function_implementation_proposal.tgz ) ...
...
"OvermindDL1" <overminddl1_at_[hidden]> wrote in message
news:3f49a9f40910021215g981da6cxd93c43ac52fee5cc_at_mail.gmail.com...
> That is something I have been curious about since I first looked at
> the Boost.Function source. I was curious why it did not use any real
> policies like most other things in boost
it seems to me rather that there is no "clear cut" policy (pardon the pun :)
about that (the policy approach) in boost...for example the shared_ptr
documentation states that shared_ptr uses a more joe sixpack approach with a
"forced" dynamic deleter on purpose, to achieve a simpler interface and a more
'stable' abi (the point seems moot imho, the same could be achieved with
default template parameters while retaining compile-time configurability)...
> for example one policy could
> enforce it to only use function pointers (no functors) for situations
> where you know it would only be pointers so you could get a speed
> advantage
well if you are going to use only (fixed signature) function pointers there's
no need for boost::function in the first place (or so it seems to me)...
> And I *love* the idea about using a nop static func that throws, or
> perhaps asserts, all depending on policy settings for example. :)
well...here it is..."and more" ;-)
the boost::function documentation states that virtual functions are not used
because they cause code bloat but comparing the actual binaries and code
generated i found boost::function to actually be worse in that respect than the
std::tr1::function implementation provided with msvc++ 9.0 SP1 (that does use
virtual functions)...
...then, looking at the code in more detail i found a number of culprits/things
that could be improved so i rewrote most of it and here is my proposal for a
new boost::function implementation...
{ before continuing just a few terminology definitions (just for this post for
easier reading): looking at the boost/tr1::function<> design/idea we can
identify two 'domains', the "front side" or the interface side or the "fixed
type" or ABI side...the interface that the user sees and the code that exists
and executes _before_ the call through a function pointer in the vtable (e.g.
the code in operator() before the invoke call is made, or code in operator=
before calls to the manager are made ...); ...and the "back side" or that which
stands behind the 'function pointer brick wall' (the actual invoker and the
actual manager), which changes with each assignement, which, at the point of
change or creating has all the type information it wants... }
the change list is as follows:
- there was a lot of functionality in the function_template.hpp (in the main
boost::function<> template class) that was not related/specific to the
particular boost::function<> instantiation and could be extracted into the
non-template function_base base class (and was causing actual bloat as the
compiler/linker did not/could not merge it)...moved as much of such code as
possible into the base header/class
- along with that 'binary code duplication' caused by template code bloat
there was also a certain amount of source code duplication (like same
descisions being made both at the "front side" and the "back side", or type
information managing implemented both in the 'shared' manager and again in each
type specific manager...)
- there is now only a single invoker template (actually two, the void
returning version of the first one)
- there is now only one vtable type/struct (the one in the base header)
- the invoker pointer was moved into the base vtable and was placed at the
beginning of the vtable so that the vtable pointer would point to it directly
(to avoid the pointer offset calculation mentioned in the "[optional] layout"
thread)
- the invoker pointer/signature was changed to be a member function of the
function_buffer so that the thiscall calling convention would be used.
according to this information
http://en.wikipedia.org/wiki/X86_calling_conventions#thiscall this makes no
difference for gcc but does for MSVC (and x86) because now the pointer to the
function_buffer is passed in the ECX register making the stack look exactly (or
more closely, depending on the target) as it needs to be to invoke the target
in the invoker function providing for a more direct call of the target (with no
or less stack adjustment/additional pushing/poping)
- the manager design was changed to allow easier splitting and extracting of
functionality (and in the end for both smaller and faster code)...previous
manager design used a combination of one function pointer and
'message'/'op-code' passing and switching...this, among other things, caused
separate manager code to be generated for every different type of object
assigned to the boost::function<> object because the same function was
'managing' both the (obiously different) type information and the
move/clone/destroy functionality (which is actually shared for a vast majority
of assigned types of objects)...this was changed so that the op-codes were
replaced with an enlarged vtable with separate pointers/functions for move,
clone, destroy and type information functionality...type information
functionality was also changed so that the "back side" only creates an object
containing all the relevant information and passes it to the "front side" that
can then use how and _if_ it needs it (this way the type comparison code for
the target() member function is only generated if it is actually used)
...cv-qualifier information for objects stored as references is also no
longer saved in two run-time bools but is encoded in the type of the "back
side" (so function_buffer::type_t is now typed_functor, and no longer needs to
reside in the function_buffer and function_buffer::obj_ref_t is gone all
together)...
- added more managers (total of 5, previously there were 3) that work with as
little type information as possible (the purpose and implementation should be
obvious from the source and acompanying comments)
- the vtable static initialization was moved outside the vtable_for_functor()
function to ensure that static initialization was actually always used (with
increased numer of function pointers/size of the vtable msvc would start using
lazy initialization...like in a 'normal' "meyers singleton")
- the "has_trivial_copy_and_destroy" optimization and the encoding of that
information into the vtable pointer was removed because in practice it was
actually a pessimization (atleast/especially with the new design)...if it
somehow turns out usefull at some later time it can now be more easily
implemented (e.g. with a larger vtable, the trivial functionality could simply
use null pointers...)...
- assignment functionality was moved from the (no longer present)
boost::function<> instantiation-specific vtable struct (BOOST_FUNCTION_VTABLE)
into the appropriate functor manager classes in the base header
- assignment functionality was also heavily optimized (although there is
still room for improvement)...until now the default 'safe' approach of
<copy-constructing a temporary and calling swap> was always used (which
resulted in hundreds if not a few thousands of asm instructions getting
executed, and was part of the template so it caused bloat)...now this is
done only for function objects that have nontrivial copy-constructors
and/or do not fit in the function_buffer (and is not part of the
template)...for the often case of simple binds and function pointers
assignment (and construction) now results in a call to the nothrow, and most
often trivial, destroy function and a couple of movs...
- empty-invocation handling was changed to no longer place an "if (empty)
throw" section in the operator() (of the template class, thus pretty much
adding to bloat) but to use (assign to itself an) "EmptyHandler" function
object when in the empty state (that then gets invoked and does 'its
thing')...(this also removed the need for the msvc 6.0 workaround for the
in/out of class definition of the operator() function - as it is now only a
thin wrapper it can, actually should, be safely inlined)...
- the three main changes above (managment, assignment and invocation) pretty
much solved the issues of the, what i call, 'bad code bloat'...i.e. the code
that is redundant but does actually intertwine with usefull code and thus gets
executed, pollutes the cache etc...but there is also a different kind of bloat
that is actually probably better called 'data bloat' which only increases the
binary size (maybe perhaps in this case also slows down dynamic_cast
performance) that is in this case caused by rtti
records...boost/tr1::function<> causes rtti records to be created even for non
polymorphic objects because of the forced use of typeid...this can be 'very'
significant for templated function objects (that have very long type
names)...because of this i added BOOST_NO_TYPEID support that disables the type
information functionality...in a real life example where i have only two
boost::function<> instantiations/signatures to which i assign mpl generated
functors...switching from the default/current boost::funtction<> implementation
to this new one using BOOST_NO_TYPEID and the nothrow policy (described later)
a ~300kb binary shrunk by ~30kb (~10%)...
...this (no) type information should preferably be made a matter of policy so
that it can be selectively, not globally, disabled...it could be usefull for
libraries like boost::thread or boost::signals that use boost::function and do
not need type information...then they could "hold on to more type information
for longer" to possibly create more efficient boost::function targets without
fear of code-bloat...
- explicit exception handling was replaced with guard objects
- const's were added, mutable's removed and pointers converted to references
where ever possible
- the generic has_empty_target() function was fixed for MSVC to always use
the ( void const * ) overload because MSVC does not inline vararg functions so
it was always generating calls to a function that only returns false (so it was
unable to remove code that does not execute for the false case)...(perhaps this
- an additional template parameter was added to the main boost::function<>
template class called PolicyList (a list to cut down on the number of
parameters) that is expected to be a boost::mpl::map-like container of
policies. currently two policies exist:
- how to handle invocation of empty function objects (three default
handlers are provided: throw_on_empty, assert_on_empty and nop_on_empty)
- whether the function<> instantiation should 'report'/mark its operator()
as nothrow and accept only nothrow targets (for supported compilers)
- for the nothrow functionality to be any usefull the compiler needs to
provide a way to mark a function as nothrow but without it being a
pessimization as is the case with current exception specifications (e.g. MSVC
__declspec( nothrow ))
- for the nothrow functionality to be safe (not allow assignment of function
objects that can throw to a nothrow marked boost::function<>) we need a way do
detect nothrow functions...i could come up with two ways:
- the 'exception specification shadow type system' (
http://www.gotw.ca/publications/mill22.htm ) could provide a compile-time
method that would work for functions/function pointers...i tried it with comeau
online and it seemed to work ok (it would properly report as a compile time
error an attempt to assign a function with a mismatched exception
specification) but with msvc it, ofcourse, did not...i pretty much 'broke my
teeth' trying to make it work...only to discover that msvc, on the one hand
incorrectly allows a throw(...) marked function to be assigned to a throw()
marked pointer but on the other hand allows a typedef with an exception
specification and "mangles the living hell" out of the type of any function
that has an exception specification (even its builtin __declspec( nothrow ))...
...it actually changes the type of the function (you can see it in compiler
error messages, for example, like "...with T = void (* __cdecl) ( int ) throw()
...") but it does it in some weird and inconsistent way: for example you can
use this type difference to detect/differentiate between throw() marked and
unmarked functions in a template function that accepts a pointer to a function
_but_ only on first invocation of that call...in other words if that function
has a compile-time conditional statement that branches depending if the passed
function pointer is marked as throw() or not only a _single_ implementation of
that function will be generated depending on with what function pointer it was
called for the first time...in other words if we have
void f();
void ff() throw();
template <class F>
bool g( F const & f )
{
if ( <f has throw() decoration> )
return true;
else
return false;
}
g() will always return true if we first call it with g(&ff) (no matter with
what we call it with later) and will always return false if we first call it
with
g(&f)...?!?
...even worse than that is that the type of ff() in the above case is so
'mangled' that it is no longer 'recognized' by even the tr1 type traits
distributed with the compiler (is_pointer, is_function, has_trivial_copy&co.
all return false)...so i had to add a workaround
is_msvc_exception_specified_function_pointer<> 'quasi-trait' to recognize such
objects as functions (otherwise they would be treated as completely generic
objects)...perhaps this or some smarter workaround should be added to
boost::type_traits also...
...perhaps someone with gcc experience could try and make a proper
implementation for a conforming compiler although i don't know if it's worth
it, considering it would only work with plain function pointers and a
conforming compiler would also probably have a conforming implementation of
exception specifications which would render this a pessimization in most
cases...
- the other way or method relies on no "language dark corners" but on the
ability of the compiler optimizer to detect nothrow functions and remove
try-catch blocks around them and then the linker's ability to merge identical
but differently named functions...this method is always safe in that it will
never report that a function object is nothrow if in fact it is not (unless of
course the compiler and/or the linker are so terribly broken but then almost
anything produced by such a compiler/linker would be broken)...it is also
tested to work as expected with msvc 9.0 sp1...but it has the drawbacks that it
is "link-time knowledge" so it works at runtime and thus slightly complicates
the assignement and construction code with the runtime is-throwable check and
adds a certain amount of the "less bad code bloat" (three small functions are
generated for every type of assigned function object, two of which are never
executed, only their addresses are compared, and the third is a 5 instruction
static bool initializer executed only once at startup)
...in any case, if a boost::function<> is instantiated with the nothrow
policy it will use the above tests to accept (on construction and assignement)
only nothrow targets...if it cannot be deduced that a target is nothrow, the
EmptyHandler will be assigned >and then executed< (just my current idea)...
- fixed the BOOST_NO_STD_TYPEINFO for the defined( BOOST_MSVC ) &&
!_HAS_EXCEPTIONS case (seems that MSVC (9.0 SP1) standard headers do not import
type_info into the std namespace when STL exceptions are disabled)...this
should probably be moved into the BOOST_NO_STD_TYPEINFO definition site/header
- allocator support is currently disabled/commented out...i can add it/
properly implement it if the changes so far get accepted (this should probably
be merged into one code base, the default being the use of the
std::alocator...)...
...that would be about it for the most important changes/proposals in
boost::function<> itself...
a few more notes:
- the default interface and behaviour remain the same (current code would work
the same without modification with these changes, "or so it seems to me")
- the empty handler objects are required to satisfy the is_stateless<>
condition
- all function objects used with boost::function<> are expected to have a
nothrow destructor
- the swap and non-trivial assignement function do not offer the strong
exception safety guarantee in the special case when one or both of the objects
at stake is or has (if it is a boost::function<> object "holding" some function
object) a function object that fits in the function_buffer but does not have a
nothrow copy constructor and therefore does not have a no-fail move
operation...in this case it can happen that the final move operation (in the
swap procedure) from the tmp object to *this can fail and then the attempt to
restore (move) the original *this from a backup location can also fail leaving
us with no valid object in *this...in this situation the *this function object
is cleared (the EmptyHandler is assigned)...as far as i could tell the original
boost::function<> implementation had the same behaviour...
- a few more ideas of what can be made a matter of policy: size and alignement
of the function buffer (perhaps some library uses only complex targets so it
would benefit if the function_buffer optimization would be eliminated all
together...on the other some other library would benefit if the buffer would be
better aligned or enlarged by "just one int" and then no memory allocation
would take place for that library ...etc...)
- there is the ever present problem of what to return from empty handlers for
non-void and non-default constructible return types...
...
as far as performance tests are concerned:
- i modified the test code presented here
http://marc.info/?l=boost&m=118835709647084&w=2 to work with boost::signals2
with the dummy mutex/no threading support and a simple recompilation with the
new boost::function<> code yielded a ~10% boost, changing the default policy to
be nothrow the boost doubled to ~20%...i suspect this to be because of the
temporary shared_ptr() that is constructed prior to the call (and is imho
probably unnecessary anyway in the single threaded case)...in the generic
'throwable' case the compiler had to insert EH code that would call the
shared_ptr<> destructor in case of an exception...
- apart from the EH issue mentioned above a performance boost can come from:
- the simpler invoker "back side"/setup code for compilers that benefit from
the thiscall change (which becomes irrelevant for function object targets that
get inlined into the invoker anyway)
- the simpler invoker "front side"/operator() which is mostly benefitial for
situations like the one described above where you make a single call to a
boost::function<> in a non-inlined enclosing function...in situations where a
boost::function<> is called in a tight loop the difference becomes much less
noticable, like 1-2% (atleast with msvc 9.0 that is able to move most of the
previous operator() complexities out of the main loop path)
- i suspect that copy/assignement would be faster in orders of magnitude for
most real world cases but didn't have time to test it...
- space improvements/issues are probably best examined/appreciated with a
profiler capabale of static analysis (to compare the amount of template code
generated)
there is also one more important source of performance, as is often the case,
the use of static/compile-time information..with function pointer/bind
expression targets...
as the boost::function<> documentation mentiones using/assigning function
pointers adds another indirect call...but we are usually assigning pointers to
exact/specific/known at the assign site functions...if we could somehow assign
them using a template parameter that one extra indirect call would be
gone (additionally the compiler could perform the nothrow analysis on the
compile-time know function target, which it cannot on a plain pointer)...the
problem is we need the type in advance to have a non-type template
parameter...so we have to resort to tedious things like bf.assign(
&f ).static_assign<&f>() (don't know if constexpr-essions would help here)...
...but with boost::function<> we already know the type for the trivial case of
assignment of a global function with an exact signature, and can also trivialy
extend it for the case of member function that matches the exact signature...
...that's why i added two template overloads for the assign member function so
the following can compile and work properly:
class A
{
void mem_func( int );
}
void global_func( int );
void main()
{
boost::function<void (int)> bf;
bf.assign<&global_func>();
bf( 1 );
A a;
bf.assign<A, &A::mem_func>( a );
bf( 2 );
}
using 'static assignment' the difference between old and new boost::function<>
can be even up to two calls less...for example (depending on compiler switches
and day of the month) the call stack in the first case (that would then
ofcourse use the dynamic bf.assign( &global_func )) above could look like this:
{
global_func()
bf::invoker()
bf::operator()
main()
}
for the previous/current official implementation
and like this:
{
global_func()
main()
}
with the new implementation...
motivated by the above i also added a proposal/extension for boost::ref() (here
called boost::sref() with a helper macro BOOST_SREF) to offer a generic way for
creating static references as well as a 'hack modification' (inclusion into a
test namespace with different configuration macors) of the mem_fn template that
can store/use/work with the above generated static references to member
functions...
...ok...enough for now...hope this sweat turns out useful ;)
ps. didn't use/post patch files deliberately because the changes are too big...
-- "That men do not learn very much from the lessons of history is the most important of all the lessons of history." Aldous Huxley
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk