From: Daniel Frey (d.frey_at_[hidden])
Date: 2002-07-03 02:34:22


Hi Boosters,

I'd like to propose two changes to boost/operators.hpp:

----- Provide a safer return type -----

The first proposal is the change of the return type of most operators
from 'T' to 'const T'. Without the change, the following code would be
legal:

  (a+b) = c;
  a++++;

Most probably, this is not what the user expects. The first statement
assigns 'c' to the temporary result of 'a+b', which is almost always
useless. It usually happens by accident when you meant to write
something like '(a+b) == c'. The second statement may look like the
logical equivalent of the allowed and useful '++++a', but it doesn't
increment 'a' twice: the second ++ increments the temporary copy
returned by the first ++. Neither the result nor the effect of
the statement is what you probably had in mind when writing it.

See also Scott Meyers, "More Effective C++", Item 2.2.
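
To make the difference concrete, here is a minimal sketch. The types
'Plain' and 'Guarded' and the helper functions are hypothetical and
exist only for this illustration; they are not part of operators.hpp.
'Plain' returns 'T' from its by-value operators, 'Guarded' returns
'const T'.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type whose by-value operators return a mutable temporary.
struct Plain {
    int v;
    Plain& operator+=(const Plain& o) { v += o.v; return *this; }
    Plain& operator++() { ++v; return *this; }                    // prefix
    Plain operator++(int) { Plain old(*this); ++v; return old; }  // postfix
    friend Plain operator+(Plain lhs, const Plain& rhs) { lhs += rhs; return lhs; }
};

// Same interface, but the by-value results are const.
struct Guarded {
    int v;
    Guarded& operator+=(const Guarded& o) { v += o.v; return *this; }
    Guarded& operator++() { ++v; return *this; }
    const Guarded operator++(int) { Guarded old(*this); ++v; return old; }
    friend const Guarded operator+(Guarded lhs, const Guarded& rhs) { lhs += rhs; return lhs; }
};

// '(a+b) = c' compiles for Plain ...
static_assert(std::is_assignable<
                  decltype(std::declval<Plain&>() + std::declval<Plain&>()),
                  Plain&>::value,
              "a mutable temporary can be assigned to");
// ... and is rejected for Guarded, because the temporary is const.
static_assert(!std::is_assignable<
                  decltype(std::declval<Guarded&>() + std::declval<Guarded&>()),
                  Guarded&>::value,
              "a const temporary cannot be assigned to");

// Applies postfix ++ twice, as in 'a++++', and returns the final value of a.
inline int postfix_twice(int start) {
    Plain a{start};
    (a++)++;  // the second ++ modifies the temporary copy, not a
    return a.v;
}

// Applies prefix ++ twice, as in '++++a', and returns the final value of a.
inline int prefix_twice(int start) {
    Plain a{start};
    ++(++a);  // both increments hit a itself
    return a.v;
}

inline void check_increments() {
    assert(postfix_twice(0) == 1);  // a was incremented only once
    assert(prefix_twice(0) == 2);
}
```

With 'Guarded', the expression '(g++)++' does not compile at all, as
the postfix operator is a non-const member and the temporary is const.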

----- NRVO-friendly implementation -----

The second change affects how the operators are implemented, so that
compilers can apply the NRVO (named return value optimization). The
current implementation has two problems: it doesn't allow the most
efficient code and it is not symmetric. In theory,
when defining an operator+ (or -, *, / etc.) you need exactly one new
object to hold the result. Our goal should be an
implementation that allows (good) compilers to generate code without
any additional intermediate objects. Now, let's look at the
current implementation style and see what's wrong with it:

friend const T operator+( T lhs, const T& rhs )
{
   return lhs += rhs;
}

This looks clean, fast and beautiful. But the parameter 'lhs' is
copied, the operation is applied to the copy and the result
is... copied again! The second copy is required because the parameter
is constructed by the caller of the operator rather than the callee.
Also, the copy may have side effects, and the standard forbids
optimizations that change the observable behavior of the code. The
standard does name several optimizations that are explicitly allowed
to violate this rule, but none of these exceptions applies here.

When the above operator+ is used to calculate the expression 'a+b+c',
there is an optimization made. The result of the first sub-expression
'a+b' is passed as the first parameter for the second call directly,
without copying it. Thus, one intermediate object is optimized
away. Still, 'a+b+c' needs to construct three objects instead of the
theoretical minimum of two objects. If we call 'c+(a+b)', there are
four objects, thus the current implementation is asymmetric.
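
The object counts above can be checked with a small instrumented
class. This is only a sketch: 'Counted' and the two helper functions
are hypothetical names for this illustration, and the counts assume a
compiler that constructs a by-value parameter directly from a
temporary argument (required since C++17, commonly done before that).

```cpp
#include <cassert>

// Hypothetical instrumented type that counts copy constructions.
struct Counted {
    static int copies;
    int v;
    explicit Counted(int x) : v(x) {}
    // Declaring the copy constructor suppresses the implicit move
    // constructor, so every new object made from another one is counted.
    Counted(const Counted& o) : v(o.v) { ++copies; }
    Counted& operator+=(const Counted& o) { v += o.v; return *this; }

    // The by-value implementation style discussed above.
    friend const Counted operator+(Counted lhs, const Counted& rhs) {
        return lhs += rhs;  // returns a reference, so the result is copied
    }
};
int Counted::copies = 0;

// Number of copies made while evaluating a+b+c.
inline int copies_left_nested() {
    Counted a(1), b(2), c(3);
    Counted::copies = 0;
    const Counted& r = a + b + c;  // binding a reference adds no copy
    assert(r.v == 6);
    return Counted::copies;
}

// Number of copies made while evaluating c+(a+b).
inline int copies_right_nested() {
    Counted a(1), b(2), c(3);
    Counted::copies = 0;
    const Counted& r = c + (a + b);
    assert(r.v == 6);
    return Counted::copies;
}
```

Under the stated assumption, copies_left_nested() yields 3 and
copies_right_nested() yields 4: in 'c+(a+b)' the named object 'c' has
to be copied into the parameter, while in 'a+b+c' the temporary 'a+b'
becomes the parameter directly.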

What is the NRVO and how could it solve these problems? The NRVO is an
optimization that the standard explicitly allows: it removes an
intermediate object even if doing so has observable side effects. An
implementation of the operator that permits the NRVO looks like this:

friend const T operator+( const T& lhs, const T& rhs )
{
   T nrv( lhs );
   nrv += rhs;
   return nrv;
}

With this implementation, the compiler is allowed to construct and use
the object 'nrv' directly in the function's return slot, so no
unnecessary object is involved. This function is also symmetric,
as both arguments are taken by reference.
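
The same kind of instrumented class shows the symmetry. Again a sketch
only: 'Tracked' and the helper functions are hypothetical. Whether the
NRVO is actually applied is up to the compiler, so the code checks
only what holds either way.

```cpp
#include <cassert>

// Hypothetical instrumented type that counts copy constructions.
struct Tracked {
    static int copies;
    int v;
    explicit Tracked(int x) : v(x) {}
    // No move constructor is generated, so a non-elided return is counted too.
    Tracked(const Tracked& o) : v(o.v) { ++copies; }
    Tracked& operator+=(const Tracked& o) { v += o.v; return *this; }

    // The NRVO-friendly implementation style.
    friend const Tracked operator+(const Tracked& lhs, const Tracked& rhs) {
        Tracked nrv(lhs);  // the one object the result really needs
        nrv += rhs;
        return nrv;        // may be constructed directly in the return slot
    }
};
int Tracked::copies = 0;

// Number of copies made while evaluating a+b+c.
inline int tracked_left_nested() {
    Tracked a(1), b(2), c(3);
    Tracked::copies = 0;
    const Tracked& r = a + b + c;
    assert(r.v == 6);
    return Tracked::copies;
}

// Number of copies made while evaluating c+(a+b).
inline int tracked_right_nested() {
    Tracked a(1), b(2), c(3);
    Tracked::copies = 0;
    const Tracked& r = c + (a + b);
    assert(r.v == 6);
    return Tracked::copies;
}
```

Each call of the operator costs one copy when the NRVO is applied and
two when it is not, so both expressions need between 2 and 4 copies,
and always the same number as each other.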

See also Scott Meyers, "More Effective C++", Item 4.7. If you read the
book, you might get the impression I misunderstood it completely, but
please read Scott's errata for this item, available at:

http://www.aristeia.com/BookErrata/mec++-errata_frames.html

----- The real world, part 1: Performance -----

Some people doubt that this will ever make any difference, and in
several cases they are right. First of all, you need a compiler that
actually implements the NRVO; otherwise, the new implementation will
produce *more* objects (see the 'a+b+c' example mentioned above).
Second, for small classes like 'complex', the compiler optimizes a lot
in the later compilation passes. The intermediate 'objects' are still
there, but the assembler code that copies their values is removed by a
later optimization pass. This hides the fact that, from the C++ point
of view, there is still a superfluous object in between. So, if
there is no effect, why bother?

The real value shows when you apply operators.hpp to large classes,
e.g. matrices, vectors etc. When the compiler can't optimize the
copies away in later passes, it really helps to remove intermediate
objects as early as possible in the compilation process. To test this,
I wrote a small example program (benchmark.cc) which allows you to
compare the old and the new implementation. For me, the new version is
15% faster than the old version (using GCC 3.1), YMMV. Applications
that work with large matrices etc. are typically very
performance-hungry, so this is a very important area for an operator
library to keep in mind.

----- The real world, part 2: Compilers -----

The NRVO is still not very common, AFAIK. GCC had no NRVO before
version 3.1. GCC 3.1 implements it correctly and achieves the good
results reported above. Some compilers may implement the NRVO but
still have bugs or don't follow the standard closely. An example of
this is Intel C++ 6.0, which applies the NRVO only if
the type of the local variable matches the return type of the function
exactly. The standard only requires the cv-unqualified types to match,
which matters here because of the 'const' in the return type. (1)

A one-size-fits-all approach is a neat idea, but nothing more. For a
compiler without a (correct) NRVO, the old code is faster, but not
symmetric. Most users will prefer it anyway, so I supplied both
versions in operators_new.hpp. They are switched by

BOOST_NO_NRVO

For compilers without NRVO, we need to define this macro in the
config headers. As a bonus, the user might set

BOOST_FORCE_SYMMETRIC_OPERATORS

to force the use of the new, symmetric implementation even for those
compilers. This may result in slower code for these compilers, but for
newer compilers that have the NRVO, there will be no difference.
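
A sketch of how the two macros could select the implementation. This
is not the actual content of operators_new.hpp; the names
'addable_sketch', 'Meters' and 'add_meters' are hypothetical and only
illustrate the switch.

```cpp
#include <cassert>

// Hypothetical base class in the style of operators.hpp.
template <class T>
struct addable_sketch {
#if defined(BOOST_NO_NRVO) && !defined(BOOST_FORCE_SYMMETRIC_OPERATORS)
    // Old style: faster without NRVO, but not symmetric.
    friend const T operator+(T lhs, const T& rhs) { return lhs += rhs; }
#else
    // New style: symmetric, and free of extra objects when the NRVO applies.
    friend const T operator+(const T& lhs, const T& rhs) {
        T nrv(lhs);
        nrv += rhs;
        return nrv;
    }
#endif
};

// Hypothetical user type: it supplies += and inherits + from the base.
struct Meters : addable_sketch<Meters> {
    int value;
    explicit Meters(int v) : value(v) {}
    Meters& operator+=(const Meters& o) { value += o.value; return *this; }
};

// Adds two lengths through the generated operator+.
inline int add_meters(int a, int b) {
    const Meters sum = Meters(a) + Meters(b);
    assert(sum.value == a + b);  // same result for either implementation
    return sum.value;
}
```

Both branches give the same results; the macros only decide how many
intermediate objects are created along the way.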

----- Education :) -----

The NRVO is not widely known, so I decided to call the local
variable 'nrv' instead of 'tmp' or 'result' or something similar. The
name is a hint to the unaware. If the variable is called 'tmp',
people writing their own operators will remove it and return to the
old implementation style without noticing what they have done. When
the variable is called 'nrv', chances are that anyone who reads the
code wonders about the name and hopefully starts to ask questions.

----- Fin -----

Any comments, suggestions, improvements, ...?

Regards, Daniel

PS: Thanks to John Potter for explaining why certain optimizations are
not allowed, see csc++ "Temporaries and optimizations".

(1) If you want to use the Intel C++ AND you want to use the NRVO,
    consider using this code:

friend const T operator+( const T& lhs, const T& rhs )
{
  const T nrv( lhs );
  // cast away const so that the type of 'nrv' can match the return type exactly
  *const_cast< T* >( &nrv ) += rhs;
  return nrv;
}

Please note that I don't want to show that the Intel compiler is a bad
compiler - in fact it creates faster code than the GCC 3.1 once the
work-around is applied. I just don't have any other compilers
installed. :)




