From: Daniel Frey (daniel.frey_at_[hidden])
Date: 2002-10-22 03:56:15


Andrei Alexandrescu wrote:
>
> "Daniel Frey" <d.frey_at_[hidden]> wrote in message
> news:ap1v2g$ttq$1_at_main.gmane.org...
> > Please apply or discuss as needed :)
>
> Cool. I'd like to use this as a pretext for trying to understand the way RVO
> in its different flavors is implemented in various compilers.
>
> 1. What all compilers are guaranteed to do
>
> Consider a bona fide T type:
>
> struct T
> {
> T(const T&);
> };
>
> T Fun()
> {
> return expression;
> }
>
> T obj(Fun());
>
> In this setting, the compiler is guaranteed to copy result into a temporary
> value and a temporary value into obj. The copy constructor is invoked at
> most two times.

The important thing to note is that T(const T&) may print a message.
Without optimizations, this message will appear twice. Of course the
compiler is always free to apply any optimization it can find, as long
as it doesn't change the result of the code. Here, every such "internal"
optimization must still print the message twice. RVO/NRVO are
optimizations that go beyond this, and thus they needed to be allowed
explicitly by the standard. They allow the observable behaviour of the
program to change, i.e. the message could be printed only once or not at
all. If no message is printed and no other side-effects occur, a
compiler is always allowed to remove temporaries, but AFAIK most
compilers aren't clever enough to find out whether a copy constructor is
free of side-effects.
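
To make the "observable behaviour" point concrete, here is a minimal sketch. The names Noisy, make and count_copies are made up for illustration; the copy constructor both prints and counts, so you can see how often it really runs for the "T obj(Fun());" pattern:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical type whose copy constructor has a visible side effect.
struct Noisy {
    static int copies;
    Noisy() {}
    Noisy(const Noisy&) {
        ++copies;
        std::puts("Noisy(const Noisy&)");
    }
};
int Noisy::copies = 0;

// Returns an unnamed temporary by value.
Noisy make() { return Noisy(); }

// Reports how many copy-constructor calls "Noisy obj(make());" caused.
int count_copies() {
    Noisy::copies = 0;
    Noisy obj(make());
    (void)obj;
    // At most two copies: expression -> return value -> obj.
    assert(Noisy::copies >= 0 && Noisy::copies <= 2);
    return Noisy::copies;
}
```

The value returned by count_copies() lies between 0 and 2. Which one you get depends on the compiler, the language version and the flags in use (GCC, for example, has -fno-elide-constructors to switch the elision off where it is optional), and that variation is exactly the change in observable behaviour the standard had to permit.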

> 2. What most compilers actually do
>
> When generating code, most compilers do something like that:
>
> void Fun(void* __pResult)
> {
> new(__pResult) T(expression);
> }
>
> char __buf[sizeof(T)]; // assume proper alignment
> Fun(&__buf);
> T& obj = *reinterpret_cast<T*>(&__buf);
>
> As you see, there's a copy constructor there that takes expression as a
> source.

The memory for the object that holds a function's result is provided by
the caller, while the construction of the object is done by the callee.
This was - at least for me - the most important thing to understand.
Almost everything else follows from this fact :)
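
As a rough sketch of that split of responsibilities, the following writes out by hand what a compiler might do. Point, make_point_lowered and call_lowered are hypothetical names, the real calling convention is of course compiler-specific, and alignas assumes a C++11 compiler:

```cpp
#include <cassert>
#include <new>

struct Point {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// What the programmer writes.
Point make_point() { return Point(3, 4); }

// Hand-written approximation of the generated code: the caller passes
// the address of raw storage, the callee constructs the result there.
Point* make_point_lowered(void* result) {
    return new (result) Point(3, 4);
}

int call_lowered() {
    // Caller side: provide suitably aligned memory for the result.
    alignas(Point) unsigned char buf[sizeof(Point)];
    Point* p = make_point_lowered(buf);
    assert(static_cast<void*>(p) == static_cast<void*>(buf));
    int sum = p->x + p->y;
    p->~Point();  // the caller also ends the object's lifetime
    return sum;
}
```

Since the callee already knows where the result has to end up, constructing it directly at that address instead of copying it there is a small step.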

> 3. The URVO (Unnamed RVO)
>
> This is the easiest optimization to perform. Whenever /expression/ is a
> temporary of type T, the compiler is smart enough to fuse the temporary with
> the constructor call. For example, if expression is T(a, b, c), then the
> compiler will be smart enough to say:
>
> new(__pResult) T(a, b, c);
>
> instead of:
>
> new(__pResult) T(T(a, b, c));

Given the two points above, implementing the RVO seems trivial, and
AFAIK most compilers get it right. Functions with multiple return paths
are the problematic case. Chaining the RVO like this usually isn't a
problem:

T f() { return T( a, b, c ); }
T g() { return f(); }
T h() { return g(); }

int main() { T t = h(); }

Most compilers will create *no* temporaries; the code is equivalent to

int main() { T t( a, b, c ); }
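
If you want to check this on your own compiler, a small instrumented version of the chain could look like the sketch below. Tracked, ChainReport and run_chain are illustrative names, and 1, 2, 3 stand in for a, b, c:

```cpp
#include <cassert>

// Hypothetical type that records where it was first constructed
// and how many times it has been copied since.
struct Tracked {
    static int copies;
    static const Tracked* built_at;
    int sum;
    Tracked(int a, int b, int c) : sum(a + b + c) { built_at = this; }
    Tracked(const Tracked& o) : sum(o.sum) { ++copies; }
};
int Tracked::copies = 0;
const Tracked* Tracked::built_at = 0;

Tracked f() { return Tracked(1, 2, 3); }
Tracked g() { return f(); }
Tracked h() { return g(); }

struct ChainReport {
    int copies;           // copy-constructor calls seen for "Tracked t = h();"
    bool built_in_place;  // true if the object was constructed directly in t
    int sum;
};

ChainReport run_chain() {
    Tracked::copies = 0;
    Tracked t = h();
    assert(t.sum == 6);
    ChainReport r = { Tracked::copies, Tracked::built_at == &t, t.sum };
    return r;
}
```

With every copy elided, run_chain() reports zero copies and built_in_place is true, i.e. the constructor ran directly on t. Without any elision I would expect up to four copies, one per return statement plus the initialization of t.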

> So far, so good. However, when a named value is used, such as 'result'
> (result being a variable of type T), still the copy constructor is used.
>
> 4. The NRVO (Named RVO)
>
> This is a more advanced optimization. The compiler is smart enough to detect
> patterns such as:
>
> T Fun()
> {
> T result(a, b, c);
> ....
> return result;
> }
>
> and generates code such as:
>
> void Fun(void* __pResult)
> {
> new(__pResult) T(a, b, c);
> ...
> }
>
> so it basically creates result at the address received from the caller.

It's the same as the unnamed RVO, except that the object is created
earlier and has a name. Once a compiler has implemented this, it seems
to be pretty stable. GCC (3.1+) applies the NRVO even when all
optimization flags are turned off. The Intel compiler (6.0+?) has
implemented the NRVO, too, although it's a bit buggy: the standard
requires only the *cv-unqualified* return type and the type of the named
object to match, while the Intel compiler expects the types to match
exactly.
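
A sketch of the two cases, with made-up names (Counted, make_named, make_const_named): the second function returns a const local from a function whose return type is non-const, which is the situation where only the cv-unqualified types match:

```cpp
#include <cassert>

struct Counted {
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& o) : value(o.value) { ++copies; }
};
int Counted::copies = 0;

// Plain NRVO candidate: local type and return type are identical.
Counted make_named() {
    Counted result(1);
    result.value += 41;
    return result;
}

// The local is const, the return type is not: the types match only
// after dropping cv-qualifiers, which is all the standard asks for.
Counted make_const_named() {
    const Counted result(42);
    return result;
}

int copies_for_named() {
    Counted::copies = 0;
    Counted obj(make_named());
    assert(obj.value == 42);
    return Counted::copies;
}

int copies_for_const_named() {
    Counted::copies = 0;
    Counted obj(make_const_named());
    assert(obj.value == 42);
    return Counted::copies;
}
```

Both helpers return a number between 0 and 2. A compiler that insists on an exact type match would show a higher count for copies_for_const_named() than for copies_for_named().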

> It is unclear to me whether (and which) compilers that do NRVO can do RVO as
> well (in the presence of multiple returns).

Multiple returns are indeed a problem, but it's usually possible to keep
that in mind and design functions to be (N)RVO-friendly.
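
Here is a sketch of what I mean by (N)RVO-friendly; Widget and the two pick functions are invented for the example. In the first function the compiler cannot know, at the point where a and b are constructed, which of them will be returned, so it cannot simply build "the" result in the caller's memory. The second function has exactly one named candidate:

```cpp
#include <cassert>

struct Widget {
    static int copies;
    int id;
    explicit Widget(int i) : id(i) {}
    Widget(const Widget& o) : id(o.id) { ++copies; }
};
int Widget::copies = 0;

// Two different named objects may be returned: hard for the NRVO.
Widget pick_two_objects(bool flag) {
    Widget a(1);
    Widget b(2);
    if (flag) return a;
    return b;
}

// One named object on every return path: easy for the NRVO.
Widget pick_one_object(bool flag) {
    Widget result(flag ? 1 : 2);
    return result;
}

// Counts the copies made while initializing an object from fn(flag).
int count_copies(Widget (*fn)(bool), bool flag) {
    Widget::copies = 0;
    Widget w(fn(flag));
    assert(w.id == (flag ? 1 : 2));
    return Widget::copies;
}
```

Both functions return the same values, so the rewrite doesn't change the result, only how likely it is that the copies disappear. Whether a particular compiler elides them in either version is still not guaranteed.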

> An interesting tweak that simulates NRVO on not-so-smart compilers is:
>
> struct T
> {
> T(const T&);
> T(T&, bool move);
> };
>
> If the second constructor is called, a move construction is done.
>
> On a compiler that doesn't know NRVO, you can say:
>
> T Fun()
> {
> T result;
> ...
> return T(result, true);
> }

We are now discussing the basics of Mojo, aren't we? :) You can mark all
objects that you don't need anymore as 'movable':

template< typename T > struct movable : public T {};

class A {
public:
  A( const A& );            // ...copy semantics
  A( movable< A >& );       // ...move semantics

  A( int a, int b, int c ); // ...some other ctor

  // View this object as movable, i.e. allow its contents to be taken.
  movable< A >& as_movable() {
    return *static_cast< movable< A >* >( static_cast< void* >( this ) );
  }
};

template<> struct movable< A > : public A {
  movable( const A& a ) : A( a ) {}
  movable( movable< A >& m ) : A( m ) {}
  movable( int a, int b, int c ) : A( a, b, c ) {}
};

movable< A > f() { return movable< A >( a, b, c ); }
int main() { A a = f(); }

Well, this is of course only a draft of the idea and Mojo will most
probably do a better job.
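
For anyone who wants something that actually compiles, here is a simplified variant of the tagging idea. Note that it is not the scheme drafted above: instead of deriving movable< A > from A, it uses a small proxy (the hypothetical MovableBuffer) that just points at the object to be pilfered. Buffer and make_buffer are made-up names as well:

```cpp
#include <cassert>
#include <cstddef>

class Buffer;

// Hypothetical tag type: "the Buffer I point to may be pilfered".
struct MovableBuffer {
    Buffer* source;
    explicit MovableBuffer(Buffer& b) : source(&b) {}
};

class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}

    // Copy semantics: allocate new storage and copy the elements.
    Buffer(const Buffer& other)
        : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move semantics: steal the storage and leave the source empty.
    explicit Buffer(MovableBuffer m)
        : data_(m.source->data_), size_(m.source->size_) {
        assert(m.source != this);
        m.source->data_ = 0;
        m.source->size_ = 0;
    }

    ~Buffer() { delete[] data_; }

    // Marks this object as no longer needed by its owner.
    MovableBuffer as_movable() { return MovableBuffer(*this); }

    int* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    Buffer& operator=(const Buffer&);  // not needed for the sketch, disabled
    int* data_;
    std::size_t size_;
};

// Same shape as "return T(result, true);": the local is given up on return.
Buffer make_buffer() {
    Buffer result(3);
    result.data()[0] = 42;
    return Buffer(result.as_movable());
}
```

Constructing from a.as_movable() hands the storage over without copying any elements and leaves a empty, while the ordinary copy constructor still makes a deep copy. The proxy keeps the example short; it says nothing about how Mojo itself does it.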

Regards, Daniel

--
Daniel Frey
aixigo AG - financial training, research and technology
Schloß-Rahe-Straße 15, 52072 Aachen, Germany
fon: +49 (0)241 936737-42, fax: +49 (0)241 936737-99
eMail: daniel.frey_at_[hidden], web: http://www.aixigo.de
