
From: Terje Slettebø (tslettebo_at_[hidden])
Date: 2002-10-12 15:28:01


> From: "Andrei Alexandrescu" <andrewalex_at_[hidden]>

> The pass-by-value is important because it communicates to the compiler what
> the programmer is doing. The programmer lets the compiler take care of
> creating the temporary instead of creating it by herself.
>
> I recently reached the conclusion that taking a parameter by const reference
> just to make a copy of it inside the function is a "lie". The signature
> says: "I don't need a value! A reference to a const is all I need!" and the
> code says: "The first line of this function makes a copy!" There will be a
> section in my upcoming article entitled "The Lying const". Taking const& T
> as arguments in /any/ function when you actually *do* need a copy chokes the
> compiler (and Zuto) and practically forbids them to make important
> optimizations.

I've done some preliminary testing of this hypothesis (on only one compiler,
Intel C++ 7.0 beta), comparing various ways of implementing operator+(). I
made the following test program:

--- Start ---

class Test
{
public:
  Test(int n) : num(n) {}

  Test &operator+=(const Test &other)
  {
    num+=other.num;

    return *this;
  }

  int num;
  int array[1024]; // Just so that copying shows up
};

// One of the operator+() versions shown below goes here.

Test test_num1(1);
Test test_num2(2);
int num;
int array;

int main()
{
  Test test_num=test_num1+test_num2;

  num=test_num.num;
  array=test_num.array[0];
}

--- End ---

The global variables are there to prevent the calculation from being
optimised out of existence. The same goes for the assignments at the end of
main().

main() should therefore contain just enough code to construct the new
test_num from the result of the addition.

The code is compiled with Intel C++, which generates highly optimised code,
with optimisations set to maximum speed, including inlining.

First operator+():

Test operator+(const Test &t1,const Test &t2)
{
  return Test(t1)+=t2;
}

Code for Intel C++ (Comments added with "//")
----------------------------------------------------------------

33: int main()
34: {
00401010 push ebp
00401011 mov ebp,esp
00401013 sub esp,3
00401016 and esp,0FFFFFFF8h
00401019 add esp,4
0040101C push edi
0040101D push esi
0040101E mov eax,2200h
00401023 call $$$00001 (0043b720) // Some internally generated routine. It appeared when "array" was added. It doesn't do much, though.
35: Test test_num=test_num1+test_num2;
00401028 lea edi,[esp]
0040102B mov esi,offset test_num1 (0047fe40)
00401030 mov ecx,401h
00401035 rep movs dword ptr [edi],dword ptr [esi]
00401037 mov eax,[test_num2 (00480e60)]
0040103C lea edi,[esp+1100h]
00401043 lea esi,[esp]
00401046 add dword ptr [esp],eax
00401049 mov ecx,401h
0040104E rep movs dword ptr [edi],dword ptr [esi]
36:
37: num=test_num.num;
00401050 mov edx,dword ptr [esp+1100h]
38: array=test_num.array[0];
00401057 mov ecx,dword ptr [esp+1104h]
39: }
0040105E xor eax,eax
36:
37: num=test_num.num;
00401060 mov dword ptr [num (0047e868)],edx
38: array=test_num.array[0];
00401066 mov dword ptr [array (0047e86c)],ecx
39: }
0040106C add esp,2200h
00401072 pop esi
00401073 pop edi
00401074 mov esp,ebp
00401076 pop ebp
00401077 ret

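Note the two "rep movs" instructions. Each one copies a whole Test object
(401h = 1025 dwords: num plus the 1024-element array), so the "array" member
is copied twice.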
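To see where the two copies come from at the source level rather than in the disassembly, here is a minimal sketch that counts copy-constructor calls. It assumes a hypothetical class `Counted` (a stand-in for Test without the large array, with an instrumented copy constructor) and a hypothetical helper `copies_for_version1()`; neither is part of the original test. Counting copies in standard C++ only shows which copies the language allows the compiler to elide, not what Intel C++ 7.0 actually emitted.

```cpp
#include <cassert>

// Hypothetical stand-in for Test: same operator+=, no large array, and a
// copy constructor that counts how many times an object is copied.
struct Counted
{
  explicit Counted(int n) : num(n) {}
  Counted(const Counted &other) : num(other.num) { ++copies; }

  Counted &operator+=(const Counted &other)
  {
    num += other.num;
    return *this;
  }

  int num;
  static int copies;
};

int Counted::copies = 0;

// First version: Counted(t1) copies t1 into an unnamed temporary, and +=
// returns a reference, so the return value must be copied from that reference.
Counted operator+(const Counted &t1, const Counted &t2)
{
  return Counted(t1) += t2;
}

// Hypothetical helper: returns the number of copies made while computing 1 + 2.
int copies_for_version1()
{
  Counted a(1), b(2);
  Counted::copies = 0;
  Counted sum = a + b;
  assert(sum.num == 3);
  return Counted::copies;
}
```

This should report 2: one copy for `Counted(t1)`, and one from the reference returned by `+=` into the return value. The second copy cannot be elided, because its source is neither a temporary nor a named local object.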

Next version:

inline Test operator+(const Test &t1,const Test &t2)
{
  Test nrv(t1);
  nrv+=t2;
  return nrv;
}

44: int main()
45: {
00401010 push ebp
00401011 mov ebp,esp
00401013 sub esp,3
00401016 and esp,0FFFFFFF8h
00401019 add esp,4
0040101C push edi
0040101D push esi
0040101E mov eax,1100h
00401023 call $$$00001 (0043b720)
46: Test test_num=test_num1+test_num2;
00401028 lea edi,[esp]
0040102B mov esi,offset test_num1 (0047fe40)
00401030 mov ecx,401h
00401035 rep movs dword ptr [edi],dword ptr [esi]
00401037 mov edx,dword ptr [esp]
49: array=test_num.array[0];
0040103A mov ecx,dword ptr [esp+4]
50: }
0040103E xor eax,eax
46: Test test_num=test_num1+test_num2;
00401040 add edx,dword ptr [test_num2 (00480e60)]
00401046 mov dword ptr [esp],edx
47:
48: num=test_num.num;
00401049 mov dword ptr [num (0047e868)],edx
49: array=test_num.array[0];
0040104F mov dword ptr [array (0047e86c)],ecx
50: }
00401055 add esp,1100h
0040105B pop esi
0040105C pop edi
0040105D mov esp,ebp
0040105F pop ebp
00401060 ret

Preliminary tests seem to confirm what Howard and Daniel said: using a named
temporary, rather than the constructor call with "+=", may produce better
optimised code. There's only one "rep movs" (copying the object, including
the array) in the code above, compared with two in the first version. That one
copy is needed for the receiving variable, "test_num", so the above is in fact
optimal code, with no unnecessary temporaries being created.
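The same copy counting can be applied to the named-temporary version. This is a sketch under the same assumptions: `Counted` and `copies_for_version2()` are hypothetical, and the NRVO is something the language permits but does not require, so the count depends on the compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for Test with a copy-counting copy constructor.
struct Counted
{
  explicit Counted(int n) : num(n) {}
  Counted(const Counted &other) : num(other.num) { ++copies; }

  Counted &operator+=(const Counted &other)
  {
    num += other.num;
    return *this;
  }

  int num;
  static int copies;
};

int Counted::copies = 0;

// Second version: a named local is returned by value, so the compiler may
// construct nrv directly in the caller's return slot (NRVO).
inline Counted operator+(const Counted &t1, const Counted &t2)
{
  Counted nrv(t1); // the one copy that is actually needed
  nrv += t2;
  return nrv;      // NRVO candidate
}

// Hypothetical helper: returns the number of copies made while computing 1 + 2.
int copies_for_version2()
{
  Counted a(1), b(2);
  Counted::copies = 0;
  Counted sum = a + b;
  assert(sum.num == 3);
  return Counted::copies;
}
```

With GCC and Clang this typically reports 1, matching the single "rep movs" above. A compiler that skips the NRVO reports 2; since `Counted` declares a copy constructor, it gets no implicit move constructor, so the fallback on return is another copy.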

Let's try the third alternative:

inline Test operator+(Test t1,const Test &t2)
{
  t1+=t2;

  return t1;
}

55: int main()
56: {
00455F08 push ebp
00455F09 mov ebp,esp
00455F0B sub esp,3
00455F0E and esp,0FFFFFFF8h
00455F11 add esp,4
00455F14 push edi
00455F15 push esi
00455F16 mov eax,2200h
00455F1B call $$$00001 (0043b720)
57: Test test_num=test_num1+test_num2;
00455F20 lea eax,[esp+1100h]
00455F27 lea edi,[esp+4]
00455F2B mov dword ptr [esp],eax
00455F2E mov esi,offset test_num1 (0047fe40)
00455F33 mov ecx,401h
00455F38 rep movs dword ptr [edi],dword ptr [esi]
00455F3A mov eax,offset test_num2 (00480e60)
00455F3F mov edx,dword ptr [eax]
00455F41 lea esi,[esp+4]
00455F45 mov dword ptr [esp+1008h],eax
00455F4C mov edi,dword ptr [esp]
00455F4F add dword ptr [esp+4],edx
00455F53 mov ecx,401h
00455F58 rep movs dword ptr [edi],dword ptr [esi]
58:
59: num=test_num.num;
00455F5A mov edx,dword ptr [esp+1100h]
60: array=test_num.array[0];
00455F61 mov ecx,dword ptr [esp+1104h]
61: }
00455F68 xor eax,eax
58:
59: num=test_num.num;
00455F6A mov dword ptr [num (0047e868)],edx
60: array=test_num.array[0];
00455F70 mov dword ptr [array (0047e86c)],ecx
61: }
00455F76 add esp,2200h
00455F7C pop esi
00455F7D pop edi
00455F7E mov esp,ebp
00455F80 pop ebp
00455F81 ret

Hm. Back to two copies again (two "rep movs").

Note that this was tested on only _one_ compiler, but it may give us
something to go on. In these results, Daniel's suggestion (the second version
here) produced the best optimised code.

It seems that, at least for this compiler, Andrei's suggestion to pass by
value if you need to make a copy anyway resulted in less optimised code. That
makes sense: the copy is made when the function is called, so t1 is a
parameter rather than a local variable, and by then it's too late to use the
NRVO inside the function. The result therefore has to be copied once more
into the return value.
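The parameter case can be checked the same way. This sketch again assumes the hypothetical `Counted` class and a hypothetical helper `copies_for_version3()`. The C++ standard only allows the copy on return to be elided for a local object that is not a function parameter, so returning `t1` is never an NRVO candidate, whatever the compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for Test with a copy-counting copy constructor.
struct Counted
{
  explicit Counted(int n) : num(n) {}
  Counted(const Counted &other) : num(other.num) { ++copies; }

  Counted &operator+=(const Counted &other)
  {
    num += other.num;
    return *this;
  }

  int num;
  static int copies;
};

int Counted::copies = 0;

// Third version: the caller copies the argument into the parameter t1.
inline Counted operator+(Counted t1, const Counted &t2)
{
  t1 += t2;
  return t1; // t1 is a parameter: copy elision is not permitted here
}

// Hypothetical helper: returns the number of copies made while computing 1 + 2.
int copies_for_version3()
{
  Counted a(1), b(2);
  Counted::copies = 0;
  Counted sum = a + b;
  assert(sum.num == 3);
  return Counted::copies;
}
```

This should report 2: one copy to initialise the parameter from the lvalue argument, and one from `t1` into the return value. Since C++11, returning a parameter first tries a move, but `Counted` has no move constructor, so the copy constructor is used.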

To quote again from above:

> Taking const& T
> as arguments in /any/ function when you actually *do* need a copy chokes the
> compiler (and Zuto) and practically forbids them to make important
> optimizations.

At least for Intel C++, it turns out to be the other way around: passing by
value prevents the NRVO.

Regards,

Terje


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk