
Boost Users :

From: Andrea Denzler (andrea_at_[hidden])
Date: 2008-08-07 15:39:54


>> If you are lucky the compiler can produce code that does this in one
>> instruction set, if not you will get an overhead. Again if h and w are
>> low values you will not notice it.
>And that's exactly what happens when you do loop tiling, the inner h and w
>are rather small (tile often occupies less than half the L1 cache in size)

>> This example is of course much faster, and yes, it is not elegant nor
>> clear.
>> int *p=array,*pend=&array[h][w];
>> while (p < pend) *p++ = uni() ;
>Well, I copy/pasted this in my array.cpp in place of the loop nest.
>My 2D array take 9.938s to iterate 10000 over a 512*512 image of float,
>while your "while loop" took 9.891. I only lose ~0.5% by using NRC
>allocation + indexing. It's indeed faster but not by that much and, indeed,
>far less elegant.

When you use an index, what you pay is the cost of computing the address and
of maintaining the two indexes. Only that. I don't think it has much to do
with the L1 cache, especially if those few variables live in registers.

 

Now how much is that cost? A few CPU cycles per address, maybe 4, not much
more with a good compiler/CPU. If the uni() function above needs something
like 50 CPU cycles, then the gain from switching to pointers is very low, as
in your example. But if instead of uni() you have a simple add that costs
only 1 cycle, then you take a 5x performance loss. Even that may not matter
if the total number of items is relatively low, which is the common case.
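
To make the arithmetic above explicit, here is a tiny sketch of the cost
model I have in mind. It is only a back-of-the-envelope model, not a
measurement; the function name slowdown and the cycle counts are my own
illustrative assumptions.

```cpp
#include <cassert>

// Rough cost model (assumption, not a measurement): every element pays
// body_cycles for the real work, and the indexed version additionally pays
// addr_cycles for address computation and index maintenance.
// Returns how many times slower the indexed loop is than the pointer loop.
double slowdown(double addr_cycles, double body_cycles) {
    assert(body_cycles > 0.0);
    return (addr_cycles + body_cycles) / body_cycles;
}
```

With these assumed numbers, slowdown(4, 50) is 1.08 (a loop body as heavy
as uni() hides the indexing cost), while slowdown(4, 1) is 5 (a one-cycle
add does not).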

 

Try the same comparison against pointer access with the simple
"res += array[ti][tj];" statement as the loop body.
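
A minimal sketch of such a comparison could look like the following. The
Image struct is only my guess at an NRC-style layout (one contiguous buffer
plus one pointer per row); the names Image, fill, sum_indexed, sum_pointer
and time_seconds are hypothetical and not taken from your array.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical row-pointer image: contiguous storage plus a pointer per row.
struct Image {
    std::size_t h, w;
    std::vector<float> data;   // h * w contiguous values
    std::vector<float*> rows;  // rows[i] points at the start of row i

    Image(std::size_t h_, std::size_t w_)
        : h(h_), w(w_), data(h_ * w_), rows(h_) {
        assert(h_ > 0 && w_ > 0);
        for (std::size_t i = 0; i < h; ++i) rows[i] = data.data() + i * w;
    }
    // Not copyable: the row pointers refer to this object's own buffer.
    Image(const Image&) = delete;
    Image& operator=(const Image&) = delete;
};

// Fill with small, exactly representable values so both sums match exactly.
void fill(Image& img) {
    for (std::size_t k = 0; k < img.data.size(); ++k)
        img.data[k] = static_cast<float>(k % 7);
}

// Indexed version: the loop body is just "res += array[ti][tj]".
double sum_indexed(const Image& img) {
    double res = 0.0;
    for (std::size_t ti = 0; ti < img.h; ++ti)
        for (std::size_t tj = 0; tj < img.w; ++tj)
            res += img.rows[ti][tj];
    return res;
}

// Pointer version: one running pointer over the contiguous buffer.
double sum_pointer(const Image& img) {
    double res = 0.0;
    const float* p = img.data.data();
    const float* pend = p + img.h * img.w;
    while (p < pend) res += *p++;
    return res;
}

// Times `repeats` calls of f(img); results are accumulated into sink so the
// work is harder for the optimizer to discard.
template <class F>
double time_seconds(F f, const Image& img, int repeats, double& sink) {
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink += f(img);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

To use it, build an Image of 512 by 512, call fill(), then compare
time_seconds(sum_indexed, img, 10000, sink) with
time_seconds(sum_pointer, img, 10000, sink). An optimizing compiler may turn
both loops into nearly the same code, so treat the numbers as indicative
only.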

 

In a program where I use many images and arrays, the real gain came from
rewriting a few functions in handwritten assembler: 6 times faster than an
already good C++ algorithm. All the rest of the application is standard,
elegant and "slow" code, but it doesn't impact the overall performance.


