> Try to do the pointer access with the simple "res += array[ti][tj] ;"
> instruction.
The results from my earlier post were from exactly this. The res = uni() call is not the core of the computation, just the array initialisation :) The heavier the computation, the smaller the relative overhead will be, IMHO. For a computation like

res += array[i][j]*cos(array[i][j])
versus its while (begin++) equivalent, the overhead is never greater than 1%.
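
To make the comparison concrete, here is a rough sketch of the two loops I mean. The function names and the flat row-major std::vector standing in for array[i][j] are my assumptions for illustration, not the actual benchmark code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Indexed version: a flat row-major buffer stands in for array[i][j]
// (assumption: the real container may be laid out differently).
double sum_indexed(const std::vector<double>& a, std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    double res = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const double x = a[i * cols + j];
            res += x * std::cos(x);
        }
    }
    return res;
}

// Pointer version: walk the same buffer from begin to end.
double sum_pointer(const std::vector<double>& a)
{
    double res = 0.0;
    const double* begin = a.data();
    const double* const end = begin + a.size();
    while (begin != end) {
        const double x = *begin++;
        res += x * std::cos(x);
    }
    return res;
}
```

Both functions visit the elements in the same order, so they should return the same value; only the addressing differs, and the cos() call is what dominates the run time.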

> In a program where I use much images and arrays the real gain was
> switching few functions to handwritten assembler code. 6 times faster than
> an already good c++ algorithm. All the rest of the application is standard
> elegant and "slow" code. But it doesn't impact the overall performance.

I perfectly agree, but I think that in most cases going down to that low a level is not needed, even in image processing. I would rather take a few minutes to SIMDify the code, if possible, than rewrite it in inline assembly.
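
As an illustration of what I mean by SIMDifying, here is a minimal sketch of the plain "res += array[ti][tj]" sum using SSE2 intrinsics. The function name simd_sum is made up for the example, and I am assuming an x86 target where the compiler defines __SSE2__; elsewhere the scalar loop does all the work.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: sums n contiguous doubles, two per step when SSE2 is available.
double simd_sum(const double* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    double res = 0.0;
    std::size_t i = 0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();                  // two running partial sums
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));  // unaligned load, packed add
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    res = lanes[0] + lanes[1];                       // combine the two lanes
#endif
    for (; i < n; ++i)                               // scalar tail (or whole loop without SSE2)
        res += p[i];
    return res;
}
```

Note that the summation order differs from the scalar loop, so the result can differ in the last bits for general floating-point data.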

Anyway, I'm now writing this small multi_array with indexing to see how it fares.