From: Ken Hagan (k.hagan_at_[hidden])
Date: 2000-09-27 09:13:14
Gary Powell wrote:
> Remember the optimization motto:
> Get it working,
> Get it right,
> Make it small,
> Make it fast.
> And Test Test Test! (Profile before and after changing.) I keep
> having to relearn this. I put in an unrolled Duff's device and it
> made the loop 8x slower because the code no longer fit in the
> cache.
I keep learning this too. For the current generation of hardware, it
seems to me that memory access patterns and cache sizes are the only
game in town. Actual CPU instructions are unimportant, since they
nearly all run in a single cycle.
I recently coded a real-time (video field rate) image processing
loop. There were a handful of floating point operations to perform
per pixel, but the actual instructions didn't really matter. What
sped the loop up (by a factor of 2 or 3) was inserting redundant
memory reads. These pre-filled the cache, and did so with a localised
pattern of access which allowed the motherboard RAM to work in burst
mode. The naively coded loop had a more random access pattern and I
was paying the 5-clock maximum latency for each access.
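
To illustrate the idea of a redundant "touch ahead" read, here is a minimal C++ sketch. It is not the routine described above (that was written in assembly), and the names scale_with_touch_ahead and kLineBytes, the 64-byte line size, and the per-pixel work are all assumptions made up for the example. The read is made through a volatile pointer so that the compiler is obliged to keep it. Whether it helps at all depends on the hardware, so it needs profiling before and after.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed cache-line size in bytes; the real value depends on the hardware.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kStride = kLineBytes / sizeof(float);

// Hypothetical per-pixel routine: scales every pixel by `gain` in place
// and returns the sum of the scaled values. Before working on each
// line-sized block it reads one element of the following block, purely
// to pull that block towards the cache.
float scale_with_touch_ahead(float* pixels, std::size_t n, float gain) {
    assert(pixels != nullptr || n == 0);
    float sum = 0.0f;
    for (std::size_t base = 0; base < n; base += kStride) {
        if (base + kStride < n) {
            // Redundant read: the value is discarded. The volatile
            // qualifier stops the optimiser from deleting the access.
            const volatile float* ahead = pixels + base + kStride;
            float touched = *ahead;
            (void)touched;
        }
        const std::size_t end = std::min(base + kStride, n);
        for (std::size_t i = base; i < end; ++i) {
            pixels[i] *= gain;
            sum += pixels[i];
        }
    }
    return sum;
}
```

The result is the same with or without the extra read; only the memory access pattern changes.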
I would expect any decent compiler to remove these reads when
optimising, thereby slowing the code down hugely! C++ just
doesn't give me the low level control that I need for this game. I'm
using assembly language for these routines, so I suppose my
optimisation motto is
"If you can't be bothered to write it in assembly language,
it obviously isn't important to you, so don't optimise!"