From: Michael Tegtmeyer (tegtmeye_at_[hidden])
Date: 2006-01-29 12:52:06
> In order to get an idea whether it's gcc's valarray implementation
> at fault,
> you could evaluate macstl (macs and pcs) at www.pixelglow.com
> They cite significant speed improvements just by replacing gcc's
> with their alternative (albeit they are concerned with vectorised
> almost exclusively).
I have looked at macstl before and I appreciate what they are doing.
Much of it seems to be implementation advantages, however (ie
> Its got a slightly restrictive license, but has valarrays and
> similar fixed
> size alternatives.
A more open license would be nice. I have also looked at macstl's
statarray class (I think this is the class you're referring to).
However, it violates many of the restrictions that the C++ standard
places on std::valarray to reduce aliasing.
> FWIW wrapping simple C arrays and using 'restrict' seems to work as
> well in
> many optimising compilers I've looked at on the PC.
> Something like macstl wins when you have longer operations that can
> make use
> of expression templates.
Simple C arrays with the restrict keyword work well, but all numeric
operations have to be hand-coded. std::valarray has convenient syntax,
but the user must pay for a memory allocation at each object creation,
which will dominate the runtime for small arrays.
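
To make the trade-off concrete, here is a minimal sketch of the two styles. The function names axpy3 and axpy are hypothetical, chosen only for illustration. Note also that restrict is a C99 keyword; in C++ the sketch assumes the __restrict extension, which GCC, Clang and MSVC accept.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hand-coded style: no allocation, but each operation needs its own loop.
// __restrict is a compiler extension in C++ (an assumption of this sketch).
inline void axpy3(double a, const double* __restrict x,
                  const double* __restrict y, double* __restrict out)
{
    assert(x != out && y != out);  // the caller promises no aliasing
    for (std::size_t i = 0; i < 3; ++i)
        out[i] = a * x[i] + y[i];
}

// std::valarray style: the whole operation is one expression, but the
// result object has to obtain storage when it is constructed.
inline std::valarray<double> axpy(double a, const std::valarray<double>& x,
                                  const std::valarray<double>& y)
{
    std::valarray<double> result = a * x + y;
    return result;
}
```

For three elements the arithmetic is trivial in both versions, so the cost of obtaining storage for the std::valarray result is what stands out.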
My own motivation is operations in space-time, where I am repeatedly
doing complex numeric operations in 3- and 4-tuple space. Another
example off the top of my head could be those doing graphics who
always use 4x4 matrices.
What I am proposing is an interface that mirrors std::valarray but
has been adjusted for arrays whose sizes are constant and known at
compile time. By keeping the interface as close to std::valarray as
possible, I believe it is easy for users to adopt where appropriate,
and it maintains the restrictions placed on std::valarray to reduce
aliasing for implementors, as well as allowing for the elimination of
temporaries (expression templates).
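
As a rough illustration of what such an interface could look like, here is a minimal sketch. The class name fixed_valarray and the handful of members shown are assumptions made for this example, not the proposed library's actual API, and the sketch deliberately leaves out expression templates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class: a small subset of the std::valarray interface with
// the size moved into a template parameter, so storage lives in the object.
template <typename T, std::size_t N>
class fixed_valarray {
    static_assert(N > 0, "size must be positive");
public:
    fixed_valarray() : data_() {}               // value-initialize elements
    explicit fixed_valarray(const T& v) {       // fill with one value
        for (std::size_t i = 0; i < N; ++i) data_[i] = v;
    }

    std::size_t size() const { return N; }      // known at compile time

    T& operator[](std::size_t i) { assert(i < N); return data_[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return data_[i]; }

    fixed_valarray& operator+=(const fixed_valarray& rhs) {
        for (std::size_t i = 0; i < N; ++i) data_[i] += rhs.data_[i];
        return *this;
    }

    T sum() const {                             // mirrors valarray::sum()
        T total = data_[0];
        for (std::size_t i = 1; i < N; ++i) total += data_[i];
        return total;
    }

private:
    T data_[N];                                 // no dynamic allocation
};

// Element-wise addition; returns a temporary (no expression templates here).
template <typename T, std::size_t N>
fixed_valarray<T, N> operator+(fixed_valarray<T, N> lhs,
                               const fixed_valarray<T, N>& rhs)
{
    lhs += rhs;
    return lhs;
}
```

Because the member names follow std::valarray, code written against size(), operator[] and sum() would read the same for either type.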
The interface is what I am placing the most emphasis on.
The claimed performance increase over std::valarray is solely due to
the elimination of the memory allocation that std::valarray has to
perform during object construction. (This is not an implementation
trick, just a direct result of the interface.) The potential
performance increase comes from various places, most notably from the
fact that certain compilers will unroll loops when the number of
iterations is known at compile time (which is the case here). My
particular implementation uses full expression templates to eliminate
temporaries in the same manner as gcc's std::valarray.
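
For readers unfamiliar with the technique, the following is a minimal sketch of expression templates over a fixed-size array. The names expr, add_expr and vec are hypothetical, and this is not the implementation described above or gcc's; it only shows how a + b + c can be evaluated in a single loop whose trip count N is a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base so that operator+ only matches expression types.
template <typename E>
struct expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing lhs + rhs; nothing is computed until it is indexed.
template <typename L, typename R>
struct add_expr : expr<add_expr<L, R> > {
    const L& lhs;
    const R& rhs;
    add_expr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <std::size_t N>
struct vec : expr<vec<N> > {
    double data[N];
    vec() : data() {}
    double operator[](std::size_t i) const { assert(i < N); return data[i]; }
    double& operator[](std::size_t i) { assert(i < N); return data[i]; }

    // Assigning an expression evaluates it element by element in one loop;
    // N is a constant, so the compiler is free to unroll it.
    template <typename E>
    vec& operator=(const expr<E>& e) {
        const E& src = e.self();
        for (std::size_t i = 0; i < N; ++i) data[i] = src[i];
        return *this;
    }
};

// Builds an expression node instead of a temporary vec.
template <typename L, typename R>
add_expr<L, R> operator+(const expr<L>& l, const expr<R>& r) {
    return add_expr<L, R>(l.self(), r.self());
}
```

With this in place, r = a + b + c builds two lightweight add_expr nodes and fills r in a single pass, without creating an intermediate vec for a + b. The nodes hold references, so they should only be used within the full expression that creates them.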
> I'd LOVE to see something like MacSTL developed for Boost. I
> haven't really
> been following MTL recently, but some smart people are working on
> it behind
> the scenes. I'm hoping that can be used as a framework for an
I would love to see a vectorized version in Boost as well, and looking
into it is a definite todo. However, it is my understanding that there
is some overhead associated with getting into the vector unit, and given
the small sizes favored by what I am proposing, the overhead may
not be worth it. It is worth looking into, however.
Lastly, I am new to Boost's procedures for proposals and the like, so I
apologize in advance if I end up needing some hand-holding during