From: David Abrahams (abrahams_at_[hidden])
Date: 2001-03-13 18:19:43
----- Original Message -----
From: "Dean Calver" <deano_at_[hidden]>
> I'm very interested in vectors and matrices but I think that it would be
> nice to specify the interface in ways that will allow vendors to implement
> platform specific versions. Most processors today support vectors of
> up to 4 floats or 8 integers as a basic type. Most of the numerics
> I have seen seem to ignore the hardware support that exists in processors
It should be possible to structure any library so that one can write
specializations that take advantage of hardware acceleration.
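
A minimal sketch of that structure, under stated assumptions: mathvector<> is
the name used in the quoted message, while vector_ops<> and its "accelerated"
flag are hypothetical names invented here for illustration. The generic
template does the work in a plain loop, and a full specialization for 4 floats
marks the place where a vendor could substitute platform intrinsics. The
specialization body below is still portable C++, not real SIMD code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small fixed-size vector; only the name comes from the thread.
template <class T, std::size_t N>
struct mathvector {
    T v[N];
};

// Primary template: portable element-wise implementation for any T and N.
template <class T, std::size_t N>
struct vector_ops {
    static const bool accelerated = false;
    static mathvector<T, N> add(const mathvector<T, N>& a,
                                const mathvector<T, N>& b) {
        mathvector<T, N> r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};

// Full specialization for 4 floats: the hook a vendor would replace.
// The body is an unrolled placeholder; a platform version would use
// that platform's own vector instructions here instead.
template <>
struct vector_ops<float, 4> {
    static const bool accelerated = true;
    static mathvector<float, 4> add(const mathvector<float, 4>& a,
                                    const mathvector<float, 4>& b) {
        mathvector<float, 4> r;
        r.v[0] = a.v[0] + b.v[0];
        r.v[1] = a.v[1] + b.v[1];
        r.v[2] = a.v[2] + b.v[2];
        r.v[3] = a.v[3] + b.v[3];
        return r;
    }
};

// User-facing operator: forwards to whichever vector_ops is selected.
template <class T, std::size_t N>
mathvector<T, N> operator+(const mathvector<T, N>& a,
                           const mathvector<T, N>& b) {
    return vector_ops<T, N>::add(a, b);
}
```

User code writes a + b in both cases; which implementation runs is decided at
compile time by the element type and size.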
> I should note that the comments I'm making are for 'small' vector/matrix
> operations (max 8x8 matrices) and are likely complete rubbish for proper
> numerics :-)
Not necessarily. Many of the problems I will have to solve involve matrices
of "blocks" (small matrices).
> Some issues I (already) foresee if we were to try are
> 1) returning references.
> If a processor has a built in type, say a 128bit 4 float vector then we
> should NOT return references.
> Good: mathvector<float,4> something();
> Bad: mathvector<float,4>& something();
> but if not hardware we should? use a reference to stop having to copy a
> structure around.
You can't generally use the "bad" form anyway unless the mathvector<> is
already stored somewhere. If something(a, b) returns a + b you are out of
luck. Things like Blitz++ use reference-counting to avoid copies. Probably
we'd not want to reference-count anything as small as 4 floats regardless of
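
To make the by-value point concrete, here is a small sketch. The function
names sum and add_in_place are hypothetical, and mathvector<> is reduced to a
bare array wrapper. A function that computes a new value has nowhere to store
it except a fresh object, so it returns by value. A reference return is only
safe when it refers to storage that outlives the call, as in an in-place
update of an argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small fixed-size vector; only the name comes from the thread.
template <class T, std::size_t N>
struct mathvector {
    T v[N];
};

// Computes a new value: must return by value, because the result is not
// stored anywhere before the call. Returning a reference to r would dangle.
template <class T, std::size_t N>
mathvector<T, N> sum(const mathvector<T, N>& a, const mathvector<T, N>& b) {
    mathvector<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Updates existing storage: returning a reference is fine, because the
// object referred to (a) is owned by the caller and outlives the call.
template <class T, std::size_t N>
mathvector<T, N>& add_in_place(mathvector<T, N>& a,
                               const mathvector<T, N>& b) {
    for (std::size_t i = 0; i < N; ++i) a.v[i] += b.v[i];
    return a;
}
```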
> 2) partials.
> Most vector units have a fixed size (e.g. 4 floats) but like to have
> partials type treated specially.
> float b = 10.f;
> a = a * b; // likely to be slow as it has to move to/from memory across
> registers etc
> partial1_mathvector<float,4> c = 10.f; // there has to be many special
> a = a * c; // likely to be faster register to register operation
I'm thoroughly lost in the above code.
> 3) multiple units.
> Many processors have multiple units that can be used simultaneously. Using
> an STL-memory-allocator-like system would allow this when speed is really
> important (not portable probably), something like this.
> mathvector<float,4, VECTOR_UNIT0> va,vb;
> mathvector<int,8, MMI> ia,ib;
> mathvector<float,4, FPU> fa,fb;
> va = va * vb;
> ia = ia * ib;
> fa = fa * fb;
I hope I never have reason to write code like that ;-)
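
For anyone trying to picture the quoted proposal, here is one way a unit tag
could be expressed as a defaulted third template parameter, so that ordinary
code never has to mention it. This is only a sketch under assumptions: the tag
types generic_unit and vector_unit0 and the multiply helper are hypothetical
names, and both code paths below are plain loops standing in for real
unit-specific code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical execution-unit tags; empty types used only for dispatch.
struct generic_unit {};
struct vector_unit0 {};

// The unit is part of the type but defaults to the portable choice.
template <class T, std::size_t N, class Unit = generic_unit>
struct mathvector {
    T v[N];
};

// Fallback used for any unit without a more specific overload.
template <class T, std::size_t N, class Unit>
mathvector<T, N, Unit> multiply(const mathvector<T, N, Unit>& a,
                                const mathvector<T, N, Unit>& b, Unit) {
    mathvector<T, N, Unit> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// More specialized overload chosen for vector_unit0. A real version would
// contain code for that unit; this placeholder repeats the portable loop.
template <class T, std::size_t N>
mathvector<T, N, vector_unit0> multiply(
    const mathvector<T, N, vector_unit0>& a,
    const mathvector<T, N, vector_unit0>& b, vector_unit0) {
    mathvector<T, N, vector_unit0> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Element-wise product; the tag object selects the overload at compile time.
template <class T, std::size_t N, class Unit>
mathvector<T, N, Unit> operator*(const mathvector<T, N, Unit>& a,
                                 const mathvector<T, N, Unit>& b) {
    return multiply(a, b, Unit());
}
```

Because the tag is part of the type, multiplying vectors that belong to
different units does not compile unless a conversion is written explicitly.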