
From: Dean Calver (deano_at_[hidden])
Date: 2001-03-14 14:37:49


> You can't generally use the "bad" form anyway unless the mathvector<> is
> already stored somewhere. If something(a, b) returns a + b you are out of
> luck. Things like Blitz++ use reference-counting to avoid copies. Probably
> we'd not want to reference-count anything as small as 4 floats
> regardless of hardware support.

You're right, of course. For small vectors any memory overhead would be bad; I
wonder if 'small' numerics should be separate from 'large' ones.
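
A minimal sketch of that split, assuming two hypothetical types (small_vec and large_vec are illustrative names, not a proposed Boost API). The small type is a plain value with no heap allocation and no per-object overhead. The large type shares its buffer Blitz++-style through std::shared_ptr, with a simple copy-on-write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical "small" vector: a plain value type. Copying it just copies
// N elements; there is no heap allocation and no reference count.
template <typename T, std::size_t N>
struct small_vec {
    std::array<T, N> v{};

    friend small_vec operator+(const small_vec& a, const small_vec& b) {
        small_vec r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};

// Hypothetical "large" vector: the element buffer is shared and reference
// counted, so copies are cheap, but every object carries a pointer and a
// shared control block.
template <typename T>
class large_vec {
public:
    explicit large_vec(std::size_t n)
        : data_(std::make_shared<std::vector<T>>(n)) {}

    std::size_t size() const { return data_->size(); }
    const T& operator[](std::size_t i) const { return (*data_)[i]; }

    // Copy-on-write: non-const access detaches whenever the buffer is
    // shared (a simplification: it detaches even if the caller only reads).
    T& operator[](std::size_t i) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<T>>(*data_);
        return (*data_)[i];
    }

    long use_count() const { return data_.use_count(); }

private:
    std::shared_ptr<std::vector<T>> data_;
};
```

The point of keeping them as separate types is that the small one can stay exactly as large as its elements, which is what a 4-float vector unit wants.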

> > 2) partials.
> > Most vector units have a fixed size (e.g. 4 floats) but like to have
> > partial types treated specially.
> > mathvector<float,4> a;
> > float b = 10.f;
> > a = a * b; // likely to be slow, as it has to move to/from memory across registers etc.
> > partial1_mathvector<float,4> c = 10.f; // there would have to be many special partials
> > a = a * c; // likely to be a faster register-to-register operation
>
> I'm thoroughly lost in the above code.

I think I've got a better example this time :-)
When is a float not a float? When it's a 1D vector.

mathvector<float,1> != float

The reason is that the FPU and the VU (Vector Unit) often have different
register sets, and copying from a VU register to an FPU register often goes
across the memory bus. If you set things up to use floats only when really
necessary, you can get a major speed increase by performing 1D vector
operations on the VU.
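
A portable model of the "a 1D vector is not a float" idea, assuming hypothetical vec4 and vec1 types. Real hardware would keep a vec1 in a VU (or SSE) register; here it is simply stored replicated across four lanes, and conversion to float is made explicit so that the expensive VU-to-FPU move is visible in the source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Portable stand-in for a 4-lane vector-unit register.
struct vec4 {
    std::array<float, 4> lane{};
};

// Hypothetical 1D vector ("mathvector<float,1>"): deliberately NOT a float.
// Its value is replicated in all four lanes so it can feed lane-wise vector
// operations directly.
struct vec1 {
    vec4 splat;

    explicit vec1(float x) : splat{{x, x, x, x}} {}

    // Leaving the vector type is explicit: this models the potentially slow
    // VU-to-FPU transfer.
    explicit operator float() const { return splat.lane[0]; }
};

// Lane-wise multiply: models a register-to-register VU operation.
inline vec4 operator*(const vec4& a, const vec4& b) {
    vec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Scaling by a 1D vector stays entirely inside the vector types.
inline vec4 operator*(const vec4& a, const vec1& s) { return a * s.splat; }
```

Because vec1 has no implicit conversion to or from float, accidentally writing `a * 10.f` does not compile; the programmer has to choose between the vector path and the scalar path.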

What makes it (somewhat) difficult is that, for really good support, we have
to think about sparse partials, i.e. there are actually four 1D partial vectors
per 4D vector, and then all the 2D partials, etc.
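
A sketch of what two kinds of partial might look like, using the same hypothetical vec4 model: a 1D partial taken from one lane and broadcast to all four, and a 2D partial expressed as a compile-time lane mask on a multiply. The names broadcast and mul_masked are illustrative only, not an existing API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct vec4 {
    std::array<float, 4> lane{};
};

// 1D partial: lane I of a 4D vector, replicated across all lanes, so it can
// be used in further vector operations without becoming a float.
template <std::size_t I>
vec4 broadcast(const vec4& v) {
    static_assert(I < 4, "a 4D vector has exactly four 1D partials");
    const float x = v.lane[I];
    return vec4{{x, x, x, x}};
}

// Masked multiply: only lanes whose bit is set in Mask are written; the other
// lanes keep dst's previous values. Mask 0b0011 treats lanes 0 and 1 as a
// 2D partial.
template <unsigned Mask>
void mul_masked(vec4& dst, const vec4& a, const vec4& b) {
    static_assert(Mask < 16, "only four lanes");
    for (std::size_t i = 0; i < 4; ++i)
        if (Mask & (1u << i)) dst.lane[i] = a.lane[i] * b.lane[i];
}
```

Counting every lane choice and every mask is what makes the number of partial types grow so quickly.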

Example (simple directional lighting, if you're interested):

mathvector<float,4> a,b,c,result;
float dot_product = dot(a, b);
result = c * dot_product;

But this causes dot_product to be moved out of the VU and then back in.
Thinking about it, this is just a compiler optimisation (it's missing in EEGCC
2.95, so I've been doing it by hand). I just wonder how long it will be until
compilers do it. (I remember hearing that GCC 3 has partial vector support.)

Problem solved: it's the compiler's problem :-)
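
The by-hand version of that optimisation can be sketched as follows, assuming a hypothetical dot_splat that returns the dot product replicated in every lane rather than as a float, so `result = c * dot` never needs a scalar. This is a portable model only: the horizontal sum below uses plain floats, whereas a real VU or SSE version would do the horizontal add and broadcast in vector registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct vec4 {
    std::array<float, 4> lane{};
};

inline vec4 operator*(const vec4& a, const vec4& b) {
    vec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Hypothetical dot product whose result is replicated in every lane, so
// callers never receive a scalar float.
inline vec4 dot_splat(const vec4& a, const vec4& b) {
    const vec4 p = a * b;
    const float s = p.lane[0] + p.lane[1] + p.lane[2] + p.lane[3];
    return vec4{{s, s, s, s}};
}

// Directional lighting from the post: result = c * (a . b), written so the
// dot product stays a vector between the two operations.
inline vec4 light(const vec4& a, const vec4& b, const vec4& c) {
    return c * dot_splat(a, b);
}
```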

> > 3) multiple units.
> > Many processors have multiple units that can be used simultaneously; using
> > a system like the STL memory allocators would allow this when speed is
> > really important (probably not portable), something like this:
> > mathvector<float,4, VECTOR_UNIT0> va,vb;
> > mathvector<int,8, MMI> ia,ib;
> > mathvector<float,4, FPU> fa,fb;
> > va = va * vb;
> > ia = ia * ib;
> > fa = fa * fb;
>
> I hope I never have reason to write code like that ;-)
> -D

Welcome to the wonderful world of PS2 :-) Actually, this is the nice bit:
VECTOR_UNIT1 doesn't have a C compiler (LWI ASM only). The EEGCC compiler
(the PS2 GCC) works brilliantly on the above code, producing code that is
almost optimal.
It's not only strange consoles, though; the P3/P4 has a similar system. For
serious speed you can get the MMX unit to do 4 int operations, SSE to do 4
float operations and the x86 core to do normal integer work, all at the same
time.
There is another good reason for selectors: MMX can't coexist with FPU code
(the MMX registers are aliased onto the x87 FPU registers). As such, integer
vectors will have to default to the non-MMX versions (on PC); without
selectors (or a really good compiler) we could never have an MMX version.
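
A rough sketch of the selector idea, modelled on the STL allocator parameter: the unit is a defaulted template argument, ints default to a generic (non-MMX) unit, and MMX has to be requested explicitly. All names here (generic_unit, mmx_unit, default_unit, unit_ops) are hypothetical, and the MMX backend is only a placeholder that forwards to the generic loop so the example stays portable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical unit tags.
struct generic_unit {};  // plain C++ loop: safe everywhere
struct mmx_unit {};      // would use MMX; must not be mixed with x87 FPU code

// Default selector: integers use the generic path, because MMX shares the
// x87 register file and would clobber surrounding floating-point state
// unless EMMS is issued.
template <typename T>
struct default_unit { using type = generic_unit; };

template <typename T, std::size_t N,
          typename Unit = typename default_unit<T>::type>
struct mathvector {
    std::array<T, N> v{};
};

// Each unit supplies its own operations; the choice is made at compile time
// through the Unit parameter.
template <typename Unit> struct unit_ops;

template <>
struct unit_ops<generic_unit> {
    template <typename T, std::size_t N>
    static std::array<T, N> mul(const std::array<T, N>& a,
                                const std::array<T, N>& b) {
        std::array<T, N> r{};
        for (std::size_t i = 0; i < N; ++i) r[i] = a[i] * b[i];
        return r;
    }
};

// Placeholder: a real MMX backend would use intrinsics or asm here.
template <>
struct unit_ops<mmx_unit> : unit_ops<generic_unit> {};

template <typename T, std::size_t N, typename Unit>
mathvector<T, N, Unit> operator*(const mathvector<T, N, Unit>& a,
                                 const mathvector<T, N, Unit>& b) {
    return {unit_ops<Unit>::mul(a.v, b.v)};
}
```

Because the unit is part of the type, multiplying a generic vector by an MMX one fails to compile, which keeps MMX code confined to the places where it was explicitly asked for.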

Bye,
    Deano

Dean Calver
Games Console 3D Programmer


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk