
Boost : 
From: Andy Little (andy_at_[hidden])
Date: 2006-10-19 13:59:30
"Joel de Guzman" <joel_at_[hidden]> wrote in message
news:eh85r6$ba$1_at_sea.gmane.org...
> Ullrich Koethe wrote:
>> Joel de Guzman wrote:
>>> > VIGRA doesn't have an explicit RGBA type (TinyVector<T, 4> can be used
>>> > instead), because so far noone came up with a convincing proposal for
>>> > these operations. But without them, RGBA is pretty useless.
>>> >
>>>
>>> Hmmm... TinyVector<T, 4>... I think VIGRA should use Fusion for
>>> that instead ;)
>>>
>>
>> I had a look at Fusion, but I'm not sure whether it would be helpful in
>> this context. TinyVector is based on three design goals: it should support
>> the std::vector interface (except for resize etc.),
>
> Like boost::array?
>
>> it should be fast (you
>> have millions of these beasts in a single image),
>
> Definitely.
>
>> and it should behave
>> like a builtin arithmetic type (except for division which is problematic
>> because the zero vector is not the only one that may cause a
>> division-by-zero error).
>
> No problem. But have you seen Andy's work on matrices using fusion?
As far as the work on "tuple" matrices is concerned, it was originally conceived
to enable use of my Quan types in transform matrices:
http://quan.sourceforge.net/quan_matters/doc/html/index.html
However, IMO the more important use is to replace run-time doubles with
compile-time "static" doubles, usually for values of 1 or 0.
The effect of this is to reduce a typical 4 x 4 matrix multiply from 64
multiplies and 48 adds down to, for example, 9 multiplies and 9 adds in the
case of a translation x rotation x translation transform. That is quite a
profound reduction. Similar reductions are of course possible when applying the
transform to vertices.
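
To make the "static double" idea concrete, here is a minimal sketch. It is not the Quan code: the `sketch` namespace, the `zero` and `one` tag types, `value` and `dot3` are hypothetical names chosen for illustration, and the trailing return type assumes a C++11 compiler. Each operator overload returns a static type whenever the result is known at compile time, so a dot product only pays for the terms that are genuinely run-time doubles. For comparison, a general 4 x 4 product has 16 entries, each needing 4 multiplies and 3 adds, which is where 64 and 48 come from.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical tag types standing in for compile-time 0 and 1.
struct zero {};
struct one {};

// Products: anything involving a static value needs no run-time multiply.
inline zero   operator*(zero, zero)     { return zero(); }
inline zero   operator*(zero, one)      { return zero(); }
inline zero   operator*(one, zero)      { return zero(); }
inline one    operator*(one, one)       { return one(); }
inline zero   operator*(zero, double)   { return zero(); }
inline zero   operator*(double, zero)   { return zero(); }
inline double operator*(one, double x)  { return x; }
inline double operator*(double x, one)  { return x; }

// Sums: adding a static zero is free; the result stays static where it can.
inline zero   operator+(zero, zero)     { return zero(); }
inline one    operator+(zero, one)      { return one(); }
inline one    operator+(one, zero)      { return one(); }
inline double operator+(one, one)       { return 2.0; }
inline double operator+(zero, double x) { return x; }
inline double operator+(double x, zero) { return x; }
inline double operator+(one, double x)  { return 1.0 + x; }
inline double operator+(double x, one)  { return x + 1.0; }

// Convert any of the three element kinds back to a plain double.
inline double value(zero)     { return 0.0; }
inline double value(one)      { return 1.0; }
inline double value(double x) { return x; }

// Dot product of a 3-element row and column whose element types may differ.
// The return type is whatever the overloads above produce.
template <class A0, class A1, class A2, class B0, class B1, class B2>
auto dot3(A0 a0, A1 a1, A2 a2, B0 b0, B1 b1, B2 b2)
    -> decltype(a0 * b0 + a1 * b1 + a2 * b2)
{
    return a0 * b0 + a1 * b1 + a2 * b2;
}

inline void self_check()
{
    // Last row (0, 0, 1) times a column (x, y, 1): the result type is `one`,
    // so no arithmetic is left for run time.
    static_assert(std::is_same<
        decltype(dot3(zero(), zero(), one(), 3.0, 4.0, one())), one>::value,
        "row of static values should yield a static result");

    // Row 0 times column 0 of the matrix in the posted source: 1*1 + 2*4 + 0*0.
    assert(dot3(1.0, 2.0, zero(), 1.0, 4.0, zero()) == 9.0);
}

} // namespace sketch
```

Calling `sketch::self_check()` from a `main` exercises both cases. In the second call only two multiplies and one add involve run-time doubles; the `zero * zero` term and the final `+ zero` resolve to overloads that do no arithmetic.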
However there is a problem in VC7.1: the compiler simply runs out of resources
in relatively simple transforms using Fusion, and there is no way round that
with Fusion AFAICS. OTOH there is no such problem in VC8 or gcc 4.1.1, the
other two compilers I tested. Rather than lose VC7.1, I opted to try a
hand-rolled version. IOW I stripped Fusion out completely, removed the
iterators, and provided custom vectors of 3, 9, 4 and 16 elements and custom
rows and columns. This is not quite as neat as Fusion, where one algorithm can
theoretically be applied to any combination of matrices. However, looking at
the assembler output from the hand-made version, I saw that with the
programming simplified and the extra layers of references removed, the compiler
now produces what looks to me like perfect code. (The example code here is
simply a 3x3 rotation matrix multiplied by itself.)
N.B. As an improvement on perfect, it should also be noted that because this is
a simple test with local constants, the compiler has in fact not instantiated
this assembler code at all in the main function, but simply outputs constants.
(This can be seen in the main assembler at the end.) This is an improvement on
the Fusion version, where I guess the references do provide a barrier to some
optimisations, and functions were called in main. Be wary of short tests
however ;)
Note also the custom at_c functors, which I found useful. These enable the
actual type of the result (reference, const reference or value) to be chosen on
an element-by-element basis. In fact quanta::as_ref etc. are functors, so
arbitrary functors could be substituted, e.g. to multiply by a constant.
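
As an illustration of the policy-functor idea, the following sketch pairs an `at_c`-style accessor with small access functors. It is an assumption-laden stand-in, not the quanta implementation: the names `as_ref`, `as_const_ref` and `at_c` are borrowed from the post, but everything in the hypothetical `sketch` namespace (including `as_value`, `times_two` and the use of `std::tuple` as the sequence) is invented here, and it relies on C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace sketch {

// Access policies: each functor decides what an element is returned as.
struct as_ref {
    template <class T> T& operator()(T& t) const { return t; }
};
struct as_const_ref {
    template <class T> T const& operator()(T const& t) const { return t; }
};
struct as_value {
    template <class T> T operator()(T const& t) const { return t; }
};

// An arbitrary functor can be used in place of an access policy,
// here one that multiplies the element by a constant.
struct times_two {
    template <class T> T operator()(T const& t) const { return t * 2; }
};

// Fetch element N of a tuple-like sequence and pass it through policy F.
// The return type is whatever F returns for that element.
template <std::size_t N, class F>
struct at_c {
    template <class Seq>
    auto operator()(Seq& seq) const -> decltype(F()(std::get<N>(seq)))
    {
        return F()(std::get<N>(seq));
    }
};

inline void self_check()
{
    std::tuple<double, double, double> v(1.0, 2.0, 3.0);

    // as_ref yields an lvalue, so the element can be assigned through it.
    at_c<1, as_ref>()(v) = 5.0;
    assert(std::get<1>(v) == 5.0);

    // A transforming functor returns a computed value instead.
    double doubled = at_c<2, times_two>()(v);
    assert(doubled == 6.0);

    // The policy alone determines the result type.
    static_assert(std::is_same<
        decltype(at_c<0, as_value>()(v)), double>::value, "value expected");
    static_assert(std::is_same<
        decltype(at_c<0, as_const_ref>()(v)), double const&>::value,
        "const reference expected");
}

} // namespace sketch
```

Because the policy is a template argument, the choice between reference, const reference and value is made per call site at compile time and costs nothing at run time.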
IOW in light of this I am not sure now that using Fusion is optimal for what I
want, but it did provide a good starting point, and one could see this as
optimising...
Source, with some extraneous stuff, is at the end. The assembler represents the
mux(matrix,matrix) part before it's optimised out in this example. Finally the
main assembler, showing output of a constant.
regards
Andy Little
00001 dd 02 fld QWORD PTR [edx]
00003 dc 09 fmul QWORD PTR [ecx]
00005 dd 41 18 fld QWORD PTR [ecx+24]
00008 dc 4a 08 fmul QWORD PTR [edx+8]
0000b de c1 faddp ST(1), ST(0)
0000d dd 18 fstp QWORD PTR [eax]
0000f dd 42 08 fld QWORD PTR [edx+8]
00012 dc 49 20 fmul QWORD PTR [ecx+32]
00015 dd 02 fld QWORD PTR [edx]
00017 dc 49 08 fmul QWORD PTR [ecx+8]
0001a de c1 faddp ST(1), ST(0)
0001c dd 58 08 fstp QWORD PTR [eax+8]
0001f dd 42 20 fld QWORD PTR [edx+32]
00022 dc 49 18 fmul QWORD PTR [ecx+24]
00025 dd 42 18 fld QWORD PTR [edx+24]
00028 dc 09 fmul QWORD PTR [ecx]
0002a de c1 faddp ST(1), ST(0)
0002c dd 58 18 fstp QWORD PTR [eax+24]
0002f dd 41 08 fld QWORD PTR [ecx+8]
00032 dc 4a 18 fmul QWORD PTR [edx+24]
00035 dd 42 20 fld QWORD PTR [edx+32]
00038 dc 49 20 fmul QWORD PTR [ecx+32]
0003b de c1 faddp ST(1), ST(0)
0003d dd 58 20 fstp QWORD PTR [eax+32]
int main()
{
    matrix_type matrix(
        1.,     2.,     zero(),
        4.,     5.,     zero(),
        zero(), zero(), one()
    );
    typedef quanta::matrix_row<2,matrix_type,quanta::as_const_ref> row0_type;
    row0_type row0(matrix);
    std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(row0) <<'\n';
    typedef quanta::matrix_col<2,matrix_type,quanta::as_const_ref> col2_type;
    col2_type col2(matrix);
    std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(col2) <<'\n';
    quanta::dot_product<0,0,matrix_type::cols> dot;
    std::cout << dot(matrix,matrix) <<'\n';
    typedef quanta::matrix_mux<3,3,3,3> mux_type;
    mux_type mux;
    mux_type::result<matrix_type,matrix_type>::type result = mux(matrix,matrix);
    std::cout << result.at<0,0>() <<'\n';
}
main function assembler for std::cout << result.at<0,0>() <<'\n';
; Line 84
000c8 dd 05 00 00 00 00 fld QWORD PTR __real@4022000000000000
000ce 51 push ecx
000cf dd 1c 24 fstp QWORD PTR [esp]
000d2 e8 00 00 00 00 call ??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@N@Z ; std::basic_ostream<char,std::char_traits<char> >::operator<<
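
The name of the constant loaded by fld appears to carry the bit pattern of the double, 4022000000000000 in hex. The sketch below decodes that pattern as a cross-check; it assumes an IEEE 754 64-bit double, and `double_from_bits` and the `sketch` namespace are hypothetical names. The pattern decodes to 9.0, which agrees with element (0,0) of the posted matrix multiplied by itself: 1*1 + 2*4 + 0*0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {

// Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64).
inline double double_from_bits(std::uint64_t bits)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof d);  // well-defined way to copy the bits
    return d;
}

inline void self_check()
{
    // Sign 0, exponent 0x402 (1026 - 1023 = 3), fraction 0.125:
    // 1.125 * 2^3 = 9.0
    assert(double_from_bits(0x4022000000000000ULL) == 9.0);
}

} // namespace sketch
```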