# Boost :

From: Andy Little (andy_at_[hidden])
Date: 2006-10-20 07:30:57

"Larry Evans" <cppljevans_at_[hidden]> wrote in message
news:eh99e2\$4ch\$1_at_sea.gmane.org...
> On 10/19/2006 12:59 PM, Andy Little wrote:
>> "Joel de Guzman" <joel_at_[hidden]> wrote in message
>> news:eh85r6\$ba\$1_at_sea.gmane.org...
> [snip]
>>
>>>No problem. But have you seen Andy's work on matrices using fusion?
>>
>>
>>
>> As far as the work on "tuple" matrices is concerned, though originally
>> conceived
>> to enable use in my Quan types in transform matrices:
>>

>>
>> The IMO more important use is to replace run time doubles with compile time
>> "static" doubles usually for values of 1 or 0.
>>
> Andy, If all the types are the same(e.g. all double), then maybe a tuple
> is too heavyweight. I'd think something like vector_c, maybe called
> matrix_c, would be more appropriate. Or maybe you're wanting units with
> that so that the elements would be a double, but a double with units
> length/time or something else. Is that why tuples instead of matrix_c
> is being used?

OK.

The use I want to put matrices to is quite simple. I want to make 2D and 3D
transform matrices for transforming points from one coordinate system to
another. The first point is that this is essentially runtime rather than compile
time calculations.

In the normal world AFAICS everything is a double. In this case making a 2D
transform matrix is simple. You just need some container(s) of doubles which you
then view as a matrix. AFAIK this is how UBLAS and MTL work for example.

In the world of Quan however doubles are replaced by types representing (say)
lengths in mm. This is a very different world from the "everything's a double"
world. Lets take a multiply where everythings a double:

typedef double T;

T a, b

T c = a * b;

No problem.

Apply this approach to a Quan length::mm;

typedef quan::length::mm T;

T a, b

T c = a * b; // Error

In the world of Quan when you multiply 2 lengths you get something representing
an area:

typedef quan::length::mm T1;

T1 a, b

typedef quan::area::mm2 T2

T2 c = a * b; // OK.

(As to whether you see this as useful or not is beside the point. This is simply
what Quan does).

In fact there is a standard, obvious way to find the result type generically in
Quan :

typeof(a*b) c = a*b;

or

auto c = a* b;

This is why I am a fan of Boost.Typeof ;-)

Once you have access to the result_type of an operation, then it can be applied
to any two types for which operator * is defined. It could be quan::lengths, it
could be doubles, it could be complex quantities, Colours ;-)

It could also be a matrix . Like in the length * length op above else the type
of the result of multiplying 2 matrices is not the same as the arguments, but
also it can be deduced in exactly the same way using typeof(matrix1 * matrix2);

This is why I am a fan of Fusion ;-)

In a matrix it is quite common to have many elements with a value of 0 or 1. In
case of Q * 1, we already know the result at compile time, similarly Q * 0.

Although I know that the result of Q * 0 ---> Q(0) unfortunately in general the
compiler doesnt. However you can inform the compiler of your intent by using a
special 'static' type, which in Quan I call a static_value.

The static value has a type and a value, but the value is a compile time value
not a runtime value. In quan I use the quan::meta::rational type as the value.
To represent a static length in mm, with a value of 0 mm and 1 mm.

typedef static_value<quan::length::mm, meta::rational<0,1> > zero_mm;
typedef static_value<quan::length::mm, meta::rational<1,1> > one_mm;

or a double:

typedef static_value<double, meta::rational<0,1> > zero;
typedef static_value<double, meta::rational<1,1> > one;

These static_value types are also PODs, IOW without ctors or user defined
assignment ops etc. This means the compiler can make assumptions and apply
certain optimisations which it couldnt otherwise.

static_values have runtime operations both for ops with other static_values and
with runtime types. Because 'zero' is a different type to 'one' it is possible
to provide separate overloads for particular types. The actual implementation is
relatively complicated but doable even for static Quan quantities.

The rules for these ops are that wherever possible the result_type is a
static_value rather than a runtime value:

auto result = quan::length::mm(3.45678) * zero;

In this case the result type is predictable:

static_assert( equal_to< typeof(result),zero_mm>::value);

IOW the result in this case is a static value.

The actual calc, quan::length::mm(3.45678) * zero , is in fact a no-op, and the

The result is a dramatic improvement in perfomance. In my previous post I showed
the VC8 assemly for a multiply of a matrix with some static_values, by itself
Repeated below for contrast. Here the transform is a rotation matrix,
(different transforms, scaling translation and combos, etc have a different
'type').

typedef quanta::of_vector::vector9<
double, double, zero,
double, double, zero,
zero,zero,one
> my_vect9;

typedef quanta::matrix<3,3,my_vect9> matrix_type;

matrix_type matrix(
1.,2.,zero(),
4.,5.,zero(),
zero(),zero(),one()
);

assembler for mux function (optimised out in example)...

00001 dd 02 fld QWORD PTR [edx]
00003 dc 09 fmul QWORD PTR [ecx]
00005 dd 41 18 fld QWORD PTR [ecx+24]
00008 dc 4a 08 fmul QWORD PTR [edx+8]
0000b de c1 faddp ST(1), ST(0)
0000d dd 18 fstp QWORD PTR [eax]
0000f dd 42 08 fld QWORD PTR [edx+8]
00012 dc 49 20 fmul QWORD PTR [ecx+32]
00015 dd 02 fld QWORD PTR [edx]
00017 dc 49 08 fmul QWORD PTR [ecx+8]
0001a de c1 faddp ST(1), ST(0)
0001c dd 58 08 fstp QWORD PTR [eax+8]
0001f dd 42 20 fld QWORD PTR [edx+32]
00022 dc 49 18 fmul QWORD PTR [ecx+24]
00025 dd 42 18 fld QWORD PTR [edx+24]
00028 dc 09 fmul QWORD PTR [ecx]
0002a de c1 faddp ST(1), ST(0)
0002c dd 58 18 fstp QWORD PTR [eax+24]
0002f dd 41 08 fld QWORD PTR [ecx+8]
00032 dc 4a 18 fmul QWORD PTR [edx+24]
00035 dd 42 20 fld QWORD PTR [edx+32]
00038 dc 49 20 fmul QWORD PTR [ecx+32]
0003b de c1 faddp ST(1), ST(0)
0003d dd 58 20 fstp QWORD PTR [eax+32]

, and this was actually optimised out resulting in output of a constant with no
runtime calc in the example.

If I replace the static_values in the matrix with doubles (and incidentally
changing the type of the matrix therefore):

typedef quanta::of_vector::vector9<
double, double, double,
double, double, double,
double,double,double
> my_vect9;

typedef quanta::matrix<3,3,my_vect9> matrix_type;

matrix_type matrix(
1.,2.,0.,
4.,5.,0.,
0.,0.,1.
);

assembler for mux function (In this case the compiler calls the function from
main in the modified example):

00001 dd 02 fld QWORD PTR [edx]
00003 dc 09 fmul QWORD PTR [ecx]
00005 dd 42 08 fld QWORD PTR [edx+8]
00008 dc 49 18 fmul QWORD PTR [ecx+24]
0000b de c1 faddp ST(1), ST(0)
0000d dd 41 30 fld QWORD PTR [ecx+48]
00010 dc 4a 10 fmul QWORD PTR [edx+16]
00013 de c1 faddp ST(1), ST(0)
00015 dd 18 fstp QWORD PTR [eax]
00017 dd 41 08 fld QWORD PTR [ecx+8]
0001a dc 0a fmul QWORD PTR [edx]
0001c dd 41 20 fld QWORD PTR [ecx+32]
0001f dc 4a 08 fmul QWORD PTR [edx+8]
00022 de c1 faddp ST(1), ST(0)
00024 dd 41 38 fld QWORD PTR [ecx+56]
00027 dc 4a 10 fmul QWORD PTR [edx+16]
0002a de c1 faddp ST(1), ST(0)
0002c dd 58 08 fstp QWORD PTR [eax+8]
0002f dd 41 10 fld QWORD PTR [ecx+16]
00032 dc 0a fmul QWORD PTR [edx]
00034 dd 42 08 fld QWORD PTR [edx+8]
00037 dc 49 28 fmul QWORD PTR [ecx+40]
0003a de c1 faddp ST(1), ST(0)
0003c dd 41 40 fld QWORD PTR [ecx+64]
0003f dc 4a 10 fmul QWORD PTR [edx+16]
00042 de c1 faddp ST(1), ST(0)
00044 dd 58 10 fstp QWORD PTR [eax+16]
00047 dd 42 18 fld QWORD PTR [edx+24]
0004a dc 09 fmul QWORD PTR [ecx]
0004c dd 41 18 fld QWORD PTR [ecx+24]
0004f dc 4a 20 fmul QWORD PTR [edx+32]
00052 de c1 faddp ST(1), ST(0)
00054 dd 41 30 fld QWORD PTR [ecx+48]
00057 dc 4a 28 fmul QWORD PTR [edx+40]
0005a de c1 faddp ST(1), ST(0)
0005c dd 58 18 fstp QWORD PTR [eax+24]
0005f dd 41 20 fld QWORD PTR [ecx+32]
00062 dc 4a 20 fmul QWORD PTR [edx+32]
00065 dd 42 18 fld QWORD PTR [edx+24]
00068 dc 49 08 fmul QWORD PTR [ecx+8]
0006b de c1 faddp ST(1), ST(0)
0006d dd 41 38 fld QWORD PTR [ecx+56]
00070 dc 4a 28 fmul QWORD PTR [edx+40]
00073 de c1 faddp ST(1), ST(0)
00075 dd 58 20 fstp QWORD PTR [eax+32]
00078 dd 41 28 fld QWORD PTR [ecx+40]
0007b dc 4a 20 fmul QWORD PTR [edx+32]
0007e dd 42 18 fld QWORD PTR [edx+24]
00081 dc 49 10 fmul QWORD PTR [ecx+16]
00084 de c1 faddp ST(1), ST(0)
00086 dd 41 40 fld QWORD PTR [ecx+64]
00089 dc 4a 28 fmul QWORD PTR [edx+40]
0008c de c1 faddp ST(1), ST(0)
0008e dd 58 28 fstp QWORD PTR [eax+40]
00091 dd 41 18 fld QWORD PTR [ecx+24]
00094 dc 4a 38 fmul QWORD PTR [edx+56]
00097 dd 42 30 fld QWORD PTR [edx+48]
0009a dc 09 fmul QWORD PTR [ecx]
0009c de c1 faddp ST(1), ST(0)
0009e dd 41 30 fld QWORD PTR [ecx+48]
000a1 dc 4a 40 fmul QWORD PTR [edx+64]
000a4 de c1 faddp ST(1), ST(0)
000a6 dd 58 30 fstp QWORD PTR [eax+48]
000a9 dd 41 20 fld QWORD PTR [ecx+32]
000ac dc 4a 38 fmul QWORD PTR [edx+56]
000af dd 42 30 fld QWORD PTR [edx+48]
000b2 dc 49 08 fmul QWORD PTR [ecx+8]
000b5 de c1 faddp ST(1), ST(0)
000b7 dd 41 38 fld QWORD PTR [ecx+56]
000ba dc 4a 40 fmul QWORD PTR [edx+64]
000bd de c1 faddp ST(1), ST(0)
000bf dd 58 38 fstp QWORD PTR [eax+56]
000c2 dd 41 28 fld QWORD PTR [ecx+40]
000c5 dc 4a 38 fmul QWORD PTR [edx+56]
000c8 dd 42 30 fld QWORD PTR [edx+48]
000cb dc 49 10 fmul QWORD PTR [ecx+16]
000ce de c1 faddp ST(1), ST(0)
000d0 dd 41 40 fld QWORD PTR [ecx+64]
000d3 dc 4a 40 fmul QWORD PTR [edx+64]
000d6 de c1 faddp ST(1), ST(0)
000d8 dd 58 40 fstp QWORD PTR [eax+64]

call in main of double version, dramatic contrast to optimising to a constant
(See my previous post )

; Line 82
00081 8d 4c 24 34 lea ecx, DWORD PTR _matrix\$[esp+196]
00085 83 c4 04 add esp, 4
00088 8b d1 mov edx, ecx
0008a 8d 44 24 78 lea eax, DWORD PTR _result\$[esp+192]
0008e e8 00 00 00 00 call
??\$?RU?\$matrix@\$02\$02U?\$vector9_at_NNNNNNNNNUas_value_at_quanta@@@of_vector_at_quanta@@@quanta@@U01@@?\$matrix_mux@\$02\$02\$02\$02_at_quanta@@QBE?AU?\$matrix@\$02\$02U?\$vector9_at_NNNNNNNNNUas_value_at_quanta@@@of_vector_at_quanta@@@1_at_ABU21@0_at_Z
;
quanta::matrix_mux<3,3,3,3>::operator()<quanta::matrix<3,3,quanta::of_vector::vector9<double,double,double,double,double,double,double,double,double,quanta::as_value>
>,quanta::matrix<3,3,quanta::of_vector::vector9<double,double,double,double,double,double,double,double,double,quanta::as_value> > >; Line 84 00093 dd 44 24 78 fld QWORD PTR _result\$[esp+192] 00097 83 ec 08 sub esp, 8 0009a dd 1c 24 fstp QWORD PTR [esp] 0009d e8 00 00 00 00 call??6?\$basic_ostream_at_DU?\$char_traits_at_D@std@@@std@@QAEAAV01_at_N@Z ;std::basic_ostream<char,std::char_traits<char> >::operator<<The example is for 2D matrices. The results will not be quite as good for 3Dtypes where the ratio of static to runtime types is lower, but other transformshave a much lower cost than a rotation * rotation, when using static values.More complex transforms may approach more and more the all runtime version asstatic_values are replaced by runtime as the calc proceeds, but this is a caseof runtime types being supplied 'on demand' rather than always. In other casesruntime calcs are knocked out by being multiplied by a static zero. For aparticular combination of transforms it is I think quite simple to work
out thepercentage improvement in terms of number of adds and multiplies. I worked out astandard transform rotation round a point . I was incorrect in my previous post.The all double version has 128 muxes and 96 adds. The static_value version has 9multiplies and 9 Adds. (I need to check this again , but I think its correct).That is a factor of 10 improvement, which isnt bad ;-)regardsAndy Little