From: Joel de Guzman (joel_at_[hidden])
Date: 2006-10-19 19:38:17


Andy Little wrote:
> "Joel de Guzman" <joel_at_[hidden]> wrote in message
> news:eh85r6$ba$1_at_sea.gmane.org...
>> Ullrich Koethe wrote:
>>> Joel de Guzman wrote:
>>>> > VIGRA doesn't have an explicit RGBA type (TinyVector<T, 4> can be used
>>>> > instead), because so far no-one came up with a convincing proposal for
>>>> > these operations. But without them, RGBA is pretty useless.
>>>> >
>>>>
>>>> Hmmm... TinyVector<T, 4>... I think VIGRA should use Fusion for
>>>> that instead ;-)
>>>>
>>> I had a look at Fusion, but I'm not sure whether it would be helpful in
>>> this context. TinyVector is based on three design goals: it should support
>>> the std::vector interface (except for resize etc.),
>> Like boost::array?
>>
>> it should be fast (you
>>> have millions of these beasts in a single image),
>> Definitely.
>>
>> and it should behave
>>> like a built-in arithmetic type (except for division which is problematic
>>> because the zero vector is not the only one that may cause a
>>> division-by-zero error).
>> No problem. But have you seen Andy's work on matrices using fusion?
>
>
> As far as the work on "tuple" matrices is concerned, though originally conceived
> to enable use in my Quan types in transform matrices:
>
> http://quan.sourceforge.net/quan_matters/doc/html/index.html
>
> The more important use, IMO, is to replace run time doubles with compile time
> "static" doubles, usually for values of 1 or 0.
>
> The effect of this is to reduce a typical 4 x 4 matrix multiply from 64
> multiplies and 48 adds down to, for example, 9 multiplies and 9 adds in the
> case of a translation x rotation x translation transform. That is quite a
> profound reduction. Similar reductions are of course possible when applying the
> transform to vertices.
>
> However there is a problem in VC7.1, which is that the compiler simply runs out
> of resources in relatively simple transforms, using Fusion, and there is no way
> round that with Fusion AFAICS. OTOH there is no such problem in VC8 or gcc4.1.1,
> the other 2 compilers I tested. However, rather than lose VC7.1, I opted to try
> a hand rolled version, IOW I stripped Fusion out completely and removed the
> iterators and provided custom vectors of 3,9, 4 and 16 elements and custom row
> and columns. This is not quite as neat as Fusion where one algorithm can be
> applied to theoretically any combination of matrices, however in looking at the
> assembler output from the hand made version I saw that, by simplifying the
> programming and removing the extra layers of references, the compiler did
> now produce what looks to me perfect. (The example code here is simply of a 3x3
> rotation matrix multiplied by itself.)
>
> N.B. As an improvement on perfect, it should also be noted that, because
> this is a simple test with local constants, the compiler has in fact not
> instantiated this assembler code at all in the main function, but simply
> outputs constants. (This can be seen in the main assembler at the end.)
> This is an improvement on the Fusion version, where I guess the references do
> provide a barrier to some optimisations and functions were called in main. Be
> wary of short tests however ;-)
>
> Note also the custom at_c functors, which I found useful. These enable the
> actual type of result (reference, const reference, value) to be sorted on an
> element by element basis. In fact quanta::as_ref etc. are functors, so arbitrary
> functors could be substituted, e.g. to multiply by a constant.
>
> IOW in light of this I am not sure now that using Fusion is optimal for what I
> want, but it did provide a good starting point and one could see this as
> optimising...
>
> Source, with some extraneous stuff, is at the end. The assembler represents the
> mux(matrix,matrix) part before it's optimised out in this example. Finally the
> main assembler, showing output of a constant.
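
To make the quoted "static" doubles idea concrete, here is a minimal sketch
(C++14), not Andy's actual Quan code. The names zero_t, one_t, dot3, point2
and translate are all hypothetical. It applies a 2D homogeneous translation
matrix whose 0 and 1 entries are types rather than run-time doubles, so the
overloads drop every multiply and all but one add per coordinate.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag types: values known to be 0 or 1 at compile time.
struct zero_t {};
struct one_t {};

// Products involving a static value need no run-time multiply.
inline zero_t operator*(zero_t, double) { return {}; }
inline double operator*(one_t, double x) { return x; }
inline double operator*(double x, one_t) { return x; }
inline one_t operator*(one_t, one_t) { return {}; }

// Sums involving a static zero need no run-time add.
inline zero_t operator+(zero_t, zero_t) { return {}; }
inline double operator+(zero_t, double x) { return x; }
inline double operator+(double x, zero_t) { return x; }
inline one_t operator+(zero_t, one_t) { return {}; }

// Generic row-times-column product; the result type depends on the inputs.
template <class A0, class A1, class A2, class B0, class B1, class B2>
auto dot3(A0 a0, A1 a1, A2 a2, B0 b0, B1 b1, B2 b2) {
    return a0 * b0 + a1 * b1 + a2 * b2;
}

struct point2 { double x, y; };

// Multiply the matrix [1 0 tx; 0 1 ty; 0 0 1] by the column (x, y, 1).
inline point2 translate(point2 p, double tx, double ty) {
    auto x = dot3(one_t{}, zero_t{}, tx, p.x, p.y, one_t{});
    auto y = dot3(zero_t{}, one_t{}, ty, p.x, p.y, one_t{});
    auto w = dot3(zero_t{}, zero_t{}, one_t{}, p.x, p.y, one_t{});
    // The homogeneous coordinate never becomes a run-time value.
    static_assert(std::is_same<decltype(w), one_t>::value, "w stays static");
    (void)w;
    return {x, y};
}
```

Calling translate({2.0, 3.0}, 10.0, 20.0) goes through the same generic dot3
for all three rows, yet only two run-time additions remain after overload
resolution. A full 4 x 4 version would need more overloads (for example
one_t + one_t), which this sketch leaves out.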

Andy, thanks for your analysis. Here are my thoughts:

1) Compilers will get better. That is implied in your statement that
    the problem you have with VC7.1 (internal structure overflow) is
    no longer present with VC8.

2) Fusion can and will be improved. Have you seen the latest code
    by Eric Niebler on segmented iterators and algorithms?

3) A tuple is a struct and a struct is a tuple. You can definitely
    mix hand-rolled structs and fusion tuples at will and provide
    algorithm "overloads" for optimization.
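
A rough sketch of point 3, using std::tuple (C++14) as a stand-in for a
Fusion sequence so that it has no Boost dependency. The names sum_sq,
sum_sq_impl, vec3 and as_tuple are hypothetical and are not part of Fusion.
It shows a generic algorithm over tuples, a hand-rolled struct viewed as a
tuple, and an overload of the same algorithm written directly for the struct.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Generic algorithm: sum of squares over the elements of any std::tuple.
template <class Tuple, std::size_t... I>
double sum_sq_impl(const Tuple& t, std::index_sequence<I...>) {
    double total = 0.0;
    // Braced initializer expands the pack and evaluates left to right.
    int expand[] = {0, (total += double(std::get<I>(t)) * double(std::get<I>(t)), 0)...};
    (void)expand;
    return total;
}

template <class... Ts>
double sum_sq(const std::tuple<Ts...>& t) {
    return sum_sq_impl(t, std::index_sequence_for<Ts...>{});
}

// Hand-rolled struct (hypothetical).
struct vec3 { double x, y, z; };

// "A struct is a tuple": expose the members as a tuple of references.
inline std::tuple<const double&, const double&, const double&>
as_tuple(const vec3& v) {
    return std::tie(v.x, v.y, v.z);
}

// Algorithm overload for optimization: skips the generic machinery.
inline double sum_sq(const vec3& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}
```

With these definitions sum_sq(v) picks the hand-written overload for a vec3,
while sum_sq(as_tuple(v)) and sum_sq(std::make_tuple(1, 2.0f, 3.0)) go
through the generic version. Fusion itself provides its own adaptation
mechanisms for structs, which this sketch does not try to reproduce.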

There's great benefit with generic programming. Certainly there
are obstacles that hinder progress and we generic programmers
try to solve those as best we can. It's a journey. Anyway, thank
you very much for your initial efforts!

Regards,

-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
