Boost logo

Ublas :

Subject: Re: [ublas] I have a problem with expression templates
From: Oswin Krause (Oswin.Krause_at_[hidden])
Date: 2012-09-26 18:09:00


Hi,

Even though i understand your reasoning(yes i have also read your
second mail, and I fully agree to Leos opinion), it is totally missing
the point.

At this dimensionality, in the current design, without intermediate
memory, a call to prod() will be approximately 3-5x slower than
axpy_prod and 40-50 times slower than any optimized routine. This is by
design! There is no way to implement prod() in the current interface
such, that it is reasonable fast.
So while not crashing, prod() would just not finish in any reasonable
time. compared to this, axpy_prod gives you exactly the needed control
over memory, while providing at least no absurdly bad performance
penalty. At the moment prod() combines the worst of both worlds while
providing a tempting easy interface.

See for example the Benchmarks here(I think they already use axpy_prod
here,at least the code says it, but i am not sure):
http://eigen.tuxfamily.org/index.php?title=Benchmark

using uBLAS for your requirements is just wrong. ATLAS does a much
better job while providing a clean interface without any "smart"
decisions. Or maybe even look at the numeric bindings package. All very
dumb routines, but optimized to the max and offering you all the control
you need.

Have a nice evening :)
Greetings,
Oswin

On 2012-09-26 23:37, Michael Lehn wrote:
> Hi.
>
> In my opinion the fact that no temporary matrix gets created
> implicitly/automatically
> is a feature! In numerical applications you want to be in control
> over memory resources.
> Just consider some extreme cases where one chooses matrix dimensions
> so big that they just
> fit into memory. Even in cases not that extreme, consider that the
> nested matrix-matrix
> product is inside a loop. In this case I don't wanna be forced to
> rely on a things like
> memory pools or some other smart memory management. If the creation
> of temporaries is
> acceptable and wanted then creating them explicitly is a transparent
> solution.
>
> Cheers,
>
> Michael
>
>
> Am 26.09.2012 um 22:13 schrieb Oswin Krause:
>
>> Hi,
>>
>> This issue is actually bugging me a lot.
>> While i can understand that one wants to prevent unneccessary
>> copying, i don't understand why this is applied to matrix-matrix
>> multiplication. The issue arises quite often and is in fact a FAQ.
>>
>> I think, that the better default behavior should be that prod()
>> should always return an intermediate result for matrix-matrix
>> multiplication, thus making this point obsolete. Given a high enough
>> data dimension, the unneccessary copy does not actually matter and
>> since C++11 move semantics helps us in most cases. If the copy really
>> matters, there is always axpy_prod.
>>
>> A change in this direction would also help for another issue:
>> actually prod() is the wrong choice of function in the first place,
>> because axpy_prod (the one with the worse syntax) is a lot faster
>> given problems of higher dimensionality (like 20 or so). Which is
>> totally counter intuitive. So whil the copy dos not matter most of the
>> time, it also nables us to use the optimized kernels of axpy_prod (and
>> allows to remove a lot of code...)
>>
>>
>> So I suggest a change along the following line
>> (temporary_matrix_type is some scary type traits I don't know by heart
>> right now):
>>
>> template<class E1, class E2>
>> temporary_matrix_type prod(blas::matrix_expression<E1> const& a,
>> blas::matrix_expression<E2> const& b){
>> temporary_matrix_type result;
>> axpy_prod(a,b,result);
>> return result;
>> }
>>
>> There is a small disadvantage in edge cases like:
>>
>> Matrix B(100,100);
>> Matrix C(100,100);
>> Matrix A(3,3);
>> A+=subrange(prod(B,C),0,3,0,3); //most of prods work is in vain and
>> we have a unneccessary intermediate result.
>>
>> however, in almost all cases this change should do what most people
>> actually need.
>>
>> Greetings,
>> Oswin
>>
>>
>> On 2012-09-26 21:53, Gunter Winkler wrote:
>>> Hello,
>>>
>>> Am 26.09.2012 19:59, schrieb Oswin Krause:
>>>> Hi,
>>>>
>>>> it is a bad idea to nest calls to prod(), so ublas prevents it.
>>>> Your
>>>> workaround with the intermediate result is fine.
>>>>
>>>> Also, try ublas::prod(trans(A), ublas::prod<ublas::matrix<double>
>>>> >(B,
>>>> A))
>>> Let me add some additional explanation:
>>>
>>> The expression prod(A, B) does not mean "compute the product of A
>>> and
>>> B". It means "remember to multiply A and B on demand". The actual
>>> computation is deferred until next assignment.
>>>
>>> What does that mean for a nested prod()?
>>>
>>> prod(A, prod(C,D)) means that during computation of the final
>>> result
>>> each element of B is computed (repeatedly on the fly) from the
>>> expression prod(C,D). The "performance" would be a bad surprise.
>>> Thus
>>> uBLAS explicitly forbids nested product expressions.
>>>
>>> The solution is to use a temporary (as George said) or to define
>>> the
>>> result type of prod as a real matrix instead of the default
>>> matrix_expression (as Oswin suggested).
>>>
>>> mfg
>>> Gunter
>>>
>>>
>>> _______________________________________________
>>> ublas mailing list
>>> ublas_at_[hidden]
>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>> Sent to: Oswin.Krause_at_[hidden]
>>
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden]
>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>> Sent to: michael.lehn_at_[hidden]
>>
>>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Oswin.Krause_at_[hidden]