Boost logo

Ublas :

Subject: Re: [ublas] Matrix decompositions.
From: Salman Javaid (javaid.salman_at_[hidden])
Date: 2013-05-19 02:09:00


Thank you, Oswin and Nasos for the detailed responses. I am hugely
grateful. This is enormous help. I am now trying to dig into optimized
algorithms for different matrices.

Oswin, thank you for the link to the paper. I am going through it.

Best Regards,
Salman Javaid

On Thu, May 16, 2013 at 7:06 AM, Karl Rupp <rupp_at_[hidden]> wrote:

> Hi guys,
>
> I have a working implementation of QR for uBLAS in ViennaCL for about a
> year already:
>
> https://github.com/viennacl/**viennacl-dev/blob/master/**
> viennacl/linalg/qr.hpp<https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/qr.hpp>
>
> Feel free to copy&paste and relicense as needed, I agree to whatever is
> necessary to integrate into uBLAS if of interest.
>
> There is a bit of duplication for the ViennaCL types (requiring host<->GPU
> transfers) and the uBLAS types (operating in main RAM), yet it gives an
> idea how things can be implemented. It can certainly be improved here and
> there (more details on request), yet it addresses most of the points raised
> by Oswin. And it's faster than a standard LAPACK for sizes above ~1k times
> 1k.
>
> I recommend extracting the Householder reflections into a nice separate
> interface, since this functionality will also be needed for other
> algorithms like SVD or GMRES. As a nice side effect, it makes the
> implementation for QR more compact.
>
> Generally, in order to get *any* reasonable performance, one really needs
> to use matrix-matrix multiplications where possible in order to avoid the
> memory bandwidth bottleneck.
>
> Best regards,
> Karli
>
>
>
> On 05/16/2013 01:06 AM, oswin krause wrote:
>
>> Hi,
>>
>> These are further good points!
>>
>> I also came up with a few new ones(and tips!):
>>
>> - QR needs pivoting. It's main usages are SVD and pseudo-inverses. In
>> both cases the input does not necessary have full rank. Also pivoting
>> helps for matrices with high condition numbers.
>>
>> - For the same reasons H was not formed explicitly, Q should not be
>> formed. Instead there should be a version of the algorithm which does
>> only return the reflection vectors forming Q.
>>
>> - For dense matrices at least, it is possible to do the QR in-place by
>> storing the R part as lower triangular and the householder
>> transformation vectors in the upper triangular. (That is very similar to
>> how LU is implemented).
>>
>> - The best sources for algorithmic information are the LAPACK working
>> notes.
>> http://www.netlib.org/lapack/**lawns/>
>>
>> In your case lawn114 sems to be the most relevant, even though it
>> assumes a fast BLAS3.
>>
>> Greetings,
>> Oswin
>>
>>
>> On 16.05.2013 05:32, Nasos Iliopoulos wrote:
>>
>>> That's not a bad start.
>>>
>>> I think Oswin covered a whole lot of items here, but a more complete
>>> algorithm needs to satisfy some or all of the following:
>>>
>>> - The algortihm should have a dispatch mechanism so that optimized
>>> versions for various matrix types can be provided. (sparse, banded,
>>> etc.). You don't necessarily need to provide them all to start with.
>>> - The algorithm should operate on matrix expressions rather than
>>> matrices (so it can be applied to for example subranges). Static
>>> dispatch or overload if for some reason this seems to reduce performance.
>>> - Const correctness is important. Try using const reference on
>>> immutable types.
>>> - Instead of 0.00025 provide a value based on a user choice.If it is
>>> hard coded by the user, the compiler will probably convert it into a
>>> const value.
>>> - Don't use ints for indexing, use either std::size_t, or
>>> container::size_type. If you need a signed type (i.e. to count for
>>> differences on unsigned types) use ptrdiff_t. uBlas containers provide
>>> a difference_type typedef for that purpose (i.e.
>>> matrix<double>::difference_**type).
>>> - use noalias(..) = in every assignment that the lhs is not a part of
>>> rhs, or when the algebraic operation is mapped 1-1. (i.e. A=2*A+B can
>>> be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot atm). This
>>> provides more benefits than just avoiding temporaries.
>>>
>>>
>>> The QR decomposition of a 100x100 matrix should take no more than a
>>> few miliseconds (or even less than a milisecond) to run.
>>> A 1000x1000 should take around 1/3 to 1/10 of a sec.
>>>
>>> Compile with:
>>> g++ -O3 -DNDEBUG Main.cpp -o qrtest
>>>
>>> Then you'll see that your code runs pretty fast, but it doesn't scale
>>> well as Oswin noted.
>>>
>>> Best regards,
>>> -Nasos
>>>
>>>
>>>
>>>
>>> On 05/15/2013 10:12 PM, Salman Javaid wrote:
>>>
>>>> Thank you, Oswin for the detailed response. I am going to update the
>>>> code.
>>>>
>>>> David, Nasos, any advise on coding conventions? Or anything else that
>>>> you can possible suggest? I will stand grateful.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Salman Javaid
>>>>
>>>>
>>>> On Tue, May 14, 2013 at 10:53 PM, oswin krause
>>>> <oswin.krause_at_ruhr-uni-bochum.**de <oswin.krause_at_[hidden]>
>>>> <mailto:oswin.krause_at_ruhr-uni-**bochum.de<oswin.krause_at_[hidden]>>>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> in the order I stumbled over the things:
>>>>
>>>> main.cpp
>>>> line 44-54: you don't need a copy, instead you should use a
>>>> combination of row/subrange.
>>>> line 58-60: you should take a look at inner_prod
>>>> line 63: 0.00025 is too big.
>>>> line 66: You should never create H explicitly.
>>>> line 67: because you formed H, this step is O(n^3) which makes
>>>> the whole algorithm O(n^4). This can be done in O(n^2)
>>>> line 73-79: same applies here.
>>>>
>>>> Greetings,
>>>> Oswin
>>>>
>>>> On 14.05.2013 22:12, Salman Javaid wrote:
>>>>
>>>>> Hello uBLAS Contributors:
>>>>>
>>>>> I have applied to GSoC 2013
>>>>> and pitched implementation of SVD factorization for uBLAS. In
>>>>> order to better prepare myself and to get my hands dirty at
>>>>> uBLAS, I ended up implementing QR Factorization employing
>>>>> Householder Reflections using uBLAS. This is only the first
>>>>> draft and will be needing significant improvement, e.g.,
>>>>> computation of QR decomposition of 100 * 100 matrix takes around
>>>>> 30 seconds. But I guess just to get familiar with code base, it
>>>>> was a good exercise. Over the next week or two I will be trying
>>>>> to optimize the code.
>>>>>
>>>>>
>>>>> I will be absolutely
>>>>> grateful if contributors can have a quick glance at the code,
>>>>> and point me to any improvements they can suggest. Particularly
>>>>> in speeding up matrix multiplication.
>>>>>
>>>>> I used Visual Studio 2010 to compile the code. I will try to get
>>>>> the code running on my Ubuntu machine in a couple of days
>>>>> hopefully.
>>>>>
>>>>> Here the header file:
>>>>>
https://github.com/**salmanjavaid/QR_Decomposition/**
>>>>> blob/master/QR_Header.hpp<https://github.com/salmanjavaid/QR_Decomposition/blob/master/QR_Header.hpp>
>>>>>
>>>>>
>>>>> The main file:
>>>>> https://github.com/**salmanjavaid/QR_Decomposition/**
>>>>> blob/master/Main.cpp<https://github.com/salmanjavaid/QR_Decomposition/blob/master/Main.cpp>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Salman Javaid
>>>>>
>>>>>
>>>>> ______________________________**_________________
>>>>> ublas mailing list
>>>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>>>> http://lists.boost.org/**mailman/listinfo.cgi/ublas>
>>>>> Sent to:Oswin.Krause_at_ruhr-uni-**bochum.de<to%3AOswin.Krause_at_[hidden]> <mailto:
>>>>> Oswin.Krause_at_ruhr-uni-**bochum.de <Oswin.Krause_at_[hidden]>>
>>>>>
>>>>
>>>>
>>>> ______________________________**_________________
>>>> ublas mailing list
>>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>>>
http://lists.boost.org/**mailman/listinfo.cgi/ublas>
>>>> Sent to: Javaid.Salman_at_[hidden] <mailto:Javaid.Salman_at_gmail.**com<Javaid.Salman_at_[hidden]>
>>>> >
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ______________________________**_________________
>>>> ublas mailing list
>>>> ublas_at_[hidden]
>>>>
http://lists.boost.org/**mailman/listinfo.cgi/ublas>
>>>> Sent to:athanasios.iliopoulos.ctr.**gr_at_[hidden]<to%3Aathanasios.iliopoulos.ctr.gr_at_[hidden]>
>>>>
>>>
>>>
>>>
>>> ______________________________**_________________
>>> ublas mailing list
>>> ublas_at_[hidden]
>>>
http://lists.boost.org/**mailman/listinfo.cgi/ublas>
>>> Sent to:Oswin.Krause_at_ruhr-uni-**bochum.de<to%3AOswin.Krause_at_[hidden]>
>>>
>>
>>
>>
>> ______________________________**_________________
>> ublas mailing list
>> ublas_at_[hidden]
>>
http://lists.boost.org/**mailman/listinfo.cgi/ublas>
>> Sent to: rupp_at_[hidden]
>>
>>
> ______________________________**_________________
> ublas mailing list
> ublas_at_[hidden]
>
http://lists.boost.org/**mailman/listinfo.cgi/ublas>
> Sent to: Javaid.Salman_at_[hidden]
>