
Ublas : 
Subject: Re: [ublas] Matrix decompositions.
From: Salman Javaid (javaid.salman_at_[hidden])
Date: 20130520 09:27:57
Yes, I forgot and that is criminal on my part. His code is an excellent
starter to better understanding the implementation intricacies. I actually
went through the code but hadn't got time to really study it. But no
excuses, hugely grateful to Karl. I am really looking forward to working
with him and all of you guys through the summer.
Best Regards,
Salman Javaid
On Mon, May 20, 2013 at 6:06 AM, Nasos Iliopoulos <nasos_i_at_[hidden]>wrote:
> You welcome, but I think Karl's contribution was much more useful,
> right? :)
>
> Nasos
>
>
> On 05/19/2013 02:09 AM, Salman Javaid wrote:
>
> Thank you, Oswin and Nasos for the detailed responses. I am hugely
> grateful. This is enormous help. I am now trying to dig into optimized
> algorithms for different matrices.
>
>
> Oswin, thank you for the link to the paper. I am going through it.
>
>
>
>
>
> Best Regards,
> Salman Javaid
>
>
> On Thu, May 16, 2013 at 7:06 AM, Karl Rupp <rupp_at_[hidden]> wrote:
>
>> Hi guys,
>>
>> I have a working implementation of QR for uBLAS in ViennaCL for about a
>> year already:
>>
>>
>> https://github.com/viennacl/viennacldev/blob/master/viennacl/linalg/qr.hpp
>>
>> Feel free to copy&paste and relicense as needed, I agree to whatever is
>> necessary to integrate into uBLAS if of interest.
>>
>> There is a bit of duplication for the ViennaCL types (requiring
>> host<>GPU transfers) and the uBLAS types (operating in main RAM), yet it
>> gives an idea how things can be implemented. It can certainly be improved
>> here and there (more details on request), yet it addresses most of the
>> points raised by Oswin. And it's faster than a standard LAPACK for sizes
>> above ~1k times 1k.
>>
>> I recommend extracting the Householder reflections into a nice separate
>> interface, since this functionality will also be needed for other
>> algorithms like SVD or GMRES. As a nice side effect, it makes the
>> implementation for QR more compact.
>>
>> Generally, in order to get *any* reasonable performance, one really needs
>> to use matrixmatrix multiplications where possible in order to avoid the
>> memory bandwidth bottleneck.
>>
>> Best regards,
>> Karli
>>
>>
>>
>> On 05/16/2013 01:06 AM, oswin krause wrote:
>>
>>> Hi,
>>>
>>> These are further good points!
>>>
>>> I also came up with a few new ones(and tips!):
>>>
>>>  QR needs pivoting. It's main usages are SVD and pseudoinverses. In
>>> both cases the input does not necessary have full rank. Also pivoting
>>> helps for matrices with high condition numbers.
>>>
>>>  For the same reasons H was not formed explicitly, Q should not be
>>> formed. Instead there should be a version of the algorithm which does
>>> only return the reflection vectors forming Q.
>>>
>>>  For dense matrices at least, it is possible to do the QR inplace by
>>> storing the R part as lower triangular and the householder
>>> transformation vectors in the upper triangular. (That is very similar to
>>> how LU is implemented).
>>>
>>>  The best sources for algorithmic information are the LAPACK working
>>> notes.
>>> http://www.netlib.org/lapack/lawns/
>>>
>>> In your case lawn114 sems to be the most relevant, even though it
>>> assumes a fast BLAS3.
>>>
>>> Greetings,
>>> Oswin
>>>
>>>
>>> On 16.05.2013 05:32, Nasos Iliopoulos wrote:
>>>
>>>> That's not a bad start.
>>>>
>>>> I think Oswin covered a whole lot of items here, but a more complete
>>>> algorithm needs to satisfy some or all of the following:
>>>>
>>>>  The algortihm should have a dispatch mechanism so that optimized
>>>> versions for various matrix types can be provided. (sparse, banded,
>>>> etc.). You don't necessarily need to provide them all to start with.
>>>>  The algorithm should operate on matrix expressions rather than
>>>> matrices (so it can be applied to for example subranges). Static
>>>> dispatch or overload if for some reason this seems to reduce
>>>> performance.
>>>>  Const correctness is important. Try using const reference on
>>>> immutable types.
>>>>  Instead of 0.00025 provide a value based on a user choice.If it is
>>>> hard coded by the user, the compiler will probably convert it into a
>>>> const value.
>>>>  Don't use ints for indexing, use either std::size_t, or
>>>> container::size_type. If you need a signed type (i.e. to count for
>>>> differences on unsigned types) use ptrdiff_t. uBlas containers provide
>>>> a difference_type typedef for that purpose (i.e.
>>>> matrix<double>::difference_type).
>>>>  use noalias(..) = in every assignment that the lhs is not a part of
>>>> rhs, or when the algebraic operation is mapped 11. (i.e. A=2*A+B can
>>>> be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot atm). This
>>>> provides more benefits than just avoiding temporaries.
>>>>
>>>>
>>>> The QR decomposition of a 100x100 matrix should take no more than a
>>>> few miliseconds (or even less than a milisecond) to run.
>>>> A 1000x1000 should take around 1/3 to 1/10 of a sec.
>>>>
>>>> Compile with:
>>>> g++ O3 DNDEBUG Main.cpp o qrtest
>>>>
>>>> Then you'll see that your code runs pretty fast, but it doesn't scale
>>>> well as Oswin noted.
>>>>
>>>> Best regards,
>>>> Nasos
>>>>
>>>>
>>>>
>>>>
>>>> On 05/15/2013 10:12 PM, Salman Javaid wrote:
>>>>
>>>>> Thank you, Oswin for the detailed response. I am going to update the
>>>>> code.
>>>>>
>>>>> David, Nasos, any advise on coding conventions? Or anything else that
>>>>> you can possible suggest? I will stand grateful.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Salman Javaid
>>>>>
>>>>>
>>>>> On Tue, May 14, 2013 at 10:53 PM, oswin krause
>>>>> <oswin.krause_at_[hidden]
>>>>> <mailto:oswin.krause_at_[hidden]>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> in the order I stumbled over the things:
>>>>>
>>>>> main.cpp
>>>>> line 4454: you don't need a copy, instead you should use a
>>>>> combination of row/subrange.
>>>>> line 5860: you should take a look at inner_prod
>>>>> line 63: 0.00025 is too big.
>>>>> line 66: You should never create H explicitly.
>>>>> line 67: because you formed H, this step is O(n^3) which makes
>>>>> the whole algorithm O(n^4). This can be done in O(n^2)
>>>>> line 7379: same applies here.
>>>>>
>>>>> Greetings,
>>>>> Oswin
>>>>>
>>>>> On 14.05.2013 22:12, Salman Javaid wrote:
>>>>>
>>>>>> Hello uBLAS Contributors:
>>>>>>
>>>>>> I have applied to GSoC 2013
>>>>>> and pitched implementation of SVD factorization for uBLAS. In
>>>>>> order to better prepare myself and to get my hands dirty at
>>>>>> uBLAS, I ended up implementing QR Factorization employing
>>>>>> Householder Reflections using uBLAS. This is only the first
>>>>>> draft and will be needing significant improvement, e.g.,
>>>>>> computation of QR decomposition of 100 * 100 matrix takes around
>>>>>> 30 seconds. But I guess just to get familiar with code base, it
>>>>>> was a good exercise. Over the next week or two I will be trying
>>>>>> to optimize the code.
>>>>>>
>>>>>>
>>>>>> I will be absolutely
>>>>>> grateful if contributors can have a quick glance at the code,
>>>>>> and point me to any improvements they can suggest. Particularly
>>>>>> in speeding up matrix multiplication.
>>>>>>
>>>>>> I used Visual Studio 2010 to compile the code. I will try to get
>>>>>> the code running on my Ubuntu machine in a couple of days
>>>>>> hopefully.
>>>>>>
>>>>>> Here the header file:
>>>>>>
>>>>>> https://github.com/salmanjavaid/QR_Decomposition/blob/master/QR_Header.hpp
>>>>>>
>>>>>>
>>>>>> The main file:
>>>>>>
>>>>>> https://github.com/salmanjavaid/QR_Decomposition/blob/master/Main.cpp
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Salman Javaid
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> ublas mailing list
>>>>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>>>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>>>>> Sent to:Oswin.Krause_at_[hidden] <mailto:
>>>>>> Oswin.Krause_at_[hidden]>
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ublas mailing list
>>>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>>>> Sent to: Javaid.Salman_at_[hidden] <mailto:Javaid.Salman_at_[hidden]>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ublas mailing list
>>>>> ublas_at_[hidden]
>>>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>>>> Sent to:athanasios.iliopoulos.ctr.gr_at_[hidden]
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> ublas mailing list
>>>> ublas_at_[hidden]
>>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>>> Sent to:Oswin.Krause_at_[hidden]
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ublas mailing list
>>> ublas_at_[hidden]
>>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>>> Sent to: rupp_at_[hidden]
>>>
>>>
>> _______________________________________________
>> ublas mailing list
>> ublas_at_[hidden]
>> http://lists.boost.org/mailman/listinfo.cgi/ublas
>> Sent to: Javaid.Salman_at_[hidden]
>>
>
>
>
> _______________________________________________
> ublas mailing listublas_at_[hidden]http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Javaid.Salman_at_[hidden]
>