Boost logo

Ublas :

Subject: Re: [ublas] Matrix decompositions.
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2013-05-28 11:41:18


Yes,
additionally with the boost transition to github we are planning to make
contributing to uBlas much easier. We would like though to stress
quality and find a way to define performance requirements
specifications. When the github repo (a fork of boostorg/ublas) is ready
(I hope I can have it ready this weekend) we will include instructions
on how to work with pull requests (to submit an algorithm or a small
bugfix for example), or in cases where larger involvement is required
maybe give access to the ublas development repo to ppl that want to
contribute.

-Nasos

On 05/28/2013 10:59 AM, Salman Javaid wrote:
> Are there any plans to integrate Karl's QR Decomposition
> implementation into Boost.uBLAS? I will be grateful for a response.
>
>
>
> Best Regards,
> Salman Javaid
>
>
> On Thu, May 16, 2013 at 7:06 AM, Karl Rupp <rupp_at_[hidden]
> <mailto:rupp_at_[hidden]>> wrote:
>
> Hi guys,
>
> I have a working implementation of QR for uBLAS in ViennaCL for
> about a year already:
>
> https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/qr.hpp
>
> Feel free to copy&paste and relicense as needed, I agree to
> whatever is necessary to integrate into uBLAS if of interest.
>
> There is a bit of duplication for the ViennaCL types (requiring
> host<->GPU transfers) and the uBLAS types (operating in main RAM),
> yet it gives an idea how things can be implemented. It can
> certainly be improved here and there (more details on request),
> yet it addresses most of the points raised by Oswin. And it's
> faster than a standard LAPACK for sizes above ~1k times 1k.
>
> I recommend extracting the Householder reflections into a nice
> separate interface, since this functionality will also be needed
> for other algorithms like SVD or GMRES. As a nice side effect, it
> makes the implementation for QR more compact.
>
> Generally, in order to get *any* reasonable performance, one
> really needs to use matrix-matrix multiplications where possible
> in order to avoid the memory bandwidth bottleneck.
>
> Best regards,
> Karli
>
>
>
> On 05/16/2013 01:06 AM, oswin krause wrote:
>
> Hi,
>
> These are further good points!
>
> I also came up with a few new ones(and tips!):
>
> - QR needs pivoting. It's main usages are SVD and
> pseudo-inverses. In
> both cases the input does not necessary have full rank. Also
> pivoting
> helps for matrices with high condition numbers.
>
> - For the same reasons H was not formed explicitly, Q should
> not be
> formed. Instead there should be a version of the algorithm
> which does
> only return the reflection vectors forming Q.
>
> - For dense matrices at least, it is possible to do the QR
> in-place by
> storing the R part as lower triangular and the householder
> transformation vectors in the upper triangular. (That is very
> similar to
> how LU is implemented).
>
> - The best sources for algorithmic information are the LAPACK
> working notes.
> http://www.netlib.org/lapack/lawns/
>
> In your case lawn114 sems to be the most relevant, even though it
> assumes a fast BLAS3.
>
> Greetings,
> Oswin
>
>
> On 16.05.2013 05:32, Nasos Iliopoulos wrote:
>
> That's not a bad start.
>
> I think Oswin covered a whole lot of items here, but a
> more complete
> algorithm needs to satisfy some or all of the following:
>
> - The algortihm should have a dispatch mechanism so that
> optimized
> versions for various matrix types can be provided.
> (sparse, banded,
> etc.). You don't necessarily need to provide them all to
> start with.
> - The algorithm should operate on matrix expressions
> rather than
> matrices (so it can be applied to for example subranges).
> Static
> dispatch or overload if for some reason this seems to
> reduce performance.
> - Const correctness is important. Try using const
> reference on
> immutable types.
> - Instead of 0.00025 provide a value based on a user
> choice.If it is
> hard coded by the user, the compiler will probably
> convert it into a
> const value.
> - Don't use ints for indexing, use either std::size_t, or
> container::size_type. If you need a signed type (i.e. to
> count for
> differences on unsigned types) use ptrdiff_t. uBlas
> containers provide
> a difference_type typedef for that purpose (i.e.
> matrix<double>::difference_type).
> - use noalias(..) = in every assignment that the lhs is
> not a part of
> rhs, or when the algebraic operation is mapped 1-1. (i.e.
> A=2*A+B can
> be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot
> atm). This
> provides more benefits than just avoiding temporaries.
>
>
> The QR decomposition of a 100x100 matrix should take no
> more than a
> few miliseconds (or even less than a milisecond) to run.
> A 1000x1000 should take around 1/3 to 1/10 of a sec.
>
> Compile with:
> g++ -O3 -DNDEBUG Main.cpp -o qrtest
>
> Then you'll see that your code runs pretty fast, but it
> doesn't scale
> well as Oswin noted.
>
> Best regards,
> -Nasos
>
>
>
>
> On 05/15/2013 10:12 PM, Salman Javaid wrote:
>
> Thank you, Oswin for the detailed response. I am going
> to update the
> code.
>
> David, Nasos, any advise on coding conventions? Or
> anything else that
> you can possible suggest? I will stand grateful.
>
>
>
>
>
> Best Regards,
> Salman Javaid
>
>
> On Tue, May 14, 2013 at 10:53 PM, oswin krause
> <oswin.krause_at_[hidden]
> <mailto:oswin.krause_at_[hidden]>
> <mailto:oswin.krause_at_[hidden]
> <mailto:oswin.krause_at_[hidden]>>> wrote:
>
> Hi,
>
> in the order I stumbled over the things:
>
> main.cpp
> line 44-54: you don't need a copy, instead you
> should use a
> combination of row/subrange.
> line 58-60: you should take a look at inner_prod
> line 63: 0.00025 is too big.
> line 66: You should never create H explicitly.
> line 67: because you formed H, this step is O(n^3)
> which makes
> the whole algorithm O(n^4). This can be done in O(n^2)
> line 73-79: same applies here.
>
> Greetings,
> Oswin
>
> On 14.05.2013 22:12, Salman Javaid wrote:
>
> Hello uBLAS Contributors:
>
> I have
> applied to GSoC 2013
> and pitched implementation of SVD
> factorization for uBLAS. In
> order to better prepare myself and to get my
> hands dirty at
> uBLAS, I ended up implementing QR
> Factorization employing
> Householder Reflections using uBLAS. This is
> only the first
> draft and will be needing significant
> improvement, e.g.,
> computation of QR decomposition of 100 * 100
> matrix takes around
> 30 seconds. But I guess just to get familiar
> with code base, it
> was a good exercise. Over the next week or two
> I will be trying
> to optimize the code.
>
>
> I will be
> absolutely
> grateful if contributors can have a quick
> glance at the code,
> and point me to any improvements they can
> suggest. Particularly
> in speeding up matrix multiplication.
>
> I used Visual Studio 2010 to compile the code.
> I will try to get
> the code running on my Ubuntu machine in a
> couple of days hopefully.
>
> Here the header file:
> https://github.com/salmanjavaid/QR_Decomposition/blob/master/QR_Header.hpp
>
>
> The main file:
> https://github.com/salmanjavaid/QR_Decomposition/blob/master/Main.cpp
>
>
> Best Regards,
> Salman Javaid
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> <mailto:ublas_at_[hidden]>
> <mailto:ublas_at_[hidden]
> <mailto:ublas_at_[hidden]>>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to:Oswin.Krause_at_[hidden]
> <mailto:to%3AOswin.Krause_at_[hidden]>
> <mailto:Oswin.Krause_at_[hidden]
> <mailto:Oswin.Krause_at_[hidden]>>
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> <mailto:ublas_at_[hidden]
> <mailto:ublas_at_[hidden]>>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Javaid.Salman_at_[hidden]
> <mailto:Javaid.Salman_at_[hidden]>
> <mailto:Javaid.Salman_at_[hidden]
> <mailto:Javaid.Salman_at_[hidden]>>
>
>
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to:athanasios.iliopoulos.ctr.gr_at_[hidden]
> <mailto:to%3Aathanasios.iliopoulos.ctr.gr_at_[hidden]>
>
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to:Oswin.Krause_at_[hidden]
> <mailto:to%3AOswin.Krause_at_[hidden]>
>
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: rupp_at_[hidden] <mailto:rupp_at_[hidden]>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: Javaid.Salman_at_[hidden] <mailto:Javaid.Salman_at_[hidden]>
>
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: athanasios.iliopoulos.ctr.gr_at_[hidden]