Subject: Re: [ublas] Matrix decompositions.
From: oswin krause (oswin.krause_at_[hidden])
Date: 2013-05-16 02:06:39
These are further good points!
I also came up with a few new ones(and tips!):
- QR needs pivoting. It's main usages are SVD and pseudo-inverses. In
both cases the input does not necessary have full rank. Also pivoting
helps for matrices with high condition numbers.
- For the same reasons H was not formed explicitly, Q should not be
formed. Instead there should be a version of the algorithm which does
only return the reflection vectors forming Q.
- For dense matrices at least, it is possible to do the QR in-place by
storing the R part as lower triangular and the householder
transformation vectors in the upper triangular. (That is very similar to
how LU is implemented).
- The best sources for algorithmic information are the LAPACK working notes.
In your case lawn114 sems to be the most relevant, even though it
assumes a fast BLAS3.
On 16.05.2013 05:32, Nasos Iliopoulos wrote:
> That's not a bad start.
> I think Oswin covered a whole lot of items here, but a more complete
> algorithm needs to satisfy some or all of the following:
> - The algortihm should have a dispatch mechanism so that optimized
> versions for various matrix types can be provided. (sparse, banded,
> etc.). You don't necessarily need to provide them all to start with.
> - The algorithm should operate on matrix expressions rather than
> matrices (so it can be applied to for example subranges). Static
> dispatch or overload if for some reason this seems to reduce performance.
> - Const correctness is important. Try using const reference on
> immutable types.
> - Instead of 0.00025 provide a value based on a user choice.If it is
> hard coded by the user, the compiler will probably convert it into a
> const value.
> - Don't use ints for indexing, use either std::size_t, or
> container::size_type. If you need a signed type (i.e. to count for
> differences on unsigned types) use ptrdiff_t. uBlas containers provide
> a difference_type typedef for that purpose (i.e.
> - use noalias(..) = in every assignment that the lhs is not a part of
> rhs, or when the algebraic operation is mapped 1-1. (i.e. A=2*A+B can
> be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot atm). This
> provides more benefits than just avoiding temporaries.
> The QR decomposition of a 100x100 matrix should take no more than a
> few miliseconds (or even less than a milisecond) to run.
> A 1000x1000 should take around 1/3 to 1/10 of a sec.
> Compile with:
> g++ -O3 -DNDEBUG Main.cpp -o qrtest
> Then you'll see that your code runs pretty fast, but it doesn't scale
> well as Oswin noted.
> Best regards,
> On 05/15/2013 10:12 PM, Salman Javaid wrote:
>> Thank you, Oswin for the detailed response. I am going to update the
>> David, Nasos, any advise on coding conventions? Or anything else that
>> you can possible suggest? I will stand grateful.
>> Best Regards,
>> Salman Javaid
>> On Tue, May 14, 2013 at 10:53 PM, oswin krause
>> <mailto:oswin.krause_at_[hidden]>> wrote:
>> in the order I stumbled over the things:
>> line 44-54: you don't need a copy, instead you should use a
>> combination of row/subrange.
>> line 58-60: you should take a look at inner_prod
>> line 63: 0.00025 is too big.
>> line 66: You should never create H explicitly.
>> line 67: because you formed H, this step is O(n^3) which makes
>> the whole algorithm O(n^4). This can be done in O(n^2)
>> line 73-79: same applies here.
>> On 14.05.2013 22:12, Salman Javaid wrote:
>>> Hello uBLAS Contributors:
>>> I have applied to GSoC 2013
>>> and pitched implementation of SVD factorization for uBLAS. In
>>> order to better prepare myself and to get my hands dirty at
>>> uBLAS, I ended up implementing QR Factorization employing
>>> Householder Reflections using uBLAS. This is only the first
>>> draft and will be needing significant improvement, e.g.,
>>> computation of QR decomposition of 100 * 100 matrix takes around
>>> 30 seconds. But I guess just to get familiar with code base, it
>>> was a good exercise. Over the next week or two I will be trying
>>> to optimize the code.
>>> I will be absolutely
>>> grateful if contributors can have a quick glance at the code,
>>> and point me to any improvements they can suggest. Particularly
>>> in speeding up matrix multiplication.
>>> I used Visual Studio 2010 to compile the code. I will try to get
>>> the code running on my Ubuntu machine in a couple of days hopefully.
>>> Here the header file:
>>> The main file:
>>> Best Regards,
>>> Salman Javaid
>>> ublas mailing list
>>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>>> Sent to:Oswin.Krause_at_[hidden] <mailto:Oswin.Krause_at_[hidden]>
>> ublas mailing list
>> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
>> Sent to: Javaid.Salman_at_[hidden] <mailto:Javaid.Salman_at_[hidden]>
>> ublas mailing list
>> Sent to:athanasios.iliopoulos.ctr.gr_at_[hidden]
> ublas mailing list
> Sent to: Oswin.Krause_at_[hidden]