Hi,
These are further good points!
I also came up with a few new ones(and tips!):
- QR needs pivoting. It's main usages are SVD and pseudo-inverses.
In both cases the input does not necessary have full rank. Also
pivoting helps for matrices with high condition numbers.
- For the same reasons H was not formed explicitly, Q should not
be formed. Instead there should be a version of the algorithm
which does only return the reflection vectors forming Q.
- For dense matrices at least, it is possible to do the QR
in-place by storing the R part as lower triangular and the
householder transformation vectors in the upper triangular. (That
is very similar to how LU is implemented).
- The best sources for algorithmic information are the LAPACK
working notes.
http://www.netlib.org/lapack/lawns/
In your case lawn114 sems to be the most relevant, even though it
assumes a fast BLAS3.
Greetings,
Oswin
On 16.05.2013 05:32, Nasos Iliopoulos wrote:
That's not a bad start.
I think Oswin covered a whole lot of items here, but a more
complete algorithm needs to satisfy some or all of the
following:
- The algortihm should have a dispatch mechanism so that
optimized versions for various matrix types can be provided.
(sparse, banded, etc.). You don't necessarily need to provide
them all to start with.
- The algorithm should operate on matrix expressions rather than
matrices (so it can be applied to for example subranges). Static
dispatch or overload if for some reason this seems to reduce
performance.
- Const correctness is important. Try using const reference on
immutable types.
- Instead of 0.00025 provide a value based on a user choice.If
it is hard coded by the user, the compiler will probably
convert it into a const value.
- Don't use ints for indexing, use either std::size_t, or
container::size_type. If you need a signed type (i.e. to count
for differences on unsigned types) use ptrdiff_t. uBlas
containers provide a difference_type typedef for that purpose
(i.e. matrix<double>::difference_type).
- use noalias(..) = in every assignment that the lhs is not a
part of rhs, or when the algebraic operation is mapped 1-1.
(i.e. A=2*A+B can be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot atm). This
provides more benefits than just avoiding temporaries.
The QR decomposition of a 100x100 matrix should take no more
than a few miliseconds (or even less than a milisecond) to run.
A 1000x1000 should take around 1/3 to 1/10 of a sec.
Compile with:
g++ -O3 -DNDEBUG Main.cpp -o qrtest
Then you'll see that your code runs pretty fast, but it doesn't
scale well as Oswin noted.
Best regards,
-Nasos
On 05/15/2013 10:12 PM, Salman Javaid wrote:
Thank you, Oswin for the detailed response. I am going to
update the code.
David, Nasos, any advise on coding conventions? Or anything
else that you can possible suggest? I will stand grateful.
_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: athanasios.iliopoulos.ctr.gr@nrl.navy.mil
_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: Oswin.Krause@ruhr-uni-bochum.de