That's not a bad start.

I think Oswin covered a whole lot of items here, but a more complete algorithm needs to satisfy some or all of  the following:

- The algortihm should have a dispatch mechanism so that optimized versions for various matrix types can be provided. (sparse, banded, etc.). You don't necessarily need to provide them all to start with.
- The algorithm should operate on matrix expressions rather than matrices (so it can be applied to for example subranges). Static dispatch or overload if for some reason this seems to reduce performance.
- Const correctness is important. Try  using const reference on immutable types.
- Instead of 0.00025 provide a value based on a user choice.If it is hard coded by the user, the compiler will  probably convert it into a const value.
- Don't use ints for indexing, use either std::size_t, or container::size_type. If you need a signed type (i.e. to count for differences on unsigned types) use ptrdiff_t. uBlas containers provide a difference_type typedef for that purpose (i.e. matrix<double>::difference_type).
- use noalias(..) =  in every assignment that the lhs is not a part of rhs, or when the algebraic operation is mapped 1-1. (i.e. A=2*A+B can be written as noalias(A)=2*A+B, but A=prod(B,A)+D cannot atm). This provides more benefits than just avoiding temporaries.

The QR decomposition of a 100x100 matrix should take no more than a few miliseconds (or even less than a milisecond) to run.
A 1000x1000 should take around 1/3 to 1/10 of a sec.

Compile with:
g++ -O3 -DNDEBUG Main.cpp -o qrtest

Then you'll see that your code runs pretty fast, but it doesn't scale well as Oswin noted.

Best regards,


On 05/15/2013 10:12 PM, Salman Javaid wrote:
Thank you, Oswin for the detailed response. I am going to update the code.

David, Nasos, any advise on coding conventions? Or anything else that you can possible suggest? I will stand grateful.

Best Regards,
Salman Javaid

On Tue, May 14, 2013 at 10:53 PM, oswin krause <> wrote:

in the order I stumbled over the things:

line 44-54: you don't need a copy, instead you should use a combination of row/subrange.
line 58-60: you should take a look at inner_prod
line 63: 0.00025 is too big.
line 66: You should never create H explicitly.
line 67: because you formed H, this step is O(n^3) which makes the whole algorithm O(n^4). This can be done in O(n^2)
line 73-79: same applies here.


On 14.05.2013 22:12, Salman Javaid wrote:
Hello uBLAS Contributors:

                                     I have applied to GSoC 2013 and pitched implementation of SVD factorization for uBLAS. In order to better prepare myself and to get my hands dirty at uBLAS, I ended up implementing QR Factorization employing Householder Reflections using uBLAS. This is only the first draft and will be needing significant improvement, e.g., computation of QR decomposition of 100 * 100 matrix takes around 30 seconds. But I guess just to get familiar with code base, it was a good exercise. Over the next week or two I will be trying to optimize the code.

                                     I will be absolutely grateful if contributors can have a quick glance at the code, and point me to any improvements they can suggest. Particularly in speeding up matrix multiplication.

I used Visual Studio 2010 to compile the code. I will try to get the code running on my Ubuntu machine in a couple of days hopefully.

Best Regards,
Salman Javaid

ublas mailing list
Sent to:

ublas mailing list
Sent to:

ublas mailing list
Sent to: