Here is a simple patch for discussion that implements:

matrix<double> A;

vector<double> y;

cout << 2.0 * y << endl;

cout << y * 2 << endl;

cout << A * 2 << endl;

cout << 2 * A << endl;

cout << A * y << endl;

cout << y * A << endl;

cout << A * A << endl;

The patch can be applied in the ublas directory. All it does is:

- For the scalar * vector and scalar * matrix operator overloads, it uses boost::enable_if to remove anything from those overloads that doesn't fit boost::is_arithmetic. This is to prevent a clash with the new operator *

- Then, new * operators are added for matrix*vector, vector*matrix, and matrix*matrix, which forward directly to the appropriate prod() dispatcher.

- Is this as simple as I think it is? Is there any chance of overhead here?
- What about y * y for two vectors? I think we should hold off on this because of the inner_prod vs. outer_prod confusion. Perhaps with the extended numeric_traits created for the bindings this could be used to emulate some kind of matlab approach with explicitly stated orientation (row vs. column vectors), but it may be more confusing than not. I personally hate the conformance bugs that come out of * meaning both inner and outer product in matlab.

- What about b *= A; and A *= B? Am I right to think this kind of aliased multiplication should be using axpy_prod(b, A, b, false); and axpy_prod(A, B, A, false); ? Should we just call this in the overload? This operation comes up all over the place in the statistical work that i use it for.

- Operator * will do a (relatively) inefficient nested matrix multiplication using a heap allocated, row-major matrix<T> as the temporary object. What type to use for T? Use some kind of type promotion on the value_type inside of the two argument types....
- The docs tell people what it does, and how to have finer control with the prod(A, B, temp); approach.
- In line 4834, etc. of matrix_expression it tests for the complexity at compile time. Instead, in the operator* overload, we could detect the complexity at compile time and tag-dispatch to a function that would create the appropriate temporary object based on the rules I mentioned above. It would then call the prod() dispatcher. We probably would have to do this for the vector*matrix operator overloads as well.
- Would this work? Any overhead or gotchas? I don't think I am capable of implementing it myself, but can help experts out with testing, etc?