Subject: Re: [ublas] New uBLAS maintainer
From: Jesse Perla (jesseperla_at_[hidden])
Date: 2010-03-15 11:40:17
> David Bellot <david.bellot <at> gmail.com> writes:
Thanks David, and thanks to Gunter, Rutger, Thomas, and everyone else
who has helped me over the years.
I really think the emphasis should be on getting the simple matrix and
linear algebra semantics up to snuff first. I looked at matrix libraries
fairly recently and have tried to get others to use ublas, and new users
almost always start by looking at the notation for basic operations
(multiplication, assignment of small vectors, whether big matrices can be
returned by value, how to interface with data already in a C array).
> (1) as you can imagine, in machine learning, one often needs to "randomly"
> access to sub-matrices. A good framework is already in place for
> matrix_view, I would like to extend it so that to make it as versatile as
> it is in other libraries or even Matlab.
I use these features as well and would appreciate it, though I would put it
at a lower priority than other features.
> (2) after reading last week emails, I think we could provide basic
> implementations of a few standard algorithms like inversion, solvers,
> etc...
Yup, but only a few basics, with the caveat that these should not be
considered high performance. I really think the bindings are the way to get
the monkey off everyone's backs with this. I think the ones that people
commonly use are:
* cholesky
* LU
* determinant
* trace
* inversion (I know, I know, but sometimes it actually is the right way
to do it and it isn't our job to teach good linear algebra....)
* Basic direct solver, probably like the one on Gunter's site.
* Basic direct sparse solver. Having a baseline here is important since
MUMPS, etc. are less prevalent than LAPACK interfaces and tougher to build.
Hopefully all (or most) of these already have coding done.
> (3) bindings are a hot topic. Let's be pragmatic: it's not supposed to
> be part of uBLAS but having a standard interface would add a strong value
> to uBLAS. And, I am like others, I want to play with my brand new nVidia
> card.
To me, bindings are the answer to almost all algorithms. It is ublas's
job to get all of the other stuff right first and foremost. Among other
things, ublas may not be able to compete against newer libraries on
performance for organically coded algorithms.
All the GPU stuff would be nice, but I think it is a third-order concern.
Let's get the semantics of the matrices closer to the newer libraries like
Eigen, get the bindings working beautifully with LAPACK and a couple of
sparse routines, then worry about other gravy.
> (4) another hot topic which is a recurrent complain about uBLAS: the
> product of 2 matrices. Do we want prod(A,B) or A*B. Let's think about it
> because other libraries implemented A*B in a very efficient manner too.
Don't forget products of 3 matrices! Try to get new users on ublas and this
is almost a deal breaker. It was such a pain that I submitted a trivial
patch to try and get them off my back.
> (6) I will join Gunter in his effort to provide new documentation,
> covering more topics, with tutorial and advanced topics. uBLAS is a great
> library and a good documentation is of primary interest. That is one of
> the most important topic for me (yes, way more than prod(A,B) versus
> A*B)
I don't agree that basic semantics are less important here. And if they
change at all, then the tutorials would need to be rewritten. I think that
most people on this list have been using prod() for so long, and have been
looking at the lower level code forever, that they have forgotten what a
pain ublas is and how poorly its semantics compare to matlab or other C++
libraries. When those things are fixed up, then the docs could be changed
to support them.
A couple of the other user-centric features that are missing (mostly in
order):
0) Matrix products A*B*C
When I can implement things like the following I will shut up.
http://en.wikipedia.org/wiki/Extended_Kalman_filter#Predict_and_update_equations
1) Construction of fixed sized matrices/vectors with something like:
ublas::vector<int> x = 1, 2, 3;
ublas::c_vector<int, 3> x = 1, 2, 3;
or at the very least:
ublas::c_vector<int, 3> x = {1, 2, 3}; // (using old style assignment of structs)
... I thought this might work and be similar to boost::array, but couldn't
get it to work myself. I think it has something to do with inheritance.
Everyone else seems to do this, and it makes ublas look pretty bad when
writing test code, where this comes up all the time. But shouldn't we be
writing a huge chunk of our test code regardless?
2) Stacking of matrices/vectors. Matlab guys, and especially those doing
econometrics/statistics/etc. do this all the time and don't even think about
it. I think an efficient implementation is related to submatrix access...
matlab:
y = [A;B];
potential ublas:
auto y = stack_vertical(A, B);
This comes up all of the time in the type of code I write.
3) Efficient operations for small, fixed sized vectors (is this supposed to
be c_vector and c_matrix?). I haven't done any direct comparisons to other
libraries on performance, but since they all say that they are heavily
optimized for this, I think a comparison and a few features as necessary
are important if ublas is going to be a general purpose matrix library.
Small vectors come up ALL the time and if I use TVMet or something like
that, then I can't use linear algebra against my other ublas matrices.
4) Clear interface to existing C arrays. I think that there is some sort of
patch or undocumented feature here from Gunter, but it needs to be well
thought through and formalized. In interfacing with external libraries, you
end up doing this all of the time, and copying between vector types is not
acceptable. This also can't just be a read-only interface, though turning
off resizing could be acceptable.
5) kron. I believe there is a patch out there that should be integrated
and verified. Matlab guys use this all of the time.
6) row/column reductions. Some of these might be there, but I have trouble
figuring out what to do. Think matlab sum(A), etc.
7) Efficient return by value (can be contingent on rvalue refs in your
compiler). A lot of work on patches here exists and I am sure it is close.
8) Adaptors for multi_array
This comes up frequently where people want to take a 1 or 2D slice of a
multi_array and perform operations on it. I have had to copy data to ublas
structures myself, and there was another post recently:
http://archives.free.net.ph/message/20100115.085831.26a2457c.en.html
9) Data interchange to matlab.
One of the first things I do when writing my programs is to generate data
for interpretation and graphing in matlab. I hate to write generic,
non-linear code in matlab but it is great for analysis and graphing. I
think most people would be in a similar boat. It also looks like
K.M.A Chai graciously submitted code that does things like this:
http://archives.free.net.ph/message/20090818.161723.0daaae36.el.html
Here is an older email where I made a plea about some of this. It looks like
I have been fairly consistent with what is here.
http://archives.free.net.ph/message/20090819.015803.23707d5f.el.html
Sorry for the blast of features, but I think a decision needs to be made:
Is the goal of ublas to be competitive on features with newer libraries
that are semantically rich, and to target recovering matlab users (the
biggest and most important group if you ask me)? Or is it to make patches
for the existing users and delay the point where ublas becomes
non-competitive? Both are defensible strategies. If you want to do the
former, then I think the list I have given is a pretty comprehensive one
and, along with docs, would make ublas competitive.
Thanks so much for listening and getting this great library to the point it
is already at.
-Jesse