Hi uBLAS users,

This is a newbie question:

I am implementing a cross-validation algorithm, and for that purpose I need to split a big matrix X1 and a big vector Y1 many times in different ways. The idea is to use a percentage of the initial dataset (X1 and Y1) to fit a model and the rest of the dataset to test it.

Let's say my fitting procedure is

double fit(const matrix<double>& X, const vector<double>& y);

and initially I have my big dataset defined as

matrix<double> X1;
vector<double> Y1;

During each iteration of the cross-validation loop, I split X1 and Y1 as follows to obtain a training set (Xtraining, Ytraining) and a test set (Xtest, Ytest):

|------------------------------------|
|    Xtraining   | Xtest | Xtraining |
|                |       |           |
|                |       |           |
|------------------------------------|

|------------------------------------|
| Ytraining      | Ytest | Ytraining |
|------------------------------------|

Of course, Xtest and Ytest are at a different position at each step of the loop.
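
To make the split concrete, here is a minimal sketch of the index bookkeeping only, using nothing but the standard library. The names Fold, test_fold and training_indices are made up for this illustration, and I assume k contiguous folds over n samples. No data is touched or copied here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the half-open sample range [begin, end)
// that is held out as the test block (Xtest / Ytest).
struct Fold {
    std::size_t begin;
    std::size_t end;
};

// Split n samples into k contiguous folds and return fold number i (0-based).
// The first n % k folds get one extra sample, so fold sizes differ by at most 1.
inline Fold test_fold(std::size_t n, std::size_t k, std::size_t i) {
    assert(k > 0 && k <= n && i < k);
    const std::size_t base = n / k;
    const std::size_t extra = n % k;
    Fold f;
    f.begin = i * base + (i < extra ? i : extra);
    f.end = f.begin + base + (i < extra ? 1 : 0);
    return f;
}

// Indices of every sample outside the test block: the two training pieces
// on either side of it, concatenated in their original order.
inline std::vector<std::size_t> training_indices(std::size_t n, const Fold& f) {
    assert(f.begin <= f.end && f.end <= n);
    std::vector<std::size_t> idx;
    idx.reserve(n - (f.end - f.begin));
    for (std::size_t j = 0; j < f.begin; ++j) idx.push_back(j);
    for (std::size_t j = f.end; j < n; ++j) idx.push_back(j);
    return idx;
}
```

For example, with n = 10 and k = 3, fold 1 is the block [4, 7) and the training indices are 0, 1, 2, 3, 7, 8, 9.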
Xtest and Ytest are easy to obtain with a matrix_range<matrix<double> > and a vector_range<vector<double> >.
However, Xtraining and Ytraining each consist of two separate blocks, so they require a copy of the data to a temporary matrix (and vector).

And this is my problem! The dataset is too big and making a copy costs too much. I cannot afford to keep two copies of the dataset in memory (and to copy them all the time).

So how can I do that efficiently (indirect_array? something else?), and do I need to change the prototype of my fit function?

Best Regards,
David


--
David Bellot, PhD
david.bellot@gmail.com
http://david.bellot.free.fr