Hi Ublas users,
this is a newbie question:
I am implementing a cross-validation algorithm, and for that purpose I need to split a big matrix X1 and a big vector Y1 many times in different ways. The idea is to use a percentage of the initial dataset (X1 and Y1) to fit a model and the rest of the dataset to test it.
Let's say my fitting procedure is
double fit(const matrix<double>& X, const vector<double>& y)
and initially I have my big dataset defined as
matrix<double> X1;
vector<double> Y1;
During the cross-validation loop, I split X1 and Y1 as follows to obtain a training set (Xtraining, Ytraining) and a test set (Xtest, Ytest):
|-----------------------------------|
| Xtraining |   Xtest   | Xtraining |
|           |           |           |
|           |           |           |
|-----------------------------------|
|-----------------------------------|
| Ytraining |   Ytest   | Ytraining |
|-----------------------------------|
Of course, Xtest and Ytest are at a different position at each step of the loop.
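To make the moving test block concrete, here is a minimal sketch in plain C++ of how its position could be computed at step k. It assumes contiguous blocks of nearly equal size; test_block and its parameters are names made up for this illustration and are not part of uBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, second) covered by the test block at step k,
// when n samples are split into `folds` contiguous blocks.
// `test_block` is a hypothetical helper, not a uBLAS function.
std::pair<std::size_t, std::size_t>
test_block(std::size_t n, std::size_t folds, std::size_t k)
{
    assert(folds > 0 && folds <= n && k < folds);
    // Integer division spreads the remainder over the blocks,
    // so block sizes differ by at most one sample.
    return std::make_pair(k * n / folds, (k + 1) * n / folds);
}
```

Everything outside the returned range belongs to the training set for that step.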
Xtest and Ytest are easy to obtain with a matrix_range<matrix<double> > (and a vector_range<vector<double> > for Ytest).
However, Xtraining and Ytraining are not contiguous, so they seem to require copying the data into a temporary matrix (and vector).
And this is my problem! The dataset is too big and making a copy costs too much. I cannot afford to keep two copies of the dataset in memory (and to copy them at every step).
So how can I do this efficiently (indirect_array? something else?), and do I need to change the prototype of my fit function?
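To illustrate what I have in mind with the indirect approach: the only extra storage would be the list of training indices (one integer per sample) instead of a second copy of the data. Below is a minimal sketch in plain C++ that builds such a list. training_indices is a hypothetical helper, not a uBLAS function, and whether a list like this can be used efficiently through indirect_array is exactly what I am unsure about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of the training samples: every index in [0, n) except those
// in the test block [test_begin, test_end).
// `training_indices` is a hypothetical helper, not a uBLAS function.
std::vector<std::size_t>
training_indices(std::size_t n, std::size_t test_begin, std::size_t test_end)
{
    assert(test_begin <= test_end && test_end <= n);
    std::vector<std::size_t> idx;
    idx.reserve(n - (test_end - test_begin));
    // Part of the training set before the test block.
    for (std::size_t i = 0; i < test_begin; ++i) idx.push_back(i);
    // Part of the training set after the test block.
    for (std::size_t i = test_end; i < n; ++i) idx.push_back(i);
    return idx;
}
```

For example, with 6 samples and a test block of [2, 4), the training indices are 0, 1, 4 and 5.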
Best Regards,
David
--
David Bellot, PhD
david.bellot@gmail.com
http://david.bellot.free.fr