Subject: [ublas] [newbie] submatrix and rest of a matrix
From: David Bellot (david.bellot_at_[hidden])
Date: 2009-11-17 12:55:48


Hi Ublas users,

this is a newbie question:

I am implementing a cross-validation algorithm, and for that purpose I need
to split a big matrix X1 and a big vector Y1 many times in different ways.
The idea is to use a percentage of the initial dataset (X1 and Y1) to fit a
model and the rest of the dataset to test it.

Let's say my fitting procedure is

double fit(const matrix<double>& X, const vector<double>& y)

and initially I have my big dataset defined as
matrix<double> X1;
vector<double> Y1;

During each iteration of the cross-validation loop, I split X1 and Y1 in the
following manner to obtain the training sets Xtraining and Ytraining and the
test sets Xtest and Ytest:

|-----------|-------|-----------|
| Xtraining | Xtest | Xtraining |
|           |       |           |
|           |       |           |
|-----------|-------|-----------|

|-----------|-------|-----------|
| Ytraining | Ytest | Ytraining |
|-----------|-------|-----------|

Of course, Xtest and Ytest are at a different position at each step of the
loop.
Xtest and Ytest are easy to obtain with a matrix_range<matrix<double> >.
However, Xtraining and Ytraining are not contiguous, so they require copying
the data to a temporary matrix (and vector).

And this is my problem! The dataset is too big and making a copy costs too
much. I cannot afford to have two copies of the dataset in memory (and to
copy them all the time).

So how can I do that efficiently (indirect_array? something else?), and do I
need to redefine the prototype of my fit function?

Best Regards,
David

-- 
David Bellot, PhD
david.bellot_at_[hidden]
http://david.bellot.free.fr