On Wed, Nov 18, 2009 at 20:33, Gunter Winkler <guwi17@gmx.de> wrote:

Hello David,

please read below:

David Bellot wrote:

> Hi Ublas users,
>
> this is a newbie question:
>
> I am implementing a cross validation algorithm and for that purpose I
> need to split up a big matrix X1 and a big vector Y1 many times in
> different ways. The idea is to use a percentage of the initial dataset
> (X1 and Y1) to fit a model and the rest of this dataset to test it.
>
> Let's say my fitting procedure is
>
> double fit(const matrix<double>& X, const vector<double>& y)
>
> and initially I have my big dataset defined as
> matrix<double> X1;
> vector<double> Y1;
>
> during the loop of the cross-validation algo, I split up X1 and Y1 in
> the following manner to obtain a Xtraining and Ytraining dataset and
> Xtest and Ytest dataset:
>
> |-----------------------------------|
> | Xtraining |   Xtest   | Xtraining |
> |           |           |           |
> |           |           |           |
> |-----------------------------------|
>
> |-----------------------------------|
> | Ytraining |   Ytest   | Ytraining |
> |-----------------------------------|

This looks like you split the matrix into 3 sets of columns and the
vector into 3 sets of corresponding elements.

>
> Of course, Xtest and Ytest are at a different position at each step of
> the loop.
> Xtest and Ytest are easy to obtain with a matrix_range<matrix<double> >
> However Xtraining and Ytraining require a copy of the data to a
> temporary matrix (and vector).

Why do you need a copy? An indirect_array is able to select an arbitrary
set of columns and rows from a matrix (through matrix_indirect).

>
> So how can I do that efficiently (indirect_array ? other ?) and do I
> need to redefine the prototype of my fit function ?
>

Best regards,
Gunter
