
Boost Users :

From: Chengyuan Ma (chengyuan_ma_at_[hidden])
Date: 2008-01-28 10:58:07


I have a text file in a sparse data format: each line is a label followed
by index:value pairs.

0.0 1:0.269474 2:0.145364 3:0.067149 4:0.112643 5:0.212212 6:0.244601
7:0.181663 8:0.238227 9:0.848362 10:0.266284 11:0.058374 12:0.071349
13:0.192308 14:0.20059 15:0.256923 16:0.385338 17:0.123268 18:0.119405
19:0.350768 20:0.187007 21:0.369464 22:0.056476 23:0.059463 24:0.07298
25:0.158566 26:0.192542 27:0.315876 28:0.503185 29:0.216059 30:0.122681
31:0.228612 32:0.116034 33:0.12488 34:0.171623 35:0.222429 36:0.278741
37:0.170732 38:0.404539 39:0.078273 40:0.201989 41:0.367349 42:0.310658
43:0.176915 44:0.215489 45:0.207045 46:0.267294 47:0.158534 48:0.114389
49:0.085446 50:0.141968 51:0.11669 52:0.804789 53:0.533344 54:0.112373
55:0.173574 56:0.495218 57:0.122419 58:0.091748 59:0.209178 60:0.100954
61:0.168572 62:0.130615 63:0.080905 64:0.552943 65:0.208904 66:0.072037
67:0.166432 68:0.539735 69:0.186302 70:0.161657 71:0.135055 72:0.131747
73:0.434487 74:0.235148 75:0.119409 76:0.137161 77:0.186354 78:0.182466
79:0.105231 80:0.049308 81:0.199764 82:0.275725 83:0.369274 84:0.222261
85:0.1464 86:0.396967 87:0.937 88:0.983 90:0.983 91:1.0 92:1.0

That is a single line of the file; I have 64000 lines like it.

What's the best way to load this data? I currently use Boost.Regex and
boost::lexical_cast, but it takes 2 minutes to read all the data. Is
there a better way to do this?

bool LibFile::ReadFile(const string &fileName)
{
 ifstream fin(fileName.c_str(), ios::in) ;
 boost::regex elabel("^([0-9]+\\.?[0-9]*)", boost::regbase::icase);
 boost::regex eitem("(\\d+):([-+]?[0-9]*\\.?[0-9]+)",
                    boost::regbase::icase);

 string buffer ;
 while (getline(fin, buffer))
 {

  if ( buffer.length() > 0)
  {
   Instance *pinstance = new Instance() ;
   pinstance->tag = "notag" ;
   pinstance->vector = new double[featureDim] ;
   for ( int ii = 0 ; ii < featureDim; ii ++ )
   {
    pinstance->vector[ii] = 0.0 ;
   }

   boost::smatch what;
   string::const_iterator itb = buffer.begin() ;
   string::const_iterator ite = buffer.end() ;

   double label = 0.0 ;
   if ( boost::regex_search( itb, ite, what, elabel) )
   {
    label = boost::lexical_cast<double>(what[1].str()) ;
    itb = what[0].second ;
   }

   while ( boost::regex_search( itb, ite, what, eitem) )
   {
    int index = boost::lexical_cast<int>(what[1].str()) ;
    double val = boost::lexical_cast<double>(what[2].str()) ;
    if ( index >= 1 && index <= featureDim)
    {
     pinstance->vector[index - 1] = val ;
    }
    itb = what[0].second ;
   }

   vecInstance.push_back(pinstance) ;
   instanceNum ++ ;
  }
 }
 fin.close() ;
 return true ;
}
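
For comparison, here is a minimal sketch of one alternative that avoids
regular expressions and lexical_cast altogether, scanning each line with
std::strtod and std::strtol from <cstdlib>. ParseSparseLine is a
hypothetical helper, not part of the code above, and it fills a
std::vector<double> instead of the raw array held by Instance. It assumes
the line layout shown earlier (a label, then whitespace-separated
index:value pairs) and the "C" locale for the decimal point. No timing has
been measured for it.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): parses one line of the
// form "label idx:val idx:val ..." into a dense vector of featureDim entries.
// Returns false if the line does not start with a parsable label.
bool ParseSparseLine(const std::string &line, int featureDim,
                     double &label, std::vector<double> &dense)
{
    dense.assign(featureDim, 0.0);      // entries not listed stay 0.0
    const char *p = line.c_str();
    char *end = 0;

    label = std::strtod(p, &end);       // leading label, e.g. "0.0"
    if (end == p) return false;         // nothing parsed: empty or bad line
    p = end;

    for (;;)
    {
        long index = std::strtol(p, &end, 10);  // skips leading whitespace
        if (end == p || *end != ':') break;     // no further index:value pair
        p = end + 1;                            // step over the ':'

        double val = std::strtod(p, &end);
        if (end == p) break;                    // malformed value, stop here
        p = end;

        // Indices in the file are 1-based; out-of-range ones are ignored.
        if (index >= 1 && index <= featureDim)
            dense[index - 1] = val;
    }
    return true;
}
```

Inside the getline loop this could be called as
ParseSparseLine(buffer, featureDim, label, values), with the resulting
values copied into pinstance->vector when it returns true.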

