Boost Users :

From: Pavol Droba (droba_at_[hidden])
Date: 2007-12-13 15:14:00


chun ping wang wrote:
> Hi, I was wondering which approach is better and faster for splitting a file
> of comma-separated numbers and putting them into a container of doubles.
> 1.) Which option is better?
> // method 1.
> std::vector<std::string> split_string;
> boost::algorithm::trim(flist);
> boost::algorithm::split(split_string, flist,
> boost::algorithm::is_any_of(","));
> std::vector<double> elements;
> BOOST_FOREACH(std::string s, split_string)
> {
> elements += boost::lexical_cast<double>(s);
> }
>
> // method 2.
> boost::char_separator<char> sep(",");
> boost::tokenizer<boost::char_separator<char> >
> tokens(flist, sep);
> std::vector<double> elements;
> BOOST_FOREACH(std::string token, tokens)
> {
> elements += boost::lexical_cast<double>(token);
> }
>
> 2.) When is it better to use the string algorithm split instead of tokenizer,
> and vice versa?
>

Hi,

I haven't made any speed comparison between split and tokenizer, but
there are ways to make the split algorithm significantly faster.

Most speed problems result from unwanted copying of strings. This is
quite a costly operation and it should be avoided at all costs if speed
is important.

First, there is an obvious problem in your code. In BOOST_FOREACH, the
string loop variable is not declared as a reference. This means that every
string is copied in the loop. Declaring it as const std::string& avoids that.
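To make the cost of the missing reference visible, here is a minimal sketch using a range-based for loop, which has the same by-value versus by-reference semantics for the loop variable. It does not use Boost; CountedString and the helper functions are hypothetical names invented for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not from the original post): wraps a string and
// counts how many times it is copy-constructed.
struct CountedString {
    static std::size_t copies;
    std::string value;

    explicit CountedString(std::string v) : value(std::move(v)) {}
    CountedString(const CountedString& other) : value(other.value) { ++copies; }
    CountedString& operator=(const CountedString&) = default;
};
std::size_t CountedString::copies = 0;

// Builds three tokens in place; reserve() prevents reallocation copies.
std::vector<CountedString> make_tokens() {
    std::vector<CountedString> tokens;
    tokens.reserve(3);
    tokens.emplace_back("1.5");
    tokens.emplace_back("2.25");
    tokens.emplace_back("-3");
    return tokens;
}

// Loop variable declared by value: one copy per element.
std::size_t copies_when_iterating_by_value(const std::vector<CountedString>& tokens) {
    CountedString::copies = 0;
    std::size_t total = 0;
    for (CountedString s : tokens) {
        total += s.value.size();
    }
    (void)total;
    return CountedString::copies;
}

// Loop variable declared as a const reference: no copies.
std::size_t copies_when_iterating_by_reference(const std::vector<CountedString>& tokens) {
    CountedString::copies = 0;
    std::size_t total = 0;
    for (const CountedString& s : tokens) {
        total += s.value.size();
    }
    (void)total;
    return CountedString::copies;
}
```

With three tokens, the by-value loop makes three copies and the by-reference loop makes none. For std::string each of those copies may also involve a heap allocation.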

You can improve the actual usage of the split algorithm as well.
Quite a significant speedup can be achieved if you use
std::vector<boost::iterator_range<std::string::iterator> > to hold
the results instead of a vector of strings.
This way the split algorithm will only store references to tokens in the
original string, avoiding any copying until it is really needed.
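The idea of storing references to tokens rather than copies can be sketched with the standard library alone. This is an illustration of the principle, not Boost's implementation: TokenRange is a hypothetical stand-in for boost::iterator_range, and split_ranges is an invented helper that always keeps empty tokens.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for boost::iterator_range<std::string::const_iterator>:
// a pair of iterators pointing into the original string.
struct TokenRange {
    std::string::const_iterator first;
    std::string::const_iterator last;

    // Copies the characters only when the caller asks for a string.
    std::string str() const { return std::string(first, last); }
};

// Splits text on a single delimiter character without copying any characters.
// The returned ranges are valid only while text is alive and unmodified, so
// do not pass a temporary string.
std::vector<TokenRange> split_ranges(const std::string& text, char delimiter) {
    std::vector<TokenRange> tokens;
    std::string::const_iterator start = text.begin();
    while (true) {
        std::string::const_iterator stop = std::find(start, text.end(), delimiter);
        tokens.push_back(TokenRange{start, stop});
        if (stop == text.end()) {
            break;
        }
        start = stop + 1;  // skip the delimiter
    }
    return tokens;
}
```

The trade-off is lifetime: the ranges point into the input, so the input string must outlive them.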

Going one step further, you can avoid using an intermediate vector at all.
You can use split_iterator directly.

// Assumes <boost/algorithm/string.hpp>, <boost/foreach.hpp>, and
// using-declarations for the boost and std names used below.
split_iterator<string::iterator>
        siter=make_split_iterator(
                flist,
                token_finder(is_any_of(","), token_compress_off));
BOOST_FOREACH(
        iterator_range<string::iterator> rngToken,
        make_iterator_range(siter, split_iterator<string::iterator>()))
{
        // Do whatever you want with token here.
        // It is represented by an iterator_range so no copying
        // has been done yet.

        // You can make a copy if necessary
        string strToken = copy_range<string>(rngToken);
}
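For the original goal of filling a container of doubles, the same "no intermediate vector" idea can be sketched without Boost. This swaps in std::strtod for lexical_cast and a plain character search for split_iterator, so treat it as an analogue under those assumptions; parse_csv_doubles is a hypothetical helper, and it assumes the default "C" locale and rejects empty fields.

```cpp
#include <algorithm>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Parses a line of comma-separated numbers directly into doubles, converting
// each token in place instead of first collecting token strings.
std::vector<double> parse_csv_doubles(const std::string& line) {
    std::vector<double> values;
    const char* cursor = line.c_str();
    const char* const line_end = cursor + line.size();
    while (true) {
        // Locate the end of the current token without copying it.
        const char* token_end = std::find(cursor, line_end, ',');

        // strtod stops at the first character that cannot be part of a
        // number, which for a well-formed token is the comma or the
        // terminating null character.
        char* parsed_end = nullptr;
        const double value = std::strtod(cursor, &parsed_end);
        if (parsed_end == cursor || parsed_end != token_end) {
            throw std::invalid_argument("field is not a number");
        }
        values.push_back(value);

        if (token_end == line_end) {
            break;
        }
        cursor = token_end + 1;  // skip the comma
    }
    return values;
}
```

Note that strtod skips leading whitespace but this sketch rejects trailing whitespace inside a field, so trimming rules would need to be decided for real input.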
        
Best Regards,
Pavol.

