
Boost Users :

From: Victor A. Wagner Jr. (vawjr_at_[hidden])
Date: 2005-06-12 16:00:04


At 07:33 2005-06-12, Tom Browder wrote:
>Victor, my humble tokenizer:
>
> string input_to_be_tokenized;
> istringstream ss( input_to_be_tokenized );
> string s;
> deque<string> tokens;
> while (ss >> s)
> tokens.push_back(s);
>
>I made 3 test programs:
> string inp;
> for (int i = 0; i < 10000000; ++i)
> inp += " a";
> // generate a container of tokens from inp with one of three methods:
> // my method (see above)
> // Victor's method
> // Boost, using a separator of (" \n\t")
>
> // if desired, loop over all the tokens using operator[] for the two
> deques, and the iterator for Boost's container
>
>Then I compiled them (gcc 3.4.2, Fedora Core 3, i386):
>
> g++ -pg -o progN progN.cc
>
>Ran them without the final loop and saved 'gmon.out' as a unique name.
>
>Ran them again with the final loop and saved 'gmon.out' as a unique name.
>
>Ran all six gmon's and saved the outputs to unique files:
>
> gprof progN gmonX > X.prof
>
>The accumulated times (sec) are surprising:
>
>            Boost    Victor's    Mine
>            =====    ========    ====
>no loop      1.50       38.84   20.13
>loop       131.91       38.89   23.91

it's difficult to fathom a hand-written loop beating deque's
iterator-range constructor

>
>Granted, I didn't do the tests multiple times, but it seems to me that the
>Boost tokenizer is great if you don't need to iterate through it, but it
>is the pits if you do.
>
>-Tom
>
>I'll send you my code and results if interested.
>
>-Tom
>
>I generated a string with 10,000,000 tokens (" a"): " a a a ....a", and
> timed your tokenizer against mine 10 times. Mine beat yours by 2 to 3
> seconds every time.
>
>Then I used the Boost tokenizer and the timings went WAY down.
>
>So I think the benefits of the Boost tokenizer are well worth it, even for
>trivial tokenizing.
>
>-Tom
>
>
>
>
>
>----------
>From: boost-users-bounces_at_[hidden]
>[mailto:boost-users-bounces_at_[hidden]] On Behalf Of Victor A. Wagner Jr.
>Sent: Sunday, June 12, 2005 1:20 AM
>To: boost-users_at_[hidden]
>Subject: Re: [Boost-users] Tokenizer Question
>
>At 19:19 2005-06-11, you wrote:
>> > for tokenizing on whitespace, simple stream input (>>) to a
>> > std::string suffices.
>>
>>My own tokenizer does just that--and puts the tokens into a deque.
>>
>> > IMO, it's hardly worth troubling yourself with a tokenizer
>> > for whitespace.
>>
>>Well, not really. When parsing line-oriented output and semi-known
>>structured lines it's handy to be able to sometimes work with a line's
>>tokens as if they were in a vector or deque.
>
> string yourline;
> istringstream is( yourline );
> deque<string> yourvec( (istream_iterator<std::string>( is )),
> istream_iterator<std::string>() );
>
>voila, a deque
>
>it would be interesting to profile that against the hypothetical indexable
>tokenizer.
>
>
>>In fact, I was going to add a suggestion that the tokenizer also have the []
>>operator so that the individual tokens could be addressed as tok[1], etc.
>>
>>-Tom
>>
>>_______________________________________________
>>Boost-users mailing list
>>Boost-users_at_[hidden]
>>http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
>Victor A. Wagner Jr. http://rudbek.com
>The five most dangerous words in the English language:
> "There oughta be a law"
>
>_______________________________________________
>Boost-users mailing list
>Boost-users_at_[hidden]
>http://lists.boost.org/mailman/listinfo.cgi/boost-users

Victor A. Wagner Jr. http://rudbek.com
The five most dangerous words in the English language:
               "There oughta be a law"



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net