Boost Users :

From: Tom Browder (tbrowder_at_[hidden])
Date: 2005-06-12 09:33:39

Next message: Martin: "Re: [Boost-users] Tokenizer Question"
Previous message: Tom Browder: "Re: [Boost-users] Tokenizer Question"
In reply to: Victor A. Wagner Jr.: "Re: [Boost-users] Tokenizer Question"
Next in thread: Robert Zeh: "Re: [Boost-users] Tokenizer Question"
Reply: Robert Zeh: "Re: [Boost-users] Tokenizer Question"

Victor, my humble tokenizer:

  string input_to_be_tokenized;
  istringstream ss(input_to_be_tokenized);
  string s;
  deque<string> tokens;
  while (ss >> s)
    tokens.push_back(s);

I made three test programs, each of this form:

  string inp;
  for (int i = 0; i < 10000000; ++i)
     inp += " a";
  // generate a container of tokens from inp with one of three methods:
     // my method (see above)
     // Victor's method
     // Boost, using a separator of (" \n\t")

  // if desired, loop over all the tokens using operator[] for the two
  // deques, and the iterator for Boost's container

Then I compiled them (gcc 3.4.2, Fedora Core 3, i386):

   g++ -pg -o progN progN.cc

Ran them without the final loop and saved 'gmon.out' under a unique name.

Ran them again with the final loop and saved 'gmon.out' under a unique name.

Ran gprof on all six gmon files and saved the outputs to unique files:

  gprof progN gmonX > X.prof

The accumulated times (sec) are surprising:

             Boost  Victor's      Mine
            ======  ========     =====
no loop       1.50     38.84     20.13
loop        131.91     38.89     23.91

Granted, I didn't do the tests multiple times, but it seems to me that the
Boost tokenizer is great if you don't need to iterate through it, but it is
the pits if you do.

I'll send you my code and results if interested.

-Tom

I generated a string with 10,000,000 tokens (" a"), i.e. " a a a ... a", and
timed your tokenizer against mine 10 times. Mine beat yours by 2 to 3 seconds
every time.

Then I used the Boost tokenizer and the timings went WAY down.

So I think the benefits of the Boost tokenizer are well worth it, even for
trivial tokenizing.

-Tom




_____

From: boost-users-bounces_at_[hidden]
[mailto:boost-users-bounces_at_[hidden]] On Behalf Of Victor A. Wagner
Jr.
Sent: Sunday, June 12, 2005 1:20 AM
To: boost-users_at_[hidden]
Subject: Re: [Boost-users] Tokenizer Question

At 19:19 2005-06-11, you wrote:

> for tokenizing on whitespace, simple stream input (>>) to a
> std::string suffices.

My own tokenizer does just that--and puts the tokens into a deque.

> IMO, it's hardly worth troubling yourself with a tokenizer
> for whitespace.

Well, not really. When parsing line-oriented output and semi-known
structured lines it's handy to be able to sometimes work with a line's
tokens as if they were in a vector or deque.

        string yourline;
        istringstream is(yourline);
        deque<string> yourvec((istream_iterator<std::string>(is)),
                              istream_iterator<std::string>());

voila, a deque

it would be interesting to profile that against the hypothetical indexable
tokenizer.

In fact, I was going to add a suggestion that the tokenizer also have the []
operator so that the individual tokens could be addressed as tok[1], etc.

-Tom


Victor A. Wagner Jr. http://rudbek.com <http://rudbek.com/>
The five most dangerous words in the English language:
"There oughta be a law"


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net