At 07:33 2005-06-12, Tom Browder wrote:
Victor, my humble tokenizer:

   string input_to_be_tokenized;
   istringstream ss(input_to_be_tokenized);
   string s;
   deque<string> tokens;
   while (ss >> s)
       tokens.push_back(s);
I made 3 test programs:

   string inp;
   for (int i = 0; i < 10000000; ++i)
       inp += " a";
   // generate a container of tokens from inp with one of three methods:
   //   - my method (see above)
   //   - Victor's method
   //   - Boost, using a separator of (" \n\t")
   // if desired, loop over all the tokens using operator[] for the two
   // deques, and the iterator for Boost's container
Then I compiled them (gcc 3.4.2, Fedora Core 3, i386):

   g++ -pg -o progN progN.cc

Ran them without the final loop and saved 'gmon.out' as a unique name.
Ran them again with the final loop and saved 'gmon.out' as a unique name.
Ran gprof over all six gmon files and saved the outputs to unique files:

   gprof progN gmonX > X.prof
The accumulated times (sec) are surprising:

           Boost   Victor's    Mine
           =====   ========   =====
no loop     1.50      38.84   20.13
loop      131.91      38.89   23.91
It's difficult to fathom a hand-written loop beating deque's built-in
iterator-range constructor.
Granted, I didn't run the tests multiple times, but it seems to me that
the Boost tokenizer is great if you don't need to iterate through its
container, but it is the pits if you do.
I'll send you my code and
results if interested.
-Tom
I generated a string with 10,000,000 tokens (" a"): " a a a ... a",
and timed your tokenizer against mine 10 times.
Mine beat yours by 2 to 3 seconds every time.
Then I used the Boost tokenizer and the timings went WAY down.
So I think the benefits of the Boost tokenizer are well worth it, even
for trivial tokenizing.

-Tom
- From: boost-users-bounces@lists.boost.org
-   [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Victor A. Wagner Jr.
- Sent: Sunday, June 12, 2005 1:20 AM
- To: boost-users@lists.boost.org
- Subject: Re: [Boost-users] Tokenizer Question
- At 19:19 2005-06-11, you wrote:
- > for tokenizing on whitespace, simple stream input (>>) to a
- > std::string suffices.
- My own tokenizer does just that--and puts the tokens into a deque.
- > IMO, it's hardly worth troubling yourself with a tokenizer
- > for whitespace.
- Well, not really. When parsing line-oriented output and semi-known
- structured lines, it's handy to be able to sometimes work with a
- line's tokens as if they were in a vector or deque.
-     string yourline;
-     istringstream is( yourline );
-     deque< string > yourvec( (istream_iterator< std::string >( is )),
-                              istream_iterator< std::string >() );
- voila, a deque
- It would be interesting to profile that against the hypothetical
- indexable tokenizer.
- In fact, I was going to add a suggestion that the tokenizer also have
- the [] operator so that the individual tokens could be addressed as
- tok[1], etc.
- -Tom
- _______________________________________________
- Boost-users mailing list
- Boost-users@lists.boost.org
- http://lists.boost.org/mailman/listinfo.cgi/boost-users
- Victor A. Wagner Jr.   http://rudbek.com
- The five most dangerous words in the English language:
- "There oughta be a law"