Boost :
From: Robert Zeh (razeh_at_[hidden])
Date: 2003-12-03 11:43:35
I have completed implementations of char_separator and
offset_separator that are many times faster than the current
implementations. They are faster because they use the Token's assign
method rather than operator +=.
When supplied with input iterators, the faster implementations fall
back to operator +=, and there is no speed gain.
I've tested them with blocks of "x"'s separated by "|"'s. For
example, the last test tokenizes the string "xxxxxxxxxx|" with a
char_separator that splits on "|".
Under Red Hat 9, gcc 3.3 and "-O3" I see the following for
char_separator:
blocks  block size  speedup
    10         100      6.2
    10        1000      3.8
     1        1000      3.4
     1          10      2
For offset separator the speedup is more dramatic:
blocks  block size  speedup
    10         100       22
     2        1000       24
The offset_separator doesn't have to do much work per token, so the
relative penalty for building up tokens a character at a time is much
higher.
A more detailed description and some timing code are available at:
http://home.earthlink.net/~rzeh/Fast_boost__tokenizer_tokenizer_function.html
Robert Zeh
razeh_at_[hidden]
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk