
From: Robert Zeh (razeh_at_[hidden])
Date: 2003-12-03 11:43:35


I have completed implementations of char_separator and
offset_separator that are many times faster than the current
implementations. They are faster because they use the Token's assign
method rather than operator +=.

When supplied with input iterators, the faster implementations fall
back to operator +=, and there is no speed gain.

I've tested them with blocks of "x"'s separated by "|"'s. For
example, the last test tokenizes the string "xxxxxxxxxx|" with a
char_separator that splits on "|".

Under Red Hat 9, gcc 3.3 and "-O3" I see the following for
char_separator:

blocks    block size    speedup
10        100           6.2
10        1000          3.8
1         1000          3.4
1         10            2

For offset_separator the speedup is more dramatic:

blocks    block size    speedup
10        100           22
2         1000          24

The offset_separator itself doesn't have to do much work, so the
relative penalty for building up tokens a character at a time is much
higher.

A fuller description and some timing code are available at:

http://home.earthlink.net/~rzeh/Fast_boost__tokenizer_tokenizer_function.html

Robert Zeh
razeh_at_[hidden]


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk