From: Robert Zeh (razeh_at_[hidden])
Date: 2003-12-03 11:43:35

I have completed implementations of char_separator and
offset_separator that are many times faster than the current
implementations. They are faster because they use the Token's assign
method rather than operator +=.

When supplied with input_iterators the faster implementations fall
back to operator +=, and there is no speed gain.

I've tested them with blocks of "x"'s separated by "|"'s. For
example, the last test tokenizes the string "xxxxxxxxxx|" with a
char_separator that splits on "|".
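
For reference, test input of this shape could be generated with a helper along these lines. The name make_test_input is hypothetical, and the sketch assumes every block, including the last, is followed by a "|", as in the example string above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds `blocks` runs of `block_size` 'x'
// characters, each run followed by a single '|' separator.
std::string make_test_input(std::size_t blocks, std::size_t block_size) {
    std::string input;
    input.reserve(blocks * (block_size + 1));
    for (std::size_t i = 0; i < blocks; ++i) {
        input.append(block_size, 'x');  // one block of x's
        input += '|';                   // trailing separator
    }
    return input;
}
```

With these assumptions, make_test_input(1, 10) yields the "xxxxxxxxxx|" string used in the last test.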

Under Red Hat 9, gcc 3.3 and "-O3" I see the following speedups for
char_separator:

blocks  block size  speedup
    10         100      6.2
    10        1000      3.8
     1        1000      3.4
     1          10      2

For offset_separator the speedup is more dramatic:

blocks  block size  speedup
    10         100     22
     2        1000     24

The offset_separator does very little work per token, so the cost of
building up tokens a character at a time makes up a much larger share
of its running time, and removing it gives a bigger speedup.

A more detailed description and some timing code are available at:

Robert Zeh
