From: George A. Heintzelman (georgeh_at_[hidden])
Date: 2001-10-15 14:35:42


I'm trying to use the tokenizer library to improve some code here, and
running into what are primarily questions with the documentation of

As a general comment, the examples in the documentation are a little
bit on the simple side. While it's nice to see examples of the easy
cases made easy, one should also see examples that exercise more of
the library's power.

In more specific questions: The documentation for the
_delimiters_separator tokenizer functions doesn't tell you what happens
when the tokenizer encounters repeated separators in the string. From
experimentation, it appears that they are ignored, but this needs to be
made clear. Also, a means of having a tokenizer_function which returns
an empty string for the location between repeated separators would be a
nice feature to add.
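To make the two behaviors concrete, here is a minimal sketch in plain
standard C++. It does not use the tokenizer library at all; the split
function and its keep_empty flag are hypothetical names invented for this
illustration. With keep_empty false, runs of separators collapse (what
experimentation suggests happens today); with keep_empty true, an empty
token is produced between adjacent separators (the requested feature).

```cpp
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Boost: splits `text` on any character
// found in `seps`.
//   keep_empty == false: empty tokens are dropped, so repeated separators
//                        are effectively ignored.
//   keep_empty == true:  every gap between separators yields a token,
//                        including empty ones.
std::vector<std::string> split(const std::string& text,
                               const std::string& seps,
                               bool keep_empty) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find_first_of(seps, start);
        const std::string::size_type len =
            (end == std::string::npos) ? std::string::npos : end - start;
        const std::string token = text.substr(start, len);
        if (keep_empty || !token.empty()) {
            tokens.push_back(token);
        }
        if (end == std::string::npos) {
            break;  // no more separators: the last token has been handled
        }
        start = end + 1;  // continue just past the separator
    }
    return tokens;
}
```

For the input "a,,b" with "," as the separator, the first mode gives the
two tokens "a" and "b", and the second gives "a", "" and "b".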

Second, and more crucial -- from hard-earned experience, and looking at
the source, it appears that dereferencing a token_iterator returns a
reference to an internal cache. This is fine, except that it means that:

const string &ref = *tok_it++;

does not work -- the token iterator overwrites its internal data, and
ref will now contain whatever is in the second token. Since this
doesn't happen for most iterators in the STL, I think this behavior
needs to be loudly documented. I'd actually prefer if it went away from
the default policy implementation -- return the string by value if
necessary -- but that's a larger change than documenting the existing

George Heintzelman