From: jeff_at_[hidden]
Date: 2001-06-01 10:35:02

I have reviewed Tokenizer and believe it should be accepted into Boost. I
tested with g++ 2.95.2 under Cygwin. I have some minor comments. I apologize
if these overlap with other reviewers' comments; I have not had time recently
to follow the Boost list closely.

Code comments / questions:
1) I would have expected the Tokenizer class to take TokenizerFunc as the
first template parameter, with the input types (e.g. string and
string::const_iterator) second and third, allowing for defaults:

  template <class TokenizerFunc,
            class Token=std::string,
            class Iterator=std::string::const_iterator>
  class Tokenizer {...}

Then the third example could be simplified from:
  typedef tokenizer<string,string::const_iterator,punct_space_separator<char> > Tok;
to:
  typedef tokenizer<punct_space_separator<char> > Tok;

In my mind this is much clearer.

2) Is it possible to use a raw C string instead of std::string as the input? I
experimented a bit but could not get this to work. I have an application that
needs to tokenize data from a socket connection, which arrives as a C string,
and I would not want the overhead of an extra string construction and copy.

3) Dependencies
This is just a point of information; I don't expect the library to change. I
used an earlier version of tokenizer and it required only the tokenizer headers,
utility.hpp, and config.hpp. The new version requires over 20 boost headers,
including some from the detail and type_traits directories. This is apparently
the cost of converting to iterator_adaptors.
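
To make points 1 and 2 concrete, here is a minimal standalone sketch. It is not
the reviewed library's code: sketch_tokenizer, space_separator, split_string,
and split_c_string are hypothetical names, and the operator() signature is an
assumption made for illustration. It puts the function first with defaults for
the other two parameters, then instantiates the same class with const char*
iterators so that the input buffer is never copied into a std::string (each
extracted token still is).

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical TokenizerFunc: splits on spaces, skipping runs of them.
struct space_separator {
    template <class Iterator, class Token>
    bool operator()(Iterator& next, Iterator end, Token& tok) const {
        while (next != end && *next == ' ') ++next;   // skip leading spaces
        if (next == end) return false;                // no more tokens
        Iterator start = next;
        while (next != end && *next != ' ') ++next;   // consume one token
        tok.assign(start, next);
        return true;
    }
    void reset() {}                                   // stateless, nothing to do
};

// Function first, input types defaulted (the ordering suggested in point 1).
template <class TokenizerFunc,
          class Token = std::string,
          class Iterator = std::string::const_iterator>
class sketch_tokenizer {
public:
    sketch_tokenizer(Iterator first, Iterator last,
                     TokenizerFunc f = TokenizerFunc())
        : first_(first), last_(last), func_(f) {}

    // Collects all tokens; a real design would expose iterators instead.
    std::vector<Token> tokens() const {
        std::vector<Token> out;
        TokenizerFunc f = func_;   // work on a copy so this stays const
        f.reset();
        Iterator pos = first_;
        Token tok;
        while (f(pos, last_, tok)) out.push_back(tok);
        return out;
    }

private:
    Iterator first_, last_;
    TokenizerFunc func_;
};

// Point 1: for std::string input only the function has to be named.
inline std::vector<std::string> split_string(const std::string& s) {
    typedef sketch_tokenizer<space_separator> Tok;
    return Tok(s.begin(), s.end()).tokens();
}

// Point 2: raw C string input, walked in place through const char*.
inline std::vector<std::string> split_c_string(const char* buf) {
    assert(buf != 0);   // caller must pass a NUL-terminated buffer
    typedef sketch_tokenizer<space_separator, std::string, const char*> Tok;
    return Tok(buf, buf + std::strlen(buf)).tokens();
}
```

With this ordering, the C string case costs one extra template argument pair
rather than a different class, which is the kind of usage I was hoping for.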

Documentation Comments:

1) mainpage
   a) Add a sentence to the summary which describes and directly links to the
csv_separator, offset_separator, and punct_space_separator examples. As a user
of the library, these concrete example pages are the first thing I want to read,
and they are currently buried at the very end of the TokenizerFunction page.
   b) A link to the iterator adaptor page in the second sentence would be nice.
   c) Put the "convenience iterators" first in the examples.
   d) A short description of the typical steps involved in usage would help
explain the library. Something like:
   Typical steps in creating a custom tokenizer are to write a
TokenizerFunc(tor) which provides an operator() and a reset function.

2) The tokenizer policy documentation page
   a) The 3rd sentence "The punct_space_separator function object..." seems out
of place.
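
As an illustration of the "typical steps" text suggested in 1d above, the
following is a rough sketch of a custom TokenizerFunc with state, so that reset
has something to do. The names width_separator and split_fields are
hypothetical, and the operator() signature is an assumption rather than the
library's documented interface. The third step is only a small driver added so
the sketch can be exercised on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical TokenizerFunc: cuts the input into fields of given widths.
class width_separator {
public:
    explicit width_separator(const std::vector<std::size_t>& widths)
        : widths_(widths), current_(0) {
        assert(!widths_.empty());   // at least one field width is required
    }

    // Step 1: operator() extracts one token and advances `next` past it.
    // Returns false once the input or the list of widths is exhausted.
    template <class Iterator, class Token>
    bool operator()(Iterator& next, Iterator end, Token& tok) {
        if (next == end || current_ == widths_.size()) return false;
        Iterator start = next;
        for (std::size_t i = 0; i < widths_[current_] && next != end; ++i)
            ++next;
        tok.assign(start, next);
        ++current_;
        return true;
    }

    // Step 2: reset() restores the initial state before a new pass.
    void reset() { current_ = 0; }

private:
    std::vector<std::size_t> widths_;
    std::size_t current_;
};

// Step 3 (driver for this sketch only): run the function over a string.
inline std::vector<std::string> split_fields(const std::string& s,
                                             width_separator f) {
    f.reset();
    std::vector<std::string> out;
    std::string::const_iterator pos = s.begin();
    std::string tok;
    while (f(pos, s.end(), tok)) out.push_back(tok);
    return out;
}
```

A short walk-through of this kind on the main page would tell a new user
what they actually have to write before they read the concept page.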
