Boost logo

Boost :

From: Jeremy Siek (jsiek_at_[hidden])
Date: 2001-06-03 11:24:14


I think Tokenizer should be accepted, however I must say I have some
reservations. The documents need a good bit of work and so does the
overall organization of the library.

The main problem is that the library is mixing up the presentation of the
public interface and the implementation details. Implementation details
should either not documented at all in the html, or instead placed in a
section clearly marking them as implementation details.

Likewise, in the source code things that are implementation details should
be placed in namespace detail. Header files should be named according
to the public classes that they provide, and not refer to the privates.
Here's how I'd like to see the organization:

Public functions and classes and their headers:

  token_iterator.hpp
    token_iterator_generator
    make_token_iterator
      (I did not list token_iterator here because that class should be
       deleted, see below)

  tokenizer.hpp
    tokenizer

  token_functions.hpp
    cvs_separator
      csv_escape_error (there's a typo in the name of this one)
    offset_separator
    punct_space_separator

Implementation details (namespace detail)

  token_iterator.hpp
    tokenizer_policy
    tokenizer_base (should be renamed token_iterator_base)

The main index.htm page needs to have a Table of Contents with links to
all of the public clases, and to TokenizerFuntion, which should go under a
"concepts" heading. Some of the things from "See Also" can move to the
TOC.

The introduction should talk more about what the tokenizer is for, and
about the public parts of the library, and should not talk about
implmentation details such as the use of iterator adaptor. Perhaps more
emphasis could be given to the different tokenizer functions, and how the
TokenizerFunction parameter allows for extensibility.

The example could use more explaining, perhaps breaking it up into three
parts with a more detailed discussion of each.

The tokenizer_policy.htm file has lots of problems. It looks like your
trying to document several different things, some of which are public and
some of which are not, all at the same time. The example documents using
token_iterator... but that's not what this page is about, is it? Also I'm
not sure which class the template parameters section goes with, and you've
got the token_iterator_generator on this page too, way down at the bottom.
But the token_iterator_generator is the only public thing on this page...
The token_iterator_generator is the most important part, it should have a
page dedicated to it, called token_iterator.html.

Comments on the code:

All of the typedefs in token_iterator_generator except "type" should be
private.

The token_iterator class should be deleted. You are attempting to create a
templated typedef using inheritance, which I'm afraid doesn't really work.
There are corner cases where bad things happen. Also, the
token_iterator_generator already serves the purpose.

Feature requests:

There are a couple additions I'd like to see to the library. We need a
tokenizer function that provides the same kind of functionality as
strtok(). I wrote up a version of this a while back. I'll dust that code
off, write docs, and then post it. Also I'd like to see a tokenizer
function that uses a regular expression (from regex) to do the spliting.

On the positive side. I must say I really like the idea of parameterizing
the token iterator with the tokenizing function. This design is really
nice.

Cheers,
Jeremy

----------------------------------------------------------------------
 Jeremy Siek www: http://www.lsc.nd.edu/~jsiek/
 Ph.D. Candidate, IU B'ton email: jsiek_at_[hidden]
 Summer Manager, AT&T Research phone: (973) 360-8185
----------------------------------------------------------------------


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk