From: Douglas Gregor (gregod_at_[hidden])
Date: 2001-06-05 15:26:13


On Tuesday 05 June 2001 03:20 pm, you wrote:
> --- In boost_at_y..., Douglas Gregor <gregod_at_c...> wrote:
> > I still have a few comments on Tokenizer to throw in:
> >
> >
> > - The include guards contain "JRB015801." Is this intended to be updated as
> > the tokenizer version is updated? I'm curious because it's not a practice
> > I've seen before.
>
> In iterator_adapters.hpp there is the #define
> BOOST_ITERATOR_ADAPTOR_DWA053000_HPP_
>
> Which I guess is the filename followed by the author's initials and
> the creation date.
>
> > The name "csv_separator" should probably be changed to something more obvious
> > ("csv" is not a common acronym. Perhaps "comma_separator"?)
>
> I have had trouble thinking of a good name as well. Since the "comma"
> does not have to be a comma, comma_separator might be misleading. A
> name that reflects what it really is would be
> escaped_quoted_delimited_field_separator which is too unwieldy. Any
> ideas as to a better name would be appreciated.

Perhaps just "list_separator", implying a list of values separated by
something (default: comma). Escaping is reasonably common, so I don't see
that it must be part of the name (though perhaps "escaped_list_separator"
would work, and isn't terribly long).

> > punct_space_separator::operator() has a comment with some incorrect grammar:
> > "skip past the punctuation only if the told to do so"
> >
> > The whitespace_and_punct class shouldn't be in the "detail" namespace if the
> > user is expected to specialize it for other character types.
> > Also, the names "punctuation1" and "punctuation2" don't convey much meaning,
> > though I'm at a loss to come up with better names. If whitespace_and_punct is
> > to be an interface element, I believe that it should only have "punctuation"
> > and "whitespace" functions. Perhaps the contents of "punctuation1" and
> > "punctuation2" should be separated into constants instead?
>
> I would really like to get rid of that template, while still allowing
> nice default behaviors. If you have any ideas, let me know.

Gary's comment got me thinking a bit about locales. While I don't believe I
would suggest them as the default, they should probably be supported. Perhaps
the best way to organize punct_space_separator is to remove the whitespace_
and punctuation_ members and instead parameterize it by a policy class that
contains "is_punct" and "is_space" members.

This would allow the current behavior (the policy class contains the
whitespace_ and punctuation_ strings), but also allow locales to work, using
a different policy class.
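
As a rough sketch of what I mean (every name here is hypothetical and not part of the submitted Tokenizer; it is written in present-day C++ for brevity), the separator logic only ever asks the policy two questions, so a string-based policy and a locale-based policy become interchangeable. A free function stands in for punct_space_separator to keep the example short:

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Hypothetical policy holding the delimiter sets as strings,
// which mirrors the current whitespace_/punctuation_ behavior.
struct string_policy {
    std::string punctuation = "+-";
    std::string whitespace = " \t\n";

    bool is_punct(char c) const { return punctuation.find(c) != std::string::npos; }
    bool is_space(char c) const { return whitespace.find(c) != std::string::npos; }
};

// Hypothetical policy that defers both questions to a std::locale.
struct locale_policy {
    std::locale loc;  // default-constructed: a copy of the global locale

    bool is_punct(char c) const { return std::ispunct(c, loc); }
    bool is_space(char c) const { return std::isspace(c, loc); }
};

// Illustrative stand-in for the separator: "punctuation" delimiters are
// returned as one-character tokens, "whitespace" delimiters are dropped.
template <class Policy>
std::vector<std::string> split_tokens(const std::string& input,
                                      const Policy& policy = Policy()) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : input) {
        const bool punct = policy.is_punct(c);
        if (punct || policy.is_space(c)) {
            // Any delimiter ends the token being accumulated.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            // Only punctuation delimiters are themselves returned.
            if (punct) tokens.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Usage: the expression from this thread splits into 1 + 2 + 3 - 4.
inline void example() {
    const std::vector<std::string> tokens =
        split_tokens<string_policy>("1 + 2 + 3-4");
    assert(tokens.size() == 7);
    assert(tokens[5] == "-");
}
```

Swapping string_policy for locale_policy changes which characters count as delimiters without touching the splitting code, which is the point of the parameterization.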

> > Perhaps the name "whitespace" in punct_space_separator is too restrictive.
> > strtok() calls these characters "delimiters," which seems to be more general.
>
>
> The idea was that there are some delimiters that you want returned
> (punctuation) and some you don't want returned (whitespace). For
> example, if you were breaking an expression such as
> 1 + 2 + 3-4 into a sequence of tokens, while both whitespace and +
> and - would be delimiters, you would only want + - returned and not
> the whitespace. Since, most of the time, the stuff you want returned
> is punctuation, the delimiters that can be returned are called that.
> Since whitespace is almost always ignored, the delimiters that never
> get returned are called that.
>
>
>
>
> Thanks for all your time and assistance,
>
> John R. Bandela

I see.

        Doug


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk