Subject: Re: [Boost-users] Tokenizer design question
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-07-14 13:32:15


On Tue, Jul 14, 2009 at 11:48 AM, Polder, Matthew J
<matthew.j.polder_at_[hidden]> wrote:
> The Tokenizer library has a char_separator with the option to keep
> delimiters, drop delimiters, and keep or drop empty tokens. However, with
> escaped_list_separator, the only behavior is to keep empty tokens. While
> this is the obvious behavior for parsing csv and similar files, it would be
> nice to have the ability to also drop empty tokens when constructing an
> escaped_list_separator.
>
> I have a command line parser that either reads its arguments from the
> command line itself or a text file supplied on the command line. In the file
> I’m passing in formats for the Date Time library I/O routines, and the
> formats have spaces that I’m escaping so the format will be a single token,
> which Tokenizer does find. But I sometimes use multiple tabs to separate my
> fields so it will look pretty in a text editor, and escaped_list_separator
> is keeping these. The solution for now is to have a switch in my command
> line parser for which separator I want to use.
>

In the loop where you process tokens, you should be able to deal with
this by simply skipping the empty ones:

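mytok::iterator end = toker.end();
for (mytok::iterator iter = toker.begin(); iter != end; ++iter)
{
   // skip empty tokens; the for loop still advances the iterator,
   // so this cannot spin forever on an empty token
   if (iter->empty()) continue;

   //do normal token processing
}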

I guess it's doing this because it was originally designed to support
CSV files, which can contain empty fields. Since ,, in a CSV represents
an empty field, in your case <space><space> would represent an empty
field too. But since it's an empty field, the value of the dereferenced
iterator is the empty string, and there should be no other time where it
will ever evaluate to an empty string. If nothing else it's a
not-too-hackish workaround, but maybe a constructor argument
bool ignore_empty_fields with a default value of false would be nice too.

