[Boost-users] Re: tokenizer question

2 Jun 2005


This sounds like a job for something like Spirit
(http://www.boost.org/libs/spirit/) rather than tokenizer.
If you try to implement this with tokenizer, you'll likely end up
duplicating work that Spirit has already done for you.


Pablo

"Dennis Jones" <djones@oregon.com> wrote in message 
news:d7nr4q$mt3$1@sea.gmane.org...
...
Hi,
I'm using the tokenizer class to allow users of my program to concatenate
fields of data into a resultant string, where each field can be a quoted
string literal, or some pre-defined entity that gets substituted by the
program at some point later.  The + symbol is treated like a concatenation
operator.  For example, a user might enter a string like this (including
the quotes):
"hello," + " world"
In this case, my program would concatenate the two string literals
("hello," and " world") together so that the result is "hello, world"
(note that these quotes are not actually part of the result string).  My basic
tokenizer usage is below:
  // FieldSpec is the incoming string as entered by the
  // user, including quotes to denote string literals
  std::string str = FieldSpec.c_str();
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  boost::char_separator<char> fieldSeparator("+", "",
                                             boost::keep_empty_tokens);
  tokenizer fieldTokens(str, fieldSeparator);
  for ( tokenizer::iterator tok_iter = fieldTokens.begin();
        tok_iter != fieldTokens.end();
        ++tok_iter )
  {
      // do something with the token
      // (could be a string literal or a pre-defined entity)
  }
The problem I have is that the user might wish to include plus signs
in his string literals, as in this example:
"1"  +  " + "  +  "2 = 3"
Here, the user has entered a " + " which should indicate a literal plus
sign as opposed to a concatenation operator.  The obvious desired result
would be:
"1 + 2 = 3"  (minus the quotes)
My current usage of tokenizer does not handle this at all, as it has no
regard for _where_ the '+' symbols are located in the user's string; that
is, it doesn't care if they are within quotes or not.
I would like my tokenizer usage to be smart enough to know the difference
between _real_ token separators and those that might exist as string
literals within quotes.  Can I use the tokenizer class to do this, or do I
need to use some other method to tokenize my strings?
I see something about the concept of a TokenizerFunction in the
documentation, but I don't really have any idea how
to implement one, or if it would even be helpful in this situation.  I'm
rather new to the boost libraries and template usage in general, so all
help and suggestions are welcome.
Thanks,
- Dennis
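
For reference, the question above asks whether a TokenizerFunction could
tell a separating '+' from one inside quotes.  Below is a minimal sketch of
that idea, not a tested Boost solution.  The names
quote_aware_plus_separator and concatenate_fields are hypothetical.  The
functor only mirrors the shape the Boost.Tokenizer documentation describes
for a TokenizerFunction (a reset() member plus an operator() taking the
current position, the end, and the output token); it is driven by hand here
so that the example depends only on the standard library.  It assumes quotes
are always balanced and that there is no escape syntax for a quote inside a
literal.

```cpp
#include <cctype>
#include <string>

// Hypothetical functor (not part of Boost): yields one field per call,
// treating '+' as a separator only when it appears outside double quotes.
// Quotes are kept in the token so the caller can tell a literal from a
// pre-defined entity; whitespace outside quotes is dropped.
struct quote_aware_plus_separator {
    void reset() {}  // no state is carried between calls

    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        tok = Token();
        if (next == end) return false;  // no more input, no more tokens
        bool in_quotes = false;
        for (; next != end; ++next) {
            const char c = *next;
            if (c == '"') {
                in_quotes = !in_quotes;          // entering or leaving a literal
            } else if (!in_quotes) {
                if (c == '+') { ++next; return true; }  // real separator
                if (std::isspace(static_cast<unsigned char>(c))) continue;
            }
            tok += c;
        }
        return true;  // last field, terminated by end of input
    }
};

// Hypothetical driver: concatenates the fields of a user-entered spec.
// Quoted literals have their quotes stripped; anything else is appended
// unchanged, standing in for the later entity substitution.
std::string concatenate_fields(const std::string& spec) {
    quote_aware_plus_separator sep;
    std::string::const_iterator it = spec.begin();
    std::string token, result;
    while (sep(it, spec.end(), token)) {
        const bool is_literal = token.size() >= 2 &&
                                token[0] == '"' &&
                                token[token.size() - 1] == '"';
        result += is_literal ? token.substr(1, token.size() - 2) : token;
    }
    return result;
}
```

With this sketch, concatenate_fields applied to the second example above
("1"  +  " + "  +  "2 = 3") produces 1 + 2 = 3.  Note that a trailing '+'
yields no empty final field, which differs from the keep_empty_tokens
behaviour in the original snippet.  Whether the functor can be handed
directly to boost::tokenizer as its TokenizerFunction argument is an
assumption that should be checked against the library documentation.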

Pablo Aguilar