tokenizer question

2 Jun 2005

Hi,

I'm using the tokenizer class to allow users of my program to concatenate
fields of data into a resultant string, where each field can be a quoted
string literal, or some pre-defined entity that gets substituted by the
program at some point later.  The + symbol is treated like a concatenation
operator.  For example, a user might enter a string like this (including the
quotes):

"hello," + " world"

In this case, my program would concatenate the two string literals
("hello," and " world") together so that the result is "hello, world" (note
that these quotes are not actually part of the result string).  My basic
tokenizer usage is below:

   #include <string>
   #include <boost/tokenizer.hpp>

   // FieldSpec is the incoming string as entered by the
   // user, including quotes to denote string literals
   std::string str = FieldSpec.c_str();

   typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
   boost::char_separator<char> fieldSeparator("+", "",
        boost::keep_empty_tokens);
   tokenizer fieldTokens(str, fieldSeparator);
   for ( tokenizer::iterator tok_iter = fieldTokens.begin();
         tok_iter != fieldTokens.end();
         ++tok_iter )
   {
        // do something with the token
        // (could be a string literal or a pre-defined entity)
   }

The problem I have is that the user might wish to include plus signs
in their string literals, as in this example:

"1"  +  " + "  +  "2 = 3"

Here, the user has entered a " + " which should indicate a literal plus sign
as opposed to a concatenation operator.  The obvious desired result would
be:

"1 + 2 = 3"  (minus the quotes)

My current usage of tokenizer does not handle this at all, as it has no
regard for _where_ the '+' symbols are located in the user's string; that
is, it doesn't care if they are within quotes or not.
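
To make the failure concrete, here is a minimal sketch of what splitting on every '+' does to the example above. It uses only the standard library as a stand-in for the char_separator setup shown earlier (it is not Boost itself), and the function name splitOnEveryPlus is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a quote-unaware split: every '+' ends a token,
// and empty tokens are kept (mirroring keep_empty_tokens).
std::vector<std::string> splitOnEveryPlus(const std::string& input)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = input.find('+', start);
        if (pos == std::string::npos) {
            // No more separators: the rest of the input is the last token.
            tokens.push_back(input.substr(start));
            break;
        }
        tokens.push_back(input.substr(start, pos - start));
        start = pos + 1;  // continue after the '+'
    }
    return tokens;
}
```

Given the input "1"  +  " + "  +  "2 = 3", this yields four fragments instead of three fields, because the quoted " + " literal is cut in half at its plus sign.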

I would like my tokenizer usage to be smart enough to know the difference
between _real_ token separators and those that might exist as string
literals within quotes.  Can I use the tokenizer class to do this, or do I
need to use some other method to tokenize my strings?

I see something about the concept of a TokenizerFunction in the
documentation, but I don't really have any idea how
to implement one, or if it would even be helpful in this situation.  I'm
rather new to the boost libraries and template usage in general, so all
help and suggestions are welcome.
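
For what it's worth, one possible shape for such a TokenizerFunction is sketched below. As I understand the documented concept, it needs a reset() member and a function-call operator taking (next, end, tok) that returns false when no tokens remain. The class name QuotedPlusSeparator and its rules are assumptions for illustration only: '+' separates fields only outside double quotes, the quote characters are kept in the token so a literal can still be told apart from a pre-defined entity, and spaces or tabs outside quotes are dropped. It does not handle escaped quotes inside a literal.

```cpp
#include <string>

// Hypothetical TokenizerFunction sketch: treats '+' as a separator only
// when it appears outside double quotes.
class QuotedPlusSeparator
{
public:
    QuotedPlusSeparator() : finished_(false) {}

    // Clears state so the function can be reused for a new pass.
    void reset() { finished_ = false; }

    // Extracts one token from [next, end) and advances next past it.
    // Returns false once the input has been fully consumed.
    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok)
    {
        tok = Token();
        if (finished_)
            return false;

        bool inQuotes = false;
        for (; next != end; ++next) {
            const char c = *next;
            if (c == '"') {
                inQuotes = !inQuotes;   // entering or leaving a literal
                tok += c;               // keep the quote in the token
            } else if (inQuotes) {
                tok += c;               // literal text, including any '+'
            } else if (c == '+') {
                ++next;                 // consume the separator
                return true;
            } else if (c != ' ' && c != '\t') {
                tok += c;               // part of an unquoted entity name
            }
        }
        finished_ = true;               // last token of the input
        return true;
    }

private:
    bool finished_;
};
```

If this does model the concept correctly, it should be usable as boost::tokenizer<QuotedPlusSeparator> fieldTokens(str); in place of the char_separator version, though I have not verified that wiring here. For the second example above, calling it directly produces the tokens "1", " + " and "2 = 3", each still wrapped in its quotes.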

Thanks,

- Dennis

Dennis Jones
