
This sounds like a job for something like Spirit (http://www.boost.org/libs/spirit/) rather than tokenizer. If you try to implement this with tokenizer, you'll likely end up duplicating work that Spirit already does for you.

Pablo

"Dennis Jones" <djones@oregon.com> wrote in message news:d7nr4q$mt3$1@sea.gmane.org...
Hi,
I'm using the tokenizer class to allow users of my program to concatenate fields of data into a resultant string, where each field can be a quoted string literal, or some pre-defined entity that gets substituted by the program at some point later. The + symbol is treated like a concatenation operator. For example, a user might enter a string like this (including the quotes):
"hello," + " world"
In this case, my program would concatenate the two string literals ("hello," and " world") together so that the result is "hello, world" (note that these quotes are not actually part of the result string). My basic tokenizer usage is below:
// FieldSpec is the incoming string as entered by the
// user, including quotes to denote string literals
std::string str = FieldSpec.c_str();

typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> fieldSeparator("+", "", boost::keep_empty_tokens);
tokenizer fieldTokens(str, fieldSeparator);
for ( tokenizer::iterator tok_iter = fieldTokens.begin();
      tok_iter != fieldTokens.end(); ++tok_iter )
{
    // do something with the token
    // (could be a string literal or a pre-defined entity)
}
The problem I have is that the user might wish to include plus signs in his string literals, as in this example:
"1" + " + " + "2 = 3"
Here, the user has entered a " + " which should indicate a literal plus sign as opposed to a concatenation operator. The obvious desired result would be:
"1 + 2 = 3" (minus the quotes)
My current usage of tokenizer does not handle this at all, as it has no regard for _where_ the '+' symbols are located in the user's string; that is, it doesn't care if they are within quotes or not.
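To make the failure concrete, here is a minimal sketch of what splitting on every '+' does to the second example. It uses plain standard C++ as a stand-in for the char_separator setup above, so it is only an approximation of that behaviour; naive_split and check_naive_split are hypothetical names, not Boost APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): split on every '+',
// with no regard for quotes, keeping empty pieces.
std::vector<std::string> naive_split(const std::string& s) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find('+', start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));   // last piece
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;                      // skip the '+'
    }
    return out;
}

// The second example comes back as four pieces instead of three,
// because the '+' inside " + " is treated as a separator too.
void check_naive_split() {
    const std::vector<std::string> p =
        naive_split("\"1\" + \" + \" + \"2 = 3\"");
    assert(p.size() == 4);
    assert(p[0] == "\"1\" ");
    assert(p[1] == " \" ");
    assert(p[2] == " \" ");
    assert(p[3] == " \"2 = 3\"");
}
```

The two middle pieces each hold half of the " + " literal, which is why the quotes can no longer be matched up afterwards.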
I would like my tokenizer usage to be smart enough to know the difference between _real_ token separators and those that might exist as string literals within quotes. Can I use the tokenizer class to do this, or do I need to use some other method to tokenize my strings?
I see something about the concept of a TokenizerFunction in the documentation, but I don't really have any idea how to implement one, or if it would even be helpful in this situation. I'm rather new to the boost libraries and template usage in general, so all help and suggestions are welcome.
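For illustration, a quote-aware TokenizerFunction might look roughly like the sketch below. It assumes the shape the tokenizer documentation describes for that concept: a reset() member plus an operator() taking (InputIterator& next, InputIterator end, Token& tok). QuotedPlusSplitter, trim, split_fields and check_split_fields are hypothetical names. The functor is driven by hand with plain standard C++ here; plugging it into boost::tokenizer<QuotedPlusSplitter> is the intent but is not tested in this sketch, and escaped quotes inside literals are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical functor shaped like a TokenizerFunction:
// a '+' only separates tokens when it is outside double quotes.
struct QuotedPlusSplitter {
    void reset() {}  // no state is carried between parses

    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        if (next == end) return false;        // input exhausted
        tok = Token();
        bool in_quotes = false;
        for (; next != end; ++next) {
            const char c = *next;
            if (c == '"') {
                in_quotes = !in_quotes;       // entering or leaving a literal
            } else if (c == '+' && !in_quotes) {
                ++next;                       // consume the separator
                break;
            }
            tok += c;                         // quotes are kept in the token
        }
        return true;
    }
};

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Drive the functor by hand, the way a tokenizer iterator would.
std::vector<std::string> split_fields(const std::string& spec) {
    QuotedPlusSplitter splitter;
    splitter.reset();
    std::vector<std::string> out;
    std::string tok;
    std::string::const_iterator it = spec.begin();
    while (splitter(it, spec.end(), tok)) out.push_back(trim(tok));
    return out;
}

// Self-check against the example from the post.
void check_split_fields() {
    const std::vector<std::string> t =
        split_fields("\"1\" + \" + \" + \"2 = 3\"");
    assert(t.size() == 3);
    assert(t[1] == "\" + \"");
}
```

Each token keeps its surrounding quotes, so the caller can still tell a string literal from a pre-defined entity before stripping the quotes and concatenating. Unlike the keep_empty_tokens setup, this sketch does not emit an empty token after a trailing '+'.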
Thanks,
- Dennis