
Hi,

I'm using the tokenizer class to allow users of my program to concatenate fields of data into a resultant string, where each field can be a quoted string literal or some pre-defined entity that gets substituted by the program at some point later. The + symbol is treated like a concatenation operator. For example, a user might enter a string like this (including the quotes):

    "hello," + " world"

In this case, my program would concatenate the two string literals ("hello," and " world") together so that the result is "hello, world" (note that these quotes are not actually part of the result string). My basic tokenizer usage is below:

    // FieldSpec is the incoming string as entered by the
    // user, including quotes to denote string literals
    std::string str = FieldSpec.c_str();

    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    boost::char_separator<char> fieldSeparator("+", "", boost::keep_empty_tokens);
    tokenizer fieldTokens(str, fieldSeparator);

    for (tokenizer::iterator tok_iter = fieldTokens.begin();
         tok_iter != fieldTokens.end(); ++tok_iter)
    {
        // do something with the token
        // (could be a string literal or a pre-defined entity)
    }

The problem I have is that the user might wish to include plus signs in his string literals, as in this example:

    "1" + " + " + "2 = 3"

Here, the user has entered a " + " which should indicate a literal plus sign as opposed to a concatenation operator. The obvious desired result would be:

    "1 + 2 = 3" (minus the quotes)

My current usage of tokenizer does not handle this at all, as it has no regard for _where_ the '+' symbols are located in the user's string; that is, it doesn't care whether they are within quotes or not.

I would like my tokenizer usage to be smart enough to know the difference between _real_ token separators and those that appear inside quoted string literals. Can I use the tokenizer class to do this, or do I need to use some other method to tokenize my strings?

I see something about the concept of a TokenizerFunction in the documentation, but I don't really have any idea how to implement one, or whether it would even be helpful in this situation. I'm rather new to the boost libraries and template usage in general, so all help and suggestions are welcome.

Thanks,
- Dennis
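For readers wondering what a TokenizerFunction might look like here, the following is a minimal sketch and not something posted in the thread. It assumes the shape described in the Boost.Tokenizer documentation: a `reset()` member plus an `operator()` that extracts one token and advances the iterator. The names `quoted_plus_separator` and `split_fields` are hypothetical. It also assumes that double quotes cannot be escaped inside a literal and that an unterminated quote runs to the end of the input. The functor is driven by hand so the example needs only the standard library; plugging it into `boost::tokenizer` as the TokenizerFunc argument is the intended use, but that is untested here, in particular on BCB.

```cpp
#include <string>
#include <vector>

// Hypothetical functor modelled on the TokenizerFunction concept.
// It splits on '+' only when the '+' is outside double quotes.
// Quotes are kept in the token so the caller can still tell a
// string literal from a pre-defined entity.
class quoted_plus_separator {
public:
    quoted_plus_separator() : done_(false) {}

    // Called before a new sequence is tokenized.
    void reset() { done_ = false; }

    // Extracts one token into tok and advances next past it.
    // Returns false once no tokens remain.
    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        tok = Token();
        if (done_) return false;
        bool in_quotes = false;
        for (; next != end; ++next) {
            const char c = *next;
            if (c == '"') {
                in_quotes = !in_quotes;
            } else if (c == '+' && !in_quotes) {
                ++next;          // consume the operator
                return true;     // token ends here
            }
            tok += c;
        }
        done_ = true;            // input exhausted: emit the last token
        return true;
    }

private:
    bool done_;
};

// Drives the functor by hand, without Boost. Unlike boost::tokenizer,
// this loop yields one empty token for an empty input string.
std::vector<std::string> split_fields(const std::string& spec) {
    std::vector<std::string> fields;
    quoted_plus_separator f;
    f.reset();
    std::string::const_iterator next = spec.begin();
    std::string tok;
    while (f(next, spec.end(), tok)) fields.push_back(tok);
    return fields;
}
```

Calling `split_fields` on the string `"1" + " + " + "2 = 3"` yields three tokens, the middle one still containing its quoted plus sign. Surrounding whitespace and the quotes themselves are left for the caller to strip.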

This sounds like a job for something like Spirit (http://www.boost.org/libs/spirit/), rather than tokenizer... When trying to implement this for tokenizer, you'll likely be duplicating stuff already done for you by Spirit.

Pablo

"Dennis Jones" <djones@oregon.com> wrote in message news:d7nr4q$mt3$1@sea.gmane.org...

"Pablo Aguilar" <pablo.aguilar@gmail.com> wrote in message news:d7nt7u$e67$1@sea.gmane.org...
This sounds like a job for something like Spirit (http://www.boost.org/libs/spirit/), rather than tokenizer... When trying to implement this for tokenizer, you'll likely be duplicating stuff already done for you by Spirit.
Maybe, but tokenizer seems to be fairly simple and straight-forward, and I don't think I need all the power of a full-blown parser like Spirit for a task as simple as this. Besides, Spirit is not available for my compiler (BCB), so tokenizer is the only choice I have!

- Dennis

"Dennis Jones" <djones@oregon.com> wrote in message news:d7nurq$4cb$1@sea.gmane.org...
"Pablo Aguilar" <pablo.aguilar@gmail.com> wrote in message news:d7nt7u$e67$1@sea.gmane.org...
This sounds like a job for something like Spirit (http://www.boost.org/libs/spirit/), rather than tokenizer... When trying to implement this for tokenizer, you'll likely be duplicating stuff already done for you by Spirit.
Maybe, but tokenizer seems to be fairly simple and straight-forward, and I don't think I need all the power of a full-blown parser like Spirit for a task as simple as this. Besides, Spirit is not available for my compiler (BCB), so tokenizer is the only choice I have!
- Dennis
While I haven't tried it myself, for the same reason (it is not available for VC6 or BCB), I do think it would be easier to use than trying to customize tokenizer. Also (and again, I haven't tried it), I believe Spirit 1.6.x (spirit.sourceforge.net) is available for older compilers; I know it's supposed to work with VC6, but I'm not sure about BCB.

HTH

Pablo

"Pablo Aguilar" <pablo.aguilar@gmail.com> wrote in message news:d7o1bi$d60$1@sea.gmane.org...
While I haven't tried it myself, for the same reason (not available on VC6 nor BCB), I do think it'd be easier to use than trying to customize tokenizer. Also, (and again, I haven't tried it) I believe Spirit 1.6.x (spirit.sourceforge.net) is available for older compilers; I know it's supposed to work for VC6, but I'm not sure about BCB.
While I can appreciate the usefulness of Spirit, I just don't see the need for it here, especially since it means bringing in an entire library just for a single operation in my application.

In the event that tokenizer is incapable of handling my requirements, I have come up with a reasonable work-around: I simply pre-process the string, substituting the '+' operators with some unprintable character (like '\x01', or any other character the user is highly unlikely to type), while leaving any '+' symbols within quotes alone. Then I use the tokenizer with '\x01' as the token separator, and voila!

- Dennis
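The work-around described above can be sketched roughly as follows. This is an illustration, not Dennis's actual code: the function names `mark_operators` and `split_on` are hypothetical, quotes are assumed to be plain double quotes with no escaping, and an unterminated quote is assumed to extend to the end of the string. The second function stands in for the tokenizer step so the example needs only the standard library; in the real program the marked string would presumably be handed to `boost::tokenizer` with `'\x01'` as the separator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replaces every '+' that lies outside double quotes with marker,
// leaving '+' characters inside quoted literals untouched.
std::string mark_operators(const std::string& spec, char marker = '\x01') {
    // The marker must differ from the characters this scan reacts to.
    assert(marker != '+' && marker != '"');
    std::string out(spec);
    bool in_quotes = false;
    for (std::string::size_type i = 0; i < out.size(); ++i) {
        if (out[i] == '"') {
            in_quotes = !in_quotes;
        } else if (out[i] == '+' && !in_quotes) {
            out[i] = marker;
        }
    }
    return out;
}

// Splits s on marker and keeps empty tokens, standing in for a
// tokenizer configured with the marker as its only separator.
std::vector<std::string> split_on(const std::string& s, char marker) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find(marker, start);
        if (pos == std::string::npos) {
            tokens.push_back(s.substr(start));   // last token
            break;
        }
        tokens.push_back(s.substr(start, pos - start));
        start = pos + 1;                         // skip the marker
    }
    return tokens;
}
```

For the input `"1" + " + " + "2 = 3"`, `split_on(mark_operators(input), '\x01')` produces three tokens, and the plus sign inside the middle literal survives. One caveat of this approach is that it silently misbehaves if the user's text ever contains the marker character, so it may be worth rejecting such input up front.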

Hi,

You may want to look at the find_iterator facility provided in the string_algo library. It is more customizable than tokenizer. Also, from 1.33 on (including the current CVS version), it supports the BCB compiler.

Best regards,
Pavol

On Thu, Jun 02, 2005 at 01:48:13PM -0700, Dennis Jones wrote:
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (3): Dennis Jones, Pablo Aguilar, Pavol Droba