
Hi!

Library version 1.33.0.

The input string contains null characters. I want to use the null character as the separator. The following code produces only "X". Why are "Y" and "Z" discarded, and how do I fix this code?

=========
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main(int argc, char* argv[])
{
    using namespace std;
    using namespace boost;
    string str="X\0Y\0Z";
    typedef tokenizer<boost::char_separator<char> > Tok;
    char_separator<char> sep("\0");
    Tok tokens(str, sep);
    for(Tok::iterator tok_iter = tokens.begin(); tok_iter != tokens.end(); ++tok_iter)
        cout << *tok_iter;
    return EXIT_SUCCESS;
}
=========

Thank you in advance!
CN

> The input string contains null characters. I want to use the null character as the separator. The following code produces only "X". Why are "Y" and "Z" discarded, and how do I fix this code?
> string str="X\0Y\0Z";

Take a closer look at that line. The tokenizer library isn't the problem.

- me22

On Sunday 16 October 2005 03.58, me22 wrote:
>> The input string contains null characters. I want to use the null character as the separator. The following code produces only "X". Why are "Y" and "Z" discarded, and how do I fix this code?
>> string str="X\0Y\0Z";
> Take a closer look at that line. The tokenizer library isn't the problem.

Yes and no. Fixing the above to

  str = string("X\0Y\0Z", 5);

will only partly solve the problem, since the char_separator ctor takes a 'const Char*':

  char_separator<char> sep("\0");

Passing "\0" is treated as an empty string when the ctor initializes the private member m_dropped_delims. It is not clear to me how best to work around this. It is kind of a feature of the interface, plus the fact that C strings are terminated by '\0' ;)

One solution to support passing '\0' as a separator could be to add another ctor to char_separator that accepts a 'const string &' for the dropped and kept delimiters; something like

  class char_separator {
  public:
    typedef std::basic_string<Char,Traits> string_type;
    //...
    // Proposed ctor: delimiters are passed as strings, so '\0' survives.
    explicit char_separator(const string_type & dropped_delims,
                            const string_type & kept_delims = string_type(),
                            empty_token_policy empty_tokens = drop_empty_tokens)
      : m_dropped_delims(dropped_delims), m_kept_delims(kept_delims),
        m_use_ispunct(false), m_use_isspace(false),
        m_empty_tokens(empty_tokens), m_output_done(false)
    { }
    //...
  };

This seems to work when char_separator is used as

  char_separator<char> sep(string(1,'\0'));
  Tok tokens(str, sep);

--
Regards,
Fredrik Hedman

On second thought, there seems to be a way without changing the interface of the library: use an escaped_list_separator instead. A variation of your code would then become

  #include <boost/tokenizer.hpp>
  #include <iostream>
  #include <string>
  using namespace std;
  using namespace boost;

  int main()
  {
    string str=string("X\0Y\0\0Z", 6);
    typedef tokenizer<boost::escaped_list_separator<char> > Tknz;
    typedef Tknz::iterator Iter;
    // No escape characters, '\0' as the only separator, no quote characters.
    escaped_list_separator<char> sep(string(), string(1,'\0'), string());
    Tknz tok(str, sep);
    for(Iter t = tok.begin(); t != tok.end(); ++t)
      cout << '<' << *t << '>';
    cout << endl;
  }

This will give:

  <X><Y><><Z>

There does not seem to be a hook for getting rid of empty tokens with escaped_list_separator.

--
Regards,
Fredrik Hedman

Hello,

it seems to me that boost::char_separator needs to have another ctor that can accept delimiters that are string types. For example, given a std::string("X\0Y\0\0Z", 6), it does not seem to be possible to use the current ctor of boost::char_separator so that '\0' can be used as a separator.

It is possible to use boost::escaped_list_separator, since *it* takes a string type on construction, but on the other hand boost::escaped_list_separator does not have an empty_token_policy.

In summary, I suggest adding another ctor to boost::char_separator. This will enable the parsing of the above string as <X><Y><><Z> or <X><Y><Z> by the following:

  #include <boost/tokenizer.hpp>
  #include <iostream>
  #include <string>
  using namespace std;
  using namespace boost;

  int main()
  {
    typedef tokenizer<boost::char_separator<char> > Tok;
    string str=string("X\0Y\0\0Z", 6);
    {
      // Uses the proposed string-type ctor; prints <X><Y><><Z>
      char_separator<char> keep_empty(string(1,'\0'), string(), boost::keep_empty_tokens);
      Tok tokens(str, keep_empty);
      for(Tok::iterator iter = tokens.begin(); iter != tokens.end(); ++iter)
        cout << '<' << *iter << '>';
      cout << endl;
    }
    {
      // Uses the proposed string-type ctor; prints <X><Y><Z>
      char_separator<char> skip_empty(string(1,'\0'), string(), boost::drop_empty_tokens);
      Tok tokens(str, skip_empty);
      for(Tok::iterator iter = tokens.begin(); iter != tokens.end(); ++iter)
        cout << '<' << *iter << '>';
      cout << endl;
    }
  }

This is achieved by a small change to class char_separator in token_functions.hpp, included in the attachment.

--
Best Regards,
Fredrik Hedman

Fredrik Hedman wrote:
> Hello,
> it seems to me that boost::char_separator needs to have another ctor that can accept delimiters that are string types. For example, given a std::string("X\0Y\0\0Z", 6), it does not seem to be possible to use the current ctor of boost::char_separator so that '\0' can be used as a separator.
> It is possible to use boost::escaped_list_separator, since *it* takes a string type on construction, but on the other hand boost::escaped_list_separator does not have an empty_token_policy. In summary, I suggest adding another ctor to boost::char_separator. This will enable the parsing of the above string as

I would prefer the more general:

  template <class It> // where: typeof(*It) == Char, ++It, It == It
  char_separator(It delims_begin, It delims_end,
                 empty_token_policy empty_tokens = drop_empty_tokens)

Kept and dropped delims get a bit messy, though. Some sort of mask? (vector<bool> kept_delims = vector<bool>()) As in valarray's mask_array.

On Thursday 20 October 2005 00.37, Simon Buchan wrote:
> Fredrik Hedman wrote:
>> Hello,
>> it seems to me that boost::char_separator needs to have another ctor that can accept delimiters that are string types. For example, given a std::string("X\0Y\0\0Z", 6), it does not seem to be possible to use the current ctor of boost::char_separator so that '\0' can be used as a separator.
>> It is possible to use boost::escaped_list_separator, since *it* takes a string type on construction, but on the other hand boost::escaped_list_separator does not have an empty_token_policy. In summary, I suggest adding another ctor to boost::char_separator. This will enable the parsing of the above string as
> I would prefer the more general:
>   template <class It> // where: typeof(*It) == Char, ++It, It == It
>   char_separator(It delims_begin, It delims_end,
>                  empty_token_policy empty_tokens = drop_empty_tokens)
> Kept and dropped delims get a bit messy, though. Some sort of mask? (vector<bool> kept_delims = vector<bool>()) As in valarray's mask_array.

Hi Simon,

what you are suggesting is certainly possible, but it seems to imply another two arguments, so that the dropped delims can be passed into the ctor too. My view is that this solution makes the ctor less convenient to use. The intent of the arguments to the ctor is to tell the char_separator which delimiters to drop and which to keep. So the two arguments are basically two (disjoint?) sets, but with the current interface it does not seem to be possible to pass in '\0' as a delimiter. Hence my suggestion to add a ctor that takes delimiters of string type.

--
Best Regards,
Fredrik Hedman
participants (4)
- CN
- Fredrik Hedman
- me22
- Simon Buchan