Hi,
I've used the Boost tokenizer successfully in the past and have been quite happy with it. Until now I used std::string as my token type, but for performance reasons I need something different: the input is a raw UTF-8 buffer (const unsigned char*) and the output is a specific UTF-16 string class. So I thought: maybe I can just tokenize the unsigned char buffer in place, using boost::iterator_range<const unsigned char*> as my token type.
And it almost worked! It needed one hack:
the tokenizer calls assign() on my token type, but boost::iterator_range has no such member function. I created a wrapper class whose assign() simply delegates to iterator_range's assignment operator, and now it works!
This is great because I no longer construct any throwaway strings: I go directly from the raw UTF-8 buffer to my UTF-16 output string type with only one conversion and no extra allocations! I still have the nice syntax of the Boost tokenizer, with maximum efficiency!
I think this solution should be mentioned in the tutorial docs, because it might not be obvious to everybody. Also, maybe the hack could be eliminated by adding an assign() member to the Boost range interface (this seems simpler to me than modifying the tokenizer so that it does not call assign()).
Thanks for the great work you guys put into this library!
Best regards,
Florin.