From: David Abrahams (dave_at_[hidden])
Date: 2004-10-22 10:57:49
"Erik Wien" <wien_at_[hidden]> writes:
> "Rogier van Dalen" <rogiervd_at_[hidden]> wrote in message
>> I hadn't yet looked at it this way, but you are right from a
>> theoretical point of view at least. To get more to practical matters,
>> what do you think this should do:
>>
>> unicode::string s = ...;
>> s += 0xDC01; // An isolated surrogate, which is nonsense
>>
>> ?
>> Should it throw, or convert the isolated surrogate to U+FFFD
>> REPLACEMENT CHARACTER (Unicode standard 4 Section 2.7), or something
>> else? And what should the member function with the opposite behaviour
>> be called?
>
> The best solution would be to never append single code units, but instead
> code points. The += operator would determine how many code units are required
> for the given code point.
Is this going to be illegal for most fs, then?
    std::copy(
        std::istream_iterator<char>(f), std::istream_iterator<char>(),
        std::back_inserter(my_utf8_string));
I think it pretty much has to work.
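
To make the two positions in this thread concrete, here is a minimal sketch, not the interface anyone has proposed. It uses a plain std::string as the UTF-8 container, and the names on_error, append_code_point and read_all_code_units are hypothetical, invented for illustration. The first function appends a whole code point and works out how many code units it needs, either throwing on an isolated surrogate or substituting U+FFFD; the second does the byte-wise std::copy from a stream.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical policy switch: what to do with a value that is not a
// Unicode scalar value (a surrogate, or anything above U+10FFFF).
enum class on_error { throw_exception, replace };

// Hypothetical helper: append one code point to a UTF-8 encoded std::string,
// emitting between one and four code units (bytes) as required.
inline void append_code_point(std::string& out, std::uint32_t cp,
                              on_error policy = on_error::throw_exception)
{
    const bool is_surrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (is_surrogate || cp > 0x10FFFF) {
        if (policy == on_error::throw_exception)
            throw std::invalid_argument("not a Unicode scalar value");
        cp = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER
    }

    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: copy a stream into a string one code unit at a time
// through std::back_inserter. No validation happens here, so the result is
// only well-formed UTF-8 if the input was.
inline std::string read_all_code_units(std::istream& in)
{
    std::string out;
    in >> std::noskipws;  // istream_iterator<char> skips whitespace otherwise
    std::copy(std::istream_iterator<char>(in), std::istream_iterator<char>(),
              std::back_inserter(out));
    return out;
}
```

With this sketch, append_code_point(s, 0xDC01) throws std::invalid_argument, while append_code_point(s, 0xDC01, on_error::replace) appends the three bytes EF BF BD. The byte-wise copy works because std::string accepts individual code units; a string type that only accepted code points would have to provide some other way to append raw code units for that idiom to compile.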
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com