From: David Abrahams (dave_at_[hidden])
Date: 2004-10-22 10:57:49


"Erik Wien" <wien_at_[hidden]> writes:

> "Rogier van Dalen" <rogiervd_at_[hidden]> wrote in message
>> I hadn't yet looked at it this way, but you are right from a
>> theoretical point of view at least. To get more to practical matters,
>> what do you think this should do:
>>
>> unicode::string s = ...;
>> s += 0xDC01; // An isolated surrogate, which is nonsense
>>
>> ?
>> Should it throw, or convert the isolated surrogate to U+FFFD
>> REPLACEMENT CHARACTER (Unicode standard 4 Section 2.7), or something
>> else? And what should the member function with the opposite behaviour
>> be called?
>
> The best solution would be to never append single code units, but instead
> code points. The += operator would determine how many code units are required
> for the given code point.
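
For illustration only, here is a minimal sketch of what appending by code point might look like for a UTF-16 representation. The helper name append_code_point, the std::vector storage, and the choice to throw on an isolated surrogate are all assumptions, not anything proposed in this thread.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of any proposed unicode::string interface):
// appends the UTF-16 code units that encode one code point to `out`.
inline void append_code_point(std::vector<std::uint16_t>& out, std::uint32_t cp)
{
    // Assumption: isolated surrogates are rejected rather than replaced.
    if (cp >= 0xD800 && cp <= 0xDFFF)
        throw std::invalid_argument("surrogate is not a valid code point");
    if (cp > 0x10FFFF)
        throw std::invalid_argument("code point beyond U+10FFFF");

    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit.
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Supplementary planes: a high/low surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

A variant taking the other side of Rogier's question would push 0xFFFD in place of the throw.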

Is this going to be illegal for most fs, then?

   std::copy(
        std::istream_iterator<char>(f), std::istream_iterator<char>(),
        std::back_inserter(my_utf8_string));

I think it pretty much has to work.
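
To make the requirement concrete: std::back_inserter only needs push_back taking one code unit, so a string that supports the copy above has to accept arbitrary bytes and can only check well-formedness afterwards. A rough sketch, in which utf8_buffer, well_formed, and read_units are hypothetical names and the check is deliberately simplified (it ignores overlong forms and encoded surrogates):

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for a UTF-8 string type; not a proposed interface.
class utf8_buffer
{
public:
    typedef char value_type;            // needed by std::back_insert_iterator
    typedef const char& const_reference;

    // Accepts a single code unit; no validation is possible at this point.
    void push_back(char unit) { units_.push_back(unit); }

    const std::string& units() const { return units_; }

    // Simplified structural check: each lead byte must be followed by the
    // right number of continuation bytes.
    bool well_formed() const
    {
        std::size_t i = 0;
        while (i < units_.size()) {
            unsigned char lead = static_cast<unsigned char>(units_[i]);
            std::size_t trail = lead < 0x80 ? 0
                              : (lead & 0xE0) == 0xC0 ? 1
                              : (lead & 0xF0) == 0xE0 ? 2
                              : (lead & 0xF8) == 0xF0 ? 3
                              : 4;  // 4 marks an invalid lead byte
            if (trail == 4 || trail >= units_.size() - i)
                return false;
            for (std::size_t k = 1; k <= trail; ++k)
                if ((static_cast<unsigned char>(units_[i + k]) & 0xC0) != 0x80)
                    return false;
            i += trail + 1;
        }
        return true;
    }

private:
    std::string units_;
};

// Reads every char from `f` into a buffer, one code unit at a time.
inline utf8_buffer read_units(std::istream& f)
{
    utf8_buffer s;
    // Added here: istream_iterator<char> skips whitespace unless told not to.
    f >> std::noskipws;
    std::copy(std::istream_iterator<char>(f), std::istream_iterator<char>(),
              std::back_inserter(s));
    return s;
}
```

In this sketch the buffer is allowed to be ill-formed in the middle of the copy, which is the price of letting the algorithm work unchanged.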

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
