From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-21 16:00:34


On Thu, 21 Oct 2004 20:31:24 +0200, Erik Wien <wien_at_[hidden]> wrote:
> The best solution would be to never append single code units, but instead
> code points. The += operator would determine how many code units is required
> for the given code point.

I fully agree with you on that; I was considering what should happen
if the user appended something invalid (e.g., an isolated surrogate).
Sorry for any confusion caused.

I also made a second mistake: I mixed up the two levels (codepoints
and characters) without making clear which one I meant.
I very much like Peter's suggestion of using free functions converting
invalid values to valid ones. Using that I suggest:

unicode::codepoint_string should throw when an invalid codepoint is
appended to it (e.g., an isolated surrogate).
unicode::correct_codepoint() should convert an invalid codepoint into
U+FFFD, and could be used to "safely" insert codepoints.

char32_t correct_codepoint (char32_t);
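
To make the codepoint level concrete, here is a minimal sketch of how the
throwing append and correct_codepoint() might fit together. It is only an
illustration under assumptions of mine: "invalid" is taken to mean a surrogate
(U+D800..U+DFFF) or a value above U+10FFFF, and the namespace codepoint_sketch,
the helper is_valid_codepoint(), and the members push_back() and size() are
hypothetical names, not part of any agreed interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace codepoint_sketch {

// Assumption: a codepoint is invalid if it is a surrogate or lies
// beyond U+10FFFF. Other policies (e.g. noncharacters) are not covered.
constexpr bool is_valid_codepoint(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Maps any invalid codepoint to U+FFFD REPLACEMENT CHARACTER and
// returns valid codepoints unchanged.
constexpr char32_t correct_codepoint(char32_t cp) {
    return is_valid_codepoint(cp) ? cp : char32_t(0xFFFD);
}

// Hypothetical stand-in for unicode::codepoint_string: appending an
// invalid codepoint throws instead of silently storing it.
class codepoint_string {
public:
    void push_back(char32_t cp) {
        if (!is_valid_codepoint(cp))
            throw std::invalid_argument("invalid codepoint");
        data_.push_back(cp);
    }
    codepoint_string& operator+=(char32_t cp) {
        push_back(cp);
        return *this;
    }
    std::size_t size() const { return data_.size(); }
    char32_t operator[](std::size_t i) const { return data_[i]; }

private:
    std::u32string data_;
};

} // namespace codepoint_sketch
```

With this, s += cp throws for an isolated surrogate, while
s += correct_codepoint(cp) never throws and stores U+FFFD instead.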

unicode::string should take a unicode::character for appending. A
unicode::character object may be constructed with a single codepoint,
which will be its base character. If this codepoint is invalid, it
should throw. If the codepoint is a combining mark, it should also
throw.
unicode::correct() should convert an invalid codepoint into U+FFFD,
and if it is given a combining mark, it should use U+0020 SPACE as the
base character and attach the mark to it.

character correct (char32_t);
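
The character level could be sketched along the same lines. Again, this is
only an illustration under assumptions: the namespace character_sketch, the
accessors base() and marks(), and the two-argument constructor are hypothetical,
and is_combining_mark() is a placeholder that recognises only the Combining
Diacritical Marks block (U+0300..U+036F). A real implementation would have to
consult the Unicode Character Database.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

namespace character_sketch {

// Assumption: invalid means a surrogate or a value above U+10FFFF.
inline bool is_valid_codepoint(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Placeholder only: covers just U+0300..U+036F, not all combining marks.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical stand-in for unicode::character: a base codepoint plus
// any combining marks attached to it.
class character {
public:
    // Throws if the base is invalid or is itself a combining mark.
    explicit character(char32_t base) : base_(base) { check_base(base); }

    // Hypothetical constructor taking a base and its combining marks.
    character(char32_t base, std::u32string marks)
        : base_(base), marks_(std::move(marks)) {
        check_base(base);
    }

    char32_t base() const { return base_; }
    const std::u32string& marks() const { return marks_; }

private:
    static void check_base(char32_t base) {
        if (!is_valid_codepoint(base))
            throw std::invalid_argument("invalid codepoint");
        if (is_combining_mark(base))
            throw std::invalid_argument("combining mark used as base");
    }

    char32_t base_;
    std::u32string marks_;
};

// Never throws for bad input: an invalid codepoint becomes U+FFFD, and
// a lone combining mark is attached to U+0020 SPACE.
inline character correct(char32_t cp) {
    if (!is_valid_codepoint(cp))
        return character(char32_t(0xFFFD));
    if (is_combining_mark(cp))
        return character(char32_t(0x0020), std::u32string(1, cp));
    return character(cp);
}

} // namespace character_sketch
```

The point of the split is that the constructor stays strict, while
correct() is the explicit, opt-in way to get a lenient conversion.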

Regards,
Rogier

