From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-22 15:38:30
On Fri, 22 Oct 2004 12:46:00 -0400 (EDT), Rob Stewart <stewart_at_[hidden]> wrote:
> From: Rogier van Dalen <rogiervd_at_[hidden]>
> >
> > unicode::string should take a unicode::character for appending. A
> > unicode::character object may be constructed with a single codepoint,
> > which will be its base character. If this codepoint is invalid, it
> > should throw. If the codepoint is a combining mark, it should also
> > throw.
> > unicode::correct() should convert an invalid codepoint into U+FFFD,
> > and if it is input a combining mark, it should use U+0020 SPACE as a
> > base character.
>
> Why not have unicode::character's ctor invoke unicode::correct()?
unicode::correct() replaces every encoding error in the input with a
replacement character. This loses information, and the loss is not
recoverable. Supplying a SPACE as the base for a lone combining mark is
only slightly better. When I proposed a policy I called it
workaround_encoding_error; maybe we need a better name than "correct".
I agree with Peter Dimov, however, that the default should be to throw
rather than to throw away information and pretend nothing happened.
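To make the difference concrete, here is a rough sketch of the two behaviours. Everything in it is an assumption made for illustration: the `sketch` namespace, the helper predicates, and the `add_mark` member are hypothetical and are not the interface of the proposed library. The combining-mark test covers only the block U+0300..U+036F, whereas a real implementation would consult the Unicode Character Database.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {

// Hypothetical helper: a codepoint is valid if it is at most U+10FFFF
// and is not a surrogate (U+D800..U+DFFF).
inline bool is_valid_codepoint(std::uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper, deliberately simplified: only the Combining
// Diacritical Marks block is recognised.
inline bool is_combining_mark(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// A base character plus zero or more combining marks.
class character {
public:
    // Strict by default: throws instead of silently repairing the input.
    explicit character(std::uint32_t base) : base_(base) {
        if (!is_valid_codepoint(base))
            throw std::invalid_argument("invalid codepoint");
        if (is_combining_mark(base))
            throw std::invalid_argument("combining mark used as base");
    }

    void add_mark(std::uint32_t mark) {
        if (!is_combining_mark(mark))
            throw std::invalid_argument("not a combining mark");
        marks_.push_back(mark);
    }

    std::uint32_t base() const { return base_; }
    const std::vector<std::uint32_t>& marks() const { return marks_; }

private:
    std::uint32_t base_;
    std::vector<std::uint32_t> marks_;
};

// Lossy workaround: invalid input becomes U+FFFD, and a lone combining
// mark is attached to U+0020 SPACE.
inline character correct(std::uint32_t cp) {
    if (!is_valid_codepoint(cp))
        return character(0xFFFD);
    if (is_combining_mark(cp)) {
        character c(0x0020);
        c.add_mark(cp);
        return c;
    }
    return character(cp);
}

// Two different bad inputs are indistinguishable after correction,
// so the original input cannot be recovered.
inline void demo() {
    assert(correct(0xD800).base() == correct(0x110000).base());
}

}  // namespace sketch
```

With this split, a caller who wants the lossy behaviour has to ask for it explicitly through `correct()`, while a plain construction from a bad codepoint fails loudly.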
Regards,
Rogier