From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-21 09:28:37


On Wed, 20 Oct 2004 23:05:08 +0300, Peter Dimov <pdimov_at_[hidden]> wrote:
> Rogier van Dalen wrote:
> >
> > // The actual Unicode string
> > template <class CodeUnits, class NormalisationForm,
> > class ErrorChecking>
> > class string
>
> By using ErrorChecking as a template parameter, you are encoding it as part
> of the string type, but this is not necessary, because there is no
> difference between values of strings with different ErrorChecking policies
> (ErrorChecking does not change the invariant). You should just provide
> different member functions for the two ErrorChecking behaviors, or pass the
> ErrorChecking parameter to the member functions that require it.

I hadn't yet looked at it this way, but you are right from a
theoretical point of view at least. To get more to practical matters,
what do you think this should do:

unicode::string s = ...;
s += 0xDC01; // An isolated surrogate, which is nonsense

?
Should it throw, or convert the isolated surrogate to U+FFFD
REPLACEMENT CHARACTER (Unicode Standard 4.0, Section 2.7), or something
else? And what should the member function with the opposite behaviour
be called?
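
To make the question concrete, here is a minimal sketch of the
"pass the policy to the member function" alternative. Everything in it
is an assumption made for illustration: the namespace sketch, the tag
names throw_on_error and replace_on_error, the append member, and the
choice to store code points rather than code units. None of it is the
proposed interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch {

// Hypothetical policy tags, selected per call instead of per string type.
struct throw_on_error_t {};
struct replace_on_error_t {};
constexpr throw_on_error_t throw_on_error{};
constexpr replace_on_error_t replace_on_error{};

constexpr char32_t replacement_character = 0xFFFD;

// A Unicode scalar value is any code point except the surrogates
// U+D800..U+DFFF, up to and including U+10FFFF.
inline bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Simplified stand-in for the string class: it stores code points
// directly and leaves out CodeUnits and NormalisationForm entirely.
class string {
public:
    // Throwing behaviour: reject anything that is not a scalar value
    // and leave the string unchanged.
    void append(char32_t c, throw_on_error_t) {
        if (!is_scalar_value(c))
            throw std::invalid_argument("not a Unicode scalar value");
        data_.push_back(c);
    }

    // Replacing behaviour: substitute U+FFFD for invalid input.
    void append(char32_t c, replace_on_error_t) {
        data_.push_back(is_scalar_value(c) ? c : replacement_character);
    }

    std::size_t size() const { return data_.size(); }
    char32_t operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<char32_t> data_;
};

} // namespace sketch
```

With this, s.append(0xDC01, sketch::replace_on_error) stores U+FFFD and
s.append(0xDC01, sketch::throw_on_error) throws. Both calls operate on
the same string type, which is the point made above. It does not answer
the question for operator+=, though: an operator cannot take the extra
argument, so one of the two behaviours still has to be its default.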

Rogier

