From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-21 10:02:11

Rogier van Dalen wrote:
> On Wed, 20 Oct 2004 23:05:08 +0300, Peter Dimov <pdimov_at_[hidden]>
> wrote:
>> Rogier van Dalen wrote:
>>> // The actual Unicode string
>>> template <class CodeUnits, class NormalisationForm,
>>> class ErrorChecking>
>>> class string
>> By using ErrorChecking as a template parameter, you are encoding it
>> as part of the string type, but this is not necessary, because there
>> is no difference between values of strings with different
>> ErrorChecking policies (ErrorChecking does not change the
>> invariant). You should just provide different member functions for
>> the two ErrorChecking behaviors, or pass the ErrorChecking parameter
>> to the member functions that require it.
> I hadn't yet looked at it this way, but you are right from a
> theoretical point of view at least. To get more to practical matters,
> what do you think this should do:
> unicode::string s = ...;
> s += 0xDC01; // An isolated surrogate, which is nonsense
> ?
> Should it throw, or convert the isolated surrogate to U+FFFD
> REPLACEMENT CHARACTER (Unicode standard 4 Section 2.7), or something
> else?

Whatever is most common. My choice would probably be 'throw', but I haven't
used Unicode strings enough to have a strong opinion.

> And what should the member function with the opposite behaviour
> be called?

s.append( 0xDC01 ); // default (throw), += alias

// pick your favorite from the list below

s.append_and_correct( 0xDC01 );
s.append( 0xDC01, unicode::convert_on_error );
s.append<unicode::convert_on_error>( 0xDC01 );

I'd go with the first option based on general principles, all else being
equal. There is also

unicode::append_and_correct( s, 0xDC01 );

if the operation can be performed in "user space", i.e. doesn't need to be a
friend of the string class. Or

s += unicode::correct( 0xDC01 );

if the automatic correction does not depend on the left side.
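
To make the alternatives above concrete, here is a minimal sketch of how they could coexist on one string type. It is not the proposed library: the namespace sketch, the vector-of-UTF-16-units storage, and the names code_unit, corrected and convert_on_error_t are assumptions made only for illustration. As a simplification, every surrogate code unit appended on its own is treated as isolated; a real implementation would also have to look at the preceding unit. The template form s.append<unicode::convert_on_error>( 0xDC01 ) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {  // hypothetical namespace, standing in for unicode::

using code_unit = std::uint16_t;
constexpr code_unit replacement_char = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

// Simplification: any surrogate code unit appended alone counts as isolated.
inline bool is_surrogate(code_unit c) { return c >= 0xD800 && c <= 0xDFFF; }

inline code_unit fix(code_unit c) { return is_surrogate(c) ? replacement_char : c; }

// Tag selecting the converting overload: s.append(c, convert_on_error)
struct convert_on_error_t {};
constexpr convert_on_error_t convert_on_error{};

// Result of correct(); lets "s += correct(c)" work without consulting s.
struct corrected { code_unit value; };
inline corrected correct(code_unit c) { return corrected{fix(c)}; }

class string {
public:
    // Default behaviour: throw on an isolated surrogate.
    void append(code_unit c) {
        if (is_surrogate(c)) throw std::invalid_argument("isolated surrogate");
        units_.push_back(c);
    }
    // Same operation, error policy passed as an argument.
    void append(code_unit c, convert_on_error_t) { units_.push_back(fix(c)); }
    // Same operation, error policy encoded in the member name.
    void append_and_correct(code_unit c) { append(c, convert_on_error); }

    string& operator+=(code_unit c) { append(c); return *this; }  // alias for append
    string& operator+=(corrected c) {
        assert(!is_surrogate(c.value));  // holds for values produced by correct()
        units_.push_back(c.value);
        return *this;
    }

    const std::vector<code_unit>& units() const { return units_; }

private:
    std::vector<code_unit> units_;
};

// "User space" variant: needs only the public interface, no friendship.
inline void append_and_correct(string& s, code_unit c) { s.append(fix(c)); }

}  // namespace sketch
```

In this sketch the error policy never appears in the type of the string, so strings built with either policy are interchangeable, which is the point made earlier about ErrorChecking not being part of the invariant.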
