From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-21 10:02:11

Next message: Rogier van Dalen: "Re: [boost] Re: Re: Re: Any interest in adding unicode support to boost?"
Previous message: Rogier van Dalen: "Re: [boost] Re: Any interest in adding unicode support to boost?"
In reply to: Rogier van Dalen: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Erik Wien: "[boost] Re: Re: Any interest in adding unicode support to boost?"

Rogier van Dalen wrote:
> On Wed, 20 Oct 2004 23:05:08 +0300, Peter Dimov <pdimov_at_[hidden]>
> wrote:
>> Rogier van Dalen wrote:
>>>
>>> // The actual Unicode string
>>> template <class CodeUnits, class NormalisationForm,
>>> class ErrorChecking>
>>> class string
>>
>> By using ErrorChecking as a template parameter, you are encoding it
>> as part of the string type, but this is not necessary, because there
>> is no difference between values of strings with different
>> ErrorChecking policies (ErrorChecking does not change the
>> invariant). You should just provide different member functions for
>> the two ErrorChecking behaviors, or pass the ErrorChecking parameter
>> to the member functions that require it.
>
> I hadn't yet looked at it this way, but you are right from a
> theoretical point of view at least. To get more to practical matters,
> what do you think this should do:
>
> unicode::string s = ...;
> s += 0xDC01; // An isolated surrogate, which is nonsense
>
> ?
> Should it throw, or convert the isolated surrogate to U+FFFD
> REPLACEMENT CHARACTER (Unicode standard 4 Section 2.7), or something
> else?

Whatever is most common. My choice would probably be 'throw', but I haven't
used Unicode strings enough to have a strong opinion.
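
As a rough sketch of the two behaviours in question (throw versus substitute U+FFFD), assuming a code point is held in a plain char32_t. The names is_surrogate, checked, and corrected are hypothetical and exist only for this illustration; they are not part of any proposed interface.

```cpp
#include <stdexcept>

// Hypothetical helpers, for illustration only.
constexpr char32_t replacement_character = 0xFFFD; // U+FFFD REPLACEMENT CHARACTER

// Surrogate code points (U+D800..U+DFFF) are not Unicode scalar values,
// so an isolated one such as 0xDC01 cannot be a valid character.
constexpr bool is_surrogate(char32_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

constexpr bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !is_surrogate(c);
}

// "Throw" behaviour: reject anything that is not a scalar value.
inline char32_t checked(char32_t c) {
    if (!is_scalar_value(c))
        throw std::invalid_argument("not a Unicode scalar value");
    return c;
}

// "Convert" behaviour: substitute U+FFFD for anything invalid.
constexpr char32_t corrected(char32_t c) {
    return is_scalar_value(c) ? c : replacement_character;
}
```

Either helper leaves valid input untouched; they differ only in what happens to input like 0xDC01.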

> And what should the member function with the opposite behaviour
> be called?

s.append( 0xDC01 ); // default behavior (throws); operator+= is an alias for this

// pick your favorite from the list below

s.append_and_correct( 0xDC01 );
s.append( 0xDC01, unicode::convert_on_error );
s.append<unicode::convert_on_error>( 0xDC01 );

I'd go with the first option based on general principles, all else being
equal. There is also

unicode::append_and_correct( s, 0xDC01 );

if the operation can be performed in "user space", i.e. it doesn't need to be
a friend of the string class. Or

s += unicode::correct( 0xDC01 );

if the automatic correction does not depend on the left side.
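
To make the alternatives above concrete, here is a minimal sketch of how they could coexist on one string type, with the error handling chosen per call rather than encoded in the type. Everything in it is an assumption made for illustration: the namespace sketch stands in for unicode, the string stores raw char32_t code points in a std::vector, and error_policy is modelled as an enum so that convert_on_error can serve both as a function argument and as a template argument. None of this is an actual or proposed Boost interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch { // hypothetical stand-in for "unicode"

enum error_policy { throw_on_error, convert_on_error };

constexpr char32_t replacement_character = 0xFFFD;

inline bool valid(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Correction that does not depend on the left-hand side:
//   s += correct( 0xDC01 );
inline char32_t correct(char32_t c) {
    return valid(c) ? c : replacement_character;
}

class string {
public:
    // Default behaviour throws; the policy can be passed as an argument:
    //   s.append( 0xDC01, convert_on_error );
    void append(char32_t c, error_policy p = throw_on_error) {
        if (p == convert_on_error)
            c = correct(c);
        else if (!valid(c))
            throw std::invalid_argument("invalid code point");
        data_.push_back(c);
    }

    // Policy as a template argument: s.append<convert_on_error>( 0xDC01 );
    template <error_policy P>
    void append(char32_t c) { append(c, P); }

    // Separately named member function.
    void append_and_correct(char32_t c) { append(c, convert_on_error); }

    // += is an alias for the default (throwing) append.
    string& operator+=(char32_t c) { append(c); return *this; }

    std::size_t size() const { return data_.size(); }
    char32_t operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<char32_t> data_;
};

// "User space" version: uses only the public interface, no friendship needed.
inline void append_and_correct(string& s, char32_t c) {
    s.append(correct(c));
}

} // namespace sketch
```

In this sketch all of the correcting variants store U+FFFD for 0xDC01, and a failed default append leaves the string unchanged. The string type itself is the same whichever variant is called, which is the point of keeping ErrorChecking out of the template parameters.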
