Subject: Re: [boost] Review Request: Boost.Locale
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2010-05-24 11:24:42


On 05/24/2010 05:06 PM, Artyom wrote:
>>
> - There is absolutely no information given about std::mbstate_t that
> should save intermediate data between conversions so, there is actually
> no way to pass anything between sequential calls of
> std::locale::codecvt<...>::in/out. So even if I observe first surrogate
> pair there is no way to pass this information for next call and thus
> I loose this information

Well, that's not exactly true. mbstate_t is defined by the C standard,
and indeed, the standard says pretty much nothing about its nature,
except that it's not an array. But on every platform I have worked with
(including Windows) it's an integer. I think it is perfectly fair to
assume that it is at least a POD and that sizeof(mbstate_t) >= 1, which
makes it possible to store information about surrogate pairs in it.
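
To make that concrete, here is a minimal sketch of the idea. It assumes
mbstate_t is trivially copyable and that a value-initialized state has a
zero first byte; the helper names (high_surrogate_pending,
set_high_surrogate_pending, put_one_unit) are made up for illustration
and are not part of any library. A single flag byte is enough to
remember that the high surrogate has already been written:

```cpp
#include <cstring>
#include <cwchar>
#include <type_traits>

// Assumption: the state can be treated as raw bytes.
static_assert(std::is_trivially_copyable<std::mbstate_t>::value,
              "this sketch needs a trivially copyable mbstate_t");

// Hypothetical helper: reads the flag kept in the first byte of the state.
inline bool high_surrogate_pending(const std::mbstate_t& st) {
    unsigned char flag = 0;
    std::memcpy(&flag, &st, 1);
    return flag != 0;
}

// Hypothetical helper: stores the flag in the first byte of the state.
inline void set_high_surrogate_pending(std::mbstate_t& st, bool pending) {
    unsigned char flag = pending ? 1 : 0;
    std::memcpy(&st, &flag, 1);
}

// Writes at most one UTF-16 unit per call, modelling an output buffer
// with room for a single unit. Returns true once the code point has
// been written in full. No validation of cp is done in this sketch.
inline bool put_one_unit(char32_t cp, std::mbstate_t& st, char16_t& out) {
    if (cp < 0x10000) {
        out = static_cast<char16_t>(cp);
        return true;
    }
    cp -= 0x10000;
    if (!high_surrogate_pending(st)) {
        out = static_cast<char16_t>(0xD800 + (cp >> 10));
        set_high_surrogate_pending(st, true);   // remember for the next call
        return false;
    }
    out = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    set_high_surrogate_pending(st, false);      // back to the initial value
    return true;
}
```

The caller keeps presenting the same code point until put_one_unit
returns true, so nothing is lost between the two calls. Whether the C
library would still regard such a state as "initial" is a separate
question; the sketch only uses the state for its own bookkeeping.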

The C++ standard does give some hints regarding how the conversion state
shall be handled by the stream. In particular, it specifies that the
state will be value-initialized at the beginning of the conversion, and
that the stream will call `unshift` at the end of the conversion in
order to finalize the converted character sequence and return the state
to its initial value.
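
For illustration, the same life cycle can be driven by hand. The codecvt
member that finalizes the sequence is `unshift`. This is only a sketch:
the helper name `narrow` is hypothetical, it defaults to the classic
locale, and it treats anything other than a complete conversion as an
error instead of retrying with a larger buffer.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string with the locale's codecvt
// facet, handling the state the way the text above describes.
inline std::string narrow(const std::wstring& in,
                          const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    // 1. The state is value-initialized before the conversion starts.
    std::mbstate_t st = std::mbstate_t();

    // Room for the worst case, plus a possible closing sequence.
    std::string out(in.size() * cvt.max_length() + cvt.max_length(), '\0');

    const wchar_t* from = in.data();
    const wchar_t* from_end = from + in.size();
    const wchar_t* from_next = from;
    char* to = &out[0];
    char* to_end = to + out.size();
    char* to_next = to;

    // 2. The same state object is passed to every conversion call.
    std::codecvt_base::result r =
        cvt.out(st, from, from_end, from_next, to, to_end, to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("conversion failed");

    // 3. unshift writes whatever is needed to terminate the sequence
    //    and leaves the state at its initial value.
    char* tail_next = to_next;
    r = cvt.unshift(st, to_next, to_end, tail_next);
    if (r == std::codecvt_base::error || r == std::codecvt_base::partial)
        throw std::runtime_error("unshift failed");

    out.resize(static_cast<std::size_t>(tail_next - &out[0]));
    return out;
}
```

For a stateless encoding `unshift` writes nothing, so narrow(L"abc")
simply yields "abc"; the call only matters for stateful conversions.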

Not that this makes it any easier to use mbstate_t with ICU under the
hood, but it seems possible (theoretically, at least) to implement the
complete UTF-16 <-> char conversion with it.

PS: I don't pretend to have learned the standards by heart. All the
references are off the top of my head. :)

