Subject: Re: [boost] Review Request: Boost.Locale
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2010-05-25 13:12:26


On 05/24/2010 11:15 PM, Artyom wrote:
>>
>> Well, that's not exactly true. mbstate_t is defined by the
>> C standard, and indeed, it says pretty much nothing about
>> its nature, except that it's not an array. But on any
>> platform I worked with (including Windows) it's an integer.
>
>
> Under Linux it is structure and AFAIK gcc uses iconv for conversion.
>
> So I'm not sure how safe is to write anything to it.

Ah, right. I forgot about Linux. Still, it's a POD type and can hold an
integral value. How the standard facet uses it is not relevant as long
as you don't exchange states between your facet and the standard one.
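
To illustrate what I mean, here is a rough sketch of stashing a small
integer in the state. The helper names store_state and load_state are made
up for this example, and the size check is an assumption about the platform
rather than something either standard guarantees:

```cpp
#include <cstring>
#include <cwchar>

// Assumption: mbstate_t is at least as large as an int on this platform.
// Neither the C nor the C++ standard promises this, so fail loudly if not.
static_assert(sizeof(std::mbstate_t) >= sizeof(int),
              "mbstate_t is too small to hold an int on this platform");

// Hypothetical helper: copy an int into the leading bytes of the state.
// Only valid for states that are never handed to the standard facet.
inline void store_state(std::mbstate_t& st, int value)
{
    std::memcpy(&st, &value, sizeof(value));
}

// Hypothetical helper: read back the int written by store_state.
inline int load_state(const std::mbstate_t& st)
{
    int value;
    std::memcpy(&value, &st, sizeof(value));
    return value;
}
```

A facet using this would keep 0 as its own initial value, so that a
value-initialized state handed in by the stream reads back as "nothing
pending".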

>> The C++ standard does give some hints regarding how the
>> conversion state shall be handled by the stream. In
>> particular, it specifies that the state will be
>> value-initialized at the beginning of the conversion, and it
>> will call `unshift` at the end of the conversion in order to
>> finalize the converted character sequence and return the
>> state to its initial value.
>
> I was thinking about it but unfortunately standard does not specify
> how mbstate_t is initialized. If I could assume that it is at least
> POD filled with zeros I could do something but I actually can't.

It is POD since it's defined by the C standard.

> At least I didn't find any reference for this.

The C standard specifies that a zero-valued mbstate_t counts as an
initial state. From n1256:

   7.24.6 Extended multibyte/wide character conversion utilities

   ...

   3 The initial conversion state corresponds, for a conversion in
     either direction, to the beginning of a new multibyte character in
     the initial shift state. A zero-valued mbstate_t object is (at
     least) one way to describe an initial conversion state.
     A zero-valued mbstate_t object can be used to initiate conversion
     involving any multibyte character sequence, in any LC_CTYPE
     category setting.

   ...

Also, there is the mbsinit function, which makes it possible to detect
whether the state is an initial one (just in case there are initial
values other than the zero-filled one).
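
A minimal sketch of that check, assuming nothing beyond <cwchar>. The
wrapper names make_initial_state and is_initial are invented for
illustration:

```cpp
#include <cwchar>

// Hypothetical helper: value-initialization yields a zero-valued object,
// which the C standard accepts as an initial conversion state.
inline std::mbstate_t make_initial_state()
{
    return std::mbstate_t();
}

// Hypothetical helper: mbsinit returns nonzero when the state describes
// an initial conversion state, whatever its representation is.
inline bool is_initial(const std::mbstate_t& st)
{
    return std::mbsinit(&st) != 0;
}
```

Testing with mbsinit rather than comparing bytes against zero keeps the
check valid even if an implementation has more than one initial
representation.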

Next, for do_in/do_out the C++ standard says (22.2.1.5.2):

   1 Preconditions: [...] state initialized,
     if at the beginning of a sequence, or else equal to the result of
     converting the preceding characters in the sequence.

and further on, in paragraph 5 (regarding do_unshift), there is a
footnote explaining that the method is intended to return the state to
its initial value (typically stateT()).
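
Put together, the calling pattern these clauses imply looks roughly like
the sketch below. This is not what any particular stream implementation
does verbatim; narrow_with_facet is a made-up name and the 64-byte buffer
is an arbitrary choice:

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: convert `in` with the codecvt facet of `loc`.
// Returns false if the facet reports an error or makes no progress.
inline bool narrow_with_facet(const std::locale& loc,
                              const std::wstring& in,
                              std::string& out)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    // Beginning of a sequence: the state must be the initial one.
    std::mbstate_t state = std::mbstate_t();

    out.clear();
    const wchar_t* from = in.data();
    const wchar_t* const from_end = from + in.size();
    char buf[64];

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        // The same state object is carried from one call to the next.
        std::codecvt_base::result r =
            cvt.out(state, from, from_end, from_next,
                    buf, buf + sizeof(buf), to_next);
        if (r != std::codecvt_base::ok && r != std::codecvt_base::partial)
            return false;
        if (from_next == from && to_next == buf)
            return false; // no progress, give up instead of looping
        out.append(buf, to_next);
        from = from_next;
    }

    // End of the sequence: let the facet emit whatever is needed to
    // bring the state back to its initial value.
    char* to_next = buf;
    std::codecvt_base::result r =
        cvt.unshift(state, buf, buf + sizeof(buf), to_next);
    if (r == std::codecvt_base::error)
        return false;
    if (r != std::codecvt_base::noconv)
        out.append(buf, to_next);
    return true;
}
```

With the classic locale, narrow_with_facet(std::locale::classic(), L"abc",
out) should succeed for plain ASCII input; a stateful custom facet would
additionally get its chance to flush in the unshift call.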

