Subject: Re: [boost] [Locale] Preview of 3rd version
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-09-11 15:57:25


On 11/09/2010 20:34, Artyom wrote:

> Ahh I see, I do following:
>
> When I read for example 4 bytes of UTF-8 that go to codepoint > 0xFFFF
> I do following:
>
> 1. I write first surrogate pair to output stream,
> I update the state to reflect that first part of the pair was written and
> **I do not consume input**
> 2. Same 4 utf-8 bytes again and see that state is marked to
> that first part of pair was written so I write the second and consume the
> input.
>
> So actually do_in called twice for same input.

The code in question is in a loop that keeps going until from reaches
from_end or the conversion fails (due to insufficient input or
otherwise), so both surrogates should be written in the same do_in
invocation.
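
To illustrate the loop structure described above, here is a minimal
sketch of a UTF-8 to UTF-16 conversion that emits both surrogates
within a single call. It is not the Boost.Locale code: Result,
decode_one and utf8_to_utf16 are hypothetical names, the parameters
only mimic the do_in convention (from/from_end/from_next,
to/to_end/to_next), and the decoder deliberately skips overlong and
surrogate-range checks. Unlike the state-based scheme quoted above, it
simply reports partial when there is no room for the whole pair.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result codes, loosely mirroring std::codecvt_base::result.
enum class Result { ok, partial, error };

// Hypothetical helper: decode one UTF-8 sequence starting at p (p < end).
// Returns the number of bytes used, 0 if the input is incomplete,
// or -1 if the sequence is malformed. Overlong forms are not rejected.
inline int decode_one(const char* p, const char* end, std::uint32_t& cp) {
    unsigned char lead = static_cast<unsigned char>(*p);
    int len;
    if (lead < 0x80)              { cp = lead;        len = 1; }
    else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
    else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
    else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
    else return -1;
    if (end - p < len) return 0;
    for (int i = 1; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(p[i]);
        if ((c & 0xC0) != 0x80) return -1;   // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    return len;
}

// The loop runs until from_next reaches from_end or conversion stops,
// so a code point above 0xFFFF yields both surrogates in one call.
inline Result utf8_to_utf16(const char* from, const char* from_end,
                            const char*& from_next,
                            char16_t* to, char16_t* to_end,
                            char16_t*& to_next) {
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        std::uint32_t cp = 0;
        int n = decode_one(from_next, from_end, cp);
        if (n < 0) return Result::error;
        if (n == 0) return Result::partial;      // insufficient input
        if (cp > 0x10FFFF) return Result::error;
        std::ptrdiff_t needed = (cp > 0xFFFF) ? 2 : 1;
        if (to_end - to_next < needed) return Result::partial;
        if (cp > 0xFFFF) {
            cp -= 0x10000;
            *to_next++ = static_cast<char16_t>(0xD800 + (cp >> 10));
            *to_next++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            *to_next++ = static_cast<char16_t>(cp);
        }
        from_next += n;   // input is consumed only after the output is written
    }
    return Result::ok;
}
```

With this shape no per-call state is needed for the surrogate pair,
at the cost of leaving the last output slot unused when only one
unit of space remains.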

> Actually the mbstate_t is POD type that should be initialized to 0. I must make
> sure that
> sizeof(mbstate_t) >= 2, and then I use it as temporary storage for state.

I'm not talking about that. I meant the reinterpret casting between
uchar and uint_type, but I suppose they're actually the same type, maybe
differing only in signedness, so that should be somewhat OK.
It's still not allowed by the strict aliasing rules though.
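
One common way to sidestep the aliasing question altogether is to copy
the bytes with std::memcpy rather than reading through a reinterpreted
pointer. The sketch below is only an illustration: the thread does not
show how uchar and uint_type are defined, so the two aliases here are
assumptions, and load_as_uint and store_from_uint are hypothetical
helper names.

```cpp
#include <cstdint>
#include <cstring>

// Assumed stand-ins; the real definitions are not shown in this thread.
using uchar = char16_t;
using uint_type = std::uint16_t;

static_assert(sizeof(uchar) == sizeof(uint_type),
              "the byte-wise copy below requires equal sizes");

// Read a uchar's object representation as a uint_type without
// dereferencing a reinterpret_cast'ed pointer.
inline uint_type load_as_uint(const uchar* p) {
    uint_type v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Write a uint_type's object representation into a uchar.
inline void store_from_uint(uchar* p, uint_type v) {
    std::memcpy(p, &v, sizeof v);
}
```

Compilers typically turn such a fixed-size memcpy into a plain load
or store, so this usually costs nothing compared with the cast.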

