From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-08-02 10:27:55
Tilman Kuepper <kuepper <at> xgraphic.de> writes:
> Hello world,
> I took a closer look at the UTF-8 codecvt facet which is part
> of the program_options library. A test program is attached.
> The last assert (in the Read-function) fails with g++ (GCC)
> 3.3.3 (Debian 20040429).
> After some debugging I think I found the problem:
Could you clarify where the problem is? Does it break program_options, or does
it break some use of UTF-8 that you make?
> The function
> utf8_codecvt_facet_wchar_t::do_in() converts only valid (complete)
> UTF-8 sequences into internal (wchar_t) characters. In
> case the input buffer ends with an incomplete UTF-8 character,
> do_in() returns codecvt_base::partial and points from_next at
> the beginning of this incomplete UTF-8 sequence.
Oh... this 'partial' is a messy thing. I think I originally thought it meant
'partial character found', but later figured out that it means something
different. I think I even fixed a bug with an incorrectly returned 'partial'
in that facet some time ago.
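To make the behaviour under discussion concrete, here is a minimal sketch of a decoder that follows the do_in() contract as Tilman describes it: it converts only complete sequences and, when the input ends in the middle of a character, returns codecvt_base::partial with from_next left at the first byte of the incomplete sequence. The function decode_utf8 is a hypothetical stand-alone helper written for illustration, not the code of utf8_codecvt_facet; it assumes a C++11 compiler, uses char32_t instead of wchar_t, and does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper (not the Boost facet): decodes UTF-8 the way a
// codecvt::do_in() override might, reporting progress through
// from_next and to_next.
std::codecvt_base::result decode_utf8(const char* from, const char* from_end,
                                      const char*& from_next,
                                      char32_t* to, char32_t* to_end,
                                      char32_t*& to_next)
{
    assert(from <= from_end && to <= to_end);
    from_next = from;
    to_next = to;
    while (from_next != from_end && to_next != to_end) {
        const unsigned char lead = static_cast<unsigned char>(*from_next);
        int extra;     // number of continuation bytes expected
        char32_t cp;   // code point being assembled
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return std::codecvt_base::error;

        // Input ends inside this character: stop, leaving from_next at
        // the start of the incomplete sequence.
        if (from_end - from_next < extra + 1)
            return std::codecvt_base::partial;

        for (int i = 1; i <= extra; ++i) {
            const unsigned char c = static_cast<unsigned char>(from_next[i]);
            if ((c & 0xC0) != 0x80)
                return std::codecvt_base::error;
            cp = (cp << 6) | (c & 0x3F);
        }
        *to_next++ = cp;
        from_next += extra + 1;
    }
    // Input left over here means the output buffer filled up first.
    return from_next == from_end ? std::codecvt_base::ok
                                 : std::codecvt_base::partial;
}
```

With the five input bytes 'a', 0xC3, 0xA9, 0xE2, 0x82 (an 'a', an e-acute, and the first two bytes of a three-byte euro sign), this sketch writes two code points, returns partial, and leaves from_next at offset 3 even though the output buffer still has room. That is exactly the state the caller has to cope with: it must keep the unconsumed bytes, append more input, and call again.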
> Obviously the library (libstdc++) is surprised by the fact
> that the codecvt facet stops the translation, although there
> is still room in the output buffer (i. e. to_next != to_end)
> and not all input characters have been processed (from_next
> != from_end).
> As a consequence the for-loop in the test program stops too
> early (wifstream not "good" any longer) and assert(pos ==
> wstr.size()) fails.
> Is this a known issue with the GNU library or with the UTF-8
> conversion facet? And what can be done?
Unless somebody else can shed some light, there are two choices:
1. You can wait until I'm back from vacation.
2. You can figure out the exact meaning of 'partial' and send a patch.