Boost Users:
Subject: Re: [Boost-users] boost::filesystem::path in UTF-8 on Windows
From: Andrey Moshbear (andrey.vul_at_[hidden])
Date: 2011-11-05 07:43:54
On Fri, Nov 4, 2011 at 22:54, Andrey Moshbear <andrey.vul_at_[hidden]> wrote:
> On Fri, Nov 4, 2011 at 11:28, Igor R <boost.lists_at_[hidden]> wrote:
>>> If I have a string that is in UTF-8, how do I tell the path constructor?
>>>
>>>   path p1 ("my utf8 data", SOME_CODECVT);
>>>
>>> I think it is a matter of passing the right SOME_CODECVT. What is it?
>>> The path::value_type is wchar_t, according to the docs.
>>
>> On Windows you should convert it to utf16.
>
> Word of warning: the Boost utf8 codecvt will cause undefined
> behavior if you have any code points above U+FFFF. You'll have to hack
> do_in and do_out in order to emit/parse surrogate pairs. Also, hack
> do_length to increment the counter by 2 for cp > 0xFFFF.
>
For my rewrite of the UTF-8 to UTF-16/32 conversion, look at
https://github.com/moshbear/fastcgipp/blob/master/src/utf8_cvt.cpp.
Although it still decodes values above U+10FFFF, it is closer to
RFC 3629 compliance than utf8_codecvt_facet. It also supports true
UTF-16, i.e. surrogate pairs.
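
To make the surrogate-pair point concrete, here is a minimal standalone
sketch of a UTF-8 to UTF-16 conversion. It is not the code from the link
above and not what utf8_codecvt_facet does; the names append_utf16 and
utf8_to_utf16 are made up for illustration. Unlike the linked file, this
sketch assumes strict RFC 3629 input and rejects anything above U+10FFFF,
overlong forms, and encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: append one code point as UTF-16, splitting it
// into a surrogate pair when it lies above U+FFFF.
inline void append_utf16(std::u16string& out, std::uint32_t cp) {
    if (cp > 0xFFFF) {
        cp -= 0x10000;  // 20 bits remain: high 10 and low 10
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
        out.push_back(static_cast<char16_t>(cp));
    }
}

// Hypothetical helper: decode UTF-8 and return UTF-16.
// Throws std::runtime_error on malformed input.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t len = 0;    // total bytes in this sequence
        std::uint32_t min = 0;  // smallest value allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i < len)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, values past U+10FFFF, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        append_utf16(out, cp);
        i += len;
    }
    return out;
}
```

Where wchar_t is 16 bits wide, as it is on Windows, the resulting code
units could be copied into a std::wstring before being handed to path.
Note that the output length is 2 units for every code point above
U+FFFF, which is the same adjustment do_length needs.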