Boost logo

Boost :

Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Artyom Beilis (artyom.beilis_at_[hidden])
Date: 2017-06-19 19:57:24


On Mon, Jun 19, 2017 at 8:19 PM, Zach Laine via Boost
<boost_at_[hidden]> wrote:
> On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost <
> boost_at_[hidden]> wrote:
>
>
>> - What is your evaluation of the design?
>>
>
> 1) I'd really much rather have an iterator-based interface for the
> narrow/wide conversions. There's an existing set of iterators in
> Boost.Regex already, and I've recently written one here:
>
> https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp

As well in Boost.Locale - Boost.Nowide uses UTF conversions
from header only part of Boost.Locale.

>
> The reliance on a new custom string type is a substantial mistake, IMO
> (boost::nowide::basic_stackstring).
> [snip]
> (possibly cribbing the one of the two implementations above) would negate
> the need for this new string type -- I could use the existing std::string,
> MyString, QString, a char buffer, or whatever. Also, I'd greatly prefer
> that the new interfaces be defined in terms of string_view instead of
> string/basic_stackstring (there's also a string_view implementation already
> Boost.Utility). string_view is simply far more usable, since it binds
> effortlessly to either a char const * or a string.
>
stackstring is barely on-stack buffer optimization - it isn't general string
and never intended to be so.

It isn't in details because it can actually be useful outside library scope.

> Providing an iterator interface

There is no iterator interface in communication with OS
so no sense to add it there. If you want character by character
interface it exists in Boost.Locale

> 2) I don't really understand what happens when a user passes a valid
> Windows filename that is *invalid* UTF-16 to a program using Nowide. Is
> the invalid UTF-16 filename just broken in the process of trying to convert
> it to UTF-8? This is partially a documentation problem, but until I
> understand how this is intended to work, I'm also counting it as a design
> issue.
>

You are right it wasn't clear in the docs. My policy was return a error in
case of invalid encoding but after comments I received I'll rather
replace invalid sequences with U-FFFD Replacement Character
since it was what WinAPI usually does.

https://github.com/artyom-beilis/nowide/issues/16

It is something easy to fix.

>> - What is your evaluation of the documentation?
>
> I think the documentation needs a bit of work. The non-reference portion
> is quite thin, and drilling down into the reference did not answer at least
> one question I had (the one above, about invalid UTF-16):
>

Actually documentation points out that in case of invalid character
encoding a error is returned and how (in most of places - but
there are some missies)

However I agree that it can be improved
and a special section regarding bad encoding conversion policy
should and will be added.

>
> It tells me nothing about what happens when invalid UTF-16 is encountered.
> Is there an exception? Is 0xfffd inserted? If the latter, am I just
> stuck? I should not have to read any source code to figure this out, but
> it looks like I have to.

Valid point.

>
>> And most importantly:
>> - Do you think the library should be accepted as a Boost library? Be
>> sure to say this explicitly so that your other comments don't obscure
>> your overall opinion.
>>
>
> I do not think the library should be accepted in its current form. It
> seems not to handle malformed UTF-16, which is a requirement for processing
> Windows file names (as I understand it -- please correct this if I'm
> wrong).

I strongly disagree about it. Same as converting invalid ANSI encoding
shouldn't lead to invalid UTF-16 or the other way around.

Since there is no way to present invalid UTF-16 using valid UTF-8
same as there is no way to do it in ANSI<->UTF-16 there is
no non-ambiguous way to provide such a support.

>
> Zach
>

Thank You for your review.

Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk