Boost logo

Boost :

Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Frédéric Bron (frederic.bron_at_[hidden])
Date: 2017-06-19 17:37:20


Question to all:

Why should we try to handle wrong UTF-16 (or wrong UTF-8)?
1. such files should not exist
2. if they exist, why?
  - if it is because it is an old file, can the user just rename it properly?
  - if it is because it was produced by a program, why should this
program continue to work without fixing? Isn't it the best way that we
get wrong filenames forever?

I do not understand why we cannot just issue an error.

Thanks for explanations,

Frédéric

2017-06-19 19:19 GMT+02:00 Zach Laine via Boost <boost_at_[hidden]>:
> On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost <
> boost_at_[hidden]> wrote:
>
>> Hi Everyone,
>>
>> The formal review of Artyom Beilis' Nowide library starts today and
>> will last until Wed. 21st of June.
>> [snip]
>> Please post your comments and review to the boost mailing list
>> (preferably), or privately to the Review Manager (to me ;-). Here are
>> some questions you might want to answer in your review:
>>
>
>
>
>> - What is your evaluation of the design?
>>
>
> 1) I'd really much rather have an iterator-based interface for the
> narrow/wide conversions. There's an existing set of iterators in
> Boost.Regex already, and I've recently written one here:
>
> https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp
>
> The reliance on a new custom string type is a substantial mistake, IMO
> (boost::nowide::basic_stackstring). Providing an iterator interface
> (possibly cribbing the one of the two implementations above) would negate
> the need for this new string type -- I could use the existing std::string,
> MyString, QString, a char buffer, or whatever. Also, I'd greatly prefer
> that the new interfaces be defined in terms of string_view instead of
> string/basic_stackstring (there's also a string_view implementation already
> Boost.Utility). string_view is simply far more usable, since it binds
> effortlessly to either a char const * or a string.
>
> 2) I don't really understand what happens when a user passes a valid
> Windows filename that is *invalid* UTF-16 to a program using Nowide. Is
> the invalid UTF-16 filename just broken in the process of trying to convert
> it to UTF-8? This is partially a documentation problem, but until I
> understand how this is intended to work, I'm also counting it as a design
> issue.
>
>> - What is your evaluation of the implementation?
>
> I did not look.
>
>> - What is your evaluation of the documentation?
>
> I think the documentation needs a bit of work. The non-reference portion
> is quite thin, and drilling down into the reference did not answer at least
> one question I had (the one above, about invalid UTF-16):
>
> Looking at some example code in the "Using the Library" section, I saw this:
>
> "
> To make this program handle Unicode properly, we do the following changes:
>
> #include <boost/nowide/args.hpp>
> #include <boost/nowide/fstream.hpp>
> #include <boost/nowide/iostream.hpp>
> int main(int argc,char **argv)
> {
> boost::nowide::args a(argc,argv); // Fix arguments - make them UTF-8
> "
>
> Ok, so I clicked "boost::nowide::args", hoping for an answer. The detailed
> description for args says:
>
> "
> args is a class that fixes standard main() function arguments and changes
> them to UTF-8 under Microsoft Windows.
>
> The class uses GetCommandLineW(), CommandLineToArgvW() and
> GetEnvironmentStringsW() in order to obtain the information. It does not
> relates to actual values of argc,argv and env under Windows.
>
> It restores the original values in its destructor
> "
>
> It tells me nothing about what happens when invalid UTF-16 is encountered.
> Is there an exception? Is 0xfffd inserted? If the latter, am I just
> stuck? I should not have to read any source code to figure this out, but
> it looks like I have to.
>
> This criticism can be applied to most of the documentation. My preference
> is that the semantics of primary functionality of the library should be
> explained in tutorials or other non-reference formats. The current state
> of the docs doesn't even explain things in the references. This must be
> fixed before this library can be accepted.
>
>> - What is your evaluation of the potential usefulness of the library?
>
> I think this library is attempting to address a real and important issue.
> I just can't figure out if it's a complete solution, because how invalid
> UTF-16 is treated remains a question.
>
>> - Did you try to use the library? With what compiler? Did you have any
>
>> problems?
>>
>
> I did not.
>
> - How much effort did you put into your evaluation? A glance? A quick
>> reading? In-depth study?
>>
>
> A quick reading, plus a bit of discussion on the list.
>
>
>> - Are you knowledgeable about the problem domain?
>>
>
> I understand the UTF-8 issues reasonably well, but am ignorant of the
> Windows-specific issues.
>
>
>> And most importantly:
>> - Do you think the library should be accepted as a Boost library? Be
>> sure to say this explicitly so that your other comments don't obscure
>> your overall opinion.
>>
>
> I do not think the library should be accepted in its current form. It
> seems not to handle malformed UTF-16, which is a requirement for processing
> Windows file names (as I understand it -- please correct this if I'm
> wrong). Independent of this, I don't find the docs to be sufficient.
>
> Zach
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- 
Frédéric Bron
-----------------------------------------------------------
Frédéric Bron (frederic.bron_at_[hidden])
Villa des 4 chemins, Centre Hospitalier, BP 208
38506 Voiron Cedex
tél. fixe : +33 4 76 67 17 27, tél. port.: +33 6 67 02 77 35

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk