Boost logo

Boost :

Subject: [boost] [review] Boost.Nowide is Accepted into Boost
From: Frédéric Bron (frederic.bron_at_[hidden])
Date: 2017-06-23 13:34:09


Dear all,

I would like to thank all the people who took part in the discussions
during the review. All contributors admitted that Boost.Nowide
addresses a real issue and solves it, at least partially. Moreover we
received 8 official reviews with 6 “accept”, 1 “do not accept as is”
and 1 “I am not against it”.

So Boost.Nowide is accepted for inclusion in Boost.
Congratulation and many thanks to Artyom Beilis for this contribution.

However, 2 major issues arose in the discussions and they will have to
be fixed before inclusion:

1. handling of ill-formed Unicode input should be improved/clarified
2. the documentation needs improvements

In particular, this needs to be fixed:

* Design:

There was a lot of discussions on what to do with ill-formed UTF-16
strings on Windows, in particular coming from the file system while
converting them to narrow UTF-8 strings. Basically, 3 options:
1. allow the roundtrip ill formed UTF-16 > UTF-8 > ill formed UTF-16,
this is not possible with UTF-8 encoding, WTF-8 (not a standard) may
be an option with some issues
2. issue an error (today’s nowide option, even if the author would
like to move to 3.)
3. convert as much as possible to fully conformant UTF-8 with
character replacement U+FFFD for invalid input. This is what the NT
kernel does with functions like RtlUTF8ToUnicodeN. This would allow
cout to continue to work after invalid output instead of failing.

After reviewing what was chosen for std::filesystem in C++17, I think
3. is the right think to do:
- std::filesystem::path::u8string does not convert strings on Posix.
If a conversion is done (on Windows) the output is fully conformant
UTF-8.
- the roundtrip is not guaranteed in std::filesystem::path conversions
(implementation defined) but the roundtrip of a converted paths is
guaranteed to work which is the case with Boost.Nowide: once in fully
conformant UTF-8, narrow > wide > narrow works fine.
- the use of the replacement character U+FFFD is what is done by the
NT kernel (question: do you have to reimplement RtlUTF8ToUnicodeN and
RtlUnicodeToUTF8N, why not using them)?

It should been made clear in the doc that this choice does not
guarantees the roundtrip on Windows so that ill-formed filename may
not be opened.

* Documentation:

- the main part of the doc (the non-reference part) should be more detailed
- it should be clear to the reader that Windows and Posix platforms
are not handled the same; on Windows, we get UTF-8 strings while on
Posix platforms, the user should not expect UTF-8 encoding, just
narrow strings, even if UTF-8 is today the most common encoding
(addressed in Q/A but is probably too far away). It must be clear from
the beginning that on Posix platforms, the library does nothing
(including no checking of UTF-8 conformance), just forward to the
standard library.
- review the use of UTF-8 so that it do not let think that narrow
strings are always UTF-8 as this is not necessary the case on Posix
platform; maybe just use narrow string?
- clarify what is converted to what and when: for each function, we
need to know precisely what is the output for all possible input; in
particular, better explain for each function what the library does
with ill-formed input (either UTF-16 input or narrow input), in
particular for args and getenv; say clearly what happens (failbit,
error, exception, invalid character…)
- clarify what is done for Windows and what is (not) done for Posix
- explain better what is behind nowide::cout, cin, cerr, what is done?
- clarify what does the filesystem integration on Windows vs Posix
- check if basic_stackstring::swap needs to swap buffer_ when both
strings are on the stack
- The doc/ directory is missing a Jamfile, and there's also no
meta/libraries.json.

* Implementation:

- The global BOOST_USE_WINDOWS_H macro should probably be respected.

When windows.h is not used, the WinAPI prototypes are declared in the
global namespace. This can cause issues when windows.h is also
included before/after the Nowide headers, and the prototypes don't
match exactly (e.g. short vs. wchar_t). Also it pollutes the global
namespace. Nowide should do what other Boost libraries do, and declare
the prototypes in a private detail namespace instead.

Example in <boost/thread/win32/thread_primitives.hpp>. Need to add a
namespace around your definitions.

- MinGW gcc: fix warnings in c++03 mode and errors in c++11 and above
- Cygwin gcc/clang (errors because of missing ::getenv and friends)
- when this is fixed, an error remains:
in file included from test\test_system.cpp:11:0:
..\../boost/nowide/cenv.hpp: In function 'int
boost::nowide::putenv(char*)': ..\../boost/nowide/cenv.hpp:104:41:
warning: comparison between pointer and zero character constant
[-Wpointer-compare] while(*key_end!='=' && key_end!='\0')
- Under MSYS, one test failed
Fail: Error boost::nowide::cin.putback(c) in test_iostream.cpp:17 main
- filesystem integration should return somehow the previous locale for
potential undoing

* Misc:

- There are no .travis.yml and appveyor.yml files in the root.
- a number of copy/paste references to Boost.System (test/Jamfile) and
Boost.Locale (index.html), should be corrected

This are just suggestions and are not mandatory for acceptance:
- implement stat/opendir/readir
- integration with boost::interprocess::file_lock
- integration with boost::program_options (in particular for correct
line breaks when showing options)

Frédéric


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk