Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Groke, Paul (paul.groke_at_[hidden])
Date: 2017-06-13 09:30:46
Artyom Beilis wrote:
> On Tue, Jun 13, 2017 at 12:05 AM, Peter Dimov via Boost
> <boost_at_[hidden]> wrote:
>> Artyom Beilis wrote:
>>> - User creates a file with invalid UTF-16
>>> - System monitors the file system and adds it to the XML report
>>> WTF-8 format
>>> - The central server does not accept the XML since it fails UTF-8
>>> - User does whatever he wants without monitoring
>>> - It removes the file
>>> - There were no reports generated during the period user needed
>>> - DoS attack
>> I can't help but note that the same attack would work under Unix. The
>> user can easily create a file with an invalid UTF-8 name. And, since
>> the library doesn't enforce valid UTF-8 on POSIX (right?) it would pass
> Note, under POSIX the user takes strings as is and can't trust the source.
> Under Windows he needs to convert them using Nowide, which can give him
> the false assumption that he receives valid UTF-8.
OK, thanks for explaining, I understand your concern now. I don't agree that a library like Nowide should necessarily try to address that kind of issue though.
Regarding your example... isn't the whole point of Nowide to make it possible to write portable applications without #ifdef-ing stuff? So ... for this to create a problem on Windows (and at the same time not create a problem on POSIX), a bunch of things would have to coincide:
- Programmer didn't read the manual and therefore assumes he gets valid UTF-8 on Windows
- Application then wants to write some cmdline argument/path/... into some document that requires valid UTF-8
- The software component that builds/writes that document fails to validate the UTF-8 itself
- Programmer decides to skip the UTF-8 validation on Windows, because he assumes it has already been done
Assuming that the application doesn't need an #ifdef _WIN32 in that part of the code anyway, that would mean that the programmer deliberately introduced an #ifdef _WIN32, just to selectively skip some validation on Windows, because he thinks that it might already have been done.
No, I wouldn't say that's something that should be driving the design decisions of a library like Nowide.
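To illustrate the point about the document writer validating its own input: a check like that is platform-independent, so it needs no #ifdef at all. A minimal sketch, assuming a hypothetical helper named is_valid_utf8 (not part of Nowide or any Boost library), which rejects overlong forms, encoded surrogates and code points above U+10FFFF:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Nowide: returns true only if s is
// well-formed UTF-8. WTF-8 encoded lone surrogates are rejected.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { ++i; continue; }                      // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0)     { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;                // stray continuation or invalid lead byte

        if (n - i < len) return false;    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;            // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong 3-byte forms and encoded surrogates (U+D800..U+DFFF).
        if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return false;
        // Reject overlong 4-byte forms and values beyond U+10FFFF.
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        i += len;
    }
    return true;
}
```

The same call would be made on Windows and on POSIX before a path or command-line argument is written into the XML, whatever the origin of the string.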
Additionally, I don't see what validating the UTF-8 would improve here. The user could still create a file with an invalid UTF-16 name. The only difference is that the application would then fail to enumerate the directory or convert the path at all. And I'm not convinced that kind of error is more likely to be handled correctly.
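The two behaviours being compared can be sketched as follows. This is only an illustration under assumptions: utf16_to_utf8 and its allow_lone_surrogates flag are hypothetical names, not Nowide's API. In strict mode an unpaired surrogate makes the conversion fail; in lenient (WTF-8 style) mode it is encoded as a 3-byte sequence so the name survives the round trip.

```cpp
#include <cstddef>
#include <string>

// Hypothetical converter, not Nowide's API. Returns false (strict mode only)
// when the input contains an unpaired surrogate.
bool utf16_to_utf8(const std::u16string& in, bool allow_lone_surrogates, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF && !allow_lone_surrogates) {
            return false;                 // strict: invalid UTF-16 is an error
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {        // includes lone surrogates in lenient mode
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return true;
}
```

With the strict variant, every caller that enumerates a directory has to handle the failure path for such a file name; with the lenient variant the name passes through, but the result is not valid UTF-8.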
> Once again I have no problem providing wtf8 to wide and other way around
> functions when user EXPLICITLY says it.
> But it shall not be default behavior or some behavior you turn on with some
> global define.
The whole appeal of a library like Nowide is that it works more or less transparently, i.e. without having to write conversion calls all over the place. And duplicating all the APIs is something that I would deem a very bad choice; that would be confusing as hell.