Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2017-06-12 19:02:29

On Mon, Jun 12, 2017 at 9:51 PM, Peter Dimov via Boost <
boost_at_[hidden]> wrote:

> ... whereas under Windows if invalid UTF-8 is allowed many different byte
> sequences may map to the same file name.

This is a false presumption. Nobody here proposes allowing absolutely ANY
byte sequence, only using WTF-8 as a means of guaranteeing a round trip. And
as far as WTF-8 goes, there is a unique representation for every sequence of
16-bit code units.
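
To make the uniqueness claim concrete, here is a minimal sketch in Python.
The names wtf8_encode and wtf8_decode are hypothetical helpers written for
this illustration, not part of Nowide or any other library, and the decoder
is only meant to be fed the encoder's own output. It checks that a handful
of 16-bit code unit sequences, including ones with unpaired surrogates,
round-trip and map to distinct byte strings.

```python
def wtf8_encode(units):
    """Encode 16-bit code units as WTF-8 (hypothetical helper, not Nowide's API)."""
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A lead surrogate followed by a trail surrogate forms one code point.
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # "surrogatepass" lets a lone surrogate through as a 3-byte sequence.
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def wtf8_decode(data):
    """Decode bytes produced by wtf8_encode back into 16-bit code units."""
    units = []
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            # Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


samples = [
    [0x0041],                  # plain ASCII
    [0xD83D],                  # unpaired lead surrogate
    [0xDE00],                  # unpaired trail surrogate
    [0xD83D, 0xDE00],          # valid pair (U+1F600)
    [0xDE00, 0xD83D],          # two unpaired surrogates, wrong order
    [0xD83D, 0x0041, 0xDE00],  # surrogates separated by ASCII
]
encoded = [wtf8_encode(s) for s in samples]
round_trips = all(wtf8_decode(e) == s for e, s in zip(encoded, samples))
all_distinct = len(set(encoded)) == len(samples)
```

A valid pair is always emitted as one 4-byte sequence and never as two
3-byte surrogate sequences, which is what keeps the mapping one-to-one.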

> With all that said, I don't quite see the concern with WTF-8. What's the
> attack we're defending from by disallowing it?

There are some concerns with WTF-8. Specifically, if you concatenate two
WTF-8 strings where the first ends in an unpaired lead surrogate and the
second begins with an unpaired trail surrogate, the result is an invalid
WTF-8 string. Filenames are usually parsed and concatenated on ASCII
separators, so I don't see a problem in the typical use case. As for the
non-typical use cases, I would argue that they are beyond the
responsibility of this library.
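
The concatenation problem can be sketched in a few lines of Python. Again,
wtf8_encode is a hypothetical helper for illustration only, and the sample
code units are made up. Byte-wise concatenation leaves a surrogate pair
encoded as two 3-byte sequences, whereas encoding the concatenated code
units yields a single 4-byte sequence. Joining on an ASCII separator does
not have this issue.

```python
def wtf8_encode(units):
    """Encode 16-bit code units as WTF-8 (hypothetical helper, not Nowide's API)."""
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A lead surrogate followed by a trail surrogate forms one code point.
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # "surrogatepass" lets a lone surrogate through as a 3-byte sequence.
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


head = [0x0041, 0xD83D]  # "A" followed by an unpaired lead surrogate
tail = [0xDE00, 0x0042]  # unpaired trail surrogate followed by "B"

# Naive byte concatenation: the pair stays split across two 3-byte sequences.
naive = wtf8_encode(head) + wtf8_encode(tail)

# Encoding the concatenated code units: the pair becomes one 4-byte sequence.
canonical = wtf8_encode(head + tail)

# Joining on an ASCII separator keeps the surrogates apart, so the bytes agree.
joined_on_sep = wtf8_encode(head) + b"/" + wtf8_encode(tail)
```

Here naive and canonical differ even though they describe the same 16-bit
sequence, while joined_on_sep is identical to encoding head, "/" and tail
in one go.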

Yakov Galka
