From: Peter Dimov (pdimov_at_[hidden])
Date: 2020-01-08 01:16:01
Gavin Lambert wrote:
> But the conversion from WTF-8 to UCS-16 can interpret the joining point as
> a different character, resulting in a different sequence. Unless I've
> misread something, this could occur if the first string ended in an
> unpaired high surrogate and the second started with an unpaired low
> surrogate (or rather the WTF-8 equivalents thereof).
I don't see why you think this would present a problem. The conversion of
the first string will end in an unpaired high surrogate. The conversion of
the second string will start with an unpaired low surrogate. The two, when
concatenated, will form a valid UTF-16 encoding of a non-BMP character.
Where is the issue here?
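
To make the scenario concrete, here is a minimal Python sketch of it. It assumes Python's `surrogatepass` error handler is an acceptable stand-in for a WTF-8 decoder on these particular inputs. The helper name `wtf8_to_utf16_units` and the sample strings are hypothetical, chosen only for illustration.

```python
def wtf8_to_utf16_units(data):
    """Decode WTF-8-style bytes and return the UTF-16 code units as ints.

    Assumption: "surrogatepass" lets the UTF-8 codec accept the 3-byte
    encodings of lone surrogates, which is how WTF-8 represents them.
    """
    text = data.decode("utf-8", "surrogatepass")
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# First string ends in an unpaired high surrogate (U+D83D -> ED A0 BD).
first = b"a\xed\xa0\xbd"
# Second string starts with an unpaired low surrogate (U+DE00 -> ED B8 80).
second = b"\xed\xb8\x80b"

# Convert each string separately, then concatenate the UTF-16 code units.
units = wtf8_to_utf16_units(first) + wtf8_to_utf16_units(second)

# The concatenation is valid UTF-16: a strict decode succeeds and the two
# surrogates are read as a single non-BMP character (U+1F600).
raw = b"".join(u.to_bytes(2, "little") for u in units)
joined = raw.decode("utf-16-le")
```

Here `units` holds the four code units 0x0061, 0xD83D, 0xDE00, 0x0062, and the strict UTF-16 decode reads the middle two as one surrogate pair.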