Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Artyom Beilis (artyom.beilis_at_[hidden])
Date: 2017-06-14 04:51:48


>> This is a very good question.
>>
>> On windows I need to convert for obvious reason.
>>
>> Now the question is what I accept as valid and what is not valid and
>> where do I draw the line.
>> ---------------------------------------------------------------------------------------------------------------------------
>>
>> Now as you have seen there are many possible "non-standard" UTF-8 variants.
>>
>> What should I accept?
>
> I still strongly suggest you simply call RtlUTF8ToUnicodeN()
> (https://msdn.microsoft.com/en-us/library/windows/hardware/ff563018(v=vs.85).aspx)
> to do the UTF-8 conversion. Do **nothing** else.
>
> Niall
>

Actually, I think you've pointed me in a good direction I hadn't considered before.

RtlUTF8ToUnicodeN and its counterpart for the opposite direction do
something very simple:

They substitute invalid code points/encodings with U+FFFD (REPLACEMENT
CHARACTER), which is the standard Unicode way to say "I failed to
convert this".

This is similar to the current ANSI / wide conversions, which produce ? instead.

It looks like a better approach than failing to convert the entire
string altogether.

If you pass an invalid string, the conversion will succeed, but you'll get
special characters (usually shown as � in the UI)
that actually tell you something was wrong.
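
To make the idea concrete, here is a minimal sketch of such a lossy
UTF-8 to UTF-16 conversion in portable C++. It is not the code of
RtlUTF8ToUnicodeN or of Boost.Nowide: the name utf8_to_utf16_lossy is
hypothetical, and the policy of emitting one U+FFFD per offending byte is
an assumption made for illustration (a real implementation may group
invalid bytes differently).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Boost.Nowide or Windows API function).
// Converts UTF-8 to UTF-16 and never fails: every byte that cannot start
// a valid sequence is replaced by U+FFFD (REPLACEMENT CHARACTER).
std::u16string utf8_to_utf16_lossy(const std::string& in)
{
    const char16_t replacement = char16_t(0xFFFD);
    std::u16string out;
    const std::size_t n = in.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::uint32_t min = 0;  // smallest value allowed for this length
        std::size_t len = 0;    // sequence length, 0 = invalid lead byte

        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }

        // The sequence must fit in the input and use proper continuation bytes.
        bool ok = (len != 0) && (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (!ok) {
            out.push_back(replacement);  // mark the bad byte and resynchronize
            ++i;
            continue;
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this sketch, the input "a", 0xFF, "b" converts to "a", U+FFFD, "b"
rather than producing an error, so the caller always gets a string back and
the damage stays visible in the output.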

This way, for example, getenv on a valid key will not return NULL and
create ambiguity about what happened; it is also the more common
behavior on Windows.

I like it, and I think I'll change the behavior of the conversion
functions in Boost.Nowide to this one.

Thanks!

    Artyom Beilis

