Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-06-13 20:59:13


> > Now as you have seen there are many possible "non-standard" UTF-8 variants.
> >
> > What should I accept?
>
> I still strongly suggest you simply call RtlUTF8ToUnicodeN()
> (https://msdn.microsoft.com/en-us/library/windows/hardware/ff563018(v=vs.85).aspx)
> to do the UTF-8 conversion. Do **nothing** else.
>
>
> Niall, could you explain why? I don't know any of the Windows-relevant
> details.

1. RtlUTF8ToUnicodeN() is what the NT kernel uses and isn't polluted by
Win32.

2. RtlUTF8ToUnicodeN() has a well-designed API, unlike the awful Win32
MultiByteToWideChar() function.

3. RtlUTF8ToUnicodeN() is nearly as fast as any implementation of the
same thing, unlike MultiByteToWideChar() and some STL implementations of
<codecvt>.

4. RtlUTF8ToUnicodeN() treats invalid input the way the rest of the
NT kernel is built to expect. I caveat this by saying that Win32
functions can mangle input in a really unhelpful way, so I would only
trust RtlUTF8ToUnicodeN() output that is fed directly to NT kernel APIs.
That combination I know works well.

5. I've been using RtlUTF8ToUnicodeN() in my own code for years and have
found it unsurprising and unproblematic, unlike MultiByteToWideChar() or
C++11's UTF-8 support, which doesn't quite work right on some standard
libraries.
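For readers who, like the question above, don't know the Windows details, here is a minimal sketch of what "simply call RtlUTF8ToUnicodeN()" might look like from user-mode C++17. It is not Nowide's or the poster's code. It assumes a Windows version whose ntdll.dll exports the function (Windows 7 and later, per its documentation), takes the parameter order and the STATUS_SOME_NOT_MAPPED value (0x107, "some input was replaced") from the MSDN page linked above, and resolves the function with GetProcAddress on the assumption that user-mode SDK headers don't declare it. The names utf8_to_utf16, RtlUTF8ToUnicodeNFn and kStatusSomeNotMapped are hypothetical.

```cpp
// Windows-only sketch (C++17): UTF-8 -> UTF-16 via ntdll's RtlUTF8ToUnicodeN.
#include <string>
#include <string_view>

#ifdef _WIN32
#include <windows.h>

// Assumed signature, from the RtlUTF8ToUnicodeN documentation (wdm.h).
// User-mode headers are assumed not to declare it, so it is looked up at runtime.
using RtlUTF8ToUnicodeNFn = LONG(NTAPI*)(wchar_t* dst, ULONG dstMaxBytes,
                                         ULONG* dstActualBytes,
                                         const char* src, ULONG srcBytes);

// NTSTATUS meaning "converted, but invalid input was replaced" (assumed value).
constexpr LONG kStatusSomeNotMapped = 0x00000107;

// Hypothetical helper: returns false on failure; sets `replaced` when the
// converter reported that invalid UTF-8 was substituted.
bool utf8_to_utf16(std::string_view in, std::wstring& out, bool& replaced) {
    out.clear();
    replaced = false;
    if (in.empty()) return true;
    if (in.size() > MAXULONG) return false;  // the API takes 32-bit byte counts

    static const auto convert = reinterpret_cast<RtlUTF8ToUnicodeNFn>(
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "RtlUTF8ToUnicodeN"));
    if (!convert) return false;

    const ULONG srcBytes = static_cast<ULONG>(in.size());
    ULONG bytes = 0;
    // Pass 1: a null destination asks only for the required size in bytes.
    LONG status = convert(nullptr, 0, &bytes, in.data(), srcBytes);
    if (status < 0) return false;  // negative NTSTATUS = error severity

    // Pass 2: convert into an exactly sized buffer.
    out.resize(bytes / sizeof(wchar_t));
    status = convert(out.data(), bytes, &bytes, in.data(), srcBytes);
    if (status < 0) {
        out.clear();
        return false;
    }
    out.resize(bytes / sizeof(wchar_t));

    replaced = (status == kStatusSomeNotMapped);
    return true;
}
#endif  // _WIN32
```

The size-query-then-convert pattern, plus a status that separates "clean conversion" from "converted with replacements", is one plausible reading of the API quality in point 2 and the invalid-input handling in point 4; check the documented return codes before relying on the exact values.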

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
