Subject: Re: [boost] boost utf-8 code conversion facet has security problems
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-10-18 05:52:13


A few points:

1. wchar_t comes from C. It was first defined at a time when 16 bits really
   were enough, and its size was never specified.
2. Keeping the ABI compatible is very important, so you should just get used
   to the fact that wchar_t may be 2 or 4 bytes.
3. Ignoring code points outside the BMP is a very bad idea, so you should
   assume that std::wstring is UTF-16 or UTF-32, not UCS-2 or UCS-4!
4. If you write new applications, don't use wchar_t; use std::string and
   UTF-8. This is a perfectly good and correct solution.
5. If you support wchar_t, support UTF-16 and support it well.
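
To illustrate points 2 and 3, here is a minimal sketch of code that works with
either wchar_t size. The helper name count_code_points is hypothetical and is
not part of Boost; it assumes a 2-byte wchar_t holds UTF-16 and anything else
holds UTF-32.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): counts Unicode code points in a
// wide string. Assumes UTF-16 when wchar_t is 2 bytes, UTF-32 otherwise.
std::size_t count_code_points(const std::wstring& s)
{
    if (sizeof(wchar_t) != 2)
        return s.size();  // UTF-32: one code unit per code point

    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned long u = static_cast<unsigned long>(s[i]) & 0xFFFFul;
        bool lead = (u >= 0xD800 && u <= 0xDBFF);
        if (lead && i + 1 < s.size()) {
            unsigned long v = static_cast<unsigned long>(s[i + 1]) & 0xFFFFul;
            if (v >= 0xDC00 && v <= 0xDFFF)
                ++i;      // lead + trail surrogate form one code point
        }
        ++n;              // unpaired surrogates are counted as one unit each
    }
    return n;
}
```

Code that treats the string as UCS-2 would report 2 for a single non-BMP
character on a 16-bit wchar_t platform; this sketch reports 1 on both.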

So IMHO the facet we are talking about should support UTF-16 as well, and
should work with the correct definition of UTF-8.
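
As a rough sketch of what that could mean in practice, assuming "correct
definition" refers to the current one (at most 4 bytes per sequence, no
overlong forms, no encoded surrogates, nothing above U+10FFFF), a strict
UTF-8 to UTF-16 conversion might look like the following. The function name
utf8_to_utf16 is hypothetical; this is not the Boost facet's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the Boost facet): decodes strict UTF-8 into
// UTF-16 code units. Returns false on any ill-formed input.
bool utf8_to_utf16(const std::string& in, std::vector<unsigned short>& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned long c = static_cast<unsigned char>(in[i]);
        unsigned long cp, min;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; min = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; min = 0x10000; }
        else return false;                      // stray trail byte or 5/6-byte lead

        if (in.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned long t = static_cast<unsigned char>(in[i + k]);
            if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // encoded surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range

        if (cp < 0x10000) {
            out.push_back(static_cast<unsigned short>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return true;
}
```

Rejecting overlong forms is the security-relevant part: a decoder that accepts
them lets the same character be spelled in several byte sequences, which can
slip past byte-level checks done before decoding.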

You can blame the standards, you can blame Microsoft or even Unicode, but you
still have to live with it, and as long as you live with it, do it right.

Artyom

P.S.: I think UTF-16 should die:
 http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful
P.P.S.: As long as Boost supports wchar_t, I believe it should support UTF-16,
        even if it is a nightmare.

----- Original Message ----
> From: Sebastian Redl <sebastian.redl_at_[hidden]>
> To: boost_at_[hidden]
> Sent: Mon, October 18, 2010 9:36:17 AM
> Subject: Re: [boost] boost utf-8 code conversion facet has security problems
>
> On 18.10.2010 08:07, Patrick Horgan wrote:
> > On 10/16/2010 06:10 AM, Sebastian Redl wrote:
> >> On 16.10.2010, at 00:23, Patrick Horgan wrote:
> >>
> >>> Support of the recent C++ drafts requires a char32_t basic type anyway,
> >>> so I can't imagine anyone using a 16-bit wchar_t going forward,
> >> There's absolutely no way Windows programming will ever change wchar_t
> >> away from 16 bits, and people will continue to use it.
> > Then that implies that it can only hold UCS2. That's a choice. In C99, the
> > type wchar_t is officially intended to be used only for 32-bit ISO 10646
> > values, independent of the currently used locale. C99 subclause 6.10.8
> > specifies that the value of the macro __STDC_ISO_10646__
> > shall be "an integer constant of the form yyyymmL (for example, 199712L),
> > intended to indicate that values of type wchar_t are the coded
> > representations of the characters defined by ISO/IEC 10646, along with all
> > amendments and technical corrigenda as of the specified year and month."
> > Of course Microsoft isn't able to define that, since you can't hold 20
> > bits in a 16 bit data type.
>
> Microsoft defines wchar_t to be a UTF-16 2-byte unit, screw the standards.
>
> Sebastian
