Subject: Re: [boost] boost utf-8 code conversion facet has security problems
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-10-15 05:18:56
Actually, I want to mention that the UTF-8 codecvt facet implementation
has several other problems:
1. When sizeof(wchar_t)==2 it supports only UCS-2 and not full UTF-16.
2. It indeed does not strictly assume that the maximal encoding of a
single UTF-8 character is 4 bytes.
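To illustrate both points, here is a minimal sketch (not the code from the
Boost facet or from Boost.Locale; the function names encode_utf16 and
utf8_sequence_length are hypothetical). It shows how a code point above
U+FFFF needs a surrogate pair when wchar_t is 16 bits wide, which a
UCS-2-only facet cannot produce, and how a lead-byte check can refuse
anything longer than 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Encodes one code point as UTF-16. Returns the number of 16-bit units
// written to out (1 or 2), or 0 if cp is not a valid Unicode scalar value.
inline std::size_t encode_utf16(std::uint32_t cp, std::uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;  // out of range, or a surrogate code point
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one unit, identical to UCS-2.
        out[0] = static_cast<std::uint16_t>(cp);
        return 1;
    }
    // Supplementary planes: split the remaining 20 bits into a pair.
    cp -= 0x10000;
    out[0] = static_cast<std::uint16_t>(0xD800 + (cp >> 10));    // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}

// Returns the total length (1..4) of a UTF-8 sequence given its lead byte,
// or 0 if the byte cannot start a valid sequence.
inline std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    // Continuation bytes, the overlong leads 0xC0/0xC1, and 0xF5..0xFF,
    // which includes the lead bytes of the old 5- and 6-byte forms.
    return 0;
}
```

For example, U+1F600 comes out as the two units 0xD83D 0xDE00, while the
lead bytes 0xF8 and 0xFC (old 5- and 6-byte forms) are rejected.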
In Boost.Locale I have implemented a full UTF-8 codecvt facet
that supports both UTF-16 and UTF-32. I assume that this code
can replace the current implementation, even though it would have
to be extracted from Boost.Locale, as this facet is more generic
and supports other encodings as well.
Note that this UTF-8 facet does not depend on any external library.
> I've been meaning to mention this for some time. The boost utf-8 code
> conversion facet implements an early spec of utf-8 that allows up to 6
> byte representations but current specs, and security issues suggest it
> should only support up to four. See
> http://en.wikipedia.org/wiki/UTF-8 and in particular the section on
> invalid byte sequences. It also has some stuff wrong, like do_length()
> is supposed to only tell you length of valid code sequences, but the
> boost implementation doesn't check for validity.
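As a rough sketch of the validity checking discussed above (assuming the
usual rules: at most 4 bytes, no overlong forms, no surrogates, nothing
above U+10FFFF), a do_length()-style helper could look like the following.
The names decode_utf8 and valid_utf8_length are hypothetical, and this is
not the Boost or Boost.Locale implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Decodes one UTF-8 sequence from [p, end). On success stores the code
// point in cp and returns the number of bytes consumed (1..4). Returns 0
// if the sequence is invalid or truncated.
inline std::size_t decode_utf8(const unsigned char* p, const unsigned char* end,
                               std::uint32_t& cp)
{
    if (p == end) return 0;
    const unsigned char lead = *p;
    std::size_t len = 0;
    std::uint32_t min = 0;  // smallest code point allowed for this length
    if (lead < 0x80)                { cp = lead; return 1; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; min = 0x80;    cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; min = 0x800;   cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; min = 0x10000; cp = lead & 0x07; }
    else return 0;  // stray continuation byte or obsolete 5/6-byte lead

    if (static_cast<std::size_t>(end - p) < len) return 0;  // truncated input
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min) return 0;                      // overlong encoding
    if (cp > 0x10FFFF) return 0;                 // beyond the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogate code point
    return len;
}

// Returns how many bytes at the start of [begin, end) form valid UTF-8
// sequences, consuming at most max_chars code points and stopping at the
// first invalid sequence.
inline std::size_t valid_utf8_length(const unsigned char* begin,
                                     const unsigned char* end,
                                     std::size_t max_chars)
{
    const unsigned char* p = begin;
    std::uint32_t cp = 0;
    for (; max_chars > 0; --max_chars) {
        const std::size_t n = decode_utf8(p, end, cp);
        if (n == 0) break;  // invalid or truncated: do not count it
        p += n;
    }
    return static_cast<std::size_t>(p - begin);
}
```

With the input bytes 41 C3 A9 C0 80, valid_utf8_length reports 3: the
overlong pair C0 80 is not counted as a valid sequence.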