Subject: [Boost-bugs] [Boost C++ Libraries] #9473: make_u32regex() accepts illegal UTF-8
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-12-05 11:23:06
#9473: make_u32regex() accepts illegal UTF-8
-----------------------------------------+-------------------------
Reporter: Peter Klotz <peter.klotz@…> | Owner: johnmaddock
Type: Bugs | Status: new
Milestone: To Be Determined | Component: regex
Version: Boost 1.54.0 | Severity: Problem
Keywords: |
-----------------------------------------+-------------------------
The attached example shows that make_u32regex() accepts two kinds of
illegal UTF-8.
It accepts codepoints reserved for UTF-16 surrogate pairs encoded as
3-byte UTF-8 characters, e.g. "\xed\xa0\x80" representing U+D800.
It accepts overlong UTF-8 encodings, in which the codepoint value has been
padded with leading zero bits, e.g. "\xc0\x80", which decodes to U+0000
although the only valid encoding of U+0000 is the single byte "\x00".
Boost.Locale already contains code to protect against overlong encodings
(see method width() in
https://svn.boost.org/svn/boost/trunk/boost/locale/utf.hpp).
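For illustration, here is a minimal sketch of the strictness being asked for.
It is not the Boost.Regex or Boost.Locale implementation. The function name
is_strict_utf8 is hypothetical, and the rules it applies are those of
RFC 3629: reject overlong forms, surrogate codepoints and values above
U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Boost API): returns true only if `s` is
// well-formed UTF-8 under the RFC 3629 rules.
bool is_strict_utf8(const std::string& s)
{
    // Smallest codepoint that legitimately needs a sequence of 1..4 bytes.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;

        // The lead byte determines the sequence length and the top payload bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;              // stray continuation or invalid lead byte

        if (n - i < len) return false;  // truncated sequence

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range

        i += len;
    }
    return true;
}
```

With this sketch, is_strict_utf8("\xed\xa0\x80") and is_strict_utf8("\xc0\x80")
both return false, while a valid sequence such as "\xe2\x82\xac" (U+20AC) is
accepted.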
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/9473>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.