[Boost-bugs] [Boost C++ Libraries] #9473: make_u32regex() accepts illegal UTF-8

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-12-05 11:23:06


#9473: make_u32regex() accepts illegal UTF-8
-----------------------------------------+-------------------------
 Reporter: Peter Klotz <peter.klotz@…> | Owner: johnmaddock
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: regex
  Version: Boost 1.54.0 | Severity: Problem
 Keywords: |
-----------------------------------------+-------------------------
 The attached example shows that make_u32regex() accepts two kinds of
 illegal UTF-8.

 It accepts code points in the range reserved for UTF-16 surrogates
 (U+D800 to U+DFFF) when they are encoded as 3-byte UTF-8 sequences,
 e.g. "\xed\xa0\x80" representing U+D800.
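
 To make the surrogate case concrete, here is a minimal sketch of the
 check a strict decoder could apply. The helpers decode3() and
 is_surrogate() are hypothetical names for illustration only; they are
 not part of Boost.Regex or any other Boost library.

```cpp
#include <cstdint>

// Hypothetical helper (not a Boost API): combine the payload bits of a
// 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a code point.
// It performs no validation by itself.
constexpr std::uint32_t decode3(unsigned char b0, unsigned char b1,
                                unsigned char b2)
{
    return (static_cast<std::uint32_t>(b0 & 0x0Fu) << 12)
         | (static_cast<std::uint32_t>(b1 & 0x3Fu) << 6)
         |  static_cast<std::uint32_t>(b2 & 0x3Fu);
}

// Hypothetical helper: true for code points reserved for UTF-16
// surrogates, which must never appear in well-formed UTF-8.
constexpr bool is_surrogate(std::uint32_t cp)
{
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

// "\xed\xa0\x80" decodes to U+D800, so a strict decoder should reject it.
static_assert(decode3(0xED, 0xA0, 0x80) == 0xD800u, "decodes to U+D800");
static_assert(is_surrogate(decode3(0xED, 0xA0, 0x80)), "must be rejected");

// "\xed\x9f\xbf" is U+D7FF, the last code point before the surrogate
// range, and is legal.
static_assert(!is_surrogate(decode3(0xED, 0x9F, 0xBF)), "U+D7FF is legal");
```

 A decoder that runs such a check after assembling the code point would
 report "\xed\xa0\x80" as an error instead of passing U+D800 on.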

 It accepts overlong UTF-8 encodings, where the codepoint value has been
 padded with leading zero bits so that it occupies more bytes than
 necessary, e.g. "\xc0\x80" representing U+0000, whose correct 1-byte
 encoding is "\x00".
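
 The overlong case can be detected by comparing the decoded value with
 the smallest code point that actually needs the given number of bytes.
 The following is a rough sketch of that idea; decode2(),
 min_code_point() and is_overlong() are hypothetical names and do not
 reflect how Boost.Regex or Boost.Locale implement it.

```cpp
#include <cstdint>

// Hypothetical helper (not a Boost API): combine the payload bits of a
// 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
constexpr std::uint32_t decode2(unsigned char b0, unsigned char b1)
{
    return (static_cast<std::uint32_t>(b0 & 0x1Fu) << 6)
         |  static_cast<std::uint32_t>(b1 & 0x3Fu);
}

// Smallest code point that legitimately requires `len` bytes in UTF-8
// (len is assumed to be between 1 and 4).
constexpr std::uint32_t min_code_point(int len)
{
    return len == 1 ? 0x0u
         : len == 2 ? 0x80u
         : len == 3 ? 0x800u
         :            0x10000u;
}

// A sequence is overlong if its value would have fit in fewer bytes.
constexpr bool is_overlong(std::uint32_t cp, int len)
{
    return cp < min_code_point(len);
}

// "\xc0\x80" decodes to U+0000 but uses two bytes: overlong.
static_assert(decode2(0xC0, 0x80) == 0x0u, "decodes to U+0000");
static_assert(is_overlong(decode2(0xC0, 0x80), 2), "must be rejected");

// "\xc2\x80" is U+0080, the first code point that really needs two bytes.
static_assert(!is_overlong(decode2(0xC2, 0x80), 2), "U+0080 is legal");
```

 With this rule, every 2-byte sequence starting with 0xC0 or 0xC1 is
 rejected, since such sequences can only encode values below U+0080.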

 Boost.Locale already contains code to protect against overlong encodings
 (see method width() in
 https://svn.boost.org/svn/boost/trunk/boost/locale/utf.hpp).

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/9473>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
