[Boost-bugs] [Boost C++ Libraries] #7744: make_u32regex() performs insufficient UTF-8 validation

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2012-11-28 08:31:12


#7744: make_u32regex() performs insufficient UTF-8 validation
------------------------------+---------------------------------------------
 Reporter: anonymous | Owner: johnmaddock
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: regex
  Version: Boost 1.52.0 | Severity: Problem
 Keywords: |
------------------------------+---------------------------------------------
 The program below segfaults while constructing the regular expression
 ".*\xf6.*". AFAIK the highest valid lead byte for a 4-byte UTF-8 sequence is
 0xF4, so 0xF6 can never occur in well-formed UTF-8. I would expect an
 exception instead of a crash.

 The regular expression ".*\xe4.*" is created without an exception. However,
 0xE4 is the lead byte of a 3-byte sequence, and the two continuation bytes
 it requires are missing. I would expect an exception here too.

 We use Boost 1.52.0 together with ICU 50.1. The behavior is the same on
 Linux and Windows.

 {{{
 #include <boost/regex/icu.hpp>

 int main()
 {
     // Invalid UTF-8: 0xE4 starts a 3-byte sequence, but no continuation
     // bytes follow. This line does not throw an exception.
     boost::u32regex(boost::make_u32regex(".*\xe4.*"));
     // Invalid UTF-8: 0xF6 is never a valid lead byte. This line segfaults.
     boost::u32regex(boost::make_u32regex(".*\xf6.*"));
     return 0;
 }
 }}}
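
 Until the library validates its input, a caller could reject malformed
 patterns before they reach make_u32regex(). The following is only a sketch
 of such a check, not the library's own validation: is_well_formed_utf8 is a
 hypothetical helper (it is not part of Boost.Regex or ICU), and the byte
 ranges are taken from the well-formed sequence rules in RFC 3629.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Regex or ICU.
// Returns true if `s` is well-formed UTF-8 according to RFC 3629.
bool is_well_formed_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) {          // single-byte (ASCII) character
            ++i;
            continue;
        }

        std::size_t len = 0;       // total length of this sequence
        unsigned char lo = 0x80;   // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;   // reject overlong encodings
            if (b0 == 0xED) hi = 0x9F;   // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;   // reject overlong encodings
            if (b0 == 0xF4) hi = 0x8F;   // reject values above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF are never lead bytes
        }

        if (n - i < len) {
            return false;  // sequence is truncated at the end of the input
        }
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) {
            return false;
        }
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) {
                return false;  // not a continuation byte
            }
        }
        i += len;
    }
    return true;
}
```

 With this check, both patterns from the program above are rejected:
 ".*\xe4.*" because the byte after 0xE4 is not a continuation byte, and
 ".*\xf6.*" because 0xF6 is not a valid lead byte.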

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/7744>
