Subject: [Boost-bugs] [Boost C++ Libraries] #7744: make_u32regex() performs insufficient UTF-8 validation
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2012-11-28 08:31:12
#7744: make_u32regex() performs insufficient UTF-8 validation
------------------------------+---------------------------------------------
Reporter: anonymous | Owner: johnmaddock
Type: Bugs | Status: new
Milestone: To Be Determined | Component: regex
Version: Boost 1.52.0 | Severity: Problem
Keywords: |
------------------------------+---------------------------------------------
The program below segfaults when constructing the regular expression
".*\xf6.*". As far as I know, the largest value allowed as the lead byte
of a 4-byte UTF-8 sequence is 0xF4, so 0xF6 is invalid. I would expect an
exception instead of a crash.
The regular expression ".*\xe4.*" is constructed without any exception.
However, 0xE4 starts a 3-byte sequence and no trailing (continuation)
bytes follow it. I would expect an exception here too.
We use Boost 1.52.0 together with ICU 50.1. The behavior is the same on
Linux and Windows.
{{{
#include <boost/regex/icu.hpp>

int main(void)
{
    // This line does not throw an exception although the pattern is not
    // valid UTF-8.
    boost::u32regex(boost::make_u32regex(".*\xe4.*"));
    // This line segfaults.
    boost::u32regex(boost::make_u32regex(".*\xf6.*"));
    return 0;
}
}}}
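For reference, the two rules mentioned above (no lead byte above 0xF4, and every lead byte followed by the right number of continuation bytes) can be sketched as a standalone check. This is only a minimal illustration following the well-formedness rules of RFC 3629; `is_valid_utf8` is a hypothetical helper name and is not part of Boost.Regex or ICU, and it does not reflect how `make_u32regex()` is implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost or ICU function): returns true only if
// `s` is well-formed UTF-8 according to RFC 3629.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t trail = 0;    // number of continuation bytes expected
        unsigned char lo = 0x80;  // allowed range of the first continuation byte
        unsigned char hi = 0xBF;

        if (lead <= 0x7F) {       // ASCII, single byte
            ++i;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trail = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trail = 2;
            if (lead == 0xE0) lo = 0xA0;  // reject overlong encodings
            if (lead == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trail = 3;
            if (lead == 0xF0) lo = 0x90;  // reject overlong encodings
            if (lead == 0xF4) hi = 0x8F;  // reject code points above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF are never valid lead bytes
        }

        // Truncated sequence: fewer bytes remain than the lead byte announces.
        if (n - i <= trail) return false;

        const unsigned char first = static_cast<unsigned char>(s[i + 1]);
        if (first < lo || first > hi) return false;
        for (std::size_t k = 2; k <= trail; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) return false;
        }
        i += trail + 1;
    }
    return true;
}
```

With this sketch, both patterns from the report are rejected: ".*\xf6.*" because 0xF6 is not a valid lead byte, and ".*\xe4.*" because 0xE4 is followed by '.' rather than a continuation byte. A caller could run such a check before `make_u32regex()` and throw on failure, although the report's point is that the library itself should raise an exception.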
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/7744>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.