Subject: Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-07-19 10:39:10

on Tue Jul 19 2011, John Maddock wrote:

>>>> Boost.Filesystem needs the UTF-32 to UTF-16 and UTF-16 to UTF-32
>>>> adapters to implement char16_t and char32_t support. Do they have any
>>>> known bugs or other outstanding problems?
>>> Yes, they can read past the end of your input range if it contains
>>> invalid data at the end.
>> Interesting. Would a fix be difficult?
> I was about to say there aren't any known issues, but yes that is a
> problem - and a fix would mean changing the interface - the problem
> comes because the iterators only store the current position in the
> underlying sequence and assume that they can increment or decrement
> over a complete multi-byte sequence. So if your underlying sequence
> contains a *truncated* multibyte sequence at the start or end of the
> string then they can read past-the-end or even past-the-start :-(
> The only real fix is to redesign them to be range-based, so we can add
> the additional checks necessary, but of course this also makes them
> more heavyweight than they are at present. I guess I was hoping we
> would have had a proper Unicode library for this by now (in Boost that
> is, not the sandbox ;)
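
[Illustration of the range-based idea described above. This is only a minimal sketch, not the Boost.Regex adapters and not a proposed interface: the function name decode_utf16_checked and its signature are hypothetical. It shows the extra state a range-aware decoder needs, namely the end of the underlying sequence, so that a truncated surrogate pair is reported instead of being read through.]

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a Boost API: decode one code point from the
// UTF-16 range [pos, end). On success, stores it in `out`, advances `pos`
// past the consumed code units and returns true. On malformed or truncated
// input, returns false and leaves `pos` unchanged. It never dereferences
// `end` or anything beyond it.
inline bool decode_utf16_checked(const char16_t*& pos,
                                 const char16_t* end,
                                 char32_t& out)
{
    assert(pos <= end);
    if (pos == end)
        return false;                      // nothing left to decode

    const char16_t lead = *pos;
    if (lead < 0xD800 || lead > 0xDFFF) {  // not a surrogate: one unit
        out = static_cast<char32_t>(lead);
        ++pos;
        return true;
    }
    if (lead >= 0xDC00)
        return false;                      // lone trail surrogate
    if (end - pos < 2)
        return false;                      // truncated pair at end of range

    const char16_t trail = pos[1];
    if (trail < 0xDC00 || trail > 0xDFFF)
        return false;                      // lead not followed by a trail

    out = 0x10000
        + ((static_cast<char32_t>(lead) - 0xD800) << 10)
        + (static_cast<char32_t>(trail) - 0xDC00);
    pos += 2;
    return true;
}
```

[The cost mentioned above is visible here: every step needs the end pointer as well as the current position, so an iterator built on this would carry at least two positions, or three if it must also guard against stepping back past the start.]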

What about just asking people who aren't sure if they're processing
invalid unicode to add some sentinel bytes? Wouldn't that work?
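
[Illustration of the sentinel idea, as a rough sketch under stated assumptions. The helper pad_utf8_with_sentinels is hypothetical and not part of Boost. It assumes UTF-8 input, that three NUL bytes on each side are enough because a UTF-8 sequence is at most four bytes long, and that the decoder either stops at or tolerates a NUL where it expected a continuation byte. Whether the existing adapters behave that way is not established in this thread.]

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Boost API: copy `s` into a buffer with three
// NUL sentinel bytes before and after it. `begin_offset` and `end_offset`
// receive the positions of the real data inside the returned buffer.
// A decoder that trusts a lead byte at the last real position reads at most
// three further bytes, which land on the trailing sentinels; a decoder
// stepping backwards over continuation bytes from the first real position
// meets a leading NUL, which is not a continuation byte.
inline std::string pad_utf8_with_sentinels(const std::string& s,
                                           std::size_t& begin_offset,
                                           std::size_t& end_offset)
{
    const std::size_t pad = 3;             // max UTF-8 length minus one
    std::string buffer(pad, '\0');         // leading sentinels
    buffer += s;                           // the caller's data, unchanged
    buffer.append(pad, '\0');              // trailing sentinels
    begin_offset = pad;
    end_offset = pad + s.size();
    return buffer;
}
```

[The trade-off is that the caller has to copy or over-allocate the data, and the sentinels only keep the reads in bounds; they do not by themselves make the decoder report the truncated sequence as an error.]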

Dave Abrahams
BoostPro Computing
