
Subject: Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-07-19 10:39:10

Next message: Dave Abrahams: "Re: [boost] [contract] oldof failure not in N1962?"
Previous message: Edward Diener: "Re: [boost] [TTI] Review for The Type Traits Introspection library by Edward Diener **extended**"
In reply to: John Maddock: "Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?"
Next in thread: Soares Chen Ruo Fei: "Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?"

on Tue Jul 19 2011, John Maddock <boost.regex-AT-virgin.net> wrote:

>>>> Boost.Filesystem needs the UTF-32 to UTF-16 and UTF-16 to UTF-32
>>>> adapters to implement char16_t and char32_t support. Do they have any
>>>> known bugs or other outstanding problems?
>>>
>>> Yes, they can read past the end of your input range if it contains
>>> invalid data at the end.
>>
>> Interesting. Would a fix be difficult?
>
> I was about to say there aren't any known issues, but yes that is a
> problem - and a fix would mean changing the interface - the problem
> comes because the iterators only store the current position in the
> underlying sequence and assume that they can increment or decrement
> over a complete multi-byte sequence. So if your underlying sequence
> contains a *truncated* multi-byte sequence at the start or end of the
> string then they can read past-the-end or even past-the-start :-(
>
> The only real fix is to redesign them to be range-based, so we can add
> the additional checks necessary, but of course this also makes them
> more heavyweight than they are at present. I guess I was hoping we
> would have had a proper Unicode library for this by now (in Boost that
> is, not the sandbox ;)
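
To make the range-based idea above concrete, here is a minimal sketch of a UTF-16 to UTF-32 decoding step that carries the end of the range along and so can refuse to read a trail surrogate that is not there. The names decode_utf16_checked and utf16_to_utf32 are hypothetical and are not part of Boost.Regex or its unicode_iterator.hpp header; this only illustrates the kind of extra check a position-only iterator has no way to perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Boost API): decode one code point from
// [pos, end). It never dereferences `end` or anything beyond it.
// Returns false on a truncated pair or a stray surrogate, leaving pos as is.
template <class It>
bool decode_utf16_checked(It& pos, It end, char32_t& out)
{
    assert(pos != end);                        // precondition: range not empty
    const char16_t lead = *pos;
    if (lead < 0xD800 || lead > 0xDFFF) {      // ordinary BMP code unit
        out = lead;
        ++pos;
        return true;
    }
    if (lead >= 0xDC00) return false;          // stray trail surrogate
    It next = pos;
    ++next;
    if (next == end) return false;             // truncated pair: needs `end`
    const char16_t trail = *next;
    if (trail < 0xDC00 || trail > 0xDFFF) return false;
    out = 0x10000 + ((char32_t(lead) - 0xD800) << 10)
                  + (char32_t(trail) - 0xDC00);
    pos = ++next;
    return true;
}

// Hypothetical driver: convert a whole buffer, stopping at the first error.
inline bool utf16_to_utf32(const std::vector<char16_t>& in,
                           std::vector<char32_t>& out)
{
    auto pos = in.begin();
    const auto end = in.end();
    while (pos != end) {
        char32_t cp;
        if (!decode_utf16_checked(pos, end, cp)) return false;
        out.push_back(cp);
    }
    return true;
}
```

The cost mentioned above shows up directly: every decoding step needs both the position and the end of the range, so an iterator built this way would have to store at least two underlying iterators (three if decrementing must also be checked against the start).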

What about just asking people who aren't sure if they're processing
invalid Unicode to add some sentinel bytes? Wouldn't that work?
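
A rough sketch of what the sentinel idea could look like, assuming a toy position-only UTF-16 decoder rather than the real adapters. decode_unchecked and decode_with_sentinel are hypothetical names; whether the same trick is sufficient for the actual Boost iterators is exactly the open question here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy position-only decoder (hypothetical, not the Boost adapter). It has
// no end pointer, so after a lead surrogate it reads p[1] unconditionally.
inline bool decode_unchecked(const char16_t*& p, char32_t& out)
{
    const char16_t lead = p[0];
    if (lead < 0xD800 || lead > 0xDFFF) {      // ordinary BMP code unit
        out = lead;
        p += 1;
        return true;
    }
    if (lead >= 0xDC00) return false;          // stray trail surrogate
    const char16_t trail = p[1];               // safe only if p[1] exists
    if (trail < 0xDC00 || trail > 0xDFFF) return false;
    out = 0x10000 + ((char32_t(lead) - 0xD800) << 10)
                  + (char32_t(trail) - 0xDC00);
    p += 2;
    return true;
}

// Copy the input and append one sentinel code unit that is not a trail
// surrogate, so a truncated pair at the end reads the sentinel instead of
// memory past the buffer.
inline bool decode_with_sentinel(const std::vector<char16_t>& in,
                                 std::vector<char32_t>& out)
{
    std::vector<char16_t> buf(in);
    buf.push_back(0x0000);                         // the sentinel
    const char16_t* p = buf.data();
    const char16_t* stop = buf.data() + in.size(); // sentinel excluded
    while (p != stop) {
        char32_t cp;
        if (!decode_unchecked(p, cp)) return false;
        out.push_back(cp);
    }
    return true;
}
```

In this toy version the decoder advances by two units only after seeing a valid trail surrogate, and the sentinel is never one, so the position cannot step over stop. Guarding against reading past the start when decrementing would presumably need a matching sentinel in front of the data as well.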

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk