Subject: Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?
From: Soares Chen Ruo Fei (crf_at_[hidden])
Date: 2011-07-20 05:14:13
On Tue, Jul 19, 2011 at 4:24 PM, John Maddock <boost.regex_at_[hidden]> wrote:
>>> Yes, they can read past the end of your input range if it contains
>>> truncated data at the end.
>> Interesting. Would a fix be difficult?
> I was about to say there aren't any known issues, but yes that is a problem
> - and a fix would mean changing the interface - the problem comes because
> the iterators only store the current position in the underlying sequence and
> assume that they can increment or decrement over a complete multi-byte
> sequence. So if your underlying sequence contains a *truncated* multibyte
> sequence at the start or end of the string then they can read past-the-end
> or even past-the-start :-(
> The only real fix is to redesign them to be range-based, so we can add the
> additional checks necessary, but of course this also makes them more
> heavyweight than they are at present. I guess I was hoping we would have
> had a proper Unicode library for this by now (in Boost that is, not the
> sandbox ;)
> Oh well, maybe I should just bite the bullet and change/fix this hole.
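To make the failure mode concrete, here is a minimal sketch of what a
range-checked decode step could look like. It is not the Boost.Regex
implementation: the name checked_decode_utf8 is hypothetical, it assumes
a C++17 compiler for std::optional, and it only checks sequence length and
continuation bytes (it does not reject overlong encodings or surrogates).
The point is only that the decoder needs the end of the range to detect a
truncated sequence, which is the information the current iterators do not
carry.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, not part of Boost.Regex.
// Decodes one UTF-8 code point starting at `it` without ever dereferencing
// `end`. On success, advances `it` past the sequence and returns the code
// point. On a truncated or malformed sequence, leaves `it` unchanged and
// returns std::nullopt.
template <typename Iterator>
std::optional<std::uint32_t> checked_decode_utf8(Iterator& it, Iterator end)
{
    if (it == end) return std::nullopt;

    const unsigned char lead = static_cast<unsigned char>(*it);
    std::size_t trailing = 0;   // number of continuation bytes expected
    std::uint32_t value = 0;

    if (lead < 0x80)                { trailing = 0; value = lead; }
    else if ((lead & 0xE0) == 0xC0) { trailing = 1; value = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { trailing = 2; value = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { trailing = 3; value = lead & 0x07; }
    else return std::nullopt;       // stray continuation byte or invalid lead

    Iterator cur = it;
    ++cur;
    for (std::size_t i = 0; i < trailing; ++i, ++cur) {
        // A position-only iterator cannot perform this check, so it would
        // dereference past the end of a truncated sequence here.
        if (cur == end) return std::nullopt;
        const unsigned char c = static_cast<unsigned char>(*cur);
        if ((c & 0xC0) != 0x80) return std::nullopt;
        value = (value << 6) | (c & 0x3F);
    }

    it = cur;
    return value;
}
```

A decrementing version would need the start of the range in the same way,
which is why a fix along these lines changes the interface: each adapter
would have to store the range bounds in addition to the current position.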
In my GSoC project I am currently developing a Unicode string adapter
library that wraps and adds Unicode awareness to conventional string
types such as std::string. Not sure if that helps but if you are
developing new library APIs I think this might be useful. I still have
not completed the documentation but you can look at the draft at
http://crf.scriptmatrix.net/ustr/. The code repository is available at
(Sorry, I don't mean to hijack the thread, but I hope that helps.)