Subject: Re: [boost] [regex] How robust are the <boost/regex/pending/unicode_iterator.hpp> adapters?
From: John Maddock (boost.regex_at_[hidden])
Date: 2011-07-23 13:44:31


>> Actually, I'm thinking that the fix may be easier than I thought after
>> all -
>> if I add a 2-arg "range-checked" constructor as an overload, then the
>> iterator's constructor can validate the end-points of the underlying
>> sequence during construction,
>
> Doesn't that make construction of an iterator over N bytes an O(N)
> operation?

No, it only checks that each *end* of the sequence contains a valid
multibyte sequence, so the cost is bounded by the length of one sequence at
each end rather than by N. Effectively these can then act as sentinels. If
there are invalid sequences within the range (not at the endpoints), we can
already catch those anyway.
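
To illustrate the kind of endpoint check I mean, here is a minimal sketch.
It is not the code in unicode_iterator.hpp: the names is_continuation,
sequence_length and utf8_endpoints_ok are hypothetical, and it assumes UTF-8
held in a contiguous char buffer. It looks at one byte at the front and at
most four at the back, however long the range is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0u) == 0x80u; }

// Length implied by lead byte b, or 0 if b cannot start a sequence.
// Sketch only: overlong lead bytes such as 0xC0 are not rejected here.
inline std::size_t sequence_length(unsigned char b)
{
    if (b < 0x80u) return 1;
    if ((b & 0xE0u) == 0xC0u) return 2;
    if ((b & 0xF0u) == 0xE0u) return 3;
    if ((b & 0xF8u) == 0xF0u) return 4;
    return 0; // continuation byte or invalid lead byte
}

// Hypothetical helper: inspect only the two ends of [first, last).
inline bool utf8_endpoints_ok(const char* first, const char* last)
{
    assert(first <= last);           // precondition: a well-formed pointer range
    if (first == last) return true;  // empty range: nothing to be stranded in

    // Front: the first byte must be able to start a sequence.
    if (sequence_length(static_cast<unsigned char>(*first)) == 0) return false;

    // Back: walk back over at most three continuation bytes to the final lead byte.
    const char* p = last;
    std::size_t trailing = 0;
    do { --p; ++trailing; }
    while (p != first && trailing < 4 &&
           is_continuation(static_cast<unsigned char>(*p)));

    // The final sequence must end exactly at `last`.
    return sequence_length(static_cast<unsigned char>(*p)) == trailing;
}

// Convenience overload for std::string.
inline bool utf8_endpoints_ok(const std::string& s)
{
    return utf8_endpoints_ok(s.data(), s.data() + s.size());
}
```

A range-checked 2-arg constructor could run a check like this once and raise
an error if it fails; the increment and decrement operators then stay as
they are.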

>> and there's no need to otherwise change the implementation or add
>> overhead by checking every increment/decrement for movement
>> out-of-range because we'll know that it can't happen.
>
> To be precise—I think—it can't happen unless the bytes in the underlying
> buffer are changed after construction. It's not *quite* the same
> guarantee, but it's probably good enough, and maybe even preferable.
>
> I would suggest that anyone needing the other kind of check adapt an
> underlying iterator that contains the check.

Changing the underlying bytes after construction of the adapters is a big
no-no anyway. It's an invariant that the adapter must always point between
two multibyte sequences, and never be left stranded in the middle of one.
That invariant could be broken if we allowed the underlying sequence to
change at arbitrary moments in time.
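
As a rough illustration of why mutation is a problem, the sketch below
models an unchecked forward step over UTF-8 as "advance by the length the
lead byte announces". The names sequence_length and unchecked_next are
hypothetical and this is not the adapters' actual code; it only computes
where such a step would land.

```cpp
#include <cstddef>
#include <string>

// Length implied by lead byte b, or 0 if b cannot start a sequence.
inline std::size_t sequence_length(unsigned char b)
{
    if (b < 0x80u) return 1;
    if ((b & 0xE0u) == 0xC0u) return 2;
    if ((b & 0xF0u) == 0xE0u) return 3;
    if ((b & 0xF8u) == 0xF0u) return 4;
    return 0; // continuation byte or invalid lead byte
}

// Index an unchecked forward step from `pos` would move to.
inline std::size_t unchecked_next(const std::string& buf, std::size_t pos)
{
    return pos + sequence_length(static_cast<unsigned char>(buf[pos]));
}
```

With buf == "abcd", unchecked_next(buf, 3) is 4, which is exactly the end of
the buffer. If the last byte is later overwritten with 0xE2 (the lead byte
of a three-byte sequence), the same call yields 6, which is past the end,
even though the endpoints were valid when the adapter was constructed.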

John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk