From: John Maddock (john_maddock_at_[hidden])
Date: 2002-11-03 06:46:06
> Working with the boost.regex component, I have
> noticed that regex_merge requires the input iterator
> to have a -- operation.
> This unfortunately breaks compatiblity with the
> std::istream_iterator and disallows a simple cin to
> regex to cout stream.
> Attached is a simple sed look-alike program that
> will not compile due to this iterator requirement. It
> also contains a commented out workaround that copies
> cin.rdbuf() to memory.
> I tried to remove the -- requirement in the code and
> was unable to. I found some of it quite confusing.
> The errors mostly came from
> Has anyone else had this problem? Could anyone give
> me some more information as to why the -- operator is
> used, if it is necessary, and ultimately how to change
> the regex code to not use it?
Ultimately you can't use a regex with an input iterator, here's the
rationale I included with the standardisation proposal:
While all the algorithms discussed so far are iterator-based, no mention has
been made so far of iterator requirements. The following options are
available along with their pro's and con's:
InputIterator: this is discounted as an option, input iterators can not
indicate what matched (as many of the algorithms must do), nor can they be
used to match back-references.
ForwardIterator: this is a real option, forward iterators can be used to
indicate what matched. In addition there is in practice a real world use
case: multibyte character sequences can be converted to a single wide
character "on the fly" using a forward iterator adapter. Note that
multi-byte sequences can not usually be represented by a bi-directional or
random access iterator. There are downsides as well: requiring forward
iterator support places a burden on the implementation, in particular
matching boundary conditions (such as ^ or $) require that the
implementation remembers the last character visited as well as the current
position. In addition perl-style backtracking algorithms become much less
efficient for single character repeats (a very common construct), since
every backtrack-position has to be saved, rather than computed by
decrementing an iterator. There are also some perl features (zero width
look-behind assertions) that can not be matched using forward iterators,
however these features are not required by the ECMAScript standard. It is
unclear just how much of a burden forward iterator support would be in
practice; some experimental evidence would be of help here.
BidirectionalIterator: This requirement is much easier to implement, but it
offers no real benefits.
RandomAccessIterator: As with bidirectional iterators, this requirement is
easy to implement, it also offers the potential for significant performance
gains: for POSIX style matches, computing "leftmost longest" matches is much
more efficient, while for perl style matches, skipping through greedy ".*"
repeats are much more efficient, and for some expressions Boyer-Moore like
search heuristics are possible.
Of the two template libraries discussed here, Boost requires bidirectional
iterators, while GRETA doesn't document it's requirements (but appears to
also require bidirectional iterators). In practice there are only two
requirements worth considering - random access iterators (on grounds of
efficiency), or forward iterators (on grounds of user friendliness).
Experience with Boost suggests that the bidirectional iterator requirement
option offers only the disadvantages of random access iterators, but without
their potential performance gains. In addition, practice suggests that there
are no containers that are likely to be used with character strings that
have a bidirectional interface, rather than a random access one. This
proposal currently documents that all iterators must be random access, this
can be tightened to a requirement to support forward iterators if practice
shows that this is indeed possible (in the meantime implementations should
be encouraged to support the widest range of iterator types possible)."
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk