From: John Maddock (john_at_[hidden])
Date: 2004-04-06 05:59:00


> I was just writing up a simple tutorial example; finding the subject
> in a set of email headers. Here's what I got:
>
> std::string line;
> boost::regex pat("^Subject: (Re: )?(.*)");
> boost::smatch matches;
>
> while (std::getline(std::cin, line))
> {
>     if (boost::regex_match(line, matches, pat))
>         std::cout << matches[2];
> }
>
> 1. There's no way to search a stream for a match because a regex
> requires bidirectional iterators, so I have to do this totally
> frustrating line-by-line search. I think Spirit has some kind of
> iterator that turns an input iterator into something forward by
> holding a cache of the data starting with the earliest copy of the
> original iterator. Could something like that be added?

Yes, but it would be a general-purpose iterator type rather than something
regex-specific. Incidentally, I also have a use for a "fileview" class which
presents a file's contents as a pair of random access iterators. If you want
me to provide these, though, you'll need to wait until I've finished the next
round of regex internal changes / refactoring.
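
Until such an adaptor exists, one workaround is to buffer the whole stream
and search the buffer. The following is only a rough sketch: it uses
std::regex as a stand-in for Boost.Regex, find_subject is a hypothetical
helper name, and the "(?:^|\n)" prefix is an assumption made here to anchor
at line starts without relying on any multiline flag.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: returns the subject text, or an empty string
// if no Subject header is found in the stream.
std::string find_subject(std::istream& in)
{
    // Read everything up front so the regex engine gets the
    // bidirectional iterators it requires.
    const std::string text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    // Anchor at the start of the buffer or immediately after a newline.
    static const std::regex pat("(?:^|\n)Subject: (Re: )?([^\n]*)");

    std::smatch m;
    if (std::regex_search(text, m, pat))
        return m[2].str();
    return std::string();
}
```

This trades memory for simplicity: the whole input is held at once, which is
exactly the cost a caching iterator adaptor would try to bound.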

> 2. Seems to me that if match objects could be converted to bool, we
> might be able to:
>
> std::string line;
> boost::regex pat("^Subject: (Re: )?(.*)");
>
> while (std::getline(std::cin, line))
> {
>     if (boost::smatch m = boost::regex_match(line, pat))
>         std::cout << m[2];
> }
>
> which would be much smoother to the touch. Are match objects
> expensive to construct?

Currently, expensive-ish. Originally these were reference counted and cheap
to copy, but I ran into problems with thread safety (it's not uncommon to
obtain a match with one thread, then hand off a copy to another thread for
processing). Now that we have a thread-safe shared_ptr, though, I need to
revisit this; it just makes my head hurt trying to analyse concurrent code
:-|

One other thing - the regex_match overload that doesn't take a
match_results as a parameter currently returns bool - the intent is that if
the user doesn't need the info generated in the match_results, then some
time can be saved by not storing it. Boost.Regex doesn't take advantage of
that yet, but I was planning to in the next revision (basically you can cut
out memory allocation altogether, and that's an order of magnitude
saving).
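
For reference, the two call shapes side by side. This is a sketch only,
written against std::regex as a stand-in for Boost.Regex; the function names
is_subject_line and subject_of are hypothetical, and nothing here shows the
planned optimisation itself, just which overload asks for what.

```cpp
#include <regex>
#include <string>

// Only asks "does the whole line match?"; no match_results is requested,
// so an implementation is free to skip recording sub-expressions.
bool is_subject_line(const std::string& line)
{
    static const std::regex pat("^Subject: (Re: )?(.*)");
    return std::regex_match(line, pat);
}

// Asks for the captured sub-expressions as well.
std::string subject_of(const std::string& line)
{
    static const std::regex pat("^Subject: (Re: )?(.*)");
    std::smatch m;
    if (std::regex_match(line, m, pat))
        return m[2].str();
    return std::string();
}
```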

> >> 2. Seems to me that if match objects could be converted to bool, we
> >> might be able to:
> >
> > I can only second that, I am currently using my own regex library
> > (some of my reasoning to be found in this c.l.c++.m thread:
> > <http://tinyurl.com/2xnbd>), here I also allow implicit conversion to
> > the iterator type, which allows code like:
> >
> > iterator it = regex::find(first, last, ptrn);
> >
> > I did already propose this for Boost, but was told that it poses a
> > problem: an "empty" match at the end of the string is ambiguous with
> > "no match at all". My argument here is that if one knows that the
> > pattern might generate such a match (and one is interested in knowing
> > about it), one just declares the result to be the match object. The
> > implicit conversion generally lets one code without all those if's
> > to see if something was actually matched -- at least it has made
> > much of my code simpler/shorter.
>
> Sounds good to me. John?

So we make match_results implicitly convertible to its iterator type? I'm
not necessarily against that, but there are dangers: mainly, as Alan stated,
that you can easily miss corner cases (when the regex matches a zero-length
string).
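
To make the corner case concrete, here is a rough sketch, again using
std::regex as a stand-in for Boost.Regex. find_start is a hypothetical
helper that mimics what an iterator-returning search might hand back; it is
not part of either library.

```cpp
#include <regex>
#include <string>

// Hypothetical: return the start of the first match, or end() if none.
std::string::const_iterator find_start(const std::string& s,
                                       const std::regex& pat)
{
    std::smatch m;
    if (std::regex_search(s, m, pat))
        return m[0].first;   // may legitimately equal s.end()
    return s.end();          // "not found" sentinel
}
```

With s = "abc", the pattern "$" matches a zero-length string at the end of
the input and the pattern "z" matches nothing, yet find_start returns
s.end() for both, so a caller holding only the iterator cannot tell the two
apart.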

John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk