From: David Abrahams (dave_at_[hidden])
Date: 2004-04-06 13:40:54

"John Maddock" <john_at_[hidden]> writes:

>> 1. There's no way to search a stream for a match because a regex
>> requires bidirectional iterators, so I have to do this totally
>> frustrating line-by-line search. I think Spirit has some kind of
>> iterator that turns an input iterator into something forward by
>> holding a cache of the data starting with the earliest copy of the
>> original iterator. Could something like that be added?
> Yes, but it's a more general iterator type rather than just regex
> specific,

Of course.

> incidentally I also have a use for a "fileview" class which presents a file's
> contents as a pair of random access iterators. If you want me to provide
> these though, you'll need to wait until I've finished the next round of
> regex internal changes / refactoring.

Note that we need some additional guarantees from libraries in order
to use such an iterator. It isn't possible to move such an iterator N
positions forward and then N positions backward unless there's another
iterator pointing at or before the iterator's position in the
sequence. Libraries have to guarantee that they won't do that, or
describe their invalidation expectations.
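
To illustrate what the cache buys you, here is a rough sketch of the
degenerate case: buffer the entire stream up front, then search the
buffer. It uses std::regex as a stand-in for Boost.Regex, and
find_subject is a made-up name. A real caching iterator would fill the
buffer lazily and could discard data that no live iterator can still
reach, which is exactly where the guarantees above come in.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of any library): read all of `in`,
// then search the buffered text for the first Subject header.
// Returns the subject text, or an empty string if nothing matches.
std::string find_subject(std::istream& in)
{
    // Buffer the whole stream. std::string iterators are random access,
    // so they satisfy the regex iterator requirements.
    std::string const text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    // No "^" anchor here: its behaviour across embedded newlines varies
    // between implementations, so the sketch avoids relying on it.
    static std::regex const pat("Subject: (Re: )?([^\\n]*)");

    std::smatch m;
    if (std::regex_search(text, m, pat))
        return m[2].str();
    return std::string();
}
```

This searches once instead of line by line, at the cost of holding the
whole input in memory.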

>> 2. Seems to me that if match objects could be converted to bool, we
>> might be able to:
>>     std::string line;
>>     boost::regex pat("^Subject: (Re: )?(.*)");
>>     while (std::cin)
>>     {
>>         std::getline(std::cin, line);
>>         if (boost::smatch m = boost::regex_match(line, pat))
>>             std::cout << m[2];
>>     }
>> which would be much smoother to the touch. Are match objects
>> expensive to construct?
> Currently, expensive'ish. Originally these were reference counted, and
> cheap to copy

Actually, I was asking about initial construction cost, in particular
of an object representing a failed match. The acceptance of N1610
means that copy costs should be insignificant for cases like this one,
provided that the smatch author puts in the required effort to make it
moveable. ;-)

> but I ran into problems with thread safety (it's not uncommon to
> obtain a match with one thread, then hand off a copy to another
> thread for processing). Now that we have a thread safe shared_ptr
> though I need to revisit this, it just makes my head hurt trying to
> analyse concurrent code :-|

Well if we could solve problem #1, the expense of the initial
construction becomes a non-issue for my case, because I'd only have
to search once. And regardless of all that, often convenience is
*way* more important than efficiency.

That said, as long as the match object is immutable, there's little
to worry about w.r.t. thread safety.

> One other thing - the current regex_match overload that doesn't take
> a match_results as a parameter currently returns bool - the intent
> is that if the user doesn't need the info generated in the
> match_results, then some time can be saved by not storing it.
> Boost.Regex doesn't currently take advantage of that, but I was
> planning to in the next revision (basically you can cut out memory
> allocation altogether, and that's an order of magnitude saving).

But I do need the match results, when the match succeeds.
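
For reference, the two forms under discussion look roughly like this.
The sketch uses std::regex, whose regex_match overloads have the same
shape as the Boost ones; is_subject and subject_of are names invented
for the example.

```cpp
#include <regex>
#include <string>

// Bool-only overload: no match_results is filled in, so an
// implementation is free to skip storing submatches.
bool is_subject(std::string const& line)
{
    static std::regex const pat("^Subject: (Re: )?(.*)");
    return std::regex_match(line, pat);
}

// match_results overload: needed whenever the submatches are used.
// Returns the subject text, or an empty string on a failed match.
std::string subject_of(std::string const& line)
{
    static std::regex const pat("^Subject: (Re: )?(.*)");
    std::smatch m;
    if (std::regex_match(line, m, pat))
        return m[2].str();
    return std::string();
}
```

The second form is the one my loop needs, and it is the one that
requires declaring the match object before the test.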

>> >> 2. Seems to me that if match objects could be converted to bool, we
>> >> might be able to:
>> >
>> > I can only second that, I am currently using my own regex library
>> > (some of my reasoning to be found in this c.l.c++.m thread:
>> > <>), here I also allow implicit conversion to
>> > the iterator type, which allow code like:
>> >
>> > iterator it = regex::find(first, last, ptrn);
>> >
>> > Although I already did propose it for boost, but was told that it
>> > poses a problem with the ambiguity of an "empty" match at the end of
>> > the string and "no match at all" -- my argument here is that if one
>> > knows that the pattern might generate such a match (and one is
>> > interested in knowing about it), one just declares the result to be
>> > the match object. The former generally allows to code w/o all those
>> > if's to see if something was actually matched -- at least it has made
>> > much of my code simpler/shorter.
>> Sounds good to me. John?
> So we make match_results implicitly convertible to its iterator type? I'm
> not necessarily against that, but there are dangers: mainly as Alan stated
> that you can easily miss corner cases (when the regex matches a zero-length
> string).

I guess my original suggestion of making it implicitly convertible to
some safe_bool solves that problem. I guess I prefer that idea,
though Allan probably has more experience with this than I do.
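
Here is a minimal sketch of the bool-convertible idea, under some loud
assumptions: checked_match and subject_or_empty are hypothetical names,
std::regex stands in for Boost.Regex, and an explicit conversion
operator plays the role that a safe_bool idiom would play in a compiler
without one. It is not how either library actually works.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical wrapper: runs the match on construction and remembers
// whether it succeeded. `text` must outlive the wrapper, because the
// stored submatches refer into it.
class checked_match
{
public:
    checked_match(std::string const& text, std::regex const& pat)
      : results_()
      , matched_(std::regex_match(text, results_, pat))
    {}

    // True for any successful match, including a zero-length one.
    explicit operator bool() const { return matched_; }

    std::ssub_match operator[](std::size_t n) const { return results_[n]; }

private:
    std::smatch results_;   // declared first so it is initialized first
    bool matched_;
};

// Usage: declare and test the match in a single condition.
std::string subject_or_empty(std::string const& line)
{
    static std::regex const pat("^Subject: (Re: )?(.*)");
    if (checked_match m{line, pat})
        return m[2].str();
    return std::string();
}
```

Because the conversion reports success rather than position, a
zero-length match still tests true, so the corner case John mentions
does not arise.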

Dave Abrahams
Boost Consulting
