Boost Users :

From: Paul Davis (pjdavis_at_[hidden])
Date: 2006-08-30 16:02:54


Taking a quick look at the docs, the regex you want is:

"Resurfacing(.*?)Home"

Just a thought. Seems like quite the thread for a regex pattern.

And like John says, the original pattern should match from the first
"Resurfacing" to the second "Home". If it didn't, I'd be concerned.

The * operator by itself is greedy: it makes each match as long as
possible. Writing *? instead makes the repeat non-greedy, i.e. it makes
the match as short as possible.
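
To make the difference concrete, here is a minimal sketch. Note the
assumptions: it uses std::regex (ECMAScript syntax) rather than Boost.Regex
so that it builds without extra libraries, the helper name first_match is
made up for illustration, and the sample text is a single line so the
question of whether '.' matches a newline does not arise. The greedy and
non-greedy repeats should behave the same way in the Perl syntax for this
pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first case-insensitive match of
// `pattern` in `text`, or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern)
{
    const std::regex expr(pattern,
                          std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    return std::regex_search(text, m, expr) ? m.str(0) : std::string();
}
```

Given the text "Pool Resurfacing for Pools Home, pool resurfacing Home",
first_match(text, "Resurfacing.*Home") returns everything from the first
"Resurfacing" to the last "Home", while first_match(text,
"Resurfacing.*?Home") stops at the first "Home" and returns
"Resurfacing for Pools Home".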

http://www.boost.org/libs/regex/doc/syntax_perl.html
The section under the heading 'Non greedy repeats' pretty much explains things.

(Note: This applies to the Perl-style regex syntax; I'm not entirely sure
how the other syntaxes behave.)

Cheers,
Paul

On 8/30/06, John Maddock <john_at_[hidden]> wrote:
>
> kiran wrote:
> > Why is the second one not picked ? This was my question.
>
> It is picked for me: I modified your sample program (see below) so that it
> actually compiled, and didn't rely on external files, and I see exactly the
> output expected: everything from the first "Resurfacing" to the last
> "home".
>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <iostream>
> #include <stdexcept>
> #include <string>
> #include "boost/regex.hpp"
> using namespace std;
>
> int main()
> {
>     char buf[10000];
>     //int fd = open("glass.htm", O_RDONLY);
>     //int size = read(fd, buf, 10000);
>     string line = "<!-- saved from url=(0022)http://internet.e-mail -->\n"
>         "<html><head>\n"
>         "<title>UGlassIt Fibre-Shelkote Pool Resurfacing for Swimming "
>         "Pools</title>\n"
>         "<meta name=\"robots\" content=\"index,follow\">Home\n"
>         "<meta name=\"keywords\" content=\"pool "
>         "Resurfacing,uglassit,fibre-shelkote,Uglassit,Fibre-shelkote,swimming pool "
>         "resurfacing\">Home";
>     //close(fd);
>     // Greedy: (.|\n)* runs on to the last "Home", newlines included.
>     boost::regex expr("Resurfacing(.|\n)*Home",
>                       boost::regex::icase | boost::regex::perl);
>     try
>     {
>         boost::sregex_iterator itr(line.begin(), line.end(), expr,
>                                    boost::match_not_dot_newline);
>         boost::sregex_iterator end;
>         while (itr != end)
>         {
>             // Print each match followed by its offset in the input.
>             cout << itr->str(0) << " " << itr->position(0) << endl;
>             ++itr;
>         }
>     }
>     catch (const std::runtime_error& e)
>     {
>         cout << e.what() << endl;
>     }
> }