Taking a quick look at the docs, the regex you want is:

"Resurfacing(.*?)Home"

Just a thought.  Seems like quite the thread for a regex pattern.

And like John says, it should match from the first Resurfacing to the second Home.  If it didn't, I'd be concerned.

The * operator by itself is greedy: it makes each match as long as possible.  Writing *? instead makes the repeat non-greedy, i.e. the match is as short as possible.

http://www.boost.org/libs/regex/doc/syntax_perl.html
The section under the heading 'Non greedy repeats' pretty much explains things.
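
To make the difference concrete, here is a minimal sketch that collects every match of a pattern in a string. Note the assumptions: it uses std::regex (ECMAScript grammar) rather than Boost.Regex so that it builds without Boost, the helper name all_matches is made up for illustration, and the test string is a single line so the question of dot matching newlines doesn't come up. As far as I know, *? behaves the same way in both libraries for a case like this.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every non-overlapping match of
// `pattern` in `text`, matching case-insensitively.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern)
{
    const std::regex expr(pattern, std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> found;
    // A default-constructed sregex_iterator serves as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), expr), end;
         it != end; ++it)
    {
        found.push_back(it->str(0));  // sub-match 0 is the whole match
    }
    return found;
}
```

Given the text "Pool Resurfacing A Home B resurfacing C Home", the greedy pattern "Resurfacing(.*)Home" should give a single match that runs from the first Resurfacing to the last Home, while "Resurfacing(.*?)Home" should give two short matches, "Resurfacing A Home" and "resurfacing C Home".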

(Note: This applies to the Perl-style regex syntax; I'm not entirely sure about the other syntaxes.)

Cheers,
Paul


On 8/30/06, John Maddock <john@johnmaddock.co.uk> wrote:
kiran wrote:
> Why is the second one not picked ? This was my question.

It is picked for me: I modified your sample program (see below) so that it
actually compiled, and didn't rely on external files, and I see exactly the
output expected: everything from the first "Resurfacing" to the last "home".

#include "boost/regex.hpp"
#include <iostream>
#include <string>
using namespace std;

int main()
{
// The sample text is inlined here; the original program read it from glass.htm.
string line = "<!-- saved from url=(0022)http://internet.e-mail -->\n"
"<html><head>\n"
"<title>UGlassIt Fibre-Shelkote Pool Resurfacing for Swimming Pools</title>\n"
"<meta name=\"robots\" content=\"index,follow\">Home\n"
"<meta name=\"keywords\" content=\"pool Resurfacing,uglassit,fibre-shelkote,Uglassit,Fibre-shelkote,swimming pool resurfacing\">Home";
// Greedy repeat: (.|\n)* also consumes newlines, so the match runs to the last "Home".
// The Boost names are qualified so they cannot clash with std::regex.
boost::regex expr("Resurfacing(.|\n)*Home", boost::regex::icase | boost::regex::perl);
try
{
boost::sregex_iterator itr(line.begin(), line.end(), expr, boost::match_not_dot_newline);
boost::sregex_iterator i;
while(itr != i)
{
// Print the matched text followed by its offset in the input.
cout<<string((*itr)[0].first, (*itr)[0].second)<<" "<<(*itr).position(0)<<endl;
++itr;
}
}
catch(const std::runtime_error& e)
{
cout<<e.what()<<endl<<flush;
}
}
