Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2006-08-29 08:19:47


david v wrote:
> Hello,
> I'm using the Boost regex library to search for short expressions in large
> strings. The code below works fine and does almost everything I
> want. What I'm missing, and am not sure how to handle, is the
> mismatches. Basically, when I look for a regex in a string, I want to
> be able to get the number of mismatches and their
> respective positions. I read through the docs and saw that
> boost_check may do the job. The thing is that I'm not sure how
> to use it with my code, and since the Boost library is quite complex
> it would be great if somebody could help me move forward.
>
> How can I use the code below to get the mismatches and their
> positions with boost_check?

What's boost_check and what's it got to do with regex?

> #####Setting the regex
> .....
> re = "test";
> boost::regex *regex = new boost::regex(re,boost::regbase::icase);
> .....
>
> #######Getting the matched
> .......
> std::string::const_iterator start = in.begin();
> std::string::const_iterator end = in.end();
> boost::smatch match;
> while (boost::regex_search(start,end,match,*regex,boost::match_extra))
> {
> cout << match[0] << endl;
> start = match[0].second;
> }
>
> #############Getting the mismatches ???
> How ???
>
> Any help would be extremely welcome!

What do you mean by mismatches? The sections of the string that didn't
match? That would be:

std::string::const_iterator start = in.begin();
std::string::const_iterator end = in.end();
boost::smatch match;
while(boost::regex_search(start,end,match,*regex))
{
   cout << match.prefix() << endl;
   start = match[0].second;
}
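// avoid a fencepost error: print whatever follows the last match.
// start now points just past that match; the contents of match should
// not be relied on once regex_search has returned false.
cout << std::string(start, end) << endl;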
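
If you also need the positions of the non-matching sections, the same loop can record an offset for each one. What follows is only a sketch: it uses std::regex instead of Boost.Regex so that it compiles without Boost (the calls used here have the same shape in both), and the names Gap and unmatched_sections are made up for illustration. It assumes the pattern never matches the empty string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One non-matching stretch of the input: where it starts and what it contains.
// (Hypothetical helper type, not part of any library.)
struct Gap {
    std::size_t offset;
    std::string text;
};

// Collect every part of `in` that is not covered by a match of `re`.
// Empty stretches (for example between two adjacent matches) are skipped.
std::vector<Gap> unmatched_sections(const std::string& in, const std::regex& re)
{
    std::vector<Gap> gaps;
    std::string::const_iterator start = in.begin();
    const std::string::const_iterator end = in.end();
    std::smatch match;

    while (start != end && std::regex_search(start, end, match, re)) {
        // Assumption: the pattern does not match the empty string.
        // Stop here rather than loop forever if it does.
        if (match[0].length() == 0)
            break;

        // prefix() is the text between `start` and the beginning of the match.
        if (match.prefix().length() > 0) {
            gaps.push_back(Gap{static_cast<std::size_t>(start - in.begin()),
                               match.prefix().str()});
        }
        start = match[0].second;
    }

    // Fencepost: whatever follows the last match is unmatched too.
    if (start != end) {
        gaps.push_back(Gap{static_cast<std::size_t>(start - in.begin()),
                           std::string(start, end)});
    }
    return gaps;
}
```

Called with the input "a test b TEST c" and std::regex("test", std::regex::icase), this should return three sections: "a " at offset 0, " b " at offset 6 and " c" at offset 13.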

Note that the match_extra flag is unnecessary unless you really need to
record repeated captures. Normal capture information, of the kind you get
from Perl and other regex engines, doesn't need it. The flag also slows down
matching and requires building the lib in a non-default mode.

You can also do this rather more easily with regex_token_iterator:

std::string::const_iterator start = in.begin();
std::string::const_iterator end = in.end();
boost::sregex_token_iterator i(start, end, *regex, -1), j;
while(i != j)
{
   cout << *i << endl;
   ++i; // advance to the next token, otherwise the loop never ends
}
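
For completeness, here is the token-iterator approach wrapped in a small function. Again this is only a sketch using std::regex, whose sregex_token_iterator takes the same arguments as the Boost one, so it builds without Boost; the name split_around_matches is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the pieces of `in` that lie between matches of `re`.
// Passing -1 as the submatch index selects the text that did NOT match.
std::vector<std::string> split_around_matches(const std::string& in,
                                              const std::regex& re)
{
    std::vector<std::string> pieces;
    std::sregex_token_iterator i(in.begin(), in.end(), re, -1), j;
    for (; i != j; ++i) {
        pieces.push_back(i->str());
    }
    return pieces;
}
```

With the input "a test b TEST c" and a case-insensitive "test" pattern this should yield "a ", " b " and " c". If nothing matches, the whole input comes back as a single piece. Unlike the explicit loop, this gives you the text only; if you need positions, take them from the sub_match iterators (i->first) instead of converting to std::string.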

HTH, John.

