
Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2006-07-12 12:44:05


Line Oddskool wrote:
> Hi boost.regex gurus,
>
> I'm stuck with a problem dealing with some kind of regex merging
> (using boost 1.33). I don't know if the way I took is viable, so any
> ideas and advice will be appreciated.
>
> To give you some insight, I have a set of a hundred matching (ei) and
> formatting (ri) "rules" e.g.
>
> e1 : (a)(?=ll)
> r1 : (?1o)
>
> e1/r1 should mean "matching 'a' of some string like 'all' should be
> replaced by 'o'"
>
> I merge all my e/r into one big regex using regex_merge (for
> performance), so the resulting matching/formatting regex is like :
>
> e : e1|e2|...|en
> r: r1r2...rn
>
> I'm getting weird behaviour with this, as the resulting string is
> sometimes filled with sequences like 'u4u5u6u7u8u9u' or other "trash".
>
> So to debug this, I'd like to know which rule (i.e. which ei) matched
> on what part of the string.
>
> I'm unsure if it's possible to get some kind of iterator on the rules
> that have matched using regex_merge ?
>
> I also looked at the match_results returned by the simpler method
> regex_match(), but I can't figure out how to know which part of my
> matching regex matched (i.e. which ei) ?

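Unless you really meant regex_match, regex_search is the closer analogue of
regex_replace (the new name for regex_merge): regex_match only succeeds if
the whole string matches, whereas regex_search finds a match anywhere in it.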

The way to find out which sub-expression matched is simply:

match_results<something> what;
...
// Sub-expression 0 is the whole match, so start at 1.
for(unsigned i = 1; i < what.size(); ++i)
{
  if(what[i].matched)
    std::cout << "sub-expression " << i << " matched " << what[i] << std::endl;
}
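
To make that loop concrete, here is a minimal sketch that walks every match
in a string and records which marked sub-expression took part in it. Note the
assumptions: it uses std::regex (C++11) in place of boost::regex so that it
compiles without Boost, and the names RuleHit and trace_rules are made up for
illustration. The match_results/sub_match interface it relies on (size(),
position(), .matched) has the same shape in both libraries.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper type: where a match started and which marked
// sub-expression (1-based) participated in it.
struct RuleHit {
    std::size_t position;
    std::size_t group;
};

// Iterate over all non-overlapping matches of `re` in `text` and record,
// for each one, every sub-expression whose .matched flag is set.
// Assumes std::regex stands in for boost::regex.
std::vector<RuleHit> trace_rules(const std::regex& re, const std::string& text) {
    std::vector<RuleHit> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& what = *it;
        // Index 0 is the whole match; the rules start at index 1.
        for (std::size_t i = 1; i < what.size(); ++i) {
            if (what[i].matched) {
                hits.push_back({static_cast<std::size_t>(what.position(0)), i});
            }
        }
    }
    return hits;
}
```

With a merged expression such as "(a)(?=ll)|(e)(?=rr)" (the second rule is
invented for the example), calling trace_rules on "all err" should report
sub-expression 1 at offset 0 and sub-expression 2 at offset 4. If each rule
contributes more than one marked sub-expression, you would need a table
mapping sub-expression numbers back to rule numbers.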

> Otherwise, is there a way to analyse or dump the matching/replacing
> behaviour of such a complex regex ?

'Fraid not, you would likely be swamped with so much data that it probably
wouldn't be that useful in any case :-(

You could also try a binary-search-reduction on the problem: split the regex
in two and find which half has the issue, then split again and so on...
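
The reduction can be sketched as a plain bisection over rule indices. This is
only an illustration under stated assumptions: bisect_rules and the `fails`
callback are hypothetical names, `fails(lo, hi)` is assumed to rebuild the
merged expression and format string from rules lo..hi-1 and report whether the
bad output still appears, and exactly one rule is assumed to be at fault (if
the problem only shows up when two rules interact, this will not find it).

```cpp
#include <cstddef>
#include <functional>

// Narrow a failure down to one rule index in [0, n).
// `fails(lo, hi)` is a caller-supplied (hypothetical) check that returns
// true when the rules in the half-open range [lo, hi) still reproduce
// the problem. Assumes n >= 1 and that a single rule is responsible.
std::size_t bisect_rules(
    std::size_t n,
    const std::function<bool(std::size_t, std::size_t)>& fails) {
    std::size_t lo = 0;
    std::size_t hi = n;
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (fails(lo, mid)) {
            hi = mid;  // problem is in the first half
        } else {
            lo = mid;  // otherwise assume it is in the second half
        }
    }
    return lo;
}
```

For a hundred rules this needs about seven rounds. Bear in mind that when you
rebuild the expression from a subset of rules the sub-expression numbers
shift, so the (?N...) references in the format string have to be renumbered
to match.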

HTH,
John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net