Subject: [Boost-users] [Regex] Accessing match count
From: ninti_at_[hidden]
Date: 2010-03-10 08:01:25
Newbie here trying to reimplement PHP's preg_match_all() function in C++. I'm
rewriting a PHP class which functions as a Porter Stemmer. I seem to have the
regex_replace code working nicely but am now stuck on regex_match/regex_search. I
want to count the number of times a regex matches within a string.
The regex attempts to match all occurrences of one or more vowels followed by one
or more consonants (known as 'm' or measure, roughly equivalent to a syllable).
The original PHP function is as follows:
    private static function m($str)
    {
        $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
        $v = '(?:[aeiou]|(?<![aeiou])y)';
        $str = preg_replace("#^$c+#", '', $str);
        $str = preg_replace("#$v+$#", '', $str);
        preg_match_all("#($v+$c+)#", $str, $matches);
        return count($matches[1]);
    }
So all I'm interested in here is the number of matches. I'm afraid I'm not even
really sure which Boost regex function I should be using; it seems like
regex_search may be the correct choice, as I'm not matching a whole string. This
is what I've got so far:
    #include <boost/regex.hpp>
    #include <iostream>
    #include <sstream>
    #include <string>

    using namespace std;
    using namespace boost;

    string regex_vowel = "(?:[aeiou]|(?<![aeiou])y)";
    string regex_consonant = "(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)";

    string get_m(string word)
    {
        // Strip leading consonants and trailing vowels. The "+" quantifiers
        // mirror the PHP patterns "#^$c+#" and "#$v+$#".
        regex c("^" + regex_consonant + "+");
        regex v(regex_vowel + "+$");
        string replacement = "";
        word = regex_replace(word, c, replacement);
        word = regex_replace(word, v, replacement);

        // [ ... so far so good, this does the same as the first four lines
        // of the PHP code above ... ]

        string re = "(" + regex_vowel + "+" + regex_consonant + "+)";
        regex expression(re);
        string::const_iterator start, end;
        start = word.begin();
        end = word.end();
        match_results<string::const_iterator> what;
        match_flag_type flags = match_default | match_partial;
        regex_search(start, end, what, expression, flags);

        // Problem area: this measures the length of one captured
        // substring, not the number of matches.
        string thing = what[1];
        int size = thing.size();
        cout << what[1] << endl;

        string s;
        stringstream out;
        out << size;
        s = out.str();
        return s;
    }
This does not replicate the last two lines of the PHP code above; instead it
returns the number of characters in a matched substring (what[1]). So yes, I'm
lost! I have tried several variations but can't find a good example to work
from. I have also tried some much simpler variations, and they're not working
either.
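For reference, a single regex_search call reports only the first match, which is why the code above ends up measuring one substring. One way to get a count instead is to walk the string with a regex iterator and count the steps. The following is a minimal sketch, not a tested port of the PHP function. It uses std::regex so it builds without Boost, and because std::regex's default ECMAScript grammar has no lookbehind, the vowel and consonant classes are simplified and ignore 'y' (an assumption made only for this sketch). The names count_matches and simple_measure are hypothetical. With Boost, the same loop should work using boost::regex and boost::sregex_iterator together with the original patterns.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count the non-overlapping matches of `re` in `text`, similar to taking
// count() of the matches collected by PHP's preg_match_all().
std::size_t count_matches(const std::string& text, const std::regex& re)
{
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;  // a default-constructed iterator marks the end
    return static_cast<std::size_t>(std::distance(first, last));
}

// Simplified measure (assumption): one or more vowels followed by one or
// more consonants, with 'y' left out of both classes because the default
// std::regex grammar does not support lookbehind.
std::size_t simple_measure(const std::string& word)
{
    static const std::regex vc("[aeiou]+[bcdfghjklmnpqrstvwxz]+");
    return count_matches(word, vc);
}
```

With this simplified pattern, simple_measure("agreed") finds "agr" and "eed" and so returns 2, while simple_measure("tree") returns 0 because the vowels are not followed by a consonant. The separate stripping of leading consonants and trailing vowels is not needed for the count in this sketch, since those runs can never form part of a vowel-then-consonant match.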
TIA for any tips, even on which regex function I should be using.
Mick