Subject: [Boost-users] [Regex] Accessing match count
From: ninti_at_[hidden]
Date: 2010-03-10 08:01:25


Newbie here, trying to reimplement PHP's preg_match_all() function in C++. I'm
rewriting a PHP class that implements a Porter stemmer. I seem to have the
regex_replace code working nicely but am now stuck on regex_match/regex_search: I
want to count the number of times a regex matches within a string.
The regex matches every occurrence of one or more vowels followed by one or more
consonants (known as 'm' or measure, roughly equivalent to a syllable).
 
The original PHP function is as follows:
 
private static function m($str)
{
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
    $v = '(?:[aeiou]|(?<![aeiou])y)';
 
    $str = preg_replace("#^$c+#", '', $str);
    $str = preg_replace("#$v+$#", '', $str);
     
    preg_match_all("#($v+$c+)#", $str, $matches);
 
    return count($matches[1]);
}
 
So all I'm interested in here is the number of matches. I'm not even really sure
which Boost.Regex function I should be using; regex_search seems like the correct
choice, as I'm not matching a whole string. This is what I've got so far:
 
 
using namespace std;
using namespace boost;
 
string regex_vowel = "(?:[aeiou]|(?<![aeiou])y)";
string regex_consonant = "(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)";
 
string get_m(string word)
{
    regex c("^" + regex_consonant + "+");   // "+" mirrors PHP's "#^$c+#"
    regex v(regex_vowel + "+$");            // "+" mirrors PHP's "#$v+$#"
    string replacement = "";
    word = regex_replace(word, c, replacement);
    word = regex_replace(word, v, replacement);
 
 
[ ... so far so good, this does the same as the first four lines of the PHP code
above ... ]
 
 
    string re = "(" + regex_vowel + "+" + regex_consonant + "+)";
    regex expression(re);
 
    string::const_iterator start, end;
    start = word.begin();
    end = word.end();
    match_results<string::const_iterator> what;
    match_flag_type flags = match_default | match_partial;
    regex_search(start, end, what, expression, flags);
    string thing = what[1];
    int size = thing.size();
 
    cout << what[1] << endl;
    string s;
    stringstream out;
    out << size;
    s = out.str();
 
    return s;
}
 
This does not replicate the last two lines of the PHP code above; instead it
returns the number of characters in the matched substring. So yes, I'm lost! I
have tried several variations but can't find a good example to work with. I have
also tried some much simpler variations, but they're not working either.
 
TIA for any tips, even on which regex function I should be using.
 
Mick
 
 
 

