
Newbie here, trying to reimplement PHP's preg_match_all() function in C++. I'm rewriting a PHP class that implements a Porter Stemmer. I seem to have the regex_replace code working nicely but am now stuck on regex_match/regex_search.

I want to count the number of times a regex matches within a string. The regex attempts to match all occurrences of one or more vowels followed by one or more consonants (known as 'm' or measure, roughly equivalent to a syllable). The original PHP function is as follows:

    private static function m($str)
    {
        $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
        $v = '(?:[aeiou]|(?<![aeiou])y)';

        $str = preg_replace("#^$c+#", '', $str);
        $str = preg_replace("#$v+$#", '', $str);

        preg_match_all("#($v+$c+)#", $str, $matches);

        return count($matches[1]);
    }

So all I'm interested in here is the number of matches. I'm afraid I'm not even really sure which Boost regex function I should be using. It seems like regex_search may be the correct choice, as I'm not matching a whole string. This is what I've got so far:

    using namespace std;
    using namespace boost;

    string regex_vowel = "(?:[aeiou]|(?<![aeiou])y)";
    string regex_consonant = "(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)";

    string get_m(string word) {
        regex c("^" + regex_consonant);
        regex v(regex_vowel + "$");
        string replacement = "";
        word = regex_replace(word, c, replacement);
        word = regex_replace(word, v, replacement);

        [ ... so far so good, this does the same as the first four lines of the PHP code above ... ]

        string re = "(" + regex_vowel + "+" + regex_consonant + "+)";
        regex expression(re);
        string::const_iterator start, end;
        start = word.begin();
        end = word.end();
        match_results<string::const_iterator> what;
        match_flag_type flags = match_default | match_partial;
        regex_search(start, end, what, expression, flags);
        string thing = what[1];
        int size = thing.size();
        cout << what[1] << endl;
        string s;
        stringstream out;
        out << size;
        s = out.str();
        return s;
    }

This does not replicate the last two lines of the PHP code above. Instead it returns the number of characters in the first matched substring, because what[1] holds the text of the first capture group rather than a list of all matches.

So yes, I'm lost! I have tried several variations but can't find a good example to work with. I have also tried some much simpler variations, but they're not working either.

TIA for any tips, even on which regex function I should be using.

Mick
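
For the counting step, one possible approach is to walk every non-overlapping match with a regex iterator and count the steps, which is roughly what count($matches[1]) does after preg_match_all(). The sketch below is only an illustration under stated assumptions: it uses std::regex and std::sregex_iterator from the standard library so that it compiles without Boost, and because std::regex has no lookbehind it treats 'y' as a consonant everywhere, which differs from the original patterns. The function name count_measure and the constants kVowel and kConsonant are hypothetical. With Boost, the same loop should work with boost::regex and boost::sregex_iterator and the original lookbehind patterns.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// ASSUMPTION: simplified character classes. 'y' is always treated as a
// consonant here because std::regex does not support lookbehind, so this
// is not equivalent to the patterns in the original PHP code.
const std::string kVowel = "[aeiou]";
const std::string kConsonant = "[bcdfghjklmnpqrstvwxyz]";

// Hypothetical helper: counts non-overlapping "vowels then consonants"
// runs, analogous to count($matches[1]) after preg_match_all().
std::size_t count_measure(std::string word) {
    // Strip leading consonants and trailing vowels, as the PHP code does.
    word = std::regex_replace(word, std::regex("^" + kConsonant + "+"), "");
    word = std::regex_replace(word, std::regex(kVowel + "+$"), "");

    const std::regex vc("(" + kVowel + "+" + kConsonant + "+)");

    // Each increment of the iterator performs the next search, starting
    // where the previous match ended.
    std::sregex_iterator first(word.begin(), word.end(), vc);
    std::sregex_iterator last;  // default-constructed: end of sequence

    return static_cast<std::size_t>(std::distance(first, last));
}
```

With these simplified classes, count_measure("tree") gives 0, count_measure("trouble") gives 1, and count_measure("oaten") gives 2. Returning a number rather than a string also avoids the stringstream conversion at the end of get_m.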