Hi,
I am just starting to use Boost.Regex, porting code from the Linux system regex library (regcomp()/regexec()). After initial testing of my first change I see a major performance hit: Boost.Regex is about 30 times slower than the system regex library.
 
The Boost.Regex documentation seems to imply that its performance compares favourably with existing libraries, so I expect that I am doing something wrong, but I can't see what.
 
My regex strings are relatively short plain strings, e.g. "Authentication-Info"; that is, they contain no regex metacharacters.
 
The string passed to regex_search() (I also tried regex_match()) can be an exact match for the regex, but it may differ in case or be surrounded by whitespace.
 
Boost.Regex is used as follows:
 
// I have 44 heap-allocated boost::regex objects, all stored in a list;
// each boost::regex object also has its own associated function
boost::regex* myRegex = new boost::regex("Authentication-Info", boost::regex::icase|boost::regex::nosubs);
...

/* boost::regex_search() is called in a loop over the above list of boost::regex objects. Each test iteration uses 9 different values of name, all of which have a match in the list, so boost::regex_search() can be called up to 396 times (44 x 9) per test iteration. I ran a 1000-iteration test.
*/
// const char* name could be " authentication-info  " and should match positively (and does) with the regex
if(boost::regex_search(name, *myRegex, boost::match_nosubs)==true) {
    // do stuff on match - that is call associated function
    // then exit loop
}
I have also tried flags other than those listed, all with the same result.
 
I put the test through a profiler and see that match_results and sub_match functions are called a very large number of times (over 9 million), which makes me fairly sure that this is where the problem is. I don't need any match_results/sub_match; I just need confirmation that the string is present. Obviously a match has to be made, but approximately 51 match_results constructions per search (9,486,000 / 186,000) seems over the top.
 
FYI, here is the profiler entry for boost::regex_search(), which is called 186,000 times, as expected:

[5]     94.9    0.00    2.46  186000         bool boost::regex_search<char const*, char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >(char const*, char const*, boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, boost::regex_constants::_match_flags) [5]
 
But this entry, where the boost::match_results constructor is called 9,486,000 times, is unexpected:
 
[9]     35.3    0.18    0.73 9486000         boost::match_results<char const*, std::allocator<boost::sub_match<char const*> > >::match_results(std::allocator<boost::sub_match<char const*> > const&) [9]

The same code executes 30 times faster if I replace new boost::regex(...) with a call to regcomp(&m_preg, "Authentication-Info", REG_EXTENDED | REG_ICASE), where m_preg is wrapped in a heap-allocated user-defined type, and likewise replace boost::regex_search(...) with a call to regexec() with nmatch set to 0.
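
For comparison, the regcomp()/regexec() variant described above looks roughly like the sketch below. It is a simplified illustration, not the real code: the names PosixPattern, Entry, make_table and dispatch are made up for this example, the integer handler_id stands in for the function associated with each pattern, and "Content-Length" is only a placeholder for a second pattern.

```cpp
#include <regex.h>  // POSIX regcomp / regexec / regfree

#include <memory>
#include <vector>

// Hypothetical wrapper type: owns one compiled POSIX regex (m_preg in the post).
class PosixPattern {
public:
    explicit PosixPattern(const char* pattern)
        : compiled_(regcomp(&preg_, pattern, REG_EXTENDED | REG_ICASE) == 0) {}

    ~PosixPattern() {
        if (compiled_) regfree(&preg_);
    }

    PosixPattern(const PosixPattern&) = delete;
    PosixPattern& operator=(const PosixPattern&) = delete;

    // nmatch is 0 and pmatch is null: only a yes/no answer is requested,
    // no sub-match positions are reported.
    bool found_in(const char* text) const {
        return compiled_ && regexec(&preg_, text, 0, nullptr, 0) == 0;
    }

private:
    regex_t preg_;
    bool compiled_;
};

// One list entry: a heap-allocated pattern plus a stand-in for its function.
struct Entry {
    std::unique_ptr<PosixPattern> pattern;
    int handler_id;  // placeholder for the associated function
};

// Builds a two-entry table; the real list holds 44 patterns.
std::vector<Entry> make_table() {
    std::vector<Entry> table;
    table.push_back(Entry{
        std::unique_ptr<PosixPattern>(new PosixPattern("Authentication-Info")), 1});
    table.push_back(Entry{
        std::unique_ptr<PosixPattern>(new PosixPattern("Content-Length")), 2});
    return table;
}

// Walks the list and stops at the first pattern found in name.
// Returns the matching entry's handler_id, or -1 if nothing matches.
int dispatch(const std::vector<Entry>& table, const char* name) {
    for (const Entry& entry : table) {
        if (entry.pattern->found_in(name)) {
            return entry.handler_id;
        }
    }
    return -1;
}
```

With this sketch, dispatch(make_table(), " authentication-info  ") selects the first entry, since regexec() searches for the pattern anywhere in the string and REG_ICASE ignores case.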
 
 

