Subject: [Boost-users] [Regex] Major performance difference between Boost.Regex and Linux regex library
From: Kieran O'Donohoe (kodonohoe_at_[hidden])
Date: 2010-04-04 07:58:48


Hi,
I am just starting to use Boost.Regex, porting from the Linux system regex library, and after initial testing of my first change I see a major performance hit: Boost.Regex is about 30 times slower than the system regex library.

 

The Boost.Regex documentation seems to imply that its performance compares favourably with existing libraries, so I expect that I am doing something wrong, but I can't see it.

 

My regex strings are relatively short plain strings, e.g. "Authentication-Info"; that is, they contain no RE-like syntax.

 

The string passed to regex_search() (I also tried regex_match()) can be an exact match for the regex, but may differ in case or be surrounded by white space.

 

Boost.Regex is used as follows:

 

// I have a heap pointer to 44 boost::regex objects, all stored in a list
// also associated with each boost::regex object is a function unique to each
boost::regex* myRegex = new boost::regex("Authentication-Info", boost::regex::icase|boost::regex::nosubs);
...

/* boost::regex_search() is called in a loop over the above list of boost::regex objects.
   For each test iteration there are 9 different values of name, all of which have a match
   in the list, so boost::regex_search() can be called up to 396 times (44 x 9) per test
   iteration. I performed a 1000 iteration test.
*/

// const char* name could be " authentication-info " and should match (and does match) the regex
if (boost::regex_search(name, *myRegex, boost::match_nosubs)) {
    // do stuff on match - that is call associated function
    // then exit loop
}

I have also tried flags other than those listed, all with the same result.

 

I put the test through a profiler and see that match_results and sub_match functions are called a very large number of times (over 9 million), which makes me pretty sure that this is where the problem is. I don't need any match_results/sub_match; I just need confirmation that the string exists. Obviously a match needs to be made, but about 51 match_results object constructions per search seems over the top.

 

FYI, here is the profiler entry for boost::regex_search(), which is called 186,000 times, as expected:

[5] 94.9 0.00 2.46 186000 bool boost::regex_search<char const*, char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >(char const*, char const*, boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, boost::regex_constants::_match_flags) [5]

 

But this entry, where the boost::match_results constructor is called 9,486,000 times, is unexpected:

 

[9] 35.3 0.18 0.73 9486000 boost::match_results<char const*, std::allocator<boost::sub_match<char const*> > >::match_results(std::allocator<boost::sub_match<char const*> > const&) [9]
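The figure of 51 constructions per search comes straight from the two call counts in the profiler entries above, and the 186,000 searches sit below the worst case of 396,000 because the loop exits on the first match. A small compile-time check of that arithmetic (the constant names are just labels chosen for this sketch):

```cpp
// Call counts copied from the two profiler entries above.
constexpr long long kRegexSearchCalls      = 186000;   // boost::regex_search()
constexpr long long kMatchResultsCtorCalls = 9486000;  // match_results constructor

// Worst case from the test description: 44 regexes x 9 names x 1000 iterations.
constexpr long long kMaxSearchCalls = 44LL * 9 * 1000;

// match_results constructions per regex_search() call.
constexpr long long kCtorsPerSearch = kMatchResultsCtorCalls / kRegexSearchCalls;

static_assert(kMatchResultsCtorCalls % kRegexSearchCalls == 0,
              "the constructor count is an exact multiple of the search count");
static_assert(kCtorsPerSearch == 51, "51 match_results constructions per search");
static_assert(kRegexSearchCalls <= kMaxSearchCalls,
              "observed searches stay below the 396,000 upper bound");
```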

The same code executes 30 times faster if I replace new boost::regex(...) with a call to regcomp(&m_preg, "Authentication-Info", REG_EXTENDED | REG_ICASE), where m_preg is wrapped in a heap-allocated UDT, and likewise replace boost::regex_search(...) with a call to regexec() where nmatch is 0.
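For reference, the POSIX side of the comparison can be sketched roughly as below. This is a minimal sketch, not the actual code: the class name PosixPattern and its found_in() method are hypothetical stand-ins for the UDT mentioned above. Only regcomp(), regexec(), regfree() and the two flags come from the description.

```cpp
#include <regex.h>    // POSIX regcomp / regexec / regfree
#include <stdexcept>

// Hypothetical wrapper around a compiled POSIX regex (stands in for the UDT
// that holds m_preg). Compiles once, then answers yes/no queries.
class PosixPattern {
public:
    explicit PosixPattern(const char* pattern) {
        // Same flags as in the description: extended syntax, case-insensitive.
        if (regcomp(&m_preg, pattern, REG_EXTENDED | REG_ICASE) != 0) {
            throw std::runtime_error("regcomp failed");
        }
    }
    ~PosixPattern() { regfree(&m_preg); }

    // regex_t owns internal buffers, so copying is disabled in this sketch.
    PosixPattern(const PosixPattern&) = delete;
    PosixPattern& operator=(const PosixPattern&) = delete;

    // nmatch == 0 and pmatch == nullptr: no sub-match positions are requested,
    // only whether the pattern occurs somewhere in text.
    bool found_in(const char* text) const {
        return regexec(&m_preg, text, 0, nullptr, 0) == 0;
    }

private:
    regex_t m_preg;
};
```

With this wrapper, PosixPattern("Authentication-Info").found_in(" authentication-info ") reports a match, which is the same unanchored, case-insensitive behaviour as the boost::regex_search() call with boost::regex::icase.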

 

 
                                               


