
We have encountered a problem in the regex++ package: it reports having exhausted memory after examining a very short string with a regular expression of only modest complexity. I realize that the documentation for the package does not specify how much memory usage is too much, but since the same combination of regular expression and test string works without problems in a number of other programming toolsets (e.g., Python, Perl, C#), I'm guessing that the maintainers of the package would be interested in tracking down the problem (I would if it were my software).

Here's a repro case that boils the problem down to a minimal example:

    #include <boost/regex.hpp>

    int main()
    {
        boost::wregex e(L"^[^\\s]( ?([^\\s]+ ?)*[^\\s])?$");
        boost::wcmatch m;
        boost::regex_match(L"codeine phosphate ", m, e);
        return 0;
    }

I have confirmed that the behavior is present in the most recent version of the Boost code by retrieving and building the latest set of sources from CVS this morning. (I understand the usefulness of having the user perform this check, but making it a requirement, as the web instructions for submitting bug reports do, may be eliminating a substantial number of valuable reports. It took a number of attempts, with the connection to the CVS server hanging several times, before I could even get to the very lengthy build step.)

The failure is not triggered if a version of regex_match is used that does not take the match_results argument, but then of course we don't get access to the match results.

I'm pretty sure that the expression is boiled down to the least ambiguous form (without changing the semantics). In plain English, it's looking for strings that have no leading or trailing whitespace, and in which any internal whitespace run consists solely of a single blank character. That doesn't seem like a very esoteric pattern.

We have reproduced the behavior on Linux and on Windows. Hope this is useful. Feel free to contact me if you need any further information.
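As a cross-check on the plain-English description of the pattern, here is a minimal sketch of the same rule written without a regular expression. The function name `is_single_spaced` is hypothetical, and the sketch assumes that `\s` in the regex corresponds to `std::iswspace` in the current locale, which may not exactly match Boost's own character classification.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical reference check for the rule described above:
// non-empty, no leading or trailing whitespace, and any internal
// whitespace is a single blank (L' ') with non-whitespace on both sides.
// Assumption: \s in the regex corresponds to std::iswspace here.
bool is_single_spaced(const std::wstring& s)
{
    if (s.empty())
        return false;  // the pattern requires at least one [^\s] character

    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!std::iswspace(s[i]))
            continue;  // non-whitespace is always allowed

        // Only a plain blank is allowed; tabs, newlines, etc. are rejected.
        if (s[i] != L' ')
            return false;

        // A blank may not be first or last in the string.
        if (i == 0 || i + 1 == s.size())
            return false;

        // A blank must sit between two non-whitespace characters,
        // which also rules out runs of two or more blanks.
        if (std::iswspace(s[i - 1]) || std::iswspace(s[i + 1]))
            return false;
    }
    return true;
}
```

For the test string in the repro case, `is_single_spaced(L"codeine phosphate ")` returns false because of the trailing blank, so the corresponding regex_match call would be expected to simply return false rather than run out of memory.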
Bob Kline