regular expression too complex

Hello,
I'm generating a complex regular expression to remove words from texts, but now I get this exception:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
what(): The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
Is there another way to implement such a removal? I have a vector of strings and must remove each element from the texts, so I build a regular expression that joins the words with "or" (alternation), search case-insensitively, and use regex_replace to remove the words.
Thanks,
Phil

> Is there another way to implement such a removal? I have a vector of strings and must remove each element from the texts, so I build a regular expression that joins the words with "or" (alternation), search case-insensitively, and use regex_replace to remove the words.
If you care about performance, write your own matching routine. I'd build a tree (or forest) of characters from the words to match. One pointer walks through the original string while up to N pointers follow the matching tree. The original text is copied character by character in a single pass while the matching pointers advance through the tree; when one of them reaches the end of a word, you have a match and move the output write pointer back. Details such as longest-match handling or encoding support are up to you. This should be much faster than a general regexp.
-- Slava
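A minimal sketch of the tree idea above, assuming ASCII text, C++14, and that a word should be removed wherever it occurs (no word-boundary check). The names `TrieNode`, `insert_word`, `longest_match`, and `remove_words` are hypothetical, chosen only for illustration. For simplicity it restarts the tree walk at each text position and writes the output only for unmatched characters, rather than running N parallel matching pointers and moving the write pointer back as described, so it is a simplified variant and not the exact routine proposed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node of the character tree (trie). Hypothetical type for illustration.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> next;
    bool is_word = false;  // true if a stored word ends at this node
};

// Lower-case a single char (ASCII / current C locale only).
char fold(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Add one word to the trie, folded to lower case. Empty words are ignored.
void insert_word(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        std::unique_ptr<TrieNode>& child = node->next[fold(c)];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    if (node != &root) node->is_word = true;
}

// Length of the longest stored word that starts at text[pos], or 0 if none.
std::size_t longest_match(const TrieNode& root, const std::string& text,
                          std::size_t pos) {
    const TrieNode* node = &root;
    std::size_t best = 0;
    for (std::size_t i = pos; i < text.size(); ++i) {
        auto it = node->next.find(fold(text[i]));
        if (it == node->next.end()) break;
        node = it->second.get();
        if (node->is_word) best = i - pos + 1;
    }
    return best;
}

// Copy text to the output, skipping every case-insensitive occurrence
// of any word in `words` (longest match wins at each position).
std::string remove_words(const std::string& text,
                         const std::vector<std::string>& words) {
    TrieNode root;
    for (const std::string& w : words) insert_word(root, w);

    std::string out;
    out.reserve(text.size());
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t len = longest_match(root, text, pos);
        if (len > 0) {
            pos += len;          // skip the matched word
        } else {
            out += text[pos++];  // copy an unmatched character
        }
    }
    assert(out.size() <= text.size());  // removal never grows the text
    return out;
}
```

Calling `remove_words(text, words)` returns a new string and leaves the input untouched. The whitespace around a removed word is kept, so removing a word between two spaces leaves both spaces in place. The worst case is proportional to the text length times the longest word length, which avoids the backtracking that triggers the regex complexity exception.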

On 24.05.2011 at 09:02, Viatcheslav.Sysoltsev@h-d-gmbh.de wrote:
>> Is there another way to implement such a removal? I have a vector of strings and must remove each element from the texts, so I build a regular expression that joins the words with "or" (alternation), search case-insensitively, and use regex_replace to remove the words.
> If you care about performance, write your own matching routine. I'd build a tree (or forest) of characters from the words to match. One pointer walks through the original string while up to N pointers follow the matching tree. The original text is copied character by character in a single pass while the matching pointers advance through the tree; when one of them reaches the end of a word, you have a match and move the output write pointer back. Details such as longest-match handling or encoding support are up to you. This should be much faster than a general regexp.
Performance is not my primary concern. I would like to use an existing component for this, because the removal only runs once. Is there anything in Boost that I can use, such as a state machine library or something similar? But the tree/forest idea is very nice.
Thanks,
Phil
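If performance really is secondary, one option that needs no regular expression at all is a plain loop over the word list using `std::search` with a case-insensitive comparison. The following is only a sketch under that assumption: the helper names `equal_ignore_case` and `erase_all_ignore_case` are hypothetical, and the comparison handles ASCII (current C locale) only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Compare two chars ignoring case (ASCII / current C locale only).
bool equal_ignore_case(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Erase every case-insensitive occurrence of each word from `text`, in place.
// Words are processed one after another, in the order given.
void erase_all_ignore_case(std::string& text,
                           const std::vector<std::string>& words) {
    const std::size_t original_size = text.size();
    for (const std::string& word : words) {
        if (word.empty()) continue;  // an empty needle would match everywhere
        auto it = text.begin();
        while (true) {
            it = std::search(it, text.end(), word.begin(), word.end(),
                             equal_ignore_case);
            if (it == text.end()) break;
            // erase() returns an iterator to the char after the erased range
            it = text.erase(it, it + word.size());
        }
    }
    assert(text.size() <= original_size);  // removal never grows the text
}
```

Because the words are removed one at a time, erasing one word can join the surrounding characters into a new occurrence of a later word, and the result can depend on the order of the list. Whether that matters depends on the word list; a single-pass approach does not have this effect.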

On 5/24/2011 5:13 PM, Alexander Mingalev wrote:
> On 24.05.2011 12:58, Kraus Philipp wrote:
>> Performance is not my primary aspect. I would like to use a component that can do this, because the remove only runs one time.
> Maybe, Boost.Xpressive will work for you.
Yes, it can. Static xpressive has "symbol tables" (a search trie). You put all the strings into a std::map. It'd look something like this:
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive_static.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;
    int main()
    {
        // Map each word to match onto its replacement text.
        std::map<std::string, char const *> number_map;
        number_map["one"] = "1";
        number_map["two"] = "2";
        number_map["three"] = "3";
        // Holds the replacement looked up for the current match.
        local<std::string> sub;
        // Case-insensitive match against the map's keys.
        sregex rx = icase(a1 = number_map)[sub = a1];
        std::string input = "This ONE has tWo some thrEE strings";
        input = regex_replace(input, rx, sub);
        std::cout << input << '\n';
    }
The above program displays the following:
    This 1 has 2 some 3 strings
It should be pretty quick, too.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
participants (4): Alexander Mingalev, Eric Niebler, Kraus Philipp, Viatcheslav.Sysoltsev@h-d-gmbh.de