Boost Users
From: Dave DeLong (davedelong_at_[hidden])
Date: 2008-03-12 00:07:45
Hi everyone,
I'm trying to parse an HTML page using the Regex library and am
running into errors.
In the following snippets, "pageSource" is a string pointer to the
contents of an HTML file.
This code causes my app to crash:
void Page::removeScriptTags() {
    boost::regex tagRegex("<[sS][cC][rR][iI][pP][tT][\\w\\W]*?>[.]*?"
                          "</\\s*?[sS][cC][rR][iI][pP][tT]\\s*?>");
    string replaced = boost::regex_replace(*pageSource, pageSource,
        tagRegex, " ", boost::match_default);
    delete pageSource;
    pageSource = new string(replaced);
}
and this code crashes when attempting to destruct "matches":
void Page::findTitleSummary() {
    boost::cmatch matches;
    boost::regex bodyRegex("<[tT][iI][tT][lL][eE][\\w\\W]*?>([^<]*)"
                           "</\\s*?[tT][iI][tT][lL][eE]\\s*?>");
    if (boost::regex_search(pageSource->c_str(), matches, bodyRegex)) {
        pageSummary = new string(matches[1]);
        hasFoundSummary = true;
    }
}
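For comparison, here is a minimal standalone sketch of what the two
functions above are meant to do. It is not the code from the Page class
and is not offered as a diagnosis of the crashes. It assumes the standard
library's std::regex (C++11) in place of Boost.Regex, works on
std::string values rather than the pageSource pointer, and uses
hypothetical free-function names. One deliberate difference: inside a
character class "." is a literal dot, so "[.]*?" only matches runs of
dots; the sketch uses "[\s\S]*?" for the script body instead.

```cpp
#include <regex>
#include <string>

// Hypothetical free functions standing in for the Page members above.
// Assumption: std::regex is used here instead of Boost.Regex.

// Replace every <script ...>...</script> element with a single space.
std::string removeScriptTags(const std::string& source) {
    // icase replaces the hand-written [sS][cC]... character classes;
    // [\s\S]*? lazily matches any characters, including newlines.
    static const std::regex tagRegex(
        R"(<script\b[^>]*>[\s\S]*?</\s*script\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(source, tagRegex, " ");
}

// Return the text of the first <title> element, or an empty string
// when the page has no title.
std::string findTitle(const std::string& source) {
    static const std::regex titleRegex(
        R"(<title\b[^>]*>([^<]*)</\s*title\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch matches;  // smatch is the match type for std::string input
    if (std::regex_search(source, matches, titleRegex)) {
        return matches[1].str();
    }
    return std::string();
}
```

Returning std::string by value avoids the new/delete pairs in the
snippets above, and searching the std::string directly (with smatch)
means the match results refer to a buffer whose lifetime is explicit.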
What am I missing?
Thanks,
Dave