Boost Users :

From: Dave DeLong (davedelong_at_[hidden])
Date: 2008-03-12 00:07:45


Hi everyone,

I'm trying to parse an HTML page using the Regex library and am
running into errors.

In the following snippets, "pageSource" is a string pointer to the
contents of an HTML file.

This code causes my app to crash:

void Page::removeScriptTags() {
        boost::regex tagRegex("<[sS][cC][rR][iI][pP][tT][\\w\\W]*?>[.]*?</\\s*?[sS][cC][rR][iI][pP][tT]\\s*?>");
        string replaced = boost::regex_replace(*pageSource, pageSource, tagRegex, " ", boost::match_default);
        delete pageSource;
        pageSource = new string(replaced);
}
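
For comparison, here is a minimal sketch of the same replacement written against the string overload of regex_replace, which takes the input string, the regex, and the format text. It does not diagnose the crash. It uses std::regex as a stand-in for boost::regex (an assumption, so that it builds without Boost), and the helper name stripScriptTags and the simplified pattern are illustrative, not taken from the code above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `html` with each <script>...</script>
// element replaced by a single space.
std::string stripScriptTags(const std::string& html) {
    // icase stands in for the hand-written [sS][cC]... classes; the lazy
    // [\s\S]*? matches any characters, newlines included, up to the first
    // closing tag.
    static const std::regex scriptRegex(
        R"(<script\b[^>]*>[\s\S]*?</\s*script\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    // String overload: (input, regex, format). No pointers or iterators.
    return std::regex_replace(html, scriptRegex, " ");
}
```

Returning the result by value means the caller can assign it straight back to its own string, with no new/delete pair involved.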

and this code crashes when attempting to destruct "matches":

void Page::findTitleSummary() {
        boost::cmatch matches;
        boost::regex bodyRegex("<[tT][iI][tT][lL][eE][\\w\\W]*?>([^<]*)</\\s*?[tT][iI][tT][lL][eE]\\s*?>");
        if (boost::regex_search(pageSource->c_str(), matches, bodyRegex)) {
                pageSummary = new string(matches[1]);
                hasFoundSummary = true;
        }
}
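
As a point of comparison, the title lookup can be sketched with a string-based match object searching the string itself rather than a c_str() pointer. This is only a sketch and makes no claim about why the destructor crashes. It assumes std::regex in place of boost::regex so that it builds without Boost, and the helper name findTitle and its out-parameter are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: writes the text of the first <title> element to `title`
// and returns true, or returns false if no title is found.
bool findTitle(const std::string& html, std::string& title) {
    static const std::regex titleRegex(
        R"(<title\b[^>]*>([^<]*)</\s*title\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch matches;  // holds iterators into `html`, which outlives it
    if (!std::regex_search(html, matches, titleRegex)) {
        return false;
    }
    title = matches[1].str();  // copy the first capture group out
    return true;
}
```

The match object only refers to the searched string, so the capture is copied into a std::string while that string is still alive.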

What am I missing?

Thanks,

Dave


