Boost Users :
From: yg-boost-users_at_[hidden]
Date: 2003-02-22 02:17:32
Thanks for all the responses, and I am reading them with interest.
Meanwhile, I thought I would post the way I am currently solving the
specific problem of finding all HTML img tags in a string and
replacing them with the value of the alt attribute.
Instead of trying to match "not the word alt", I matched anything
(".*") followed by the word alt.
static const boost::regex find_imgs_with_alt(
    "<\\s*"        // matches < followed by 0 or more whitespace
    "img\\s+"      // matches IMG followed by at least 1 whitespace
    ".*"           // any number of any characters (greedy)
    "alt\\s*"      // ALT, 0 or more whitespace
    "=\\s*\""      // =, 0 or more whitespace, and a quote
    "([^\"]*)\""   // any number of non-quotes in a sub-match,
                   // then a quote
    "[^>]*>",      // anything not >, then a >
    boost::regbase::normal | boost::regbase::icase);
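string tmp; // This is what holds the html; just pretend it is
            // filled, that code doesn't matter here
match_results<string::iterator> img; // will hold the whole img tag
// Match flags for the search; boost::match_flag_type is assumed here
// (the type name used by current Boost.Regex releases)
boost::match_flag_type flags = boost::match_default;
string::iterator b = tmp.begin();
string::iterator e = tmp.end();
while (regex_search(b, e, img, find_imgs_with_alt, flags)) {
    // string of whole img tag; printed out for debugging, not
    // really used
    string img_str = string(img[0].first, img[0].second);
    string alt_contents = string(img[1].first, img[1].second);
    // Save the positions as offsets: erasing from tmp invalidates
    // the iterators held in img, b and e
    string::size_type tag_begin = img[0].first - tmp.begin();
    string::size_type alt_begin = img[1].first - tmp.begin();
    string::size_type alt_end = img[1].second - tmp.begin();
    string::size_type tag_end = img[0].second - tmp.begin();
    // The erases have to be done in this order: erasing the front
    // first would shift everything after it, so the positions of
    // the tail would no longer be right.
    // Erases everything after the contents of the alt attribute, to
    // the end of the img tag, and then erases from the front to the
    // beginning of the contents of the alt attribute
    tmp.erase(alt_end, tag_end - alt_end);
    tmp.erase(tag_begin, alt_begin - tag_begin);
    // This is what makes the regex_search call in the while loop go
    // on to the rest of the string; b and e are rebuilt because the
    // string has changed
    b = tmp.begin() + tag_begin + alt_contents.size();
    e = tmp.end();
    // b is now normally past the start of the buffer:
    // match_prev_avail lets the matcher look at the character
    // before b, and match_not_bob keeps start-of-buffer assertions
    // from matching at b. You can't use them the first time.
    if (b != tmp.begin())
        flags |= boost::match_prev_avail | boost::match_not_bob;
}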
Maybe that will be useful to someone.
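For comparison, here is a minimal sketch of the same replacement done in a single call. It rests on several assumptions that are not in the code above: it uses std::regex and std::regex_replace from the C++ standard library in place of Boost.Regex, the helper name replace_imgs_with_alt is made up for illustration, and it swaps ".*" for the non-greedy "[^>]*?" so that a match cannot run past the end of one tag into the next. It is one possible way to do it, not a drop-in replacement for the loop.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): returns a copy of html
// in which every <img ... alt="..."> tag is replaced by its alt text.
std::string replace_imgs_with_alt(const std::string& html) {
    static const std::regex img_with_alt(
        "<\\s*img\\s+"     // <, optional whitespace, IMG, 1+ whitespace
        "[^>]*?"           // other attributes, never past the closing >
        "alt\\s*=\\s*\""   // ALT, =, and the opening quote
        "([^\"]*)\""       // sub-match 1: the alt text, then a quote
        "[^>]*>",          // rest of the tag up to and including >
        std::regex::ECMAScript | std::regex::icase);
    // "$1" stands for sub-match 1 in the default replacement format.
    return std::regex_replace(html, img_with_alt, "$1");
}
```

With this sketch, an input such as `a <img src="x.png" alt="cat"> b` comes back as `a cat b`, and an img tag that has no alt attribute is left untouched because "[^>]*?" cannot cross its closing >.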
Back to the issue of whether there should be a way to refer to one
regular expression within another. I have read John Maddock's and
Edward Diener's posts.
I am trying to come up with a good example in which using a named
regular expression inside another would be any more complicated than
a string replacement that could be done by the preprocessor. If the
referred-to expression has labeled sub-matches, then the person
writing the larger expression might miscount which sub-match he wants
to index.
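The sub-match counting problem can be sketched with plain string concatenation standing in for a "named" expression. Everything here is an assumption made for illustration: the names quoted_value, attr_pattern and first_attr_value are hypothetical, and std::regex is used instead of Boost.Regex only to keep the example self-contained.

```cpp
#include <regex>
#include <string>

// Hypothetical named fragment: a double-quoted value whose contents
// are captured in a group of its own.
const std::string quoted_value = "\"([^\"]*)\"";

// Larger expression built by textual substitution of the fragment.
// It adds a capture group of its own in front of the fragment.
const std::string attr_pattern = "(\\w+)\\s*=\\s*" + quoted_value;

// Returns the value of the first name="value" pair found in text,
// or an empty string if there is none.
std::string first_attr_value(const std::string& text) {
    static const std::regex re(attr_pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return "";
    // Sub-match 1 is the attribute name, so the fragment's group ends
    // up at index 2. Adding a group to quoted_value, or another group
    // before it, would silently change which index is the right one.
    return m[2].str();
}
```

The author of attr_pattern has to know how many groups quoted_value contains in order to pick the right index, which is exactly the kind of counting that plain substitution leaves to the caller.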
It might be a good first step to collect a sampling of regex problems
that are easy to write understandably and bug-free with the ability to
refer to other expressions, and hard to write without that ability.
Then perhaps someone will understand how to properly craft what we
want.
--Rob
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net