Boost Users :

From: yg-boost-users_at_[hidden]
Date: 2003-02-22 02:17:32


Thanks for all the responses, and I am reading them with interest.
Meanwhile, I thought I would post the way I am currently solving the
specific problem of finding all HTML img tags in a string and
replacing them with the value of the alt attribute.

Instead of trying to match "not the word alt" I matched anything (
".*" ) and then the word alt.

static const boost::regex
  find_imgs_with_alt("<\\s*" // matches < followed by 0 or more
                                  // whitespace
                     "img\\s+" // matches IMG followed by
                                  // at least 1 whitespace
                     ".*" // any run of characters (greedy)
                     "alt\\s*" // ALT, 0 or more whitespace
                     "=\\s*\"" // =, 0 or more whitespace, and a quote
                     "([^\"]*)\"" // any number of non-quotes in a
                                  // sub-match, then a quote
                     "[^>]*>", // anything not >, then a >
   boost::regbase::normal | boost::regbase::icase);

string tmp; // This is what holds the html; just pretend it is
// filled, that code doesn't matter here
    
match_results<string::iterator> img; // will hold the whole img tag
string::iterator b = tmp.begin();
string::iterator e = tmp.end();
boost::match_flag_type flags = boost::match_default; // start with the
                                                     // default flags

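while (regex_search(b, e, img, find_imgs_with_alt, flags)) {
  string img_str(img[0].first, img[0].second); // string of whole img
  // tag; printed out for debugging, not really used

  string alt_contents(img[1].first, img[1].second);

  // Record the positions as offsets, because erase() invalidates the
  // iterators held in img
  string::size_type tag_begin = img[0].first - tmp.begin();
  string::size_type alt_begin = img[1].first - tmp.begin();
  string::size_type alt_end = img[1].second - tmp.begin();
  string::size_type tag_end = img[0].second - tmp.begin();

  // Erases everything after the contents of the alt attribute, to
  // the end of the img tag, and then erases from the front to the
  // beginning of the contents of the alt attribute. The order
  // matters: erasing the front part first would shift the back part,
  // so its recorded positions would no longer be right
  tmp.erase(alt_end, tag_end - alt_end);
  tmp.erase(tag_begin, alt_begin - tag_begin);

  // This is what makes the regex_search call in the while loop go on
  // to the rest of the string; e is refreshed because tmp got shorter
  b = tmp.begin() + tag_begin + alt_contents.size();
  e = tmp.end();

  // These two flags tell the matcher that b is no longer the start
  // of the string, so you can't use them the first time
  if (b != tmp.begin()) {
    flags |= boost::match_prev_avail;
    flags |= boost::match_not_bob;
  }
}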

Maybe that will be useful to someone.
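
For comparison, here is a minimal sketch of the same replacement done in
one pass. It makes two assumptions that are not in the code above: it
uses std::regex from C++11 as a stand-in for boost::regex, and it uses
"[^>]*" where the pattern above has ".*", so that a match cannot run
past the end of one tag into a later one. The function name
replace_imgs_with_alt is made up for the example.

```cpp
#include <regex>
#include <string>

// Replace every <img ... alt="..."> tag with the text of its alt attribute.
// Assumption: std::regex (C++11) stands in for boost::regex here.
std::string replace_imgs_with_alt(const std::string& html) {
    static const std::regex img_with_alt(
        "<\\s*img\\s+"     // <, optional whitespace, IMG, at least 1 whitespace
        "[^>]*"            // other attributes, never past this tag's >
        "alt\\s*=\\s*\""   // ALT, =, and the opening quote
        "([^\"]*)\""       // alt contents as sub-match 1, then a quote
        "[^>]*>",          // anything not >, then a >
        std::regex::ECMAScript | std::regex::icase);
    // "$1" stands for sub-match 1, so there is no erase or iterator
    // bookkeeping to get wrong.
    return std::regex_replace(html, img_with_alt, "$1");
}
```

Tags without an alt attribute do not match and are left as they are.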

Back to the question of whether there should be a way to refer to one
regular expression within another. I have read John Maddock's and
Edward Diener's posts.

I am trying to come up with a good example where using a named regular
expression inside another would be any more complicated than a string
replacement that could be done in a pre-processor language. One
candidate: if the referred-to expression has labeled sub-matches, then
the person writing the larger expression might miscount which
sub-match he wants to index.
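
To make the miscounting problem concrete, here is a small sketch of
the plain string-replacement approach. It assumes std::regex from
C++11 rather than boost::regex, and the fragment names (quoted_value,
attribute, open_tag) and the function first_attribute_value are
hypothetical, made up only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical building blocks, pasted together as plain strings.
const std::string quoted_value = "\"([^\"]*)\"";                   // 1 group
const std::string attribute    = "(\\w+)\\s*=\\s*" + quoted_value; // 2 groups
const std::string open_tag     = "<(\\w+)\\s+" + attribute;        // 3 groups

// Returns the value of the first attribute of the first tag in html,
// or an empty string if nothing matches.
std::string first_attribute_value(const std::string& html) {
    static const std::regex re(open_tag);
    std::smatch m;
    if (!std::regex_search(html, m, re)) return "";
    // m[1] = tag name, m[2] = attribute name, m[3] = attribute value.
    // The index 3 stays right only as long as nobody adds or removes
    // a group inside attribute or quoted_value.
    return m[3].str();
}
```

The author of open_tag has to know how many groups each pasted
fragment contains; that count is not visible at the point where m[3]
is written.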

It might be a good first step to collect a sampling of regex problems
that are easy to write understandably and bug-free with the ability to
refer to other expressions, and hard to write without that ability.
Then perhaps someone will understand how to properly craft what we
want.

--Rob


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net