

From: yg-boost-users_at_[hidden]
Date: 2003-02-18 18:26:59


Hello folks,

        I believe my question applies to regular expression libraries
        in general and not just to regex++. I want to know whether it
        is possible to refer to, or use, one regular expression within
        another.

        What I want to do is parse a string of HTML for IMG tags and,
        if a tag has the clause ALT="whatever", replace the whole
        image tag with 'whatever'.

        So I decided to make a regular expression for an image tag
        that has an ALT part, with a sub-match on the contents of the
        quoted part of the ALT.

        (I broke these up a bit, explaining what each part is for.)

        This will match img tags:

static const boost::regex find_imgs(
  "<\\s*img"     // Matches <, 0 or more whitespace, IMG
  "\\s+src\\s*"  // Matches 1 or more whitespace, SRC, 0 or more whitespace
  "=\\s*"        // Matches = followed by 0 or more whitespace
  "\"\\s*"       // Matches " followed by 0 or more whitespace
  "([^\"]*)"     // Matches any number of characters that are not "
  "([^>]*>)",    // Matches any number of chars that are not >, followed by a >
  boost::regbase::normal | boost::regbase::icase);

        OK, now look at this one. I'm trying to do the same as above,
        except that I want to sub-match the quoted part of the ALT
        attribute so I can use it. I can't express "anything except
        the word 'alt'" as [^(alt)], because that is a character class
        meaning "any single character except '(', 'a', 'l', 't' or
        ')'".

static const boost::regex find_imgs_with_alt(
  "<\\s*img"     // Matches <, 0 or more whitespace, IMG
  "\\s+src\\s*"  // Matches 1 or more whitespace, SRC, 0 or more whitespace
  "=\\s*"        // Matches = followed by 0 or more whitespace
  "\"\\s*"       // Matches " followed by 0 or more whitespace
  "[^\"]*\""     // Matches any number of chars that are not ", then the closing "
  "\\s+"         // Matches 1 or more whitespace
  "[^alt]*"      // I would like to match anything except the word ALT, but
                 // the regex engine interprets this as anything but 'a',
                 // 'l', or 't'
  "alt\\s*="     // Matches ALT, 0 or more whitespace, =
  "\"([^\"]*)\"" // Matches ", then anything except " as a group that I can
                 // reference, then another "
  "[^>]*>",      // Matches any number of chars that are not >, followed by a >
  boost::regbase::normal | boost::regbase::icase);

        So what I want to do is make another regular expression that
        matches "alt" and, in the part that says

        [^alt]*

        write something like this instead:

        [^@alt]*

        where '@' would indicate that 'alt' is the name of another
        regular expression, such as

static const boost::regex alt("alt",
  boost::regbase::normal | boost::regbase::icase);

        I can see how to do this without such a feature: I would get
        the whole IMG tag and then do a separate regex_search on the
        match. But it would be so much easier if it were possible, and
        it would leave me with fewer lines of regular expression code
        to have bugs in.

        If this is possible, I'd like to know how. Thanks in advance;
        I'll post the regular expressions I end up using here in case
        anyone finds them useful.

--Rob
        


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net