Boost Users
From: yg-boost-users_at_[hidden]
Date: 2003-02-18 18:26:59
Hello folks,
I believe my question applies to regular expression libraries
in general and not just to regex++. Is it possible to refer
to, or use, one regular expression inside another?
What I want to do is search a string of HTML for IMG tags and,
if a tag has an ALT="whatever" attribute, replace the whole
tag with 'whatever'.
So I decided to write a regular expression for an IMG tag that
has an ALT attribute, with a sub-match on the quoted value of
the ALT.
(I have broken the expressions up, explaining what each part
is for.)
This will match img tags:
#include <boost/regex.hpp>

static const boost::regex find_imgs(
    "<\\s*img"       // <, zero or more whitespace, IMG
    "\\s+src\\s*"    // one or more whitespace, SRC, zero or more whitespace
    "=\\s*"          // =, zero or more whitespace
    "\"\\s*"         // ", zero or more whitespace
    "([^\"]*)"       // sub-match 1: any run of characters that are not "
    "([^>]*>)",      // sub-match 2: any run of characters that are not >, then >
    boost::regbase::normal | boost::regbase::icase);
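A minimal usage sketch of the expression above. It assumes C++11's std::regex (ECMAScript grammar) as a stand-in for boost::regex, since the pattern uses nothing grammar-specific; the helper name img_sources is hypothetical. It walks every match with std::sregex_iterator and collects sub-match 1, the SRC value.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the SRC value (sub-match 1) of every IMG tag.
// Uses std::regex in place of the boost::regex from the post.
std::vector<std::string> img_sources(const std::string& html) {
    static const std::regex find_imgs(
        "<\\s*img"      // <, zero or more whitespace, IMG
        "\\s+src\\s*"   // one or more whitespace, SRC, zero or more whitespace
        "=\\s*"         // =, zero or more whitespace
        "\"\\s*"        // ", zero or more whitespace
        "([^\"]*)"      // sub-match 1: the SRC value
        "([^>]*>)",     // sub-match 2: rest of the tag, up to and including >
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> sources;
    for (std::sregex_iterator it(html.begin(), html.end(), find_imgs), end;
         it != end; ++it) {
        sources.push_back((*it)[1].str());
    }
    return sources;
}
```

Because of icase, upper-case tags such as `<IMG SRC="a.gif">` are found as well.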
OK, now look at this one. I'm trying to do the same as above,
except that I want a sub-match on the quoted value of the ALT
attribute so that I can use it. I can't say "anything except
the word 'alt'", because [^(alt)] is a character class, not a
word: it means "any single character except '(', 'a', 'l', 't'
or ')'".
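To make the character-class problem concrete, here is a small sketch, assuming std::regex rather than boost::regex; both helper names are hypothetical. The second function shows one common workaround, not something from the post: a negative lookahead (?!alt) checked before each character, which std::regex's ECMAScript grammar supports.

```cpp
#include <regex>
#include <string>

// Hypothetical helper. [^alt] is a bracket expression: it matches exactly ONE
// character that is not 'a', 'l' or 't'. It knows nothing about the word "alt".
bool is_not_a_l_or_t(const std::string& s) {
    static const std::regex one_char("[^alt]", std::regex::ECMAScript);
    return std::regex_match(s, one_char);
}

// Hypothetical helper. Before consuming each character, the negative
// lookahead (?!alt) checks that the word "alt" does not start there, so the
// whole group matches any run of text that does not contain "alt"
// (case-insensitively, because of icase). Note that '.' does not match
// newlines in the ECMAScript grammar.
bool lacks_word_alt(const std::string& s) {
    static const std::regex no_alt("(?:(?!alt).)*",
                                   std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(s, no_alt);
}
```

So `is_not_a_l_or_t("w")` is true but `is_not_a_l_or_t("l")` is false, while `lacks_word_alt("width=\"10\" ")` is true and `lacks_word_alt("width=\"10\" alt")` is false.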
static const boost::regex find_imgs_with_alt(
    "<\\s*img"        // <, zero or more whitespace, IMG
    "\\s+src\\s*"     // one or more whitespace, SRC, zero or more whitespace
    "=\\s*"           // =, zero or more whitespace
    "\"\\s*"          // ", zero or more whitespace
    "[^\"]*\""        // the SRC value (any run of chars that are not "), then the closing "
    "\\s+"            // one or more whitespace
    "[^alt]*"         // I would like this to match anything except the word ALT, but
                      // it is a character class: any run of chars other than 'a', 'l' or 't'
    "alt\\s*="        // ALT, zero or more whitespace, =
    "\"([^\"]*)\""    // ", sub-match 1: any run of chars that are not ", then another "
    "[^>]*>",         // any run of chars that are not >, followed by >
    boost::regbase::normal | boost::regbase::icase);
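One way to get the ALT-replacing expression working as a single pattern, sketched with std::regex in place of boost::regex (the function name is hypothetical): swap [^alt]* for a lookahead-guarded repetition that stops just before ALT=, then let std::regex_replace substitute sub-match 1 ($1) for the whole tag. Unlike the expression above, this sketch does not require SRC to come before ALT.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every IMG tag that has ALT="..." with the ALT text.
std::string replace_imgs_with_alt(const std::string& html) {
    static const std::regex img_with_alt(
        "<\\s*img\\b"                // <, optional whitespace, IMG as a whole word
        "(?:(?!\\balt\\s*=)[^>])*"   // any chars inside the tag, stopping before ALT=
        "\\balt\\s*=\\s*"            // ALT, optional whitespace, =, optional whitespace
        "\"([^\"]*)\""               // sub-match 1: the quoted ALT text
        "[^>]*>",                    // rest of the tag, up to and including >
        std::regex::ECMAScript | std::regex::icase);
    // "$1" in the (default) ECMAScript format string means "sub-match 1".
    return std::regex_replace(html, img_with_alt, "$1");
}
```

Tags without an ALT attribute are left untouched, because the expression cannot match them: the [^>] in the guarded repetition stops it from running past the end of a tag.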
So what I would like is to write another regular expression
that matches "alt" and, where the expression above says
[^alt]*
write something like
[^@alt]*
where '@' would indicate that 'alt' is the name of another
regular expression, such as
static const boost::regex alt("alt",
boost::regbase::normal | boost::regbase::icase);
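Neither boost::regex nor std::regex appears to let a pattern refer to a separately compiled regex object by name. A common workaround, sketched here with std::regex, is to compose at the pattern-string level: keep the 'alt' sub-expression as a plain string and splice it into the larger pattern before compiling. The names alt, run_without and make_img_with_alt are hypothetical; run_without(alt) plays the role of the imagined [^@alt]*.

```cpp
#include <regex>
#include <string>

// Hypothetical: the "named" sub-expression, kept as a plain pattern string
// so it can be spliced into larger patterns before compiling.
const std::string alt = "\\balt\\s*=";

// Hypothetical helper: a pattern for any run of characters inside a tag
// (no '>') that does not contain a match of `sub`.
std::string run_without(const std::string& sub) {
    return "(?:(?!" + sub + ")[^>])*";
}

// Builds the IMG-with-ALT expression from the pieces above.
std::regex make_img_with_alt() {
    return std::regex("<\\s*img\\b" + run_without(alt) + alt +
                          "\\s*\"([^\"]*)\"[^>]*>",
                      std::regex::ECMAScript | std::regex::icase);
}
```

Because the spliced text is raw pattern syntax, a sub-expression containing capturing groups would shift the sub-match numbers of the outer pattern; using only non-capturing groups (?:...) in the pieces avoids that.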
I can see how to do this without such a feature: I would get
the whole IMG tag and then do a separate regex_search on the
match. But it would be much easier if this were possible, and
it would leave me with fewer lines of regular expression code
in which to have bugs.
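The two-pass alternative described above can be sketched like this, assuming std::regex in place of boost::regex (the function name is hypothetical): the first expression finds each complete IMG tag, and regex_search then looks for ALT="..." only inside the text of that tag.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: two passes, one to find IMG tags, one to find ALT in each.
std::string replace_imgs_two_pass(const std::string& html) {
    static const std::regex img_tag("<\\s*img\\b[^>]*>",
                                    std::regex::ECMAScript | std::regex::icase);
    static const std::regex alt_attr("\\balt\\s*=\\s*\"([^\"]*)\"",
                                     std::regex::ECMAScript | std::regex::icase);

    std::string out;
    auto last = html.cbegin();  // end of the previous IMG tag
    for (std::sregex_iterator it(html.begin(), html.end(), img_tag), end;
         it != end; ++it) {
        const std::smatch& tag = *it;
        out.append(last, tag[0].first);   // copy the text before this tag
        const std::string tag_text = tag.str();
        std::smatch alt;
        if (std::regex_search(tag_text, alt, alt_attr))
            out += alt[1].str();          // tag has ALT: keep only its text
        else
            out += tag_text;              // no ALT: leave the tag as it was
        last = tag[0].second;
    }
    out.append(last, html.cend());        // copy the text after the last tag
    return out;
}
```

Each expression stays short and easy to test on its own, at the cost of a second search per tag.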
If this is possible, I'd like to know how. Thanks in advance;
I'll post the regular expressions I end up using here in case
anyone finds them useful.
--Rob