Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2007-03-13 05:32:02

Next message: Misiu: "Re: [Boost-users] boost::ref and boost:::function"
Previous message: Joaquín Mª López Muñoz: "[Boost-users] [boost] Review of Intrusive library begins today March 12"
In reply to: llwaeva_at_[hidden]: "[Boost-users] help extracting TAG with boost::regex"

llwaeva_at_[hidden] wrote:
> hi there,
> I am working with a TAG-oriented text with boost::regex. For example,
> the following pattern might occur in the text
>
> <before> <pre><p>Some Text</p></pre> <after> <pre> ddd </pre>
>
> In this case, I would like to extract everything between <pre> </pre>.
> Meanwhile, everything outside <pre> </pre> should be unchanged except
> that < is replaced by &lt; and > is replaced by &gt;.
>
> For that purpose, I tried the following code

Based on a quick glance I don't see anything obviously wrong, except that
\s* should be written as \\s* inside a C++ string literal.

If that doesn't fix things, post a self contained test case and I'll take a
look.
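A minimal sketch of the single-level extraction with the backslashes doubled, assuming C++11 std::regex as a stand-in for boost::regex (the standard class was modelled on Boost's, but the two are not identical). The helper extract_pre and the constant kPrePattern are hypothetical names. In std::regex's default ECMAScript grammar '.' does not match a newline, so the sketch assumes each <pre> block sits on a single line.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Sketch only: std::regex (C++11) stands in for boost::regex here.
// In an ordinary C++ string literal every regex backslash must be doubled:
// "\\s*" reaches the regex engine as \s*, while "\s*" is not a valid C++
// escape sequence (compilers typically warn and drop the backslash).
static const char* const kPrePattern = "<\\s*pre[^>]*>(.*?)</\\s*pre\\s*>";

// Hypothetical helper: returns the body of each non-nested <pre>...</pre> block.
std::vector<std::string> extract_pre(const std::string& text) {
    const std::regex pre_re(kPrePattern);
    std::vector<std::string> bodies;
    for (std::sregex_iterator it(text.begin(), text.end(), pre_re), end;
         it != end; ++it) {
        bodies.push_back((*it)[1].str());  // capture group 1 = text between the tags
    }
    return bodies;
}
```

Applied to the first sample line quoted above, extract_pre returns the two bodies "<p>Some Text</p>" and " ddd ". In C++11 and later, a raw string literal of the form R"(...)" is another way to avoid doubling every backslash.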

> In a far more complicated case, a nested <pre></pre> might occur as
> follows
>
> <before> <pre><pre><p>Some Text</p></pre></pre> <after> <pre> ddd </pre>
>
> For this case, I only want to handle the outermost <pre></pre> and
> keep everything inside it unchanged, i.e., the inner <pre></pre> should
> be extracted as ordinary text.

Hmmmm, traditional regexes don't handle that very well. How deep will
the nesting go? You can handle a finite number of nested occurrences using
something like:

<\s*pre[^>]*>(<\s*pre[^>]*>.*?</\s*pre\s*>|.)*?</\s*pre\s*>

and so on for deeper levels, but remember to double those backslashes if
you embed this in a C++ string.
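The "and so on" can be generated mechanically rather than written out by hand. The following is a sketch under the same assumptions (std::regex in place of boost::regex, one block per line); pre_body and extract_outer_pre are hypothetical helpers. For a depth of 1 it produces the expression above, with non-capturing groups inside and one capture group around the outermost body. Input nested deeper than the chosen limit is not guaranteed to be split correctly, and the pattern grows with every extra level.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Sketch only: std::regex stands in for boost::regex.
static const std::string kOpen = "<\\s*pre[^>]*>";   // regex: <\s*pre[^>]*>
static const std::string kClose = "</\\s*pre\\s*>";  // regex: </\s*pre\s*>

// Hypothetical helper: regex for the body of a <pre> block that may contain
// up to `depth` further levels of nested <pre> blocks.
//   depth 0:  .*?
//   depth n:  (?:<open>body(n-1)<close>|.)*?
std::string pre_body(int depth) {
    if (depth <= 0) return ".*?";
    return "(?:" + kOpen + pre_body(depth - 1) + kClose + "|.)*?";
}

// Hypothetical helper: returns the body of each outermost <pre> block,
// assuming nesting never exceeds `max_depth` and each block is on one line.
std::vector<std::string> extract_outer_pre(const std::string& text, int max_depth) {
    const std::regex re(kOpen + "(" + pre_body(max_depth) + ")" + kClose);
    std::vector<std::string> bodies;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        bodies.push_back((*it)[1].str());  // group 1 = everything between the outer tags
    }
    return bodies;
}
```

On the nested sample, extract_outer_pre(text, 1) returns "<pre><p>Some Text</p></pre>" and " ddd ", leaving the inner <pre></pre> inside the first body, whereas a depth of 0 stops at the first </pre> and returns "<pre><p>Some Text</p>" instead.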

John.

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net