
Boost Users:

From: John Maddock (john_at_[hidden])
Date: 2007-03-13 05:32:02


llwaeva_at_[hidden] wrote:
> hi there,
> I am working with tag-oriented text using boost::regex. For example,
> the following pattern might occur in the text
>
> <before> <pre><p>Some Text</p></pre> <after> <pre> ddd </pre>
>
> In this case, I would like to extract everything between <pre> </pre>.
> Meanwhile, everything outside <pre> </pre> should be unchanged except
> that < is replaced by &lt; and > is replaced by &gt;.
>
> For that purpose, I tried the following code

Nothing looks obviously wrong at a quick glance, except that \s* should be
written \\s* inside a C++ string literal.
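The reason is that the compiler strips one level of backslashes before the regex engine ever sees the pattern. The sketch below shows two equivalent spellings of the opening-tag pattern. It uses C++11 std::regex instead of boost::regex purely so that it compiles without Boost (the syntax used here, \s, [^>] and *, means the same in both), and the names kPreOpenEscaped, kPreOpenRaw and is_pre_open_tag are hypothetical. Raw string literals are a C++11 feature.

```cpp
#include <regex>
#include <string>

// Both literals hold the same regex text: <\s*pre[^>]*>
// Ordinary literal: each backslash must be doubled, because "\s" is not a
// valid C++ escape sequence (compilers typically warn and drop the backslash,
// so the engine would see a literal 's').
const std::string kPreOpenEscaped = "<\\s*pre[^>]*>";
// C++11 raw literal: backslashes are passed through unchanged.
const std::string kPreOpenRaw = R"(<\s*pre[^>]*>)";

// Returns true if s consists of exactly one opening <pre ...> tag.
bool is_pre_open_tag(const std::string& s) {
    static const std::regex re(kPreOpenEscaped);
    return std::regex_match(s, re);
}
```

Either constant can be handed to the regex constructor; the raw form simply avoids having to remember the doubling.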

If that doesn't fix things, post a self-contained test case and I'll take a
look.
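A self-contained test case for the non-nested example might look something like the following sketch. It assumes the intent is to copy each <pre>...</pre> block through verbatim, tags included, and to escape < and > only in the text between blocks. escape_outside_pre and escape_angle_brackets are hypothetical names, and std::regex again stands in for boost::regex.

```cpp
#include <regex>
#include <string>

// Replaces < and > with HTML entities; every other character is copied as-is.
std::string escape_angle_brackets(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '<')      out += "&lt;";
        else if (c == '>') out += "&gt;";
        else               out += c;
    }
    return out;
}

// Copies every <pre>...</pre> block unchanged and escapes the text around it.
// Non-nested case only: the lazy [\s\S]*? stops at the first closing tag.
std::string escape_outside_pre(const std::string& input) {
    // [\s\S] rather than . because in std::regex's ECMAScript grammar
    // . does not match newlines, and a block may span several lines.
    static const std::regex pre_block(R"(<\s*pre[^>]*>[\s\S]*?</\s*pre\s*>)");

    std::string out;
    auto last = input.cbegin();  // end of the previous block
    for (std::sregex_iterator it(input.begin(), input.end(), pre_block), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out += escape_angle_brackets(std::string(last, m[0].first));  // text before the block
        out += m.str();                                               // the block itself
        last = m[0].second;
    }
    out += escape_angle_brackets(std::string(last, input.cend()));    // trailing text
    return out;
}
```

On the first example from the question this should turn <before> and <after> into &lt;before&gt; and &lt;after&gt; while leaving both <pre> blocks alone. With boost::regex's default Perl syntax, . matches newlines unless match_not_dot_newline is passed, so the [\s\S] workaround would not be needed there.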

> In a far more complicated case, a nested <pre></pre> might occur as
> follows:
>
> <before> <pre><pre><p>Some Text</p></pre></pre> <after> <pre> ddd
> </pre>
>
> For this case, I only want to handle the outermost <pre></pre> and
> keep everything inside it unchanged, i.e., the inner <pre></pre> will
> be extracted as common text.

Hmmmm, traditional regexes don't handle that very well. How deep will
the nesting go? You can handle a finite number of nested occurrences using
something like:

<\s*pre[^>]*>(<\s*pre[^>]*>.*?</\s*pre\s*>|.)*?</\s*pre\s*>

and so on, nesting the inner alternative once more for each extra level of
depth; but remember to double those backslashes if you embed this in a C++
string literal.
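The "and so on" step can be generated mechanically. In this sketch (an illustration, not a Boost facility) the hypothetical nested_pre_pattern(depth) wraps the pattern in itself depth times: depth 1 reproduces the expression above, except that the group is non-capturing and [\s\S] replaces . for std::regex. The hypothetical first_pre_block returns the leftmost outermost block. Raw string literals sidestep the doubled backslashes.

```cpp
#include <regex>
#include <string>

// Pattern for a <pre>...</pre> block containing up to `depth` levels of
// nested <pre> blocks; depth 0 allows no nesting at all.
std::string nested_pre_pattern(int depth) {
    const std::string open  = R"(<\s*pre[^>]*>)";
    const std::string close = R"(</\s*pre\s*>)";
    const std::string any   = R"([\s\S])";  // any character, newlines included
    std::string p = open + any + "*?" + close;  // innermost level
    for (int i = 0; i < depth; ++i)
        // Each level: an opening tag, then lazily either a complete inner
        // block or a single character, then the matching closing tag.
        p = open + "(?:" + p + "|" + any + ")*?" + close;
    return p;
}

// Returns the first (outermost) <pre> block in text, or "" if none matches.
std::string first_pre_block(const std::string& text, int depth) {
    const std::regex re(nested_pre_pattern(depth));
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With depth 1, the nested example from the question should yield <pre><pre><p>Some Text</p></pre></pre>; with depth 0 the match stops at the inner </pre>. A pattern built for fewer levels than the input contains ends the match too early, which is why the maximum depth matters. Patterns of this shape may also be slow on long inputs, because the lazy alternation backtracks heavily.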

John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net