Boost Users :

From: llwaeva_at_[hidden]
Date: 2007-03-12 16:31:17


Hi there,
  I am working with tag-oriented text using boost::regex. For example,
the following pattern might occur in the text:

<before> <pre><p>Some Text</p></pre> <after> <pre> ddd </pre>

In this case, I would like to extract everything between <pre> </pre>.
Meanwhile, everything outside <pre> </pre> should be unchanged except
that < is replaced by &lt; and > is replaced by &gt;

For that purpose, I tried the following code:

boost::regex regexp("<\s*pre[^>]*>(.*?)<\s*/pre\s*>", boost::regex::icase);
boost::match_results<std::string::const_iterator> what;
if (regex_search(sometext, what, regexp,boost::match_default|boost::format_first_only))
{
  std::string between_tag = std::string(what[1].first, what[2].second) + "\r\n";
  MessageBox(0, between_tag.c_str(), "", 0);

  std::string left_tag(sometext.begin(), what[1].first);
  std::string right_tag(what[1].second, sometext.end());
  replace_all<string, LPCSTR, LPCSTR>( left_tag, "<", "&lt;" );
  replace_all<string, LPCSTR, LPCSTR>( right_tag, ">", "&gt;");

  sometext = left_tag + between_tag + right_tag + "\r\n";
}

However, the code does not seem to work properly unless there is nothing outside <pre> </pre>. In addition, when
the text being handled contains \r\n, the code returns an error regardless of whether left_tag and right_tag are
empty.
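
For reference, here is a minimal sketch of one way the non-nested case could be handled. It rests on a few assumptions. It uses std::regex from the C++11 standard library instead of boost::regex so that it builds without Boost. The helper names escape_angles and escape_outside_pre are hypothetical. It writes [\s\S] instead of . so that \r and \n inside a block are matched whatever the engine's dot-matches-newline setting. It also differs from the snippet above in three ways. The pattern is a raw string literal, because in an ordinary C++ literal \s has to be written \\s. The surrounding text is cut at what[0], the whole match, not at what[1]. And what[2] is not used, since the pattern defines only one capture group.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace '<' and '>' with HTML entities.
std::string escape_angles(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        if (c == '<')      out += "&lt;";
        else if (c == '>') out += "&gt;";
        else               out += c;
    }
    return out;
}

// Hypothetical helper: escape everything outside <pre>...</pre> and copy
// each (non-nested) <pre> block through unchanged.
std::string escape_outside_pre(const std::string& text) {
    // Group 1 captures the text between the tags; [\s\S] also matches \r and \n.
    static const std::regex pre_block(
        R"(<\s*pre[^>]*>([\s\S]*?)<\s*/pre\s*>)", std::regex::icase);

    std::string out;
    auto pos = text.cbegin();  // end of the previous match
    for (std::sregex_iterator it(text.cbegin(), text.cend(), pre_block), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out += escape_angles(std::string(pos, m[0].first));  // text before the block
        out += m[0].str();                                   // the block itself, unchanged
        pos = m[0].second;
    }
    out += escape_angles(std::string(pos, text.cend()));     // text after the last block
    return out;
}
```

Calling escape_outside_pre(sometext) returns the converted text. Inside the loop, m[1].str() is the content between the tags, if that is needed separately.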

In a far more complicated case, a nested <pre></pre> might occur as follows:

<before> <pre><pre><p>Some Text</p></pre></pre> <after> <pre> ddd </pre>

In this case, I only want to handle the outermost <pre></pre> and keep everything inside it unchanged,
i.e., the inner <pre></pre> should be treated as ordinary text.
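
A single non-greedy pattern stops at the first </pre>, so the nested case probably needs some form of depth counting. The following is only a sketch under assumptions: it again uses std::regex instead of boost::regex, the helper names are hypothetical, and a stray </pre> outside any block is treated as ordinary text, which is a design choice and not something stated above. It matches individual opening and closing tags and tracks the nesting depth. Text is escaped only while the depth is zero.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace '<' and '>' with HTML entities.
std::string escape_angles(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        if (c == '<')      out += "&lt;";
        else if (c == '>') out += "&gt;";
        else               out += c;
    }
    return out;
}

// Hypothetical helper: escape text outside the outermost <pre>...</pre>
// blocks; everything inside an outermost block, including inner <pre>
// tags, is copied through unchanged.
std::string escape_outside_outermost_pre(const std::string& text) {
    // Matches one opening or closing pre tag. Group 1 is "/" for a
    // closing tag and empty for an opening tag.
    static const std::regex pre_tag(R"(<\s*(/?)\s*pre[^>]*>)", std::regex::icase);

    std::string out;
    int depth = 0;             // current <pre> nesting depth
    auto pos = text.cbegin();  // end of the previous tag
    for (std::sregex_iterator it(text.cbegin(), text.cend(), pre_tag), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const bool closing = m[1].length() > 0;
        const std::string before(pos, m[0].first);
        if (depth == 0 && closing) {
            // Stray closing tag outside any block: treated as ordinary text.
            out += escape_angles(before) + escape_angles(m[0].str());
        } else {
            out += (depth == 0) ? escape_angles(before) : before;
            out += m[0].str();
            depth += closing ? -1 : 1;
        }
        pos = m[0].second;
    }
    const std::string tail(pos, text.cend());
    out += (depth == 0) ? escape_angles(tail) : tail;
    return out;
}
```

With this approach an unclosed <pre> leaves the rest of the text unescaped. Whether that is the desired behaviour depends on the input.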

Any ideas?

Thanks in advance.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net