
Boost Users :

From: llwaeva_at_[hidden]
Date: 2006-07-28 02:23:32


Hi there,
  I am using regex_replace to find and replace a pattern. The code is shown below:

  string src = "xxx%R__xy\r\n%\r\nRyyyy% A__%%A\r\n%Rzzz%C%Appp_%C0123\r\n%Rooo";
  re = "(%R)|(%A)|(%C)";
  format = "(?1#R)(?2#A)(?3#C)";
  cout << "SOURCE:" << endl << src << endl << endl;
  regex_replace( src.begin(), src.begin(), src.end(), re, format, format_all );
  cout << "OUTPUT:" << endl << src << endl << endl;

1) The code above replaces %R with #R, %A with #A and %C with #C and writes the result back
into the source string, as intended. The output is

xxx#R__xy
%
Ryyyy% A__%#A
#Rzzz#C#Appp_#C0123
#Rooo

Note that %%A is replaced with %#A.

2) If I change the format so that the replacement text is longer than the text it replaces, e.g.
  re = "(%R)|(%A)|(%C)";
  format = "(?1#RRR)(?2#AAA)(?3#CCC)";
  
  regex_replace raises an error. I think the error occurs because the original string is not long
enough to hold the output string. The problem can be solved with the following code:

  string src = "xxx%R__xy\r\n%\r\nRyyyy% A__%%A\r\n%Rzzz%C%Appp_%C0123\r\n%Rooo";
  string output=src;
  re = "(%R)|(%A)|(%C)";
  format = "(?1#RRR)(?2#AAA)(?3#CCC)";
  cout << "SOURCE:" << endl << src << endl << endl;
  regex_replace( output.begin(), src.begin(), src.end(), re, format, format_all);
  cout << "OUTPUT:" << endl << output << endl << endl;

 But I do want the output stored in the original string. Assigning the
result back to the source string (src = output) solves the problem, but for
a large input string (in my case, >5M) this is not a good option. I am
looking for a better approach.
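
One possible approach, sketched here under some assumptions, is to write through std::back_inserter into a fresh string, so the result can grow as needed, and then swap the two buffers instead of assigning. A second buffer still exists for a moment, but the swap itself does not copy the characters. The sketch uses std::regex from the C++ standard library rather than Boost, only to stay self-contained; boost::regex_replace has an overload of the same shape that takes an output iterator. The function name expand_in_place, the pattern "%([RAC])" and the format "#$1$1$1" are illustrative stand-ins for the (?1...) conditional format above, which std::regex does not support.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replace %R, %A, %C with #RRR, #AAA, #CCC
// and leave the result in `src`.
void expand_in_place(std::string& src) {
    static const std::regex re("%([RAC])");
    std::string output;
    // Rough size guess; it only reduces reallocations while appending.
    output.reserve(src.size() + src.size() / 2);
    // back_inserter appends to `output`, so the result may be longer than `src`.
    std::regex_replace(std::back_inserter(output),
                       src.begin(), src.end(), re, "#$1$1$1");
    // Exchange the buffers; the characters themselves are not copied again.
    src.swap(output);
}
```

For example, calling expand_in_place on a string holding "xxx%R__%%A%C" leaves "xxx#RRR__%#AAA#CCC" in it.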

BTW, if the output string is shorter than the original string,
the output carries some extra characters, e.g.

  string src = "xxx%Ry%Az%Ce";
  string output=src;
  re = "(%R)|(%A)|(%C)";
  format = "(?1R)(?2A)(?3C)";
 
 The output is "xxxRyAzCe%Ce", where the trailing "%Ce" is left over
from the source string. How can I get rid of the extra characters?
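
A likely explanation is that output starts as a full copy of src and regex_replace only overwrites its first part, so the old tail remains. The output-iterator overload returns the iterator one past the last character written, and that can be used to erase the tail. The sketch below again uses std::regex to stay self-contained (boost::regex_replace returns the output iterator in the same way); the function name strip_percent and the pattern and format "%([RAC])" / "$1" are illustrative.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace %R, %A, %C with R, A, C.
std::string strip_percent(const std::string& src) {
    static const std::regex re("%([RAC])");
    // Same length as the source, so the (shorter) result is guaranteed to fit.
    std::string output = src;
    // regex_replace returns the position one past the last character written.
    std::string::iterator written_end =
        std::regex_replace(output.begin(), src.begin(), src.end(), re, "$1");
    // Drop the stale characters that were left over from the initial copy.
    output.erase(written_end, output.end());
    return output;
}
```

With the input "xxx%Ry%Az%Ce" this returns "xxxRyAzCe". Writing through output.begin() is only safe here because the result can never be longer than the source.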

3) Finally, I want to modify the search pattern so that only %X, and not %%X (where X can
be R, A or C), is replaced. I tried the following regular expression

  re = "([^%]%R)|([^%]%A)|([^%]%C)";

but it doesn't work properly for my problem. E.g. for src = "xxx%R %%R"
and the format "(?1#R)(?2#A)(?3#C)", the regular expression also removes the 'x' before %R, i.e.
the output is

"xx#R %%R" rather than "xxx#R %%R"
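
The 'x' disappears because [^%] is part of the match, so it is replaced together with %R. One way around this, sketched below, is to let the pattern also match a doubled "%%" and copy it through unchanged, so that a "%" that belongs to "%%" is never seen as the start of a token. This assumes "%%" is meant as an escape pair, which the post does not state. The sketch uses std::regex with a manual loop, since std::regex has no conditional format; the function name replace_unescaped is illustrative. I believe Boost.Regex's Perl syntax also supports negative lookbehind, (?<!%), which may be a more direct fix, but std::regex lacks it and it is not shown here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace %R, %A, %C with #R, #A, #C,
// but leave any "%%" pair (and whatever follows it) untouched.
std::string replace_unescaped(const std::string& src) {
    // Group 1 matches a doubled "%%"; group 2 captures the letter of a real token.
    static const std::regex re("(%%)|%([RAC])");
    std::string out;
    out.reserve(src.size());
    std::string::const_iterator last = src.begin();
    for (std::sregex_iterator it(src.begin(), src.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);             // text between matches, unchanged
        if (m[1].matched) {
            out.append(m[1].first, m[1].second);  // keep "%%" exactly as it was
        } else {
            out += '#';                           // "%X" becomes "#X"
            out.append(m[2].first, m[2].second);
        }
        last = m[0].second;
    }
    out.append(last, src.end());                  // text after the final match
    return out;
}
```

For src = "xxx%R %%R" this returns "xxx#R %%R", and adjacent tokens such as "%C%A" are both replaced because no neighbouring character is consumed.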

Please help!

Thanks in advance.

