Boost Users :
From: llwaeva_at_[hidden]
Date: 2006-07-28 02:23:32
Hi there,
I am using regex_replace to find and replace a pattern. The code is shown below:
string src = "xxx%R__xy\r\n%\r\nRyyyy% A__%%A\r\n%Rzzz%C%Appp_%C0123\r\n%Rooo";
re = "(%R)|(%A)|(%C)";
format = "(?1#R)(?2#A)(?3#C)";
cout << "SOURCE:" << endl << src << endl << endl;
regex_replace( src.begin(), src.begin(), src.end(), re, format, format_all );
cout << "OUTPUT:" << endl << src << endl << endl;
1) For replacing %R with #R, %A with #A and %C with #C, and saving the output back to the
source string, the above code does a good job. The output is
xxx#R__xy
%
Ryyyy% A__%#A
#Rzzz#C#Appp_#C0123
#Rooo
NOTE that %%A is replaced with %#A
2) If I change the format so that each replacement is longer than the text it substitutes, e.g.
re = "(%R)|(%A)|(%C)";
format = "(?1#RRR)(?2#AAA)(?3#CCC)";
regex_replace raises an error. I think the error comes from the original string not being long
enough to store the output string. The problem can be solved by the following code:
string src = "xxx%R__xy\r\n%\r\nRyyyy% A__%%A\r\n%Rzzz%C%Appp_%C0123\r\n%Rooo";
string output=src;
re = "(%R)|(%A)|(%C)";
format = "(?1#RRR)(?2#AAA)(?3#CCC)";
cout << "SOURCE:" << endl << src << endl << endl;
regex_replace( output.begin(), src.begin(), src.end(), re, format, format_all);
cout << "OUTPUT:" << endl << output << endl << endl;
But I do want the output stored in the original string. Reassigning
the source string solves the problem, i.e. src = output, but for a large
input string (in my case, >5M), this is not a good approach. I am looking
for a better one.
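One possible way around the copy, sketched under some assumptions: build the result in a fresh string through std::back_inserter, so the output can grow freely, then swap it into src. The swap only exchanges internal buffers, so the 5M of data is not copied a second time. The sketch uses std::regex as a stand-in for boost::regex so that it is self-contained; std::regex does not understand the conditional "(?1...)" format, so a single capture group is used instead. The function name expand_markers and the reserve heuristic are made up for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replaces %R, %A, %C with #RRR, #AAA, #CCC
// and leaves the result in the caller's own string.
void expand_markers(std::string& src) {
    // One capture group instead of three alternatives (assumption:
    // std::regex stand-in, which lacks Boost's conditional format).
    static const std::regex re("%([RAC])");

    std::string result;
    // Rough guess at the final size to limit reallocations.
    result.reserve(src.size() + src.size() / 4);

    // The output goes to a separate, growing buffer, so the writer can
    // never overtake the reader or run past the end of the source.
    std::regex_replace(std::back_inserter(result),
                       src.begin(), src.end(), re,
                       std::string("#$1$1$1"));

    // Constant-time exchange of buffers; no second copy of the data.
    src.swap(result);
}
```

With Boost the same shape should apply, keeping the original re, format and format_all and only changing the output iterator to std::back_inserter(result).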
BTW, if the output string is shorter than the original string,
the output carries some extra characters, e.g.
string src = "xxx%Ry%Az%Ce";
string output=src;
re = "(%R)|(%A)|(%C)";
format = "(?1R)(?2A)(?3C)";
The output is "xxxRyAzCe%Ce", where the trailing %Ce are extra characters
left over from the source string. How can I get rid of the extra characters?
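A sketch of one way to drop the leftover tail: the iterator overload of regex_replace returns the output iterator positioned one past the last character written, so everything from there to the end of the buffer can be erased. This again uses std::regex as a stand-in for boost::regex (an assumption made to keep the example self-contained), with a single capture group in place of the conditional format. The name strip_markers is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces %R, %A, %C with R, A, C.
std::string strip_markers(const std::string& src) {
    static const std::regex re("%([RAC])");

    // Pre-sized buffer. This is only safe because every replacement
    // is shorter than the text it replaces, so the output cannot
    // outgrow the buffer.
    std::string output = src;

    // regex_replace returns the position just past the last
    // character it wrote into the buffer.
    std::string::iterator written_end =
        std::regex_replace(output.begin(), src.begin(), src.end(),
                           re, std::string("$1"));

    // Whatever follows is stale data from the initial copy.
    output.erase(written_end, output.end());
    return output;
}
```

For the example above, strip_markers("xxx%Ry%Az%Ce") gives "xxxRyAzCe" without the trailing "%Ce".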
3) Finally, I want to modify the search condition so that only %X, and not %%X (X can
be R, A or C), is replaced. I tried the following regular expression
re = "([^%]%R)|([^%]%A)|([^%]%C)";
but it does not work properly for my problem. For example, with src = "xxx%R %%R" and the
format "(?1#R)(?2#A)(?3#C)", the regular expression also consumes the 'x' before %R, i.e.
the output is
"xx#R %%R" rather than "xxx#R %%R".
Please help!
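On question 3, the 'x' disappears because [^%] is part of the match, so it is replaced together with %R. A minimal sketch of one workaround: match only %X, and check the character in front of each match by hand before deciding whether to replace it. The sketch uses std::regex as a stand-in for boost::regex, and replace_unescaped is a hypothetical name. If the Boost.Regex version in use supports Perl-style lookbehind, a pattern such as (?<!%)%R may be a shorter alternative; that is not verified here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces %R, %A, %C with #R, #A, #C unless
// the marker is immediately preceded by another '%'.
std::string replace_unescaped(const std::string& src) {
    static const std::regex re("%([RAC])");

    std::string out;
    out.reserve(src.size());

    std::string::const_iterator copied = src.begin();
    for (std::sregex_iterator it(src.begin(), src.end(), re), last;
         it != last; ++it) {
        const std::smatch& m = *it;
        std::string::const_iterator start = m[0].first;

        // Copy the untouched text between the previous match and this one.
        out.append(copied, start);

        // The preceding character is inspected, not consumed.
        bool escaped = (start != src.begin()) && (*(start - 1) == '%');
        if (escaped) {
            out.append(m[0].first, m[0].second);  // keep %X unchanged
        } else {
            out += '#';
            out.append(m[1].first, m[1].second);  // emit #X
        }
        copied = m[0].second;
    }
    // Copy whatever follows the last match.
    out.append(copied, src.end());
    return out;
}
```

With this, replace_unescaped("xxx%R %%R") yields "xxx#R %%R", and adjacent markers such as "%R%A" are both replaced, because no neighbouring character is swallowed by the match.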
Thanks in advance.