
Boost Users :

From: llwaeva_at_[hidden]
Date: 2006-07-27 23:36:22


on 2006-7-28 1:42:39, "John Maddock" <john_at_[hidden]> wrote:
> llwaeva_at_[hidden] wrote:
> > Hi there,
> > I am using regex_replace for string replacement. I am working with a
> > very large string (about 3M) and need to do the following replacements:
> >
> > %R% --> #R#
> > %A% --> #R1#
> > %C% --> #R2#
> >
> > and so on.
> >
> > I am new to regular expressions and Boost.Regex. After reading the
> > documentation I understand how regex works, but I am still not sure
> > whether regex_replace meets my needs.
> >
> > 1) I call regex_replace several times to replace different patterns.
> > For a large input string this is very slow. Can I call
> > regex_replace once to do all the replacements, i.e. something like
> > regex_replace( output, begin, end, "%R ,%A ,%C", "#R#, #R1#, #R2#",
> > match_default); for replacing
> > %R% with #R#
> > %A% with #R1#
> > %C% with #R2#?
>
> You can. Use something like:
>
> "(%R%)|(%A%)|(%C%)"
>
> as the regex, and then use the replace string:
>
> "(?1#R#)(?2#R1#)(?3#R2#)"
>
> with the flag "format_all" set (this enables Boost-specific format-string
> extensions that allow you to do conditional search and replace like this).
>
>
> > 2) According to the documentation, the output depends on the
> > match_flag_type. In my case, I want the output string, including both
> > the matched and the unmatched parts, to replace the whole input
> > string, e.g.
> >
> > SourceStr = "xxxx%R%yyy%A%zzz%C%"
> > Apply regex_replace( output_string, SourceStr.begin(), SourceStr.end(),
> > ... )
> > and get output_string = "xxxx#R#yyy#R1#zzz#R2#"
> > rather than just #R#, #R1# and #R2#.
>
> Use "format_no_copy" to suppress copying unmatched parts to the output.
> Otherwise the default behaviour is to always copy unmatched parts of the
> input to output.
>
> John.
>
Thanks for your help, but the result is still not what I want. Here are my
program and its output; please check it for me.

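  #include <boost/regex.hpp>
  #include <iostream>
  #include <string>
  using namespace std;

  int main() {
    string src = "xxx%R%__xy\r\n%\r\nRyyyy%A__%A\r\n%R%zzz%C%%A%ppp_%C%0123\r\n%R%ooo";
    string output = src;  // same length as src, so writes through output.begin() stay in bounds
    boost::regex re("(%R%)|(%A%)|(%C%)");
    string format = "(?1#R#)(?2#A#)(?3#C#)";

    cout << "SOURCE:" << endl << src << endl << endl;
    // format_no_copy: only the replacement text is written, from output.begin() onwards
    boost::regex_replace(output.begin(), src.begin(), src.end(), re, format,
                         boost::format_all | boost::format_no_copy);
    cout << "OUTPUT:" << endl << output << endl << endl;
    cout << "SRC:" << endl << src << endl;
  }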

The output is

SOURCE:
xxx%R%__xy
%
Ryyyy%A__%A
%R%zzz%C%%A%ppp_%C%0123
%R%ooo

OUTPUT:
#R##R##C##A##C##R#yy%A__%A
%R%zzz%C%%A%ppp_%C%0123
%R%ooo

SRC:
xxx%R%__xy
%
Ryyyy%A__%A
%R%zzz%C%%A%ppp_%C%0123
%R%ooo

That is not what I want. I want the output, with the matched patterns
replaced, to be copied back into the input string.

To do that, I modified the program as follows:

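 // needs <boost/regex.hpp>, <iostream>, <string> and "using namespace std;"
 string src = "xxx%R%__xy\r\n%\r\nRyyyy%A__%A\r\n%R%zzz%C%%A%ppp_%C%0123\r\n%R%ooo";
 boost::regex re("(%R%)|(%A%)|(%C%)");
 string format = "(?1#R#)(?2#A#)(?3#C#)";
 cout << "SOURCE:" << endl << src << endl << endl;
 // output and input overlap: this relies on every replacement (#R#, #A#, #C#)
 // being exactly as long as the text it replaces
 boost::regex_replace(src.begin(), src.begin(), src.end(), re, format, boost::format_all);
 cout << "OUTPUT:" << endl << src << endl;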

Now the output, with the matched patterns replaced, is copied exactly into
the input string, i.e. after the replacement, src became:

src =
xxx#R#__xy
%
Ryyyy%A__%A
#R#zzz#C##A#ppp_#C#0123
#R#ooo

NOTICE that in the above code there is NO format_no_copy flag!

Thanks.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net