
Boost Users :

From: llwaeva_at_[hidden]
Date: 2006-07-30 04:58:31


Hi there,
  I wrote a piece of code, based on Boost, to handle MIME headers. The program substitutes the MIME
header fields (from, to, subject, etc.) into a template containing specific macros. Because a macro may
occur anywhere in the template, once or more, I use replace_all_regex. First of all, I
need to escape the special characters in the substitute string:

// input is a substitute string that will replace the macro
string& escape_format( string& input )
{
  replace_all( input, "$", "$$" );
  replace_all( input, "\\", "\\\\" );
  replace_all( input, ":", "\\:" );
  replace_all( input, "?", "\\?" );
  replace_all( input, "(", "\\(" );
  replace_all( input, ")", "\\)" );
  return input;
}
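
To show what the escaping is meant to produce, here is a small standalone sketch of the same replacement order. It does not use Boost: replace_all_sketch and escaped_copy are names made up for this illustration, and replace_all_sketch is only a plain std::string stand-in for replace_all.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for replace_all, so the sketch builds without Boost.
void replace_all_sketch(std::string& s, const std::string& from, const std::string& to)
{
    assert(!from.empty()); // an empty search string would never advance
    for (std::string::size_type pos = 0;
         (pos = s.find(from, pos)) != std::string::npos;
         pos += to.size())  // skip past the inserted text so it is not rescanned
    {
        s.replace(pos, from.size(), to);
    }
}

// Same replacement order as escape_format above, but working on a copy.
// The backslash is doubled before any step that inserts new backslashes.
std::string escaped_copy(std::string input)
{
    replace_all_sketch(input, "$", "$$");
    replace_all_sketch(input, "\\", "\\\\");
    replace_all_sketch(input, ":", "\\:");
    replace_all_sketch(input, "?", "\\?");
    replace_all_sketch(input, "(", "\\(");
    replace_all_sketch(input, ")", "\\)");
    return input;
}
```

With this sketch, the subject "Re: hi philip!" becomes Re\: hi philip! and a plain string without special characters comes back unchanged.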

There are some macros defined; some of them are listed below:
%s represents the subject (e.g. Re: hi philip!)
%S represents the subject without the prefix "Re:" (e.g. hi philip!)
%f represents the "from" (e.g. tom_at_[hidden])
%t represents the "to" (e.g. philip_at_[hidden])
...

With regex, I can replace all the macros in one pass with the help of a format string, something like
replace_all_regex( source, boost::regex("(%f)|(%t)|(%s)|(%S)"), "(?1 xxx)(?2 yyy)(?3 aaa)(?4 bbb)",
format_all );
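
To make the intent of that call concrete, here is a standalone sketch of the same one-pass substitution. It is not the Boost call above: it swaps in std::regex and a lookup table in place of the conditional format string, and expand_macros and the table contents are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: replaces every macro found in `source` with its table entry, in one pass.
std::string expand_macros(const std::string& source,
                          const std::map<std::string, std::string>& table)
{
    // Same alternation as in the call above; matching is case-sensitive, so %s and %S differ.
    static const std::regex macro("(%f)|(%t)|(%s)|(%S)");

    std::string out;
    std::string::const_iterator last = source.begin();
    for (std::sregex_iterator it(source.begin(), source.end(), macro), end; it != end; ++it)
    {
        out.append(last, (*it)[0].first);   // text before the macro
        auto found = table.find(it->str());
        // Macros missing from the table are copied through unchanged.
        out += (found != table.end()) ? found->second : it->str();
        last = (*it)[0].second;
    }
    out.append(last, source.end());         // text after the last macro
    return out;
}
```

Because the replacement text comes from the table and is appended verbatim, nothing has to be escaped in this variant.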

In order to get the subject without the prefix "Re:", I use erase_regex as follows:

string& deRe(string& input)
{
  boost::regex regexp("^[[:space:]]*Re:[[:space:]]*");
  erase_regex( input, regexp );
  return input;
}
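
For reference, the effect I expect from deRe can be sketched without Boost. This version swaps in std::regex and returns a copy instead of modifying its argument; without_re_prefix is a hypothetical name for the illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns a copy of the subject with one leading "Re:" (and the whitespace around it) removed.
std::string without_re_prefix(const std::string& subject)
{
    static const std::regex prefix("^[[:space:]]*Re:[[:space:]]*");
    // format_first_only replaces only the first match, as erase_regex erases only the first match.
    return std::regex_replace(subject, prefix, "",
                              std::regex_constants::format_first_only);
}
```

A subject that does not start with "Re:" is returned unchanged, and only one "Re:" is stripped from a subject such as "Re: Re: x".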

So simple! However, the order and number of macros change from time to time, so I need a
better way to build the regex and the format string. I've got a function to do that:

// rformat is a stringstream for building the format string
// text is a substitute string
// re_text is a string for building the regular expression
// re is a string or regular expression, i.e. the macro
// format_num is a global integer holding the current number of format items (initialized to zero)
void format_text(stringstream& rformat, string& text, string& re_text, const string& re)
{
  format_num++;
  rformat << "(?" << format_num << " " << escape_format(text) << ")";
  if (format_num>1) re_text += "|";
  re_text += "(" + re + ")";
}
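
The bookkeeping that format_text does can also be sketched without the global counter. This is only an illustration under my own assumptions: MacroTable and add are hypothetical names, and the substitute text is expected to be escaped already (for example by escape_format) before it is passed in.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the two strings that grow together.
struct MacroTable
{
    std::string regex_text;   // e.g. "(%s)|(%S)"
    std::string format_text;  // e.g. "(?1 ...)(?2 ...)"
    int count = 0;            // number of macros added so far

    // `escaped` must already have its format special characters escaped.
    void add(const std::string& macro, const std::string& escaped)
    {
        assert(!macro.empty());
        ++count;
        format_text += "(?" + std::to_string(count) + " " + escaped + ")";
        if (count > 1) regex_text += "|";
        regex_text += "(" + macro + ")";
    }
};
```

Both parameters are const references, so a temporary can be passed to add in standard C++. I do not know whether that has anything to do with the problem described below.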

Now everything is ready to build the regular expression and the substitution format dynamically. Here is an example:

string regex_text;
stringstream reformat;
string source; // source is a string containing text and macros

// spHeader is a char* obtained from the MIME parser.
if (spHeader)
{
  format_text( reformat, string(spHeader), regex_text, "%s" ); // for %s
  format_text( reformat, deRe(string(spHeader)), regex_text, "%S" ); // for %S

  // Suppose spHeader is "Re: hi philip!", then
  // regex_text = "(%s)|(%S)"
  // reformat.str() = "(?1 Re\: hi philip!)(?2 hi philip!)" (escape_format escapes the ':')
  replace_all_regex( source, boost::regex(regex_text), reformat.str(), format_all );
}

Everything is fine except that there are some meaningless characters at the end of the output
string (still source). After checking the code carefully, I found that the problem may be caused by
the format_text call for %S: commenting out the second format_text call solves the problem. However,
commenting out only the first format_text call also solves it. This confuses me! Could someone please
point me in the right direction?

Thanks.

