Boost Users :
From: llwaeva_at_[hidden]
Date: 2006-07-30 04:58:31
Hi there,
I wrote a piece of code to handle MIME headers with Boost. The program substitutes the MIME
header fields (from, to, subject, etc.) into a template that contains specific macros. Because a
macro may occur anywhere in the template, once or more, I use replace_all_regex. First of all, I
need to escape the special characters in the substitute string:
// input is a substitute string that will replace the macro
string& escape_format( string& input )
{
    replace_all( input, "$", "$$" );
    replace_all( input, "\\", "\\\\" );
    replace_all( input, ":", "\\:" );
    replace_all( input, "?", "\\?" );
    replace_all( input, "(", "\\(" );
    replace_all( input, ")", "\\)" );
    return input;
}
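For readers who want to try the escaping step without Boost, here is a minimal sketch that uses only the standard library. The names replace_all_sketch and escape_format_sketch are hypothetical stand-ins, not Boost functions; the sketch only mirrors the replacement order used above and is not a statement about how boost::replace_all is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for boost::replace_all: replaces every occurrence
// of `from` in `s` with `to`, scanning left to right.
void replace_all_sketch(std::string& s, const std::string& from, const std::string& to)
{
    assert(!from.empty()); // an empty search string would never advance
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size(); // skip the text just inserted
    }
}

// Same replacement order as escape_format above. The backslash is doubled
// before any replacement that introduces new backslashes.
std::string escape_format_sketch(std::string input)
{
    replace_all_sketch(input, "$", "$$");
    replace_all_sketch(input, "\\", "\\\\");
    replace_all_sketch(input, ":", "\\:");
    replace_all_sketch(input, "?", "\\?");
    replace_all_sketch(input, "(", "\\(");
    replace_all_sketch(input, ")", "\\)");
    return input;
}
```

The order matters: if the backslash replacement ran after the ones that add backslashes, those freshly added escapes would be doubled as well.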
Several macros are defined; some of them are listed below:
%s represents the subject (e.g. Re: hi philip!)
%S represents the subject without the prefix "Re:" (e.g. hi philip!)
%f represents the "from" address (e.g. tom_at_[hidden])
%t represents the "to" address (e.g. philip_at_[hidden])
...
With regex, I can replace all the macros in one call with the help of a format string, something like:
replace_all_regex( source, boost::regex("(%f)|(%t)|(%s)|(%S)"), "(?1 xxx)(?2 yyy)(?3 aaa)(?4 bbb)",
format_all );
To get the subject without the prefix "Re:", I use erase_regex as follows:
string& deRe(string& input)
{
    boost::regex regexp("^[[:space:]]*Re:[[:space:]]*");
    erase_regex( input, regexp );
    return input;
}
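The prefix-stripping step can also be tried with std::regex instead of Boost. This is only a sketch under that assumption: strip_re_prefix is a hypothetical name, and std::regex_replace with format_first_only is used here as a rough equivalent of erasing the first match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `subject` with one leading "Re:"
// (plus the whitespace around it) removed. Takes the argument by const
// reference, so a temporary such as std::string(ptr) can be passed directly.
std::string strip_re_prefix(const std::string& subject)
{
    static const std::regex prefix("^[[:space:]]*Re:[[:space:]]*");
    // format_first_only: replace only the first match, then copy the rest.
    return std::regex_replace(subject, prefix, "",
                              std::regex_constants::format_first_only);
}
```

Returning a new string rather than modifying the argument in place keeps the original subject available for the %s macro.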
So simple! However, the order and number of macros change from time to time, so I need a
better way to build the regex and the format string. I've got a function to do that:
// rformat is a stringstream for building the format string
// text is a substitute string
// re_text is a string for building the regular expression
// re is a string or regular expression, i.e. the macro
// format_num is a global integer holding the current number of format items (initialized to zero)
void format_text(stringstream& rformat, string& text, string& re_text, const string& re)
{
    format_num++;
    rformat << "(?" << format_num << " " << escape_format(text) << ")";
    if (format_num>1) re_text += "|";
    re_text += "(" + re + ")";
}
Now I can build the regular expression and the substitution format dynamically. Here is an example:
string regex_text;
stringstream reformat;
string source; // source is a string containing text and macros
// spHeader is a char* obtained from the MIME parser.
if (spHeader)
{
    format_text( reformat, string(spHeader), regex_text, "%s" ); // for %s
    format_text( reformat, deRe(string(spHeader)), regex_text, "%S" ); // for %S
    // Suppose spHeader is "Re: hi philip!", then
    // regex_text = "(%s)|(%S)"
    // reformat.str() = "(?1 Re\: hi philip!)(?2 hi philip!)" (escape_format escapes the colon)
    replace_all_regex( source, boost::regex(regex_text), reformat.str(), format_all );
}
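To check which strings the building step produces, the logic can be reproduced without Boost. The sketch below is an illustration only: MacroFormatBuilder, replace_each and escape_for_format are hypothetical names, the counter is a class member instead of a global, and the text is taken by const reference so that temporaries can be passed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for boost::replace_all (assumes `from` is non-empty).
static void replace_each(std::string& s, const std::string& from, const std::string& to)
{
    for (std::size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size())
        s.replace(pos, from.size(), to);
}

// Same escaping order as escape_format in the post.
static std::string escape_for_format(std::string s)
{
    replace_each(s, "$", "$$");
    replace_each(s, "\\", "\\\\");
    replace_each(s, ":", "\\:");
    replace_each(s, "?", "\\?");
    replace_each(s, "(", "\\(");
    replace_each(s, ")", "\\)");
    return s;
}

// Hypothetical builder that collects the alternation regex and the
// conditional format string side by side, one sub-expression per macro.
class MacroFormatBuilder
{
public:
    void add(const std::string& macro, const std::string& text)
    {
        ++count_;
        format_ << "(?" << count_ << " " << escape_for_format(text) << ")";
        if (count_ > 1) regex_text_ += "|";
        regex_text_ += "(" + macro + ")";
    }
    std::string regex_text() const { return regex_text_; }
    std::string format() const { return format_.str(); }

private:
    int count_ = 0;
    std::string regex_text_;
    std::ostringstream format_;
};
```

Adding "%s" with "Re: hi philip!" and then "%S" with "hi philip!" gives the regex text (%s)|(%S) and a format string in which the colon appears escaped as \: because of the escaping step.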
Everything is fine except that there are some meaningless characters at the end of the output
string (still source). After checking the code carefully, I found that the problem may be caused by
the format_text call for %S: disabling the second format_text call solves the problem. However,
disabling only the first format_text call also solves the problem. This confuses me! Could someone
please point me in the right direction?
Thanks.