Subject: [Boost-users] [Regex] Some help with regular expression
From: Etienne Philip Pretorius (icewolfhunter_at_[hidden])
Date: 2009-07-19 19:26:51
Hello List.
I have been trying to get this regular expression to work, without any
success. Perhaps someone could point out my mistake in the following
regexp:
/* http://www.w3.org/TR/REC-xml/#NT-PITarget */
entity grammar::process_instruction_target(
std::string("(?!(X|x)(M|m)(L|l))(") +
name.str() + std::string(")"));
The 'name' regexp looks like this (Tested and working):
/* http://www.w3.org/TR/REC-xml/#NT-Name */
entity grammar::name(std::string("(") + name_start_character.str() +
std::string(")(") + name_character.str() +
std::string(")*"));
The others (name_start_character and name_character) are as follows
(UTF8 encoding assumed):
/* http://www.w3.org/TR/REC-xml/#NT-NameChar */
entity grammar::name_character(name_start_character.str() +
std::string("|-|\\.|[0-9]|\\x{C2}\\x{B7}|") +
std::string("\\x{CC}[\\x{80}-\\x{BF}]|") +
std::string("\\x{CD}[\\x{80}-\\x{AF}]|") +
std::string("\\x{E2}(\\x{80}\\x{BF})|(\\x{81}\\x{80})"));
/* http://www.w3.org/TR/REC-xml/#NT-NameStartChar */
entity grammar::name_start_character(
":|[A-Z]|_|[a-z]|"
"(\\x{C3}[\\x{80}-\\x{96}]|[\\x{98}-\\x{B6}]|[\\x{B8}-\\x{BF}])|"
"([\\x{C4}-\\x{CB}][\\x{80}-\\x{BF}])|"
"(\\x{CD}[\\x{B0}-\\x{BD}]|\\x{BF})|"
"([\\x{CE}-\\x{DF}][\\x{80}-\\x{BF}])|"
"(\\x{E0}[\\x{A0}-\\x{BF}][\\x{80}-\\x{BF}])|"
"(\\x{E1}[\\x{80}-\\x{BF}]{2})|"
"(\\x{E2}(\\x{80}[\\x{8C}-\\x{8D}])|"
"(\\x{81}[\\x{B0}-\\x{BF}])|"
"([\\x{82}-\\x{85}][\\x{80}-\\x{BF}])|"
"(\\x{86}[\\x{80}-\\x{8F}])|"
"([\\x{B0}-\\x{BE}][\\x{80}-\\x{BF}])|"
"(\\x{BF}[\\x{80}-\\x{AF}]))|"
"([\\x{E3}-\\x{EC}][\\x{80}-\\x{BF}]{2})(?!\\x{E3}\\x{80}{2})|"
"(\\x{ED}[\\x{80}-\\x{9F}][\\x{80}-\\x{BF}])|"
"(\\x{EF}[\\x{A4}-\\x{B6}][\\x{80}-\\x{BF}])|"
"(\\x{EF}\\x{B7}([\\x{80}-\\x{8F}]|[\\x{B0}-\\x{BF}]))|"
"(\\x{EF}[\\x{B8}-\\x{BE}][\\x{80}-\\x{BF}])|"
"(\\x{EF}\\x{BF}[\\x{80}-\\x{BD}])|"
"(\\x{F0}[\\x{90}-\\x{BF}][\\x{80}-\\x{BF}])|"
"([\\x{F1}-\\x{F2}][\\x{80}-\\x{BF}]{3})|"
"(\\x{F3}[\\x{80}-\\x{AF}][\\x{80}-\\x{BF}]{2})");
Any help will be appreciated. The 'process_instruction_target' regexp
keeps matching xml and every variant of it, while it should match
everything except the variants of xml.
Thank you,
Etienne