Subject: [Boost-users] [Regex] Some help with regular expression
From: Etienne Philip Pretorius (icewolfhunter_at_[hidden])
Date: 2009-07-19 19:26:51


Hello List.

I have been trying, without success, to get this regular expression to
work. Could someone point out my mistake in the following regexp:

/* http://www.w3.org/TR/REC-xml/#NT-PITarget */
        entity grammar::process_instruction_target(
                std::string("(?!(X|x)(M|m)(L|l))(") +
                name.str() + std::string(")"));

The 'name' regexp looks like this (Tested and working):
/* http://www.w3.org/TR/REC-xml/#NT-Name */
        entity grammar::name(std::string("(") + name_start_character.str() +
                std::string(")(") + name_character.str() +
                std::string(")*"));

The others (name_start_character and name_character) are as follows
(UTF-8 encoding assumed):
/* http://www.w3.org/TR/REC-xml/#NT-NameChar */
        entity grammar::name_character(name_start_character.str() +
                std::string("|-|\\.|[0-9]|\\x{C2}\\x{B7}|") +
                std::string("\\x{CC}[\\x{80}-\\x{BF}]|") +
                std::string("\\x{CD}[\\x{80}-\\x{AF}]|") +
                std::string("\\x{E2}(\\x{80}\\x{BF})|(\\x{81}\\x{80})"));

/* http://www.w3.org/TR/REC-xml/#NT-NameStartChar */
        entity grammar::name_start_character(
                ":|[A-Z]|_|[a-z]|"
                "(\\x{C3}[\\x{80}-\\x{96}]|[\\x{98}-\\x{B6}]|[\\x{B8}-\\x{BF}])|"
                "([\\x{C4}-\\x{CB}][\\x{80}-\\x{BF}])|"
                "(\\x{CD}[\\x{B0}-\\x{BD}]|\\x{BF})|"
                "([\\x{CE}-\\x{DF}][\\x{80}-\\x{BF}])|"
                "(\\x{E0}[\\x{A0}-\\x{BF}][\\x{80}-\\x{BF}])|"
                "(\\x{E1}[\\x{80}-\\x{BF}]{2})|"
                "(\\x{E2}(\\x{80}[\\x{8C}-\\x{8D}])|"
                "(\\x{81}[\\x{B0}-\\x{BF}])|"
                "([\\x{82}-\\x{85}][\\x{80}-\\x{BF}])|"
                "(\\x{86}[\\x{80}-\\x{8F}])|"
                "([\\x{B0}-\\x{BE}][\\x{80}-\\x{BF}])|"
                "(\\x{BF}[\\x{80}-\\x{AF}]))|"
                "([\\x{E3}-\\x{EC}][\\x{80}-\\x{BF}]{2})(?!\\x{E3}\\x{80}{2})|"
                "(\\x{ED}[\\x{80}-\\x{9F}][\\x{80}-\\x{BF}])|"
                "(\\x{EF}[\\x{A4}-\\x{B6}][\\x{80}-\\x{BF}])|"
                "(\\x{EF}\\x{B7}([\\x{80}-\\x{8F}]|[\\x{B0}-\\x{BF}]))|"
                "(\\x{EF}[\\x{B8}-\\x{BE}][\\x{80}-\\x{BF}])|"
                "(\\x{EF}\\x{BF}[\\x{80}-\\x{BD}])|"
                "(\\x{F0}[\\x{90}-\\x{BF}][\\x{80}-\\x{BF}])|"
                "([\\x{F1}-\\x{F2}][\\x{80}-\\x{BF}]{3})|"
                "(\\x{F3}[\\x{80}-\\x{AF}][\\x{80}-\\x{BF}]{2})");

Any help will be appreciated. The 'process_instruction_target' regexp
keeps matching "xml" and every case variant of it, when it should match
any name except those variants.
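
The symptom described above can be reproduced in a small, self-contained
sketch. Several things here are assumptions and not part of the original
code: it uses std::regex (ECMAScript syntax) in place of Boost.Regex, it
replaces 'name' with an ASCII-only stand-in called simple_name, and it
assumes the pattern is applied as a search, not as a whole-string match.
Under those assumptions an unanchored negative lookahead can still
succeed by starting one character later (at "ml"). The anchored variant
is one possible way to reject only the exact name "xml"; it is
hypothetical, not a confirmed fix for the grammar above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// ASCII-only stand-in for the 'name' pattern (assumption: the real one
// also covers the UTF-8 byte ranges).
static const std::string simple_name = "[:A-Z_a-z][\\-.0-9:A-Z_a-z]*";

// Same shape as process_instruction_target: lookahead, then the name.
static const std::regex unanchored("(?![Xx][Mm][Ll])(" + simple_name + ")");

// Hypothetical anchored variant: '^' pins the lookahead to the start of
// the input, and '$' inside it rejects only the exact name "xml".
static const std::regex anchored("^(?![Xx][Mm][Ll]$)(" + simple_name + ")$");

// Helper names are illustrative only.
bool matches_unanchored(const std::string& s) {
    return std::regex_search(s, unanchored);
}

bool matches_anchored(const std::string& s) {
    return std::regex_search(s, anchored);
}

// Call from main() to exercise both variants.
void check_examples() {
    // The search skips position 0 and matches the tail "ml".
    assert(matches_unanchored("xml"));
    // Anchored: the reserved name is rejected in any letter case.
    assert(!matches_anchored("xml"));
    assert(!matches_anchored("XmL"));
    // Names that merely start with "xml" are still accepted.
    assert(matches_anchored("xml-stylesheet"));
    assert(matches_anchored("target"));
}
```

Whether names that only begin with "xml" should be accepted depends on
how the PITarget production is read; the sketch takes the reading in
which only the exact three-letter name is excluded.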

Thank you,
Etienne

