
Boost Users :

Subject: Re: [Boost-users] [regex] character sets
From: John Maddock (john_at_[hidden])
Date: 2009-02-09 11:23:33


> I'm working with CBuilder6 and Boost 1.34.1 (boost_1_34_1) and found that
>
> regex expression1("[[:print:]]+");
>
> matches with regex_search the whole string:
>
> testtext = "ab\r\ncd ef";
>
> including the linebreak and the white space. I had assumed that the
> print class would consist of exactly the set of characters for which
> the standard function isprint returns true, but isprint returns false
> for the space characters. Is there any documentation with the
> exact enumerations of the different character sets?

On Windows the default behavior is to use ::GetStringTypeEx to determine
character classifications: this classifies \r and \n as "space" characters,
which are certainly treated as printable.

If you wish you could use

basic_regex<char, c_regex_traits<char> >

as the regex type, and get the behavior you're expecting, as internally it
calls the C library classification functions (std::isprint and friends), or
you could use:

basic_regex<char, cpp_regex_traits<char> >

as the regex type, and character classification would then depend on
std::locale.
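
To see what a print class driven by the C library would do with the test string from the question, here is a minimal sketch that uses only the standard library. It is not Boost.Regex code: first_printable_run is a hypothetical helper that emulates what a search for "[[:print:]]+" would find if classification followed std::isprint, assuming the default "C" locale is in effect.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of Boost): returns the first maximal run of
// characters for which std::isprint is true. This only emulates a search for
// "[[:print:]]+" under C-library classification; it is not how Boost.Regex
// is implemented.
std::string first_printable_run(const std::string& text)
{
    std::string::size_type begin = 0;

    // Skip any leading characters that are not printable.
    while (begin < text.size() &&
           !std::isprint(static_cast<unsigned char>(text[begin])))
        ++begin;

    // Extend the run for as long as the characters stay printable.
    std::string::size_type end = begin;
    while (end < text.size() &&
           std::isprint(static_cast<unsigned char>(text[end])))
        ++end;

    return text.substr(begin, end - begin);
}
```

Called on "ab\r\ncd ef" in the "C" locale, this stops at the \r, because std::isprint is false for \r and \n there, while the plain space between "cd" and "ef" does count as printable.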

In both these cases though, exactly how \r and \n are classified would
depend on the platform and the locale in effect at the time - unless you're
explicitly using the "C" locale in which case you *should* get consistent
behavior, modulo any C std lib bugs.
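
If you want to check how the "C" locale classifies a given character regardless of the global locale, a small sketch along these lines may help. The helper name is_print_in_c_locale is made up for illustration; it queries the std::ctype<char> facet of std::locale::classic() directly and says nothing about what Boost.Regex does internally.

```cpp
#include <locale>

// Hypothetical helper: asks the ctype facet of the classic "C" locale whether
// c is printable, so the answer does not depend on the global locale.
bool is_print_in_c_locale(char c)
{
    const std::ctype<char>& facet =
        std::use_facet<std::ctype<char> >(std::locale::classic());
    return facet.is(std::ctype_base::print, c);
}
```

With this, '\r', '\n' and '\t' come out as not printable, while ' ' and ordinary letters come out as printable.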

HTH, John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net