Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2008-03-12 14:23:55


Phil Hystad wrote:
>> In follow up to the message and response quoted below. Boost regex seems
>> to work fine on Mac OS X and on our Linux platforms. But, on Windows 32 bit
>> we have the following situation. Note this message is a little bit on the
>> long side given that I am including a short program and the output from
>> running on Windows and Linux platforms.
>>
>> The brief program shown below illustrates this problem. The results
>> are from the Linux and Windows 32-bit machine. You can see on
>> Windows when using the Posix API, I get the right offset only if I
>> use boost::REG_PERL or boost::REG_PERLEX. On Linux, it works fine
>> for all flags.

Right, this is by design: it is standards-conformant, but confusing. For
POSIX regular expressions the behaviour of [x-y] is implementation defined
in the latest POSIX standard, while the previous standard *required* it to
be locale sensitive. Boost.Regex is therefore locale sensitive for POSIX
regular expressions by default, which means that [A-Z] will match any
single character that collates in the range 'A' to 'Z' in the current
locale. On Win32 that's the default user locale, so [A-Z] will typically
match "b", for example. On Linux what happens depends on the setting of
LC_CTYPE. For Perl regular expressions the default is not to be locale
sensitive on character ranges, as it confuses too many people!
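
To make "collates in the range" concrete, here is a minimal sketch using the
C++ standard library's std::collate facet. It only illustrates locale-based
ordering and is not how Boost.Regex implements ranges internally; the helper
name collates_in_range is made up for this example.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: true if `c` sorts between `lo` and `hi` (inclusive)
// under the collation rules of `loc`.
bool collates_in_range(const std::locale& loc, char lo, char c, char hi)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);

    // Compare two single characters using the locale's collation order.
    auto cmp = [&coll](char a, char b) {
        return coll.compare(&a, &a + 1, &b, &b + 1);
    };

    return cmp(lo, c) <= 0 && cmp(c, hi) <= 0;
}
```

With the classic "C" locale, collates_in_range(std::locale::classic(), 'A',
'b', 'Z') is false because characters are ordered by code. Under a
dictionary-ordered user locale the same call may return true, which is the
effect described above.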

For POSIX-style regexes you can turn off locale-dependent behaviour by
passing REG_NOCOLLATE to regcomp, in combination with whatever other flags
you may be using.

If you are using boost::regex directly with a POSIX syntax, use the flags

posix & ~collate

to disable locale-specific collation.

HTH, John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net