Boost Users :

From: Tomaž Šolc (tomaz.solc_at_[hidden])
Date: 2007-11-15 18:29:27


Hi

The documentation says that, when wide character strings are used with
boost::wregex, a character class such as [[:alpha:]] depends on the
system's implementation of the iswalpha() function.

My system seems to have a working implementation of iswalpha(), but
[[:alpha:]] still seems to match only ASCII alphabetic characters.

For example, the following code:

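/* UNICODE selects the wide-character variants of the POSIX API in boost/regex.h */
#define UNICODE
#include <boost/regex.h>
#include <stdio.h>
#include <wctype.h>
#include <locale.h>
int main() {
        regex_t r;

        /* set the C locale so that iswalpha() knows about non-ASCII letters */
        setlocale(LC_ALL, "en_US.utf8");

        regcomp(&r, L"^[[:alpha:]]$", REG_EXTENDED);

        /* iswalpha() returns nonzero for a letter; regexec() returns 0 on a match */
        printf("%d\n", iswalpha(L'A'));
        printf("%d\n\n", regexec(&r, L"A", 0, NULL, 0));

        /* U+0160, S WITH CARON */
        printf("%d\n", iswalpha(L'\x160'));
        printf("%d\n\n", regexec(&r, L"\x160", 0, NULL, 0));

        printf("%d\n", iswalpha(L'1'));
        printf("%d\n", regexec(&r, L"1", 0, NULL, 0));

        regfree(&r);
        return 0;
}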

prints:

1
0

1
1

0
1

In the second pair, iswalpha() correctly recognizes the Unicode
character "S WITH CARON" (U+0160), but the regular expression
[[:alpha:]] does not match it (regexec() returns 0 on a match and
nonzero otherwise).

I'm using Debian GNU/Linux with Boost 1.33.1. I also tried a similar
program that uses the boost::wregex class and std::iswalpha() instead of
the POSIX interface, with the same results.

Can anyone give me some advice on what I'm doing wrong here?

Thanks
Tomaž Šolc

