Boost Users
From: Tomaž Šolc (tomaz.solc_at_[hidden])
Date: 2007-11-15 18:29:27
Hi,
The documentation says that when wide character strings are used with
boost::wregex, a character class like [[:alpha:]] depends on the system's
implementation of the iswalpha() function.
My system seems to have a working implementation of iswalpha(),
but [[:alpha:]] still seems to match only ASCII alphabetic characters.
For example, the following code:
#define UNICODE  /* use the wide-character variants of regcomp()/regexec() */
#include <boost/regex.h>
#include <stdio.h>
#include <wctype.h>
#include <locale.h>

int main() {
    regex_t r;

    setlocale(LC_ALL, "en_US.utf8");
    regcomp(&r, L"^[[:alpha:]]$", REG_EXTENDED);

    /* For each character: iswalpha() result, then regexec() result
       (regexec() returns 0 on a match). */
    printf("%d\n", iswalpha(L'A'));
    printf("%d\n\n", regexec(&r, L"A", 0, NULL, 0));

    printf("%d\n", iswalpha(L'\x160'));
    printf("%d\n\n", regexec(&r, L"\x160", 0, NULL, 0));

    printf("%d\n", iswalpha(L'1'));
    printf("%d\n", regexec(&r, L"1", 0, NULL, 0));

    regfree(&r);
    return 0;
}
prints:
1
0

1
1

0
1
In the second pair, iswalpha() correctly recognizes the Unicode character
U+0160 ("S WITH CARON"), but the regular expression with [[:alpha:]]
doesn't match it (regexec() returns a nonzero value).
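To keep the locale question separate from the regex question, the iswalpha() side can be checked on its own. The test program above does not look at the return value of setlocale(), which returns NULL and leaves the locale unchanged if the requested locale is unavailable. A minimal sketch of such a check follows; the helper name alpha_in_locale is made up for illustration and is not part of Boost or the C library.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (illustration only).
 * Returns -1 if the requested locale cannot be selected,
 * 1 if iswalpha() classifies ch as alphabetic in that locale,
 * and 0 otherwise. */
int alpha_in_locale(const char *locale_name, wchar_t ch)
{
    /* setlocale() returns NULL on failure and keeps the previous locale. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    /* iswalpha() only promises a nonzero value for "true", so normalize it. */
    return iswalpha((wint_t)ch) ? 1 : 0;
}
```

Calling alpha_in_locale("en_US.utf8", L'\x160') with the locale name from the test program would show whether the locale was actually selected (a result of -1 means it was not) before any conclusion is drawn about the regex library.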
I'm using Debian GNU/Linux with Boost 1.33.1. I also tried a similar
program that uses the boost::wregex class and std::iswalpha() instead of
the POSIX interface, with the same results.
Can anyone give me some advice on what I'm doing wrong here?
Thanks
Tomaž Šolc