Subject: Re: [Boost-users] Unicode regex example
From: John Maddock (john_at_[hidden])
Date: 2013-08-14 04:01:53


> Here is a Unicode regex example that I want to get matched:
>
> #include <clocale>
> #include <iostream>
> #include <string>
> #include <boost/regex.hpp>
> #include <boost/regex/icu.hpp>
>
> int main()
> {
>     std::setlocale(LC_ALL, "");
>
>     // 1. Wide-character regex with the \p{u} (upper case) property.
>     boost::wregex condition(L"\\p{u}");
>
>     std::wstring test_word(L"Ü");
>
>     if (boost::regex_match(test_word, condition)) {
>         std::wcout << L"Matches!" << std::endl;
>     }
>
>     // 2. Wide-character regex with the POSIX [[:upper:]] class.
>     boost::wregex condition2(L"[[:upper:]]");
>
>     if (boost::regex_match(test_word, condition2)) {
>         std::wcout << L"Matches!" << std::endl;
>     }
>
>     // 3. ICU-backed regex.
>     boost::u32regex condition3 = boost::make_u32regex(L"\\p{u}");
>
>     if (boost::u32regex_match(test_word, condition3)) {
>         std::wcout << L"Matches using lib icu!" << std::endl;
>     }
>
>     return 0;
> }
>
> Compiled with -lboost_regex -licuuc.
>
> Result: Only the last regex condition matches.
>
> So I have a few questions:
>
> 1. Is u32regex + make_u32regex the *only* way to get my regex condition
> matched?

Not necessarily, but it's the only way to get *consistent* Unicode support.

> 2. Why does the [[:upper:]] class in the second regex condition not
> match? For example, when I use:
>
> echo "Ü" | grep '[[:upper:]]'
>
> on the command line, it works properly :)
>
> Thanks in advance + regards,

The "Ü" is treated as upper case only if std::locale classifies it as upper
case. I would have expected that to be so here, but apparently it is not :-(

HTH, John.

