Boost Users :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-05-01 18:08:56


Eoin wrote:
> Hello,
> I've been playing around with Xpressive and have hit a stumbling block
> when working with Unicode. I really am not sure if what I am trying to
> do is possible and that I'm just doing something silly wrong. Here is
> a very small example-
>
> wstring in = L"A ЏиϊсoδΣ Hello //World";
> wsregex comments =
> wsregex::compile(L"(//[^\\n]*|/\\*.*?\\*/)");
>
> wstring clear(L"");
> wstring out = regex_replace(in, comments, clear);
>
> (Note if the wstring 'in' gets scrambled in email I have attached a
> tiny UTF-8 encoded file of what it should contain.) This code compiles
> but after executing the wstring 'out' only contains "A ". If there are
> no Unicode characters in 'in' then the regex replacement works as
> expected.

I can't reproduce this problem, but I bet I can guess what is going on
for you. I bet after the regex_replace, you're trying to write the
string to std::wcout, like this:

   std::wcout << out << std::endl;

That's the first thing I tried, and the output is "A ". That's on
Windows after building with VC8. But that's just because the Windows
console doesn't know what to do with these Unicode characters, so
std::wcout enters an error state, and nothing further gets displayed. If
you look at the result in a debugger, you can see the Unicode string is
as it should be.
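
To make the failure mode concrete, here is a minimal sketch that needs neither Xpressive nor a Windows console. The names ascii_only_buf and write_checked are hypothetical helpers invented for this illustration; the stream buffer only imitates a console that rejects non-ASCII characters, it is not how the real console is implemented. It shows the output stopping after "A ", the stream entering an error state, and clear() making the stream usable again.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a console that can only display ASCII:
// any wide character above 0x7F is rejected, much as a failed
// wide-to-narrow conversion would be.
class ascii_only_buf : public std::wstreambuf {
public:
    // Everything that was successfully "displayed" so far.
    const std::wstring& written() const { return written_; }

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing to write
        if (ch > 0x7F)
            return traits_type::eof();        // report failure to the stream
        written_.push_back(traits_type::to_char_type(ch));
        return ch;
    }

private:
    std::wstring written_;
};

// Hypothetical helper: write s, report whether the stream survived,
// and reset the error flags so later output is not silently dropped.
bool write_checked(std::wostream& os, const std::wstring& s) {
    os << s;
    if (os.good())
        return true;
    os.clear();          // leave the error state
    assert(os.good());   // holds because the stream has a valid buffer
    return false;
}
```

With a std::wostream built on ascii_only_buf, writing L"A \u040F Hello" leaves only "A " in the buffer and write_checked returns false. The same kind of check (std::wcout.fail() after the write, followed by std::wcout.clear()) would tell you whether the console, rather than regex_replace, is what truncated your output.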

If you don't think that's what's going on, please send me a *complete*
program that reproduces the error, and let me know what
compiler/platform you're on.

Thanks,

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
