Boost Users :

From: Eric Niebler (yg-boost-users_at_[hidden])
Date: 2002-07-02 01:59:49


On Sun, 30 Jun 2002 04:20:32 -0700, John Maddock wrote:

>> I am currently trying to use the boost regex library with Japanese
>> language strings. It appears like DBCS is not supported. For example,
>> using the following code (with compile definition of
>> BOOST_REGEX_USE_C_LOCALE) I get the output strings as
>>
>> 0 = "。"
>> 1 = "English"
>>
>> Instead of the expected:
>>
>> 0 = "やゆよわをー。"
>> 1 = "English"
>>
>> This is due to the fact that the Japanese (SJIS encoding) for one of
>> these characters uses the [ character as one of the characters in the
>> encoding.
>>
[snip]
>>
>> Brodie.
>
> To be honest I know nothing at all about DBCS, but I assumed that every
> code point was represented by *exactly two* characters. If that's the
> case then I think it might be possible
>
>
DBCS encodings like SJIS are variable-width. But the real problem is that
given an iterator into a DBCS string, it is impossible to tell where the
previous character starts without walking back to the beginning of the
string. So you can really only make a forward DBCS iterator, not a
bidirectional one. And I think regex++ requires bidirectional iterators,
right John?
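
To make the iterator problem concrete, here is a minimal sketch. It is my own illustration, not regex++ code: the names is_sjis_lead, sjis_next and sjis_prev are hypothetical, and I am assuming the usual Shift-JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC. Stepping forward is cheap because the lead byte tells you the character width. Stepping backward has to rescan from the start, because a trail byte such as 0x5B (the second byte of "ー", which is also ASCII '[') cannot be told apart from a standalone character by looking at it alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed Shift-JIS lead-byte ranges for two-byte characters.
bool is_sjis_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Forward step: pos must be the start of a character. The lead byte
// alone decides whether the character is one or two bytes wide.
std::size_t sjis_next(const std::string& s, std::size_t pos) {
    unsigned char b = static_cast<unsigned char>(s[pos]);
    return pos + ((is_sjis_lead(b) && pos + 1 < s.size()) ? 2 : 1);
}

// Backward step: pos must be a character boundary. There is no local
// test for "is the byte before me a trail byte?", so this walks forward
// from the beginning of the string, which makes each step O(n).
std::size_t sjis_prev(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    std::size_t cur = 0, prev = 0;
    while (cur < pos) {
        prev = cur;
        cur = sjis_next(s, cur);
    }
    return prev;
}
```

For the three bytes 0x81 0x5B 0x5B ("ー" followed by a real '['), sjis_prev from offset 2 lands on offset 0. A naive "back up one byte" would land on offset 1, in the middle of "ー", and see a '[' that is not there. That is the same misreading that truncates the match in the original report.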

> Otherwise can you use Unicode?
>

Yup, use Unicode.

> John Maddock
> http://ourworld.compuserve.com/homepages/john_maddock/index.htm
>

Eric


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net