Boost Users :

From: moscao_at_[hidden]
Date: 2003-12-12 08:43:37

>> Actually I have a text with a lot of strange characters and Japanese
>> ones (Hiragana, Katakana, Kanji, everything!) and I want to find these
>> Japanese sentences in order to translate them and replace them in the text.
>> Hence I need a way to identify a Japanese sentence. A kind
>> of function const bool isJap( const wchar ) const would be fine.

>Do you need to use regexes? I've not tried boost.regex yet so cannot help

>Is your text just ASCII and Japanese? Or do you need to distinguish from
>other languages as well?

>If just ASCII and Japanese, you could define a Japanese char as anything
>that is not ASCII (beware Shift-JIS encoding though, as the 2nd byte of a
>double-byte character can be in the ASCII range). If your data is Unicode it
>should also be easy to treat European characters as non-Japanese as well.
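
For the Unicode case mentioned above, a check along the lines of the requested isJap could be sketched as below. This is only a minimal sketch: the name isJapaneseCodePoint is hypothetical, it assumes the text has already been decoded to Unicode code points, and the kanji range is shared with Chinese, so a match there does not by itself prove the text is Japanese.

```cpp
#include <cassert>

// Hypothetical helper: true if the code point lies in one of the main
// Unicode blocks used for Japanese text.
inline bool isJapaneseCodePoint(char32_t c) {
    return (c >= 0x3040 && c <= 0x309F)   // Hiragana
        || (c >= 0x30A0 && c <= 0x30FF)   // Katakana
        || (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs (kanji, shared with Chinese)
        || (c >= 0xFF66 && c <= 0xFF9F);  // Half-width katakana
}
```

European letters such as U+00E9 fall outside all four ranges, so they are treated as non-Japanese.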


Thanks Darren for your reply,

Well, actually I can avoid using regex, but my text is more than ASCII and
Japanese. It is really a binary file in which some pieces are Japanese
sentences and others are control bytes such as 0x00 (which makes things harder,
because you cannot parse the data as a C string: 0x00 is the string
terminator). So I think I have to parse the bytes in pairs and try to identify
them as Shift-JIS where that is the case. Any idea of a function or program
that does this?
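
To make the idea concrete, here is a rough sketch of the pairwise scan. The names (isSjisLead, isSjisTrail, findSjisRuns) and the minChars threshold are made up for illustration. The sketch only looks for two-byte Shift-JIS characters, ignores single-byte half-width katakana, and can report false positives, since arbitrary binary data may happen to fall in the same byte ranges. The data is held in a std::string with an explicit length, so embedded 0x00 bytes are not a problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Lead byte of a two-byte Shift-JIS character: 0x81-0x9F or 0xE0-0xFC.
inline bool isSjisLead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Trail byte of a two-byte Shift-JIS character: 0x40-0x7E or 0x80-0xFC.
inline bool isSjisTrail(unsigned char b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Hypothetical helper: returns (offset, length in bytes) for every run of at
// least minChars consecutive two-byte characters found in the buffer.
std::vector<std::pair<std::size_t, std::size_t>>
findSjisRuns(const std::string& data, std::size_t minChars = 2) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < data.size()) {
        const std::size_t start = i;
        std::size_t chars = 0;
        // Consume lead/trail pairs for as long as they keep matching.
        while (i + 1 < data.size()
               && isSjisLead(static_cast<unsigned char>(data[i]))
               && isSjisTrail(static_cast<unsigned char>(data[i + 1]))) {
            i += 2;
            ++chars;
        }
        if (chars >= minChars) {
            runs.emplace_back(start, i - start);
        }
        if (i == start) {
            ++i;  // nothing matched at this offset, move on by one byte
        }
    }
    return runs;
}
```

Each reported run can then be cut out of the buffer, translated, and written back. Raising minChars reduces the chance that a stray pair of control bytes is mistaken for text.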


