Boost Users
From: moscao_at_[hidden]
Date: 2003-12-12 08:43:37
>> Actually, I have a text with a lot of strange characters, including Japanese
>> ones (Hiragana, Katakana, Kanji, everything!), and I want to find these
>> Japanese sentences in order to translate them and replace them in the text.
>> I therefore need a way to identify a Japanese sentence. A function
>> such as const bool isJap( const wchar ) const would be fine.
>Do you need to use regexes? I've not tried Boost.Regex yet, so I cannot help
>there.
>Is your text just ASCII and Japanese? Or do you need to distinguish it from
>other languages as well?
>If it is just ASCII and Japanese, you could define a Japanese char as anything
>that is not ASCII (beware of Shift-JIS encoding though, as the 2nd byte of a
>double-byte character can fall in the ASCII range). If your data is Unicode,
>it should also be easy to treat European characters as non-Japanese.
>Darren
Thanks, Darren, for your reply.
Well, actually I can avoid using regex, but my text contains more than ASCII
and Japanese. It is a binary file in which some pieces are Japanese sentences
and others are control bytes such as 0x00 (which makes things harder, because
you cannot handle the data as a C string: 0x00 is the string terminator...).
So I think I have to parse the bytes in pairs and identify them as Shift-JIS
where that applies. Any idea of a function or program that does this?
Thanks,
jschmid
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net