Boost Users :

Date view	Thread view	Subject view	Author view

From: John Maddock (john_at_[hidden])
Date: 2003-12-13 07:03:54

Next message: John Maddock: "Re: [Boost-users] Newbie question: block read inputs to regex"
Previous message: jschmid: "Re: [Boost-users] Re: Re: find japanese character with boost regex++"
In reply to: jschmid: "[Boost-users] find japanese character with boost regex++"
Next in thread: Darren Cook: "Re: [Boost-users] find japanese character with boost regex++"
Reply: Darren Cook: "Re: [Boost-users] find japanese character with boost regex++"

I have just discovered the incredible boost and regex++ libraries but I have
encountered some difficulties...

I read that Japanese encodings are handled in regex++, in particular through
wide characters (wchar_t). The regex++ FAQ presents character classes, e.g.
[[:space:]], which define a set of characters sharing a property. I have been
looking for something like a [[:Japanese characters:]] class. I have a text
with a lot of strange characters mixed with Japanese ones (Hiragana, Katakana,
Kanji, everything!) and I want to find the Japanese sentences in order to
translate them and replace them in the text. So I need a way to identify a
Japanese sentence. A function such as bool isJap(wchar_t c) would be fine.

So if somebody has any ideas or some links, I would appreciate it! Thanks!

~~~~~~~~~~~~~~~~~~~~~~~

Two options:

1) You can hack the traits class used by Boost.Regex:

Create your own traits class that inherits from boost::regex_traits and
which implements the following member functions:

uint32_t lookup_classname(const char_type* first, const char_type* last) const;
bool is_class(char_type c, uint32_t f) const;

The first transforms your character-class name into a constant; the second
checks whether a character is a member of that class. Choose a value for
your constant that isn't already in use by regex_traits.

Finally use reg_expression<wchar_t, your_traits_class<wchar_t> > rather than
boost::wregex.

2) Just use a character range: most Japanese characters are confined to a
few specific code-point ranges (I forget exactly what they are, but the
information is publicly available in the Unicode standard).
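
As a rough illustration of both options, here is a minimal standalone sketch.
It assumes the three main Unicode blocks (Hiragana U+3040-U+309F, Katakana
U+30A0-U+30FF, CJK Unified Ideographs U+4E00-U+9FFF) are enough for the text
at hand. Note that the ideograph block is shared with Chinese, so this cannot
tell the two languages apart. The names is_japanese, japanese_runs,
japanese_class_sketch and kJapaneseMask are hypothetical, as are the class
name "japanese" and the mask value. The struct only mirrors the two member
signatures from option 1; it does not inherit from boost::regex_traits, so
plugging it into reg_expression would still need the real base class.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Option 2 as a plain predicate: a range test over three Unicode blocks.
// Punctuation, half-width Katakana and the rarer Kanji extension blocks
// are deliberately left out of this sketch.
inline bool is_japanese(wchar_t c)
{
    const std::uint32_t u = static_cast<std::uint32_t>(c);
    return (u >= 0x3040 && u <= 0x309F)   // Hiragana
        || (u >= 0x30A0 && u <= 0x30FF)   // Katakana
        || (u >= 0x4E00 && u <= 0x9FFF);  // CJK Unified Ideographs (Kanji)
}

// Collects each maximal run of consecutive Japanese characters in text.
inline std::vector<std::wstring> japanese_runs(const std::wstring& text)
{
    std::vector<std::wstring> runs;
    std::wstring current;
    for (wchar_t c : text) {
        if (is_japanese(c)) {
            current += c;
        } else if (!current.empty()) {
            runs.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) runs.push_back(current);
    return runs;
}

// Assumed value: a real traits class must pick a constant that the
// library's regex_traits does not already use.
const std::uint32_t kJapaneseMask = 0x80000000u;

// Option 1, standalone: the same two members a custom traits class would
// implement, here delegating to the range test above.
struct japanese_class_sketch
{
    typedef wchar_t char_type;

    // Maps a class name (the text between [[: and :]]) to a constant;
    // returns 0 for names this sketch does not know.
    std::uint32_t lookup_classname(const char_type* first,
                                   const char_type* last) const
    {
        return std::wstring(first, last) == L"japanese" ? kJapaneseMask : 0u;
    }

    // Checks whether c belongs to the class identified by f.
    bool is_class(char_type c, std::uint32_t f) const
    {
        return (f & kJapaneseMask) != 0 && is_japanese(c);
    }
};
```

Calling japanese_runs on the input text gives the candidate pieces to
translate. In a regular expression, the same three ranges could instead go
into a single bracket expression in a wide-character pattern.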

John.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net