
Boost Users :

From: John Maddock (john_at_[hidden])
Date: 2003-12-14 06:41:12


> Are the existing character-classes following a standard, or are you open to
> patches to extend them?

Yes, they follow the POSIX and ECMAScript standards to give:

"alnum",
"alpha",
"cntrl",
"digit",
"graph",
"lower",
"print",
"punct",
"space",
"upper",
"xdigit",
"blank",
"word",
"unicode"
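
For illustration, here is a minimal sketch of how such a named class is used inside a bracket expression. It uses std::regex from the C++ standard library rather than Boost.Regex (an assumption made to keep the example self-contained), so only the POSIX names apply here; "word" and "unicode" are not assumed to be available. The helper name all_in_class is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when `text` is non-empty and every
// character in it belongs to the named POSIX class, e.g. "alnum".
// An unknown class name makes the std::regex constructor throw
// std::regex_error.
bool all_in_class(const std::string& text, const std::string& class_name) {
    // Builds a pattern such as "[[:alnum:]]+".
    const std::regex re("[[:" + class_name + ":]]+");
    return std::regex_match(text, re);
}
```

With the default "C" locale, all_in_class("abc123", "alnum") should hold, while a string containing a space should fail the same check.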

> It might be nice to have at least:
> [:hiragana:]
> [:katakana:]
> [:hankaku_katakana:]

Isn't that just [[:hiragana:][:katakana:]]?

> [:wide_alpha:]
> [:wide_num:]
> [:wide_alphanum:]

There should be no need for those: [[:alpha:]] will match wide-character
alphabetic characters perfectly well (provided the locale isn't "C").

> Defining the set of Japanese kanji would be harder.

How are they defined?

It might be best to provide a facility for registering new character classes
as a list of characters and ranges to include, something like:

register_character_class("myname", "d-f");

Then we could add all the Unicode block ranges as standard classes for wide
character regexes.
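
A rough sketch of what such a facility might look like follows. Nothing here is an existing Boost.Regex interface: character_class_registry, register_character_class and is_class are all hypothetical names, and the spec syntax (single characters plus "a-z" style ranges, taken as wide strings) is an assumption based on the "d-f" example above.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry mapping a class name to inclusive character ranges.
class character_class_registry {
public:
    using char_range = std::pair<wchar_t, wchar_t>;

    // Parses a spec such as L"d-fxz": "d-f" becomes the range d..f,
    // any other character stands for itself.
    void register_character_class(const std::string& name,
                                  const std::wstring& spec) {
        std::vector<char_range> ranges;
        for (std::size_t i = 0; i < spec.size(); ++i) {
            const wchar_t first = spec[i];
            wchar_t last = first;
            // A '-' followed by another character marks a range.
            if (i + 2 < spec.size() && spec[i + 1] == L'-') {
                last = spec[i + 2];
                i += 2;
            }
            if (last < first) {
                throw std::invalid_argument("range end precedes range start");
            }
            ranges.emplace_back(first, last);
        }
        classes_[name] = std::move(ranges);
    }

    // True when `c` falls inside any range registered under `name`;
    // an unregistered name matches nothing.
    bool is_class(const std::string& name, wchar_t c) const {
        const auto it = classes_.find(name);
        if (it == classes_.end()) return false;
        for (const auto& r : it->second) {
            if (r.first <= c && c <= r.second) return true;
        }
        return false;
    }

private:
    std::map<std::string, std::vector<char_range>> classes_;
};
```

A Unicode block could then be registered as a single range, for example the Hiragana block (U+3040 to U+309F) as register_character_class("hiragana", L"\u3040-\u309F").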

John.

