Boost Users :
From: John Maddock (john_at_[hidden])
Date: 2003-12-14 06:41:12
> Are the existing character-classes following a standard, or are you open to
> patches to extend them?
Yes, they follow the POSIX and ECMAScript standards to give:
"alnum",
"alpha",
"cntrl",
"digit",
"graph",
"lower",
"print",
"punct",
"space",
"upper",
"xdigit",
"blank",
"word",
"unicode"
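As a rough illustration of how these names are used inside a bracket expression, here is a minimal sketch. It assumes std::regex from the C++ standard library rather than Boost.Regex itself, so that it builds without Boost; the standard library only guarantees the POSIX names, so "word" and "unicode" are left out. The helper name matches_class is made up for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Boost.Regex function): returns true if every
// character of `text` belongs to the named class and `text` is non-empty.
bool matches_class(const std::string& class_name, const std::string& text) {
    // Builds a pattern such as "[[:digit:]]+".
    const std::regex re("[[:" + class_name + ":]]+");
    return std::regex_match(text, re);
}
```

For example, matches_class("digit", "2003") holds while matches_class("digit", "12a") does not.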
> It might be nice to have at least:
> [:hiragana:]
> [:katakana:]
> [:hankaku_katakana:]
Isn't that just [[:hiragana:][:katakana:]]?
> [:wide_alpha:]
> [:wide_num:]
> [:wide_alphanum:]
There should be no need for those: [[:alpha:]] will match wide alphabetic
characters perfectly well (provided the locale isn't "C").
> Defining the set of Japanese kanji would be harder.
How are they defined?
It might be best to provide a facility for registering new character classes as
a list of characters and ranges to include, something like:
register_character_class("myname", "d-f");
Then we could add all the Unicode block ranges as standard classes for wide
character regexes.
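To make the idea a little more concrete, here is a minimal sketch of what such a facility might look like. Everything in it is an assumption for illustration: the class_registry type, the contains() query, and the use of char32_t are not part of Boost.Regex, and the sketch only stores and queries the ranges rather than hooking them into the regex engine.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of user-defined character classes.
class class_registry {
public:
    // `spec` lists single characters and inclusive ranges written "lo-hi",
    // e.g. U"d-f_" means 'd', 'e', 'f' and '_'.
    void register_character_class(const std::string& name,
                                  const std::u32string& spec) {
        std::vector<std::pair<char32_t, char32_t>> ranges;
        for (std::size_t i = 0; i < spec.size(); ++i) {
            char32_t lo = spec[i];
            char32_t hi = lo;
            // A range needs a '-' followed by one more character.
            if (i + 2 < spec.size() && spec[i + 1] == U'-') {
                hi = spec[i + 2];
                i += 2;
            }
            if (hi < lo) std::swap(lo, hi);
            ranges.emplace_back(lo, hi);
        }
        classes_[name] = std::move(ranges);
    }

    // True if `ch` falls in any range registered under `name`.
    bool contains(const std::string& name, char32_t ch) const {
        const auto it = classes_.find(name);
        if (it == classes_.end()) return false;
        for (const auto& r : it->second)
            if (r.first <= ch && ch <= r.second) return true;
        return false;
    }

private:
    std::map<std::string, std::vector<std::pair<char32_t, char32_t>>> classes_;
};
```

With this, register_character_class("myname", U"d-f") accepts 'e' but not 'g', and a Unicode block such as Hiragana (U+3040 to U+309F) could be registered as U"\u3040-\u309F".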
John.