Subject: Re: [boost] [rfc] Unicode GSoC project
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-05-15 10:04:30
> A good reloadable character library is in the vault.
I'll be reviewing it in a while.
I'm not too sure about the memory layout it uses (__uni_char_data could
really be compressed to use less memory for example), nor about the
interface it exposes, but it does seem to work well.
About is_grapheme_break, though: isn't the implementation for legacy
grapheme clusters rather than extended ones?
> I think that a grapheme is more of an iterator concept than a data type
> concept. By specialising it you will unnecessarily complicate any
> library. Don't forget that, for example, the current grapheme may start
> as one character, then suddenly 'grab' the surrounding characters as it
> makes a combined glyph.
> I have never found a use case in practice where specialising the
> grapheme as other than a validated series of code points was helpful.
A grapheme is nothing more than a subrange of code points, at least in
my current design.
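
To illustrate what "a subrange of code points" could look like, here is a
minimal sketch. It is not the design from the project: grapheme_range,
split_graphemes and is_extend are hypothetical names, and is_extend is a crude
stand-in for a real Unicode property lookup that only knows two combining
blocks. The point is only that a grapheme needs no storage of its own, just a
pair of iterators into the code point sequence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a real Grapheme_Extend lookup.
// Assumption: only two combining blocks are recognised here.
inline bool is_extend(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // Combining Diacritical Marks
        || (cp >= 0x20D0 && cp <= 0x20FF);  // Combining Marks for Symbols
}

// A grapheme is just a [first, last) pair into the underlying sequence.
struct grapheme_range {
    std::u32string::const_iterator first;
    std::u32string::const_iterator last;
    std::size_t size() const {
        return static_cast<std::size_t>(last - first);
    }
};

// Split a code point sequence into grapheme subranges using the
// simplified rule "a base code point plus any following marks".
inline std::vector<grapheme_range> split_graphemes(const std::u32string& s) {
    std::vector<grapheme_range> out;
    auto it = s.begin();
    while (it != s.end()) {
        auto start = it++;                               // base code point
        while (it != s.end() && is_extend(*it)) ++it;    // trailing marks
        out.push_back(grapheme_range{start, it});
    }
    return out;
}
```

With this rule, the sequence 'e', U+0301, 'a' yields two subranges, the first
covering two code points and the second covering one.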
> The two cases where graphemes are important are in display [which
> requires intermediate glyph conversion anyway, and works just as well on
> runs of code points, so code points are fine] and in editing - and the
> grapheme-ness here alters during typing.
It's also useful for grapheme-level searching.
Searching for the substring "foo" in the string "foo\u20d7" shouldn't
match anything, because the extremities of the match are not at grapheme
boundaries (U+20D7 is a combining mark that attaches to the final 'o').
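
A rough sketch of that kind of search, under loud assumptions: the names
grapheme_find, is_boundary and is_combining_mark are made up for this example,
and the boundary test is a drastic simplification of the real grapheme cluster
rules (it only treats two combining blocks as non-starters). It does a plain
code point search and then rejects any hit whose ends do not both fall on a
boundary.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: crude approximation of "this code point extends
// the previous grapheme". Real code would consult the Unicode database.
inline bool is_combining_mark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // Combining Diacritical Marks
        || (cp >= 0x20D0 && cp <= 0x20FF);  // Combining Marks for Symbols
}

// Simplified boundary test: the start and end of the text are always
// boundaries; elsewhere, position i is a boundary unless the code point
// at i is a combining mark.
inline bool is_boundary(const std::u32string& s, std::size_t i) {
    if (i == 0 || i >= s.size()) return true;
    return !is_combining_mark(s[i]);
}

// Return the index of the first match of needle in text whose two ends
// both lie on (simplified) grapheme boundaries, or npos if none exists.
inline std::size_t grapheme_find(const std::u32string& text,
                                 const std::u32string& needle) {
    std::size_t pos = text.find(needle);
    while (pos != std::u32string::npos) {
        if (is_boundary(text, pos) &&
            is_boundary(text, pos + needle.size()))
            return pos;
        pos = text.find(needle, pos + 1);  // try the next candidate
    }
    return std::u32string::npos;
}
```

Searching for U"foo" in U"foo\u20d7" returns npos here, while the same search
in U"foo bar" returns 0.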
> if you
> can do graphemes then you can do words, paragraphs etc as they are all
> just attributes of the characters with simple rules. Graphemes come in
> to their own for text display and editing and you would need these as
> well to be able to support that.
Those are not as important in my opinion, and given that my time is
limited, the focus won't be on them.
Adding them later makes perfect sense, however.