Subject: Re: [boost] GSoC Unicode library: second preview
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-06-21 21:23:24

Next message: Mathias Gaunard: "Re: [boost] GSoC Unicode library: second preview"
Previous message: Mathias Gaunard: "Re: [boost] GSoC Unicode library: second preview"
In reply to: Scott McMurray: "Re: [boost] GSoC Unicode library: second preview"
Next in thread: Mathias Gaunard: "Re: [boost] GSoC Unicode library: second preview"

Scott McMurray wrote:

> Suppose I have "difficult" with the "ffi" ligature codepoint, and I do
> a perl-style split on /i/.

There is no way for "i" to match as being part of that string unless you
replace the "ffi" ligature by the letters "f", "f", "i".
That operation is known as a compatibility decomposition (and will be
provided by the library in due time, of course, along with compatibility
composition, canonical decomposition, canonical composition and the
normalization forms that are defined in terms of them).
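
As a rough illustration of what compatibility decomposition does to this
example, here is a minimal sketch using Python's standard `unicodedata`
module. It is a stand-in for illustration only and says nothing about the
interface the Boost library will offer; the variable names are made up.

```python
import unicodedata

# "difficult" spelled with U+FB03 LATIN SMALL LIGATURE FFI
word = "di\uFB03cult"

# Without normalization the ligature is a single code point, so splitting
# on "i" only sees the standalone "i" that follows the "d".
raw_parts = word.split("i")

# NFKD applies compatibility decomposition: U+FB03 becomes "f", "f", "i".
decomposed = unicodedata.normalize("NFKD", word)
decomposed_parts = decomposed.split("i")

print(raw_parts)         # the ligature stays intact in the second piece
print(decomposed_parts)  # ['d', 'ff', 'cult']
```

Note that the middle piece after decomposition is the two letters "f", "f",
not the "ff" ligature code point.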

You could choose to apply split with arguments normalized according to
normalization form KC, which allows comparison independently of
formatting considerations.
But that also means 5 will match ⁵. You could instead decide that 5 should
match ⁵ but ⁵ should not match 5; in that case the pattern should be in NFC
and the string being searched in NFKC.
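
The asymmetric choice described above (pattern in NFC, searched string in
NFKC) can be sketched as follows, again with Python's `unicodedata` as a
stand-in. The helper name `asymmetric_find` is hypothetical and not part of
any library.

```python
import unicodedata

def asymmetric_find(pattern: str, text: str) -> bool:
    """Hypothetical helper: normalize the pattern to NFC and the text to NFKC."""
    needle = unicodedata.normalize("NFC", pattern)    # keeps U+2075 as is
    haystack = unicodedata.normalize("NFKC", text)    # folds U+2075 to "5"
    return needle in haystack

# A plain "5" in the pattern matches a superscript five in the text...
plain_matches_super = asymmetric_find("5", "x\u2075")
# ...but a superscript five in the pattern does not match a plain "5".
super_matches_plain = asymmetric_find("\u2075", "x5")

print(plain_matches_super, super_matches_plain)  # True False
```

One consequence of this choice is that a superscript five in the pattern
does not match a superscript five in the text either, since the text has
already been folded to a plain "5".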

> I should probably be getting "d", the "ff"
> ligature codepoint, and "cult". I know if I tried to code that by
> hand in every application I'd miss all kinds of evil corner cases like
> that.

Unfortunately Unicode is full of corner cases, and there is no way around
them without understanding it.

