Boost Users :

From: Rainer Deyke (rainerd_at_[hidden])
Date: 2019-10-30 12:59:41


On 26.10.19 18:41, Zach Laine via Boost-users wrote:
> NFC, very close to FCC, is more popular, due to its compactness. I picked
> the normalization form with the most readily available time and space
> optimizations, and then stuck to just that one -- the alternative is many
> text types with different normalizations having to interoperate, which
> sounds like hell.

I can understand that, all other things being equal, the more compact
form might be preferable. I mean, if you know nothing about Unicode
normalization forms other than that one is more compact than the other,
then you might as well pick the more compact one, right?

But all other things are clearly /not/ equal, or you would just use NFC.
And the difference in compactness between NFC and NFD is completely
trivial. I challenge you to find any real-world text where the
difference in size between NFC and NFD is big enough that I should care
about it, both in absolute and relative terms.
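
For anyone who wants to take up that challenge, here is a minimal sketch for measuring the difference on a given text. It assumes Python's standard unicodedata module and UTF-8 as the storage encoding; the helper name utf8_sizes is hypothetical and only for illustration.

```python
import unicodedata


def utf8_sizes(text: str) -> tuple[int, int]:
    """Return (NFC size, NFD size) of `text`, both in UTF-8 bytes."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return len(nfc.encode("utf-8")), len(nfd.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample; substitute any real-world text to be measured.
    sample = "r\u00e9sum\u00e9"
    nfc_size, nfd_size = utf8_sizes(sample)
    print(f"NFC: {nfc_size} bytes, NFD: {nfd_size} bytes")
```

Running it over a whole document rather than a single word gives the absolute and relative figures for that text.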

I consider FCC a non-solution to a non-problem. The advantage of NFC
over NFD is not compactness, but compatibility with interfaces that
expect NFC. Since FCC does not provide that advantage, there is no
reason to choose FCC over NFD. On the other hand, there are several
good reasons for choosing NFD over FCC. Aside from the obvious one -
compatibility with interfaces that expect NFD - there's also cleaner,
simpler code with fewer surprises. For example, it is a completely
straightforward operation to replace all acute accents in an NFD text
with grave accents or to remove acute accents entirely, whereas the FCC
equivalent requires effectively transcoding to NFD.
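
To make the accent example concrete, here is a rough sketch using Python's standard unicodedata module. Python has no FCC support, so NFC stands in for the composed form here; that substitution is an assumption, and the function names are hypothetical.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def replace_acute_nfd(text_nfd: str) -> str:
    """Swap acute for grave in NFD text: a plain code point replacement.

    Both marks have the same combining class (230), so the result
    is still in NFD.
    """
    return text_nfd.replace(ACUTE, GRAVE)


def remove_acute_nfd(text_nfd: str) -> str:
    """Drop acute accents from NFD text; the result is still in NFD."""
    return text_nfd.replace(ACUTE, "")


def replace_acute_composed(text_nfc: str) -> str:
    """Same swap on composed text (NFC as a stand-in for FCC).

    The accent is hidden inside precomposed code points such as U+00E9,
    so the text is decomposed, edited, and recomposed.
    """
    decomposed = unicodedata.normalize("NFD", text_nfc)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, GRAVE))


if __name__ == "__main__":
    composed = "caf\u00e9"  # precomposed e-acute
    decomposed = unicodedata.normalize("NFD", composed)
    print(replace_acute_nfd(decomposed) == "cafe\u0300")
    print(composed.replace(ACUTE, GRAVE) == composed)  # naive replace finds nothing
    print(replace_acute_composed(composed) == "caf\u00e8")
```

On the composed string, the naive replace is a silent no-op, which is the kind of surprise I mean.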

In summary, I think you should support NFD text types. Either in
addition to FCC or instead of it.

-- 
Rainer Deyke (rainerd_at_[hidden])
