Subject: Re: [boost] [string] Realistic API proposal
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-31 04:22:49
> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> Subject: Re: [boost] [string] Realistic API proposal
> On 30/01/2011 08:46, Artyom wrote:
> >> If my strings are valid and normalized, I can compare them with a simple
> >> binary-level comparison;
> >> likewise for substring search, where I may also need to add a boundary
> >> if I want fine-grain search.
> > No, you can't.
> > For example, when you search for the word שלום you want to find שָלוֹם as well (with
> > diacritics)
> > that are not normalized.
> Unless I understand that wrong, they're as equal as e is equal to é or a
> is equal to à.
Yes, with the small exception that "שָ" in NFC form consists of two code points,
the "base letter" and the "vowel mark", and should be equal to "ש", the "base
letter", unlike "à", which has one code point in NFC form, like "a".
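To make the code point counts concrete, here is a minimal sketch using Python's standard unicodedata module (not any proposed Boost API). The helper name nfc_code_points is made up for illustration.

```python
import unicodedata

def nfc_code_points(text):
    """Return the code points of text after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Hebrew shin followed by the vowel mark qamats: there is no precomposed
# character for this pair, so NFC keeps two code points.
print(nfc_code_points("\u05E9\u05B8"))  # ['U+05E9', 'U+05B8']

# Latin "a" followed by a combining grave accent: NFC composes the pair
# into the single code point U+00E0.
print(nfc_code_points("a\u0300"))       # ['U+00E0']
```

So after NFC the accented Latin letter is one code point, just like plain "a", while the Hebrew letter with its vowel mark is still two.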
> > Search and collation require much more complicated levels of comparison.
> Right, I'm talking about exact comparison, not collation.
> Exact comparison is what you use in most text processing and parsing.
> You can perform collation folding with the right level if you want those
> two strings to compare equal.
> > The problem is that I may want 00e0 (à), 0061 0300 (a + `), and 0061 (a) to
> > compare equal for string
> > search as well.
> You may, but that should not be the default behaviour of operator== and
The default behavior is binary comparison, but that is not what I'm
looking for. I'm looking for a search/comparison algorithm that can
see "à" and "a", and "שָ" and "ש", as equal.
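A rough sketch of that kind of comparison, again in Python with only the standard unicodedata module: decompose to NFD, drop the nonspacing marks (general category Mn), and compare what is left. This only approximates a primary-strength comparison. It is not the Unicode Collation Algorithm, and the names strip_marks, loose_equal and loose_find are made up for illustration.

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD and drop nonspacing marks (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def loose_equal(a, b):
    """Compare two strings while ignoring accents and vowel marks."""
    return strip_marks(a) == strip_marks(b)

def loose_find(haystack, needle):
    """Mark-insensitive substring search.

    The returned index refers to the stripped haystack, not the original.
    """
    return strip_marks(haystack).find(strip_marks(needle))

print("\u00E0" == "a")                        # False: plain binary comparison
print(loose_equal("\u00E0", "a"))             # True
print(loose_equal("a\u0300", "\u00E0"))       # True
print(loose_equal("\u05E9\u05B8", "\u05E9"))  # True
```

A real implementation would use collation keys at the right strength (for example through ICU), since dropping marks is too crude for languages where an accented letter is a distinct letter.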