Subject: Re: [boost] [string] Realistic API proposal
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-31 04:22:49


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> Subject: Re: [boost] [string] Realistic API proposal
>
> On 30/01/2011 08:46, Artyom wrote:
> >>
> >> If my strings are valid and normalized, I can compare them with a simple
> >> binary-level comparison;
> >> likewise for substring search, where I may also need to add a boundary check
> >> if I want fine-grain search.
> >>
> >
> > No you can't
> >
> > For example when you search word שלום you want to find שָלוֹם as well (with
> > diacritics)
> > that are not normalized.
>
> Unless I understand that wrong, they're as equal as e is equal to é or a
> is equal to à.
>

Yes, with one small exception: "שָ" is an NFC form that consists of two
code points, a "base letter" and a "vowel mark", and it should compare equal
to the base letter "ש" alone. By contrast, "à" is a single code point in NFC
form, just as "a" is.

> >
> > Search and Collation require much more complicated levels comparison.
>
> Right, I'm talking about exact comparison, not collation.
> Exact comparison is what you use in most text processing and parsing.
>
> You can perform collation folding with the right level if you want those
> two strings to compare equal.
>
> >
> >
> > The problem that I may want 00e0 (à) and 0061 0300 (a + `) and 0061 (a) to be
> > equal for string
> > search as well.
>
> You may, but that should not be the default behaviour of operator== and
> operator<.
>

The default behavior is binary comparison, but that is not what I'm looking
for.

I'm looking for a search/comparison algorithm that treats "à" and "a", and
"שָ" and "ש", as equal.

Artyom
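
To make the distinction concrete, here is a minimal sketch of the kind of mark-insensitive comparison and search described above, as opposed to binary comparison. It is only an illustration under stated assumptions: the function names (`fold_marks`, `equal_ignoring_marks`, `contains_ignoring_marks`) are hypothetical, the decomposition table covers just two precomposed letters, and the combining-mark test covers only the Combining Diacritical Marks block and the Hebrew points. Real code would rely on full Unicode data, for example through a library such as ICU, rather than hand-written tables.

```cpp
#include <string>

// Illustrative only: maps a couple of precomposed letters to their base
// letter. A real implementation would use full Unicode decomposition data.
inline char32_t base_of_precomposed(char32_t c) {
    switch (c) {
        case 0x00E0: return U'a';  // U+00E0, a with grave
        case 0x00E9: return U'e';  // U+00E9, e with acute
        default:     return c;
    }
}

// True for the combining marks this sketch knows about (not the full set).
inline bool is_combining_mark(char32_t c) {
    return (c >= 0x0300 && c <= 0x036F)   // Combining Diacritical Marks
        || (c >= 0x0591 && c <= 0x05BD)   // Hebrew accents and points
        || c == 0x05BF
        || c == 0x05C1 || c == 0x05C2
        || c == 0x05C4 || c == 0x05C5
        || c == 0x05C7;
}

// Drops combining marks and replaces known precomposed letters by their base.
inline std::u32string fold_marks(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (is_combining_mark(c)) continue;
        out.push_back(base_of_precomposed(c));
    }
    return out;
}

// Comparison that ignores the marks handled above.
inline bool equal_ignoring_marks(const std::u32string& a,
                                 const std::u32string& b) {
    return fold_marks(a) == fold_marks(b);
}

// Substring search that ignores the marks handled above.
inline bool contains_ignoring_marks(const std::u32string& haystack,
                                    const std::u32string& needle) {
    return fold_marks(haystack).find(fold_marks(needle))
        != std::u32string::npos;
}
```

With this sketch, `equal_ignoring_marks(U"\u00E0", U"a\u0300")` holds even though the two strings differ under `operator==`, and searching for U+05E9 U+05DC U+05D5 U+05DD (the unpointed word) inside its pointed form succeeds.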

