Subject: Re: [boost] [string_algo] ilexicographical_compare()/is_iless() bug?
From: Lars Viklund (zao_at_[hidden])
Date: 2012-04-08 13:55:18
On Sun, Apr 08, 2012 at 06:50:50PM +0200, Olaf van der Spek wrote:
> On Sun, Apr 8, 2012 at 3:46 PM, Dmitry Vinogradov <sraider_at_[hidden]> wrote:
> > I want to share a problem using ilexicographical_compare().
> >
> > I think that ilexicographical_compare() must compare two strings in
> > "alphabetical" order, the order in which letters appear in the alphabet.
> > But ilexicographical_compare() uses is_iless() to compare letters.
> > And such comparison looks like:
> > std::toupper<T1>(Arg1,m_Loc)<std::toupper<T2>(Arg2,m_Loc);
>
> I think this isn't right either. If you've got both "A" and "a", the
> order appears to be undefined.
>
> > So, in fact, letters are compared by their position in a charset.
> > That does not always match alphabetical order. Examples are the Cyrillic
> > letters "Io" and "i" (Unicode 0451 and 0456).
> >
> > I think the right solution is to compare like this:
> > T1 Ch1 = std::toupper<T1>(Arg1,m_Loc);
> > T2 Ch2 = std::toupper<T2>(Arg2,m_Loc);
> > return std::use_facet< std::collate<T1> >
> > (m_Loc).compare(&Ch1, &Ch1 + 1, &Ch2, &Ch2 + 1) < 0;
So how does all this interact with amusing locales like, say, Turkish,
where the dottedness of i is preserved when changing case, or with glyphs
that do not have upper/lower-case forms?
-- Lars Viklund | zao_at_[hidden]
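
For reference, the collate-based comparison proposed in the quoted message could be sketched roughly as below. This is only an illustration, not the Boost implementation: collate_iless and collate_iless_self_check are hypothetical names, both characters are assumed to share one character type, and the self-check uses only the classic "C" locale so that it does not depend on which locales are installed. It does not answer the Turkish question above, and collating one character at a time is not necessarily the same as collating whole strings.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper, not part of Boost: case-insensitive "less than" for
// two characters of the same type, ordered by the locale's collation rules
// rather than by raw character value.
template <typename CharT>
bool collate_iless(CharT a, CharT b, const std::locale& loc = std::locale())
{
    // Fold case first, as is_iless() does.
    const CharT ua = std::toupper(a, loc);
    const CharT ub = std::toupper(b, loc);

    // collate::compare() returns a negative, zero, or positive value,
    // so test for "< 0" instead of converting the result to bool.
    const std::collate<CharT>& coll =
        std::use_facet<std::collate<CharT> >(loc);
    return coll.compare(&ua, &ua + 1, &ub, &ub + 1) < 0;
}

// Self-check in the classic "C" locale, where collation of single
// characters follows their character values.
inline void collate_iless_self_check()
{
    const std::locale& c = std::locale::classic();
    assert(collate_iless('a', 'B', c));   // 'A' sorts before 'B'
    assert(!collate_iless('B', 'a', c));
    assert(!collate_iless('a', 'A', c));  // equal after case folding:
    assert(!collate_iless('A', 'a', c));  // neither is less than the other
    assert(std::toupper('1', c) == '1');  // no case form: left unchanged
}
```

With a locale whose collation differs from code point order, passing that locale as the third argument is what would change the result; whether it then gives the order a user expects depends on the platform's locale data.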