
Hello!

When comparing the following UTF-8 string pairs with Boost.Locale (any backend) at the "identical" level (so accents are significant) in a UTF-8 locale (I tried de_DE.utf-8) on Debian Testing (Boost 1.49), I get a result that does not make sense to me.

"Muller" is considered less than "Müller" (as expected), but "Muller 2" is considered greater than "Müller 1", which contradicts the result for the names alone.

Do I have a bug in my code, in the underlying libraries, or in my expectations?

    #include <locale.h>
    #include <iostream>
    #include <string>
    #include <vector>
    #include <boost/locale.hpp>
    #include <boost/assign/std/vector.hpp>
    #include <boost/assign/list_of.hpp>
    #include <boost/foreach.hpp>
    #include <boost/tuple/tuple.hpp>
    #include <boost/algorithm/string/join.hpp>

    int main(int argc, char **argv)
    {
        setlocale(LC_ALL, "");

        // global() returns a copy of the manager, so select the backend on
        // the copy and install it again; otherwise the selection is lost.
        boost::locale::localization_backend_manager mgr =
            boost::locale::localization_backend_manager::global();
        std::cout << "backends: "
                  << boost::join(mgr.get_all_backends(), ", ")
                  << std::endl;
        mgr.select(argc > 2 ? argv[2] : "icu");
        boost::locale::localization_backend_manager::global(mgr);

        std::locale loc =
            boost::locale::generator()(argc > 1 ? argv[1] : "de_DE.UTF-8");

        typedef boost::tuple<std::string, std::string> string_pair_t;
        std::vector<string_pair_t> pairs =
            boost::assign::tuple_list_of("Muller", "Müller")
                                        ("Muller 2", "Müller 1")
                                        ("Muller B", "Müller A");

        BOOST_FOREACH (const string_pair_t &pair, pairs) {
            const std::string &a = boost::get<0>(pair),
                              &b = boost::get<1>(pair);
            // Compare at the "identical" level, the strictest one.
            int cmp = std::use_facet<boost::locale::collator<char> >(loc).
                compare(boost::locale::collator_base::identical, a, b);
            std::cout << a << " and " << b << " are "
                      << (cmp == 0 ? "identical" : "different")
                      << " (" << (cmp < 0 ? '<' : cmp > 0 ? '>' : '=') << ")"
                      << std::endl;
        }
        return 0;
    }

The output on my system:

    $ /tmp/mueller de_DE.utf-8 icu
    backends: icu, posix, std
    Muller and Müller are different (<)
    Muller 2 and Müller 1 are different (>)
    Muller B and Müller A are different (>)

Bye,
Patrick
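
For reference, one possible explanation of the output above is multi-level comparison as described by the Unicode Collation Algorithm, which ICU implements. Base-letter (primary) differences anywhere in the string are compared first, and accent (secondary) differences only break ties. The sketch below is a toy model of that idea in plain C++11, not the ICU or Boost.Locale implementation. The names `toy_weights` and `toy_compare` and the weight values are hypothetical, and only "ü" is treated as an accented letter.

```cpp
#include <string>
#include <vector>

// Hypothetical toy weights, NOT the real ICU/UCA tables.
struct Weights {
    std::vector<int> primary;    // base letters, digits, spaces
    std::vector<int> secondary;  // accent marks
};

// Split a string into per-character primary and secondary weights.
// Assumption: only U+00FC (ü) counts as an accented form of 'u'.
Weights toy_weights(const std::u32string &s)
{
    Weights w;
    for (char32_t c : s) {
        const bool accented = (c == U'\u00FC');
        const char32_t base = accented ? U'u' : c;
        w.primary.push_back(static_cast<int>(base));
        w.secondary.push_back(accented ? 1 : 0);
    }
    return w;
}

// Lexicographic three-way comparison of two weight sequences.
int three_way(const std::vector<int> &a, const std::vector<int> &b)
{
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}

// Compare all primary weights first; accents only decide when the
// primary weights of the whole strings are equal.
int toy_compare(const std::u32string &a, const std::u32string &b)
{
    const Weights wa = toy_weights(a);
    const Weights wb = toy_weights(b);
    const int cmp = three_way(wa.primary, wb.primary);
    if (cmp != 0) return cmp;
    return three_way(wa.secondary, wb.secondary);
}
```

In this model, `toy_compare(U"Muller", U"M\u00FCller")` is negative because only the accent differs, while `toy_compare(U"Muller 2", U"M\u00FCller 1")` is positive because the primary difference between "2" and "1" is decided before any accent is looked at.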