Subject: Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2011-07-22 03:23:12


Hello All,

I can suggest the following policy:

- Boost must deprecate use of the ANSI API on Windows everywhere.
- Boost must use only the Wide API, explicitly.
- Boost must treat all narrow strings as UTF-8, regardless of the
  fact that this is not compatible with _some_ other software
  that uses ANSI encodings, and convert them to wide strings once.
  To make things simpler, the conversion should be done only
  at the last stage, close to the OS system calls/C library calls
  such as CreateFileW, _wfopen or _wremove.
- I think that, where possible, there should be an optional
  backward-compatibility build/compilation flag like
  BOOST_WINDOWS_USE_ANSI_ENCODING for those who want to stick
  with the old API.

I also want to explain why continuing to use the ANSI API is not
compatible, and will remain incompatible, even within existing software.

----------------------------------------------------
----------------------------------------------------

The ANSI/narrow API is not compatible with itself: the encoding is
defined in several places, and it is used differently in different
places, even within native Windows software like Visual Studio itself.

----------------------------------------------------
----------------------------------------------------

For example, this program does not do what is expected when it is
compiled with Microsoft Visual Studio 2008/2010:

    #include <clocale>
    #include <cstdio>
    #include <fstream>
    int main() {
        std::setlocale(LC_ALL, "Russian_Russia.1251"); // (1) set Russian locale
        std::ofstream text("Мир.txt");                 // (2) name encoded as CP1251
        text << "Hello" << std::endl;
        text.close();
        std::remove("Мир.txt");                        // (3) name encoded as CP1251
    }

1. The global C locale is set to Russian, which sets the
   code page to 1251 (a Cyrillic encoding).
2. The text stream is opened. "Мир.txt" is converted from
   CP1251 to UTF-16 and the file is created.
3. std::remove converts "Мир.txt" to UTF-16 according to the OS ANSI
   code page, which may not be the same code page that was set in (1).
   So the file remains on the system and is not removed,
   because two different parts of the same program use different
   narrow encodings.

And this happens within the same runtime and the same compiler!

---------------------------------------------------------------

1. The ANSI API must be deprecated.
2. UTF-8 should be used by default.

Many libraries have adopted this policy on Windows, because ANSI
encoding holds us back and makes cross-platform programming a nightmare.

Examples of libraries that adopted UTF-8 on Windows:

1. GTK/GTKmm
2. Sqlite3
3. Boost.Locale - the UTF-8 policy was welcomed by many
   reviewers

I'd put more libraries on this list, but none come to mind right now.

I suggest making this an official Boost policy and bringing it
to a formal review.

-----------------------------------------------------------

I personally would write patches for the Boost libraries that still use
the ANSI API and fix them if required.

Yakov - I am with you on this, because the current Windows/Unicode
situation in Boost is very bad.

------------------------------------------------------------

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/

----- Original Message -----
> From: Yakov Galka <ybungalobill_at_[hidden]>
> To: boost_at_[hidden]
> Cc:
> Sent: Friday, July 22, 2011 9:49 AM
> Subject: Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)
>
> Hello again,
>
> My previous mail was ignored by the community, and I would like to know why.
> If it wasn't clear, I want to hear your opinion on the topic.
>
> If there is a disagreement, I would like to know what is the reason for the
> disagreement. If there are problems in the proposal, perhaps we can fix them
> and come to a solution accepted by all.
>
> If you agree in principle but just don't have the resources for this work,
> I'm going to do this work (or part of it). I just don't want to waste my
> time on something that is certainly going to be rejected.
>
> Thank you in advance,
> --
> Yakov Galka
>
>
> On Tue, Jul 5, 2011 at 19:25, Yakov Galka <ybungalobill_at_[hidden]> wrote:
>
>> Hello All,
>>
>> About half a year ago there was a long discussion titled "Always treat
>> std::strings as UTF-8". The only objection to the proposal was that making
>> an instant switch by assuming UTF-8 by default will give surprising results
>> to those who're unaware of the convention (or prefer using legacy encodings
>> instead of UTF-8). This applies almost only to Windows developers. However,
>> there are already many projects and organizations that switched to UTF-8
>> even for Windows programming. The company I work in is one of them.
>>
>>
>> Nowadays:
>> ==========
>>
>> All the libraries that accept narrow strings assume the system encoding by
>> default.
>> * filesystem::path - Can be configured through static imbue() function.
>> * system_error_category (windows error description), interprocess (object
>>   names)... more? - Don't support Unicode at all. They use the narrow API on
>>   Windows.
>> * program_options - Assumes UTF-8 for internal data (Good!), but uses
>>   system encoding for paths (parse_config_file) and for environment variables
>>   (Bad...).
>>
>> Note that, e.g. path::imbue(), is a painful solution for two reasons:
>> 1. Any global state initialization is problematic in dynamically-linked,
>>    multi-threaded systems (like the one I'm maintaining now). In such cases a
>>    compile time configuration is more attractive.
>> 2. I really don't want to have such a function in each boost library (can be
>>    solved by having a global boost::imbue though).
>>
>>
>> Proposal:
>> ========
>>
>> Add a compile-time configuration flag that causes boost to treat all narrow
>> strings as UTF-8.
>> The flag will be off by default.
>> For example, in filesystem it's a matter of setting `codepage` to CP_UTF8
>> in just two places.
>>
>>
>> Rationale:
>> ==========
>>
>> * Those who are ready to move to the UTF-8 future can do so by simply
>>   setting a compilation flag.
>> * Those who don't care about Unicode correctness are not affected by the
>>   addition. There won't be any complaints to boost, like: "Hey! I use boost
>>   with these libraries and it doesn't work. Your encoding is wrong!".
>>
>>
>> --
>> Yakov Galka
>>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
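
For illustration, the policy suggested in the reply above (keep narrow strings in UTF-8 everywhere and convert to wide only right next to the OS/C library call) could look roughly like the sketch below. The names utf8_to_utf16, fopen_utf8 and remove_utf8 are hypothetical helpers invented for this example; they are not existing Boost API, and the decoder does only minimal validation. On Windows the sketch assumes the wide CRT functions _wfopen and _wremove mentioned in the message; on other platforms the narrow string is simply passed through.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <wchar.h>   // _wfopen, _wremove
#endif

// Hypothetical helper: decode UTF-8 into UTF-16.
// Minimal validation only: rejects bad lead/continuation bytes, truncated
// sequences, surrogates and values above U+10FFFF; it does not reject
// every overlong encoding.
inline std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical wrapper: the path is always UTF-8; the conversion to wide
// happens only here, immediately before the C library call.
inline std::FILE* fopen_utf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    const std::u16string p = utf8_to_utf16(path);
    const std::u16string m = utf8_to_utf16(mode);
    const std::wstring wp(p.begin(), p.end());   // wchar_t is 16-bit on Windows
    const std::wstring wm(m.begin(), m.end());
    return _wfopen(wp.c_str(), wm.c_str());
#else
    return std::fopen(path.c_str(), mode.c_str());
#endif
}

// Hypothetical wrapper: same idea for file removal.
inline int remove_utf8(const std::string& path)
{
#ifdef _WIN32
    const std::u16string p = utf8_to_utf16(path);
    const std::wstring wp(p.begin(), p.end());
    return _wremove(wp.c_str());
#else
    return std::remove(path.c_str());
#endif
}
```

With wrappers of this kind, both the create and the remove step of the "Мир.txt" example go through the same conversion, so neither depends on the C locale or on the OS ANSI code page.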


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk