Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2011-07-22 03:23:12

Next message: Frédéric Bron: "Re: [boost] [type trait extension] I hate volatile..."
Previous message: Yakov Galka: "Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)"
In reply to: Yakov Galka: "Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)"
Next in thread: Beman Dawes: "Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)"

Hello All,

I can suggest the following policy:

- Boost must deprecate the use of the ANSI API on Windows everywhere.
- Boost must use only the Wide API, explicitly.
- Boost must treat all narrow strings as UTF-8, regardless of the fact
  that this is not compatible with _some_ other software that uses the
  ANSI encoding, and convert them to wide ones.
  To make things simpler, the conversion should be done only at the
  last stage, close to the OS system calls/C library calls like
  CreateFileW or _wfopen, _wremove.
- I think that, where possible, there should be an optional backward
  compatibility build/compilation flag like BOOST_WINDOWS_USE_ANSI_ENCODING
  for those who want to stick with the old API for compatibility.

And I want to explain why continuing to use the ANSI API is not compatible,
and will remain incompatible, even within existing software.

----------------------------------------------------
----------------------------------------------------
The ANSI/narrow API is not compatible with itself: the encoding is defined
in several places, and it is used differently in different places, even
within native Windows software like Visual Studio itself.
----------------------------------------------------
----------------------------------------------------

For example, this program does not do what is expected when it is compiled
with Microsoft Visual Studio 2008/2010:

    #include <clocale>
    #include <cstdio>
    #include <fstream>
    int main() {
        std::setlocale(LC_ALL, "Russian_Russia.1251"); // (1) set Russian locale
        std::ofstream text("Мир.txt");                 // (2) name encoded as CP1251
        text << "Hello" << std::endl;
        text.close();
        std::remove("Мир.txt");                        // (3) name encoded as CP1251
    }

1. Sets the global C locale and encoding to Russian, i.e. sets the
   code page to 1251 (Cyrillic encoding).
2. The text stream is opened. "Мир.txt" is converted from CP1251 to
   UTF-16 and the file is created.
3. std::remove converts "Мир.txt" to UTF-16 according to the OS ANSI
   code page, which may not be the same code page as the one set in (1).
   So the file remains on the system and is not removed, because two
   different parts of the same program use different narrow encodings.

And this happens within the same runtime and the same compiler!

---------------------------------------------------------------
1. The ANSI API must be deprecated.
2. UTF-8 should be used by default.

Many libraries have adopted this policy on Windows, because the ANSI
encoding holds us back and makes cross-platform programming a nightmare.
Examples of libraries that have adopted UTF-8 on Windows:

1. GTK/GTKmm
2. Sqlite3
3. Boost.Locale - the UTF-8 policy was well received by many reviewers

I'd add more libraries to this list, but none come to mind right now.

I'd suggest making this policy an official Boost policy and bringing it to
a formal review.

-----------------------------------------------------------
I would personally write patches for the Boost libraries that still use the
ANSI API and fix them where required.

Yakov, I am with you on this, because the current Windows/Unicode situation
in Boost is very bad.
------------------------------------------------------------

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/

----- Original Message -----
> From: Yakov Galka <ybungalobill_at_[hidden]>
> To: boost_at_[hidden]
> Cc:
> Sent: Friday, July 22, 2011 9:49 AM
> Subject: Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)
>
> Hello again,
>
> My previous mail was ignored by the community, and I would like to know why.
> If it wasn't clear, I want to hear your opinion on the topic.
>
> If there is a disagreement, I would like to know what the reason for the
> disagreement is. If there are problems in the proposal, perhaps we can fix
> them and come to a solution accepted by all.
>
> If you agree in principle but just don't have the resources for this work,
> I'm going to do this work (or part of it). I just don't want to waste my
> time on something that is certainly going to be rejected.
>
> Thank you in advance,
> --
> Yakov Galka
>
> On Tue, Jul 5, 2011 at 19:25, Yakov Galka <ybungalobill_at_[hidden]> wrote:
>
>> Hello All,
>>
>> About half a year ago there was a long discussion titled "Always treat
>> std::strings as UTF-8". The only objection to the proposal was that making
>> an instant switch by assuming UTF-8 by default would give surprising results
>> to those who are unaware of the convention (or prefer using legacy encodings
>> instead of UTF-8). This applies almost only to Windows developers. However,
>> there are already many projects and organizations that have switched to
>> UTF-8 even for Windows programming. The company I work for is one of them.
>>
>>
>> Nowadays:
>> ==========
>>
>> All the libraries that accept narrow strings assume the system encoding by
>> default.
>> * filesystem::path: can be configured through the static imbue() function.
>> * system_error_category (Windows error descriptions), interprocess (object
>>   names)... more?: don't support Unicode at all. They use the narrow API on
>>   Windows.
>> * program_options: assumes UTF-8 for internal data (good!), but uses the
>>   system encoding for paths (parse_config_file) and for environment
>>   variables (bad...).
>>
>> Note that path::imbue(), for example, is a painful solution for two reasons:
>> 1. Any global state initialization is problematic in dynamically-linked,
>>    multi-threaded systems (like the one I'm maintaining now). In such cases
>>    a compile-time configuration is more attractive.
>> 2. I really don't want to have such a function in each Boost library (this
>>    can be solved by having a global boost::imbue, though).
>>
>>
>> Proposal:
>> ========
>>
>> Add a compile-time configuration flag that causes Boost to treat all narrow
>> strings as UTF-8.
>> The flag will be off by default.
>> For example, in filesystem it's a matter of setting `codepage` to CP_UTF8
>> in just two places.
>>
>>
>> Rationale:
>> ==========
>>
>> Those who are ready to move to the UTF-8 future can do so by simply
>> setting a compilation flag.
>> Those who don't care about Unicode correctness are not affected by the
>> addition. There won't be any complaints to Boost like: "Hey! I use Boost
>> with these libraries and it doesn't work. Your encoding is wrong!".
>>
>> --
>> Yakov Galka
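The "convert only at the last stage" policy from the reply above can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions, not Boost code: utf8_to_utf16() and remove_utf8() are hypothetical helpers, and BOOST_WINDOWS_USE_ANSI_ENCODING is only the opt-out flag name suggested in the mail, not an existing Boost macro. A real Windows implementation would more likely call MultiByteToWideChar with CP_UTF8; a hand-written decoder is used here only so the sketch also compiles on non-Windows systems.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 string into UTF-16 code units.
// Throws std::runtime_error on malformed input (bad lead or continuation
// bytes, truncated or overlong sequences, surrogates, values > U+10FFFF).
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that may legally use a sequence of each length.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (len > in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000; // encode as a UTF-16 surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Hypothetical wrapper: remove a file whose name is given as UTF-8,
// converting to the wide API only at the last step before the OS call.
int remove_utf8(const std::string& utf8_name)
{
#if defined(_WIN32) && !defined(BOOST_WINDOWS_USE_ANSI_ENCODING)
    std::u16string u16 = utf8_to_utf16(utf8_name);
    // On Windows wchar_t is a 16-bit UTF-16 code unit.
    std::wstring wide(u16.begin(), u16.end());
    return _wremove(wide.c_str());
#else
    // POSIX: the bytes are passed through unchanged.
    // Windows with the opt-out flag: fall back to the ANSI call.
    return std::remove(utf8_name.c_str());
#endif
}
```

With such a wrapper, step (3) of the example would pass the UTF-8 name to remove_utf8(), so the result no longer depends on which locale setlocale() selected. The file would also have to be created through a wide-character call for the whole program to be consistent.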


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk