Subject: Re: [boost] [General] Treat narrow strings as UTF-8 (compilation flag)
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-07-22 02:49:19


Hello again,

My previous mail was ignored by the community, and I would like to know why.
If it wasn't clear, I want to hear your opinion on the topic.

If there is disagreement, I would like to know the reason for it. If there
are problems with the proposal, perhaps we can fix them and arrive at a
solution acceptable to all.

If you agree in principle but just don't have the resources for this work,
I am willing to do it (or part of it) myself. I just don't want to waste my
time on something that is certain to be rejected.

Thank you in advance,

-- 
Yakov Galka

On Tue, Jul 5, 2011 at 19:25, Yakov Galka <ybungalobill_at_[hidden]> wrote:
> Hello All,
>
> About half a year ago there was a long discussion titled "Always treat
> std::strings as UTF-8". The only objection to the proposal was that making
> an instant switch by assuming UTF-8 by default will give surprising results
> to those who're unaware of the convention (or prefer using legacy encodings
> instead of UTF-8). This applies almost only to Windows developers. However,
> there are already many projects and organizations that switched to UTF-8
> even for Windows programming. The company I work in is one of them.
>
>
> Nowadays:
> =========
>
> All the libraries that accept narrow strings assume the system encoding by
> default:
> * filesystem::path - Can be configured through the static imbue() function.
> * system_error_category (Windows error descriptions), interprocess (object
> names)... more? - Don't support Unicode at all. They use the narrow API on
> Windows.
> * program_options - Assumes UTF-8 for internal data (good!), but uses the
> system encoding for paths (parse_config_file) and for environment variables
> (bad...).
>
> Note that a runtime setting such as path::imbue() is a painful solution
> for two reasons:
> 1. Any global state initialization is problematic in dynamically linked,
> multi-threaded systems (like the one I'm maintaining now). In such cases a
> compile-time configuration is more attractive.
> 2. I really don't want to have such a function in each boost library (this
> can be solved by having a global boost::imbue, though).
>
>
> Proposal:
> =========
>
> Add a compile-time configuration flag that causes boost to treat all narrow
> strings as UTF-8. The flag will be off by default.
> For example, in filesystem it's a matter of setting `codepage` to CP_UTF8
> in just two places.
>
>
> Rationale:
> ==========
>
> Those who are ready to move to the UTF-8 future can do so by simply
> setting a compilation flag.
> Those who don't care about Unicode correctness are not affected by the
> addition. There won't be any complaints to boost, like: "Hey! I use boost
> with these libraries and it doesn't work. Your encoding is wrong!".
>
>
>
> --
> Yakov Galka
>
>
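
To make the quoted proposal a bit more concrete, here is a minimal sketch of
how such a flag could select the code page at compile time. The macro name
EXAMPLE_NARROW_STRINGS_ARE_UTF8 and the function narrow_codepage() are
hypothetical and not part of Boost; the proposal itself does not name the
flag. The two constants repeat the Windows values of CP_ACP (0) and CP_UTF8
(65001) only so that the snippet compiles without <windows.h>.

```cpp
// Sketch only: the macro and all names in namespace example are hypothetical.
// Build with -DEXAMPLE_NARROW_STRINGS_ARE_UTF8 to opt in; off by default.

namespace example {

// Numeric values of the Windows code page identifiers CP_ACP and CP_UTF8,
// repeated here so the sketch does not depend on <windows.h>.
constexpr unsigned cp_acp  = 0;      // system default ANSI code page
constexpr unsigned cp_utf8 = 65001;  // UTF-8

// Code page a library would hand to its narrow <-> wide conversion calls.
// The choice is fixed at compile time, so no global state has to be
// initialized at run time.
constexpr unsigned narrow_codepage() {
#if defined(EXAMPLE_NARROW_STRINGS_ARE_UTF8)
    return cp_utf8;
#else
    return cp_acp;
#endif
}

} // namespace example
```

A library would then use example::narrow_codepage() wherever it currently
hard-codes the system code page. Because the value is a compile-time
constant, the concerns about global initialization in dynamically linked,
multi-threaded programs do not arise.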
