Subject: Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2012-05-30 03:20:28


Hi Artyom,

On Mon, May 28, 2012 at 2:33 PM, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
>
> I comments on a library that I want to submit for a formal review.
>
> The library provides an implementation of standard C and C++ library
> functions such that their inputs are UTF-8 aware on Windows without
> requiring using Wide API to make program work on Windows.
>

Here are my 0.02 Euro:

I completely agree that for general-purpose text storage and handling
(reading lines from text-file/console, reading user input from
GUI, displaying formatted (and localized) messages to the user
in a UI, etc., etc.) UTF-8 should *finally* be adopted.
The other encodings (including UCS-2, UTF-16/32) have their
uses, but should be treated as special cases.

The nowide library is certainly useful within the (limited) scope of working
with text obtained from the OS and passed to the OS where you
can make some assumptions and guess the encoding that the
OS uses and do the conversions from and to UTF-8, BUT ...

many text-handling applications also tend to use third-party libraries
which have their own ideas about text encodings, and your library
would be *much* more useful if it allowed them to "talk" to such libraries
(or devices).

So let me reiterate some points I already mentioned in the earlier
text-related discussions here:

1) Let's use std::string as an encoding-agnostic string, as it has
always been - the encoding of the data stored in a string should
be application-dependent.

2) Let's implement a text storage class (and let's call it text).
This class would store text (internally in whatever encoding
is the "best" on the specific platform) and would have the following
functions defined:

/* UTF-8 encoded */ std::string str(text t);
- This function would return a std::string containing the text
stored in t, encoded in UTF-8.

template <typename SymbolicEncodingTag>
text text::from(std::basic_string<typename SymbolicEncodingTag::CharT> s);
- This function would convert the string stored in s to text,
assuming that s is encoded in the encoding specified
by SymbolicEncodingTag.

template <typename SymbolicEncodingTag>
std::basic_string<typename SymbolicEncodingTag::CharT> text::to(text t);
- This function would convert the text stored in t to
a string (std::basic_string with the tag's CharT) encoded in
the encoding specified by SymbolicEncodingTag.
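
To make the proposed interface concrete, here is a minimal sketch of such a text class. Everything in it is an assumption made for illustration: the tag names utf8_tag and utf16_tag, the choice of storing code points internally, and to() being a member function rather than taking the text as an argument. It assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoding tags; each one names the character type of its strings.
struct utf8_tag  { using CharT = char; };
struct utf16_tag { using CharT = char16_t; };

class text {
public:
    // Build a text from a string assumed to be in the encoding named by Tag.
    template <typename Tag>
    static text from(const std::basic_string<typename Tag::CharT>& s) {
        text t;
        decode(s, t.cps_, Tag{});
        return t;
    }

    // Produce a string in the encoding named by Tag.
    template <typename Tag>
    std::basic_string<typename Tag::CharT> to() const {
        std::basic_string<typename Tag::CharT> out;
        encode(cps_, out, Tag{});
        return out;
    }

private:
    std::u32string cps_;  // internal storage: one element per code point (an assumption)

    static void decode(const std::string& s, std::u32string& out, utf8_tag) {
        for (std::size_t i = 0; i < s.size();) {
            unsigned char b = static_cast<unsigned char>(s[i]);
            int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
            char32_t cp = extra == 0 ? b : b & (0x3F >> extra);
            for (int k = 1; k <= extra && i + k < s.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
            out += cp;
            i += extra + 1;
        }
    }

    static void encode(const std::u32string& cps, std::string& out, utf8_tag) {
        for (char32_t cp : cps) {
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
    }

    static void decode(const std::u16string& s, std::u32string& out, utf16_tag) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            char32_t cp = s[i];
            if (cp >= 0xD800 && cp < 0xDC00 && i + 1 < s.size()) {
                // High surrogate: combine with the following low surrogate.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
                ++i;
            }
            out += cp;
        }
    }

    static void encode(const std::u32string& cps, std::u16string& out, utf16_tag) {
        for (char32_t cp : cps) {
            if (cp < 0x10000) {
                out += static_cast<char16_t>(cp);
            } else {
                cp -= 0x10000;  // split into a surrogate pair
                out += static_cast<char16_t>(0xD800 + (cp >> 10));
                out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
        }
    }
};

// The UTF-8 accessor from the proposal, expressed via the utf8_tag conversion.
inline std::string str(const text& t) { return t.to<utf8_tag>(); }
```

With this sketch, text::from<utf16_tag>(s) followed by str(...) converts UTF-16 to UTF-8, and t.to<utf16_tag>() goes the other way. A real implementation would presumably delegate the conversions to existing code instead of hand-rolling them.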

The encoding tags would specify both concrete encodings
like UTF-16 or ISO-8859-2, etc. and symbolic encodings
like OS (which would autodetect the OS's encoding) or
libFoo which would use libFoo's encoding.

Actually the library would not have to specify many
tags for concrete third-party libraries (maybe only the most
popular). Instead it would provide some means to define
the tags to applications based on their needs.
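
One possible shape for application-defined tags is sketched below: a tag is a struct that names its character type and supplies the two conversions, so the application can add one without touching the library. The names libfoo_tag, to_utf8 and from_utf8 are hypothetical, and the assumption that "libFoo" speaks ISO-8859-1 is made up for the example. The text class here stores UTF-8 internally only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for the proposed text class (stores UTF-8 internally here).
class text {
public:
    template <typename Tag>
    static text from(const std::basic_string<typename Tag::CharT>& s) {
        text t;
        t.utf8_ = Tag::to_utf8(s);
        return t;
    }
    template <typename Tag>
    std::basic_string<typename Tag::CharT> to() const {
        return Tag::from_utf8(utf8_);
    }
private:
    std::string utf8_;
};

// A tag the library itself might ship: UTF-8 in, UTF-8 out.
struct utf8_tag {
    using CharT = char;
    static std::string to_utf8(const std::string& s) { return s; }
    static std::string from_utf8(const std::string& u) { return u; }
};

// A tag defined by the application for a hypothetical libFoo, which is
// assumed (for this example only) to expect ISO-8859-1 strings.
struct libfoo_tag {
    using CharT = char;

    static std::string to_utf8(const std::string& s) {
        std::string out;
        for (char c : s) {
            unsigned char b = static_cast<unsigned char>(c);
            if (b < 0x80) {
                out += static_cast<char>(b);
            } else {  // U+0080..U+00FF become two UTF-8 bytes
                out += static_cast<char>(0xC0 | (b >> 6));
                out += static_cast<char>(0x80 | (b & 0x3F));
            }
        }
        return out;
    }

    static std::string from_utf8(const std::string& u) {
        std::string out;
        for (std::size_t i = 0; i < u.size(); ++i) {
            unsigned char b = static_cast<unsigned char>(u[i]);
            if (b < 0x80) {
                out += static_cast<char>(b);
            } else if ((b == 0xC2 || b == 0xC3) && i + 1 < u.size()) {
                unsigned char next = static_cast<unsigned char>(u[++i]);
                out += static_cast<char>(((b & 0x03) << 6) | (next & 0x3F));
            } else if (b >= 0xC0) {
                out += '?';  // code point not representable in ISO-8859-1
            }
            // Continuation bytes of unrepresentable code points are skipped.
        }
        return out;
    }
};
```

The application then writes text::from<libfoo_tag>(s) for strings coming out of libFoo and t.to<libfoo_tag>() for strings going in, while the rest of the program only ever sees text.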

The text class would be used to store text in class members,
function parameters, variables, etc. and would be converted
to a string (in whatever encoding) only when the contents of the
text have to be examined byte by byte, code point by code point, etc. or
passed to an OS, library or device requiring a specific encoding.

Also, initialization of text from C string literals should be handled
correctly on the various platforms/compilers.

If I'm not terribly mistaken, all the code for conversions between
encodings is already part of Boost.Locale.

Then all the useful things like the nowide::args class and
the wrappers around iostreams, etc. could be implemented
on top of that.

Best,

Matus

