
Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-08-12 07:08:54


On Fri, Aug 12, 2011 at 12:00, Matus Chochlik <chochlik_at_[hidden]> wrote:

> On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms_at_[hidden]> wrote:
> > On 11 August 2011 12:57, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
> >>
> >>> There's a lot of existing code which is not based on that assumption -
> >>> we can't just wish it out of existence and boost should be compatible
> >>> with it.
> >>
> >> Then cross platform, Unicode aware programming will always
> >> (I'm sorry) suck with Boost :-)
> >>
> >>
> >> That's it...
> >
> > Unless a different solution can be found.
>
> I see the old flam .. er discussion on text handling is back :)
>
> From the previous debate(s) I now accept that it would
> be a bad idea just to force the encoding of std::string to be utf8.
> So a (nearly) ideal text handling class should IMO look like this
> (see usage below):
>
> [...]
>
> // by default expect UTF8
> text(const std::string& str)
> {
> assert(is_utf8(str.begin(), str.end()));
> store(str);
> }
>

What you are doing is, in fact, forcing the assumed encoding of std::string
to UTF-8. You just said you think it's a bad idea.

> [...]
> text t1 = "blahblah"; // must be utf8
>
> // whatever encoding the compiler uses for wide literals
> text t2(L"blablablabl", textenc::compiler());
>
> text t3(some_posix_function(), textenc::posix());
>
> text t4(SomeWinapiFunc(), textenc::winapi());
> text t5(SomeWinapiFuncW(), textenc::winapi());
>

How is it better than:

// use the system's default encoding for narrow strings
string t4 = from_narrow(SomeWinapiFuncA());
// wchar_t on Windows is always UTF-16
string t5 = from_wide(SomeWinapiFuncW());
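
A minimal sketch of what a from_wide helper could look like, assuming it takes
a std::wstring and returns UTF-8 in a std::string. Only the name from_wide
comes from the message. The signature, the append_utf8 helper, and the choice
to replace unpaired surrogates with U+FFFD are assumptions made for
illustration. from_narrow is not sketched, since it depends on the system's
narrow encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical from_wide: wchar_t string (UTF-16 or UTF-32) to UTF-8.
// Surrogate pairs are combined; anything malformed becomes U+FFFD.
inline std::string from_wide(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size()) {
            const std::uint32_t next = static_cast<std::uint32_t>(in[i + 1]);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Valid pair: decode to a supplementary-plane code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
                ++i;
                append_utf8(out, cp);
                continue;
            }
        }
        if (high || low || cp > 0x10FFFF) cp = 0xFFFD;  // unpaired or out of range
        append_utf8(out, cp);
    }
    return out;
}
```

With a helper of this shape, the call site stays a plain std::string
assignment, and the encoding decision is visible in the function name rather
than in a tag argument.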

> text t6(pq_some_func(), textenc::libpq());
>

You don't need it. You're proposing a design that tries to solve a
nonexistent problem. There is no such diversity of encodings in the
interfaces. I don't know what libpq is, but it either uses UTF-8, in which
case you write:

string t6 = pq_some_func();

or the default system encoding, in which case you write:

string t6 = from_narrow(pq_some_func());

As you start using more libraries with UTF-8 as the default encoding, you
will use from_* less frequently.
(It's possible to use a single to_utf8 instead of the from_narrow/from_wide
combination.)

[...]
> SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
> SomeWinapiFunctionW(concat(t9, text::newline(),
> t8).wstr(textenc::winapi()).c_str());
>

Same as above. 'text' as a distinct type doesn't play any role here. If t8
and t9 are std::string, this becomes:

// to the default narrow system encoding
SomeWinapiFunctionA(to_narrow(t8).c_str());
// the kind of newline expected is defined by the API, not the system
SomeWinapiFunctionW(to_wide(t9 + "\r\n" + t8).c_str());
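
For the opposite direction, here is a hedged sketch of a to_wide helper,
assuming UTF-8 input in a std::string and a std::wstring result (UTF-16 where
wchar_t is 16 bits, UTF-32 otherwise). The signature and the policy of mapping
each malformed byte to U+FFFD are assumptions for illustration, not something
the message specifies. to_narrow is omitted because it depends on the system's
narrow encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical to_wide: UTF-8 to a wchar_t string.
// Each malformed byte is replaced by U+FFFD (an illustrative policy).
inline std::wstring to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t need = 0;       // number of continuation bytes expected
        std::uint32_t cp = b0;      // code point being assembled
        std::uint32_t min_cp = 0;   // smallest value allowed (rejects overlongs)
        if (b0 < 0x80) {
            // single-byte ASCII, nothing more to read
        } else if ((b0 & 0xE0) == 0xC0) {
            need = 1; cp = b0 & 0x1F; min_cp = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            need = 2; cp = b0 & 0x0F; min_cp = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            need = 3; cp = b0 & 0x07; min_cp = 0x10000;
        } else {
            out += static_cast<wchar_t>(0xFFFD);  // invalid lead byte
            ++i;
            continue;
        }
        bool ok = i + need < in.size();
        for (std::size_t k = 1; ok && k <= need; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out += static_cast<wchar_t>(0xFFFD);
            ++i;
            continue;
        }
        i += need + 1;
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

Because the strings are ordinary std::string values, the concatenation with
"\r\n" happens in UTF-8 before a single conversion at the API boundary.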

> [...]
> i.e. besides the fact that the string "uses utf8" (there is already
> a whole heap of such strings) it must also handle all the conversions
> between utf8 and whatever the OS and the major libraries and
> APIs expect and use; conveniently (and effectively).
> Otherwise the effort is IMHO wasted.
>

Your 'text' doesn't do this in a transparent way. In fact, you cannot do it
in a transparent way, because 'const char*' doesn't carry the necessary
semantic information. The burden of deciding what encoding to convert
to/from falls on the programmer *anyway*. You gain nothing from
defining yet another string type.

> Boost libraries (at the very least those wrapping OS functionality)
> should adopt this text class, and do the conversions, "just-in-time"
> when making the OS API call.
>

In light of the above, your 'text' class won't catch bugs like:

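char str[1024];
// fills str in the system's narrow (ANSI) encoding, not necessarily UTF-8
GetWindowTextA(hwnd, str, sizeof(str));
// compiles fine: a char array converts implicitly, so the bytes are
// taken to be UTF-8
boost::function_with_text_parameter(str);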

Therefore, I don't think we should adopt this text class.
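
To illustrate why such a bug slips through, here is a small sketch of an
is_utf8 check with the iterator-pair shape used in the quoted constructor. The
implementation is an assumption; the proposal only names the function. A
runtime check like this rejects some legacy-encoded input, but ASCII-only
text, and legacy bytes that happen to form well-formed UTF-8, pass unnoticed.

```cpp
#include <string>

// Hypothetical is_utf8: true if [first, last) is well-formed UTF-8.
template <class It>
bool is_utf8(It first, It last) {
    while (first != last) {
        const unsigned char b0 = static_cast<unsigned char>(*first++);
        int need = 0;               // continuation bytes still expected
        unsigned long cp = b0;      // code point being assembled
        unsigned long min_cp = 0;   // smallest value allowed (rejects overlongs)
        if (b0 < 0x80) continue;    // ASCII
        else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;          // invalid lead byte
        for (; need > 0; --need) {
            if (first == last) return false;  // truncated sequence
            const unsigned char b = static_cast<unsigned char>(*first++);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}
```

For example, the Windows-1252 bytes "caf\xE9" fail the check, but a window
title made only of ASCII passes, and the Windows-1252 byte pair \xC3 \xA9 also
passes and is silently read as a single different character. The check cannot
tell which encoding the caller actually had in mind.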

-- 
Yakov

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk