Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-18 10:09:37

On Tue, 18 Jan 2011 05:35:17 -0800 (PST)
Artyom <artyomtnk_at_[hidden]> wrote:

>> From: Alexander Lamaison <awl03_at_[hidden]>
>>> Yes, in principle. It isn't terribly necessary if everybody is
>>> operating in UTF-8 land though.
>> Which is exactly why it's necessary: everybody _isn't_ operating in
>> UTF-8 land.
> The problem is that you need to pick some encoding, and UTF-8 is the
> most universal and useful.

I'll second that. Little wasted space, no byte-order problems, and very
easy to work with (finding the first byte of a character, for instance,
is child's play).
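
To make the "child's play" claim concrete, here is a minimal sketch (not from the original post) that assumes well-formed UTF-8 stored in a plain std::string. The names `is_continuation_byte` and `char_start` are hypothetical. It relies on one property of UTF-8: every continuation byte has the bit pattern 10xxxxxx, and no lead byte does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In UTF-8, continuation bytes always look like 10xxxxxx;
// ASCII bytes and lead bytes never do.
inline bool is_continuation_byte(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Given any byte offset into well-formed UTF-8 (an assumption; this
// does no validation), step back to the first byte of the character
// containing it. At most three steps are needed, since a UTF-8
// sequence is never longer than four bytes.
std::size_t char_start(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    while (pos > 0 &&
           is_continuation_byte(static_cast<unsigned char>(s[pos]))) {
        --pos;
    }
    return pos;
}
```

Because the check needs only the byte itself, you can resynchronize from any position without scanning from the start of the string. Stateful multi-byte encodings do not allow this.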

> Otherwise you should:
> 1. Reinvent the string

Or at least wrap it. ;-)

> 2. Reinvent standard library to use new string

Not entirely necessary, for the same reason that very few changes to
the standard library are needed when you switch from char strings to
char16_t strings to char32_t strings -- the standard library, designed
around the idea of iterators, is mostly type-agnostic.
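
As a small illustration of that type-agnosticism (a sketch, not code from the post; `count_spaces` is a hypothetical name), the single template below works unchanged for std::string, std::u16string and std::u32string. It touches the data only through iterators and the string's value_type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// One generic function serves all three standard string types:
// std::count neither knows nor cares whether the elements are
// char, char16_t or char32_t.
template <typename String>
std::size_t count_spaces(const String& s) {
    using CharT = typename String::value_type;
    return static_cast<std::size_t>(
        std::count(s.begin(), s.end(), static_cast<CharT>(' ')));
}
```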

The utf*_t types provide fully functional iterators, so they'll work
fine with most library functions, as long as those functions don't care
that some characters are encoded as multiple bytes. Only the functions
that assume a single byte represents every character need replacing,
and you'd have to replace those whenever you use any multi-byte
encoding, whether or not you also use a new string type.
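
The sketch below shows how a code-point iterator lets standard algorithms see whole characters. This is a hypothetical class, not the actual utf*_t API mentioned in the post. It assumes well-formed UTF-8 input and does no error checking. It is tagged as an input iterator because `operator*` returns a decoded value rather than a reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator: walks well-formed UTF-8 in a std::string one
// code point at a time, yielding char32_t values.
class utf8_code_point_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;  // decoded on the fly, returned by value

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(std::string::const_iterator it) : it_(it) {}

    // Decode the sequence that starts at the current lead byte.
    char32_t operator*() const {
        auto p = it_;
        unsigned char lead = static_cast<unsigned char>(*p);
        int extra = sequence_length(lead) - 1;
        // Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        char32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        while (extra-- > 0) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Skip over the whole multi-byte sequence, not just one byte.
    utf8_code_point_iterator& operator++() {
        it_ += sequence_length(static_cast<unsigned char>(*it_));
        return *this;
    }
    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    // Sequence length is determined by the lead byte alone.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }
    std::string::const_iterator it_{};
};

inline utf8_code_point_iterator cp_begin(const std::string& s) {
    return utf8_code_point_iterator(s.begin());
}
inline utf8_code_point_iterator cp_end(const std::string& s) {
    return utf8_code_point_iterator(s.end());
}
```

With this in place, std::distance, std::count and std::find operate on characters. A byte-oriented call such as `s.size()` still counts bytes, so that is the kind of function that must be handled differently.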

> 3. Reinvent 1001 other libraries to use the new string.

Again, seldom necessary. Just use a type system that can translate
between your internal encoding and what the library wants, at the
boundaries. If the other library you want to use can't handle
multi-byte encodings, you'd have to modify or reinvent it anyway.
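
The following sketch shows what "translate at the boundaries" can look like. Suppose, hypothetically, that a program keeps text internally as UTF-32 (std::u32string) but calls a library that expects UTF-8 in a std::string. The conversion then happens once, at the call site. `to_utf8` is an illustrative name, and the code assumes every element is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical boundary helper: encode UTF-32 code points as UTF-8 bytes.
std::string to_utf8(const std::u32string& in) {
    std::string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        // Reject values outside Unicode and UTF-16 surrogate code points.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The rest of the program never sees the library's encoding; only the single call that crosses the boundary does.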

> It is just neither feasible nor necessary.

My code says it's perfectly feasible. ;-) Whether it's necessary or not
is up to the individual developer, but the type-safety it offers is
more in line with the design philosophy of C++ than using std::string
for everything. I hate to harp on the same tired example, but why do
you really need any pointer type other than void*? It's the same idea.
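
The pointer analogy can be sketched with two hypothetical wrapper types. They are loosely in the spirit of the utf*_t types mentioned in the post, but they are not their API. Each wrapper labels a std::string with the encoding of its bytes, the way `int*` and `double*` label an address with what it points to. Mixing them up then fails at compile time instead of corrupting text at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical: bytes known to be UTF-8.
class utf8_text {
public:
    explicit utf8_text(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Hypothetical: bytes in some locale-dependent narrow encoding.
class native_text {
public:
    explicit native_text(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Accepts only UTF-8; passing native_text or a bare std::string
// does not compile.
std::size_t byte_count(const utf8_text& t) { return t.bytes().size(); }

static_assert(!std::is_convertible<native_text, utf8_text>::value,
              "no silent re-labelling between encodings");
static_assert(!std::is_convertible<std::string, utf8_text>::value,
              "wrapping raw bytes must be an explicit decision");
```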

Chad Nelson
Oak Circle Software, Inc.
