Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-18 20:37:44


On Tue, 18 Jan 2011 19:46:41 +0200
"Peter Dimov" <pdimov_at_[hidden]> wrote:

> Dave Abrahams wrote:
>
>>> Yes, in principle. It isn't terribly necessary if everybody is
>>> operating in UTF-8 land though.
>>
>> But they won't be. That's not today's reality.
>
> They should be, though. As a practical matter, the difference between
> taking/returning a string and taking/returning an utf8_t is to force
> people to write an explicit conversion. This penalizes people who are
> already in UTF-8 land because it forces them to use utf8_t( s,
> encoding_utf8 ) and s.c_str( encoding_utf8 ) everywhere, without any
> gain or need. [...]

It doesn't have to. So long as the utf8_t class can easily determine
what encoding it's being fed, it can be set up to do the conversion
itself if necessary. That's how my utf*_t classes are designed; feed a
utf8_t to a function that interfaces with the Windows API and takes a
utf16_t parameter, and the classes will transparently convert it. If
that function returns a utf16_t, and your internal storage type is
utf8_t, just assign it directly to the utf8_t.
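
To illustrate the idea, here is a minimal sketch of how such transparent
conversion could be arranged. It is not the actual utf*_t code: the class
layout, the echo_w function (a stand-in for a wrapper around a UTF-16 API
call) and the example function are all hypothetical, and the conversion
assumes well-formed input with no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch, not the real utf*_t classes from this thread.
// Assumes well-formed input; malformed sequences are not detected.
class utf16_t;

class utf8_t {
public:
    explicit utf8_t(std::string bytes) : bytes_(std::move(bytes)) {}
    utf8_t(const utf16_t& other);   // implicit on purpose: transparent conversion
    const std::string& operator*() const { return bytes_; }
private:
    std::string bytes_;
};

class utf16_t {
public:
    explicit utf16_t(std::u16string units) : units_(std::move(units)) {}
    utf16_t(const utf8_t& other);   // implicit on purpose: transparent conversion
    const std::u16string& operator*() const { return units_; }
private:
    std::u16string units_;
};

// UTF-16 -> UTF-8: combine surrogate pairs, then emit 1 to 4 bytes.
inline utf8_t::utf8_t(const utf16_t& other) {
    const std::u16string& in = *other;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        if (cp < 0x80) {
            bytes_ += static_cast<char>(cp);
        } else if (cp < 0x800) {
            bytes_ += static_cast<char>(0xC0 | (cp >> 6));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            bytes_ += static_cast<char>(0xE0 | (cp >> 12));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            bytes_ += static_cast<char>(0xF0 | (cp >> 18));
            bytes_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
}

// UTF-8 -> UTF-16: decode each sequence, then emit one unit or a surrogate pair.
inline utf16_t::utf16_t(const utf8_t& other) {
    const std::string& in = *other;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;   // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (; extra > 0 && i < in.size(); --extra)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        if (cp < 0x10000) {
            units_ += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            units_ += static_cast<char16_t>(0xD800 + (cp >> 10));
            units_ += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
}

// Hypothetical stand-in for a function that wraps a UTF-16 OS call.
inline utf16_t echo_w(const utf16_t& text) { return text; }

inline utf8_t example() {
    utf8_t name("caf\xC3\xA9");     // UTF-8 bytes for "café"
    utf8_t result = echo_w(name);   // utf8_t -> utf16_t on the call,
    return result;                  // utf16_t -> utf8_t on the assignment
}
```

The converting constructors are deliberately left non-explicit so that
neither the call nor the assignment needs a cast; only the constructors
taking raw std::string or std::u16string are explicit, since those are
the places where an encoding has to be assumed.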

If you're still using std::string, then the UTF classes would have to
either make some assumptions or force you to add explicit conversions.
But only library functions that care about the encoding would need to
be written with utf*_t parameters; everything else could keep using
std::string without any problem. My utf8_t class lets you get the
underlying std::string with operator*, so it's easy to use with such
encoding-agnostic functions as well.
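
As a rough, standalone illustration of that last point, the sketch below
passes the unwrapped string to a function that never inspects the
encoding. The utf8_t shown here is a hypothetical minimal wrapper, not
the real class, and add_extension is an invented example function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal wrapper; the real utf8_t is assumed to do much more.
class utf8_t {
public:
    explicit utf8_t(std::string bytes) : bytes_(std::move(bytes)) {}
    // operator* exposes the underlying std::string, as described above.
    const std::string& operator*() const { return bytes_; }
private:
    std::string bytes_;
};

// Encoding-agnostic: it only concatenates bytes, so it can keep taking
// plain std::string parameters.
inline std::string add_extension(const std::string& stem,
                                 const std::string& ext) {
    return stem + "." + ext;
}

inline std::string example_filename() {
    utf8_t stem("r\xC3\xA9sum\xC3\xA9");   // UTF-8 bytes for "résumé"
    return add_extension(*stem, "txt");    // unwrap with operator*
}
```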

-- 
Chad Nelson
Oak Circle Software, Inc.