Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-17 09:39:20


On Mon, 17 Jan 2011 11:14:26 +0000
Alexander Lamaison <awl03_at_[hidden]> wrote:

> On Sun, 16 Jan 2011 21:41:25 -0500, Chad Nelson wrote:
>
>>> If so (and this is what I see in code) ASCII is misleading.
>>> It should be called Latin1/ISO-8859-1 but not ASCII.
>>
>> Probably, but latin1_t isn't very obvious, and iso_8859_1_t is a
>> little awkward to type. ;-) As I've said, this code was written
>> solely for my company; I'd make a number of changes if I were going
>> to submit it to Boost.
>
> I'm a little concerned by this talk of ASCII and Latin1. When, say,
> utf8_t is given a char*, does it not treat it as OS-default encoded
> rather than ASCII/Latin1? I've skimmed the code but haven't managed to
> work out how the classes treat this case.

Right now, the utf*_t classes assume that any std::string fed directly
into them is meant to be translated as-is: each char is taken as an
unsigned value, and that value is encoded directly as a code point.
That works perfectly for seven-bit ASCII text, but may be problematic
for values with the high bit set.
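
To make the "translated as-is" behavior concrete, here is a minimal
sketch rather than the actual utf8_t code. The function name is
hypothetical. It assumes each char is read as an unsigned value in
0..255 and written out as the UTF-8 encoding of that code point, which
is effectively a Latin-1 interpretation:

```cpp
#include <string>

// Hypothetical helper, not taken from the utf*_t classes: every char is
// treated as an unsigned value 0..255 and encoded as that code point.
std::string bytes_as_code_points_to_utf8(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80) {
            // Seven-bit values are identical in ASCII and UTF-8.
            out.push_back(static_cast<char>(u));
        } else {
            // Values 0x80..0xFF need two UTF-8 bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (u >> 6)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        }
    }
    return out;
}
```

Under this assumption, a string that already holds UTF-8 (or any other
eight-bit encoding) is re-encoded byte by byte, so each high-bit byte
turns into two bytes. That is the problematic case for values above
0x7F.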

I've done some research, and it looks like it would require little
effort to create an os::string_t type that uses the current locale, and
to assume that all raw std::strings containing eight-bit values are
encoded in that locale instead.
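
As a rough illustration of decoding through the current locale, the
sketch below uses the standard std::mbrtowc function. The function name
is hypothetical, and nothing here is the planned os::string_t
implementation. It assumes the program has already selected the user's
locale (for example with std::setlocale(LC_ALL, "")), and that wchar_t
is an acceptable intermediate form:

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a raw std::string using whatever multibyte
// encoding the current C locale specifies.
std::wstring decode_with_current_locale(const std::string& raw) {
    std::wstring out;
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* p = raw.data();
    std::size_t remaining = raw.size();
    while (remaining > 0) {
        wchar_t wc = L'\0';
        const std::size_t n = std::mbrtowc(&wc, p, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            // -1: invalid sequence; -2: sequence cut off at end of input.
            throw std::runtime_error(
                "invalid or truncated sequence for the current locale");
        }
        // A return of 0 means an embedded '\0', which occupies one byte.
        const std::size_t used = (n == 0) ? 1 : n;
        out.push_back(wc);
        p += used;
        remaining -= used;
    }
    return out;
}
```

What high-bit bytes decode to depends entirely on the active locale, so
the same input can produce different results on different systems.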

Design-wise, ascii_t would need to change slightly after this, to throw
on anything that can't fit into a *seven*-bit value, rather than
eight-bit. I'll add the default-character option to both types as well,
and maybe make other improvements as I have time.
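
The stricter check could look something like the following sketch. The
function name and the use of '\0' to mean "no default character" are
illustrative assumptions, not the real ascii_t interface:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the stricter ascii_t check. If `replacement`
// is '\0' (the assumed "no default character" setting), any value that
// does not fit in seven bits causes an exception; otherwise it is
// replaced with `replacement`.
std::string to_seven_bit_ascii(const std::string& raw,
                               char replacement = '\0') {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        if (static_cast<unsigned char>(c) < 0x80) {
            out.push_back(c);            // fits in seven bits
        } else if (replacement != '\0') {
            out.push_back(replacement);  // default-character option
        } else {
            throw std::invalid_argument("value does not fit in seven bits");
        }
    }
    return out;
}
```

Calling to_seven_bit_ascii(text) throws on the first eight-bit value,
while to_seven_bit_ascii(text, '?') substitutes '?' for each one.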

With this change, the os::native_t typedef would either be completely
redundant or simply wrong, so I'll remove it.

I should be able to find the time for that sometime this week, if all
goes well.

Artyom, since you seem to have more experience with this stuff than I,
what do you think? Would those alterations take care of your objections?

-- 
Chad Nelson
Oak Circle Software, Inc.