Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2011-01-17 10:12:48
On Mon, 17 Jan 2011 09:39:20 -0500, Chad Nelson wrote:
>
> Right now, the utf*_t classes assume that any std::string fed directly
> into them is meant to be translated as-is. It's assumed to consist of
> characters that should be directly encoded as their unsigned values.
> That works perfectly for seven-bit ASCII text, but may be problematic
> for values with the high-bit set.
>
> I've done some research, and it looks like it would require little
> effort to create an os::string_t type that uses the current locale, and
> assume all raw std::strings that contain eight-bit values are coded in
> that instead.
I'm not sure about the os namespace ;) What about just calling it native_t,
like your other class, but putting it in the same namespace as utf8_t etc.?
> Design-wise, ascii_t would need to change slightly after this, to throw
> on anything that can't fit into a *seven*-bit value, rather than
> eight-bit. I'll add the default-character option to both types as well,
> and maybe make other improvements as I have time.
Sounds good.
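To make sure we mean the same thing by the seven-bit check plus a default
character, here is a minimal sketch. The free function to_ascii and its
replacement parameter are hypothetical names made up for illustration; they
are not the ascii_t interface, and the real class may well look different.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of the proposed library.
// Copies `in`, checking that every byte fits in seven bits. If `replacement`
// is '\0' (the default), a byte with the high bit set causes a throw;
// otherwise each such byte is replaced with `replacement`.
std::string to_ascii(const std::string& in, char replacement = '\0')
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        // Go through unsigned char so the comparison is well defined even
        // where plain char is signed.
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c > 0x7F) {
            if (replacement == '\0')
                throw std::invalid_argument(
                    "to_ascii: byte does not fit in seven bits");
            out += replacement;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Note that replacement here is per byte, so a two-byte UTF-8 sequence would
come out as two default characters. Replacing per character instead would
require knowing the source encoding first.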
> Artyom, since you seem to have more experience with this stuff than I,
> what do you think? Would those alterations take care of your objections?
Also, Artyom's Boost.Locale does very sophisticated encoding conversion, but
the Unicode conversions done by utf*_t look (scarily?) small. Do they do
as good a job, or should these classes make use of the conversions in
Boost.Locale?
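To show the kind of thing I would want small conversion code to get right,
here is a rough sketch of a UTF-8 well-formedness check. The function name
is_valid_utf8 is my own invention for illustration; it is not taken from
utf*_t or from Boost.Locale, and I am not claiming either library does or
does not perform these checks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only.
// Returns true only if `s` is well-formed UTF-8: no stray or missing
// continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;   // total bytes in this sequence
        unsigned long cp;  // code point being assembled
        unsigned long min; // smallest code point legal for this length

        if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
        else return false; // stray continuation byte or invalid lead byte

        if (n - i < len) return false; // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        i += len;
    }
    return true;
}
```

For example, the two bytes 0xC0 0xAF are an overlong encoding of '/' and must
be rejected, and a lone Latin-1 byte such as 0xE9 is not valid UTF-8 at all.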
Alex
-- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)