Subject: Re: [boost] Boost.Locale (was Re: [SQL-Connectivity] Is Boost interested in CppDB?)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-14 19:16:04


On 15/12/2010 00:28, Joel de Guzman wrote:
> On 12/15/2010 7:13 AM, Mathias Gaunard wrote:
>> On 14/12/2010 22:05, Edward Diener wrote:
>>> On 12/14/2010 2:27 PM, Mathias Gaunard wrote:
>>>> On 14/12/2010 16:08, Eric Niebler wrote:
>>>>> On 12/14/2010 9:53 AM, Dean Michael Berris wrote:
>>>>>> +1 -- if there was a library that did easy conversion from
>>>>>> std::wstring (usually the default in Windows now) to proper UTF-8
>>>>>> encoded std::string in Boost that would be *awesome*. I can totally
>>>>>> use that library in cpp-netlib too. ;)
>>>>>
>>>>> Please, no. std::string is not an appropriate holder for a UTF-8
>>>>> string.
>>>>> It encourages random-access mutation of any byte in a UTF-8 sequence,
>>>>> pretty much guaranteeing data corruption.
>>>>>
>>>>
>>>> It is, however, an appropriate holder for the *data* of a UTF-8 string.
>>>

<snip stuff that seems irrelevant to the message>

>
> UTF-8 is variable length encoded (so is UTF-16). basic_string
> and string are unsuitable for any variable length encoded data,
> as Eric pointed out.

What I said is that basic_string<char> is perfectly suitable as a
container to store UTF-8 data; it is not, however, very suitable for
text processing (although UTF-8 has sufficiently nice properties to
make it more or less passable).
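
As a rough illustration of that distinction, here is a minimal sketch. It is
not code from any library discussed in this thread, and the function names
count_code_points and code_point_start are made up for the example. It
assumes the string already holds well-formed UTF-8. The container only sees
bytes, so size() counts bytes; the "nice properties" show up in the fact that
continuation bytes always look like 10xxxxxx, which makes counting and
resynchronizing cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a string assumed to hold well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Given any byte offset, backs up to the first byte of the code point
// that contains it. Precondition: pos < utf8.size().
inline std::size_t code_point_start(const std::string& utf8, std::size_t pos) {
    assert(pos < utf8.size());
    while (pos > 0 &&
           (static_cast<unsigned char>(utf8[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    return pos;
}
```

For the six-byte string "h\xC3\xA9llo", size() reports 6 while
count_code_points reports 5; overwriting the byte at index 2 alone would
leave an invalid sequence, which is the corruption risk Eric described.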

Raw data is different from an abstraction meant to represent text.

My Unicode library does not provide a Unicode string type so as to
decouple the data storage and representation from the abstraction and
semantics of text we want to attach to that data.

This means that it is up to the user not to tamper with the data and
make it invalid, and to respect the preconditions of the algorithms he
wishes to use.
This could be strongly enforced by using a specific type, but that would
mean strict ownership of the data by said type, copies between external
representations (and there are many across all the different major C++
libraries), and other interoperability problems.
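
To make the trade-off concrete, here is a minimal sketch of the decoupled
approach. It is only an illustration: utf8_view and code_points are
hypothetical names, not the interface of my library or of any other. The
view does not own or copy the bytes; it merely interprets a std::string the
caller already has, and it relies on the caller to guarantee well-formed
UTF-8 (only truncation is checked, via assert).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view over UTF-8 bytes stored elsewhere.
// The referenced string must outlive the view.
class utf8_view {
public:
    explicit utf8_view(const std::string& bytes) : bytes_(&bytes) {}

    // Decodes every code point.
    // Precondition: the underlying bytes are well-formed UTF-8.
    std::vector<char32_t> code_points() const {
        std::vector<char32_t> out;
        const std::string& s = *bytes_;
        std::size_t i = 0;
        while (i < s.size()) {
            unsigned char lead = static_cast<unsigned char>(s[i]);
            // Sequence length is determined by the lead byte.
            std::size_t len = lead < 0x80 ? 1
                            : lead < 0xE0 ? 2
                            : lead < 0xF0 ? 3 : 4;
            assert(i + len <= s.size());  // catches truncated input only
            // Keep the payload bits of the lead byte.
            char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
            // Each continuation byte contributes its low 6 bits.
            for (std::size_t k = 1; k < len; ++k) {
                cp = (cp << 6) |
                     (static_cast<unsigned char>(s[i + k]) & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

private:
    const std::string* bytes_;  // not owned, never copied
};
```

Because the view only borrows the data, the same bytes can stay in whatever
container another library handed over; the price is that nothing stops the
owner from modifying them into an invalid state afterwards.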

