Subject: Re: [boost] [UTF String] UTF String library 1.5 ready for perusal
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-02-12 15:14:38


On Sat, 12 Feb 2011 11:00:31 -0800
Jeremy Maitin-Shepard <jeremy_at_[hidden]> wrote:

>>>> The size in code-points *is* the size of the string, according to
>>>> the view of the string that the class exposes.
>>>
>>> Ok, but what would I actually want to use that for?
>>
>> What do you use string.length() for? :-) Efficiently providing an
>> answer to that is one of several things the UTF string classes keep
>> track of it for.
>
> std::string::length specifies the amount of memory required to
> represent it as encoded, and is useful if you intend to pass it to
> something else as a char array, length pair. Given that number of
> code points is directly related to neither the memory required nor the
> number of logical characters/glyphs/size it will take up to display,
> it seems it is unlikely to be useful in many cases.

But for those few cases where it *would* be useful, I see no reason not
to provide it. It costs essentially nothing, since the count is
originally provided by the same function that validates the encoded
data when it's put into a UTF type, and is used for other things as
well. And people are used to being able to retrieve the size of a
string; eliminating that function would discomfort some developers.
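
As a rough illustration of why the count costs essentially nothing,
here is a minimal sketch of a UTF-8 validator that tallies code points
in the same pass. This is not the library's actual code: the function
name validate_and_count_utf8 and its signature are hypothetical, and it
assumes the input arrives as a std::string of raw bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the UTF String library's API).
// Returns true if `s` is well-formed UTF-8; on success, `count` receives
// the number of code points. On failure, `count` is left unchanged.
bool validate_and_count_utf8(const std::string& s, std::size_t& count) {
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const unsigned long min_cp[] = {0, 0x0, 0x80, 0x800, 0x10000};

    std::size_t seen = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i < len) return false;  // truncated sequence

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp[len]) return false;                // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                   // beyond Unicode range

        ++seen;  // the only extra work needed to know the code-point count
        i += len;
    }
    count = seen;
    return true;
}
```

The validation loop already has to find every sequence boundary, so
the counter is a single increment per code point on top of work that
must be done anyway.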

> In cases where there is a limit of the maximum length of a string, I
> believe that is almost certainly going to be in terms of the encoded
> length in a particular encoding (e.g. UTF-8 or UTF-16), rather than
> in code points.

Well, that's easily available too, via T.coded().length().
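
To make the distinction concrete, here is a small sketch of a type that
exposes both numbers. It is a hypothetical stand-in, not the real class:
the name utf8_sketch is invented, it assumes length() on the UTF type
reports code points, and it assumes the bytes passed in are already
valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-8 string type. Only coded().length()
// is taken from the discussion; everything else is illustrative.
class utf8_sketch {
public:
    // Assumes `bytes` is already well-formed UTF-8 (no validation here).
    explicit utf8_sketch(const std::string& bytes)
        : coded_(bytes), code_points_(0) {
        for (std::string::size_type i = 0; i < coded_.size(); ++i) {
            // Every byte that is not a continuation byte (10xxxxxx)
            // begins a new code point.
            if ((static_cast<unsigned char>(coded_[i]) & 0xC0) != 0x80)
                ++code_points_;
        }
    }

    // Size in code points, computed once at construction.
    std::size_t length() const { return code_points_; }

    // The underlying encoded bytes; coded().length() is the encoded size.
    const std::string& coded() const { return coded_; }

private:
    std::string coded_;
    std::size_t code_points_;
};
```

With this sketch, utf8_sketch T("na\xC3\xAFve") gives T.length() == 5
and T.coded().length() == 6, so a limit expressed in encoded bytes can
be checked against the second number.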

-- 
Chad Nelson
Oak Circle Software, Inc.



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk