Subject: Re: [boost] [UTF String] UTF String library 1.5 ready for perusal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-02-12 14:19:09
On Sat, Feb 12, 2011 at 8:00 PM, Jeremy Maitin-Shepard
> On 02/12/2011 05:57 AM, Chad Nelson wrote:
>> On Fri, 11 Feb 2011 20:23:58 -0800
>> Scott McMurray <me22.ca+boost_at_[hidden]> wrote:
>>> On Thu, Feb 10, 2011 at 21:41, Chad Nelson
>>> <chad.thecomfychair_at_[hidden]> wrote:
>>>>> I understand why it's useful to know how long it is in encoding
>>>>> units, but the number of code points seems quite useless to me.
>>>>> Can you elaborate?
>>>> The size in code-points *is* the size of the string, according to the
>>>> view of the string that the class exposes.
>>> Ok, but what would I actually want to use that for?
>> What do you use string.length() for? :-) Answering that efficiently
>> is one of several reasons the UTF string classes keep track of the
>> code-point count.
> std::string::length specifies the amount of memory required to represent it
> as encoded, and is useful if you intend to pass it to something else as a
> char array, length pair. Given that the number of code points is directly
> related to neither the memory required nor the number of logical
> characters/glyphs/size it will take up to display, it seems unlikely
> to be useful in many cases. In cases where there is a limit on the maximum
> length of a string, I believe that is almost certainly going to be in terms
> of the encoded length in a particular encoding (e.g. UTF-8 or UTF-16),
> rather than in code points.
How about size() returning the storage the string requires, in bytes,
and length() returning the number of code points?
length() could then be used, for example, when allocating an array of
code points (char32_t) into which the string is 'expanded' from
UTF-8 for algorithms that require true random access.
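
To make the distinction concrete, here is a rough sketch of what I have
in mind. It is not the proposed library's interface: utf8_length() and
utf8_expand() are hypothetical free functions over a plain std::string,
they assume the input is already valid UTF-8, and they do no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of code points in a valid UTF-8 string.
// Counts every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical helper: expand valid UTF-8 into one char32_t per code point.
// Assumption: the input is well-formed; malformed input is not detected.
std::vector<char32_t> utf8_expand(const std::string& s) {
    std::vector<char32_t> out;
    out.reserve(utf8_length(s));  // this is where length() would be used
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // continuation bytes that follow the lead byte
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

With these, s.size() still answers "how many bytes do I hand to a
char array, length interface", while utf8_length(s) answers "how many
char32_t elements do I need", and the expanded vector can be indexed by
code point in constant time.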