
Subject: Re: [boost] [UTF String] UTF String library 1.5 ready for perusal
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-02-12 15:27:58


On Sat, 12 Feb 2011 20:19:09 +0100
Matus Chochlik <chochlik_at_[hidden]> wrote:

>>> What do you use string.length() for? :-) Efficiently providing an
>>> answer to that is one of several things the UTF string classes keep
>>> track of it for.
>>
>> std::string::length specifies the amount of memory required to
>> represent it as encoded, and is useful if you intend to pass it to
>> something else as a char array, length pair.  Given that number of
>> code points is directly related to neither the memory required nor
>> the number of logical characters/glyphs/size it will take up to
>> display, it seems it is unlikely to be useful in many cases. [...]
>
> How about size() returning the required storage size for the string as
> in number of bytes and length() returning the number of code points?

Wouldn't that confuse any STL algorithm that uses the number of
elements? Anything that cares about the number of elements seems to use
size() to retrieve it, since length() is only provided by strings.

In any case, both measurements are already easy to get. T.length()
(or T.size()) gives the length in code points, i.e. the number of
elements the string would have as UTF-32. T.coded() exposes the
underlying encoded type, so T.coded().length() gives the storage size
of the encoded data, in code units (bytes, for UTF-8).
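
To make the difference between the two numbers concrete, here is a
minimal sketch using only std::string. It is not the library's
implementation; count_code_points is a hypothetical helper name, and
the code assumes the input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the UTF String library): counts the
// code points in a UTF-8 string by counting every byte that is NOT a
// continuation byte. Continuation bytes have the bit pattern 10xxxxxx.
// Assumes well-formed UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII: starts a new code point
        }
    }
    // A code point takes at least one byte, so the code-point count can
    // never exceed the encoded size.
    assert(n <= utf8.size());
    return n;
}
```

For the UTF-8 encoding of "naïve" (the bytes "na\xC3\xAFve"),
std::string::size() reports 6 bytes while count_code_points reports 5.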

> length() could be used for example when allocating an array of
> code-points (char32_t) where the string could be 'expanded' from
> UTF-8 for algorithms that require true random-access.

True, though the utf32_t type makes that unnecessary most of the time.
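
For anyone who does want to do the expansion by hand, a rough sketch
follows. It assumes well-formed UTF-8 and a compiler with char32_t
support; expand_to_utf32 is a hypothetical name, not something the
library provides, and it does no validation beyond a truncation check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the UTF String library): expands a
// UTF-8 byte string into one char32_t per code point, so the result can
// be indexed in constant time. Assumes well-formed UTF-8 input.
std::u32string expand_to_utf32(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        ++i;
        // Precondition check: the sequence must not be cut off.
        assert(i + extra <= utf8.size());
        for (; extra > 0; --extra, ++i) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

The size() of the returned string is the code-point count, which is the
number you would need up front to allocate such an array yourself.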

-- 
Chad Nelson
Oak Circle Software, Inc.


