
From: Kirit Sælensminde (kirit.saelensminde_at_[hidden])
Date: 2007-09-28 03:45:32


Jeremy Maitin-Shepard wrote:
> "James Porter" <porterj_at_[hidden]> writes:
>
>> I see what you mean. Still, fixed-width-encoded strings are a lot easier to
>> code, and I think we should focus on them first just to get something
>> working and to have a platform to test code conversion on, which in my
>> opinion is the most important part.
>
> I think as others have said, in practice a fixed-width encoding really
> gains you very little or nothing at all. Needing random access to code
> points is, I think, an extremely rare operation. Replacing one code
> point with another code point is also likewise a rare operation; in
> general you would replace one substring (perhaps a grapheme cluster)
> with another substring (which may also be a grapheme cluster).

In our implementation we store both the UTF-32 and the UTF-16 length
(we use UTF-16 internally) in the string object. For the vast majority of
strings these two lengths are the same. This optimisation tripled the speed
of our RSS feed generation, which does a lot of replacing because it needs
to HTML-encode <, > and &.
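A minimal sketch of the idea, assuming well-formed UTF-16 storage. The class name `counted_u16string` and its members are hypothetical and are not the poster's actual code. Caching the code-point count next to the code-unit count means that when the two are equal the string contains no surrogate pairs, so code point i is simply code unit i and the slow walk can be skipped.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper: UTF-16 storage with both lengths cached up front.
// Assumes the input is well-formed UTF-16 (no lone surrogates).
class counted_u16string {
public:
    explicit counted_u16string(std::u16string s)
        : data_(std::move(s)), code_points_(count_code_points(data_)) {}

    std::size_t utf16_length() const { return data_.size(); }  // code units
    std::size_t utf32_length() const { return code_points_; }  // code points

    // Equal lengths imply no surrogate pairs, i.e. effectively fixed width.
    bool fixed_width() const { return code_points_ == data_.size(); }

    char32_t code_point_at(std::size_t i) const {
        if (fixed_width()) return data_[i];  // fast path: O(1)
        // Slow path: walk from the start, stepping over surrogate pairs.
        std::size_t pos = 0;
        for (std::size_t n = 0; n < i; ++n)
            pos += is_high_surrogate(data_[pos]) ? 2 : 1;
        const char16_t hi = data_[pos];
        if (is_high_surrogate(hi)) {
            const char16_t lo = data_[pos + 1];
            // Combine the surrogate pair into one supplementary code point.
            return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
        }
        return hi;
    }

private:
    static bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

    static std::size_t count_code_points(const std::u16string& s) {
        std::size_t n = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (is_high_surrogate(s[i]) && i + 1 < s.size()) ++i;  // a pair is one code point
            ++n;
        }
        return n;
    }

    std::u16string data_;
    std::size_t code_points_;
};
```

The count is computed once at construction, so checking `fixed_width()` costs a single comparison per access.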

For long strings it is almost always a win to go through and calculate
the final length you will get after the substitutions so that you can do
a single pass with a single allocation to generate the target string.
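A sketch of the length-first approach applied to the HTML-encoding case above. The function names `escaped_length` and `html_escape` are illustrative assumptions. Working directly on UTF-16 code units is safe here because <, > and & are ASCII and can never occur inside a surrogate pair.

```cpp
#include <cstddef>
#include <string>

// Pass 1: compute the exact length of the escaped output.
std::size_t escaped_length(const std::u16string& in) {
    std::size_t len = 0;
    for (char16_t c : in) {
        switch (c) {
            case u'<': len += 4; break;  // "&lt;"
            case u'>': len += 4; break;  // "&gt;"
            case u'&': len += 5; break;  // "&amp;"
            default:   len += 1; break;
        }
    }
    return len;
}

// Pass 2: allocate once, then append without any further reallocation.
std::u16string html_escape(const std::u16string& in) {
    const std::size_t len = escaped_length(in);
    if (len == in.size()) return in;  // nothing needs replacing
    std::u16string out;
    out.reserve(len);  // the single allocation
    for (char16_t c : in) {
        switch (c) {
            case u'<': out += u"&lt;"; break;
            case u'>': out += u"&gt;"; break;
            case u'&': out += u"&amp;"; break;
            default:   out += c; break;
        }
    }
    return out;
}
```

The input is scanned twice, but the output buffer is allocated exactly once, which for long strings usually beats repeated growth during in-place replacement.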

K


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk