Subject: Re: [boost] [string] Yet another Unicode string class
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-02-15 05:55:49


On 15/02/2011 00:12, Howard Hinnant wrote:
> On Feb 14, 2011, at 5:55 PM, John Bytheway wrote:
>
>> On 14/02/11 19:08, Scott McMurray wrote:
>>> On Mon, Feb 14, 2011 at 01:53, Mathias Gaunard
>>> <mathias.gaunard_at_[hidden]> wrote:
>>>>
>>>> SBO makes moving a string costly.
>>>>
>>>
>>> How big a buffer does SBO usually use?
>>
>> The libc++ string uses 23 bytes (on a 64-bit architecture) by squishing
>> the length into a single byte for short strings.
>>
>> I don't see how that makes moves costly, though. The move constructor
>> for this string simply copies the bytes and zeroes out the source; it
>> doesn't even need to branch based on whether it's a long or short
>> string. Perhaps the concern is that such techniques are not defined
>> behaviour? Or that the use of unions might confuse the optimizer?
>
> It isn't so much the size of the buffer that matters, but rather the total number of words that need to be copied or otherwise manipulated (e.g. zeroed). The libc++ string is 3 words (on 32 and 64 bit) and the cost of the move is proportional to 3. If the libc++ string needed to move (for example) 6 words, then its move constructor would be twice as expensive, though still cheap compared to a copy of a long string.

I didn't know it was just 3 words; that's still very cheap.
Copying the representation with memcpy doesn't violate the strict aliasing rules, so that's legal.

It could be even cheaper if it could be copied using SIMD instructions,
but then that would put restrictions on the alignment of the string, which
one might not want.
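
To make the branch-free move concrete, here is a rough sketch of a three-word string whose move constructor just copies the bytes and zeroes the source. Everything in it (the name tiny_string, the tag byte, the layout) is my own assumption for illustration; it is not how libc++ actually lays out its string, and it leaves out capacity, null termination and copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical three-word string for illustration only; this is NOT the
// libc++ implementation, and the names and layout are assumptions.
class tiny_string {
    static constexpr std::size_t bytes = 3 * sizeof(void*);
    static constexpr std::size_t sso_cap = bytes - 1;  // 23 chars on a typical 64-bit target
    static constexpr unsigned char long_tag = 0xFF;    // marker kept in the last byte

    static_assert(sizeof(std::size_t) <= sizeof(void*),
                  "layout assumes size_t fits in one word");

    // Short: characters in raw_[0, sso_cap), length in the last byte.
    // Long:  heap pointer in word 0, size in word 1, long_tag in the last byte.
    alignas(void*) unsigned char raw_[bytes];

    bool is_long() const { return raw_[bytes - 1] == long_tag; }

    char* heap_ptr() const {
        char* p;
        std::memcpy(&p, raw_, sizeof p);  // memcpy avoids type-punning through a cast
        return p;
    }

public:
    tiny_string() { std::memset(raw_, 0, bytes); }  // all zero == empty short string

    explicit tiny_string(const char* s) {
        assert(s != nullptr);
        std::memset(raw_, 0, bytes);
        const std::size_t n = std::strlen(s);
        if (n <= sso_cap) {
            std::memcpy(raw_, s, n);
            raw_[bytes - 1] = static_cast<unsigned char>(n);
        } else {
            char* p = static_cast<char*>(std::malloc(n));
            if (!p) std::abort();
            std::memcpy(p, s, n);
            std::memcpy(raw_, &p, sizeof p);
            std::memcpy(raw_ + sizeof(void*), &n, sizeof n);
            raw_[bytes - 1] = long_tag;
        }
    }

    // The move: copy three words, zero three words, no long/short branch.
    tiny_string(tiny_string&& other) noexcept {
        std::memcpy(raw_, other.raw_, bytes);
        std::memset(other.raw_, 0, bytes);
    }

    tiny_string(const tiny_string&) = delete;
    tiny_string& operator=(const tiny_string&) = delete;
    tiny_string& operator=(tiny_string&&) = delete;

    ~tiny_string() {
        if (is_long()) std::free(heap_ptr());
    }

    std::size_t size() const {
        if (!is_long()) return raw_[bytes - 1];
        std::size_t n;
        std::memcpy(&n, raw_ + sizeof(void*), sizeof n);
        return n;
    }

    const char* data() const {
        return is_long() ? heap_ptr() : reinterpret_cast<const char*>(raw_);
    }
};
```

The zeroed source is a valid empty short string, so its destructor frees nothing and ownership of a heap buffer transfers cleanly. Requiring 16-byte alignment for SIMD copies would mean changing the alignas above, which is the restriction I mentioned.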

