Boost logo

Boost :

From: Peter Koch Larsen (peter.koch.larsen_at_[hidden])
Date: 2019-12-04 20:50:33


On Wed, Dec 4, 2019 at 7:10 PM Andrey Semashev via Boost
<boost_at_[hidden]> wrote:
>
> On 2019-12-04 19:05, Peter Dimov via Boost wrote:
> > Phil Endecott wrote:
> >> Peter Dimov <pdimov_at_[hidden]> wrote:
> >
> >> > Storing the size (as capacity - size) in the last char for N < 256
> >> will > have more impact, but I'm not sure that it too is worth the
> >> added > complexity.
> >>
> >> Why the last char, rather than always having the size (of whatever
> >> appropriate type) first? Is the idea that this makes data() and
> >> c_str() essentially no-ops?
> >
> > The idea here is that you win one byte by reusing the last byte of the
> > storage as the size, overlapping it with the null terminator in the
> > size() == N case (because capacity - size becomes 0).
>
> I'm not sure this would actually be beneficial in terms if performance.
> Ignoring the fact that size() becomes more expensive, and this is a
> relatively often used function, you also have to access the tail of the
> storage, which is likely on a different cache line than the beginning of
> the string. It is more likely that the user will want to process the
> string in the forward direction, possibly not until the end (think
> comparison operators, copy/assignment, for instance). If the string is
> not close to full capacity, you would only fetch the tail cache line to**
> get the string size.
>
> It is for this reason placing any auxiliary members like size is
> preferable before the storage array. Of course, if you prefer memory
> size over speed, placing size in the last byte is preferable.

There is another reason to place the size (or rather the free size) at
the end of the data: C-compatibility. I have a similar class named
fc_string where I for a fixed_string of N chars use one extra char for
the free space. This character steps in and becomes the
null-terminator in case the string is full, so I use no extra space in
any case. If (for char-based arrays) more than 255 bytes are used, I
store 255 in the end and the free size in the characters below the
last one.
This has the advantage that fixed_string<N> becomes a drop-in
replacement for char[N+1] anytime the string is not required to ne
normalised while still having an O(1) size.
I doubt that cacheing does matter (much) for performance. In practice,
fixed_strings are not large, and if they are the time to access the
individual characters will probably dominate anyway. If there are many
small strings, they will typically be stored in an array, and
accessing the size of a string will load the beginning of the next
string in cache.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk