Subject: Re: [boost] Heads up - string_ref landing
From: Maxim Yanchenko (maximyanchenko_at_[hidden])
Date: 2012-11-16 13:12:54


Nevin Liber <nevin <at> eviloverlord.com> writes:

> > Those are high-performance constructs. We can only pray that a compiler
> > will be smart enough to convert our iterator-based code to
> > memcpy/memcmp/memset, and from my experience compilers are not nearly as
> > smart if it's slightly beyond trivial cases.
> >
>
> Now we are getting somewhere. Actual experience. Could you elaborate on
> the compilers and constructs that need to be hand optimized into equivalent
> code because the optimizers aren't doing it themselves? Or are there
> better constructs with size that aren't equivalent to their pointer
> counterparts?

All memcpy/memcmp/memset functions require ptr+size to be passed.
So we either compute the size manually every time from the begin/end pointers (the
cost is negligible compared to the execution time of the mem* functions) or carry
it on board.

So for this particular set of use cases I believe it doesn't matter whether it's a
pair of pointers or a pointer and a size - the mem* functions will run an order of
magnitude longer anyway.
And here I'd prefer having 2 pointers, as it's conceptually cleaner: (again)
char_range is essentially just an iterator_range<char*>.
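
To make the comparison concrete, here is a minimal sketch of an equality check
built on memcmp for both layouts. The names ptr_pair_range, ptr_size_range and
equal are made up for illustration; they are not taken from any real library or
from the string_ref proposal. The only difference between the two versions is one
pointer subtraction per range before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical pointer-pair range: the size has to be computed on demand.
struct ptr_pair_range {
    const char* first;
    const char* last;
    std::size_t size() const {
        assert(first <= last);
        return static_cast<std::size_t>(last - first); // one pointer subtraction
    }
};

// Hypothetical pointer+size range: the size is carried on board.
struct ptr_size_range {
    const char* data;
    std::size_t len;
    std::size_t size() const { return len; }
};

// memcmp needs ptr+size, so the pointer pair pays for the subtractions first.
inline bool equal(ptr_pair_range a, ptr_pair_range b) {
    const std::size_t n = a.size();
    return n == b.size() && std::memcmp(a.first, b.first, n) == 0;
}

// The pointer+size layout passes its members straight through.
inline bool equal(ptr_size_range a, ptr_size_range b) {
    return a.len == b.len && std::memcmp(a.data, b.data, a.len) == 0;
}
```

Whether the extra subtraction is measurable next to the memcmp call itself is
exactly the open question here; the sketch only shows where it would occur.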

But there are other operations, e.g. sub_string (sub_range in our case) accepts
2 indices, and the second one can be anything up to std::string::npos (see the
std::string interface), meaning "to the end". So you need the size to calculate the
result and to avoid jumping beyond the range. As all operations here are very
simple, eliminating the computation of the pointer difference can give some extra
speed.
This might make some difference in parsers, where you have a big input string and
all lexemes and thousands of references from the AST to the corresponding text
ranges are just char_ranges (subranges) pointing into the big string.
These are all my speculations; I don't have performance figures for ptr+ptr vs.
ptr+size. I measured it one or two years ago in our project (we use ptr+ptr, and
I considered switching to ptr+size) and probably didn't notice any observable
difference, as I didn't switch in the end - I don't remember the details anymore.
And yes, we explicitly used the mem* functions, as we weren't satisfied with the
code GCC generated.
I hope LLVM people (authors of the original proposal) could share their
experience as well.

Thanks,
Maxim

