Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-07-21 09:14:50


Stewart, Robert wrote:
> Sebastian Redl wrote:
>> Phil Endecott wrote:
>> > Eric Niebler wrote:
>> >> I believe that as of C++03, std::string is required to have
>> >> contiguous storage. The null-termination is another thing.
>> >> The only guarantee is that the char* returned by c_str() is
>> >> required to be null terminated. No such guarantee is made
>> >> for the sequence traversed by std::string's iterators.
>> >> Dereferencing the end iterator is verboten.
>> >
>> > Here is an outline of a zero-overhead wrapper for std::string
>> > that I hope guarantees that *end() == 0:
>> >
>> > struct string_with_zero_beyond_end: std::string {
>> > typedef const char* const_iterator;
>> > const_iterator begin() const { return c_str(); }
>> > const_iterator end() const { return c_str() + length(); }
>> > };
>>
>> Not zero-overhead if the string implementation is one (of the
>> non-existent ones) that doesn't store the NUL internally.

Is c_str() allowed to be worse than O(1)?

> Worse: you'd be returning iterators to what may well be a
> temporary array of characters. The result of calling c_str()
> need not refer to the internal storage of the string.

People are taking my "code" too literally.

I am just trying to point out that a function for UTF-8 decoding can be
significantly more efficient if it exploits various characteristics of
common types, such as contiguous storage and null termination. I feel
that this is something worth doing.
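
To make the point concrete, here is a rough sketch of the kind of decoder I have in mind. It is illustration only: the names decode_one and decode_utf8 are made up for this message and are not part of any proposed interface. It assumes the input is a NUL-terminated, contiguous buffer with no embedded NULs, and it does not reject overlong encodings or surrogates. The point is that the terminating NUL is not a valid continuation byte, so the continuation test doubles as the end-of-input test and no separate comparison against end() is needed inside a multi-byte sequence.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes one code point from a NUL-terminated
// UTF-8 buffer and advances p past the bytes it consumed.
// Returns U+FFFD for a malformed or truncated sequence.
// Precondition: *p != 0.
inline std::uint32_t decode_one(const unsigned char*& p) {
    unsigned char b0 = *p++;
    if (b0 < 0x80) return b0;                 // single-byte (ASCII) fast path

    int extra;
    std::uint32_t cp;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return 0xFFFD;                       // stray continuation or invalid lead byte

    while (extra--) {
        // The terminating NUL fails this test, so a truncated sequence
        // stops here without p ever moving past the terminator.
        if ((*p & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (*p++ & 0x3F);
    }
    return cp;
}

// Hypothetical driver: relies on c_str() giving a NUL-terminated buffer.
// An embedded NUL in s would end decoding early (stated assumption).
inline std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(s.c_str());
    std::vector<std::uint32_t> out;
    while (*p) out.push_back(decode_one(p));
    return out;
}
```

A generic version working on arbitrary iterator pairs would have to compare against the end iterator before every byte it reads; the sketch above only tests for the terminator once per code point.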

Phil.

