Boost logo

Boost :

From: Hervé Brönnimann (hervebronnimann_at_[hidden])
Date: 2007-12-04 05:18:09


I'd like to go back to Chuck's original posting:

> Inspection of the code reveals the following is happening (both in
> 1.34 and HEAD):
> text_iarchive_impl<Archive>::load(std::string &s)
> {
> std::size_t size;
> * this->This() >> size;
> // skip separating space
> is.get();
> // borland de-allocator fixup
> #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101))
> if(NULL != s.data())
> #endif
> s.resize(size);
> is.read(const_cast<char *>(s.data()), size); // <== This looks
> suspect
> }

Inspection of the g++ string implementation (latest 4.1.x) shows that
resize() will *almost* always create a new rep and thus a new data()
for the string, if it is shared. The only exception is when the
requested size == s.size(). Chuck, was this the bug that hit you?

Seems to me, since you are going to rewrite your string anyway, you
might as well clear() it first, then resize(size). That'll take care
of creating a new representation (either clear, or resize if the
string was already empty). The problem with that is that it will
write over the string twice, first filling it with 0s then with the
read data. But think about it: if the string is shared, you need to
construct a new representation with the requested size anyway, and it
will have to be filled with 0s (or with a copy of what's already in
the string if you decide to resize to a different size instead of
clearing).

Still, suppose you wanted to avoid going over your string twice, what
you'd need is to create a string with an unitialized buffer and start
writing into it directly. There's only one way I can think of to
achieve that: have an input iterator from your "is" stream, such as
std::istream_iterator<char>(is), and use s.assign(is_begin, is_end),
assuming you can define an end iterator that will stop reading after
size chars (don't even know how to do that with istream_iterator, and
there is no assign(int, InputIterator) method in string, sadly). In
any case, however, you then lose the benefit of the single is.read()
call, because the iterator is going to do ++iter and read character
by character. If your stream is buffered, there shouldn't much of an
impact, but then you have a double copy from stream buffer to string
buffer anyway, which is akin to going over your string twice. What
would be nice would be to create a new buffer and read directly into
it, then transfer ownership to the string... Can't do that with
std::string though.

In short, I don't think you can avoid the cost of going over your
string twice. The problem with clear() in gcc 4.x's implementation
is that it creates a new rep with the same capacity as the old string
(), at least if I've read the code correctly. With a shared string
implementation, it is cheaper to assign the empty string, and that
works with any other implementation. In short, I've argued that you
can't do much better than:

text_iarchive_impl<Archive>::load(std::string &s)
{
     std::size_t size;
     * this->This() >> size;
     // skip separating space
     is.get();
     // borland de-allocator fixup
     #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101))
     if(NULL != s.data())
     #endif
+ s = std::string();
      s.resize(size);
     is.read(const_cast<char *>(s.data()), size); // <== This looks
suspect
}

Compared to Corrado's implementation, it's almost the same, but frees
up resources earlier. You may actually be able to reuse the same
block of memory for the resize() that you just freed.

HTH,

--
Hervé Brönnimann
hervebronnimann_at_[hidden]

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk