Boost Users :

From: Jerker Öhman (jerker.ohman_at_[hidden])
Date: 2007-01-08 14:33:33


We use iostreams::stream to read from a legacy file format. The code
that reads from the stream looks a little like this:

void ReadStuff(std::istream &stream,
                Stuff &stuff,
                OtherStuff &otherStuff)
{
    StreamDirectory streamDir;

    stream >> streamDir;
    stream.seekg(streamDir.GetOffset(STUFF_TAG));
    stream >> stuff;
    stream.seekg(streamDir.GetOffset(OTHER_STUFF_TAG));
    stream >> otherStuff;
}

I.e. the stream contains a directory that holds information about the
items following it and the offsets to them. The stream and the code are
also organized so that most calls to seekg target the current position
of the stream. In other words, seekg doesn’t have to do anything.

If the stream is buffered, the code runs into serious performance
problems. After a few calls, seekg ends up in this function:

template<typename T, typename Tr, typename Alloc, typename Mode>
typename indirect_streambuf<T, Tr, Alloc, Mode>::pos_type
indirect_streambuf<T, Tr, Alloc, Mode>::seek_impl
     (stream_offset off, BOOST_IOS::seekdir way, BOOST_IOS::openmode which)
{
     if (pptr() != 0)
         this->BOOST_IOSTREAMS_PUBSYNC(); // sync() confuses VisualAge 6.
     if (way == BOOST_IOS::cur && gptr())
         off -= static_cast<off_type>(egptr() - gptr());
     setg(0, 0, 0);
     setp(0, 0);
     return obj().seek(off, way, which, next_);
}

As far as I can see, it just dumps the internal buffer and passes the
call on. After a few more calls we end up in the stream source’s seek
function. This wouldn’t be so bad if it weren’t for the fact that the
stream source’s offset isn’t the same as the stream’s offset, since the
stream is buffered.

Perhaps a little example would clarify this. Assume that we read 4 bytes
from a stream with a 10k buffer. The stream will then fill its buffer
from the underlying source, which means that after the read the stream
will have a full 10k buffer and an offset into the buffer of 4. The
underlying stream source will have an offset pointer that points to
10k+1 bytes into the underlying stream. If we now call seekg to
position the file at offset 4 (which is already the current position),
the stream throws away its buffer and we end up in the stream source,
whose file position is 10k+1, so it also throws away its internal
buffers and seeks back in the underlying stream to offset 4. In this
way, what should have been a no-op turns into something very
time-consuming.
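
The scenario above can be reproduced with a small standalone sketch that
uses only the standard library. Everything in it is hypothetical and is
not Boost code: DemoBuf is a toy read-only streambuf over an in-memory
string, the 32k payload and the readSeekRead helper are made up for the
illustration, and the reuseBuffer flag only shows one possible way a
buffered streambuf could treat a seek that lands inside its buffer. With
reuseBuffer set to false, seekpos discards the buffer on every seek,
roughly as the quoted seek_impl does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical read-only streambuf over an in-memory "source" (not Boost code).
class DemoBuf : public std::streambuf {
public:
    DemoBuf(const std::string& data, std::size_t bufSize, bool reuseBuffer)
        : data_(data), buf_(bufSize), reuse_(reuseBuffer) {
        assert(bufSize > 0);
    }
    std::size_t sourcePos() const { return srcPos_; }  // offset of the source
    int refills() const { return refills_; }           // number of buffer fills

protected:
    // Fill the whole buffer from the source, advancing the source offset.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (srcPos_ >= data_.size()) return traits_type::eof();
        const std::size_t n = std::min(buf_.size(), data_.size() - srcPos_);
        std::copy_n(data_.data() + srcPos_, n, buf_.data());
        srcPos_ += n;
        ++refills_;
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

    // Absolute seek, called by istream::seekg(pos).
    pos_type seekpos(pos_type pos, std::ios_base::openmode) override {
        const off_type off = pos;
        if (off < 0 || static_cast<std::size_t>(off) > data_.size())
            return pos_type(off_type(-1));
        const std::size_t target = static_cast<std::size_t>(off);
        if (reuse_ && eback() != nullptr) {
            // Source offset that corresponds to the start of the buffer.
            const std::size_t bufStart =
                srcPos_ - static_cast<std::size_t>(egptr() - eback());
            if (target >= bufStart && target <= srcPos_) {
                // Target is already buffered: only move the get pointer.
                setg(eback(), eback() + (target - bufStart), egptr());
                return pos;
            }
        }
        // Otherwise drop the buffer and reposition the source.
        setg(nullptr, nullptr, nullptr);
        srcPos_ = target;
        return pos;
    }

private:
    std::string data_;
    std::vector<char> buf_;
    bool reuse_;
    std::size_t srcPos_ = 0;
    int refills_ = 0;
};

struct SeekResult {
    std::size_t sourcePosAfterFirstRead;
    int refills;
};

// Read 4 bytes, seek to offset 4 (the current position), read 4 more.
SeekResult readSeekRead(bool reuseBuffer) {
    const std::string data(32 * 1024, 'x');
    DemoBuf buf(data, 10 * 1024, reuseBuffer);
    std::istream in(&buf);
    char bytes[4];
    in.read(bytes, 4);
    const std::size_t sourcePos = buf.sourcePos();
    in.seekg(4);
    in.read(bytes, 4);
    return {sourcePos, buf.refills()};
}
```

In this toy setup the source offset is 10240 after the first 4-byte
read, although the stream position is only 4. The redundant seekg then
costs a second full buffer fill when the buffer is discarded, and no
extra fill when the buffered range is reused. Whether a similar check
would fit into indirect_streambuf is a question for the library
maintainers. The sketch only illustrates where the cost comes from.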

Previously we used a class that inherited directly from
std::basic_streambuf and contained horrible code that no one really
understood, so switching to boost::iostreams was a blessing.
Unfortunately the boost::iostreams implementation is 10 times slower
when it is buffered and 50% slower when it is unbuffered. When reading
files that are a couple of GB in size, that really matters.

/Jerker


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net