Boost Users :

From: Marcus Alanen (maalanen_at_[hidden])
Date: 2005-09-18 04:41:17


Hello, I've recently used iostreams for easier gzipping and bzip2ing of
output streams. However, I'd also like to be able to read any file
compressed with either gzip or bzip2, by analyzing the input stream and
deducing which decompressor to use.

I don't want to open the file twice or to seek in the stream, as this
might not be possible due to e.g. reading from the network. My solution
is to create a custom streambuf, which is told to read and retain the
first few bytes to determine which decompressor to push onto the
boost::iostreams::filtering_streambuf. Then the actual reading can take
place, and the custom streambuf first returns the retained bytes, then
just streams the rest of the actual input streambuf.
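
The detection step on its own can be sketched without any streambuf. This
is only an illustration: the name detect_compression and the returned
labels are made up here and are not part of Boost; the magic bytes are the
usual gzip (0x1f 0x8b) and bzip2 ("BZh") signatures.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost function): classify a buffer by its
// leading magic bytes. Returns "GZIP", "BZIP2", or "" when neither
// signature matches or too few bytes are available to decide.
std::string detect_compression(const unsigned char* buf, std::size_t len) {
   if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
      return "GZIP";
   if (len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h')
      return "BZIP2";
   return "";
}
```

Checking the length first matters for very short inputs, where fewer bytes
than a full signature may have been read before end of file.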

As far as I can tell, this works nicely. I wonder if anybody else has
better solutions for this? Perhaps there is some capability of iostreams
that I've overlooked?

Below is the code, feel free to use as you wish. I don't know if I need
to override xsgetc(), underflow() and uflow(), so they are currently
just non-working stubs. So far nothing seems to have triggered any of
them: only xsgetn seems to be used. I haven't found any actual errors,
though. Both gzip and bzip2 files seem to decompress.
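
Which virtual gets called depends on how the client reads. The following
is a small probe using only the standard library, as a sketch: the class
probe_streambuf and its counters are hypothetical names introduced for
this illustration. It has no get area, so block reads and
single-character reads take visibly different paths.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical probe: an unbuffered streambuf over a string that counts
// how often each virtual is invoked. It never sets a get area, so every
// single-character read has to go through underflow() or uflow().
class probe_streambuf : public std::streambuf {
   std::string m_data;
   std::size_t m_pos;
public:
   int xsgetn_calls, underflow_calls, uflow_calls;
   explicit probe_streambuf(const std::string& d)
      : m_data(d), m_pos(0),
        xsgetn_calls(0), underflow_calls(0), uflow_calls(0) {}
protected:
   std::streamsize xsgetn(char* s, std::streamsize n) {
      ++xsgetn_calls;
      std::streamsize cnt = 0;
      while (cnt < n && m_pos < m_data.size()) s[cnt++] = m_data[m_pos++];
      return cnt;
   }
   int_type underflow() {   // peek: return current char, do not advance
      ++underflow_calls;
      if (m_pos >= m_data.size()) return traits_type::eof();
      return traits_type::to_int_type(m_data[m_pos]);
   }
   int_type uflow() {       // read: return current char and advance
      ++uflow_calls;
      if (m_pos >= m_data.size()) return traits_type::eof();
      return traits_type::to_int_type(m_data[m_pos++]);
   }
};
```

With such a probe, sgetn() ends up in xsgetn(), sgetc() in underflow() and
sbumpc() in uflow(). A client that only ever reads in blocks would
therefore never reach the other two, which is consistent with the
behaviour described above; anything that reads character by character
(for example std::istream::get) would.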

Otherwise the code can be improved a lot, e.g. by registering separate
compression detectors instead of hard-coding them in the streambuf. I'm
also a bit unsure where and when to use streambufs instead of streams,
e.g. for m_input.
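
One possible shape for registering detectors separately, as a rough
sketch only: compression_detector, detector_registry and
default_detectors are hypothetical names, and the sketch only maps magic
prefixes to names. A complete version would presumably also carry a
callback per detector that pushes the right filter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical detector record: a name plus the magic prefix to match.
struct compression_detector {
   std::string name;
   std::string magic;
};

// Hypothetical registry: detectors are tried in registration order.
class detector_registry {
   std::vector<compression_detector> m_detectors;
public:
   void add(const std::string& name, const std::string& magic) {
      compression_detector d;
      d.name = name;
      d.magic = magic;
      m_detectors.push_back(d);
   }
   // Longest registered magic: how many bytes the streambuf must retain.
   std::size_t max_magic_size() const {
      std::size_t n = 0;
      for (std::size_t i = 0; i < m_detectors.size(); ++i)
         if (m_detectors[i].magic.size() > n) n = m_detectors[i].magic.size();
      return n;
   }
   // Name of the first detector whose magic prefixes head, or "".
   std::string detect(const std::string& head) const {
      for (std::size_t i = 0; i < m_detectors.size(); ++i) {
         const std::string& m = m_detectors[i].magic;
         if (head.size() >= m.size() && head.compare(0, m.size(), m) == 0)
            return m_detectors[i].name;
      }
      return "";
   }
};

// Example setup with the two formats discussed in this post.
inline detector_registry default_detectors() {
   std::string gzip_magic;
   gzip_magic += static_cast<char>(0x1f);
   gzip_magic += static_cast<char>(0x8b);
   detector_registry r;
   r.add("GZIP", gzip_magic);
   r.add("BZIP2", "BZh");
   return r;
}
```

The streambuf would then retain max_magic_size() bytes instead of a fixed
BUFFER_SIZE and ask the registry which format, if any, they match.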

Apologies if the code is too long for your mailing list; I didn't find
any guidelines for this.

Regards,
Marcus

#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <iostream>
#include <streambuf>
#include <string>

#include <boost/noncopyable.hpp>

class general_decompressor_streambuf
   : public std::basic_streambuf<char, std::char_traits<char> >,
     public boost::noncopyable {
private:
   std::streambuf& m_input;
   static const int BUFFER_SIZE = 5;
   unsigned char m_read[BUFFER_SIZE];  // retained header bytes
   std::streamsize m_readlen;          // how many bytes were actually retained
   std::streamsize m_readpos;          // next retained byte to hand out
   std::string m_compression_type;
public:
   general_decompressor_streambuf(std::streambuf& i)
     : m_input(i), m_readlen(0), m_readpos(0) {
   }
   ~general_decompressor_streambuf() throw () {
   }
   std::string get_compression_type() const {
     return m_compression_type;
   }
   // Read and retain up to BUFFER_SIZE bytes, then push the matching
   // decompressor (if any) onto sb. Short inputs retain fewer bytes.
   void resolve_compressor
   (boost::iostreams::filtering_streambuf<boost::iostreams::input>& sb) {
     while (m_readlen < BUFFER_SIZE) {
       int c = m_input.sbumpc();
       if (c == traits_type::eof()) break;
       m_read[m_readlen++] = static_cast<unsigned char>(c);
     }

     // Only compare bytes that were actually read.
     if (m_readlen >= 2 && m_read[0] == 037 && m_read[1] == 0213) {
       m_compression_type = "GZIP";
       sb.push( boost::iostreams::gzip_decompressor() );
     } else if (m_readlen >= 3 &&
                m_read[0] == 'B' && m_read[1] == 'Z' && m_read[2] == 'h') {
       m_compression_type = "BZIP2";
       sb.push( boost::iostreams::bzip2_decompressor() );
     }
     // Otherwise: unknown format, the data is passed through unchanged.
   }

   std::streamsize xsgetn(char* s, std::streamsize n) {
     std::streamsize cnt = 0;
     // Hand out the retained bytes first, but never more than n of them.
     while (m_readpos < m_readlen && cnt < n) {
       *s++ = static_cast<char>(m_read[m_readpos++]);
       ++cnt;
     }
     if (cnt == n) return cnt;
     // Then pass the rest of the request through to the real input.
     return cnt + m_input.sgetn(s, n - cnt);
   }
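   // Stub. Note that std::streambuf has no virtual named xsgetc, so
   // this is a plain member function and the library never calls it.
   int xsgetc() {
     std::cerr << "xsgetc" << std::endl;
     return m_input.sgetc();
   }

   // Stub: ignores the retained bytes. Only reached if a client reads
   // single characters instead of going through sgetn().
   int underflow ( ) {
     std::cerr << "underflow" << std::endl;
     return m_input.sgetc();
   }

   // Stub: ignores the retained bytes and does not advance the input.
   int uflow ( ) {
     std::cerr << "uflow" << std::endl;
     return m_input.sgetc();
   }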
};

int main() {
   general_decompressor_streambuf
      buffering_in_streambuf(*std::cin.rdbuf());

   boost::iostreams::filtering_streambuf<boost::iostreams::input> cmpr;
   buffering_in_streambuf.resolve_compressor(cmpr);
   cmpr.push(buffering_in_streambuf);

   std::istream i(&cmpr);
   boost::iostreams::copy(i, std::cout);
   return 0;
}