[iostreams] experience with automatically decompressing with gzip or bzip2

Hello, I've recently used iostreams for easier gzipping and bzip2ing of output streams. However, I'd also like to be able to read any file compressed with either gzip or bzip2, by analyzing the input stream and deducing which decompressor to use. I don't want to open the file twice or to seek in the stream, as this might not be possible when e.g. reading from the network.

My solution is to create a custom streambuf which is told to read and retain the first few bytes, in order to determine which decompressor to push onto the boost::iostreams::filtering_streambuf. Then the actual reading can take place: the custom streambuf first returns the retained bytes, then just streams the rest of the underlying input streambuf. As far as I can tell, this works nicely. I wonder if anybody else has better solutions for this? Perhaps there is some capability of iostreams that I've overlooked?

Below is the code; feel free to use it as you wish. I don't know whether I need to override xsgetc(), underflow() and uflow(), so they are currently just non-working stubs. So far nothing seems to have triggered any of them: only xsgetn() seems to be used. I haven't found any actual errors, though; both gzip and bzip2 files seem to decompress. Otherwise the code can be improved a lot, e.g. by registering separate compression detectors instead of hard-coding them in the streambuf. I'm also a bit lost about where and when to use streambufs instead of streams, e.g. for m_input. Apologies if the code is too long for your mailing list; I didn't find any guidelines for this.
Regards, Marcus

#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <iostream>
#include <boost/noncopyable.hpp>

class general_decompressor_streambuf
    : public std::basic_streambuf<char, std::char_traits<char> >,
      public boost::noncopyable
{
private:
    std::streambuf& m_input;
    static const int BUFFER_SIZE = 5;
    unsigned char m_read[BUFFER_SIZE];
    std::streamsize m_readpos;
    std::string m_compression_type;

public:
    general_decompressor_streambuf(std::streambuf& i)
        : m_input(i), m_readpos(0)
    { }

    ~general_decompressor_streambuf() throw () { }

    std::string get_compression_type() const { return m_compression_type; }

    void resolve_compressor(
        boost::iostreams::filtering_streambuf<boost::iostreams::input>& sb)
    {
        int pos = 0;
        while (pos < BUFFER_SIZE) {
            int c = m_input.sbumpc();
            if (c == EOF)
                return;
            m_read[pos++] = c;
        }
        // gzip magic: 0x1f 0x8b (octal 037 0213); bzip2 streams open with "BZh"
        if (m_read[0] == 037 && m_read[1] == 0213) {
            m_compression_type = "GZIP";
            sb.push(boost::iostreams::gzip_decompressor());
        } else if (m_read[0] == 'B' && m_read[1] == 'Z' && m_read[2] == 'h') {
            m_compression_type = "BZIP2";
            sb.push(boost::iostreams::bzip2_decompressor());
        }
    }

    std::streamsize xsgetn(char* s, std::streamsize n)
    {
        std::streamsize cnt = 0;
        // Replay the retained magic bytes first. (Fixed: the loop tests
        // cnt < n; the originally posted code tested n > 0 but never
        // decremented n.)
        while (m_readpos < BUFFER_SIZE && cnt < n)
            s[cnt++] = m_read[m_readpos++];
        if (cnt == n)
            return cnt;
        return cnt + m_input.sgetn(s + cnt, n - cnt);
    }

    // Note: std::streambuf has no virtual xsgetc(), so this member is
    // never called. underflow()/uflow() below are still just logging stubs.
    int xsgetc()
    {
        std::cerr << "xsgetc" << std::endl;
        return m_input.sgetc();
    }

    int underflow()
    {
        std::cerr << "underflow" << std::endl;
        return m_input.sgetc();
    }

    int uflow()
    {
        std::cerr << "uflow" << std::endl;
        return m_input.sgetc();
    }
};

int main()
{
    general_decompressor_streambuf buffering_in_streambuf(*std::cin.rdbuf());
    boost::iostreams::filtering_streambuf<boost::iostreams::input> cmpr;
    buffering_in_streambuf.resolve_compressor(cmpr);
    cmpr.push(buffering_in_streambuf);
    std::istream i(&cmpr);
    boost::iostreams::copy(i, std::cout);
    return 0;
}
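[Editor's note: for readers without the Boost headers at hand, the detection step by itself reduces to a magic-number check. A minimal standard-C++ sketch of the same logic as resolve_compressor() above (the function name is invented here):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Classify a buffer of leading bytes by its magic number.
// gzip streams begin with 0x1f 0x8b; bzip2 streams begin with "BZh".
// Anything else falls through as "UNKNOWN".
std::string detect_compression(const unsigned char* buf, std::size_t len)
{
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return "GZIP";
    if (len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h')
        return "BZIP2";
    return "UNKNOWN";
}
```

Only two bytes are needed for gzip and three for bzip2, so the five-byte BUFFER_SIZE in the post is more than detection strictly requires.]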

Marcus Alanen wrote:
Hello, I've recently used iostreams for easier gzipping and bzip2ing of output streams. However, I'd also like to be able to read any file compressed with either gzip or bzip2, by analyzing the input stream and deducing which decompressor to use.
I don't want to open the file twice or to seek in the stream, as this might not be possible due to e.g. reading from the network. My solution is to create a custom streambuf, which is told to read and retain the first few bytes to determine which decompressor to push onto the boost::iostreams::filtering_streambuf. Then the actual reading can take place, and the custom streambuf first returns the retained bytes, then just streams the rest of the actual input streambuf.
This is a nice idea. IMO, the best way to implement it would be as a filter -- perhaps you could call it a stream_signature_filter. You might use it as follows:

    stream_signature_filter f;
    f.push("GZIP", gzip_decompressor());
    f.push("BZh", bzip2_decompressor());
    filtering_istreambuf in(f);
    in.push(file("archive.tar.gz"));

You could then define filters derived from stream_signature_filter that have preset mappings from signatures to filters. I'll add this to my list of ideas for 1.34. Thanks!
Regards, Marcus
Jonathan
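[Editor's note: stream_signature_filter was only a proposal at this point, so the snippet above is hypothetical API, not shipped Boost code. Its core would be a first-match lookup over registered signatures. A standard-C++ sketch, with filter names standing in for actual filter objects (all names invented here):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature table: (signature, filter name) pairs searched
// in registration order; the first signature that is a prefix of the
// stream's leading bytes wins.
class signature_table {
public:
    void push(std::string signature, std::string filter_name) {
        entries_.push_back(std::make_pair(std::move(signature),
                                          std::move(filter_name)));
    }
    // Return the filter registered for the first matching signature,
    // or "" if no signature matches the sniffed head of the stream.
    std::string resolve(const std::string& head) const {
        for (const auto& e : entries_)
            if (head.compare(0, e.first.size(), e.first) == 0)
                return e.second;
        return "";
    }
private:
    std::vector<std::pair<std::string, std::string> > entries_;
};
```

Registration order matters: more specific signatures should be pushed before any catch-all entry.]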

Jonathan Turkanis wrote:
This is a nice idea. IMO, the best way to implement it would be as a filter -- perhaps you could call it a stream_signature_filter. You might use it as follows:
    stream_signature_filter f;
    f.push("GZIP", gzip_decompressor());
    f.push("BZh", bzip2_decompressor());
    filtering_istreambuf in(f);
    in.push(file("archive.tar.gz"));
You could then define filters derived from stream_signature_filter that have preset mappings from signatures to filters.
This sounds better, but was beyond my Boost knowledge. (For the mailing-list archive: I noticed my xsgetn() didn't decrement the n variable correctly.) Then again, perhaps the stream_signature_filter should just try each decompressor in turn, and whichever does not throw an exception should be allowed to continue. Please allow it to stream through unknown compression schemes, especially uncompressed files :-)
I'll add this to my list of ideas for 1.34.
Excellent, thank you! Marcus
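[Editor's note: the xsgetn() bug Marcus mentions can be shown in isolation. Below is a standard-C++ sketch of a prefix-replaying streambuf, with the loop condition corrected to test the count delivered so far (cnt < n) rather than n > 0, and with working underflow()/uflow() in place of the stubs. The class name is invented here; it is not part of the posted code or of Boost:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// A streambuf that first replays a retained prefix (e.g. sniffed magic
// bytes), then forwards all further reads to the underlying streambuf.
class prefix_replay_buf : public std::streambuf {
public:
    prefix_replay_buf(std::streambuf& src, std::string prefix)
        : src_(src), prefix_(std::move(prefix)), pos_(0) {}
protected:
    std::streamsize xsgetn(char* s, std::streamsize n) override {
        std::streamsize cnt = 0;
        // Fixed loop: test cnt < n (the original tested n > 0 but
        // never decremented n).
        while (pos_ < prefix_.size() && cnt < n)
            s[cnt++] = prefix_[pos_++];
        if (cnt < n)
            cnt += src_.sgetn(s + cnt, n - cnt);
        return cnt;
    }
    int_type underflow() override {   // peek without consuming
        if (pos_ < prefix_.size())
            return traits_type::to_int_type(prefix_[pos_]);
        return src_.sgetc();
    }
    int_type uflow() override {       // read and consume one character
        if (pos_ < prefix_.size())
            return traits_type::to_int_type(prefix_[pos_++]);
        return src_.sbumpc();
    }
private:
    std::streambuf& src_;
    std::string prefix_;
    std::string::size_type pos_;
};
```

With underflow() and uflow() implemented this way, single-character reads also see the retained prefix, which the logging stubs in the original post did not guarantee.]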

Marcus Alanen wrote:
Jonathan Turkanis wrote:
This is a nice idea. IMO, the best way to implement it would be as a filter -- perhaps you could call it a stream_signature_filter. You might use it as follows:
    stream_signature_filter f;
    f.push("GZIP", gzip_decompressor());
    f.push("BZh", bzip2_decompressor());
    filtering_istreambuf in(f);
    in.push(file("archive.tar.gz"));
You could then define filters derived from stream_signature_filter that have preset mappings from signatures to filters.
Then again, perhaps the stream_signature_filter should just try out each decompressor in turn, and whichever does not throw an exception should be allowed to continue.
This doesn't generalize well to non-compression filters. Many filters can handle any stream of data without throwing an exception, even if it's not what the user expects.
Please allow it to stream through unknown compression schemes, especially uncompressed files :-)
The way I'd handle this would be to allow signatures to contain wildcard characters, which is necessary anyway for some file formats. Then you could write:

    stream_signature_filter f;
    f.push("GZIP", gzip_decompressor());
    f.push("BZh", bzip2_decompressor());
    f.push("?", identity_filter());  // Wildcard
    filtering_istreambuf in(f);
    in.push(file("archive.tar.gz"));

This reminds me: despite the algebraic flavor of some of the existing components (null_source, inverse, ...), I never implemented an identity filter.
I'll add this to my list of ideas for 1.34.
Excellent, thank you!
Marcus
Jonathan
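[Editor's note: the wildcard idea can be sketched without any Boost machinery. A minimal matcher, assuming that '?' matches any single byte, so the catch-all entry "?" matches every non-empty stream. The wildcard character and its semantics are from Jonathan's proposal above, not an existing Boost API:

```cpp
#include <cassert>
#include <string>

// Return true if the pattern matches the leading bytes of the stream.
// '?' in the pattern matches any single byte; every other byte must
// match exactly.
bool signature_matches(const std::string& pattern, const std::string& head)
{
    if (head.size() < pattern.size())
        return false;  // not enough bytes sniffed to decide
    for (std::string::size_type i = 0; i < pattern.size(); ++i)
        if (pattern[i] != '?' && pattern[i] != head[i])
            return false;
    return true;
}
```

Checking registered patterns in order with this function and falling back to the "?" entry gives exactly the pass-through behavior Marcus asked for on uncompressed input.]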
participants (2)
- Jonathan Turkanis
- Marcus Alanen