Subject: [Boost-users] [iostreams] Possible bug in gzip_decompressor?
From: Gavin Band (gavinband_at_[hidden])
Date: 2012-05-17 09:57:24


Dear list,

I have been using boost.iostreams to read gzipped files for some time, but
recently have run into some problems with files that are compressed using
tools other than gzip (namely bgzip from
http://samtools.svn.sourceforge.net/viewvc/samtools/trunk/samtools/).

Here is the file:

$ hexdump -C ../hello.txt.bgz
00000000 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 |............BC..|
00000010 35 00 f3 48 cd c9 c9 d7 51 28 c9 c8 2c 56 00 a2 |5..H....Q(..,V..|
00000020 44 85 92 d4 e2 12 85 b4 cc 9c 54 3d 2e 00 86 1e |D.........T=....|
00000030 ef a4 1c 00 00 00 1f 8b 08 04 00 00 00 00 00 ff |................|
00000040 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 |..BC............|
00000050 00 00 |..|
00000052

I think this is valid gzip format (based on reading
http://www.gzip.org/zlib/rfc-gzip.html). It differs from what you get by
compressing the file using the gzip program in two respects: 1. it sets the
'extra' flag (FLG.FEXTRA) to add extra data to each block header, and 2. it
has two blocks (gzip "members"), the second of which is empty.

My file reading code is as follows:

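/* test.cpp */
#include <iostream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/file.hpp>

int main( int argc, char** argv ) {
    if( argc < 2 ) {
        std::cerr << "Usage: " << argv[0] << " <file.gz>\n" ;
        return 1 ;
    }
    // Filter chain: gzip_decompressor reading from the file named on the command line.
    boost::iostreams::filtering_istream stream ;
    stream.push( boost::iostreams::gzip_decompressor() ) ;
    boost::iostreams::file_source file( argv[1] ) ;
    stream.push( file ) ;
    // Write the decompressed contents to standard output.
    boost::iostreams::copy( stream, std::cout ) ;
}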

I am using Boost 1.48.0 on Ubuntu Linux. The compile command is:

$ g++ -o test -g test.cpp -L /home/gav/Projects/Software/usr/lib \
    -lboost_iostreams-gcc46-mt-1_48 -I /home/gav/Projects/Software/boost_1_48_0

Using gunzip on the above file works fine:
$ gunzip -c ../hello.txt.bgz
Hello, this is a test file.

Using the test program does not:
$ ./test ../hello.txt.bgz
terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::iostreams::gzip_error>
>'
  what(): gzip error

I was able to fix this problem with two code changes in the boost.iostreams
implementation, as follows.

*Change #1*: at libs/iostreams/src/gzip.cpp:65: change "state_ = s_extra;"
to "state_ = s_xlen;"
Rationale: without this change, it looks like state s_xlen is never entered
so the length of the extra data is not parsed. *Update:* I notice this
change was already raised in ticket #5908, and has already been fixed in
trunk.

*Change #2*: at libs/iostreams/src/zlib.cpp:153: change "crc_imp_ = 0;" to
"crc_ = crc_imp_ = 0;"
Rationale: without this change, an empty block of data does not
re-initialise the member variable crc_ of zlib_base. This causes an
exception on line 447 of boost/iostreams/filter/gzip.hpp.

I confirmed this behaviour a second way, by concatenating two files
compressed using gzip, the first nonempty and the second empty. This
creates a file which looks like this:

$ hexdump -C hello2.txt.gz
00000000 1f 8b 08 00 0d ea b4 4f 00 03 cb 48 cd c9 c9 57 |.......O...H...W|
00000010 28 c9 48 2d 4a e5 02 00 8e 45 d1 59 0c 00 00 00 |(.H-J....E.Y....|
00000020 1f 8b 08 00 18 ea b4 4f 00 03 03 00 00 00 00 00 |.......O........|
00000030 00 00 00 00 |....|
00000034

This file again decompresses with gunzip but not with the program above.
Change #2 above fixes this.

Could someone take a look at this and let me know if this change is really
appropriate? (Or perhaps I'm doing something wrong.) I can create a bug
report if desired.

Many thanks,
Gavin Band.


