<span style="border-collapse:collapse;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px">Dear list,<div><br></div><div>I have been using boost.iostreams to read gzipped files for some time, but have recently run into problems with files compressed by tools other than gzip (namely bgzip from <a href="http://samtools.svn.sourceforge.net/viewvc/samtools/trunk/samtools/" style="color:rgb(17,85,204)" target="_blank">http://samtools.svn.sourceforge.net/viewvc/samtools/trunk/samtools/</a>).</div> <div><br></div><div>Here is a hexdump of the file:</div><div><br></div><div><div><font face="'courier new', monospace">$ hexdump -C ../hello.txt.bgz</font></div><div><font face="'courier new', monospace">00000000 &nbsp;1f 8b 08 04 00 00 00 00 &nbsp;00 ff 06 00 42 43 02 00 &nbsp;|............BC..|</font></div> <div><font face="'courier new', monospace">00000010 &nbsp;35 00 f3 48 cd c9 c9 d7 &nbsp;51 28 c9 c8 2c 56 00 a2 &nbsp;|5..H....Q(..,V..|</font></div><div><font face="'courier new', monospace">00000020 &nbsp;44 85 92 d4 e2 12 85 b4 &nbsp;cc 9c 54 3d 2e 00 86 1e &nbsp;|D.........T=....|</font></div> <div><font face="'courier new', monospace">00000030 &nbsp;ef a4 1c 00 00 00 1f 8b &nbsp;08 04 00 00 00 00 00 ff &nbsp;|................|</font></div><div><font face="'courier new', monospace">00000040 &nbsp;06 00 42 43 02 00 1b 00 &nbsp;03 00 00 00 00 00 00 00 &nbsp;|..BC............|</font></div> <div><font face="'courier new', monospace">00000050 &nbsp;00 00 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |..|</font></div><div><font face="'courier new', monospace">00000052</font></div></div><div><br></div> <div>I think this is valid gzip format (based on reading <a href="http://www.gzip.org/zlib/rfc-gzip.html" style="color:rgb(17,85,204)" target="_blank">http://www.gzip.org/zlib/rfc-gzip.html</a>). It differs from what you get by compressing the file with the gzip program in two respects: 1. it uses the 'extra' flag (FLG.FEXTRA) to add extra data to each block header, and 2. 
it has two blocks, the second of which is empty.</div> <div><br></div><div>My file reading code is as follows:</div><div><br></div><div><div><font face="'courier new', monospace">/* test.cpp */</font></div><div><span style="font-family:'courier new',monospace">#include &lt;iostream&gt;</span></div> <div><span style="font-family:'courier new',monospace">#include &lt;boost/iostreams/filtering_stream.hpp&gt;</span></div><div><font face="'courier new', monospace">#include &lt;boost/iostreams/filter/gzip.hpp&gt;</font></div> <div><font face="'courier new', monospace">#include &lt;boost/iostreams/copy.hpp&gt;</font></div><div><font face="'courier new', monospace">#include &lt;boost/iostreams/device/file.hpp&gt;</font></div><div> <font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">int main( int, char** argv ) {</font></div><div><span style="font-family:'courier new',monospace"><span style="white-space:pre-wrap"> </span>boost::iostreams::filtering_istream stream ;</span></div> <div><font face="'courier new', monospace"><span style="white-space:pre-wrap"> </span>stream.push( boost::iostreams::gzip_decompressor() ) ;</font></div><div><font face="'courier new', monospace"><span style="white-space:pre-wrap"> </span>boost::iostreams::file_source file( argv[1] ) ;</font></div> <div><span style="font-family:'courier new',monospace"><span style="white-space:pre-wrap"> </span>stream.push( file ) ;</span></div><div><font face="'courier new', monospace"><span style="white-space:pre-wrap"> </span>boost::iostreams::copy( stream, std::cout ) ;</font></div> <div><span style="font-family:'courier new',monospace">}</span></div></div><div><br></div><div><br></div><div>I am using boost 1.48.0 on Ubuntu Linux. 
The compile command is:</div><div><br></div><div>$ g++ -o test -g test.cpp -L /home/gav/Projects/Software/usr/lib -lboost_iostreams-gcc46-mt-1_48 -I /home/gav/Projects/Software/boost_1_48_0</div> <div><br></div><div>Using gunzip on the above file works fine:</div><div>$ gunzip -c ../hello.txt.bgz</div><div>Hello, this is a test file.</div><div><br></div><div>Using the test program does not:</div><div><div>$ ./test ../hello.txt.bgz</div> </div><div><div>terminate called after throwing an instance of 'boost::exception_detail::clone_impl&lt;boost::exception_detail::error_info_injector&lt;boost::iostreams::gzip_error&gt; &gt;'</div><div>&nbsp; what(): &nbsp;gzip error</div> </div><div><br></div><div>I was able to fix this problem with two code changes in the boost.iostreams implementation, as follows.</div><div><br></div><div><br></div><div><b>Change #1</b>: at libs/iostreams/src/gzip.cpp:65: change "state_ = s_extra;" to "state_ = s_xlen;"</div> </span><span style="border-collapse:collapse;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px">Rationale: without this change, it looks like state s_xlen is never entered, so the length of the extra data is never parsed. </span><span style="border-collapse:collapse;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px"><b>Update:</b> I notice this change was already raised in ticket #5908, and has already been fixed in trunk.</span><span style="border-collapse:collapse;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px"><div> <br></div><div><b><br></b></div><div><b>Change #2</b>: at libs/iostreams/src/zlib.cpp:153: change "crc_imp_ = 0;" to "crc_ = crc_imp_ = 0;"</div> </span><span style="border-collapse:collapse;color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px">Rationale: without this change, an empty block of data does not re-initialise the member variable crc_ of zlib_base. 
This causes an exception on line 447 of boost/iostreams/filter/gzip.hpp.</span><span style="border-collapse:collapse;color:rgb(34,34,34);font-size:13px"><div style="font-family:arial,sans-serif"> <br></div><div style="font-family:arial,sans-serif">I confirmed this behaviour a second way, by concatenating two files compressed using gzip, the first nonempty and the second empty. This creates a file which looks like this:</div> <div style="font-family:arial,sans-serif"><br></div><div style="font-family:arial,sans-serif"><div> <div style="font-family:'courier new',monospace"><div>$ hexdump -C hello2.txt.gz</div><div>00000000 &nbsp;1f 8b 08 00 0d ea b4 4f &nbsp;00 03 cb 48 cd c9 c9 57 &nbsp;|.......O...H...W|</div><div>00000010 &nbsp;28 c9 48 2d 4a e5 02 00 &nbsp;8e 45 d1 59 0c 00 00 00 &nbsp;|(.H-J....E.Y....|</div> <div>00000020 &nbsp;1f 8b 08 00 18 ea b4 4f &nbsp;00 03 03 00 00 00 00 00 &nbsp;|.......O........|</div><div>00000030 &nbsp;00 00 00 00 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |....|</div><div>00000034</div></div><div style="font-family:'courier new',monospace"> <br></div><div><font face="arial, helvetica, sans-serif">This file again decompresses with gunzip but not with the program above. Change #2 above fixes this.</font></div></div></div><div><font class="Apple-style-span" face="arial, helvetica, sans-serif"><br> </font></div><div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif"> <font face="arial, helvetica, sans-serif">Could someone take a look at this and let me know if this change is really appropriate? (Or perhaps I'm doing something wrong.) 
I can create a bug report if desired.</font></div> <div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif"> <span style="font-family:arial,helvetica,sans-serif">Many thanks,</span></div><div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif">Gavin Band.</font></div> <div style="font-family:arial,sans-serif"><font face="arial, helvetica, sans-serif"><br></font></div></span>