Subject: [boost] IOStreams zlib_decompressor does not check crc for empty string
From: John Wallace (jrw32_at_[hidden])
Date: 2015-06-05 09:44:52
When decompressing an empty file, the zlib_decompressor does not check the
CRC, which can cause an exception to be raised on a correctly formatted
gzip file, particularly one created by concatenating compressed blocks.
This comes up often when working with bgzip files (see the specification at
http://bioinformatics.oxfordjournals.org/content/27/5/718.full; all bgzip
files end in a special empty block).
I have attached a demonstration file (zlib_test.cpp) that illustrates the
problem. The program simply reads compressed input line by line from
stdin and writes it to stdout (nearly equivalent to zcat).
Assuming we compile to a.out, the following Bash commands illustrate the
issue:
$ cat <(echo "foo" | gzip -c) <(echo -n "" | gzip) <(echo "bar" | gzip) |
./a.out
foo
Whereas, we expect the following:
$ cat <(echo "foo" | gzip -c) <(echo -n "" | gzip) <(echo "bar" | gzip) |
zcat
foo
bar
I have attached a patch to zlib.cpp (based on version 1.51, but I confirmed
that the problem persists in 1.58) which I believe should resolve the issue
by simply setting the CRC to 0 when an empty string is decompressed.
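As a sanity check on why 0 is the right value, here is a minimal sketch of the CRC-32 used in gzip trailers (RFC 1952). It is not the attached patch and does not touch Boost; the names crc32_of and check_empty_crc are hypothetical helpers made up for this illustration. With no input bytes the initial value and the final XOR cancel out, so the CRC of empty data is 0, which is what the trailer of an empty gzip member should contain.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 as used by gzip (RFC 1952): reflected polynomial
// 0xEDB88320, initial value 0xFFFFFFFF, final XOR with 0xFFFFFFFF.
// crc32_of is a hypothetical helper, not part of Boost or zlib.
std::uint32_t crc32_of(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// With zero input bytes the loop never runs, so the result is
// 0xFFFFFFFF ^ 0xFFFFFFFF == 0, the value the patch sets for empty data.
void check_empty_crc() {
    assert(crc32_of("") == 0u);
}
```

So a decompressor that has produced no output for a member should compare the trailer against 0, rather than against whatever value was left over from earlier processing.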
Thanks for your time,
John Wallace