Boost Users :

From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-11-07 14:44:52


Tiago de Paula Peixoto wrote:
> On 10/31/2005 04:04 PM, Jonathan Turkanis wrote:
>>>> In the above example, the filter is automatically closed at the
>>>> end of main; this causes the gzip footer to be written. But since
>>>> no data was ever compressed, the gzip header has never been
>>>> written.
>>>>
>>>> I guess this is a bug of some sort. What behavior would you
>>>> expect in this case? It seems to me it would make the most sense
>>>> to output data in the gzip format representing a 0-length file.
>>>>
>>>
>>> That would also make sense to me, but it would be inconsistent with
>>> the bzip2_compressor behavior, which doesn't write any footer if
>>> there was no header.
>>
>>
>> I can't really change the behavior of bzip2, since it's just a
>> wrapper around libbz2, whereas with gzip I implemented the header
>> and footers myself. I wouldn't worry too much about consistency,
>> since this is a corner case.
>
> Well, either way is fine for me personally, as long as the resulting
> file is a valid gzip/bzip2 file (which isn't the case with gzip in
> 1.33.0). Although, strictly speaking, a zero-length file is neither a
> gzip nor a bzip2 file, most people will be able to cope with it
> nevertheless. So I don't feel strongly about it either way.
>
> But people still may expect (as I did) that switching between
> gzip_compressor and bzip2_compressor would maintain this same
> invariant. So I would prefer having both write nothing to the
> stream in this case, rather than having them behave differently
> (since bzip2 can't be changed easily).

Since I can't easily produce similar behavior for all the compression filters,
maybe I should specify in the docs that the output of the compression filters is
well-defined only if some data is written.
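
For reference, the "gzip data representing a 0-length file" mentioned above is
tiny. The sketch below spells out the bytes as I understand the layout from
RFC 1952 (gzip) and RFC 1951 (deflate). The function name empty_gzip_stream is
made up for illustration, the MTIME, XFL and OS values are my own choices, and
this is not what gzip_compressor currently emits.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns a complete gzip stream with an empty payload.
std::vector<unsigned char> empty_gzip_stream() {
    std::vector<unsigned char> out = {
        0x1f, 0x8b,              // gzip magic number
        0x08,                    // CM: deflate
        0x00,                    // FLG: no optional header fields
        0x00, 0x00, 0x00, 0x00,  // MTIME: 0 (no timestamp; assumption)
        0x00,                    // XFL: no extra flags (assumption)
        0xff                     // OS: unknown (assumption)
    };
    // Empty deflate stream: a single final fixed-Huffman block that
    // contains only the end-of-block symbol.
    out.push_back(0x03);
    out.push_back(0x00);
    // Trailer: CRC32 of zero bytes is 0, ISIZE is 0 (both little-endian).
    for (int i = 0; i < 8; ++i) out.push_back(0x00);
    assert(out.size() == 20);  // 10 header + 2 deflate + 8 trailer
    return out;
}
```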

> Would you find it too ugly/wrong to modify gzip_compressor to delay
> the writing of the header until some data would be sent?

It's easy to do (when rephrased ;-) ), but I'm not sure it makes that much
sense. If you're just compressing and decompressing, it's easy to treat the case
of an empty file specially. But if you have a long chain of filters with a
compressor or decompressor in the middle, things could get messy.
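
To make the "delay the header" idea concrete, here is a minimal sketch using
only the standard library. LazyFramedWriter is a hypothetical toy, not the
Boost.Iostreams gzip_compressor: it writes to a string instead of a downstream
device, and the header and footer are opaque strings. It only illustrates the
invariant under discussion: no data in, nothing out, and a footer only when a
header was written.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical framing writer: header + payload + footer, or nothing at all.
class LazyFramedWriter {
public:
    LazyFramedWriter(std::string header, std::string footer)
        : header_(std::move(header)), footer_(std::move(footer)) {}

    // The header is emitted just before the first non-empty chunk.
    void write(const std::string& data) {
        assert(!closed_ && "write after close");
        if (data.empty()) return;
        if (!header_written_) {
            out_ += header_;
            header_written_ = true;
        }
        out_ += data;
    }

    // The footer is emitted only if a header was, so the two always pair up.
    void close() {
        if (!closed_ && header_written_) out_ += footer_;
        closed_ = true;
    }

    const std::string& output() const { return out_; }

private:
    std::string header_;
    std::string footer_;
    std::string out_;
    bool header_written_ = false;
    bool closed_ = false;
};
```

A writer that is closed without any write() leaves output() empty, which is
the bzip2-like behavior asked for above.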

>>> And also it would create an impossibility of just visiting a file in
>>> append mode, without writing any data to it.
>>
>>
>> I don't follow. What do you want to be able to do?
>
> Well, suppose a program keeps a log file which is gzipped. Every time
> the program runs, and opens the log file in append mode, some data
> gets written to the file, even if the program exits without logging
> any information, which would make the file grow continuously, albeit
> slowly. Of course, the obvious workaround would be to delay the
> opening of the logfile until there's some data to be written. But
> that may be less convenient and/or intuitive.

This sounds difficult to implement, since when you open the log for appending
you have to find a way to restore the compressor to the state it was in when it
finished compressing the existing data. The only way I know how to do this would
be to decompress the data, then compress it again.

>>> This could be fixed if
>>> gzip_compressor were seekable. Is this possible to be implemented?
>>
>>
>> The only way I can see to implement this would be to buffer all i/o
>> and only compress or decompress it when the stream is closed. This
>> could be implemented as an adapter that would work with almost any
>> filter, so I wouldn't want to build it into gzip. I'll put this on
>> my list of possibilities for 1.34.
>
> So the entire uncompressed file would be in memory? Doesn't the
> gzip/bzip2 interface provide a more efficient alternative? Not even to
> seek only forward?

The zlib API docs are here: http://www.gzip.org/zlib/manual.html. If you can see
a way to do this, I'll definitely consider it.
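
For what it's worth, the buffering adapter I described earlier (buffer all
i/o, run the filter only when the stream is closed) could look roughly like
the sketch below. BufferUntilClose and its Transform callback are hypothetical
names, the transform is just a stand-in for a real compressor, and yes, the
whole uncompressed buffer lives in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical adapter: seekable in-memory buffer, transformed once on close.
class BufferUntilClose {
public:
    using Transform = std::function<std::string(const std::string&)>;

    explicit BufferUntilClose(Transform transform)
        : transform_(std::move(transform)) {}

    // Overwrite or extend the buffer at the current position.
    void write(const std::string& data) {
        assert(!closed_ && "write after close");
        if (data.empty()) return;
        if (pos_ + data.size() > buffer_.size())
            buffer_.resize(pos_ + data.size());  // any gap is zero-filled
        buffer_.replace(pos_, data.size(), data);
        pos_ += data.size();
    }

    // Absolute seek within (or past the end of) the buffered data.
    void seek(std::size_t pos) { pos_ = pos; }

    // Run the transform over the whole buffer exactly once.
    const std::string& close() {
        if (!closed_) {
            result_ = transform_(buffer_);
            closed_ = true;
        }
        return result_;
    }

private:
    Transform transform_;
    std::string buffer_;
    std::string result_;
    std::size_t pos_ = 0;
    bool closed_ = false;
};
```

Because nothing is compressed until close(), seeking backward and overwriting
costs nothing extra, at the price of holding everything in memory.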

-- 
Jonathan Turkanis
www.kangaroologic.com
