Subject: Re: [boost] GIL io_new review
From: Lubomir Bourdev (lbourdev_at_[hidden])
Date: 2010-12-12 18:42:37
On Dec 11, 2010, at 7:18 AM, Domagoj Saric wrote:
>> On the issue of using classes vs free functions, I don't think there is a
>> huge difference either way. The the new_io code borrows the style from
>> GIL, which uses free functions as algorithms (if you call I/O
>> GIL in turn borrows this style from the STL. Using functions one can
>> functionality with overloading (but there are ways to do this with classes
>> and it is nicer to be able to read/write a file with one line, without
>> having to
>> first construct a reader/writer and then call it. But these are minor
> I understand you had limited time to go through all the, now really numerous
> posts, but I think I've shown (in the pre-review discussion) that these are
> not minor quibbles...The free function interface has issues both in terms of
> performance and in terms flexibility/power of access to the backend that it
> gives...So far shown attempts to fix those where all shifts towards public
> classes/objects representing the image to be read/written (and/or further
> increase in the number of parameters)...
But free functions are just like methods except they have an extra hidden parameter pointing to the class instance.
I think the discussion here is not free functions vs classes but whether or not to maintain a state. Please note that both you and Christian use classes to maintain the state; Christian simply wraps the classes into free functions for simple things like read and write an entire image. But to do something more complicated, such as read the image in parts, one can simply build that functionality using the classes directly. So not much design changes are necessary. Adding new functionality may be needed but less, if any, changes of the existing functionality.
>> On the issue of performance, I don't think one test with one file format
>> 30% slower proves that there are serious performance issues. My guess is
>> Christian has been focusing on adding functionality and never got around
>> work on performance improvements, which usually means that we can get a
>> lot of speedup by doing some simple optimizations in the inner loops.
> Actually there were two file formats (JPEG and PNG) and the speed difference
> was ~37.5% (the size difference was ~30%)...In fact simply looking
> at/stepping through the implementation of io and io_new (which I've
> elaborated in more detail previously) is enough to 'see' the inefficiency
> w/o testing but I did the test anyway...
First of all, I don't believe your tests represent real use scenarios. If you loop over the same image, the image gets into the OS cache and therefore the I/O access bottleneck is no longer there. It is not typically the case that the image is in the OS cache before you start reading it. A better test case is to have many images and do a single loop over them. So I cannot trust the 30% number that you report. My prediction is that if you do it for fresh images the differences will be far smaller than 30%.
The second point is that if the code has never been profiled, there are usually lots of easy ways to speed it up without major changes.
> I simply cannot follow the logic of making a priori slow software (by
> ignoring obvious inefficiencies) and then making it fast later (unless of
> course for the later selling of some 'clever' corporate marketing fog which
> obviously is not the case here)...
> There would be no need for serious performance improvements if it weren't
> for the, unfortunately so prevalent, 'habit'/'school of coding' that simply
> by default uses STL containers/constructs for just about anything without
> even blinking an eye...
> And the injudicious use of STL containers coupled with the inherent template
> bloat are the major reasons why io_new (and the current io) are slow (not
> counting the issues inherent in the interface/design)...
These are some strong statements here, deserving a separate thread that I am sure many Boosters will have a lot to say about. Here are my two cents:
1. Using STL, or templates in general does not always lead to template bloat. It simply makes it easier to generate bloated code, sometimes inadvertently, sometimes on purpose.
2. Template bloat does not always mean that the code will run slowly. In fact, it often means the code will run faster because certain choices are done at compile time instead of at run time.
3. STL is a very well designed library. That said, there are some STL implementations that are really very bad, so you have to be careful which one you use. In addition, it is very easy to shoot yourself in the foot with the STL.
> In this context I actually do not blame Christian almost at all for this but
> the 'climate' that fosters this "Way"...
> It is no wonder that 'we live in the age' of several-megabyte-single-dialog
> software if 'every' library author thinks it OK to add a few tens or even a
> couple of hundreds of unnecessary kilobytes of code because it supposedly
> will not be noticeable because of disk access or some similar 'by-the-book'
> excuse (that is rarely actually tested)...
> ps. Ideally, in most cases there should be no 'inner-loop' in GIL.IO (other
> than those within the backend library itself)...
>> On the issue of using/not using streams, I don't have a strong opinion,
>> except that stream support for I/O was one of the highest requested
>> functionalities in GIL's review. Domagoj, you should have been more
>> vocal at that time. Christian simply responds to the wishlist of the
>> Boost community. In general, I feel that performance might be a bit
>> overhyped in recent discussions. The image I/O operations are often
>> dominated by disk access and by compression, etc. I don't think
>> avoiding a copy of the pixels would make much of a difference in the
>> majority of uses. That said, we do want to make sure GIL I/O is as fast
>> as it can be.
> There are assumptions in this which you must/should test before you start
> believing in them...I have already done numerous tests and analysis (not
> only for GIL.IO) that prove those assumptions wrong...
Again, I believe the tests you have reported must be redone by reading 1000 DIFFERENT images, not the same image.
> That iostreams are evil is a simple _fact_ demonstrated countless times by
> numerous people...And it seems there is no longer any point in arguing this
> fact as it is simply ignored by people that seem to have some sort of a
> priori trust in the STL (and you can't rationally argue with 'blind faith')
> as if it is some 'holy artefact' of the C++ language when in fact it is
> sometimes its stumbling block...
I never said anything in support or against streams. Please note that streams are not part of the STL.
Here is a handy reference for the STL: http://www.sgi.com/tech/stl
> With that said, I have no problem with providing an iostream interface (as
> an additional option, that has no influence on other code) only with
> providing _only_ an iostream interface (which AFAICT is currently the case
> for in memory images...luckily not for all of io_new)...
And this is what Christian has done. If your input is not a stream, the code doesn't use streams; it operates straight to FILE*. But if you start with a stream, then it uses a stream. Did I understand the code correctly, Christian? If so, does this address your objection then, Domagoj?
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk