Subject: Re: [boost] GIL io_new review
From: Christian Henning (chhenning_at_[hidden])
Date: 2010-12-11 11:55:12
Hi Lubomir, thanks for reviewing.
> Christian has been (and still is) very passionate to make the next generation I/O. He started working on this at least four years ago and have been improving and polishing the code ever since. The problem with I/O is that you can never declare success. There are an unlimited number of file formats, back-ends, ways to read and write the image, image representations, compression settings, etc. For each of these one needs to deal with completeness, performance, test cases, failure modes, platform dependencies... All we can hope for is push the boundary as much as we can, and leave the rest for the next update.
Has it really been four years? ;-)
> The design is fine, but I would like the requirements for adding new file formats / backends to be made more explicit. Specifically we want a concept for a file format reader/writer/etc and adding new file formats would constitute providing models for these concepts.
I have to admit I'm a bit weak with concepts. I like the general idea
but never really done anything in this regard. I hope the boost
community can help out.
> The documentation is adequate, but requires cleanup and fixing of lots of typos.
On my list. I just have to find a lost "native English" soul that has
some time at hand. But I'll try my best myself.
> So my recommendation is "conditional accept" after the above identified must-have functionality has been added, specifically:
> - Support for reading images whose file format is unknown at compile time. That involves the ability to register file formats / backends, associate them with given extensions and then call the appropriate reader based on the extension of the filename.
On my list.
> - Support for different backends. It appears that this can be done simply the same way a file format can be extended, i.e. have "png_libpng" and "png_lodepng" tags. It is not immediately obvious to me why this would be a bad design choice; if someone disagrees they should elaborate.
I would like to understand Domagoj design first to make a better decision.
> - Support for efficient reading/writing of parts of images that are too large to fit in memory. There are two ways to proceed: one is to have an input/output iterator that enforces sequential access, and another is to allow for user-driven access at the cost of sometimes severe performance. I'd like to hear the opinion of people who would use this more advanced options.
Me too, I like to hear what other people think. I tend more towards
the iterators. It's easier to use and to test for it.
> My preference is that Christian joins forces with Domagoj to collaborate on these and take the best of their libraries to gil.io. Looks like Domagoj also spent a lot of time and effort on this problem and has gained a lot of valuable intuition. I don't want to see his efforts go to waste either.
We already have discussed some mutual involvement. Just waiting for
the review to finish.
> - Improve the language in the documentation section.
Not surprised here. ;-)
> - Make sure the test images are license-free. If the license status is unknown, change them to other license-free images.
I have been very careful in selecting the images. All authors have
been informed and they either given their consent or didn't bother
> - Do some profiling and performance optimization of the most common formats. Add some test cases to measure speed of common scenarios.
On my list.
> - Make sure the test cases provide complete coverage. For example, set a breakpoint in every method, every switch statement and branch. Run your tests and remove any breakpoints that are encountered. Look at the breakpoints that were not hit and add tests for them.
cool technique. Will keep it in mind.
> Finally, here is an idea that I like, but may be too much work to do in the current release:
> I would like to have a GIL virtual image view associated with a given file. That would allow reading/writing to simply be copy_pixels, and reading into a different format to be copy_and_convert_pixels. Reading a given ROI could simply be done by creating a subimage_view. The cool thing about this is one could in theory make changes to images
> without even loading them in memory. The caviat is that doing this right and efficiently is very hard, requires clever caching, plus a wrong access pattern can result in horribly bad performance.
Thanks again for your time.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk