Subject: Re: [boost] GIL io_new review
From: Domagoj Saric (domagoj.saric_at_[hidden])
Date: 2010-12-09 07:09:33


"Phil Endecott" <spam_from_boost_dev_at_[hidden]> wrote in message
news:1291823931996_at_dmwebmail.dmwebmail.chezphil.org...

> I'm doing this on Linux. Wikipedia tells me WIC is "Windows Imaging
> Component".

Yes, that's why I asked whether you could test this on Windows (the LibTIFF
wrapper does not support 2D ROI access yet)...

>> Example code demonstrating a possible (skeleton) solution:
>> http://codepad.org/WD7CpIJ8 ...
>
> I don't really follow what that code is doing,

Hmm...if I understood your initial example/use case correctly, you have a
huge TIFF composed of 5k-x-5k tiles that you need to load, edit and then
rechop into 256x256 PNG files...
So the example code assumes 24 bit RGB input (for the sake of an example)
and:
 - allocates the input_tile_holder ( 5000x5000x24bit ~ 75 MB)
 - creates a WIC reader object for the large input TIFF
 - allocates the output_tile_holder (256x256x24bit)
 - reads individual input tiles in a loop (but incorrectly, as it reads only
diagonal tiles)
 - writes output files within the above loop...
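
To make the intended loop structure concrete (and to show how the
diagonal-only mistake is avoided), here is a minimal sketch of the ROI
enumeration alone. It is not the codepad example and does not use io2 or WIC;
the names roi and tile_rois are hypothetical helpers invented for this
illustration. Iterating x and y independently is what makes the loop visit
every tile rather than only the diagonal ones:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a rectangular region of interest, in pixels.
struct roi
{
    std::size_t x, y;          // top-left corner
    std::size_t width, height; // extent (smaller at the right/bottom edges)
};

// Enumerate the ROIs that cover a width x height area with tiles of at most
// tile_size x tile_size pixels, row by row. Both axes get their own loop, so
// off-diagonal tiles are visited as well.
inline std::vector<roi> tile_rois( std::size_t const width, std::size_t const height, std::size_t const tile_size )
{
    assert( tile_size > 0 );
    std::vector<roi> rois;
    for ( std::size_t y = 0; y < height; y += tile_size )
        for ( std::size_t x = 0; x < width; x += tile_size )
            rois.push_back( roi{ x, y,
                                 std::min( tile_size, width  - x ),
                                 std::min( tile_size, height - y ) } );
    return rois;
}
```

The same function can describe both levels of the problem: called with the
full image size and 5000 it yields the input tiles to read, and called with
5000, 5000 and 256 it yields the output tiles to cut from one input tile.
Since 5000 is not a multiple of 256, the last column and row of output tiles
come out narrower (136 pixels) and would need handling when writing the PNGs.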

Perhaps the syntax for specifying ROI-based access is confusing (e.g.
"offset_view()" takes the source view and creates a ROI target from it, which
is not readily obvious from the name of the utility function...I'll have to
rename it)...

> and it's not obvious to me what its memory footprint will be.

Well, with io2, it all depends on the backend...if the backend is 'smart
enough' to read in only the required parts of an image (and WIC, at least as
per its documentation, should be), the footprint should be obvious: when the
target is a raw in-memory image, as in your case, rather than a virtual
image, io2 does not allocate any memory (nor does it perform any
additional/redundant data copying...in general it strives for zero overhead,
both for CPU and RAM)...so the only allocated memory is that allocated by
the user (in this case, input_tile_holder and output_tile_holder) and by
the backend...
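
For reference, the user-allocated part of that footprint can be worked out
directly from the sizes given earlier. The sketch below is plain arithmetic,
assuming tightly packed 24 bit RGB with no row padding; buffer_bytes and
user_allocated_bytes are hypothetical names, and whatever the backend
allocates internally is not covered:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a width x height buffer of tightly packed pixels
// (assumes no row padding/alignment).
inline std::size_t buffer_bytes( std::size_t const width, std::size_t const height, std::size_t const bits_per_pixel )
{
    assert( bits_per_pixel % 8 == 0 ); // whole bytes per pixel only
    return width * height * ( bits_per_pixel / 8 );
}

// Memory allocated by the user in the example: the two tile holders.
inline std::size_t user_allocated_bytes()
{
    std::size_t const input_tile_holder  = buffer_bytes( 5000, 5000, 24 ); // 75,000,000 bytes
    std::size_t const output_tile_holder = buffer_bytes(  256,  256, 24 ); //    196,608 bytes
    return input_tile_holder + output_tile_holder;
}
```

That comes to a little over 75 MB in total, dominated entirely by the input
tile holder.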

> In contrast, I think I can write something that's not much longer and more
> obviously correct by using the sort of simple wrappers around the
> libraries that I have been proposing and explicitly managing the tiled
> input and output:
>
...snipped code...
>
> Anyone can look at that and see that it keeps 1400 input files open (bad)
> and needs 700 kwords of buffer memory (good). If you wanted to have fewer
> files open you could code explicitly something else that had one input
> file open (good) and buffered complete tiles (probably 2 x 5000 x 700,000
> = 7 GBytes) (bad).

This example seems different from the one you gave in the first post (or I
misunderstood both)...Now you seem to have an image that is (1400*5000)
pixels wide and 1300000 pixels tall and that is not actually a single file
but 5k-x-5k tiles pre-separated into individual files...and it is missing the
'editing' logic and the saving to 256x256 PNGs...As I don't see what it is
actually trying to do with the input data, I cannot know whether you actually
need to load entire rows of tiles (the 1400 files), but doesn't such an
approach defeat the purpose of tiles in the first place?
As a side note, I can only say that the shared_ptrs are overkill...in
fact, if the number 1400 is really fixed/known at compile-time, the
individual ReadTiff heap allocations are unnecessary...
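
A minimal sketch of what is meant by that, assuming the reader type can be
default-constructed and opened afterwards. The snipped code is not shown
here, so tile_reader below is a hypothetical stand-in rather than the actual
ReadTiff class, and tile_row and open_row are likewise invented for the
illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a TIFF reader wrapper; a real one would hold a
// file handle instead of just remembering which tile it refers to.
struct tile_reader
{
    std::size_t index   = 0;
    bool        is_open = false;

    void open( std::size_t const tile_index ) { index = tile_index; is_open = true; }
};

// Assumed to be known at compile time, as discussed above.
constexpr std::size_t tiles_per_row = 1400;

// All readers of one row of tiles live in a single fixed-size array, so there
// is no per-reader heap allocation and no reference counting.
struct tile_row
{
    std::array<tile_reader, tiles_per_row> readers;
};

// Open every reader in the row; returns the number of readers now open.
inline std::size_t open_row( tile_row & row )
{
    std::size_t opened = 0;
    for ( std::size_t i = 0; i < row.readers.size(); ++i )
    {
        row.readers[ i ].open( i );
        if ( row.readers[ i ].is_open )
            ++opened;
    }
    return opened;
}
```

If the reader type cannot be default-constructed, a single std::vector with
reserve() and emplace_back() would still replace the 1400 separate
allocations with one.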

>> ps. unfortunately I do not have access to such a huge image to test
>> whether WIC can actually handle such monster images...
>
> If you're interested in experimenting, I suggest making random input tiles
> or just replicating them.

Luckily I found this
http://www.unearthedoutdoors.net/global_data/true_marble/download :)

If you want, I can take the largest TIFF there, chop it up into 256x256
PNGs and measure the RAM and CPU time usage (or do some other test if you
can define it clearly)...

>> If we've already waited so long, why rush if some of us agree that there
>> are still things that need 'polishing'...
>
> To get the ball moving on GIL extensions, and because this is better than
> what GIL currently has. Past experience suggests that the existence of
> this io extension will not prevent other similar and incompatible things
> (i.e. yours) from being accepted in the future.

Can I ask which librar(y/ies) your past experience refers to, as to
me the practice seems quite the opposite (e.g. even after years of
complaints and different proposals Boost.Function still has not changed, or
the fact that we have two signals and two regex libraries)...

>> You said you already did some LibXXX wrappers of your own...if this is
>> so...why not join the effort even if only temporary to make sure you get
>> what you want...Christian also seems open for cooperation...
>
> My wrappers have all been written to implement some subset of the
> functionality that I needed at the time. For example I have never
> implemented any of the error handling stuff, and in some cases I have only
> read or write but not both.

That's what I meant, you could look at the proposed code and, if it does not
suit you, propose a change/patch...

-- 
"What Huxley teaches is that in the age of advanced technology, spiritual
devastation is more likely to come from an enemy with a smiling face than
from one whose countenance exudes suspicion and hate."
Neil Postman 
