Boost logo

Boost :

Subject: Re: [boost] GIL io_new review
From: Domagoj Saric (dsaritz_at_[hidden])
Date: 2010-12-11 06:36:47


"Phil Endecott" <spam_from_boost_dev_at_[hidden]> wrote in message
news:1291912190039_at_dmwebmail.dmwebmail.chezphil.org...
>>> and it's not obvious to me what its memory footprint will be.
>>
>> Well, with io2, it all depends on the backend...if the backend is 'smart
>> enough' to read in only the required parts of an image (and WIC
>> theoretically/as per documentation should be) the footprint should be
>> obvious
>
> Right, it's not obvious to me.

Why not, if it can be seen that the code reads an image with a sequentially
moving ROI and writing each read chunk/ROI to a file and then
'forgetting'/freeing it then it should be obvious that the maximum amount
RAM used will be the size of the ROI...unless of course the backend used is
not quite 'smart' (more on 'trusting the backend' below)...

>> This example seems different from the one you gave in the first post (or
>> I misunderstood both)...Now you seem to have an image that is (1400*5000)
>> pixels wide and 1300000 pixels tall and that is not actually a single
>> file but the 5k-x-5k tiles preseparated into individual files
>
> Right.

OK, I somehow missed on first reading that you actually read the input image
row by row (what I call a '1D ROI' in io2), i.e. you do not use tiles/real
2D ROIs... io2 can already do that with all backends including LibTIFF (only
GDI+ and WIC backends support 2D ROIs currently) so when I catch the time
I'll finally clean it up/fix it to compile on *NIX+Clang/GCC so you can
test-drive it if you want...however I have to temporarily relax my
involvement as I'm in the middle of a major release on my daily job + I'm
about to become a dad in the coming days, probably even before this review
ends so please bear with me ;-)

>> ...As I don't see what it is actually trying to do with the input data I
>> cannot know whether you actually need to load entire rows of tiles (the
>> 1400 files) but doesn't such an approach defeat the purpose of tiles in
>> the first place?
>
> No. Hmm, I thought this code was fairly obvious but maybe I'm making
> assumptions.
>
> There is a balance between
> - Buffer memory size
> - Number of open files
> - Code complexity
>
> Options include:
<...snipped...>

Well I must still say I do not understand why must you read entire rows of
input tiles (1400*5000 in length), why can't you just read individual 5k
input tiles and chop each of those into the smaller output PNGs and then
move on...yes the 5000%256!=0 issue might require reading up to 4 input
tiles and/or remembering some of the edge data for the cross-border input
tiles and/or maybe of the input tile which in turn would require a (much)
more complex code...but I do not really see the issue with that (if
performance is your prime concern here)...it's just a real world problem
that you handle the best as you can...a std::vector maybe much simpler than
a std::set but if (performant) sorted insertion and lookup is what you need
then there will be no doubt on which to use, right? Complex problems
sometimes simply give complex (or slow) solutions...

> There is also the issue of concurrency to think about. The code that I've
> posted can be made to work reasonably efficiently on a shared memory
> multiprocessor system by parallelising the for loops in the read/write_row
> methods. It's also possible to split over multiple machines where the
> tile boundaries align, which is every 160,000 pixels; that lets me divide
> the work over about 50 machines (I run this on Amazon EC2).

After performing the tests with the GeoTIFFs (that Mateusz and I discussed
about), and without knowing how complex the intermediate 'editing' procedure
is, it seems to me that even the 1400*5000*1300000 input image could be
chopped up in a reasonable time (i.e. counted in hours) even by a single
machine (that can still be considered a 'personal computer' as opposed to a
'server') with a 6 core i7, 8 GB of RAM and a dual RAID-0 setup (say 2+2 WD
VelociRaptors)...

> The essential issue is that I believe it's unreasonable or impossible to
> expect the library code to do any of this automatically - whatever its
> documentation might say, I don't trust that e.g. "WIC" will do the right
> thing. I want all this to be explicit in my code.

But then, if you refuse to trust documented performance claims even after
they are tested, you are, I must say, confined to writing assembly code
because you then simply can't trust anything...And by that rationale your
code is also not 'obvious' as how can one trust your backend to actually
read only what you told it to and not the entire image or something
similar...This is akin to not trusting malloc to allocate only the amount
requested (fearing it might actually allocate 16 times more)...

As you can see, in the other thread, I tested io2+WIC and the expectations
proved correct...So yes I think you can expect (and should demand it
otherwise, which is why I went for io2 in the first place) for a library to
do 'the right thing' (at least for a major subset of use cases, for others
it should give you low level control)...

> So I just want the library to provide the simple
> open/read-row/write-row/close functionality for the image files, and to
> supply types that I can use for pixel_t, etc.

Independent of the rest of the discussion/issues, yes I would agree that
explicit read/write row would be a valuable part of the interface...

>> I can only, as a side note, say that the shared_ptrs are an overkill...in
>> fact, if the number 1400 is really fixed/known at compile-time the
>> individual ReadTiff heap allocations are unnecessary...
>
> Please, think of this as pseudo-code if you like. Any shared pointer
> overhead will be swamped by the time taken in the image encoding and
> decoding.

Pseudo-code or not I simply have a knee jerk reaction to such coding
practices ;) Especially here where we hunt for performance...using
shared_ptr as opposed to say scoped_ptrs or a ptr_container gains you
nothing here so why use them by default...More importantly, as already, said
why heap allocated the readers at all (at least individually)? Especially
considering that LibTIFF already does that so all you get is an additional
indirection and worse locality of reference...

> Well having two signals and two regex libraries is a perfect example of a
> case where the existence of one solution has not prevented another similar
> and incompatible solution from being accepted later.

But accepted/merged as an improved replacement but as a parallel entity
which is a bit 'akward'/'ugly'/'unfortunate'...

-- 
"What Huxley teaches is that in the age of advanced technology, spiritual
devastation is more likely to come from an enemy with a smiling face than
from one whose countenance exudes suspicion and hate."
Neil Postman 

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk