From: Lubomir Bourdev (lbourdev_at_[hidden])
Date: 2006-10-25 17:59:16
Dr. Reese,
Thank you for your detailed review. I am going to post my comments even though I am not sure whether you follow the Boost thread.
Greg Reese wrote:
> Major Concern
>
> My major concern is the lack of requirements/concepts on
> arithmetic between Pixels, i.e., pixel classes/structures.
> For example, in the Guide there are numerous code snippets
> for the computation of a gradient. These take the
> form of c = (a - b) / 2
>
> where a and b are uint8s and c is an int8. Unless the
> subtraction operator is defined in some special way not
> mentioned in the Guide, all of these examples are wrong. The
> difference of a and b can easily be a negative number but it
> will be interpreted as a uint8, giving a weird result. Since
> the most common pixels are uint8 grays and RGB colors made of
> triplets of uint8's, this is a serious concern.
We agree, in general. GIL core simply gives you access to the channels of a pixel in their raw form. When you write an algorithm, you have to be very careful and take into account that the channels can be of any type.
But that is true of C++ in general: it will let you assign a signed char to an unsigned char without complaint. The result will be strange, and you had better know what you are doing.
As for the example in the tutorial, the types used are as follows:
unsigned char a, b;  // uint8 operands
signed char c;       // int8 result
c = (a - b) / 2;     // a - b is computed in int, thanks to integral promotion
This does not result in overflow and works properly: integral promotion means a - b is evaluated in int, so the difference can be negative without wrapping, and after dividing by 2 the result always fits in a signed char. We used this example because the goal of the tutorial was to give a very basic introduction to GIL, things like pixel navigation and views, not to write the real algorithm that computes the gradient. (There are many other problems with the x_gradient algorithm in the tutorial that were intentionally overlooked.)
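Here is a minimal, self-contained check of that claim, exercising the worst cases for the difference:

#include <cassert>

int main() {
    unsigned char a = 0, b = 255;  // most negative difference
    // a - b undergoes integral promotion: both operands become int,
    // so the subtraction yields -255, not a wrapped unsigned value.
    signed char c = (a - b) / 2;   // -255 / 2 truncates to -127 on mainstream compilers
    assert(c == -127);

    a = 255; b = 0;                // most positive difference
    c = (a - b) / 2;               // 255 / 2 == 127
    assert(c == 127);
    return 0;
}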
But in general you are right: when writing image processing algorithms we need some facilities for dealing with channel arithmetic. However, I claim that they should be part of the numeric extension, the GIL extension that deals with algorithms.
In particular, we should have some fundamental atomic channel-level operations defined. For example, we could have the atomic operation "half_difference" above and provide specializations for, say, integral channels, that use a shift instead of a division.
Then the formula could look, for example, like this:
c = channel_convert<DstChannel>(channel_halfdiff(a,b));
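Just to sketch the idea (these names are hypothetical, and clamping here stands in for whatever conversion policy the numeric extension settles on; GIL's actual channel_convert rescales between channel ranges rather than clamping):

#include <limits>

// Hypothetical: compute (a - b) / 2 exactly, in a wide signed type.
template <typename Channel>
int channel_halfdiff(Channel a, Channel b) {
    return (static_cast<int>(a) - static_cast<int>(b)) / 2;
}

// Hypothetical channel_convert: clamp the wide result into DstChannel's range.
template <typename DstChannel>
DstChannel channel_convert(int v) {
    const int lo = std::numeric_limits<DstChannel>::min();
    const int hi = std::numeric_limits<DstChannel>::max();
    return static_cast<DstChannel>(v < lo ? lo : v > hi ? hi : v);
}

With these, the formula above reads: signed char c = channel_convert<signed char>(channel_halfdiff(a, b));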
>
> When performing arithmetic with uint8s in image processing,
> the results need to be saturated, i.e., values above 255 need
> to be set to 255 and values below 0 need to be set to 0. One
> way to do this is to define the binary arithmetic operators so
> that they convert to int16, do the arithmetic between the two
> operands, saturate the values if necessary and convert back
> to uint8. If this technique is used, the requirement for this
> behavior should be placed on Pixel.
Agreed - the above atomic channel-level algorithms should take into account saturation.
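As an illustration only (this is not GIL code), a saturating add for 8-bit channels could widen to int, clamp, and narrow back, exactly as you suggest:

#include <limits>

// Hypothetical saturating addition for unsigned 8-bit channels.
inline unsigned char sat_add(unsigned char a, unsigned char b) {
    int sum = static_cast<int>(a) + b;  // promoted to int; cannot overflow
    const int hi = std::numeric_limits<unsigned char>::max();  // 255
    return static_cast<unsigned char>(sum > hi ? hi : sum);
}

(The sum of two unsigned channels cannot go below zero, so only the upper bound needs clamping.)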
> Note too that the operator actions depend on the data type.
> In many applications, it's okay to assume that there won't be
> any over/underflow when doing arithmetic for int16s because
> the number of terms and the pixels'
> values are usually small. The same holds true for not
> converting floats to doubles. However, it wouldn't be unusual
> for a program to want to convert int16s to int32s, so that
> needs to be an option.
>
> Unfortunately, simply defining binary arithmetic between
> Pixels is not sufficient. For example, suppose that we've
> defined binary addition between uint8s as above, so that
> saturation is handled correctly. Now consider finding the
> average of two uint8s a and b, each equal to 255. The answer is
> obviously 255. However, if we use the formula average =
> (a + b) / 2, a
> and b will sum to 255 because of saturation, and 255/2 yields
> a value for the average of 127.
Agreed - having binary arithmetic between pixels is insufficient.
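To make the pitfall concrete (reusing the sat_add sketch above; illustrative only):

#include <cassert>

int main() {
    unsigned char a = 255, b = 255;

    // Saturate first: sat_add(a, b) == 255, then 255 / 2 == 127. Wrong.
    unsigned char bad = static_cast<unsigned char>(sat_add(a, b) / 2);

    // Widen first: (255 + 255) / 2 == 255. Correct.
    unsigned char good = static_cast<unsigned char>((static_cast<int>(a) + b) / 2);

    assert(bad == 127 && good == 255);
    return 0;
}

This is why such operations are better defined atomically, as with "half_difference" above, than composed from saturating binary operators.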
> This problem with pixel arithmetic doesn't affect GIL per se
> because it only represents images, not actions on them.
Exactly!
> However, having a completely generic image representation
> that doesn't accommodate pixel arithmetic makes the
> representation useless for image processing. Moreover, now
> that the authors have finished the bulk of work on the image
> software and are starting on the algorithms to process them,
> the majority of the algorithms will involve pixel arithmetic,
> so a general concept of arithmetic between Pixels should now
> be a concern.
Agreed - one of the very first things to do when writing image processing algorithms is defining those atomic operations.
> * The channels provide their min and max values, but it
> would be nice if they could also provide the min and max
> values that the data they hold can attain, i.e., how many
> bits of data there are, not of data storage. For example,
> MRIs usually have a 12-bit data range but are stored in 16-bit pixels.
You could use std::numeric_limits<Channel>::max() for this.
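That gives you the range of the storage type. If the attainable data range is narrower than the storage range, one conceivable approach (not something GIL provides) is a traits class that defaults to std::numeric_limits but can be specialized per channel type:

#include <limits>

// Hypothetical traits for the attainable data range of a channel.
template <typename Channel>
struct channel_data_range {
    static Channel min() { return std::numeric_limits<Channel>::min(); }
    static Channel max() { return std::numeric_limits<Channel>::max(); }
};

// Example: a 12-bit MRI sample stored in 16 bits.
struct mri12_channel { unsigned short value; };

template <>
struct channel_data_range<mri12_channel> {
    static unsigned short min() { return 0; }
    static unsigned short max() { return 4095; }  // 2^12 - 1
};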
> * Can GIL accommodate images with a "virtual range",
> i.e., in which the x-y range the user accesses is different
> than the x-y range of the actual data? For example, the
> Fourier Transform of an n x n image is an image of n x n
> (complex) numbers. However, half the transformed numbers are
> redundant and are not stored. Thus it would be useful to let
> the user access the transformed image as if it were n x n but
> let the image representation retrieve the appropriate value
> if the user requests one that is not in the stored range.
> Perhaps this can be done through some sort of pixel iterator
> adaptor. How much work would be entailed for all of the
> library to work on images like these with a virtual range?
It would be possible to do that using the virtual image abstraction.
It allows you to define the pixel value given coordinates x and y. You could have a function object that holds the original image and performs arbitrary processing/caching to return the value at (x,y).
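For the Fourier example, the mirroring logic such a function object would encapsulate might look like this (a rough sketch; the storage layout, n rows by n/2+1 columns of the transform of a real image, is an assumption, and GIL's virtual image machinery would supply the iteration around it):

#include <complex>
#include <vector>

// Hypothetical: expose an n x n view of a real image's FFT while storing
// only n x (n/2 + 1) coefficients, using the Hermitian symmetry
// F(x, y) == conj(F((n-x) % n, (n-y) % n)).
struct fft_fn {
    const std::vector<std::complex<float> >* stored;  // n rows, n/2+1 columns
    int n;

    std::complex<float> operator()(int x, int y) const {
        const int half = n / 2 + 1;
        if (x < half)                              // stored directly
            return (*stored)[y * half + x];
        const int mx = (n - x) % n;                // mirrored coordinates
        const int my = (n - y) % n;
        return std::conj((*stored)[my * half + mx]);
    }
};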
> * How ubiquitous are proxies in GIL? They can't be used
> in STL algorithms, which is a big drawback.
Proxies are used to represent a planar image. You are, of course, free to treat a planar image as N single-channel images.
> * The use of piped views for processing makes me nervous.
> For example, the Tutorial states that one way to compute the
> y-gradient is to rotate an image 90°, compute the x-gradient,
> and rotate it back. First of all this only works for kernels
> that have that rotational symmetry, e.g., square or circular
> kernels. It doesn't work with rectangular kernels. Secondly,
> even if the kernels have the correct symmetry, the process
> won't yield usable results if the spatial resolution in the
> x- and y-directions is different, as happens often with
> scanners. No computations done by the library should be
> performed in this indirect manner, or at a minimum, the
> documentation should clearly warn about these potential problems.
Agreed. Again, this was just used to illustrate what rotated90cw_view does. We don't recommend people actually write y-gradient by reusing x-gradient with rotated views. There are other severe problems, like performance, besides the problems you mention.
>
> Finally, one possible extension to the design that may
> be helpful is to incorporate the idea of an image region.
Image regions are an interesting concept, and worth considering further.
But there could be different models depending on the access pattern requirements and density.
For example, an image view plus a bitmask may make sense for dense, localized regions, while a vector of pixel coordinates may be more appropriate for sparse ones (see the sketch below). So it makes sense to let this concept emerge, driven by the concrete algorithms.
Thanks again for the detailed review!
Lubomir