From: Ullrich Koethe (koethe_at_[hidden])
Date: 2006-11-16 11:51:25
>Lubomir Bourdev wrote:
> I don't see the big convenience of having copy_pixels do implicit
I agree that there is no advantage at all in a direct call of copy_pixels.
But I'm thinking about conversions happening in nested function calls,
where the intermediate types are _deduced_ (by means of traits and little
template metaprograms). Consequently, the appropriate conversions must
also be deduced, and a default conversion is just the simplest form of
Type deduction is central to VIGRA. For example,
is actually executed as a separable convolution
where the type of temp_image is automatically determined, and both calls
involve an automatic conversion. (I admit that the customization options
of this behavior could be improved.) With deeper nesting, customization of
this behavior can become somewhat complicated, and defaults will be useful
(or even required).
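
A minimal sketch of this kind of deduction, assuming C++17. The names RealPromote, default_convert, smooth_line and smooth_separable are hypothetical and are not VIGRA's actual API; a 3-tap mean filter stands in for a real kernel. The point is only that the type of the temporary image is looked up from a trait, and both passes go through a default conversion:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical trait: type used for intermediate arithmetic on T.
template <class T> struct RealPromote { using type = double; };
template <> struct RealPromote<std::uint8_t> { using type = float; };
template <> struct RealPromote<float> { using type = float; };

// Hypothetical default conversion: round and clamp when the target is integral.
template <class Dest, class Src>
Dest default_convert(Src v) {
    if constexpr (std::is_integral_v<Dest>) {
        const double lo = std::numeric_limits<Dest>::min();
        const double hi = std::numeric_limits<Dest>::max();
        const double r = std::round(static_cast<double>(v));
        return static_cast<Dest>(r < lo ? lo : (r > hi ? hi : r));
    } else {
        return static_cast<Dest>(v);
    }
}

// 3-tap mean filter along one line of n samples; border samples are replicated.
template <class Src, class Dest>
void smooth_line(const Src* in, Dest* out, std::size_t n, std::size_t stride) {
    using Real = typename RealPromote<Src>::type;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t l = (i == 0) ? 0 : i - 1;
        const std::size_t r = (i + 1 == n) ? i : i + 1;
        const Real sum = Real(in[l * stride]) + Real(in[i * stride]) + Real(in[r * stride]);
        out[i * stride] = default_convert<Dest>(sum / Real(3));
    }
}

// Separable smoothing: the temporary image type is deduced, never spelled out by the caller.
template <class T>
std::vector<T> smooth_separable(const std::vector<T>& src, std::size_t w, std::size_t h) {
    assert(src.size() == w * h);
    using Temp = typename RealPromote<T>::type;   // deduced intermediate type
    std::vector<Temp> temp(w * h);
    std::vector<T> dest(w * h);
    for (std::size_t y = 0; y < h; ++y)           // x pass: T -> Temp
        smooth_line(src.data() + y * w, temp.data() + y * w, w, 1);
    for (std::size_t x = 0; x < w; ++x)           // y pass: Temp -> T
        smooth_line(temp.data() + x, dest.data() + x, h, w);
    return dest;
}
```

A caller with an 8-bit image only writes smooth_separable(img, w, h); the float temporary and the rounding back to 8 bits are chosen by the traits.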
>>Unfortunately, the CPU itself violates rule 3
> (That seems quite a serious problem though! Can you point me at a
> document describing this? Which CPUs are affected?)
We learned it the hard way. AFAIK, it affects Intel CPUs and compatibles.
Registers have more than 64 bits for higher accuracy, but this is not
appropriately handled in the comparisons. One can switch off the extra
bits, but this throws out the baby with the bath water. I'm sure someone
at Adobe knows everything about this problem and its optimal solution.
Please keep me informed.
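
One commonly used workaround, sketched under the assumption that the problem is the x87 floating-point unit keeping values in 80-bit registers: force both operands through a 64-bit memory slot before comparing, so that each is rounded to double first. The helper name equal_as_stored is hypothetical, and this is not necessarily the optimal solution asked about above:

```cpp
#include <cassert>

// Compare two doubles after storing both to memory. The volatile stores
// prevent the compiler from comparing values still held at extended precision.
inline bool equal_as_stored(double a, double b) {
    volatile double x = a;   // rounded to 64 bits when written to memory
    volatile double y = b;
    return x == y;
}
```

This keeps the extra register bits for ordinary arithmetic and pays the cost of a store only where a comparison must be consistent.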
> So in this case the range is -infinity to infinity. It is still defined.
> But I would argue that most of the time the range is finite.
Yes, but often it is not necessary to specify the range explicitly.
> Floating point operations have higher latency and lower
> throughput because of fewer functional units available to process them.
When you time 32-bit integers against 32-bit floats (so that memory
throughput is the same) on a modern desktop machine, the difference is
small (if it exists at all). Small computers (e.g. PDAs and cell phones)
are a different story; I don't have much experience with them.
> Another issue is their size and ability to fit in the cache, since they
> are typically four to eight times larger than a char.
Well, to do image processing with any kind of accuracy, you will need at
least 16-bit integers. Then the difference to 32-bit float shouldn't be
> A third issue is
> the performance of floating point to integer conversion on many common
Indeed, these are real performance killers. That's why we tend to work in
floating point throughout, when we don't need the last bit of speed. After
all, the 25% speed-up of your face detector is not that impressive, given
that it was probably a lot of work. We had a similar experience with
replacing floating point by fixed point in some application -- it was
faster, but hardly that much faster to justify the effort and loss in
> This is why
> providing generic algorithms that can work natively on integral types
> (unsigned char, short, int) is very important for GIL. This necessitates
> providing a suite of atomic channel-level operations (like
> channel_invert, channel_multiply, channel_convert) that have performance
> specializations for various channel types.
What I often do is to specialize the functors. For example, a
LinearRangeMappingFunctor computes a linear transformation at each pixel
by default, but for uint8, it computes a look-up table in its constructor.
The specialized functor can be created automatically.
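
A minimal sketch of such a specialization. The class name LinearRangeMapping and its (scale, offset) constructor are hypothetical and are not VIGRA's actual interface; the generic version computes the mapping per pixel, while the 8-bit specialization fills a 256-entry table once:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Generic case: evaluate scale * v + offset at every pixel.
template <class T>
class LinearRangeMapping {
public:
    LinearRangeMapping(double scale, double offset) : scale_(scale), offset_(offset) {}
    T operator()(T v) const { return static_cast<T>(scale_ * v + offset_); }
private:
    double scale_, offset_;
};

// 8-bit case: precompute all 256 results in the constructor.
template <>
class LinearRangeMapping<std::uint8_t> {
public:
    LinearRangeMapping(double scale, double offset) {
        for (std::size_t i = 0; i < table_.size(); ++i) {
            double r = std::round(scale * static_cast<double>(i) + offset);
            r = r < 0.0 ? 0.0 : (r > 255.0 ? 255.0 : r);   // clamp to 0...255
            table_[i] = static_cast<std::uint8_t>(r);
        }
    }
    std::uint8_t operator()(std::uint8_t v) const { return table_[v]; }   // one lookup per pixel
private:
    std::array<std::uint8_t, 256> table_;
};
```

Code that is templated on the pixel type picks up the table-based version automatically when instantiated with an 8-bit type.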
> I am not arguing that there are contexts in which knowing the range is
> not important - of course there are!
> All I am saying is that the ranges matter at least for _some_
No doubt about that. Perhaps the notion of a range is just too general?
It might be better to study the semantics of various uses of ranges and
provide the appropriate specializations on this basis.
For example, one specialization I was thinking about is a 'fraction' which
maps an arbitrary range onto the semantic interval 0...1.
is the type of the standard 8-bit color channel, but
Fraction<unsigned char, 0, 200>
Fraction<unsigned short, 1000, 16000>
would be possible as well, and the lower and upper bounds represent 0 and
1 respectively. The default bounds would be numeric_limits::min and
Fraction<float, 0, 1>
would be a float restricted to the interval 0...1 (which could be mapped
to a native float, depending on the out-of-bounds policy). A traits class
can specify how out-of-bounds values are handled (e.g. by clamping, or by
simply allowing them) and how mixed-type expressions are to be coerced. I
suppose you have benchmarked the abstraction penalty of ideas similar to
this -- can you send me some of the data?
What other semantic interpretations of ranges are required?
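
To make the Fraction idea concrete, here is a rough sketch assuming C++17 and integral bounds only (floating-point template arguments such as Fraction<float, 0, 1> need C++20 and are left out). The members raw(), value() and the helper fraction_cast are hypothetical, and clamping is hard-wired here instead of coming from a traits class:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Lower maps to 0 and Upper maps to 1; the defaults are the full range of T.
template <class T,
          T Lower = std::numeric_limits<T>::min(),
          T Upper = std::numeric_limits<T>::max()>
class Fraction {
    static_assert(Lower < Upper, "empty range");
public:
    using value_type = T;
    static constexpr T lower = Lower;
    static constexpr T upper = Upper;

    // Out-of-bounds policy in this sketch: clamp to [Lower, Upper].
    explicit Fraction(T raw) : raw_(raw < Lower ? Lower : (raw > Upper ? Upper : raw)) {}

    T raw() const { return raw_; }

    // Semantic value in the interval 0...1.
    double value() const {
        return (double(raw_) - double(Lower)) / (double(Upper) - double(Lower));
    }
private:
    T raw_;
};

// Mixed-type coercion: preserve the semantic value, not the raw bits.
template <class To, class From>
To fraction_cast(const From& f) {
    const double span = double(To::upper) - double(To::lower);
    const double raw = std::floor(double(To::lower) + f.value() * span + 0.5);
    return To(static_cast<typename To::value_type>(raw));
}
```

With these definitions, Fraction<unsigned char, 0, 200>(100) has the value 0.5, and casting it to Fraction<unsigned short, 1000, 16000> gives the raw value 8500.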
> It is not against GIL principles to have intermediate values outside the
> range when it makes sense, as long as you know what you are doing.
OK, that makes sense.
> 1. Provide a metafunction to construct a channel type from a (built-in)
> type and range. For example, here is how we could wrap a float into a
> class and associate the range [0..1] with it:
> typedef channel_type<float,0,1>::type bits32f;
That's very similar to my Fraction proposal above. You would then just write
which also assigns a meaning to the range. And if out-of-bounds handling
was 'ALLOW_OUT_OF_BOUNDS', that type could be a native float.
> C. Like A, but associate ranges with certain built-in types (like 0..1
> with float)
> This is essentially what GIL does currently. The advantage is that in
> the vast majority of cases you can use built-in types as channels (no
> abstraction penalty) and they will do what you want.
Well, I prefer clamping over modulo arithmetic as a default, which is not
quite built-in for the integral types.
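
The difference for an 8-bit channel, as a small sketch (the function names add_wrapping and add_clamped are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// What the built-in type gives after narrowing: arithmetic modulo 256.
inline std::uint8_t add_wrapping(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Clamping (saturating) addition: results above 255 stay at 255.
inline std::uint8_t add_clamped(std::uint8_t a, std::uint8_t b) {
    const int sum = int(a) + int(b);
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}
```

Adding 200 and 100 gives 44 with the wrapping version and 255 with the clamped one; for a brightness value, 44 is almost never what is wanted.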
>> > In my opinion tiled images are a different story, they cannot be just
>> > abstracted out and hidden under the rug the way planar/interleaved
>> > images can.
>>I'm not so pessimistic. I have some ideas about how
>>algorithms could be easily prepared for handling tiled
> We would be very interested in hearing more about this. But I must be
> misunderstanding you because I can't imagine how this could possibly be.
> How could you have a scheme for taking any inherently global algorithm
> (like flood-fill) and making it tile-friendly.
This is certainly a difficult one, but I guess there exists some parallel
version written in the Golden Age of Parallel Image Processing (which
ended because the serial computers improved faster than people were able
to write parallel algorithms).
But for a general solution, I was thinking mainly about the simpler
functions, like pixel transformations, filters, morphology, local edge
detectors, perhaps geometric transformations and warping.
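
For the simplest of these, a pixel transformation, tile-wise processing could look like the following sketch. It assumes a row-major image held in a std::vector; the function name transform_tiled is hypothetical and the tiles here are just index ranges, not a real tiled storage scheme:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply f to every pixel, visiting the image one tile at a time.
template <class T, class F>
void transform_tiled(std::vector<T>& img, std::size_t w, std::size_t h,
                     std::size_t tile, F f) {
    assert(tile > 0 && img.size() == w * h);
    for (std::size_t ty = 0; ty < h; ty += tile) {
        for (std::size_t tx = 0; tx < w; tx += tile) {
            const std::size_t y1 = std::min(ty + tile, h);   // tiles at the border may be smaller
            const std::size_t x1 = std::min(tx + tile, w);
            for (std::size_t y = ty; y < y1; ++y)
                for (std::size_t x = tx; x < x1; ++x)
                    img[y * w + x] = f(img[y * w + x]);
        }
    }
}
```

Pixel transformations need nothing beyond the current tile. Filters and morphology would additionally need a border of neighboring pixels around each tile, which this sketch does not handle.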
--
Ullrich Koethe
Universitaet Hamburg / University of Hamburg
FB Informatik / Dept. of Informatics
AB Kognitive Systeme / Cognitive Systems Group
Vogt-Koelln-Str. 30, D - 22527 Hamburg, Germany
Phone: +49 (0)40 42883-2573
Fax: +49 (0)40 42883-2572
Email: u.koethe_at_[hidden], koethe_at_[hidden]
WWW: http://kogs-www.informatik.uni-hamburg.de/~koethe/