
From: Lubomir Bourdev (lbourdev_at_[hidden])
Date: 2006-11-15 20:56:14


Hi Ulli:

> Lubomir Bourdev wrote:
> >>>First of all, in GIL we rarely allow direct mixing images of
> >>>incompatible types together.
> >>
> >>What are the criteria for 'incompatible'?
> >
> >
> > Compatible images must allow for lossless conversion back and
> > forth. All others are incompatible.
> >
>
> That's a nice rule, but it's not the rule of the underlying
> C/C++ language, where compatible means more or less that an
> implicit conversion between two types is defined. Our default
> behavior in VIGRA resembles this as far as possible, with the
> exception that the default conversions include clamping and
> rounding where appropriate. In practice, both definitions of
> compatibility will probably work equally well as long as
> conversions can be customized. I still think that a default
> implicit conversion makes life easier in many situations,
> without causing unpleasant surprises.

I don't see a big convenience in having copy_pixels do implicit
conversion. If you want to convert between incompatible types, then
instead of copy_pixels simply use something else, like
copy_and_cast_pixels, or copy_and_round_pixels, or whatever. You are
free to define such operations.
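
For instance, here is a rough sketch of what one such operation could
look like (the name, the conversion functor, and the exact view
interface are illustrative, not a finalized GIL API):

// Sketch: copy with an explicit, user-supplied per-pixel conversion.
// Assumes the two views have the same dimensions.
template <typename SrcView, typename DstView, typename Converter>
void copy_and_convert_pixels(const SrcView& src, const DstView& dst,
                             Converter convert) {
    for (int y = 0; y < src.height(); ++y) {
        typename SrcView::x_iterator s = src.row_begin(y);
        typename DstView::x_iterator d = dst.row_begin(y);
        for (int x = 0; x < src.width(); ++x)
            convert(s[x], d[x]); // the functor decides rounding/clamping
    }
}

The conversion policy is then spelled out at the call site instead of
being hidden inside copy_pixels.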
 
> I like this name -- it says exactly what's happening. Please
> remember that
>
> image::recreate(width,height, initial_pixel_value)
> image::recreate(size_object)
> image::recreate(size_object, initial_pixel_value)
>
> should also be defined.

Yes, good idea.
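
Something along these lines, reusing GIL's existing typedefs (exact
signatures to be worked out):

rgb8_image_t img;
img.recreate(640, 480);                          // width, height
img.recreate(point2<int>(640, 480));             // size object
img.recreate(640, 480, rgb8_pixel_t(255, 0, 0)); // fill with red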

>
> > It comes down to some basic principles:
> >
> > 1. We want equality comparison to be an equivalence relation.
> > 2. We also want equality comparison and copying to be defined on
> > the same set of types.
> > 3. The two operations should work as expected. In particular, after
> > a=b, you can assert that a==b.
> >
> > I hope you will agree that these are fundamental rules that hold
> > in mathematics and violating them can lead to unintuitive and
> > bug-prone systems.
> >
>
> Unfortunately, the CPU itself violates rule 3: a double in a
> register and the same double written to memory and read back
> need no longer compare equal - a behavior that is really hard
> to debug :-(

Well, the world will never be perfect, but that doesn't mean we
shouldn't strive for perfection :-)
(That seems quite a serious problem though! Can you point me at a
document describing this? Which CPUs are affected?)

> >
> > Regardless of what we do, there is always a notion of the operating
> > range of a channel, whether it is implicit or explicit.
>
> I don't agree. For example, when you use Gaussian filters to
> compute derivatives or the structure tensor etc. the result
> has no obvious range - it depends on the image data and
> operator in a non-trivial way. For example, the minimal and
> maximal possible values of a Gaussian first derivative are
> proportional to 1/sigma, but sigma (the scale of the Gaussian
> filter) is usually only known at runtime. It is the whole
> point of floating point (as opposed to fixed point) to get
> rid of these range considerations.

So in this case the range is -infinity to infinity. It is still defined.
But I would argue that most of the time the range is finite.

> > You are essentially suggesting that the range of a channel be
> > defined at run time, rather than at compile time, right?
> >
>
> More precisely, we avoid explicitly defined ranges, until
> some operation (e.g. display) requires an explicit range.
> Remember that it is no longer slow to do all image processing
> in float.

My experience, and that of my colleagues, is that there is often a
significant speed difference between integer and floating-point
computations. Floating-point operations have higher latency and lower
throughput because fewer functional units are available to process
them. Another issue is size and cache footprint: floating-point
channels are typically four to eight times larger than a char. A third
issue is the poor performance of floating-point-to-integer conversion
on many common architectures. The differences are especially large on
non-desktop devices, such as PDAs. I was able to speed up my face
detector by more than 25% just by making certain operations integer.
This is why
providing generic algorithms that can work natively on integral types
(unsigned char, short, int) is very important for GIL. This necessitates
providing a suite of atomic channel-level operations (like
channel_invert, channel_multiply, channel_convert) that have performance
specializations for various channel types. Many of these operations
require knowing the range of the channel, which is why GIL channels have
ranges.
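
To make this concrete, here is a simplified sketch (the
channel_max_value trait is hypothetical, and GIL's actual
specializations differ in detail):

#include <limits>

// Hypothetical trait: the top of the channel's range (255 for uchar).
template <typename T>
double channel_max_value() { return std::numeric_limits<T>::max(); }

// Generic channel multiplication: treat both values as fractions of
// the full range, i.e. result/max = (a/max) * (b/max).
template <typename T>
T channel_multiply(T a, T b) {
    return T(double(a) * double(b) / channel_max_value<T>());
}

// Integer specialization for 8-bit channels: computes (a*b)/255 with
// exact rounding using only integer adds and shifts - no floating
// point at all.
inline unsigned char channel_multiply(unsigned char a, unsigned char b) {
    unsigned t = unsigned(a) * unsigned(b) + 128;
    return (unsigned char)((t + (t >> 8)) >> 8);
}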

> > But you still need to know the range for many operations.
> > For example, converting between additive and subtractive color
> > spaces requires inverting the channel, which requires knowing its
> > range.
>
> Not necessarily. In many cases, one can simply negate the
> channel value(s) and worry about range remapping later, when
> the required output format is known.

I am not disputing that there are contexts in which knowing the range
is unimportant - of course there are! All I am saying is that the
range matters for at least _some_ operations.
GIL's principles, as I stated before, are to push the complexity down to
the elements and resolve it there. Having smart channels makes writing
higher-level code easier. If the channel knows its operational range, we
can define a set of atomic channel-level operations (such as
channel_invert, channel_multiply, channel_convert) and then use them
when writing higher level algorithms.
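
For instance, reusing the hypothetical channel_max_value from the
sketch above, plus a matching channel_min_value:

#include <limits>

template <typename T>
double channel_min_value() { return std::numeric_limits<T>::min(); }

// channel_invert defined once, in terms of the channel's range...
template <typename T>
T channel_invert(T x) {
    return T(channel_max_value<T>() - x + channel_min_value<T>());
}

// ...and reused verbatim by higher-level code, e.g. an RGB-to-CMY
// conversion that works for any channel type with a known range.
template <typename T>
void rgb_to_cmy(const T rgb[3], T cmy[3]) {
    for (int i = 0; i < 3; ++i)
        cmy[i] = channel_invert(rgb[i]);
}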

> Likewise, if you don't require fixed ranges, you may perform
> out-of-gamut computations without loss of precision. You
> can't display these values, but they are mostly intermediate
> results anyway.

It is not against GIL principles to have intermediate values outside the
range when it makes sense, as long as you know what you are doing.

> Thus, I strongly argue that the range 0...1 for floating
> point values should be dropped as the default behavior.

We don't like hard-coding a 0..1 range for float either, so we agree
with you and are willing to look into alternatives. Here is what we
have come up with:

1. Provide a metafunction to construct a channel type from a (built-in)
type and range. For example, here is how we could wrap a float into a
class and associate the range [0..1] with it:

typedef channel_type<float,0,1>::type bits32f;
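
(A rough sketch of the wrapper such a metafunction might produce; all
names are illustrative:)

// A value wrapper that carries its range in the type.
template <typename T, int Min, int Max>
struct ranged_channel {
    T value;
    ranged_channel(T v = T()) : value(v) {}
    operator T() const { return value; }
    static T min_value() { return T(Min); } // the interface that
    static T max_value() { return T(Max); } // RangedChannelConcept needs
};

// The metafunction itself just names the wrapper.
template <typename T, int Min, int Max>
struct channel_type { typedef ranged_channel<T, Min, Max> type; };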

2. Define a RangedChannelConcept (a channel with an associated range).
All atomic channel-level functions that need a range will require a
model of RangedChannelConcept.

Now we have several alternatives:

A. Define the range of any channel T by default to use
numeric_limits<T>::min() and max()

The advantage is that all built-in types will be valid models of
RangedChannelConcept, and it so happens that the ranges 0..255 for
uchar and 0..65535 for ushort are the ones people typically use. The
disadvantage is that this default is meaningless for float
(numeric_limits<float>::min() is the smallest positive value, not
negative infinity), so using float with range-requiring algorithms
will be totally bizarre. We could mitigate this by typedef-ing bits32f
to be the wrapped float type with range [0..1] instead of native
float. And we can provide typedefs for other useful float/double
ranges.
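
(A sketch of option A, with a hypothetical channel_range trait; note
how the float problem shows up directly in the code:)

#include <limits>

// Default range for any channel T, taken from numeric_limits. Fine
// for the integral types; useless for float, whose min() is the
// smallest positive value rather than the most negative one.
template <typename T>
struct channel_range {
    static T min_value() { return std::numeric_limits<T>::min(); }
    static T max_value() { return std::numeric_limits<T>::max(); }
};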

B. Don't define a default range.

As a result, built-in types will not model RangedChannelConcept. So
GIL's channel typedefs bits8, bits16, bits32f will all be wrapped and
not native. The advantage is that using a range-requiring algorithm
like color conversion with a built-in type like float or double will
not compile at all (rather than produce undesired results). The
disadvantage is that built-in types will rarely be used as GIL
channels, which could have performance implications due to the
abstraction penalty (and I do have data demonstrating that such a
penalty exists).

In both cases, the client is free to define a range for the built-in
types, which will make them model RangedChannelConcept. As long as
such a definition is on the client side and not in generic code, I
believe it is OK.
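
For example, a client wanting native float with a 0..1 range could
specialize the (hypothetical) channel_range trait from the sketch
above in their own code:

// Client code, not generic library code: native float now models
// RangedChannelConcept with the range [0..1].
template <>
struct channel_range<float> {
    static float min_value() { return 0.0f; }
    static float max_value() { return 1.0f; }
};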

C. Like A, but associate ranges with certain built-in types (like
0..1 with float)

This is essentially what GIL does currently. The advantage is that in
the vast majority of cases you can use built-in types as channels (no
abstraction penalty) and they will do what you want. You can still use
a float and go outside the range, but any range-requiring algorithm
will assume a 0..1 range for float. If you do want to use
range-requiring algorithms but don't want the 0..1 range (for example,
to display a gradient image on screen), you can still create a custom
wrapper channel. Alternatively, you can scale your float image to the
0..1 range before color-converting it. The obvious disadvantage is
that we are hard-coding 0..1 for float, which is somewhat arbitrary
and non-generic.

My inclination is to go with option A, as it provides a reasonable
tradeoff between performance and genericity. Thoughts?

>
> > In my opinion tiled images are a different story, they cannot be
> > just abstracted out and hidden under the rug the way
> > planar/interleaved images can.
>
> I'm not so pessimistic. I have some ideas about how
> algorithms could be easily prepared for handling tiled
> storage formats.

We would be very interested in hearing more about this. But I must be
misunderstanding you, because I can't imagine how this could possibly
work. How could you take an inherently global algorithm (like flood
fill) and make it tile-friendly in a generic way? Some algorithms have
to be rewritten from scratch, and for others no reasonable
tile-friendly solution exists...

Lubomir
 

