From: Lubomir Bourdev (lbourdev_at_[hidden])
Date: 2006-10-30 13:33:41


Hi Matt,

Thanks for your review!

Matt Gruenke wrote:
> I'm concerned that GIL's color spaces combine too many distinct
> concepts: memory layout, channel ordering, channel type & range, and a
> more pure definition of a color space (e.g. basis vectors, primaries,
> transfer function, etc.). In practice, this may require some users to
> define a large number of GIL color spaces.

I guess our documentation was not very clear...
The only aspects that GIL color spaces combine are the ordering and the
names of the channels. Perhaps Fernando's suggestion will make things
clearer - instead of color spaces, think of them as "pixel formats".

Memory layout (planar vs. interleaved, packed vs. non-packed) is
addressed elsewhere. The channel type and range are properties of the
channels. All the other aspects you list, such as transfer functions and
color profiles, make sense as properties of the color conversion object.

That means that you don't need to create a new pixel format to represent
each combination of the above properties.
Let's pick a concrete example, handling premultiplied alpha in RGBA.
There are two ways you could do that:

1. You can certainly create a new pixel format. And you need just one
line of code:

struct premultiplied_rgba_t : public rgba_t {};

You can then provide color conversion to and from your new format. Since
this is a new pixel format, you don't even need a custom color
conversion object - you can simply define the default color conversion
for it.
A pixel format is just a tag, and it is trivial to create new ones when
you need to. You can also store constants and types inside it if that is
convenient for you.
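Here is a minimal sketch of what that default conversion could look
like, assuming the default_color_converter_impl customization point of
today's Boost.GIL (the interface at the time of this review may differ)
and float channels for brevity:

#include <boost/gil.hpp>

struct premultiplied_rgba_t : public boost::gil::rgba_t {};

namespace boost { namespace gil {

// Premultiplied -> straight alpha: divide each color channel by alpha.
template <>
struct default_color_converter_impl<premultiplied_rgba_t, rgba_t> {
    template <typename SrcP, typename DstP>
    void operator()(const SrcP& src, DstP& dst) const {
        float a = get_color(src, alpha_t());
        float s = (a != 0.f) ? 1.f / a : 0.f; // guard against zero alpha
        get_color(dst, red_t())   = get_color(src, red_t())   * s;
        get_color(dst, green_t()) = get_color(src, green_t()) * s;
        get_color(dst, blue_t())  = get_color(src, blue_t())  * s;
        get_color(dst, alpha_t()) = a;
    }
};

}} // namespace boost::gil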

2. Alternatively, instead of creating a custom pixel format, you could
define your own color conversion object, as described in Section 16 of
the design guide. Such a converter would treat RGBA pixels as
premultiplied; use it only when you know your image contains
premultiplied alpha.

Custom color conversion may be the better choice if you want to handle
more advanced conversions, such as those using color profiles and
transfer functions. The color conversion object is stored and is allowed
to have state.
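A minimal sketch of such a converter, assuming the operator() shape
described in the design guide and the color_converted_view overload
that takes a converter:

// Interprets its source as premultiplied RGBA and produces
// straight-alpha RGBA (body elided - it mirrors the division by
// alpha in the previous sketch).
struct premultiplied_to_straight {
    template <typename SrcP, typename DstP>
    void operator()(const SrcP& src, DstP& dst) const {
        // ... divide color channels by alpha, copy alpha through ...
    }
};

// Usage - view existing RGBA memory as straight-alpha, lazily:
//   auto v = color_converted_view<rgba8_pixel_t>(src_view,
//                                                premultiplied_to_straight());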

These two alternatives reflect a fundamental question that you must
answer every time you add new functionality: does it make more sense as
a property of the type, or as a property of the algorithm? There are
tradeoffs either way. GIL does not force one choice on you - it lets you
do either, or both.

> For example, a program or library that handles encoding/decoding of
> MPEG-4 video (non-studio profiles) has to deal with as many as 6
> variants of YUV, 4 transfer functions, and two different scales of
> sample values (without getting into things like n-bit profile). In
> addition to that, professional video production systems will also have
> to deal with a variety of linear, non-linear, and log-scale RGB
> formats. Add RGBA, and you also have to deal with whether Alpha is
> premultiplied. Combined with a few different channel orderings and
> data layouts, I fear the result is such a multiplicity of combinations
> that the core purpose of GIL's color space construct would be defeated.

Hopefully my description above addresses all these examples. You don't
need to create a custom color space for every combination of
possibilities. These variations are mostly orthogonal to each other, and
they are best addressed by GIL's correspondingly orthogonal
abstractions: custom channels and channel algorithms; custom pixels,
pixel references and iterators; custom color conversion objects; views;
and so on.

>
> Perhaps this is simply at odds with GIL's goal of uncompromising
> performance. Still, I think the library shouldn't simply exclude such
> cases. There should be ways to trade various amounts of efficiency for
> various amounts of runtime flexibility.

I believe GIL provides a variety of design alternatives for extending
functionality, each with its own tradeoff between efficiency, ease of
use, ease of implementation, and runtime flexibility.

>
> While I generally agree with Lubomir about the numerical aspects of
> promotion & conversion traits (i.e. the safe default will be overkill,
> in most cases), I do think they could have both significant runtime
> and structural advantages.

I am not arguing against traits. Using traits certainly makes coding
more convenient. All I am arguing is that the types of intermediate
results should not be hard-coded into the algorithms.

> The runtime advantages would be mostly avoidance of extra conversion
> steps and may require a per-algorithm override, so that the policy can
> be tweaked on a case-by-case basis.

I am not sure what the presence of traits has to do with avoiding extra
color conversion steps.
Perhaps by "promotion and conversion traits" we mean different concepts.
The way I see traits is as a set of metafunctions that give you the type
of an intermediate result based on the input types and perhaps other
criteria, such as the specific algorithm being used. You don't strictly
need traits - you could, in theory, pass the intermediate types
yourself. It is simply more convenient to use traits to provide suitable
defaults in cases where you don't care.
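As a purely hypothetical (non-GIL) sketch of what I mean - the
intermediate type is a template parameter whose default comes from a
traits metafunction, so the algorithm never hard-codes it:

#include <cstdint>

// Hypothetical traits: map an input channel type to a suitable
// intermediate type for accumulation.
template <typename Ch>
struct promote { using type = float; }; // conservative default

template <>
struct promote<std::uint8_t> {
    using type = std::int32_t; // plenty of headroom for 8-bit math
};

// The algorithm takes the intermediate type as a parameter, defaulted
// from the traits, so callers can override it per call site.
template <typename Ch, typename Tmp = typename promote<Ch>::type>
Tmp average(Ch a, Ch b) {
    return (static_cast<Tmp>(a) + static_cast<Tmp>(b)) / 2;
}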

> The structural advantages would come from being able to establish a
> semantic relationship between the values of different channel types.
> In order to get this right, I think channel types would rarely be POD
> types, such as int and float. Instead, an example of what you might
> use is a unique type that represents its value as a float, but which
> can have distinct traits template specializations like
> channel_traits<T>::zero_value(),
> channel_traits<T>::unity_gain_value(), and
> channel_traits<T>::saturated_value() (i.e. "white" value - often
> different from the largest value representable by the type!). I should
> concede that I haven't developed this idea very far, though it may
> provide a foundation for addressing some of the concerns raised by Dr.
> Reese.

Yes, I agree we need channel and color properties like the zero value,
the white point, etc.
One way of introducing them is as you outlined - provide custom channel
types and associate traits with them.
An alternative is to use built-in types and pass the above information
as a context to the algorithms.
This is exactly the design decision I outlined above: should it be part
of the type, or part of the algorithm? GIL does not prevent you from
making either choice.
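A hypothetical sketch of the first option (the names below are
illustrations, not GIL's existing channel_traits), using video's BT.601
studio range as the example:

#include <cstdint>

// A distinct channel type whose traits carry semantic reference points.
struct studio_luma8 { std::uint8_t v; }; // hypothetical wrapper type

template <typename T> struct my_channel_traits; // hypothetical traits

template <> struct my_channel_traits<studio_luma8> {
    static std::uint8_t zero_value()      { return 16; }  // black level
    static std::uint8_t saturated_value() { return 235; } // "white" - below the type's max of 255
};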

>
> Is there any way to create views of an image that include a subset of
> the channels (for subsets larger than 1), besides
> color_converted_view? Or is there some way color_converted_view might
> come for free, in this case (I didn't get a chance to look into that)?
> IMO, this is pretty important for processing RGBA images since, as
> Ullrich Koethe points out, it's often necessary to treat Alpha
> differently than RGB (or YUV). Same goes for Z-buffers, and a variety
> of other synthetic and captured channels one might imagine.

This should be easy to define, and we can reuse planar_ref for it.
planar_ref is a pixel model whose channels can be at different locations
in memory. It is currently used to represent a reference to a planar
pixel, but we could also use it to represent a reference to a subset of
the channels of a pixel, for example. And it works as both an l-value
and an r-value reference.
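Conceptually, something along these lines (the constructor details are
an assumption; planar_ref corresponds to planar_pixel_reference in
today's library):

#include <boost/gil.hpp>
#include <cstdint>

void demo() {
    using namespace boost::gil;
    rgba8_pixel_t p(10, 20, 30, 255);

    // Bind a 3-channel reference to the R,G,B channels of an RGBA pixel.
    planar_pixel_reference<std::uint8_t&, rgb_t> rgb_only(
        get_color(p, red_t()), get_color(p, green_t()), get_color(p, blue_t()));

    // Writes through to p's color channels; alpha stays 255.
    rgb_only = rgb8_pixel_t(1, 2, 3);
}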

>
> Finally, GIL seems to lack any explicit notion of non-uniform sample
> structures. In video, 4:2:2, 4:1:1, and 4:2:0 are ubiquitous. An image
> representation that can efficiently support uniform presentation of
> channels with different sample frequencies and phases is important for
> high-performance video processing applications. I accept that this is
> beyond GIL's intended scope, though I'm citing it as a problem I had
> hoped GIL would solve. While I believe a synthetic view could be used
> to provide 4:4:4 access to images with other sample structures, such a
> mechanism would likely constitute a severe performance/quality
> tradeoff - at least for a serious video processing application.

Again, there are a variety of ways of implementing the video formats,
varying between simplicity and performance.

One easy way is to use the virtual view abstraction: keep a handle to
the image. You will be called with given x and y coordinates, from which
you can compute the locations of the Y, Cb, Cr channels and return them.
You could use planar_ref to return them as an l-value, which allows you
to have a mutable view. (Of course, changing one component may change
components of other pixels, because chroma samples are shared.)
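A sketch of that dereference function for 4:2:0, read-only and returning
by value; the plane pointers, strides, and pixel typedef are
hypothetical:

#include <boost/gil.hpp>
#include <cstddef>
#include <cstdint>

// Stand-in 3-channel pixel; a real implementation would define a
// proper YCbCr color space instead of borrowing rgb_layout_t.
using ycbcr8_pixel_t = boost::gil::pixel<std::uint8_t, boost::gil::rgb_layout_t>;

struct ycbcr420_deref {
    using result_type = ycbcr8_pixel_t;
    const std::uint8_t *y_plane, *cb_plane, *cr_plane; // hypothetical plane pointers
    std::ptrdiff_t y_stride, c_stride;                 // bytes per row in each plane

    result_type operator()(const boost::gil::point_t& p) const {
        // Luma is stored per pixel; chroma once per 2x2 block (4:2:0).
        std::uint8_t Y  = y_plane [p.y       * y_stride + p.x];
        std::uint8_t Cb = cb_plane[(p.y / 2) * c_stride + p.x / 2];
        std::uint8_t Cr = cr_plane[(p.y / 2) * c_stride + p.x / 2];
        return result_type(Y, Cb, Cr);
    }
};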

An alternative design is to provide a custom pixel iterator that keeps
track of its position in the image.
Upon dereference it returns a planar_ref with the corresponding
channels. This abstracts the problem at a lower level, which lets us
reuse the regular locator and image view classes and is potentially more
efficient.

I have no comments regarding the rest of your review.

Thanks again for spending the time to review GIL!

Lubomir

