From: Lubomir Bourdev (lbourdev_at_[hidden])
Date: 2006-11-01 13:04:25


 

> I believe there exist component formats with the channel
> order Y'CrCb.
> However, when referring to the family of digital component video
> formats, I think the order Y'CbCr is preferable, as it matches the
> common (though less precise) Y'UV nomenclature.

OK. I am obviously not that familiar with video formats. (Also replace
v120 with v210 in my post)

> >Its dereference will return a reference to the appropriate channels.
> >For example, the fourth pixel uses:
> > For Y': bits [22..31] of the 3rd word
> > For Cb: bits [2 ..11] of the 2nd word
> > For Cr: bits [12..21] of the 3rd word
> >
> >
>
> Almost. If you just wanted a rough approximation, that would
> be better
> than nothing. However, the chroma phase for Y3 is actually half way
> between C1 and C2. Of course, using standard video interpolation
> (windowed sinc), you would need to access many chroma samples
> on either
> side.
>
> This could be used to demonstrate an interpolating iterator. But, if
> you wanted the easy way out, your example should have used
> Y0, Y2, or Y4!
> ;)

No, I want the hardest imaginable example. Bring it on!! :)

From your description I conclude that in v210 every pixel with an odd
x-coordinate has its Cr and Cb values defined as the average of those of its
logical left and right neighbors.
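
To make that rule concrete, here is a minimal sketch (plain C++, not GIL
code), assuming the 10-bit Cb samples have already been unpacked into an
array holding one value per even x coordinate:

#include <cstdint>
#include <vector>

// Hypothetical: the Cb samples already unpacked from the v210 words,
// one 10-bit value per *even* x coordinate (stored at index x/2).
std::vector<std::uint16_t> cb_samples;

std::uint16_t cb_at(int x)
{
    if (x % 2 == 0)
        return cb_samples[x / 2];       // chroma is co-sited with even pixels
    // Odd pixel: the chroma phase sits halfway between the neighboring
    // samples, so the value is simply their (rounded) average.
    return static_cast<std::uint16_t>(
        (cb_samples[x / 2] + cb_samples[x / 2 + 1] + 1) / 2);
}

The same applies to Cr.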

In this case here is how I would change my plan:

1. Drop the requirement that our model of v210 be writable. It is not
possible to accurately write into v210 pixel by pixel: writing requires
simultaneously providing the values of at least a pair of pixels that share
channels. So writing requires a GIL algorithm and cannot be done through
data abstraction alone.

2. Now that we have writing out of the way, we can create a model of an
r-value channel reference (ChannelConcept, but not MutableChannelConcept)
that takes two channels and returns the value halfway in between (or, if
generalization is important, an arbitrary interpolation). Let's call it
InterpolatingChannel.

We obviously want to use a model of InterpolatingChannel, instantiated
with two 10-bit channel references, to represent the Cr and Cb channels
of the odd pixels. We don't need it for the even pixels, and ideally we
would use a "simple" 10-bit channel reference there. The problem is that
the even and odd pixels would then have different types, and STL-style
iterators don't support changing their value type dynamically. So we
must use InterpolatingChannel for the Cr and Cb channels of the even
pixels as well; there we can simply pass the same channel for both
operands, so the interpolation collapses to a plain read.
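
Here is a rough sketch of what such a channel could look like (the names
are mine and the code is only illustrative, not GIL's actual implementation):

#include <cstdint>

// A read-only channel reference: models ChannelConcept but deliberately
// not MutableChannelConcept. It holds two underlying channel references
// and, on read, returns the value halfway between them.
template <typename ChannelRef>          // e.g. a packed 10-bit channel reference
class interpolating_channel {
public:
    using value_type = std::uint16_t;   // wide enough for a 10-bit sample

    interpolating_channel(ChannelRef left, ChannelRef right)
        : left_(left), right_(right) {}

    // Reading yields the (rounded) average of the two underlying samples.
    operator value_type() const {
        return static_cast<value_type>(
            (static_cast<value_type>(left_) +
             static_cast<value_type>(right_) + 1) / 2);
    }

    // No assignment operator: this is an r-value reference only.

private:
    ChannelRef left_, right_;
};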

The second design question is what to do with the type of the Y channel.
There are two options: represent it with InterpolatingChannel, or with a
"simple" 10-bit channel reference. The second option results in a faster
implementation, but requires writing a heterogeneous equivalent of
planar_ref (the model of a pixel whose channels are not together in
memory).
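
For the second option, the heterogeneous pixel reference could be as simple
as this (again, made-up names, just to illustrate the shape):

// A pixel reference whose channels may have different reference types:
// Y is a direct 10-bit reference into the packed word, while Cb and Cr
// use the interpolating reference above. planar_ref, as described above,
// is homogeneous; this would be its heterogeneous counterpart.
template <typename LumaRef, typename ChromaRef>
struct ycbcr_pixel_ref {
    LumaRef   y;
    ChromaRef cb;   // interpolating_channel<...>; for even pixels it is
    ChromaRef cr;   // constructed from the same stored sample twice
};

Note that with this layout even and odd pixels still have the same type;
which samples the chroma references were constructed from is decided at run
time by the iterator.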

The pixel iterator model I outlined need not change. In fact, we could
make it more generic and use it in other contexts: it is a model of a
pixel iterator for cases where the pixels within a sequence may each have a
different memory layout, but the sequence itself repeats. In the case of
v210 the sequence is 6 pixels long and the increment to the next
sequence is 16 bytes. We could reuse this class to represent a bitmap
image, where the sequence is 8 pixels long and the increment to the next
sequence is 1 byte. A bitmap image has a grayscale color space; for its
channel reference we could use the same class I outlined for the v210
channel reference, except instantiated with a bit depth of 1 instead of 10.
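
A skeleton of that iterator might look like this (illustrative only;
PixelRefFactory is a hypothetical policy that knows how to build the pixel
reference for each position in the sequence):

#include <cstddef>
#include <cstdint>

// For v210: SequenceSize = 6, SequenceBytes = 16.
// For a 1-bit-per-pixel bitmap: SequenceSize = 8, SequenceBytes = 1.
template <typename PixelRefFactory, int SequenceSize, int SequenceBytes>
class repeating_sequence_iterator {
public:
    explicit repeating_sequence_iterator(const std::uint8_t* sequence_start,
                                         int index = 0)
        : sequence_(sequence_start), index_(index) {}

    repeating_sequence_iterator& operator++() {
        if (++index_ == SequenceSize) {   // stepped past the last pixel
            index_ = 0;
            sequence_ += SequenceBytes;   // jump to the next fixed-size sequence
        }
        return *this;
    }

    // The factory picks the (possibly different) layout for this position.
    auto operator*() const { return PixelRefFactory::make(sequence_, index_); }

private:
    const std::uint8_t* sequence_;  // start of the current sequence in memory
    int                 index_;     // position within it, in [0, SequenceSize)
};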

Obviously, all of this abstraction is only necessary so we can use these
image formats in generic algorithms. However, it comes at a price in
performance: we can do a much better job by processing the entire
sequence at once rather than one pixel at a time. This could be done by
providing performance overloads for specific algorithms.
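
As a hypothetical illustration of such an overload, filling a v210 view
could write whole 4-word (6-pixel) sequences at a time instead of going
through the per-pixel abstraction (v210_view_t and its members are
assumptions made up for this example, not GIL types):

#include <cstddef>
#include <cstdint>

struct v210_view_t {                          // assumed raw access to the packed rows
    std::uint32_t* row_begin(std::ptrdiff_t y) const;
    std::ptrdiff_t height() const;
    std::ptrdiff_t sequences_per_row() const; // logical width / 6
};

// Fill every pixel by stamping out one precomputed 6-pixel (4-word) pattern.
inline void fill_pixels(const v210_view_t& view, const std::uint32_t (&pattern)[4])
{
    for (std::ptrdiff_t y = 0; y < view.height(); ++y) {
        std::uint32_t* p = view.row_begin(y);
        for (std::ptrdiff_t s = 0; s < view.sequences_per_row(); ++s, p += 4) {
            p[0] = pattern[0]; p[1] = pattern[1];
            p[2] = pattern[2]; p[3] = pattern[3];
        }
    }
}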

Alternatively, we could represent the v210 format (and bitmap) as special
single-channel images whose "pixel" corresponds to an entire sequence (6
logical pixels in the case of v210 and 8 in the case of bitmap).
That allows us to write some operations more simply and efficiently,
but such images cannot be used in traditional GIL algorithms that
pair pixels 1-to-1 between images, such as copy_and_convert_pixels,
transform_pixels, etc.
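
In that representation the "pixel" would simply be the whole 16-byte
sequence, e.g. (plain C++ sketch, not a GIL pixel model):

#include <cstdint>

struct v210_sequence {          // one "pixel" of the 6-pixel-grayscale view
    std::uint32_t words[4];     // 6 Y' + 3 Cb + 3 Cr samples, 10 bits each
};
static_assert(sizeof(v210_sequence) == 16, "one sequence per 16 bytes");

and the view's width would be the logical width divided by 6.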

Of course, GIL allows you to use both alternatives: a single-channel view
of your v210 "6-pixel-grayscale" data where that makes sense, and a more
traditional view when you need a 1-to-1 mapping with regular pixels.

Lubomir

