Boost logo

Boost :

From: joel de guzman (isis-tech_at_[hidden])
Date: 2001-06-21 10:30:39


Hello,

More on sparse bit vectors...

[snip]

[Joel]
> > -> Macintosh style graphical regions. A vector of sparse bit
> > vectors can represent something like Regions in the Mac.
> > in fact the Mac-Region inspired this implementation. [ I
> > am aware that this is patented. I don't know if my
> > implementation infringes on this patent. But I am also
> > aware that this patent will expire soon (now)? ]

[Joel]
I've always used Mac-Regions as sets before. I sorely miss it
when doing cross-platform programming.

[Dave]
I think the Mac implementation uses a some kind of difference encoding from
row-to-row, so that, e.g., a gray halftone (checkerboard) is actually very
cheap to encode. If you're not doing that, you probably don't infringe.

[Joel]
I did some Web search:

<quotes>

The region patent probably expires sometime between 1998 and 2000, but
I have a feeling that Apple won't be too unhappy with you for using the
info to write Mac programs before then.

--
     Gavriel State | Lead Developer - CorelDRAW! for Macintosh
It is patented, I've read the patent.  I don't recall the patent # (this was
several years ago),
but if you find it you can order a copy from the US Patent office. So it is
not secret. Look
in IM QuickDraw, the patent number should be there somewhere.
The format is rle, but with a couple of strange twists.  There are two
cases:
1) Trivial case: rectangle represented by a size of 10 and the rectangle in
the Region.rbox field.
2) Complex case, size > 10.  The data is a series of shorts in the following
format
y x x x 7fff
y x x x 7fff
7fff
where y is the current scanline, x is the horizontal offset and 0x7fff is
the end of line/ end of data marker.
What's tricky here, (and the key to the patent), is that x does not simply
invert the current line,
but rather x inverts everything down and to the right of the point.
This is working from memory so there may be a few differences in the
details.  Check a couple
of samples against this (overlapping rectangles is a good one to try).
--
Jerry Whitnell                               First Virtual Corp
jerry_at_[hidden]
</quotes>
[Joel]
I guess Dave is right. My implementation does not infringe
on the patent.
I'm unsure why they chose an end marker instead of a length
prefix. How could they do per-row bsearch for hit testing with
this format? Also, all data is placed in a contiguous memory
block. Hit testing requires linear pass over the Y lines!
Well at least Apple's regions are totally ADTs. The only visible
part is the region bbox and memory size. Might they
have changed the representation by now?
[Greg]
> There are other ways to do sparse bit vectors.  My favorite
> is a very compact binary tree representation that uses on
> average two bits for every "on" bit and no bits for the
> "off" bits.  It's not patented so far as I know.  If it is
> I know of plenty of prior art.
[Joel]
There are pros and cons for both techniques. The
range-run technique has very nice properties especially
suitable for many real world applications. I think the
reason is because it is very natural to construct sets from
ranges. And this particular data structure representation
excels when defining ranges.
A set of all unicode Japanese characters (Kanji, Katakana
and Hiragana) for example connot be trivially represented
with the binary tree scheme that you mentioned. This will
occupy lots of memory and consequently have a slower
insertion and search time. I am not sure about Kanji*, but
Katakana and Hiragana occupy a single contiguous range.
I'd say, at the most, there are not more than 5 ranges in
there (correct me if I'm wrong, I don't have the Unicode
charts with me right now).
 (*Japanese Kanji and the Chinese characters overlap)
OTOH, the binary tree representation excels when the
data is randomly distributed such that there are lots
of discontinuities. Would it be possible to have both and
dynamically switch to an algorithm/data structure based
on the resulting size/complexity? Hmmm.
Cheers,
Joel de Guzman
PS> I am also very much interested with floating point range-
runs. Here, a step is 1 epsilon unit instead of 1 for integers.
This I guess is not possible (or at least portable) with b-tree
representations.
PPS> I wonder why I can't find data structures and algorithms
like this in books? None of my algorithm books have anything
close. Or am I just plain ignorant again? Perhaps~~~

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk