Boost logo

Boost :

From: Nathan Myers (ncm-yahoospam_at_[hidden])
Date: 2002-01-01 00:19:52


joel de guzman wrote:
> From: "Nathan Myers" :
> > Joel's implementation ... uses a sorted
> > vector of ranges, rather than a map indexed on the first element. As a
> > result, insert and erase operations involve block moves, but memory usage
> > is tight, where in extent_set<> everything involves tree operations.
> > Joel's uses reference counting, with copy-on-write to enable efficient
> > copy/assignment.
> >
> > The important differences are more subtle. Joel's interface emphasizes
> > construction, copying, combining and membership tests; extent_set<>
> > emphasizes inserting and erasing ranges, and scanning the ranges present.
> > ...
> The actual implementation has two layers. You forgot to mention
> the lower layer which is the actual range_run and range classes.
> The range_run class (which is the real engine) has only a few
> member functions. Most importantly, 1) test, 2) set 3) clear.
> test tests for membership. set and clear, ummm, sets and clears
> a range. It also exposes the range iterators. This class is not ref-
> counted. This class is standalone and is not tied to any parser code
> at all. I intend this to be the basis of a more generalized submission.

Thank you, Joel, for your patience with me.

I hadn't thought of them as separable components; I got thrown off
by the reference counting. The range_run<> type is indeed very similar
to extent_set<>, and more complete.

> I may be wrong, but based on what I saw initially, extent_set does
> not allow insertion and removal of overlapping ranges. The test
> program does not perform any. For instance, is it possible to
> do something like this?
>
> s.set(range(0, 100)); // result = [0..100]
> s.add(range(200, 300)); // result = [0..100] [200,300]
> s.set(range(50, 150)); // result = [0..300]
>
> or given the current state: [0..10] [20..30] [40..50] [60..70]
>
> s.set(range(0, 100)); // result = [0..100]
>
> and this?
>
> set.set(range(min_int, max_int)); // min_int..max_int
> set.clear(range(-100,100)); // [min_int..-100] [100..max_int]

Such functionality would be easy to add to extent_set<>, but would involve
a loop to implement. Maybe the loop would only run one cycle or be skipped
entirely in the non-overlapping case. However, it would not be quite so
elegant :-).

>> The preconditions radically simplify the logic of the insert and erase
>> members. They correspond, logically, to the requirement the same memory
>> not be passed to free() twice. More general functions that don't impose
>> such preconditions must contain a loop. They would be easy to add (along
>> with general set union, intersection, and complement-intersection), and I
>> expect that will happen eventually, but they would not be a substitute for
>> the more basic operations.
>
> If I read this correctly, Spirit's low level range_run class might be the
> eventual result you are looking for?
>
> BTW, your other concerns e.g. less-than < vs. generic compare is of
> course trivial to implement. Finally, I am still not sure if a map is a good
> choice, performance wise, nevertheless, a higher level class might
> use a policy to choose between the vector/map implementation.

I agree, we could merge range_set and extent_set with probably minimal
upset to either. A policy argument for whether to use a vector or a map
to represent it, to trade off space against update speed, also seems
workable, in principle.

An important difference is that with a tree representation, incrementing
or decrementing the iterator can be pretty expensive, so the algorithms
need to be careful not to do it unnecessarily. (That is why extent_set<>
stores its extents in reverse order; map<>::lower_bound gives the right
answer, then.)

A survey...

Judging from the postings by Darin, Howard, and Dave, this represents one
of a family of very similar interesting structures. What is common to
them all is not so much their range interface, but rather that they
interpolate coverage of their domain and/or range. They store one
or both endpoints depending on whether they are continuous, and use
a slope of one or zero. (Dave suggested allowing other slopes; this
would allow mapping from, e.g., block number to byte offset. He also
mentioned bidirectional mapping. That might be hard to shoehorn into
a common structure.)

Darin's IncrementalRangeMap, my extent_map<>, and Howard's range_map<>
are almost the same.

Darin's RangeMap differs from the other examples in having a 0 slope; it
could tolerate a non-numeric type in its, er, range. RangeSet, extent_set,
and range_run have what amounts to a boolean, um, range. In all four
cases arithmetic on the domain type is limited to, at most, incrementing
or decrementing.

Treatment of argument domains that overlap existing mappings varies from
"Precondition: not", with elegantly minimal operator code, to "anything
goes". I see no reason not to have both interfaces; the difference is
usually more a matter of preference -- minimalism vs. generality -- than
of semantics or performance.

(BTW, I find the "range" term difficult to use around mapping primitives,
given the existing domain/range usage in the same context. That's another
part of why I have preferred to say "extent".)

It's not clear to me whether all these should be unified under a policy
parameter, or just presented as a family of components. Probably the
latter, although they might be able to share some implementation code.
in the way that std::set<> and std::map<> do.

Dave's cumulative extent tree, a 1D case of Beman's RTree and
Reid's 3D structures, is another productive area to explore.

Happy New Year to all,

Nathan Myers
ncm at cantrip dot org


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk