Boost logo

Boost :

From: joel de guzman (djowel_at_[hidden])
Date: 2001-12-30 15:44:17


----- Original Message -----
From: "Nathan Myers" :

> "joel de guzman" wrote:
> > From: "Nathan Myers" :
> > >
> > > A bit set implementation (sparse or otherwise) wouldn't be useful in my
> > > application. It will be easy to add general set operations to extent_set
> > > if anybody finds them valuable; in my application the greater efficiency
> > > of the more-restricted interface matters, and the preconditions matter not
> > > at all.
> >
> > The sparse bit set implemented as range-runs is a superset of
> > the extent set. Internally it's almost alike. I haven't checked but
> > performance wise, the extent set shouldn't be much faster than
> > the range-run in the average case especially if you are not inserting
> > or removing overlapping ranges anyway. I haven't done some profiling
> > though and perhaps I am wrong.
>
> Now that I have had an opportunity to read Joel's code, I can comment
> further. BTW, it was not where he said, but here instead:
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/spirit/spirit/boost/spirit/impl/chset.ipp
>
> Joel's implementation special-cases char, signed char, and unsigned char,
> and uses a bitmap representation for them. For others, it uses a sorted
> vector of ranges, rather than a map indexed on the first element. As a
> result, insert and erase operations involve block moves, but memory usage
> is tight, where in extent_set<> everything involves tree operations.
> Joel's uses reference counting, with copy-on-write to enable efficient
> copy/assignment.
>
> The important differences are more subtle. Joel's interface emphasizes
> construction, copying, combining and membership tests; extent_set<>
> emphasizes inserting and erasing ranges, and scanning the ranges present.
> These differences are appropriate to their usage domains; Joel's for
> representing character class membership, extent_set<> for resource
> allocation. Joel's is a higher-level structure that could easily use
> extent_set<> in its implementation, given a few more operators.
>
> Along another dimension, Joel's uses native operators (e.g. less than)
> where mine is parameterized on these operations like std::map<> itself.
>
> I don't think we should let similarities in implementation details
> distract us from what a component is for, and how well it serves that
> purpose. Joel's design is highly evolved to be useful to aid parsing.
> My extent_set<> is more primitive, in the zoological sense, and
> lower-level, and potentially more generally applicable as it matures.

Hi Nathan,

The actual implementation has two layers. You forgot to mention
the lower layer which is the actual range_run and range classes.
The range_run class (which is the real engine) has only a few
member functions. Most importantly, 1) test, 2) set 3) clear.
test tests for membership. set and clear, ummm, sets and clears
a range. It also exposes the range iterators. This class is not ref-
counted. This class is standalone and is not tied to any parser code
at all. I intend this to be the basis of a more generalized submission.

I may be wrong, but based on what I saw initially, extent_set does
not allow insertion and removal of overlapping ranges. The test
program does not perform any. For instance, is it possible to
do something like this?

    s.set(range(0, 100)); // result = [0..100]
    s.add(range(200, 300)); // result = [0..100] [200,300]
    s.set(range(50, 150)); // result = [0..300]

or given the current state: [0..10] [20..30] [40..50] [60..70]

  s.set(range(0, 100)); // result = [0..100]

and this?

    set.set(range(min_int, max_int)); // min_int..max_int
    set.clear(range(-100,100)); // [min_int..-100] [100..max_int]

You mentioned that:

<quote>
The preconditions radically simplify the logic of the insert and erase
members. They correspond, logically, to the requirement the same memory
not be passed to free() twice. More general functions that don't impose
such preconditions must contain a loop. They would be easy to add (along
with general set union, intersection, and complement-intersection), and I
expect that will happen eventually, but they would not be a substitute for
the more basic operations.
</quote>

If I read this correctly, Spirit's low level range_run class might be the
eventual result you are looking for?

BTW, your other concerns e.g. less-than < vs. generic compare is of
course trivial to implement. Finally, I am still not sure if a map is a good
choice, performance wise, nevertheless, a higher level class might
use a policy to choose between the vector/map implementation.

Cheers,
--Joel


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk