Subject: Re: [boost] [interval container (itl)] A pretty large bitset
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-11-03 01:46:51
On Mon, Nov 2, 2009 at 11:25 AM, Joachim Faulhaber <afojgo_at_[hidden]>wrote:
> I'd like to ask Zachary Turner to comment on the
> use case. Can this be useful in your field?
> Moreover I'd like to encourage everyone to give feedback
> on my interval container library, report problems, bugs
> and share suggestions. If you find this stuff useful
> please volunteer as review manager for the submitted
> library Boost Interval Containers.
> Note, that the new example on 'large_bitsets'
> is not yet included in the library submitted for review
> in the vault but it is contained in the sandbox.
> I am currently preparing an update of the review version of
> the itl, that is adjusted and corrected for the feedback,
> suggestions and patches that I have received during the
> last weeks. I will integrate further feedback and account
> for bug reports if you have any.
This is interesting and at some point I might download the library and play
with it. Is there any easy way for me to determine the memory usage of an
interval_map / interval_set?
It's hard to say whether this will offer an advantage in my use case because
being that I'm interested in primarily a) which blocks on a filesystem are
allocated, and b) which blocks on a filesystem have changed since the last
time I checked, it depends heavily on the usage patterns of the user.
The which-blocks-are-allocated bitset would be more likely to *not* benefit
from this type of additional compression, or at least not significantly,
because it would be very rare on a disk to have a significantly high
concentration of alternating bits, or extremely short runs.
The which-blocks-have-changed-since-last-time bitset is a little more
uncertain. Obviously file access patterns aren't truly random, but let's
just say in the worst case scenario of completely random disk access, and
hence completely random bits being flipped from 0 to 1, how would this
affect the memory usage of this template? I feel like this is kind of a
difficult situation to analyze so I'm not looking for an exact O() storage
complexity or anything, but just any insight you have to offer. My current
solution compresses only intervals and thus also suffers in the case of
random access, but I wonder if this solution would allow me to save space.
What kind of iteration support is provided? I often need to iterate through
each bit that is set to 1, where the value_type of the iterator is the
0-based index of the bit in the bitset. So for example if I have a bitset
representing the bit string 0110110101100 I would want, or at least I would
want to be able to easily create, an iterator that returns 1, 2, 4, 5, 7, 9,
10. The other main important thing would be the ability to do in-place
bitwise |, &, ~. Advancing an iterator also needs to have constant-time
complexity for me, and the bitwise operators should probably be no greater
than n log(n). O(n) would be ideal on the bitwiswe operators, but I don't
think my current implementation supports that due to the need to do interval
merging and balancing.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk