Subject: Re: [boost] [interval container (itl)] A pretty large bitset
From: Joachim Faulhaber (afojgo_at_[hidden])
Date: 2009-11-03 08:47:04
2009/11/3 Zachary Turner <divisortheory_at_[hidden]>:
> On Mon, Nov 2, 2009 at 11:25 AM, Joachim Faulhaber <afojgo_at_[hidden]>wrote:
>> I'd like to ask Zachary Turner to comment on the
>> use case. Can this be useful in your field?
> This is interesting and at some point I might download the library and play
> with it. Is there any easy way for me to determine the memory usage of an
> interval_map / interval_set?
interval_map / interval_set are implemented via std::map / std::set.
So for every node carrying an interval / interval-value-pair
you have three pointers (left,right,parent) and a color bit.
The interval itself needs 2*sizeof(domain_type) plus
one byte for the border information. Finally a map node
has to provide space for the value of it's codomain type.
The number of interval or interval-value nodes is
returned by the function interval_count() or iterative_size()
on all interval containers.
> It's hard to say whether this will offer an advantage in my use case because
> being that I'm interested in primarily a) which blocks on a filesystem are
> allocated, and b) which blocks on a filesystem have changed since the last
> time I checked, it depends heavily on the usage patterns of the user.
> The which-blocks-are-allocated bitset would be more likely to *not* benefit
> from this type of additional compression, or at least not significantly,
> because it would be very rare on a disk to have a significantly high
> concentration of alternating bits,
as I have already mentioned in this thread, the periodic bit compression
is a nice byproduct but should not be overemphasized. Clearly we won't
find periodic patterns very often in file allocation tables ;)
> or extremely short runs.
... hmm, I thought, due to fragmentation there could be more
narrow sequences of changing block patterns that could profit
from bitcompression ...
> The which-blocks-have-changed-since-last-time bitset is a little more
> uncertain. Obviously file access patterns aren't truly random, but let's
> just say in the worst case scenario of completely random disk access, and
> hence completely random bits being flipped from 0 to 1, how would this
> affect the memory usage of this template? I feel like this is kind of a
> difficult situation to analyze so I'm not looking for an exact O() storage
> complexity or anything, but just any insight you have to offer. My current
> solution compresses only intervals and thus also suffers in the case of
> random access, but I wonder if this solution would allow me to save space.
It would allow to save space only, if in addition to interval compression
your blocks-have-changed-patterns allow for additional bit compression:
Clusters of changed-blocks within not too big ranges, that
can be compressed into 'local' bitsets.
In the worst case scenario of randomly and sparsely distributed bits
'large_bitset' has no advantage over an itl::interval_set or your
compressed_bitset implementation (if I understood it right).
> What kind of iteration support is provided? I often need to iterate through
> each bit that is set to 1, where the value_type of the iterator is the
> 0-based index of the bit in the bitset. So for example if I have a bitset
> representing the bit string 0110110101100 I would want, or at least I would
> want to be able to easily create, an iterator that returns 1, 2, 4, 5, 7, 9,
mini::large_bitset started as a non trivial example on the usefulness
of itl::interval_maps. So this implementation
aims to be very *mini*mal. Also the more elaborated version 'interval_bitset'
is a first shot and not yet very much field tested (therefore not part
of the core library submitted for review).
With itl::interval_bitset you can currently iterate over intervals
only. I am generally reluctant to implement element iterators
for interval containers, because this would be IMO a permanent
invitation for inefficient usage. Apart from that, it wouldn't
be difficult to provide such an iterator for itl::interval_bitset.
> The other main important thing would be the ability to do in-place
> bitwise |, &, ~.
There are, even in mini::large_bitset, three different overloads for
bitwise operators (except for ~) that are performing inplace,
if the inserted range is already in the container.
As a useful addition I think about adding operators o= to
combine whole (unit-) bitsets at a given offset
interval_bitset& interval_bitset::operator o=
(const std::pair<domain_type, bitset_type>& bits_at);
> Advancing an iterator also needs to have constant-time
> complexity for me, ...
that would be no problem (apart from the reluctance problem ;)
> ... and the bitwise operators should probably be no greater
> than n log(n).
O(log(n)) for bits (elements)
O(log(n)) for unit bitsets
O(n) for intervals of elements (interval defined bitset)
O(m log(n+m)) for interval_bitsets of iterative_size n and m
Thank you for all your comments and suggestions!