Boost logo

Boost :

Subject: [boost] boost::compressed_bitset
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-06-10 21:57:33


I know there's over a zillion bitset implementations, but often times I have
found myself in need of a bitset that is efficient (space-wise) at storing
millions, billions, or even trillions (no joke) of bits. For this I have
developed an in-house class which stores bits in a structure vaguely
reminiscent of a "T-Tree" (http://en.wikipedia.org/wiki/T_tree). You can
think of it basically as an std::map<boost::uint64_t, boost::uint64_t> where
<key, value> represents the *interval* of bits [key, key+value-1]. So it's
basically run-length encoded, with logarithmic lookups instead of linear
lookups. Obviously the worst case scenario is where bits alternate 10101...
throughout the entire bitset, and this structure will be terrible in that
case. But where there are frequent runs of 0 and 1 bits, this gives orders
of magnitude savings in space.

Since a map is sorted on key, it's still a valid binary search tree and one
can find the node for the k'th bit by map::lower_bound(k) and quickly check
whether the bit is set by testing whether k is in [iter->first ,
iter->first+iter->second). The most difficult part is insertion / deletion
since it involves splitting (or merging) intervals and rebalancing but the
complexity is still no worse than that of a normal rebalancing avl tree.

I almost exclusively use this structure in a way that interval nodes
represent sequences of 1 bits, but it can just as easily be used the other
way around, where intervals represent sequences of 0 bits. In fact now that
I think about it probably doesn't even matter.

Performing bitwise operations on such a structure is very fast. All you
need to do is expand / reduce the intervals appropriately. Lookup /
assignment is obviously not constant time but of course that is the
tradeoff. Forward / reverse iteration is still very fast however as you
only need to take the non-constant time hit once on begin(). It is also
very fast to iterate only 1 bits or only 0 bits for the same reason, which
can be quite slow with normal bitsets.

The implementation I currently have is very unsuitable for boost for many
reasons, but if there is enough interest I will (slowly) work on adapting
this to be more generic / usable by a wider audience, of course with
suggestions and input from the community.

Zach


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk