Boost logo

Boost :

Subject: Re: [boost] boost::compressed_bitset
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-06-11 08:58:11


On Thu, Jun 11, 2009 at 1:52 AM, Scott McMurray <me22.ca+boost_at_[hidden]>wrote:

> 2009/6/10 Jeffrey Bosboom <jbosboom_at_[hidden]>:
> >> I almost exclusively use this structure in a way that interval nodes
> >> represent sequences of 1 bits, but it can just as easily be used the
> other
> >> way around, where intervals represent sequences of 0 bits. In fact now
> >> that
> >> I think about it probably doesn't even matter.
> >
> > Since it will use less space if the bitset is sparse with occasional 0 or
> 1
> > bits, this sounds like a good choice for a nontype template parameter.
> > Obviously the user could just invert the sense of the bits themselves
> > (instead of representing presence in a set, it could represent "lack of
> > absence"), but a template parameter would be more user-friendly. A
> > converting constructor could be provided for converting between the two.
> >
>
> I'm not sure it's worth the complexity. It's usually not that hard to
> invert a condition in user code, since renaming the variable from
> visited to pending, free to allocated, or similar makes it just about
> as readable.
>
> I remember hearing about an interval template library; Is there any
> possible useful overlap here? Also, at first glace I thought you were
> talking about what I'd call a sparse_bitset, where the storage is an
> optimization of a map<offset, bitset<128> >. (Specifically I'm
> thinking of the LLVM class described here:
> http://llvm.org/docs/ProgrammersManual.html#dss_sparsebitvector)
> Obviously the interval method is more efficient for long runs of ones,
> but the offset one is much better for short runs. Can you elaborate
> more on your usage? Should boost maybe provide both?
>
> And out of curiosity, did you compare the T-tree to other trees?
> Perhaps there's also an intrusive::t_tree hiding in here =)

To be honest I don't even know a whole lot about T-Trees. I just had come
up with this idea for a compressed bitset and then started searching around
to see if similar data structures had ever been invented, and then I found a
t-tree which seemed to be reminiscent.

I will take a look at the sparsebitvector. It does seem quite similar just
from how you described as a map<offset, bitset<128> >. The first thing I
wonder though (without having read the link yet), is why does the value need
to be an entire bitset?. Perhaps they are also trying to make it efficient
in the case where the bitset is "almost" sparse, but there are occasional
sequences of interleaved 0s and 1s. That's an interesting idea I hadn't
thought of. In any case it does seem like there is some potential for
overlap so I'll read the link in a little bit.

The two usage scenarios I can give you off the top of my head are one that
I'm using it for, and one that the Windows Kernel uses the same data
structure for. I use it to store a filesystem allocation bitmap. For every
block on the filesystem (where the block size can be 1KB, 4KB, etc depending
on user settings), there is 1 bit to say whether the block is allocated or
unallocated. If you work out the math, for a 1TB drive you can end up with
a bitset taking up 33MB of data if you use an uncompressed bitset. But on
the other hand, massive storage capacity can often be reminiscent of a RAID
array, which is likely to be defragmented. And of course most people in
general tend to do some sort of defragmentation pretty often. Although even
if that were not the case, typical filesystem allocation patterns even for
heavily fragmented drives tend to have lots of runs of 1s or 0s.

The usage in the Windows kernel is similar, although they use it to store
what they call the "VAD tree" (virtual address descriptor). For every
process running on the system, there is a 4GB or larger address space
(depending on architecture). They use this structure to determine which
parts of this 4GB address space are allocated and which are not, so they can
quickly satisfy allocation requests. It's likely that linux uses something
similar but I don't know much about linux kernel.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk