Subject: Re: [boost] boost::compressed_bitset
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2009-06-11 12:11:05


2009/6/11 Zachary Turner <divisortheory_at_[hidden]>:
>
> I will take a look at the sparsebitvector.  It does seem quite similar just
> from how you described it as a map<offset, bitset<128> >.  The first thing I
> wonder though (without having read the link yet), is why does the value need
> to be an entire bitset?  Perhaps they are also trying to make it efficient
> in the case where the bitset is "almost" sparse, but there are occasional
> sequences of interleaved 0s and 1s.  That's an interesting idea I hadn't
> thought of.  In any case it does seem like there is some potential for
> overlap so I'll read the link in a little bit.
>

Well, by the time you have a map<intptr_t, T>, the size of a node is
at least three pointers plus the storage for T, so for alignment
reasons you might as well store at least a pointer's worth of bits in
it, since it's free to do so. And then it's quite tempting to store
more, since tripling the number of bits per node costs at most 50%
extra memory the first time, and can actually reduce memory use in
some cases.

And on some platforms, with the way new is implemented, using the
same amount of storage for bits as for the allocator and node
overhead actually lets you store something like 8 pointers' worth of
bits per node.

> The two usage scenarios I can give you off the top of my head are one that
> I'm using it for, and one that the Windows Kernel uses the same data
> structure for.  I use it to store a filesystem allocation bitmap.  For every
> block on the filesystem (where the block size can be 1KB, 4KB, etc depending
> on user settings), there is 1 bit to say whether the block is allocated or
> unallocated.  If you work out the math, for a 1TB drive you can end up with
> a bitset taking up 33MB of data if you use an uncompressed bitset.  But on
> the other hand, massive storage capacity often means a RAID array,
> which is likely to be kept defragmented.  And of course most people
> tend to do some sort of defragmentation pretty often.  Even
> if that were not the case, typical filesystem allocation patterns even for
> heavily fragmented drives tend to have lots of runs of 1s or 0s.
>

Ah, I see. The block allocator will already be trying pretty hard to
avoid the pathological case for the range version. Smart.
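
For anyone following along, here's a quick sanity check of that 33MB
figure (assuming a 1 TiB volume with 4 KiB blocks; other block sizes
just scale it accordingly):

// Back-of-the-envelope check of the quoted figure, assuming a
// 1 TiB volume and 4 KiB blocks (both assumptions).
#include <cstdint>

constexpr std::uint64_t volume_bytes = 1ull << 40;                   // 1 TiB
constexpr std::uint64_t block_bytes  = 4 * 1024;                     // 4 KiB
constexpr std::uint64_t block_count  = volume_bytes / block_bytes;   // 2^28 blocks
constexpr std::uint64_t bitmap_bytes = block_count / 8;              // one bit per block

static_assert(bitmap_bytes == (1ull << 25),
              "32 MiB, i.e. roughly 33 decimal MB, uncompressed");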

LLVM's version uses a linked list, keeping an iterator to an internal
element so adjacent requests can be O(1). I wonder whether an
intrusive::splay_tree might make an interesting backend for
compressed_bitset for similar reasons?
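
To illustrate the cached-position trick: this is not LLVM's code and
not a splay tree, just a minimal sketch of the "remember where we
were" idea on top of std::map, with made-up names (sparse_bits,
find_chunk). Consecutive operations on the same chunk reuse the
cached iterator instead of searching the tree again.

#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

class sparse_bits
{
    typedef std::map<std::uintptr_t, std::uintptr_t> chunk_map;
    static const std::size_t bits_per_chunk = sizeof(std::uintptr_t) * 8;

    chunk_map           chunks_;
    chunk_map::iterator hint_;   // last node touched

    chunk_map::iterator find_chunk(std::uintptr_t chunk)
    {
        if (hint_ != chunks_.end() && hint_->first == chunk)
            return hint_;                      // O(1) for adjacent requests
        return hint_ = chunks_.lower_bound(chunk);
    }

public:
    sparse_bits() : hint_(chunks_.end()) {}

    void set(std::uintptr_t i)
    {
        std::uintptr_t chunk = i / bits_per_chunk;
        chunk_map::iterator it = find_chunk(chunk);
        if (it == chunks_.end() || it->first != chunk)
            it = chunks_.insert(it, std::make_pair(chunk, std::uintptr_t(0)));
        it->second |= std::uintptr_t(1) << (i % bits_per_chunk);
        hint_ = it;                            // remember where we ended up
    }

    bool test(std::uintptr_t i)
    {
        std::uintptr_t chunk = i / bits_per_chunk;
        chunk_map::iterator it = find_chunk(chunk);
        return it != chunks_.end() && it->first == chunk
            && ((it->second >> (i % bits_per_chunk)) & 1) != 0;
    }
};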

Not that this helps in any way, but the whole "storing a run of one
bits as one element" idea reminds me of segmented binary numbers,
which I first saw in Okasaki's thesis.

~ Scott

