Boost logo

Boost :

From: RGProlog_at_[hidden]
Date: 2003-10-07 17:56:35


Is there any interest in a class implementing compression algorithm for
transition sparse bitsets. As an example of the algorithm's utility let's say we
want to create a uniform minute by minute record of weekly attendance applicable
to all of the existing web users worldwide. The user's record thus conceived
must cover all, let's say 10,000,000 of the existing web sites and will be 60
* 24 * 7 * 10,000,000 or 100,800,000,000 bits long. A simple observation that
makes this proposition feasible, though, is that first, rather than storing a
given user's bitset attendance record, what is sufficient for us to store
instead is the user's logon and logoff times for the websites the user attended
which in turn are represented by the bitset's transitions and second the number
of transitions in the bitset is subject to the user's physical ability to
attend only that many Web sites and no more. Thus the bitset's transition count,
which in the example at hand may run in hundreds at most, is small relative to
the bitset's mega size. The fast proliferating sources of information, be it
the Internet or in any other mass medium, combined with the user's ability to
absorb but a tiny part of the information available plus the necessity to
research and monitor global flow of information, conditions that are not likely to
change, makes transition sparse bitsets arguably the most important and
ubiquitous bitsets in the existence. The ts_bitset class stores a bitset of variable
length as a std::vector of the bitset transitions, implements the relevant
part of the std::bitset functionality and extends it to the subintervals of the
bitset. For example std::bitset::count() function has
ts_bitset::count(pos1,pos2) counterpart which returns in O(n) time, n being the bitset's transition
count, and std::bitset::any() function has a ts_bitset::any(pos1, pos2)
counterpart which returns in Log(n) time. The ts_bitset class also implements
operators &=, |=, and ^= that act on two equal size bitsets and which, along with
ts_bitset member functions like reverse () or flip (), were not speed wise
feasible for std::bitset implementation. All of the ts_bitset member functions return
in O(n) time (O(m + n) time when 2 bitsets are involved) or faster.



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk