From: Jeremy Siek (jsiek_at_[hidden])
Date: 2002-07-30 22:30:21
I've very happy to hear that you'd like to work on optimizing
dynamic_bitset. I no expert bit twizzler, I just needed dynamic_bitset bad
enough to go ahead and implement it. I'd be happy to help you get your
optimizations into the code.
On Wed, 31 Jul 2002, Gennaro Prota wrote:
> My attention has mainly focuses on the performances. Again, if it's
> too early to consider that, please let me know.
This is the perfect time.
> IMHO many functions of the library can be effectively optimized (for
> instance, I suspect that the shift operators <<, >>, >>= and <<= can,
> though I've not tried yet). Also they can be optimized in a portable
I don't doubt that. The current implementations are just the
straightforward, easy to verify ones.
> Since I was not sure whether others are interested in speeding up the
> library now, I limited myself to one member function: count(). The
> attached files contains two alternative implementations that use a
> lookup table and turn out to be almost 6 times faster than the current
> one. For instance here's an excerpt of the timings on my machine:
> Timings for dynamic_bitset<unsigned int> [1000000 iterations]
> Current implementation: 19.688 s
> With 256-element table: 2.954 s
> With 16-element table: 3.335 s
> To avoid entangling the original code, the new stuff is in two
> separate files: dynamic_bitset_table.hpp and table_proposal.impl. The
> changes to dynamic_bitset.hpp have been limited to the declaration of
> the new member functions count_tb and count_tbreduc and the #include
> of dynamic_bitset_table.hpp.
> Ok, I hear what you are all saying... how do you choose the size of
> the table? :-) Yes, that's true. The provided solution is not fully
> portable, because some C++ implementations may require a larger table
> (depending on the value of CHAR_BIT) or a slightly different coding of
> the count function (in order to break-up a CHAR_BIT-sized byte in more
> than two parts - see the code for details) but the gain in speed is
> remarkable, and the majority (all?) the C++ implementations currently
> supported by boost have CHAR_BIT == 8. So I think it's worth
> considering anyway, at least for some platforms.
> Two words on the attached code: as I said the idea (used by STLPort
> too) is to use a table in which the element of index n is the number
> of digits in the binary representation of n. The easiest solution is
> too use a table of 1+UCHAR_MAX elements, but, as the tests show, it
> can be easily reduced to 1 << ((CHAR_BIT+1)/2) elements without
> significant loss of performance, at least on an Intel box. For
> instance the count_tbreduc actually uses only 16 elements of the
> On an implementation where CHAR_BIT == 9, 32 elements of the table
> would be used. And the count of each byte would be computed as the
> count of the lowest 5 bits + the count of the highest 4 bits.
> If CHAR_BIT is, let's say, 32 then this "subdivision in two parts" of
> each byte would require a table of 2 pow 16 = 65536 elements; thus a
> solution could be to modify the code of the count_tbreduc function
> num +=
> detail::count<>::table [*p & lowmask] // lowest (CHAR_BIT+1)/2 bits
> +detail::count<>::table [*p >> (CHAR_BIT/2)]
> to use, for instance, 4 addenda.
> Now, what's the situation? All the platforms currently supported by
> boost would greatly benefit from this technique. Do we want to give it
> up for sake of portability?
> If not, I think at least 2 options are possible:
> a) providing an implementation with a 16- or 32-slot table; then using
> conditional compilation to choose between that table implementation
> and the portable one. If the combination table-size/count()-loop is
> not suitable for a given C++ platform boost automatically switches to
> the portable 1-bit-a-time code.
> b) as above, but with the possibility for the user to provide his own
> bit-count solution for a single byte. For instance changing the loop
> above in:
> num += user_defined_count (*p);
> Again, suitable macros could be used to control whether to switch to
> the slow solution or asking the user to provide the
> user_defined_count() function.
Solution a) sounds like the better one to me. Why don't you send me an
update of dynamic_bitset.hpp (better grab the latest version from today)
with count() changed using this solution. Try to make the changes such
that it is easy for me to tell what you did via diff.
> So far with count(). As I said it's likely that other functions, e.g.
> the shift operators, could be optimized. If there are no objections I
> can see whether making a two-pass shift (first pos/sizeof(Block)
> blocks, then pos%sizeof(Block) bits) is faster. Anyone interested?
Go for it :)
> Now, some questions :-)
> 1. the code, it seems, does not compile on VC++. What's the intent?
> Should it in the future?
It has in the past, and should again. I'll take a look at it, and
if you have any suggestions, let me know.
> 2. the constructor
> dynamic_bitset(size_type num_bits, unsigned long value, const
> Allocator& alloc)
> could be uneasy to use for some people (and slightly error prone)
> because it requires to remember the order of the arguments for
> num_bits and value. Were alternatives considered?
Yes, this is a pervasive problem with C++, no named/keyword parameters :(
I've got a sledgehammer of a solution that I used in the BGL, but I
didn't think this function was worth that complication.
> 3. Many member functions use the literals '1' and '0' where the type
> of the character is charT. Is this really generic? (Not that I know
> how to otherwise, but I'm wondering...)
Shrug. If you have any ideas let me know.
Jeremy Siek http://www.osl.iu.edu/~jsiek/
Ph.D. Student, Indiana Univ. B'ton email: jsiek_at_[hidden]
C++ Booster (http://www.boost.org) office phone: (812) 855-3608
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk