Boost logo

Boost :

From: Gennaro Prota (gennaro_prota_at_[hidden])
Date: 2002-07-31 06:40:13


On Tue, 30 Jul 2002 22:30:21 -0500 (EST), in
gmane.comp.lib.boost.devel you wrote:

>Hi Gennaro,
>
>I've very happy to hear that you'd like to work on optimizing
>dynamic_bitset. I no expert bit twizzler, I just needed dynamic_bitset bad
>enough to go ahead and implement it. I'd be happy to help you get your
>optimizations into the code.
>
>On Wed, 31 Jul 2002, Gennaro Prota wrote:
>[snip]
>> My attention has mainly focuses on the performances. Again, if it's
>> too early to consider that, please let me know.
>
>This is the perfect time.
>
>> IMHO many functions of the library can be effectively optimized (for
>> instance, I suspect that the shift operators <<, >>, >>= and <<= can,
>> though I've not tried yet). Also they can be optimized in a portable
>> way.
>
>I don't doubt that. The current implementations are just the
>straightforward, easy to verify ones.
>
>> Since I was not sure whether others are interested in speeding up the
>> library now, I limited myself to one member function: count(). The
>> attached files contains two alternative implementations that use a
>> lookup table and turn out to be almost 6 times faster than the current
>> one. For instance here's an excerpt of the timings on my machine:
>>
>>
>> Timings for dynamic_bitset<unsigned int> [1000000 iterations]
>> --------------------------------------------------
>> Current implementation: 19.688 s
>> With 256-element table: 2.954 s
>> With 16-element table: 3.335 s
>>
>>
>> To avoid entangling the original code, the new stuff is in two
>> separate files: dynamic_bitset_table.hpp and table_proposal.impl. The
>> changes to dynamic_bitset.hpp have been limited to the declaration of
>> the new member functions count_tb and count_tbreduc and the #include
>> of dynamic_bitset_table.hpp.
>>
>>
>> Ok, I hear what you are all saying... how do you choose the size of
>> the table? :-) Yes, that's true. The provided solution is not fully
>> portable, because some C++ implementations may require a larger table
>> (depending on the value of CHAR_BIT) or a slightly different coding of
>> the count function (in order to break-up a CHAR_BIT-sized byte in more
>> than two parts - see the code for details) but the gain in speed is
>> remarkable, and the majority (all?) the C++ implementations currently
>> supported by boost have CHAR_BIT == 8. So I think it's worth
>> considering anyway, at least for some platforms.
>>
>> Two words on the attached code: as I said the idea (used by STLPort
>> too) is to use a table in which the element of index n is the number
>> of digits in the binary representation of n. The easiest solution is
>> too use a table of 1+UCHAR_MAX elements, but, as the tests show, it
>> can be easily reduced to 1 << ((CHAR_BIT+1)/2) elements without
>> significant loss of performance, at least on an Intel box. For
>> instance the count_tbreduc actually uses only 16 elements of the
>> table.
>>
>> On an implementation where CHAR_BIT == 9, 32 elements of the table
>> would be used. And the count of each byte would be computed as the
>> count of the lowest 5 bits + the count of the highest 4 bits.
>> If CHAR_BIT is, let's say, 32 then this "subdivision in two parts" of
>> each byte would require a table of 2 pow 16 = 65536 elements; thus a
>> solution could be to modify the code of the count_tbreduc function
>>
>> num +=
>> (
>> detail::count<>::table [*p & lowmask] // lowest (CHAR_BIT+1)/2 bits
>> +detail::count<>::table [*p >> (CHAR_BIT/2)]
>>
>> );
>>
>>
>> to use, for instance, 4 addenda.
>>
>>
>> Now, what's the situation? All the platforms currently supported by
>> boost would greatly benefit from this technique. Do we want to give it
>> up for sake of portability?
>>
>> If not, I think at least 2 options are possible:
>>
>>
>> a) providing an implementation with a 16- or 32-slot table; then using
>> conditional compilation to choose between that table implementation
>> and the portable one. If the combination table-size/count()-loop is
>> not suitable for a given C++ platform boost automatically switches to
>> the portable 1-bit-a-time code.
>>
>>
>> b) as above, but with the possibility for the user to provide his own
>> bit-count solution for a single byte. For instance changing the loop
>> above in:
>>
>> num += user_defined_count (*p);
>>
>> Again, suitable macros could be used to control whether to switch to
>> the slow solution or asking the user to provide the
>> user_defined_count() function.
>
>Solution a) sounds like the better one to me. Why don't you send me an
>update of dynamic_bitset.hpp (better grab the latest version from today)
>with count() changed using this solution. Try to make the changes such
>that it is easy for me to tell what you did via diff.

Great :) Actually I thought that b) is better: on a machine where the
provided implementation cannot be used it would resort to the slow
one. But if the user wants to improve performances he can still define
a suitable BOOST_USE_MY_BYTE_COUNT_FN_PLEASE and provide his own code.

Anyhow I'll work at it. Today I'll be busy but tomorrow I will
certainly have the time to do the changes (take into account my GMT+1
time-zone) . BTW, I have already attached the new count
implementations but the newsreader sent the attached file in a
separate message (i.e. it sent a message with the text only, and
another one with the files only). Moreover, since the message with the
files was the first to appear on my newsserver I sent all the stuff
again (thinking that the text had been stripped), so you should see
now 4 messages (yup!): those with '0/1' in the subject contain the
text and those with '1/1' contain the modified files. I'm saying this
just to make sure that you saw the code.

>
>> So far with count(). As I said it's likely that other functions, e.g.
>> the shift operators, could be optimized. If there are no objections I
>> can see whether making a two-pass shift (first pos/sizeof(Block)
>> blocks, then pos%sizeof(Block) bits) is faster. Anyone interested?
>
>Go for it :)
>
>> Now, some questions :-)
>>
>> 1. the code, it seems, does not compile on VC++. What's the intent?
>> Should it in the future?
>
>It has in the past, and should again. I'll take a look at it, and
>if you have any suggestions, let me know.
>
>> 2. the constructor
>>
>> dynamic_bitset(size_type num_bits, unsigned long value, const
>> Allocator& alloc)
>>
>> could be uneasy to use for some people (and slightly error prone)
>> because it requires to remember the order of the arguments for
>> num_bits and value. Were alternatives considered?
>
>Yes, this is a pervasive problem with C++, no named/keyword parameters :(
>I've got a sledgehammer of a solution that I used in the BGL, but I
>didn't think this function was worth that complication.
>

Yes. There are various solutions on top of my head but they all are
quite verbose for the user :-(

>> 3. Many member functions use the literals '1' and '0' where the type
>> of the character is charT. Is this really generic? (Not that I know
>> how to otherwise, but I'm wondering...)
>
>Shrug. If you have any ideas let me know.
>

As I said I don't know how to deal with the problem. But this seems
one of the areas of expertise of Dietmar Kuehl. What about asking him?
:-) This problem is IMHO quite general, since (as pointed out by Terje
Slettebø in the recent discussion about lexical_cast) it appears every
time you use literals with arbitrary character types.

Genny.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk