Boost logo

Boost :

From: Tobias Schwinger (tschwinger_at_[hidden])
Date: 2007-08-09 05:12:25


Hi,

just some comments, having discussed some related material just
yesterday (note I'm *not* an author of the library).

Rei Thiessen wrote:
> Hello,
>
> (my version of the library is intrusive.2007-03-04.zip. Is there a more
> recent version?)
>
> I have three suggestions:
>
> 1) The containers uses a modulo operation to map the hash value range into
> the bucket index domain. When using a bucket size that is an order of 2, the
> excess cost of the div instruction is unneeded. This was a critical issue
> for me.

Interesting. Note, that this optimization does not pay off in general:
Worse distribution (no avalanche effect because bits are cut off) ==>
more probing ==> more data cache misses.

> There can be a new templated type parameter for specifying a functor that
> takes the hash output value and the bucket size as inputs and outputs the
> bucket index. The default type of this template parameter can simply be
> std::modulus.

Please, no more type template parameters! Add another type member to the
traits of that container and keep the primary interface simple, instead.

>
> 2) Each container stores the bucket size. This is too an unneeded excess in
> certain circumstances. If the containers had a (optional) templated constant
> parameter that determined its bucket size, then the container memory
> footprint can be reduced at the expense of bloating the code segments with
> multiple instantiations. This memory saving is crucial when deploying
> numerous small hash container objects.

1) and 2) can nicely be combined (see below).

> This new templated constant parameter has interface ramifications. We would
> require two specializations of the container class: the run-time bucket size
> specialization is what we have today; for the compile-time bucket size
> container, the constructors would not have "size_type buckets_len"
> parameters.

Well, not necessarily:

     template< size_t BucketSize >
     class constant_bucket_size
     {
       public:

         constant_bucket_size()
         { }

         size_t get_bucket_size() const
         { return BucketSize; }

         size_t modulate(size_t value)
         {
           // metaprogram that decides whether to use & or %
         }
     };

     class dynamic_bucket_size
     {
         size_t val_bucket_size;
       public:

         /* implicitly converting */
         dynamic_bucket_size(size_t bucket_size)
         { }

         size_t get_bucket_size() const
         { return this->val_bucket_size; }

         size_t modulate(size_t value)
         {
           return value % thos->val_bucket_size;
         }
     };

     template // ...
     class container
     // ...
     {
       public:
         typedef typename Traits::bucket_size bucket_size;

         container(... , bucket_size const& bs =
             bucket_size() );

         void set_bucket_size(bucket_size const& bs);
     };

If 'dynamic_bucket_size_policy' is configured in the container's traits,
the ctor and the 'set_bucket_size' member function accept an integer. If
'constant_bucket_size_policy' is used an attempt to use an integer will
yield a compile error. EBCO can be exploited for a zero-overhead
implementation.

> 3) Now, if we add another templated type parameter for specifying a functor
> that takes the object 'this' pointer as input and outputs the memory address
> of the bucket array, then we can reduce the memory cost of each container to
> zero.

That's pretty much "non-intrusive Intrusive", again, which we have been
discussing yesterday ;-).

So 3) can probably be generalized further and applied to all containers
(see yesterday's posts).

BTW: Zero-sized objects do not exist in C++ (EBCO allows empty
subobjects to be mapped to the same address and thus take up no extra
space - a concrete object, however, takes up at least one byte, so
pointer comparison can be used for checking object identity).

> Combining 2) and 3), we can develop a single concept: As an example...

<code>

Why are you assuming that a constant bucket size should always be a
power of two?

Regards,
Tobias


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk