From: Dave Harris (brangdon_at_[hidden])
Date: 2002-11-17 11:28:34


In-Reply-To: <01C28DC2.8F345AA0_at_[hidden]>
From the headers...
> typedef unsigned char version_type; // upto 255 versions
> namespace serialization_detail {
> typedef unsigned short class_id_type; // upto 64k kinds
> // of objects
> typedef int object_id_type; // upto 2G objects
> }

It seems to me these limits are arbitrary, and in some cases rather low.
Wouldn't it be better, and more general, to use int or long?

On a related note, I think variable length integers ought to be supported
as primitive. For example, consider something like:

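    void basic_oarchive::save_vri( unsigned long x ) {
        bool more_to_come = true;

        while (more_to_come) {
            // Take the next 7 bits, least significant group first.
            unsigned char low_bits = x & 0x7f;
            x >>= 7;
            more_to_come = (x != 0);
            unsigned char high_bit = more_to_come ? 0x80 : 0x00;
            // Cast so that one byte is written rather than a promoted int.
            *this << static_cast<unsigned char>(high_bit | low_bits);
        }
    }

    unsigned long basic_iarchive::load_vri() {
        unsigned long x = 0;
        unsigned shift = 0;
        bool more_to_come = true;

        while (more_to_come) {
            unsigned char bits;
            *this >> bits;
            // Groups arrive least significant first, matching save_vri.
            x |= static_cast<unsigned long>(bits & 0x7f) << shift;
            shift += 7;
            more_to_come = (bits & 0x80) != 0;
        }
        return x;
    }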

This encodes an unsigned integer as a variable number of bytes, least
significant bits first. The low 7 bits of each byte contribute to the
number, and the high bit says whether there are more bytes to come.
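
As a standalone illustration of that byte layout, here is a minimal sketch
that can be compiled outside the archive classes. The free functions and the
std::vector buffer are stand-ins chosen for the example, not part of the
library, and the loader does not guard against malformed input with too many
continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-function version; the byte vector stands in for the archive.
// Groups of 7 bits are appended least significant first.
void save_vri(std::vector<unsigned char>& out, unsigned long x) {
    do {
        unsigned char low_bits = static_cast<unsigned char>(x & 0x7f);
        x >>= 7;
        // A set high bit means at least one more byte follows.
        out.push_back(x != 0 ? static_cast<unsigned char>(low_bits | 0x80)
                             : low_bits);
    } while (x != 0);
}

// Reads one value starting at pos and advances pos past it.
unsigned long load_vri(const std::vector<unsigned char>& in, std::size_t& pos) {
    unsigned long x = 0;
    unsigned shift = 0;
    unsigned char bits;
    do {
        assert(pos < in.size());  // truncated input is treated as a caller error
        bits = in[pos++];
        x |= static_cast<unsigned long>(bits & 0x7f) << shift;
        shift += 7;
    } while ((bits & 0x80) != 0);
    return x;
}
```

With this layout the value 300 becomes the two bytes 0xAC 0x02: the low seven
bits (0x2C) with the continuation bit set, then the remaining bits (0x02).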

Although I've used this technique in the past I haven't tested this exact
code, so it may have bugs or be in the wrong place. If we are saving in an
ASCII format we wouldn't want to do this, because ASCII is intrinsically
variable length anyway. And of course we cannot use it as the default way
of writing integers, because for large values it is less efficient than a
fixed-width encoding (although for a 32-bit value the overhead is never
more than one byte).
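
To make the size trade-off concrete, the following sketch counts how many
bytes the scheme would emit for a given value. The name vri_size is a
hypothetical helper introduced here for illustration only.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes the 7-bits-per-byte scheme needs for x.
std::size_t vri_size(unsigned long x) {
    std::size_t n = 1;              // even zero occupies one byte
    while ((x >>= 7) != 0) {
        ++n;                        // one more byte per further 7-bit group
    }
    return n;
}
```

Compared with a fixed 4-byte integer, values below 2^21 take at most 3 bytes,
values from 2^21 up to 2^28 - 1 take 4, and only values of 2^28 or more need
a fifth byte.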

That said, when used appropriately the benefits include:

(a) Smaller archives in the common case.
(b) Faster loading and saving (because there are fewer bytes to move
around).
(c) Avoidance of arbitrary limits caused by hardwired sizes.
(d) Extra portability due to not relying on the number and ordering of
bytes in primitive types.

Of course something like this can be built on top of the current library,
but if it is included then the library can use it for its bookkeeping
data. It can be used for things like class_id_type and the lengths of
strings and vectors. Then the library will get benefits (a)-(d).

For example, a string like "hello" is currently stored (by boarchive) with
a size_t length, which on my machine is 32 bits, taking 9 bytes
altogether. If the variable length format is used, it will take 6 bytes, a
33% saving. Further, it can be reloaded into a machine for which size_t is
only 16 bits.
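
The byte counts in the "hello" example can be checked with a small sketch.
It assumes a string is written as a variable-length length prefix followed
by the raw characters; save_string and the vector buffer are hypothetical
and are not the format boarchive actually uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append s to out as a variable-length length prefix
// followed by the characters themselves.
void save_string(std::vector<unsigned char>& out, const std::string& s) {
    std::size_t n = s.size();
    do {
        unsigned char low_bits = static_cast<unsigned char>(n & 0x7f);
        n >>= 7;
        // A set high bit means another length byte follows.
        out.push_back(n != 0 ? static_cast<unsigned char>(low_bits | 0x80)
                             : low_bits);
    } while (n != 0);
    out.insert(out.end(), s.begin(), s.end());
}
```

For "hello" this appends 6 bytes: one length byte (0x05) and five characters.
A 200-character string would need two length bytes, for 202 bytes in total.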

-- Dave Harris

