Subject: Re: [boost] [Review] Boost.Endian mini-review
From: Peter Dimov (lists_at_[hidden])
Date: 2015-01-23 12:47:02


Joel FALCOU wrote:

> Is the library ready to be added to Boost releases?

Let me preface everything else I'll say with this: I think that the answer to
the above question is "yes", as every question raised during the review
seems to have been addressed.

That said, I want to point out that the library in its current state does
not support one approach to dealing with endianness, which I will outline
below.

Let's take the following hypothetical file format as an example, loosely
lifted from the docs:

[00] code (32 bit big endian)
[04] length (32 bit big endian)
[08] version (16 bit little endian)
[10] shape type (32 bit little endian)
[14] ...

All three approaches that the library supports involve declaring a struct
with the above members and expecting that this struct can be read/written
directly to file, which means that its layout and size must correspond to
the above description.

What I tend to do, however, is rather different. I do declare a
corresponding struct:

struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

but never read or write it directly, which means that I do not need to make
sure that its layout and size are fixed.

Instead, in the function read(h), I do this (pseudocode):

read( header& h )
{
    unsigned char data[ 14 ];
    fread data from file;

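    read_32_msb( h.code, data + 0 );        // [00] 32 bit big endian
    read_32_msb( h.length, data + 4 );      // [04] 32 bit big endian
    read_16_lsb( h.version, data + 8 );     // [08] 16 bit little endian
    read_32_lsb( h.shape_type, data + 10 ); // [10] 32 bit little endian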
}

Note that this does not require the machine to have a 32 bit int or a 16 bit
int at all. int can be 48 bits wide and even have padding bits and trap
representations. This is, I admit, only of academic interest today, but still.

The generic implementation of read_32_lsb is:

void read_32_lsb( int & v, unsigned char const data[ 4 ] )
{
    // data[ 0 ] is the least significant byte;
    // assumes unsigned has at least 32 bits
    unsigned w = data[ 0 ];

    w += (unsigned)data[ 1 ] << 8;
    w += (unsigned)data[ 2 ] << 16;
    w += (unsigned)data[ 3 ] << 24;

    v = w;
}

which works on any endianness.
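
For concreteness, here is a minimal compilable sketch of the same idea applied to the hypothetical 14-byte header described earlier. The names get_32_msb, get_32_lsb, get_16_lsb and read_header are made up for this illustration and are not part of Boost.Endian; the sketch also decodes from a memory buffer rather than reading the file itself, and assumes the stored values fit in a non-negative int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

// Hypothetical helper: assemble 4 bytes, most significant byte first.
inline std::uint_least32_t get_32_msb( unsigned char const * p )
{
    return ( static_cast<std::uint_least32_t>( p[ 0 ] ) << 24 )
         | ( static_cast<std::uint_least32_t>( p[ 1 ] ) << 16 )
         | ( static_cast<std::uint_least32_t>( p[ 2 ] ) << 8 )
         |   static_cast<std::uint_least32_t>( p[ 3 ] );
}

// Hypothetical helper: assemble 4 bytes, least significant byte first.
inline std::uint_least32_t get_32_lsb( unsigned char const * p )
{
    return   static_cast<std::uint_least32_t>( p[ 0 ] )
         | ( static_cast<std::uint_least32_t>( p[ 1 ] ) << 8 )
         | ( static_cast<std::uint_least32_t>( p[ 2 ] ) << 16 )
         | ( static_cast<std::uint_least32_t>( p[ 3 ] ) << 24 );
}

// Hypothetical helper: assemble 2 bytes, least significant byte first.
inline std::uint_least16_t get_16_lsb( unsigned char const * p )
{
    return static_cast<std::uint_least16_t>( p[ 0 ] | ( p[ 1 ] << 8 ) );
}

// Decode the header from a buffer that already holds the file bytes.
// The layout of struct header never has to match the file layout.
inline void read_header( header & h, unsigned char const * data, std::size_t size )
{
    assert( size >= 14 );
    (void)size; // silence unused-parameter warnings when NDEBUG is defined

    // Converting values above INT_MAX to int is implementation-defined
    // before C++20; this sketch assumes such values do not occur.
    h.code       = static_cast<int>( get_32_msb( data + 0 ) );
    h.length     = static_cast<unsigned>( get_32_msb( data + 4 ) );
    h.version    = get_16_lsb( data + 8 );
    h.shape_type = static_cast<int>( get_32_lsb( data + 10 ) );
}
```

Nothing in the helpers inspects the host byte order; the shifts express the file format directly, which is why the same code works on both little and big endian machines.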

This approach - as shown - does have a drawback. If you have an array of
802511 32 bit lsb integers in the file, and the native int is 32 bit lsb,
one read of 802511*4 bytes is vastly superior in performance to a loop that
would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the
above is generally combined with

    void read_32_lsb_n( int * first, int * last, FILE * fp );

but this ties us to using FILE* and, given the rest of the library, is not
strictly needed, because the library already offers enough options to handle
this case efficiently.

