Subject: Re: [boost] [Endian] Proposed endian integer types
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-09-09 08:42:13


On Thu, Sep 8, 2011 at 1:19 PM, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:

> Dear All,
>
> I have had a quick look at the proposed endian::big|littleN_t types. A
> couple of comments:
>
> - I was expecting to find that these types would use the conversion
> functions that I've already looked at at the lowest level, so that
> optimisations made to those functions would be useful here too. But
> instead, these classes seem to use their own byte-shuffling code. This
> seems like an odd design.
>

The conversion functions work on aligned built-in integers. Class endian,
except for the aligned specialization, works internally on unaligned char
data. Thus using the conversion functions would involve conversion to or
from a temporary, and a byte copy from or to the temporary. I'm wondering
why you think this is worthwhile?
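
To make the distinction concrete, here is a minimal sketch. The names reorder32, host_is_little_endian, and ubig32_sketch are hypothetical stand-ins for illustration, not the code in the proposed library, and the sketch assumes a host that is either little or big endian.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a value-returning conversion function:
// it works on an aligned built-in integer.
inline std::uint32_t reorder32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Runtime check of host byte order (mixed-endian hosts are ignored).
inline bool host_is_little_endian() {
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Hypothetical stand-in for an unaligned big-endian 32-bit type:
// the value lives in a char array and is handled byte by byte.
struct ubig32_sketch {
    unsigned char bytes[4];

    // Direct byte shuffling: no temporary, no alignment requirement.
    std::uint32_t load() const {
        return (static_cast<std::uint32_t>(bytes[0]) << 24) |
               (static_cast<std::uint32_t>(bytes[1]) << 16) |
               (static_cast<std::uint32_t>(bytes[2]) << 8) |
                static_cast<std::uint32_t>(bytes[3]);
    }

    void store(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>(v >> 24);
        bytes[1] = static_cast<unsigned char>(v >> 16);
        bytes[2] = static_cast<unsigned char>(v >> 8);
        bytes[3] = static_cast<unsigned char>(v);
    }

    // Same result through the conversion function: a byte copy into
    // an aligned temporary, followed by a reorder on little-endian hosts.
    std::uint32_t load_via_temporary() const {
        std::uint32_t tmp;
        std::memcpy(&tmp, bytes, sizeof tmp);
        return host_is_little_endian() ? reorder32(tmp) : tmp;
    }
};

static_assert(sizeof(ubig32_sketch) == 4, "four bytes, no padding");
static_assert(alignof(ubig32_sketch) == 1, "no alignment requirement");
```

Both load paths return the same value; the second one adds the copy to a temporary described above.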

>
> - I'm not convinced that these types cause the conversion to happen at the
> optimum time. For example:
>
> struct file_header {
>   big32_t n_things;
>   ....
> };
>
> int main()
> {
>   file_header h;
>   h.n_things = 0;
>   while (....) {
>     ++h.n_things;
>     ....
>   }
>   write(h);
>   ....
> }
>
> Here the conversion is happening twice every time that n_things is
> incremented. It would be better to instead do the conversion once, just
> before the file_header is written. The same thing applies in reverse when
> reading from a file.
>

Yes.

>
> (Surely this is the most common use-case?)
>

Not for the real-world applications I've worked on.

In those applications, several things are different:

* Most of the variables in the record are never touched. (They are, of
course, touched when the file the record is in is built, but that happens
once. The file is used many times, where "many" may be in the millions, but
the record is only built once.)

* If a variable is touched, it is only to access the value once.

* A few variables do have much more complex uses, but these uses are not in
performance critical code.

There are certainly plenty of real-world applications where the use pattern
you give above is typical. Is there a reason you couldn't use the
conversion functions if you want to do a bulk reorder once after reading a
record, and then again once before writing a record? Particularly assuming
that the conversion functions also have templated forms that will support
any reasonable built-in or UDT type.

>
> I have been wondering if there is a better design that can avoid this. If
> we had some sort of struct introspection, we could store the fields in
> native byte order and then
>
> template <typename T>
> T external_representation(const T& t)
> {
>   T res(t);
>   for each field of T {
>     res.field.reorder();
>   }
>   return res;
> }
>
> ++h.n_things; // cheap
> write(external_representation(h)); // conversion happens once here.
>
> Of course we don't have introspection of structs so we can't do that.
> Maybe someone else has another idea.
>

I've always provided read and write functions that take care of any
reordering needed.

struct file_header {
  big32_t thing_1;
  big32_t thing_2;
  ....
  void reorder(){
     // assume conversion.hpp uses value returning reorder()
     thing_1 = endian::reorder(thing_1);
     thing_2 = endian::reorder(thing_2);
     ...
  }
  bool read(... file) { // return false on eof
    file.read(...);
    if (file.eof()) return false;
    reorder();
    return true;
  }
  void write(... file) {
    reorder();
    file.write(...);
  }
};

Is that the best we can do with today's language? I haven't worried about it
because the apps I work on have only a few formats like the above, so having
to add a few additional functions isn't a big deal.

Thanks for the comments,

--Beman

