Boost logo

Boost :

Subject: Re: [boost] [boost::endian] Request for comments/interest
From: Terry Golubiewski (tjgolubi_at_[hidden])
Date: 2010-06-02 00:37:39


Aha!

----- Original Message -----
From: "Tomas Puverle" <tomas.puverle_at_[hidden]>
Newsgroups: gmane.comp.lib.boost.devel
To: <boost_at_[hidden]>
Sent: Tuesday, June 01, 2010 11:01 AM
Subject: Re: [boost::endian] Request for comments/interest

>> >> struct Packet {
>> >> internet::IpHeader ipHeader;
>> >> internet::UdpHeader udpHeader;
>> >> UserMessage userMessage;
>> >> }; // Packet
>
> I do see one problem with this - you have defined your struct with a
> specified
> endianness.

Yes, the over the wire endianness is always (in my world) specified
independently and does not depend on the source or destination endianness.
Both endpoints use the same header file (ideally) or each side constructs
their own.
If it gets more complicated than this, then we use CORBA.

> However, if you wanted to send it to a different destination which
> requires a
> different endianness, you would have to duplicate your data structure.

Yes, this design assumes compile-time knowledge of endianness.
So, the author cannot specify the endianness at runtime; its a template
argument.

> The more I think about it, the more it's clear to me that this represents
> an
> example of the wrong separation of concerns (no flame intended - simply a
> software engineering term):
>
> In the typed approach, the endianness becomes the property of the message.

I think this is the appropriate separation of concerns. The endianness is a
property of how the data in the message is represented.
>
> However, I think endianness should be the property of the destination, no?

I definitely disagree. Because the code should not know the endianness of
the destination.
There could be several destinations, possibly with differing endianness, for
the same message.
Also, if a new board gets added that is has a different endian, I should
only have to change the code on that board. No other boards should be
affected.

>
>> char buf[BIG_ENOUGH];
> <snip>
>
> Terry, I like your example as it does indeed seem to make simple things
> simple.

For the kind of work that I do, yes.

> - read a matrix from disk and for the sake of argument, let's say it's in
> a
> big-endian format
> - read a number representing the number of its dimensions
> - based on the number of elements, swap each element to the machine's
> representation
>
> //this is the part I am most interested in how you'd solve using the typed
> //approach. Ignore the potential alignment issue.
> double * dataBeg = static_cast<unsigned char*>(data) + sizeof(mh),
> dataEnd = dataBeg + nElem;
> swap_in_place<big_to_machine>(dataBeg, dataEnd);
>
> //process dataBeg ...
>
> I can't help but think that the typed approach will have to be O(N) in
> both
> little and big-endian scenarios.

AHA! Yes. If done this way, reading in the entire array and then swapping
it in one step, then one would need an endian_swapper, like yours to make it
a no-op in the native case.
However, if I had to solve this problem with my tools, I wouldn't read the
whole array in at once. I would only read in the header, but not the data
at first.
Unfortunately, I can't remember how to do this with boost::multiarray, if we
just allocate a really big vector for the data, then I would have read in
each element into the data portion like this.

vector<double> dataPortion;
dataPortion.reserve(nElem);
for (int i=0; i!=nElem; ++i) {
  endian<big, double> x;
  read(fh, &x, sizeof(x));
  dataPortion.push_back(double(x));
}

This requires an extra copy into the temporary x, which cannot be a
register, before storing the native version into the vector.
I see your point now for the in-place case. Thank you!

Assuming I read in the whole array into memory at once, as you did, then I
might try this (assuming aliasing rules don't really matter).
To address the in-place case with a typed interface, I would make a
templated copy function.

template<endian_t E1, class T, endian_t E2> inline
endian<E2,T>* copy(const endian<E1, T>* first, const endian<E1,T>* last,
endian<E2,T>* dest) {
  if (E1 == E2 && first == dest)
    return dest + nElem;
  while (first != last)
    *dest++ = *first++;
  return dest;
}

(I don't recall how to generalize the parameters as iterators. Help
anyone?)

... then to swap in place ...
endian<big, double>* src = reinterpret_cast<endian<big, double>*>(mh +1);
double* dst = reinterpret_cast<double*>(src);
(void) copy(src, src+nElem, dst); // Almost a no-op in the native, endian
case.

... but I'm not sure if this is legal from an aliasing standpoint.

I think this approach would have similar performance to your swap() and
swap_in_place(). Tomorrow night, I'll make some measurements.

terry


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk