Subject: Re: [boost] [boost::endian] Request for comments/interest
From: Tomas Puverle (Tomas.Puverle_at_[hidden])
Date: 2010-05-29 21:18:28


Terry,

> Since IP packets cannot be 10GB, I submit that you're going to have to break
> your 10GB array down into messages.

Thank you for your continued feedback. You have raised some interesting
points and issues. Please see my comments inline.

First of all, note that I was careful to say "send the data to an external
device". Your message suggests that you are thinking about the problem purely
from the point of view of networking.

I am not going to have to break my message into packets. And *even* if the
message needs to be broken into packets, it will not be done by me, but by the
OS. I will just call write()/WriteFile() or whatever with the data I have
available. I am not going to break it up into packets ahead of time.

> boost::array<endian<little, uint32_t>, MaxFragmentSize> buffer;
>
> That you copy fragments of the 10GB array into before sending, and then on
> the receiving side, copy them out.
> The user on either side of the interface can extract the data from the
> fields without knowing the endianness of the field or the endianness of the
> machine he's working on.
> He doesn't have to know to call a swap function. He just extracts the data
> using the standard copy algorithm. The conversion happens automatically by
> implicit conversions.
> One copy into each message. One copy out. What could be better than that?

Sorry for the overquote but I wanted to make sure we didn't lose the context
in this particular case.

Better than 1 copy in/out is 0 copies in/out. Here's how:

I memory map a file. Now the data is in memory.
Alternatively, I allocate a buffer and read from a disk or from a network.
Note that the OS needs to make a copy to get the data into the user-space
buffer. At this point, ideally, I should be able to start using the data.

If I understand your suggestion correctly, you would, at this point, construct
a collection of endian types in place in this buffer, allocate a new buffer
and copy the data out to it, during which the swapping would happen.

If this is correct, I think there are several problems with this approach:
 - this may not seem relevant but I think this is really ugly and much less
maintainable than the functional approach.
 - I can do a swap_in_place<>() on the original buffer. 0 copies. 0 work in
the case when the endianness is already correct.
 - On the other hand, you have to allocate a new buffer, placement new all the
endian types, and perform the copy. Cost: Allocation + at least 2N operations
in either case, not to mention the other bad side effects related to
unnecessary work which I already detailed in another post.
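
A minimal sketch of what an in-place swap along these lines might look like.
The names byte_order and host_order(), and this particular swap_in_place
signature, are assumptions made for illustration only; they are not the
interface of either proposed library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical tag describing how the data in the buffer is stored.
enum class byte_order { little, big };

// Detect the host byte order by looking at the first byte of a known value.
inline byte_order host_order() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? byte_order::little : byte_order::big;
}

// Reverse the bytes of every element where it already lives.
// No second buffer is allocated, and nothing is touched when the
// stored order already matches the host order.
inline void swap_in_place(std::uint32_t* data, std::size_t count,
                          byte_order stored_as) {
    assert(data != nullptr || count == 0);
    if (stored_as == host_order()) return;  // already correct: no work
    for (std::size_t i = 0; i < count; ++i) {
        unsigned char* p = reinterpret_cast<unsigned char*>(&data[i]);
        std::reverse(p, p + sizeof(std::uint32_t));
    }
}
```

The buffer that was mapped or read is the buffer that gets used afterwards,
which is where the "0 copies" figure comes from.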

> then field alignment isn't an issue.

Correct, but it may affect the quality of the code the compiler can generate.
I believe my approach doesn't suffer from this problem.

> Doesn't swap_in_place<>() make the same assumption of overlaying types?

No, since the value just gets written back to the same type and location. The
only assumption swap_in_place<>() makes is that a swapped value is again
representable in the original type. I will grant you that this is a
non-trivial assumption: as others have pointed out, it may not hold for
floating point values or even pointers on some machines.
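
To illustrate the floating point concern, here is a rough sketch of one
possible way around it: keep the foreign-order bytes in an integer of the same
width, and only form a float once the bytes are in host order. The helper name
float_from_reversed is hypothetical, and the sketch assumes a 32-bit float with
the same format on both sides.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: wire_image holds the bytes of a float in the
// opposite byte order. The reversed pattern is never stored in a float
// object; the float is only created after the bytes are in host order.
inline float float_from_reversed(std::uint32_t wire_image) {
    unsigned char b[sizeof wire_image];
    std::memcpy(b, &wire_image, sizeof b);
    std::reverse(b, b + sizeof b);
    float result;
    std::memcpy(&result, b, sizeof result);
    return result;
}
```

Unlike an in-place swap, this does go through a second object, so it trades
the zero-copy property for not depending on the representability assumption.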

> In the message-based interfaces that I am used to, one always must copy some
> data structures into a message before you send it.

But, as I pointed out, we are not just talking about network protocols.

> In both techniques you have to copy the information out of the message, if
> you use it, at least one time. The problem with the swapping mechanism is
> that the swap requires a write and a read from every location,

This is not necessarily true.
It may be the case with swap_in_place<>() but not necessarily with swap<>().

However, while I have agreed with you that some people might find the endian
types useful, I have to take exception to your claim above, that you always
need to swap everything. That is simply not true! I can, just like you, do
the following:

int i = swap<big_to_machine>(s.i);

Actually, I personally find this code rather readable and in many respects, I
find it more instructive than the following:

int i = s.i;

where s.i happens to be an endian type. I would go as far as to argue that my
code is much more self-documenting and would lead to fewer surprises for a
programmer not familiar with your code.
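
For comparison, a small sketch of the two styles side by side. Everything in
the sketch namespace is a hypothetical stand-in written for this illustration
(including the big_to_machine tag and the big_uint32 wrapper); it is not the
code of either library under discussion.

```cpp
#include <cstdint>
#include <cstring>

namespace sketch {

// Hypothetical conversion tag, named after the one used above.
struct big_to_machine {};

// Functional style: the primary template is only declared; each
// supported conversion is supplied as a specialization.
template <class Conversion>
std::uint32_t swap(std::uint32_t raw);

// Interpret the stored bytes as big-endian, whatever the host order is.
template <>
inline std::uint32_t swap<big_to_machine>(std::uint32_t raw) {
    unsigned char b[4];
    std::memcpy(b, &raw, sizeof b);
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Typed style: the field converts itself whenever it is read.
struct big_uint32 {
    unsigned char bytes[4];
    operator std::uint32_t() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
               (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    }
};

struct raw_message { std::uint32_t i; };   // plain field, converted on request
struct typed_message { big_uint32 i; };    // endian-typed field

// The conversion is spelled out at the point of use.
inline std::uint32_t read_functional(const raw_message& s) {
    return swap<big_to_machine>(s.i);
}

// The conversion is hidden behind an implicit conversion operator.
inline std::uint32_t read_typed(const typed_message& s) {
    return s.i;
}

}  // namespace sketch
```

In both versions only the field that is actually read gets converted; the
difference is whether the conversion is visible in the calling code.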

> With the typed-approach you only pay for the message fields that you read.

And equally with the functional approach.

> No extra work is required on native-endian machines.

But I think I've demonstrated that there is actually a significant amount of
extra work required even on native-endian machines.

> I think the typed-approach actually fits the "only pay for what you use"
> mantra better.

Disagree.

> I get the impression that I'm missing something. If you're game, I'd like
> to consider a real-world use-case that uses multiple endians and has
> different protocol layers.

Of course. I like the idea of actual use cases.

> We're only considering byte-ordering here too. An equally important part of
> the endian problem for me, is the bit-ordering. For this I use a similar
> technique for portable bitfields
>
> bitfield<endian_t, w1, w2, w3, w4, w5, ...>

I am not sure what the above means, sorry.

> I'm arguing against swapping though because I've been using the type-based
> method (but not Beman's exactly) successfully for a long time. I'm very
> biased. :o).

This has been very useful. Thank you.

Tom

