From: Beman Dawes (bdawes_at_[hidden])
Date: 2006-06-10 09:18:00


Yuval Ronen wrote:
> Beman Dawes wrote:
>>> what I suggested before (and obviously failed to convince): There should
>>> be a set of Integer types for various sizes/alignments, which could be
>>> used without any relation to endianness (which probably means native
>>> endianness, just as using a simple 'int' or 'uint32_t' means native
>>> endianness).
>> What I'm missing is the motivation. Other than for endian I/O, I'm not
>> able to visualize any need for integers of various sizes/alignments
>> beyond those already provided by <cstdint>.
>
> They are needed for the exact same reason you wrote class endian in the
> first place.
>
> You described your motivation as dealing with large files containing
> records. These records could contain integer types, and you wanted to be
> portable, and therefore store them in a declared endianness, rather than
> an unknown native endianness. You also wanted to be very economical with
> storage requirements, so you used weird-sized, unaligned integers.
>
> That's perfectly fine. Now let's just take this exact example, and remove
> the need for portability. If I write code that I *know* will run on a
> homogeneous set of platforms, and I want to save the performance penalty
> imposed by the non-native endianness, then I'd like to use native
> endianness. The need for weird-sized unaligned integers to save space
> didn't disappear. It's still there. Hence the need for these Integer
> types for various sizes/alignments.

OK, "need for weird-sized unaligned integers to save space" is a valid
motivation, although I would be surprised if usage was widespread.

> Bottom line, is that I believe the need for these integer types exists
> (for space efficiency, or other reasons) even without the endianness
> specifications, and the latter should be built around them, and not
> interleaved with them.

The fundamental characteristic of these types is that they are stored as
a sequence of bytes. Since a sequence of bytes has to have some ordering
(big, little, native, whatever), I don't see how you could efficiently
have types containing a sequence of bytes, and only later layer
endianness on top of them. To do so would seem to imply later byte-swapping.

In other words, if there were a library of unaligned integers of any
size up to 8 bytes, presumably it would use native endianness. That
implies byte-by-byte copying. Changing endianness afterwards would then
involve byte swapping as a second pass, doubling the cost compared to
the current approach.
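
The cost argument above can be illustrated with a minimal sketch. The names big3_sketch and reverse_in_place are hypothetical and are not taken from the proposed library; the sketch only assumes a 3-byte holder that stores its bytes most significant first. Storing directly in the declared order takes one byte-by-byte pass, whereas a layered design would copy in native order and then need a second pass to reorder.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical 3-byte unaligned big-endian holder (illustration only).
struct big3_sketch {
    unsigned char bytes[3];  // bytes[0] is the most significant byte

    // One pass: each byte is written straight into its final position.
    void store(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>((v >> 16) & 0xFF);
        bytes[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        bytes[2] = static_cast<unsigned char>(v & 0xFF);
    }

    // One pass: the value is rebuilt from the stored byte order.
    std::uint32_t load() const {
        return (static_cast<std::uint32_t>(bytes[0]) << 16) |
               (static_cast<std::uint32_t>(bytes[1]) << 8) |
               static_cast<std::uint32_t>(bytes[2]);
    }
};

// The extra step a layered design would need: reorder bytes that were
// already copied in some other (for example native) order.
inline void reverse_in_place(unsigned char* p, std::size_t n) {
    std::reverse(p, p + n);
}
```

Because store() and load() use shifts rather than a memory copy, the sketch behaves the same on big-endian and little-endian hosts.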

>> In any case, such types would seem to fit better into an integer library
>> than a library providing endian byte-holders.
>
> Absolutely, that's what I was saying. These types should reside in
> Boost.Integer, and Boost.Endian should just accept them (and others) as
> template parameters.

I can see that as an argument for making endian a part of Boost.Integer
rather than a separate library.

>
>>> - I think that using bits numbering is better than bytes, because a)
>>> uniformity with the types in <cstdint> is *very* important, IMO and b)
>>> as some noted, the size of a char is not necessarily 8 bits (so help me
>>> God if I understand why this is more useful than harmful), so bits
>>> numbering is less ambiguous than bytes (and maybe this is the reason why
>>> it was chosen to be used in <cstdint>).
>> <cstdint> is about integers, where the number of bits is critical, even
>> if not exactly a certain number of bytes.
>>
>> <boost/endian.hpp> is about endian byte-holders, where the number of
>> bytes is critical, even if not exactly matching the architecture's
>> integer number of bits.
>
> boost/endian is not about integers? How can it be not? The *only* area
> where endianness is relevant is with integers.

I've seen other numeric types (decimal, floats) where endianness was an
issue, but did not include them in this proposal because I personally
have no experience with such types and no need for such types.

> A buffer of bytes has no, and doesn't need any, endianness.

> That's why I think an *integer* type
> is the parameter to the endian classes. It seems we agree on that,
> because your code does exactly this - passes integer types to the endian
> class.
>
>>> Actually, it just occurred to me
>>> that if portability between different platforms (with different
>>> CHAR_BITS) is our main concern here, then it *must* be bits, isn't it?
>> CHAR_BITS is fixed at 8. It never varies.
>
> I'm certainly not a standard expert, but several posters in this thread,
> and in the Boost.Asio review thread, claimed that CHAR_BITS can be
> larger than 8. I had no knowledge of my own here, so I relied on it. If
> this is wrong, then I am wrong as well. My apologies for that.

No, I was the one that was wrong. Sorry.

>...
>
> Let me understand, are you saying that using an int somewhere is
> "aligned by happenstance", and therefore considered "unaligned"?

Let me give an example:

struct foo
{
   big3_t v1;
   big3_t v2;
   big2_t v3;
};

Now by happenstance v3 has an offset modulo 2 of 0. But in the
applications I work with, it would be a design mistake to change it to
an aligned_big2_t.

That's because foo's may get embedded in larger structs like this:

struct bar
{
   big3_t x1;
   foo x2;
};

It is very important for these apps that no padding be inserted after
x1. That's why v3 isn't logically considered aligned, even though it
happens to have an offset modulo 2 of 0.
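
The padding effect can be checked with a rough sketch. All type names below are hypothetical stand-ins (plain byte arrays for the unaligned holders, a std::uint16_t wrapper for the aligned one), and the stated sizes assume a common ABI where a struct of unsigned char has alignment 1 and std::uint16_t has alignment 2; they are not guaranteed by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the byte-holder types (illustration only).
struct big3_holder { unsigned char bytes[3]; };
struct big2_holder { unsigned char bytes[2]; };
struct aligned_big2_holder { std::uint16_t value; };  // typically 2-byte aligned

// Every member has alignment 1, so no padding is needed anywhere.
struct foo_sketch {
    big3_holder v1;
    big3_holder v2;
    big2_holder v3;  // offset 6, even only by happenstance
};

struct bar_sketch {
    big3_holder x1;
    foo_sketch x2;  // starts right after x1, at offset 3
};

// Same layout, but v3 is declared as an aligned type.
struct foo_aligned_sketch {
    big3_holder v1;
    big3_holder v2;
    aligned_big2_holder v3;  // forces 2-byte alignment on the whole struct
};

struct bar_aligned_sketch {
    big3_holder x1;
    foo_aligned_sketch x2;  // the compiler pads after x1 to reach offset 4
};

// These hold on common ABIs (assumption stated above).
static_assert(sizeof(foo_sketch) == 8, "no padding inside foo_sketch");
static_assert(sizeof(bar_sketch) == 11, "no padding after x1");
static_assert(sizeof(bar_aligned_sketch) == 12, "one padding byte after x1");
```

Under these assumptions, switching v3 to the aligned type leaves the size of foo unchanged but adds a padding byte to every struct that embeds it at an odd offset.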

>
>>> - Having an enum with values such as 'big', 'aligned_big', 'little',
>>> 'aligned_little', etc, just cries for separation. The enum should have
>>> only 'big' and 'little', and the endian template can accept one more
>>> template argument - 'bool aligned'.
>> My initial implementation did have an additional template argument,
>> taking an enum:
>>
>> enum alignment { unaligned, aligned };
>
> Looks excellent.
>
>> But having an additional argument meant that defaulting didn't work
>> well. It is nice to be able to default the lengths for aligned.
>
> I have to admit that I don't understand how adding the 'enum alignment'
> as a first or second template argument (before or after the 'endianness'
> argument) caused any problems with the default length argument. Sounds
> harmless to me.

There are two defaults we might like:

(1) alignment defaults to unaligned.
(2) num_bytes defaults to sizeof(T)

If the alignment parameter precedes the num_bytes parameter, an
alignment default doesn't work if a num_bytes argument is present.

If the num_bytes parameter precedes the alignment parameter, a num_bytes
default doesn't work if an alignment argument is present.

That's why I'm in favor of adding named arguments to the language.

In this particular case, (2) is less important than (1), so I guess we
could sacrifice (2) and have a separate alignment parameter, placed last
so it could be defaulted to unaligned. I'm undecided.
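
The trade-off between the two defaults can be sketched as follows. The template endian_sketch and its parameter names are hypothetical and simplified, not the proposed interface; the sketch places the alignment parameter last, as in the option described above, to show which default survives.

```cpp
#include <cstddef>

enum endianness { big, little };
enum alignment { unaligned, aligned };

// Hypothetical, simplified template: num_bytes precedes alignment.
template <endianness E, typename T,
          std::size_t n_bytes = sizeof(T),
          alignment A = unaligned>
struct endian_sketch {
    static constexpr std::size_t size = n_bytes;
    static constexpr alignment align = A;
};

// Both defaults apply.
using default_both = endian_sketch<big, int>;

// num_bytes given explicitly; the alignment default still works.
using three_bytes = endian_sketch<big, int, 3>;

// Alignment given explicitly; num_bytes can no longer be defaulted and
// has to be spelled out, because defaults can only be omitted from the end.
using aligned_int = endian_sketch<big, int, sizeof(int), aligned>;
```

Swapping the last two parameters would reverse the situation: an explicit num_bytes would then force the alignment argument to be written out.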

Thanks for all the comments,

--Beman

