From: Yuval Ronen (ronen_yuval_at_[hidden])
Date: 2006-06-10 15:20:01
>> You described your motivation as dealing with large files containing
>> records. These records could contain integer types, and you wanted to be
>> portable, and therefore store them in a declared endianness, rather than
>> an unknown native endianness. You also wanted to be very economical with
>> storage requirements, so you used weird-sized, unaligned integers.
>>
>> That's perfectly fine. Now let's just take this exact example, and remove
>> the need for portability. If I write code that I *know* will run on a
>> homogeneous set of platforms, and I want to save the performance penalty
>> imposed by the non-native endianness, then I'd like to use native
>> endianness. The need for weird-sized unaligned integers to save space
>> didn't disappear. It's still there. Hence the need for these Integer
>> types for various sizes/alignments.
>
> OK, "need for weird-sized unaligned integers to save space" is a valid
> motivation, although I would be surprised if usage was widespread.
I'm not sure usage of these types (without endianness) would be any less
widespread than usage with endianness, but I should really know when to
quit... :-)
>> Bottom line, is that I believe the need for these integer types exists
>> (for space efficiency, or other reasons) even without the endianness
>> specifications, and the latter should be built around them, and not
>> interleaved with them.
>
> The fundamental characteristic of these types is that they are stored as
> a sequence of bytes. Since a sequence of bytes has to have some ordering
> (big, little, native, whatever), I don't see how you could efficiently
> have types containing a sequence of bytes, and only later layer
> endianness on top of them. To do so would seem to imply later byte-swapping.
>
> In other words, if there was an unaligned any-size <= 8 integer library,
> presumably it would use native endianness. That implies byte-by-byte
> copying. Then changing endianness would involve byte swapping, doubling
> the cost compared to the current approach.
When is this "byte-by-byte copying" supposed to happen? When
constructing the object? Performing I/O? Performing arithmetic operations?
Any implementation of an endian class, my approach or yours, needs to do
some byte swapping and copying, at one of those stages, at least. And as
far as I understand, your implementation does it when performing
arithmetic operations, but I might have misunderstood your code. My idea
of how to write it would do the swapping and copying during construction
and I/O rather than arithmetic, but I think it would also be possible
to do it so that arithmetic operations would suffer instead.
Which is better? I don't know, but I don't think you can skip it
altogether...
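To make the two options concrete, here is a minimal sketch. It is not taken from either implementation; the names big3_bytes and native24 are made up for illustration, and it assumes a 24-bit unsigned value. Option A pays for the conversion on every access, option B only at the I/O boundary:

```cpp
#include <cassert>
#include <cstdint>

// Option A (hypothetical): keep three bytes in big-endian order.
// Every load, store and arithmetic operation converts byte by byte.
struct big3_bytes {
    unsigned char b[3];

    void store(std::uint32_t v) {  // only the low 24 bits are kept
        b[0] = static_cast<unsigned char>((v >> 16) & 0xFF);
        b[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        b[2] = static_cast<unsigned char>(v & 0xFF);
    }
    std::uint32_t load() const {
        return (static_cast<std::uint32_t>(b[0]) << 16) |
               (static_cast<std::uint32_t>(b[1]) << 8) |
               static_cast<std::uint32_t>(b[2]);
    }
    big3_bytes& operator+=(std::uint32_t n) {
        store(load() + n);  // convert in, add, convert out
        return *this;
    }
};

// Option B (hypothetical): keep a native integer in memory and convert
// only when reading from or writing to an external buffer.
// Note that this object is wider than 3 bytes while in memory.
struct native24 {
    std::uint32_t v;

    static native24 read_big(const unsigned char* in) {
        native24 r;
        r.v = (static_cast<std::uint32_t>(in[0]) << 16) |
              (static_cast<std::uint32_t>(in[1]) << 8) |
              static_cast<std::uint32_t>(in[2]);
        return r;
    }
    void write_big(unsigned char* out) const {
        out[0] = static_cast<unsigned char>((v >> 16) & 0xFF);
        out[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
        out[2] = static_cast<unsigned char>(v & 0xFF);
    }
    native24& operator+=(std::uint32_t n) {
        v = (v + n) & 0xFFFFFFu;  // native arithmetic, wrapped to 24 bits
        return *this;
    }
};
```

Either way the same shifts and masks get executed somewhere; the only question is whether they sit on the arithmetic path or on the I/O path.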
>>> In any case, such types would seem to fit better into an integer library
>>> than a library providing endian byte-holders.
>> Absolutely, that's what I was saying. These types should reside in
>> Boost.Integer, and Boost.Endian should just accept them (and others) as
>> template parameters.
>
> I can see that as an argument for making endian a part of Boost.Integer
> rather than a separate library.
If I assume for a moment that we agree on everything else (which we
don't :-) ), then I won't mind either way.
>>>> - I think that using bits numbering is better than bytes, because a)
>>>> uniformity with the types in <cstdint> is *very* important, IMO and b)
>>>> as some noted, the size of a char is not necessarily 8 bits (so help me
>>>> God if I understand why this is more useful than harmful), so bits
>>>> numbering is less ambiguous than bytes (and maybe this is the reason why
>>>> it was chosen to be used in <cstdint>).
>>> <cstdint> is about integers, where the number of bits is critical, even
>>> if not exactly a certain number of bytes.
>>>
>>> <boost/endian.hpp> is about endian byte-holders, where the number of
>>> bytes is critical, even if not exactly matching the architecture's
>>> integer number of bits.
>> boost/endian is not about integers? How can it be not? The *only* area
>> where endianness is relevant is with integers.
>
> I've seen other numeric types (decimal, floats) where endianness was an
> issue, but did not include them in this proposal because I personally
> have no experience with such types and no need for such types.
So it seems we agree that those types are out of scope here. But this is
drifting from my original point. The point was that both <cstdint> and
the endian class(es) deal with integers, not with "byte-holders". Byte
holders, or IOW buffers, don't have anything to do with endianness, only
integers do. So this was the rationale for using bits instead of bytes,
to be consistent with <cstdint>.
>>>> Actually, it just occurred to me
>>>> that if portability between different platforms (with different
>>>> CHAR_BIT) is our main concern here, then it *must* be bits, mustn't it?
>>> CHAR_BIT is fixed at 8. It never varies.
>> I'm certainly not a standard expert, but several posters in this thread,
>> and in the Boost.Asio review thread, claimed that CHAR_BIT can be
>> larger than 8. I had no knowledge of my own here, so I relied on it. If
>> this is wrong, then I am wrong as well. My apologies for that.
>
> No, I was the one that was wrong. Sorry.
Which brings me back to my original post - if such platforms with
CHAR_BIT != 8 are to be supported (which is not certain), then I think
counting bits is an absolute must...
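A small sketch of why the unit matters. The macro in <climits> is spelled CHAR_BIT; the helper names bits_in_chars and chars_for_bits are hypothetical, introduced only for this illustration:

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT
#include <cstddef>

// Hypothetical helper: number of bits in a holder made of n chars.
constexpr std::size_t bits_in_chars(std::size_t n) {
    return n * CHAR_BIT;
}

// Hypothetical helper: number of chars needed to hold at least `bits` bits.
constexpr std::size_t chars_for_bits(std::size_t bits) {
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}

// A holder sized from a bit count always has room for that many bits,
// whatever CHAR_BIT is. A holder declared as "3 bytes" is 24 bits wide
// only on platforms where CHAR_BIT == 8.
static_assert(bits_in_chars(chars_for_bits(24)) >= 24,
              "a holder sized in bits must fit the requested width");
```

So a name like "3 bytes" describes a different width on a platform where CHAR_BIT is 16, while "24 bits" does not.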
>> ...
>>
>> Let me understand, are you saying that using an int somewhere is
>> "aligned by happenstance", and therefore considered "unaligned"?
>
> Let me give an example:
>
> struct foo
> {
>     big3_t v1;
>     big3_t v2;
>     big2_t v3;
> };
>
> Now by happenstance v3 has an offset modulo 2 of 0. But in the
> applications I work with, it would be a design mistake to change it to
> an aligned_big2_t.
>
> That's because foo's may get embedded in larger structs like this:
>
> struct bar
> {
>     big3_t x1;
>     foo x2;
> };
>
> It is very important for these apps that no padding be inserted after
> x1. That's why v3 isn't logically considered aligned, even though it
> happens to have an offset modulo 2 of 0.
I completely agree with your example, but am still not convinced. The
reason is probably because we disagree on the more fundamental issue of
whether to separate the endian class from size/alignment or not. So
there is no reason to pursue this further.
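For the record, the layout effect in the example can be checked with a sketch like the following. The definitions of big3_t, big2_t and aligned_big2_t here are my own stand-ins (plain byte arrays, with alignas(2) for the aligned one), not the library's actual types, and the sizes are what mainstream compilers produce rather than something the standard guarantees:

```cpp
#include <cassert>
#include <cstddef>  // offsetof

// Stand-in byte holders (assumption: the real types differ in detail).
struct big3_t { unsigned char b[3]; };
struct big2_t { unsigned char b[2]; };
struct aligned_big2_t { alignas(2) unsigned char b[2]; };

// The structs from the example: everything has alignment 1.
struct foo {
    big3_t v1;
    big3_t v2;
    big2_t v3;  // offset 6, which happens to be a multiple of 2
};
struct bar {
    big3_t x1;
    foo x2;     // starts right after x1, no padding
};

// Hypothetical variant where v3 is switched to the aligned type.
struct foo_aligned {
    big3_t v1;
    big3_t v2;
    aligned_big2_t v3;  // forces alignment 2 on the whole struct
};
struct bar_aligned {
    big3_t x1;
    foo_aligned x2;     // now needs one byte of padding after x1
};

static_assert(sizeof(foo) == 8, "3 + 3 + 2 bytes, no padding");
static_assert(sizeof(bar) == 11, "3 + 8 bytes, no padding");
static_assert(sizeof(bar_aligned) == 12, "padding inserted after x1");
```

Inside foo nothing changes when v3 becomes aligned, but bar grows from 11 to 12 bytes, which is exactly the padding the example wants to avoid.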
>>>> - Having an enum with values such as 'big', 'aligned_big', 'little',
>>>> 'aligned_little', etc, just cries for separation. The enum should have
>>>> only 'big' and 'little', and the endian template can accept one more
>>>> template argument - 'bool aligned'.
>>> My initial implementation did have an additional template argument,
>>> taking an enum:
>>>
>>> enum alignment { unaligned, aligned };
>> Looks excellent.
>>
>>> But having an additional argument meant that defaulting didn't work
>>> well. It is nice to be able to default the lengths for aligned.
>> I have to admit that I don't understand how adding the 'enum alignment'
>> as a first or second template argument (before or after the 'endianness'
>> argument) caused any problems with the default length argument. Sounds
>> harmless to me.
>
> There are two defaults we might like:
>
> (1) alignment defaults to unaligned.
> (2) num_bytes defaults to sizeof(T)
>
> If the alignment parameter precedes the num_bytes parameter, an
> alignment default doesn't work if a num_bytes argument is present.
>
> If the num_bytes parameter precedes the alignment parameter, a num_bytes
> default doesn't work if an alignment argument is present.
>
> That's why I'm in favor of adding named arguments to the language.
>
> In this particular case, (2) is less important than (1), so I guess we
> could sacrifice (2) and have a separate alignment parameter, placed last
> so it could be defaulted to unaligned. I'm undecided.
Your current implementation actually sacrifices (1) in favor of (2), and
in this case there should also be no problem adding the alignment
parameter (without a default, which was just sacrificed). It seems that
either preferring (1) over (2), or vice-versa, allows adding the
alignment parameter.
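Here is a rough sketch of both orderings. The template names and parameter lists are hypothetical (they are not the signature of the proposed endian class); the point is only that each ordering compiles, and each one forces exactly one of the two arguments to be spelled out in some uses:

```cpp
#include <cassert>
#include <cstddef>

enum endianness { big, little };
enum alignment { unaligned, aligned };

// Ordering 1 (hypothetical): alignment comes last.
// The alignment default is always usable, but asking for aligned
// requires spelling out num_bytes as well.
template <endianness E, typename T, std::size_t NumBytes = sizeof(T),
          alignment A = unaligned>
struct endian_align_last {
    static constexpr std::size_t num_bytes = NumBytes;
    static constexpr alignment align = A;
};

// Ordering 2 (hypothetical): alignment comes before num_bytes.
// The num_bytes default is always usable, but asking for a custom
// size requires spelling out the alignment as well.
template <endianness E, typename T, alignment A = unaligned,
          std::size_t NumBytes = sizeof(T)>
struct endian_align_first {
    static constexpr std::size_t num_bytes = NumBytes;
    static constexpr alignment align = A;
};

using a1 = endian_align_last<big, int>;     // both defaults apply
using a2 = endian_align_last<big, int, 3>;  // alignment defaulted
using a3 = endian_align_last<big, int, sizeof(int), aligned>;  // size repeated

using b1 = endian_align_first<big, int, aligned>;       // size defaulted
using b2 = endian_align_first<big, int, unaligned, 3>;  // alignment repeated
```

Neither ordering is free, but neither is blocked either: the cost is one redundant argument in the less common case.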
> Thanks for all the comments,
You're welcome.