
From: dizzy (dizzy_at_[hidden])
Date: 2008-08-18 09:03:06


On Monday 18 August 2008 14:51:53 Andrea Denzler wrote:
> -----Original Text-----
>
> Dizzy wrote:
> > So you have sizeof().
>
> You missed my point! Or do you really think I've never heard of sizeof? :-)
>
> When I define my class/struct it happens that, to avoid a waste of space, I
> want to define it with exactly the size needed.

You never said anything like that in your previous email. However, you do
realize that the standard says very little about binary layout? Because of
alignment padding, sizeof(struct { char a; int b; }) is usually greater than
sizeof(char) + sizeof(int).
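
You can see it yourself with a little test like this (an untested sketch; the
exact numbers depend on the platform and compiler, a typical 4-byte int with
4-byte alignment assumed):

#include <iostream>

struct S { char a; int b; };  // padding is typically inserted after 'a'

int main()
{
    // On a typical platform this prints "8 5": the struct is padded
    // so that 'b' ends up suitably aligned.
    std::cout << sizeof(S) << ' ' << sizeof(char) + sizeof(int) << '\n';
}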

> For example I want it 16 bits wide (sorry
> that I use the bit expression, it's an old habit). The only way to do that
> is using preprocessor directives (because it is platform dependent) that
> create something like int16, int32, int64.
> With small data I don't care, but when I work on a huge amount of data the
> difference between int16 and int32 is a doubled amount of used memory. And
> this matters.

You can also write your own template that takes a bit size (or value range)
and resolves to the smallest native integral type able to satisfy the
requirement. Are you saying that, besides the native platform integer types,
you would also like something like this to come with the standard library? If
so, propose it to the committee. If you are saying there should be _only_ such
types, then I hope you realize that's really not acceptable, as many people
still need to be able to use C++ with the fastest native type possible.
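
In fact Boost already ships such a template: boost::int_t from
<boost/integer.hpp> picks the smallest (or fastest) built-in signed type
with at least the requested number of bits (sign bit included):

#include <boost/integer.hpp>

int main()
{
    boost::int_t<16>::least x = 0;  // smallest signed type with >= 16 bits
    boost::int_t<16>::fast  y = 0;  // "fastest" signed type with >= 16 bits
    (void)x; (void)y;
}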

> wchar_t is horribly defined. Sometimes 1 byte, on Windows 2 bytes, on most
> Unix platforms 4 bytes. Not only is the size different but also the
> encoding is different! What a wrong choice.

So you are saying the standard should have decided on a byte size for wchar_t
when the byte size in bits itself depends on the platform? Or you probably
mean a value range. Such a range would have to be defined in terms of the
encoding used. So if the encoding is undefined, it makes sense that everything
else about wchar_t is too (to allow the implementation to have its own
encoding and a wchar_t representation large enough for it).

I see your complaint about wchar_t the same way I see Zeljko's complaint
about native integers. wchar_t means the native platform wide-character type,
capable of holding any character the platform may support. Just as char is
meant to be used to store the basic character set without any encoding
specification (ASCII or not). Just as "int" is meant to be the fastest native
platform integer type, with no specified negative-value encoding or size
(apart from the C90-inherited requirements on the minimum value ranges).

> You ever worked with Unicode? wchar_t is
> supposed to help with this but it requires you to add a lot of platform-
> dependent code.

No, here I think you are wrong. wchar_t is not supposed to help you work with
Unicode. That's like saying that "int" is supposed to help you work with
integer values stored in 2's-complement encoding (or that char was meant to
help you work with ASCII characters). Since "int" has no specified
negative-value encoding and wchar_t no specified character encoding, clearly
neither was meant to help you with that.

> When I use UTF-8, UTF-16 or UTF-32 I NEED to know the size
> of the integer type when I define it. char for UTF-8, but today I use
> platform-dependent preprocessor directives for UTF-16/32; it would be
> simpler if C/C++ offered int16, int32, int64 types.

Of course you need to do so, since you need value-range guarantees for
specific types. You can query those with numeric_limits<type>::min()/max().
With some template machinery you can even do it at compile time (though not
with the ::min()/::max() I just mentioned, since in C++03 a function call
cannot be used to form a constant expression; instead you can write
specializations for each of the char, short, int, long integer types that
check the range using INT_MAX and similar constants).

This is generally valid when working with anything that requires a fixed
representation (binary file formats, network protocols, character encodings,
etc.). You are saying that, besides the native integral types (and native
character types), C++ should offer some fixed-width integer types (and some
fixed-encoding character types). C++0x will offer both, from what I
understand.
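
As far as I understand the current draft, that will look something like:

#include <cstdint>  // C99-style fixed-width integers, part of C++0x

int main()
{
    std::int32_t       i = 0;  // exactly 32 bits, no padding, 2's complement
    std::int_least16_t j = 0;  // at least 16 bits
    char16_t u = u'x';         // UTF-16 code unit, new core type in C++0x
    char32_t v = U'x';         // UTF-32 code unit, new core type in C++0x
    (void)i; (void)j; (void)u; (void)v;
}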

> Also when I want to store the data in cross-platform compatible files I
> am supposed to always use the same byte size for integer values. So
> explicitly using an int32/int64 data type (and handling endianness
> manually) is an easy way to handle this. Those types don't exist so we all
> have to define them manually with stuff like #if.... + typedef int int32,
> etc etc.

Yes, that's what all people do in their portability layer, creating portable
types. Notice that integer size and byte endianness are not all it takes to
make it portable. You also have to take the negative-value encoding into
consideration (the positive one you don't need to, as the standard specifies
it to be a pure binary system). My serialization library dedicates an
important part to describing these differences; what I do is create something
like:

integer<signed, 20, integer_serializer<9, 3, high_endian, ones_complement> >

Which means I want a signed integer type (I use "signed"/"unsigned" as plain
tags) of at least 20 bits that is serialized/deserialized with a
representation of 3 bytes of 9 bits each, using high_endian byte order and
ones_complement (one's complement) negative-value encoding. The library
decides at compile time to use fast code paths if it detects that the
external representation has certain attributes in common with the current
machine's native representation.

I haven't described this to advertise the library, just to say that IMO it is
quite normal for a low-level language like C++ to provide native types whose
representation depends largely on the platform. As long as one can still
solve one's portability problems with these, everything is fine with me. That
C++0x adds the C99/POSIX fixed-width integer types and also some
fixed-encoding character types is all the better, but their absence is not
the end of the world.

> How can a signed integer type include all the values of an unsigned integer
> type if both have the same bit/byte size? Or I don't get your point.

Huh? Where in the standard does it say that the unsigned type has to use all
the bits? The two types have the same size in bytes as reported by sizeof(),
but that does not mean they can't have padding bits. I actually asked about
this, to make sure, a couple of days ago on the c.l.c++ Usenet group; see the
answers (especially James' one) here:
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/e1fe1592fe7cc112/bc1a0e8339e2b084
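
You can even check for padding bits yourself (a small sketch; on mainstream
platforms the two numbers match, i.e. there is no padding):

#include <climits>
#include <iostream>
#include <limits>

int main()
{
    // Value bits actually used vs. bits occupied in memory; any
    // difference is padding, which the standard permits.
    std::cout << std::numeric_limits<unsigned>::digits << " of "
              << sizeof(unsigned) * CHAR_BIT << " bits are value bits\n";
}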

> Checked integer types are useful for debug builds. The runtime cost is not
> that heavy; usually on overflow a register flag is set (in some cases even
> an interrupt is raised), so a simple assembler jmp handles this. But again,
> only for debug/special builds, since we all know that there is a runtime
> overhead.

With that I completely agree. One of the main reasons I love using the stdlib
is that I can compile it in debug mode and have all iterators and
dereferences checked. But we can't ask the standard to offer such checked
native integers only in debug mode or something :)
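
In portable C++ you can't inspect the CPU's overflow flag (and signed
overflow is undefined behaviour, so you have to test before the operation),
but a debug-only check can still be cheap. A small sketch, with checked_add
being a made-up name:

#include <cassert>
#include <climits>

// The assert compiles away when NDEBUG is defined, so release builds
// pay nothing for the check.
inline int checked_add(int a, int b)
{
    assert(!((b > 0 && a > INT_MAX - b) ||
             (b < 0 && a < INT_MIN - b)) && "signed integer overflow");
    return a + b;
}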

> You see.. you too admit that you needed to use integer types whose size
> you knew, e.g. "fixed integer types".

Of course there is a need (otherwise they wouldn't be in C++0x). I'm just
saying the C++ native types are not flawed for not providing that. They are
not the tool for that, plain and simple.

-- 
Dizzy
			"Linux is obsolete" -- AST
