

Subject: Re: [boost] [UUID] PODness Revisited
From: Scott McMurray (me22.ca+boost_at_[hidden])
Date: 2008-12-26 02:03:57


On Thu, Dec 25, 2008 at 23:08, Vladimir Batov <batov_at_[hidden]> wrote:
>
> That's quite an emotionally charged list you compiled. It did not have to be
> such. I certainly do not find aggregates confusing or anything of that sort.
> In fact, I was very happy with them for about 10 years while coding in C
> before I switched to C++ in the early '90s. PODs came from C and, therefore,
> they *are* a legacy of C and are not called Plain Old Data for nothing. In
> C++, though, I do find aggregates limiting. With regard to uuid, that means no
> user-provided constructors, no guaranteed invariant, no private or protected
> non-static data members. And that is fundamental (my view of course) to C++
> -- "it is important and fundamental to have constructors acquire resources
> and establish a simple invariant" (Stroustrup E.3.5). Then, "One of the most
> important aims of a design is to provide interfaces that can remain stable
> in the face of changes" (Stroustrup 23.4.3.5). PODs do restrict interfaces
> and are wide-open implementation-wise. That opens the door to misuse and
> complicates long-term maintainability. So, unless PODs provide some killer
> feature in return (that cannot be achieved otherwise), I do not see the
> point of paying that price.
>

UUIDs require no resource acquisition in constructors.
UUIDs do not maintain any invariant on their content.
UUIDs have a standard binary format.
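To make those three points concrete, here is a minimal sketch of a POD uuid, assuming C++11 type traits for the compile-time checks (the struct name and its single `data` member are illustrative, not the reviewed library's interface):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical POD uuid: nothing but the 16 octets of RFC 4122, 4.1.2.
// No constructors (so nothing to acquire), no private members,
// no virtual functions, and no invariant on the contents.
struct uuid {
    std::uint8_t data[16];
};

// Compile-time checks that the type really is plain old data
// (trivial + standard-layout is the C++11 spelling of "POD").
static_assert(std::is_trivial<uuid>::value, "uuid should be trivial");
static_assert(std::is_standard_layout<uuid>::value,
              "uuid should be standard-layout");
// The in-memory object is exactly the standard binary format.
static_assert(sizeof(uuid) == 16, "no padding around 16 octets");
```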

As for "mis-use", you have already confessed to such things as
vtable-pointer twiddling, which is an excellent illustration that
making something drastically non-POD doesn't prevent mis-use.

Maintainability depends far more on sane choices by programmers than
on draconian restrictions imposed by library writers. If a coder still wants to
write an algorithm that breaks if shared_ptr<T>'s
"unspecified-bool-type" changes (for example), that's not the fault of
the library writer. As another example, std::tr1::array has its
internal storage marked "Exposition only"; it seems the same
technique ought to be enough here.
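A sketch of that approach applied to uuid, assuming member names that mirror std::array's (they are hypothetical here): the storage stays public, so the type remains an aggregate, while the documentation steers users to the member functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical uuid in the style of std::array: public storage keeps it an
// aggregate, but docs would mark data_ "exposition only".
struct uuid {
    typedef std::uint8_t value_type;
    typedef const std::uint8_t* const_iterator;

    std::uint8_t data_[16];  // exposition only

    static constexpr std::size_t size() { return 16; }
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + 16; }
    std::uint8_t operator[](std::size_t i) const { return data_[i]; }
};

// Non-virtual member functions and typedefs do not affect PODness.
static_assert(std::is_trivial<uuid>::value, "still trivial");
static_assert(std::is_standard_layout<uuid>::value, "still standard-layout");
```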

So what "price" is left?

And I think you're reading far too much into the name with "are not
called Plain Old Data for nothing". Just like templates weren't
introduced for Template Meta-Programming, PODs have had surprising
applications. In Boost.Phoenix (to become Lambda v3, IIRC), for
example, _1 is a POD -- by deliberate design choice -- despite the
fact that a lambda placeholder is anything but "Plain Old Data".
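As a much-simplified illustration (hypothetical names, not Phoenix's actual code), a placeholder can carry behaviour through its call operator while holding no data at all, which keeps it a POD that can be constant-initialized:

```cpp
#include <cassert>
#include <type_traits>

namespace demo {

// Hypothetical, heavily simplified placeholder: calling it returns its
// first argument. It has behaviour but no state.
struct arg1_type {
    template <class T>
    T& operator()(T& first) const { return first; }
};

// Constant-initialized: no constructor runs at program startup.
constexpr arg1_type arg1 = {};

}  // namespace demo

static_assert(std::is_trivial<demo::arg1_type>::value, "trivial");
static_assert(std::is_standard_layout<demo::arg1_type>::value,
              "standard-layout");
```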

>
> I do not understand "PODness to absolutely maximize the efficiency of
> copying" as I believe
>
> class NonAggregateFoo { ... int int_; };
>
> is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be
> memcopied as well as PODFoo bunch[] (I am not advocating that but simply
> stating the fact).

Your "fact" is wrong.

3.9/3: "For any POD type T, if two pointers to T point to distinct T
objects obj1 and obj2, where neither obj1 nor obj2 is a base-class
subobject, if the value of obj1 is copied into obj2, using the memcpy
library function, obj2 shall subsequently hold the same value as
obj1."

That applies to PODs only, and the other representation guarantees in
[basic.types] also apply only to PODs.

So using a "memcopied" NonAggregateFoo bunch[] invokes undefined
behaviour, preventing its use in a standards-conforming library.

It's because of these rules that types can be used directly from, for
example, Boost.Interprocess shared memory, and they only apply to
PODs. No matter how efficient the serialization becomes, it will
always carry overhead compared to no serialization at all.
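A sketch of what that guarantee buys, assuming C++11. Note that C++11 later generalized the 3.9/3 wording from POD to "trivially copyable" types, so there a class with private members could qualify, but one with a user-provided copy constructor or destructor still fails the static_assert below. The `copy_uuids` helper is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical POD uuid, as elsewhere in this thread.
struct uuid {
    std::uint8_t data[16];
};

// memcpy is only guaranteed to preserve the value for such types.
static_assert(std::is_trivially_copyable<uuid>::value,
              "memcpy guarantee requires a trivially copyable type");

// Copies n uuids with one memcpy, e.g. out of a shared-memory segment or a
// receive buffer, with no element-by-element deserialization.
inline void copy_uuids(uuid* dst, const uuid* src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(uuid));
}
```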

> And I do not expect the respective
>
> template<class Archive>
> void
> serialize(Archive& ar, unsigned int /* version */)
> {
> ar & int_;
> }
>
> to be that slow (with appropriately chosen Archive). Again, here you might
> well know more than I do. Tell me then.
>

I don't know either; I can just quote docs: "Some simple classes could
be serialized just by directly copying all bits of the class. This is,
in particular, the case for POD data types containing no pointer
members, and which are neither versioned nor tracked. Some archives,
such as non-portable binary archives can make use of this information
to substantially speed up serialization."[1]

Though this does make me wonder whether serialization should have a
concept of portably bitwise-serializable classes. Relatedly, the v13
uuid_serialize.hpp header is mostly commented out...
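One way such a concept could look, sketched with an invented opt-in trait (`is_portable_bitwise` and the `save_bitwise`/`load_bitwise` helpers are hypothetical, not Boost.Serialization API) and a plain byte vector standing in for a binary archive:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical opt-in trait: false unless a type's author says otherwise.
template <class T>
struct is_portable_bitwise : std::false_type {};

struct uuid {
    std::uint8_t data[16];
};

// uuid opts in: RFC 4122 fixes its octet layout, and an array of single
// bytes has no endianness or padding concerns.
template <>
struct is_portable_bitwise<uuid> : std::true_type {};

// Appends the object's bytes to the buffer with one memcpy.
template <class T>
void save_bitwise(std::vector<unsigned char>& out, const T& value) {
    static_assert(is_portable_bitwise<T>::value, "type has not opted in");
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs this");
    const std::size_t old_size = out.size();
    out.resize(old_size + sizeof(T));
    std::memcpy(out.data() + old_size, &value, sizeof(T));
}

// Reads an object back from the buffer at the given byte offset.
template <class T>
T load_bitwise(const std::vector<unsigned char>& in, std::size_t offset) {
    static_assert(is_portable_bitwise<T>::value, "type has not opted in");
    assert(offset + sizeof(T) <= in.size());
    T value;
    std::memcpy(&value, in.data() + offset, sizeof(T));
    return value;
}
```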

[1] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html#templates

>>
>> This is extremely off base, and points back to your lack of knowledge
>> regarding MPI, I think.
>
> Uhm, what exactly is extremely off-base here? And what does MPI have to do
> with it? The bigger a class, the smaller the chance it can conform to the
> limitations of POD. I am currently "serving time" in the railway industry
> and dealing with Trains, TrackCircuits, Signals, Stations, (damn long list).
> All use uuids and are used in inter-process inter-machine communications. I
> cannot imagine those classes to be PODs.
>

MPI's history is (as far as I can tell) high-performance Fortran on
supercomputers. Something written specifically for such a machine
will go out of its way to use the most efficient implementation
possible, even if it sacrifices a small amount of expressiveness.
Boost.MPI was explicitly written to match the speed of the underlying
MPI library, using the same guideline.

A matrix is a perfect example of a POD that can become VERY large, and
is exactly the kind of thing that such programs will often deal with.

>
> First, I am under the impression that non-aggregate non-virtual objects are as
> memcopyable (with usual caveats) as PODs are. Second, I feel
> boost.serialization still can be optimized for performance. See,
> http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serialization_optimizations.
> Plus binary archives (or your custom archives) can carry a very limited
> overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
>

3.9 says otherwise, as mentioned.

>
> As for "an *extremely* convincing argument", then I somehow haven't seen one
> either so that I'd say "indeed, non-aggregates cannot do that, POD is the
> king". But I might not know something you do (gosh, it's turning into some
> "disturbing" confession now ;-)) but that's OK, right?
>

No, I'd say that debating PODness without really knowing what they do isn't OK.

>
>> Again, the fact that this might be possible even if UUID were not a POD
>> type is somewhat irrelevant,
>
> I disagree. It is relevant to me and surely many others working on higher
> abstraction levels. POD comes with conditions. I need to know if I want to
> pay that price. Therefore, I never buy into theoretical efficiency debates
> -- I write stuff, I profile the stuff, I fix the actual (not imagined)
> bottlenecks.
>

Invariants can be added non-intrusively afterwards (trivially, in this
case); it's impossible to remove them non-intrusively. Libraries
should thus err on the side of efficiency, in my opinion.
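A minimal sketch of adding an invariant non-intrusively, assuming a POD uuid and a hypothetical `non_nil_uuid` wrapper that rejects the Nil UUID; the same pattern is how a composite class would hold its own invariant on a uuid member.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical POD uuid with no invariant of its own.
struct uuid {
    std::uint8_t data[16];
};

inline bool is_nil(const uuid& u) {
    for (std::uint8_t octet : u.data)
        if (octet != 0) return false;
    return true;
}

// Layered on top of the POD, this adds a "never nil" invariant without
// touching uuid. Removing an invariant from a class that enforces one
// internally cannot be done from outside like this.
class non_nil_uuid {
public:
    explicit non_nil_uuid(const uuid& u) : value_(u) {
        if (is_nil(u)) throw std::invalid_argument("nil uuid not allowed");
    }
    const uuid& get() const { return value_; }

private:
    uuid value_;
};
```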

>
> Well, at the expense of an initial invalid invariant state? I think I'd rather
> agree to the nil-behavior of uuid. Again, "it is important and fundamental
> to have constructors acquire resources and establish a simple invariant"
> (Stroustrup E.3.5).
>

I'm not convinced that uninitialized definitions are such a hardship.
A quotation from an exception safety appendix is of questionable
value, when a POD UUID trivially offers no-throw exception safety, as
(to reiterate) UUIDs require no acquisition of resources nor any
invariants on their contents. (As any possible bit pattern can be
read in from a correctly-formatted user input, there is no invariant.
Composite classes may hold their own invariants on the uuids, just
like they do with the fundamental value types.)

>
> Here comes Vladimir disagreeing again (and not because he is not familiar
> with or afraid of aggregates). It is because I feel that "uuid id = {0};"
> exposes too much implementation detail and assumes the user knows that the
> invalid uuid is all zeros. If, say, tomorrow the Standard changes the value
> of nil, all my code becomes invalid. It might not be the case with uuid.
> However, it is the principle/coding habit I am talking about.
>

It's not "the principle/coding habit" that's up for discussion,
though. (So this pings my
http://en.wikipedia.org/wiki/Fallacy_of_division meter.)

As for implementation detail, I quote the standard: "In the absence of
explicit application or presentation protocol specification to the
contrary, a UUID is encoded as a 128-bit object, as follows: The
fields are encoded as 16 octets, [...]" (RFC 4122, 4.1.2). Any other
implementation is surprising.

The Nil UUID is defined as the one where 16 octets are 0, which is
exactly what {} and {0} say. The option to be explicit (uuid id =
uuids::nil();) is always there.
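For illustration, both spellings under an assumed POD uuid; the `uuids::nil()` and `is_nil()` signatures here are guesses for the sketch, not the library's:

```cpp
#include <cassert>
#include <cstdint>

struct uuid {
    std::uint8_t data[16];
};

namespace uuids {
// Hypothetical explicit spelling of the Nil UUID.
inline uuid nil() {
    uuid u = {};  // aggregate initialization zero-fills all 16 octets
    return u;
}
}  // namespace uuids

// True when all 16 octets are zero, i.e. the RFC 4122 Nil UUID.
inline bool is_nil(const uuid& u) {
    for (std::uint8_t octet : u.data)
        if (octet != 0) return false;
    return true;
}
```

Here `uuid id = {};`, `uuid id = {0};` and `uuid id = uuids::nil();` all yield the same 16 zero octets.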

If the standard were to change the value of nil, all your data would
also become "invalid" (since you might have the hypothetical new Nil
UUID in there somewhere), so you're in trouble anyway. I'd posit
that any change to the standard as gratuitously breaking as that would
just lead to the updated standard being ignored by everyone, though.

Any claim based on programmers not knowing what they're doing is
void, as that assertion inevitably leads to the conclusion that they
shouldn't be programming C++ (or any Turing-complete language) at all,
which undercuts any argument that depends on them writing code in the first place.

>> 5. Static initialization has been greatly underrated so far in this
>> discussion. My first use case for a Boost UUID library would be to
>> replace some homegrown COM/XPCOM encapsulation code. In dealing with
>> COM/XPCOM, it is *extremely* common to have hardcoded UUIDs, and *many*
>> of them. Trivial work though it may be, spending application/library
>> startup time initializing hundreds/thousands of UUIDs when they could be
>> statically initialized is senseless.
>
> I believe you'll be able to do that if we do
>
> class uuid
> {
> public:
> template<class Range> uuid(Range range);
> };
>
> Then you'll be able to feed your hard-coded initialization data to uuid.
>

Which will then require code to run to initialize the UUID, and twice
the storage, since the bytes are kept once as initializer data and once
in the uuid itself. I'd have no interest in
requiring RAM for UUIDs in my embedded system when I could have put a
POD uuid into ROM.
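A sketch of the static-initialization case, assuming a POD uuid stored in RFC 4122 octet order; the table name is hypothetical. The first entry is the RFC 4122 DNS namespace UUID and the second the COM IUnknown IID (its leading fields are all zero, so its bytes are the same in either byte order).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct uuid {
    std::uint8_t data[16];
};

// Hypothetical table of hard-coded IDs. Because uuid is an aggregate and
// every initializer is a constant expression, this is constant
// initialization: the compiler emits the octets directly (typically into a
// read-only section) and no code runs at startup to fill them in.
constexpr uuid known_iids[] = {
    // 6ba7b810-9dad-11d1-80b4-00c04fd430c8 (RFC 4122 DNS namespace)
    {{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
      0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}},
    // 00000000-0000-0000-c000-000000000046 (COM IUnknown)
    {{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0xc0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46}},
};

constexpr std::size_t known_iid_count =
    sizeof(known_iids) / sizeof(known_iids[0]);
```

The `constexpr` only makes the compiler reject anything needing dynamic initialization; in C++03 a `static const` POD aggregate with constant initializers is statically initialized as well. A constructor taking a Range, by contrast, would have to run at startup.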

>
> I do not think C++ was designed "with performance and efficiency as primary
> considerations". And I do not think applications "should be written with
> performance and efficiency as primary considerations". Don't get up in arms
> -- those considerations are important. I object to the "primary" part. I do
> not think I even need to debate this -- Knuth, Stroustrup and many others
> have done that.
>

I've read Stroustrup's book about what goals were in mind when C++ was
designed (and how those goals played out), and my recollection is the
complete opposite of what you are claiming. C++ is designed so that,
as much as possible, you "don't pay for what you don't use" and to be
"as fast as C" so that it could "raise the level of abstraction" in
"systems programming" (quotations paraphrased from memory and from the
extended preface at [2]). Those sound exactly like "with performance
and efficiency as primary considerations" to me.

I don't have D&E at hand, though. Does someone have the exact list of goals?

[2] http://www.research.att.com/~bs/dne.html

>
> Call me thick but I did not see those convincing use-cases showing PODs
> considerably more efficient than non-aggregates. Easier? Yes. *Seemingly*
> more efficient? Yes. How much more efficient? I dunno if that is palpably
> real.
>

16 bytes of RAM saved by aggregate-initialization matters in, say, a
distributed sensor network where you get a whole 2KB of RAM per chip,
and not doing the copy means less time out of sleep mode, improving
battery life.

I'll claim that the line dividing "premature optimization" from
"premature pessimization" is in a different spot in library
development, compared to application development.

>> but no convincing arguments have been presented
>> in favor of non-PODness.
>
> Oh, c'mon. How 'bout reading "The C++ Progr. Lang." and the "Evolution of
> C++" books? Discussions there do not revolve around aggregates.
>

To me, that sounds like, "the POD design is not 'modern' enough".

~ Scott

