Subject: Re: [boost] [UUID] PODness Revisited
From: Vladimir Batov (batov_at_[hidden])
Date: 2008-12-25 23:08:21
Adam,
Wow, that was one passionate reply. Was it something that I said? ;-)
> While you may not see the "magic" in POD types, I can't fathom what
> exactly
> you have against them either.
Well, I most certainly do not have anything against anything. It's nothing
personal. If my emails came across as such, my humble apologies.
> Do you find them more confusing or harder to
> use? Do you find static initialization syntax aesthetically offensive? Is
> it
> your (no offense, but extremely misguided, IMO) lingering impression that
> POD
> types are a legacy of C that should be ignored whenever possible? A list
> of
> examples in favor of making UUID a POD type was presented, and you've
> argued
> against those examples without actually saying what you think the drawback
> is.
That's quite an emotionally charged list you compiled. It did not have to be
such. I certainly do not find aggregates confusing or anything of that sort.
In fact, I was very happy with them for about 10 years while coding in C
before I switched to C++ in the early '90s. PODs came from C and, therefore,
they *are* a legacy of C and are not called Plain Old Data for nothing. In
C++, though, I do find aggregates limiting. For uuid it would mean no
user-provided constructors, no guaranteed invariant, no private or protected
non-static data members. And those things are fundamental (my view, of course) to
C++ -- "it is important and fundamental to have constructors acquire
resources and establish a simple invariant" (Stroustrup E.3.5). Then, "One
of the most important aims of a design is to provide interfaces that can
remain stable in the face of changes" (Stroustrup 23.4.3.5). PODs do
restrict interfaces and are wide open implementation-wise. That opens the
door to misuse and complicates long-term maintainability. So, unless PODs
provide some killer feature in return (that cannot be achieved otherwise), I
do not see the point of paying that price.
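To make that trade-off concrete, here is a minimal sketch. The names pod_uuid and class_uuid are hypothetical and are not the proposed Boost.UUID interface. The aggregate can be created with indeterminate bytes and exposes its storage to everyone; the class always starts from a defined state and keeps its representation private.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical aggregate flavour: public data, no constructors.
// "pod_uuid u;" at block scope leaves the bytes indeterminate.
struct pod_uuid {
    unsigned char data[16];
};

// Hypothetical class flavour: the constructor establishes the invariant
// "every byte has a defined value", and the storage stays private, so the
// representation could change without touching user code.
class class_uuid {
public:
    class_uuid() { std::fill(data_, data_ + 16, static_cast<unsigned char>(0)); }

    unsigned char byte_at(std::size_t i) const {
        assert(i < 16);  // precondition: index within the 16 bytes
        return data_[i];
    }

private:
    unsigned char data_[16];
};
```

With the aggregate, a caller has to remember to write "pod_uuid u = {};" to get defined contents; with the class there is no way to forget.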
>> That's what *I* see (caveat: I admit not knowing much about Boost.MPI and
>> Boost.Interprocess requirements and expectations).
>
> Again, no offense intended, but I find it a bit discomfiting that the
> person
> arguing most vocally on this issue would make this admission. Just because
> you
> don't have personal knowledge of a use case where UUID being a POD type
> would
> be greatly beneficial doesn't mean such a use case doesn't exist.
First, you are right about "most vocally". I too had that growing concern
that there was somewhat too much of me lately on the list. Apologies. In my
defence I might say I do not usually do that. My weak point is that once I
get onto something, I tend to follow it through to completion (well, some
might consider that to be a good thing). Point taken though, I'll try
answering your email (hopefully to your satisfaction) and will turn it down.
Secondly, I personally do not see anything wrong with the admission -- I use
some libs extensively, some occasionally and do not use some at all. I
suspect it is quite typical. Stating your knowledge IMO clears up a lot of
possible and unnecessary confusion and many other emotions.
Thirdly, I am not sure I said "such a use case doesn't exist", did I? If I
did, I probably did not mean that. :-) What I am questioning though is the
"greatly beneficial" part. I am glad to see that part is already obvious to
you. I hope it's not just a hunch and you have hard data to back it up.
>> 1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seem
>> to
>> be due to serialization (or rather ability to bypass it).
>
> This isn't technically correct, I think; in MPI's case (though not
> Interprocess'), the type must be serializable regardless, but the ideal
> efficiency scenario comes from specializing both
> boost::mpi::is_mpi_datatype
> and boost::serialization::is_bitwise_serializable. Note that the
> documentation
> for these traits ([1] and [2], respectively) both specifically mention POD
> types -- this is no coincidence.
>
> [1]
> http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html
> [2]
> http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
Yes, my wording was somewhat crude. I presume you have a lot of practical
experience with MPI and you can say with authority that PODness is a must
for MPI's efficiency. Would you mind providing some experimental data that
you observed? My knowledge of MPI is from reading docs (I probably should
stop making these discomforting admissions). There I got the impression
that serializable non-aggregate classes could be made efficient too.
> ... I think you're missing the larger point. In
> modern C++, types intentionally created as POD types are often (not
> always)
> done so to absolutely maximize the efficiency of copying that type.
I do not understand "PODness to absolutely maximize the efficiency of
copying" as I believe
class NonAggregateFoo { ... int int_; };
is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be
memcopied as well as PODFoo bunch[] (I am not advocating that but simply
stating the fact). And I do not expect the respective
    template<class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & int_;  // a single primitive member
    }
to be that slow (with appropriately chosen Archive). Again, here you might
well know more than I do. Tell me then.
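The memcpy claim can be checked mechanically. The sketch below uses the C++11 <type_traits> vocabulary, which postdates this thread; under C++03 only PODs are formally guaranteed to be memcpy-safe, so treat this as an illustration of the idea rather than a C++03 guarantee. The constructor, value() and copy_bunch are assumptions added to fill in the elided class body.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same shape as NonAggregateFoo above: private data plus a user-provided
// constructor, so it is neither an aggregate nor a POD.
class NonAggregateFoo {
public:
    explicit NonAggregateFoo(int v) : int_(v) {}
    int value() const { return int_; }

private:
    int int_;
};

// Not trivial (it has no trivial default constructor), hence not a POD ...
static_assert(!std::is_trivial<NonAggregateFoo>::value, "not a POD");
// ... but still trivially copyable, which is the property memcpy relies on.
static_assert(std::is_trivially_copyable<NonAggregateFoo>::value,
              "byte-wise copy is well defined");

// Copies n objects byte-wise, exactly as one would for an array of PODs.
inline void copy_bunch(NonAggregateFoo* dst, const NonAggregateFoo* src,
                       std::size_t n) {
    std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                n * sizeof(NonAggregateFoo));
}
```

Whether a given library dispatches to memcpy for such a type depends on which trait it tests; a library that checks only is_pod would need a specialization or an opt-in.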
> The
> existence of the is_pod type trait in boost.type_traits/TR1/C++0x
> reinforces
> this -- e.g. in many implementations, std::copy will use memcpy to perform
> an
> ideally efficient copy when is_fundamental<T>::value || is_pod<T>::value.
> Additionally, a POD type's synthesized copy constructor is generally
> merely a
> memcpy.
Understood. It does not make copying of non-aggregates inefficient though.
Non-automatic 'yes', inefficient 'no'.
> ...
>> If it is a MPI implementation-specific restriction/limitation, I'd expect
>> we'd look at addressing it in MPI rather than shaping other classes to
>> match
>> it.
>
> This is an unreasonable thought process, IMO. If a type has a good use
> case
> with another library (in this case, UUID with
> Serialization/MPI/Interprocess),
> it's up to the type to conform to the library in an ideal fashion, not the
> other way around. E.g., lexical_cast and serialization don't go out of
> their
> way to work with every other type in Boost, but many types in boost have
> serialization and lexical_cast support.
Well, again my initial wording was somewhat crude. I still stand by its
meaning though. A general-purpose library should be
accommodating/considerate rather than imposing. And from what I read about
MPI that's the approach taken there. As for lexical_cast, it is the same --
it imposes the requirements of operator>>, operator<< and a default
constructor. However, instead of rejecting non-conformant classes, it leaves
the door open and accommodates those via specialization, and at least as
efficiently.
Boost.Serialization? Same. In fact, they *do* "go out of their way to work"
with as many types as possible. I think I can talk about Boost.Serialization
with a little bit of confidence (as I've been using it quite extensively). I
know that the library tries so remarkably hard to keep everyone happy --
optimization? yes; no-default constructors? no problem; separate load/save
logic? bring it on; intrusive/non-intrusive serialization? piece of cake...
the list is long.
>> 2. Scott, you correctly mention that most often we "don't want to send
>> UUIDs
>> by themselves". The thing is that chances of that bigger class being a
>> POD
>> are diminishing dramatically (if not already infinitely close to 0).
>
> This is extremely off base, and points back to your lack of knowledge
> regarding MPI, I think.
Uhm, what exactly is extremely off-base here? And what does MPI have to do
with it? The bigger a class, the smaller the chance it can conform to the
limitations of POD. I am currently "serving time" in the railway industry
and dealing with Trains, TrackCircuits, Signals, Stations, (damn long list).
All use uuids and are used in inter-process inter-machine communications. I
cannot imagine those classes to be PODs.
> When writing an app/library/algorithm intended for use
> in a high-performance parallel context, one goes out of their way to use
> POD
> types extensively, for the sake of performance. Yes, the fact that MPI
> works
> with boost.serialization is nice, but when performance is critical,
> memcpy'able types are key;
First, I am under the impression that non-aggregate non-virtual objects are as
memcopyable (with usual caveats) as PODs are. Second, I feel
boost.serialization still can be optimized for performance. See,
http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serialization_optimizations.
Plus binary archives (or your custom archives) can carry a very limited
overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
> ... I think to
> argue that a type such as UUID (which is a low-level, fundamental value
> type,
> and specifically *very* likely to be used in an inter-process context)
> should
> *not* automatically work in an ideal fashion in this scenario, one must
> have
> an *extremely* convincing argument, IMO. And so far, I haven't seen one
> presented. ;-)
As for inter-process context: if it is on the same machine (in shared
memory), then PODness is not an exclusive quality that allows objects
to be stored/accessed in shared memory -- non-aggregate non-virtual objects
are as good for that as PODs. If it is over the network, then I suspect
we have many more things to worry about efficiency- and data
consistency/integrity-wise. Say, network latency, synchronization, node
dropouts, (a long list).
As for "an *extremely* convincing argument", then I somehow haven't seen one
either so that I'd say "indeed, non-aggregates cannot do that, POD is the
king". But I might not know something you do (gosh, it's turning into some
"disturbing" confession now ;-)) but that's OK, right?
>> 3. As for deployment of an object in shared memory, it does not have to
>> be a
>> POD either.
>
> Please take another look at the specific link Scott provided ([4]);
> boost::interprocess::message_queue only copies raw bytes between
> processes, so
> for non-POD types generally that requires that an object be binary
> serialized
> before sending. However, for a POD type, binary serialization is a
> completely
> redundant process (read: a complete waste of CPU cycles); one can just
> send
> the bytes of the object directly, and as an added bonus, avoid becoming
> dependent on the somewhat heavy serialization library altogether.
Yes, I hear you. I just do not know how big a deal that is. I can only argue
this point with any conviction after I try optimized binary serialization
vs. memcopy. If you have tried, then I'd love to hear about it. If you have
not, then I am still unsure of the *real* tangible benefits of PODness.
> Again, the fact that this might be possible even if UUID were not a POD
> type
> is somewhat irrelevant,
I disagree. It is relevant to me and surely many others working on higher
abstraction levels. POD comes with conditions. I need to know if I want to
pay that price. Therefore, I never buy into theoretical efficiency
debates -- I write stuff, I profile the stuff, I fix the actual (not
imagined) bottlenecks.
> ...
> I want to touch on a few other points as well, were UUID to be a POD type:
>
> 1. The default constructor behavior/existence debate would be put to rest.
> ;-)
Well, at the expense of an initial state with no established invariant? I
think I'd rather agree to the nil-behavior of uuid. Again, "it is important
and fundamental
to have constructors acquire resources and establish a simple invariant"
(Stroustrup E.3.5).
> 2. The efficiency of lexical_cast would be better than *any* default
> constructor behavior, regardless of which one was ultimately decided
> upon.
I think you are referring to the non-initialized instance that the default
lexical_cast<uuid>(string) creates. That claim might or might not hold,
though -- the cost of writing to and reading from those streams might well
dwarf any difference between initialization and no initialization. I have
not profiled that, though.
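For readers unfamiliar with why a default-constructed instance is involved at all, a stream-based conversion has roughly the shape below. This is a simplified stand-in written for illustration (simple_lexical_cast is a hypothetical name), not Boost's actual lexical_cast implementation.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Simplified stand-in for a stream-based conversion.
// Target must be default-constructible and support operator>>.
template <class Target>
Target simple_lexical_cast(const std::string& text) {
    std::istringstream in(text);  // stream set-up and parsing happen here
    Target result;                // default-constructed first ...
    if (!(in >> result))          // ... then overwritten by operator>>
        throw std::runtime_error("simple_lexical_cast: bad conversion");
    return result;
}
```

The default construction of result is a single step next to building a stream and parsing text, which is why its cost is only worth arguing about once it has been measured.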
> ...
> 4. Initializing a nil UUID would become more succinct. Contrast
> 'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and
> 'id()' in a constructor initialization list. Assuming any level of
> familiarity with aggregates, the latter are much more concise, IMO. (And
> C++0x will certainly introduce that familiarity if one doesn't have it
> already.)
Here comes Vladimir disagreeing again (and not because he is not familiar
with or afraid of aggregates). It is because I feel that "uuid id = {0};"
exposes too much implementation detail and assumes the user knows that the
invalid uuid is all zeros. If, say, tomorrow the Standard changes the value
of nil, all my code becomes invalid. It might not be the case with uuid.
However, it is the principle/coding habit I am talking about.
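A minimal sketch of that habit, with a hypothetical sketch_uuid class: the named factory is the only place that knows what nil looks like in memory, so callers never spell out the zeros themselves.

```cpp
#include <cstddef>

// Hypothetical uuid-like class used only to illustrate the point.
class sketch_uuid {
public:
    // Named factory: callers ask for "nil" without knowing its bytes.
    static sketch_uuid nil() { return sketch_uuid(); }

    bool is_nil() const {
        for (std::size_t i = 0; i < sizeof data_; ++i)
            if (data_[i] != 0) return false;
        return true;
    }

private:
    sketch_uuid() : data_() {}  // value-initializes all 16 bytes to zero
    unsigned char data_[16];
};
```

User code then reads "sketch_uuid id(sketch_uuid::nil());" and would keep working if the representation of nil ever changed inside the class.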
> 5. Static initialization has been greatly underrated so far in this
> discussion. My first use case for a Boost UUID library would be to
> replace
> some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM,
> it
> is *extremely* common to have hardcoded UUIDs, and *many* of them.
> Trivial
> work though it may be, spending application/library startup time
> initializing hundreds/thousands of UUIDs when they could be statically
> initialized is senseless.
I believe you'll be able to do that if we do
    class uuid
    {
    public:
        template<class Range> uuid(Range range);
    };
Then you'll be able to feed your hard-coded initialization data to uuid.
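A sketch of how such a constructor might be used with hard-coded data. The class name range_uuid, the byte_at accessor and the use of C++11 std::begin/std::end are assumptions made for the example; the sample bytes are those of the RFC 4122 DNS namespace UUID, chosen only as test data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical uuid-like class with a range constructor.
class range_uuid {
public:
    // Range is anything std::begin/std::end accept that yields 16 bytes.
    template <class Range>
    explicit range_uuid(const Range& range) {
        assert(std::distance(std::begin(range), std::end(range)) == 16);
        std::copy(std::begin(range), std::end(range), data_);
    }

    unsigned char byte_at(std::size_t i) const { return data_[i]; }

private:
    unsigned char data_[16];
};

// Hard-coded initialization data, as in COM/XPCOM-style code.
static const unsigned char k_example_bytes[16] = {
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
    0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8
};
static const range_uuid k_example_id(k_example_bytes);
```

Note that an object built this way is, as far as the language guarantees go, initialized by running the constructor before main() rather than laid out statically, so this covers hard-coded data but leaves the start-up cost question to measurement.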
> 6. Regarding the potential for uninitialized state: I personally view UUID
> as
> a fundamental, borderline primitive type (others will almost certainly
> disagree); uninitialized state is generally understood and accepted for
> actual primitive types, so why should it be such a scary concept for
> UUID?
It's certainly not scary. It's just not in C++ spirit (see quotes at the top
of the email) and everyone knows what primitive types are. I do not think
people expect other types to behave that way.
> 7. Lastly, to reiterate: this is C++. Every type, every library, every
> algorithm should be written with performance and efficiency as primary
> considerations.
I do not think C++ was designed "with performance and efficiency as primary
considerations". And I do not think applications "should be written with
performance and efficiency as primary considerations". Don't get up in
arms -- those considerations are important. I object to the "primary" part.
I do not think I even need to debate this -- Knuth, Stroustrup and many
others have done that.
> ... There are demonstrable use cases where UUID can work more
> efficiently as a POD type,
Call me thick but I did not see those convincing use-cases showing PODs
considerably more efficient than non-aggregates. Easier? Yes. *Seemingly*
more efficient? Yes. How much more efficient? I dunno if that is palpably
real.
> but no convincing arguments have been presented
> in favor of non-PODness.
Oh, c'mon. How 'bout reading "The C++ Programming Language" and "The Design
and Evolution of C++"? Discussions there do not revolve around aggregates.
Best,
V.