From: David Bergman (davidb_at_[hidden])
Date: 2002-11-19 00:22:25


Hi,

This is a comment from the Java corner of the world: like many other
Java developers, I have implemented serialization of objects to XML.
It is not that hard, although there might not exist (can anyone
verify this?) a standardized (more or less...) "C++ Object XML Format".

There are two alternatives:

1. Use an intelligible XML Application (yes, that is what the XML folks
call the specific XML languages, such as XHTML...), giving not only
platform independence (which I assume the serializer module already
achieves...) but language independence, i.e., the object or value can be
unmarshalled, or generated, by a Python program, much in the spirit of
the good old XDR.

2. Embed the binary output of the existing serializer in an XML element.
This constrains the XML snippet to this particular serialization
algorithm, including, at least initially, to the C++ language.

Alternative 1:

One problem is that you have to come up with an XML Application for
these objects. What one does is to define atomic elements (i.e., declared
as "<!ELEMENT foo EMPTY>") for the primitive data types and then compound
elements for other types (e.g., "<!ELEMENT compound (compound |
atomic)*>").

Although XML Schemas would be more suitable than DTDs for the XML
Application we need, a DTD is enough to illustrate the idea. One sample
DTD (not meant to be a complete description, but rather an illustrative
sample) for such an XML Application would be (I skip the DTD
header...):

---------------------------------------------------------------------------
<!-- A macro for all possible data values -->
<!ENTITY % value "atomic | compound | ref | array" >
<!--
The primitive data, which is (often) either signed or unsigned, and
normal or long precision
'objectId' is only necessary if referenced. See 'ref'...
-->
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic
	objectId ID #IMPLIED
	signed (true|false) #IMPLIED
	long (true|false) #IMPLIED
	type (char|short|int|long|float|double) #IMPLIED
	value CDATA #IMPLIED
>
<!--
The (short) type description of compound data, which is either a class
or a struct declaration.
This description could be extended to incorporate behaviour and
layout...
Note that this type description includes meta type descriptions, i.e.,
templates...
Also, 'instantiates' refers to the template instantiated in creating
this type (i.e., a 'typeId' of a 'type'), in which case 'name' is not
required.
One could also divide the 'type' element into an 'actual_type' and
'template' to distinguish these two meta levels in C++...
The 'instantiationParams' is a very ad-hoc way to provide instantiation
information...
-->
<!ELEMENT type EMPTY>
<!ATTLIST type
	kind (struct | class | template | instance) "struct"
	typeId ID #IMPLIED
	instantiates IDREF #IMPLIED
	instantiationParams CDATA #IMPLIED
	namespace CDATA ""
	name CDATA #IMPLIED
>
<!--
The actual compound data, referring to the aforementioned type
descriptions
'type' refers to a 'type' element.
'objectId' is only necessary if referenced (in contrast to pure embedded
compounds). Note that this will simply be a document-wide unique ID in
most cases...
-->
<!ELEMENT compound (%value;)*>
<!ATTLIST compound
	objectId ID #IMPLIED
	type IDREF #REQUIRED
>
<!--
The other kind of composition is, obviously, arrays.
The polymorphism in this element definition w.r.t. the actual items will
not be used by the C++ runtime...
-->
<!ELEMENT array (%value;)*>
<!ATTLIST array
	objectId ID #IMPLIED
	length NMTOKEN #IMPLIED
>
<!--
Ok, we might need references (including pointers) to data.
Note that this assumes that the reference is actually referring to a
valid object or value, and not some arbitrary address, which is
obviously not self-evident in the C++ case (this is what my Java alter
ego does not have to deal with...)
Even a reference has an optional ID, in case it is itself referred to
(known as a "handle" chain).
'referee' could be an 'objectId' of an 'atomic', 'compound', 'array', or
a 'ref'.
-->
<!ELEMENT ref EMPTY>
<!-- Note: 'objectId' is the ID of this reference, not of the referee !! -->
<!ATTLIST ref
	objectId ID #IMPLIED
	kind (pointer | reference) "pointer"
	referee IDREF #REQUIRED
>
---------------------------------------------------------------------------
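
To make the language independence of Alternative 1 a bit more concrete, here is a minimal Python sketch of a reader for documents shaped like the DTD sample. It is only an illustration under some assumptions: the 'object_graph' root wrapper, the type name "Point" and all the IDs are made up, the parser does not validate against the DTD, and 'char' values are simply kept as strings. It shows that two 'ref' elements with the same 'referee' come back as one shared object rather than two copies.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: one compound, and an array holding two
# pointers to that same compound. Names and IDs are invented for illustration.
DOC = """\
<object_graph>
  <type kind="struct" typeId="t1" name="Point"/>
  <compound objectId="o1" type="t1">
    <atomic type="int" value="3"/>
    <atomic type="int" value="4"/>
  </compound>
  <array length="2">
    <ref kind="pointer" referee="o1"/>
    <ref kind="pointer" referee="o1"/>
  </array>
</object_graph>
"""


def unmarshal(text):
    """Rebuild plain Python objects from the XML, preserving sharing."""
    root = ET.fromstring(text)
    # Index every element that carries an objectId so refs can find it.
    by_id = {e.get("objectId"): e for e in root.iter() if e.get("objectId")}
    type_names = {t.get("typeId"): t.get("name") for t in root.iter("type")}
    built = {}  # objectId -> object already reconstructed

    def build(elem):
        oid = elem.get("objectId")
        if oid is not None and oid in built:
            return built[oid]
        if elem.tag == "atomic":
            kind, raw = elem.get("type", "int"), elem.get("value")
            if kind == "char":
                obj = raw
            elif kind in ("float", "double"):
                obj = float(raw)
            else:
                obj = int(raw)
        elif elem.tag == "ref":
            # A ref resolves to the referee itself, not to a copy of it.
            obj = build(by_id[elem.get("referee")])
        elif elem.tag in ("compound", "array"):
            children = []
            if elem.tag == "array":
                obj = children
            else:
                obj = {"type": type_names[elem.get("type")], "fields": children}
            if oid is not None:
                built[oid] = obj  # register first so cyclic refs terminate
            children.extend(build(child) for child in elem)
        else:
            raise ValueError("unexpected element: " + elem.tag)
        if oid is not None:
            built[oid] = obj
        return obj

    # 'type' elements are descriptions, not values, so they are skipped here.
    return [build(e) for e in root if e.tag != "type"]


point, items = unmarshal(DOC)
```

Here 'items[0]' and 'items[1]' are the very same object as 'point', which is the property the 'objectId'/'referee' pair is meant to preserve.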

In the spirit of environment independence, one would also need to
represent the meta data for the behavior and exact layout of structs and
classes (to detail the 'type' elements in the DTD above) used, so a
dynamic environment can replicate that meta structure as well...

Alternative 2:

This would just be XMLish in the superficial sense, since the only
compatible reader would be your specific unmarshaller. Anyhow, it would
reap the benefits of (1) being able to state "XML" in the product
description (thereby raising the probability of acceptance in certain
enterprise environments) and (2) letting the marshalled C++ objects (and
values) pass certain firewalls.

Anyhow, one can embed any binary (marshalled) data in an XML document by
simply using a CDATA section such as:

<?xml version="1.0" ?>
<!DOCTYPE object_graph SYSTEM "boost_serial.dtd">
<object_graph>
<![CDATA[
... some binary representation, looks kind of like /A9kjQjA778AkkkQQQ/
]]>
</object_graph>

The binary representation inside the CDATA section must follow the
ISO/IEC 10646 standard, and should use the UTF-8 or UTF-16 encodings. I
strongly recommend using UTF-8 here!

There are several encoding schemes for converting binary data to ISO/IEC
10646 characters, such as Base64.

One could additionally add a MIME type as an attribute to 'object_graph'
to describe the particular binary encoding scheme used. Also, there is
room here for encryption and/or compression...
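
As a rough sketch of the embedding just described (not the serializer's actual format), the following Python wraps an arbitrary byte string as Base64 text inside a CDATA section of an 'object_graph' element and reads it back. The attribute name 'mimeType' and the helper names 'wrap' and 'unwrap' are assumptions made up for this example.

```python
import base64
from xml.dom import minidom


def wrap(payload: bytes, mime: str = "application/octet-stream") -> str:
    """Embed binary data as Base64 text in a CDATA section."""
    doc = minidom.Document()
    root = doc.createElement("object_graph")
    root.setAttribute("mimeType", mime)  # hypothetical attribute name
    # Base64 output is plain ASCII and can never contain "]]>",
    # so it is always safe inside a CDATA section.
    text = base64.b64encode(payload).decode("ascii")
    root.appendChild(doc.createCDATASection(text))
    doc.appendChild(root)
    # Serialize as UTF-8, as recommended above.
    return doc.toxml(encoding="utf-8").decode("utf-8")


def unwrap(xml_text: str) -> bytes:
    """Recover the original bytes from a document produced by wrap()."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    root = doc.documentElement
    data = "".join(
        node.data
        for node in root.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )
    return base64.b64decode(data)


blob = bytes(range(256))  # stand-in for the serializer's binary output
xml_text = wrap(blob)
restored = unwrap(xml_text)
```

Compression or encryption would slot in between the raw bytes and the Base64 step, with the MIME type attribute telling the reader what to undo.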

There are variants having the binary part outside the XML document, as
an attachment to XMTP or SOAP, such as:

<SOAP-ENV:Envelope>
  <SOAP-ENV:Body>
    <boost:object_graph
      href="http://repository.company.org/files/state_04.bin" />
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

One could be even more experimental and use Microsoft's DIME format...

I hope this helped a bit, and I could definitely give more info on
XMLing the serialization library.

Thanks,
David

-----Original Message-----
From: boost-bounces_at_[hidden]
[mailto:boost-bounces_at_[hidden]] On Behalf Of Robert Ramey
Sent: Monday, November 18, 2002 10:02 PM
To: 'boost_at_[hidden]'
Subject: FW: [boost] Serialization & XML (was Serialization Library
Review)

Is there a reason you sent this to me privately?

> From: David Abrahams <dave_at_[hidden]>
>I believe your assessment that some
>data structures can't be represented using XML is incorrect, and
>that's easy to prove. A serialization library which makes generation
>of XML output difficult is severely handicapped in the modern world.

Well, I have conceded that it was preliminary.  All I know about XML
is from a small book containing a concise description of XML.

My skepticism is based on the following thought experiment:

Suppose one is given a list of polymorphic pointers, some of which
correspond to the bottom node of a diamond inheritance structure
and some of which are repeated in the list and serialized
somewhere else as well.

a) How would such a thing be represented in XML?
b) Could it be loaded back to create an equivalent structure?
c) Would it be useful for anything other than this serialization system?

If someone can assure me that the answers to all three of the above
are yes then it should be possible - otherwise not.  Given that it's
"easy to prove" these questions should be easy to answer in
a convincing way.

 Robert Ramey