From: David Bergman (davidb_at_[hidden])
Date: 2002-11-19 00:22:25
Hi,
This is a comment from the Java corner of the world: like many other
developers using Java, I have implemented serialization of objects to
XML. It is not that hard, although there might not exist (can anyone
verify this?) a standardized (more or less...) "C++ Object XML Format".
There are two alternatives:
1. Use an intelligible XML Application (yes, that is what the XML folks
call the specific XML languages, such as XHTML...), giving not only
platform independence (which I assume the serializer module already
achieves...) but language independence, i.e., the object or value can be
unmarshalled, or generated, by a Python program, much in the spirit of
the good old XDR.
2. Embed the binary output of the existing serializer in an XML element.
This ties the XML snippet to this particular serialization algorithm
and, at least initially, to the C++ language.
Alternative 1:
One problem is that you have to come up with an XML Application for
these objects. What one does is to give atomic tags (i.e., having
"<!ELEMENT foo EMPTY>") for the primitive data types and then compound
elements for other types (i.e., "<!ELEMENT compound (compound |
atomic)>").
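As a rough illustration of the atomic/compound split, here is a minimal
Python sketch (Python only because it shows the language independence
mentioned above). The element and attribute names 'atomic', 'compound',
'type' and 'value' are borrowed from the sample DTD later in this
message; the helper names to_element and from_element are hypothetical,
and the sketch skips the 'type' descriptions that the DTD requires on
'compound', so its output is not meant to validate against that DTD.

```python
import xml.etree.ElementTree as ET


def to_element(value):
    """Map a primitive to <atomic> and a dict of members to <compound>."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; keep it out of this sketch.
        raise TypeError("bool is not covered by this sketch")
    if isinstance(value, int):
        return ET.Element("atomic", {"type": "int", "value": str(value)})
    if isinstance(value, float):
        return ET.Element("atomic", {"type": "double", "value": repr(value)})
    if isinstance(value, dict):
        node = ET.Element("compound")
        # Member names are dropped: the sample DTD has no place for them.
        for member in value.values():
            node.append(to_element(member))
        return node
    raise TypeError(f"unsupported value: {value!r}")


def from_element(node):
    """Rebuild plain Python values; compounds come back as lists."""
    if node.tag == "atomic":
        text = node.get("value")
        return int(text) if node.get("type") == "int" else float(text)
    if node.tag == "compound":
        return [from_element(child) for child in node]
    raise ValueError(f"unexpected element: {node.tag}")


point = {"x": 3, "y": 4.5, "inner": {"z": 7}}
xml_text = ET.tostring(to_element(point), encoding="unicode")
print(xml_text)
restored = from_element(ET.fromstring(xml_text))
print(restored)
```

The recursion mirrors the content model: a compound may contain atomics
or further compounds, and nothing else.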
XML Schemas would be more suitable than DTDs for the XML Application
we need.
One sample DTD (not meant to be a complete description, but rather an
illustrative sample) for such an XML Application would be (I skip the
DTD header...):
------------------------------------------------------------------------
<!-- A macro for all possible data values -->
<!ENTITY % actual_data "atomic | compound | ref | array">

<!-- The primitive data, which is (often) either signed or unsigned,
     and normal or long precision.
     'objectId' is only necessary if referenced. See 'ref'... -->
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic
    objectId ID #IMPLIED
    signed (true|false) #IMPLIED
    long (true|false) #IMPLIED
    type (char|short|int|long|float|double) #IMPLIED
    value CDATA #IMPLIED
>

<!-- The (short) type description of compound data, which is either a
     class or a struct declaration. This description could be extended
     to incorporate behaviour and layout...
     Note that this type description includes meta type descriptions,
     i.e., templates...
     Also, 'instantiates' refers to the template instantiated in
     creating this type (i.e., a 'typeId' of a 'type'), in which case
     'name' is not required.
     One could also divide the 'type' element into an 'actual_type' and
     'template' to distinguish these two meta levels in C++...
     The 'instantiationParams' is a very ad-hoc way to provide
     instantiation information... -->
<!ELEMENT type EMPTY>
<!ATTLIST type
    kind (struct | class | template | instance) "struct"
    typeId ID #IMPLIED
    instantiates IDREF #IMPLIED
    instantiationParams CDATA #IMPLIED
    namespace CDATA ""
    name CDATA #IMPLIED
>

<!-- The actual compound data, referring to the aforementioned type
     descriptions.
     'type' refers to a 'type' element.
     'objectId' is only necessary if referenced (in contrast to pure
     embedded compounds). Note that this will simply be a document-wide
     unique ID in most cases... -->
<!ELEMENT compound (%actual_data;)*>
<!ATTLIST compound
    objectId ID #IMPLIED
    type IDREF #REQUIRED
>

<!-- The other kind of composition is, obviously, arrays.
     The polymorphism in this element definition w.r.t. the actual
     items will not be used by the C++ runtime... -->
<!ELEMENT array (%actual_data;)*>
<!ATTLIST array
    objectId ID #IMPLIED
    length NMTOKEN #IMPLIED
>

<!-- Ok, we might need references (including pointers) to data.
     Note that this assumes that the reference is actually referring to
     a valid object or value, and not some arbitrary address, which is
     obviously not self-evident in the C++ case (this is what my Java
     alter ego does not have to deal with...)
     Even a reference has an optional ID, in case it is itself referred
     to (known as a "handle" chain). That 'objectId' is the ID of this
     reference, not the referee !!
     'referee' could be an 'objectId' of an 'atomic', 'compound',
     'array', or a 'ref'. -->
<!ELEMENT ref EMPTY>
<!ATTLIST ref
    objectId ID #IMPLIED
    kind (pointer | reference) "pointer"
    referee IDREF #REQUIRED
>
------------------------------------------------------------------------
In the spirit of environment independence, one would also need to
represent the meta data for the behavior and exact layout of the
structs and classes used (to detail the 'type' elements in the DTD
above), so a dynamic environment can replicate that meta structure as
well...
Alternative 2:
This would just be XMLish in the superficial sense, since the only
compatible reader would be your specific unmarshaller. Anyhow, it would
reap the benefits of (1) being able to state "XML" in the product
description (thereby raising the probability of acceptance in certain
enterprise environments) and (2) having the marshalled C++ objects (and
values) passing certain firewalls.
Anyhow, one can embed any binary (marshalled) data in an XML document
by simply using a CDATA section such as:
<?xml version="1.0" ?>
<!DOCTYPE object_graph SYSTEM "boost_serial.dtd">
<object_graph>
<![CDATA[
... some binary representation, looks kind of like
/A9kjQjA778AkkkQQQ/
]]>
</object_graph>
The binary representation inside the CDATA section must follow the
ISO/IEC 10646 standard, and should use the UTF-8 or UTF-16 encodings. I
strongly recommend using UTF-8 here! There are several encoding
schemes for converting binary data to ISO/IEC 10646 characters, such as
Base-64.
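A minimal Python sketch of the Base-64 route, under a few assumptions:
the helper names wrap_binary and unwrap_binary are hypothetical, the
payload stands in for whatever the existing serializer would emit, and
the DOCTYPE line is left out so the snippet parses without
"boost_serial.dtd" being present.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload: bytes) -> str:
    """Base-64 encode the payload and place it in a CDATA section."""
    # The Base-64 alphabet contains neither ']' nor '>', so the encoded
    # text can never terminate the CDATA section early.
    encoded = base64.b64encode(payload).decode("ascii")
    return (
        '<?xml version="1.0" ?>\n'
        "<object_graph><![CDATA[" + encoded + "]]></object_graph>"
    )


def unwrap_binary(document: str) -> bytes:
    """Parse the document and decode the CDATA content back to bytes."""
    root = ET.fromstring(document)
    return base64.b64decode(root.text)


raw = bytes(range(256))  # stand-in for the serializer's binary output
document = wrap_binary(raw)
recovered = unwrap_binary(document)
print(recovered == raw)
```

Every byte value survives the round trip because the XML document only
ever carries ASCII characters.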
One could additionally add a MIME type as an attribute to
'object_graph' to describe the particular binary encoding scheme used.
Also, there is room here for encryption and/or compression...
There are variants having the binary part outside the XML document, as
an attachment to XMTP or SOAP, such as
<SOAP-ENV:Envelope>
<SOAP-ENV:Body>
<boost:object_graph href="http://repository.company.org/files/state_04.bin" />
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
One could be even more experimental in using MS's DIME format...
I hope this helped a bit, and I could definitely give more info on
XMLing the serialization library.
Thanks,
David
-----Original Message-----
From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]] On
Behalf Of Robert Ramey
Sent: Monday, November 18, 2002 10:02 PM
To: 'boost_at_[hidden]'
Subject: FW: [boost] Serialization & XML (was Serialization Library
Review)
Is there a reason you sent this to me privately?
> From: David Abrahams <dave_at_[hidden]>
> I believe your assessment that some
> data structures can't be represented using XML is incorrect, and
> that's easy to prove. A serialization library which makes generation
> of XML output difficult is severely handicapped in the modern world.
Well, I have conceded that it was preliminary. All I know about XML is
from a small book containing a concise description of XML. My
skepticism is based on the following thought experiment:
Suppose one is given a list of polymorphic pointers, some of which
correspond to the bottom node of a diamond inheritance structure and
some of which are repeated in the list and serialized somewhere else as
well.
a) How would such a thing be represented in XML?
b) Could it be loaded back to create an equivalent structure?
c) Would it be useful for anything other than this serialization
system?
If someone can assure me that the answers to all three of the above
are yes then it should be possible - otherwise not.
Given that it's "easy to prove", these questions should be easy to
answer in a convincing way.
Robert Ramey
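Questions (a) and (b) of the quoted thought experiment can be explored
with the 'objectId' and 'ref'/'referee' attributes of the sample DTD
earlier in this message. The Python sketch below is only one possible
reading: the classes Shape and Circle, the REGISTRY table and the
save/load helpers are hypothetical, the 'type' attribute holds a plain
class name instead of an IDREF to a 'type' element, and diamond
inheritance is not covered at all.

```python
import xml.etree.ElementTree as ET


class Shape:
    def __init__(self, size):
        self.size = size


class Circle(Shape):
    pass


# Hypothetical lookup from the stored type name back to a class.
REGISTRY = {cls.__name__: cls for cls in (Shape, Circle)}


def save(items):
    """Write each object once; repeats become <ref referee="..."/>."""
    root = ET.Element("array", {"length": str(len(items))})
    written = {}  # id(obj) -> objectId already emitted
    for obj in items:
        if id(obj) in written:
            ET.SubElement(
                root, "ref", {"kind": "pointer", "referee": written[id(obj)]}
            )
            continue
        object_id = f"obj{len(written)}"
        written[id(obj)] = object_id
        node = ET.SubElement(
            root, "compound", {"objectId": object_id, "type": type(obj).__name__}
        )
        ET.SubElement(node, "atomic", {"type": "int", "value": str(obj.size)})
    return ET.tostring(root, encoding="unicode")


def load(xml_text):
    """Rebuild the list, resolving each <ref> to the object it names."""
    by_id = {}
    items = []
    for child in ET.fromstring(xml_text):
        if child.tag == "ref":
            items.append(by_id[child.get("referee")])
            continue
        cls = REGISTRY[child.get("type")]
        obj = cls(int(child[0].get("value")))
        by_id[child.get("objectId")] = obj
        items.append(obj)
    return items


shared = Circle(2)
original = [Shape(1), shared, shared]
restored = load(save(original))
print(restored[1] is restored[2], type(restored[1]).__name__)
```

The repeated pointer is written once and referenced once, so the loaded
list shares a single Circle again instead of holding two copies.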