davidb_at_[hidden]

---
<!-- A macro for all possible data values -->
<!ENTITY % value "atomic | compound | ref | array" >
<!--
The primitive data, which is (often) either signed or unsigned, and
normal or long precision
'objectId' is only necessary if referenced. See 'ref'...
-->
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic
	objectId ID #IMPLIED
	signed (true|false) #IMPLIED
	long (true|false) #IMPLIED
	type (char|short|int|long|float|double) #IMPLIED
	value CDATA #IMPLIED
>
<!--
The (short) type description of compound data, which is either a class
or a struct declaration.
This description could be extended to incorporate behaviour and
layout...
Note that this type description includes meta type descriptions, i.e.,
templates...
Also, 'instantiates' refers to the template instantiated in creating
this type (i.e., a 'typeId' of a 'type'), in which case 'name' is not
required.
One could also divide the 'type' element into an 'actual_type' and
'template' to distinguish these two meta levels in C++...
The 'instantiationParams' is a very ad-hoc way to provide instantiation
information...
-->
<!ELEMENT type EMPTY>
<!ATTLIST type
	kind (struct | class | template | instance) "struct"
	typeId ID #IMPLIED
	instantiates IDREF #IMPLIED
	instantiationParams CDATA #IMPLIED
	namespace CDATA ""
	name CDATA #IMPLIED
>
<!--
The actual compound data, referring to the aforementioned type
descriptions.
'type' refers to a 'type' element.
'objectId' is only necessary if referenced (in contrast to pure embedded
compounds). Note that this will simply be a document-wide unique ID in
most cases...
-->
<!ELEMENT compound (%value;)*>
<!ATTLIST compound
	objectId ID #IMPLIED
	type IDREF #REQUIRED
>
<!--
The other kind of composition is, obviously, arrays.
The polymorphism in this element definition w.r.t. the actual items will
not be used by the C++ runtime...
-->
<!ELEMENT array (%value;)*>
<!ATTLIST array
	objectId ID #IMPLIED
	length NMTOKEN #IMPLIED
>
<!--
Ok, we might need references (including pointers) to data.
Note that this assumes that the reference is actually referring to a
valid object or value, and not some arbitrary address, which is
obviously not self-evident in the C++ case (this is what my Java alter
ego does not have to deal with...)
Even a reference has an optional ID, in case it is itself referred to
(known as a "handle" chain).
'referee' could be an 'objectId' of an 'atomic', 'compound', 'array', or
a 'ref'.
-->
<!ELEMENT ref EMPTY>
<!-- 'objectId' is the ID of this reference, not the referee !! -->
<!ATTLIST ref
	objectId ID #IMPLIED
	kind (pointer | reference) "pointer"
	referee IDREF #REQUIRED
>
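To make the 'objectId' / 'referee' mechanism concrete, here is a minimal
C++ sketch of a writer that emits each object once as a 'compound' and
every later occurrence of the same object as a 'ref'. Everything in it is
my own assumption for illustration (the names Shape, Circle, GraphWriter
and write_list, the "o1", "o2" ID scheme, and the choice to write the
first occurrence inline); it is not how the serialization library does it.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical polymorphic base class standing in for user types.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string type_id() const = 0;  // 'typeId' of a 'type' element
    virtual int value() const = 0;            // the single member we serialize
};

struct Circle : Shape {
    int r;
    explicit Circle(int radius) : r(radius) {}
    std::string type_id() const override { return "t_Circle"; }
    int value() const override { return r; }
};

// Writes each object once; repeated pointers become 'ref' elements.
class GraphWriter {
public:
    void write(const Shape* p) {
        // The DTD assumes every reference has a valid target.
        assert(p != nullptr);
        // dynamic_cast to void* yields the address of the most derived
        // object, so two different base-class pointers into the same
        // (e.g. diamond-shaped) object map to the same key.
        const void* key = dynamic_cast<const void*>(p);
        auto it = ids_.find(key);
        if (it != ids_.end()) {
            out_ << "<ref kind=\"pointer\" referee=\"" << it->second << "\"/>\n";
            return;
        }
        const std::string id = "o" + std::to_string(ids_.size() + 1);
        ids_[key] = id;
        out_ << "<compound objectId=\"" << id << "\" type=\"" << p->type_id()
             << "\"><atomic type=\"int\" value=\"" << p->value()
             << "\"/></compound>\n";
    }
    std::string str() const { return out_.str(); }

private:
    std::map<const void*, std::string> ids_;  // object address -> objectId
    std::ostringstream out_;
};

// Serializes a list of (possibly repeated) pointers.
std::string write_list(const std::vector<const Shape*>& items) {
    GraphWriter w;
    for (const Shape* p : items) w.write(p);
    return w.str();
}
```

A reader would do the reverse: build a map from 'objectId' to the
reconstructed object while loading, and resolve each 'referee' through
that map.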
------------------------------------------------------------------------
In the spirit of environment independence, one would also need to
represent the metadata for the behavior and exact layout of the structs
and classes used (to detail the 'type' elements in the DTD above), so
that a dynamic environment can replicate that meta structure as well...

Alternative 2:
This would just be XMLish in the superficial sense, since the only
compatible reader would be your specific unmarshaller. Still, it would
reap the benefits of (1) being able to state "XML" in the product
description (thereby raising the probability of acceptance in certain
enterprise environments) and (2) letting the marshalled C++ objects (and
values) pass through certain firewalls.
One can embed binary (marshalled) data, once it is encoded as text, in
an XML document by simply using a CDATA section such as:
<?xml version="1.0" ?>
<!DOCTYPE object_graph SYSTEM "boost_serial.dtd">
<object_graph>
<![CDATA[
... some binary representation, looks kind of like /A9kjQjA778AkkkQQQ/
]]>
</object_graph>
The encoded representation inside the CDATA section must consist of
ISO/IEC 10646 characters, and the document should use the UTF-8 or
UTF-16 encoding. I strongly recommend using UTF-8 here!
There are several schemes for encoding binary data as ISO/IEC 10646
characters, such as Base64.
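As a rough illustration of this alternative, the following C++ sketch
Base64-encodes a byte buffer and wraps the result in the 'object_graph'
document shown earlier. The function names base64_encode and
wrap_object_graph are hypothetical, and the check for "]]>" is my own
addition: a CDATA section ends at the first "]]>", so a payload must
never contain that sequence (Base64 text cannot, but other encodings
might).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encodes bytes with the standard Base64 alphabet (RFC 4648).
// The output is plain ASCII, hence also valid UTF-8.
std::string base64_encode(const std::vector<std::uint8_t>& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(4 * ((in.size() + 2) / 3));
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups
        const std::uint32_t n = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    if (i < in.size()) {  // one or two bytes left: pad with '='
        std::uint32_t n = in[i] << 16;
        const bool two = (i + 1 < in.size());
        if (two) n |= in[i + 1] << 8;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += two ? kAlphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    // Every started 3-byte group yields exactly 4 output characters.
    assert(out.size() == 4 * ((in.size() + 2) / 3));
    return out;
}

// Wraps already-encoded text in the 'object_graph' document from above.
std::string wrap_object_graph(const std::string& encoded) {
    if (encoded.find("]]>") != std::string::npos)
        throw std::invalid_argument("payload contains the CDATA terminator");
    return "<?xml version=\"1.0\" ?>\n"
           "<!DOCTYPE object_graph SYSTEM \"boost_serial.dtd\">\n"
           "<object_graph>\n<![CDATA[\n" + encoded +
           "\n]]>\n</object_graph>\n";
}
```

A caller would write something like
wrap_object_graph(base64_encode(bytes)), where 'bytes' is whatever the
existing binary marshaller produced.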
One could additionally add a MIME type as an attribute to 'object_graph'
to describe the particular binary encoding scheme used. Also, there is
room here for encryption and/or compression...
There are variants that keep the binary part outside the XML document, as
an attachment to XMTP or SOAP, such as:
<SOAP-ENV:Envelope>
  <SOAP-ENV:Body>
    <boost:object_graph
      href="http://repository.company.org/files/state_04.bin" />
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
One could be even more experimental and use Microsoft's DIME format...
I hope this helped a bit, and I could definitely give more info on
XMLing the serialization library.
Thanks,
David 
-----Original Message-----
From: boost-bounces_at_[hidden]
[mailto:boost-bounces_at_[hidden]] On Behalf Of Robert Ramey
Sent: Monday, November 18, 2002 10:02 PM
To: 'boost_at_[hidden]'
Subject: FW: [boost] Serialization & XML (was Serialization Library
Review)
Is there a reason you sent this to me privately?
> From: David Abrahams <dave_at_[hidden]>
>I believe your assessment that some
>data structures can't be represented using XML is incorrect, and
>that's easy to prove. A serialization library which makes generation
>of XML output difficult is severely handicapped in the modern world.
Well, I have conceded that it was preliminary.  All I know about XML
is from a small book containing a concise description of XML.
My skepticism is based on the following thought experiment:
Suppose one is given a list of polymorphic pointers, some of which
correspond to the bottom node of a diamond inheritance structure
and some of which are repeated in the list and serialized
somewhere else as well.
a) How would such a thing be represented in XML?
b) Could it be loaded back to create an equivalent structure?
c) Would it be useful for anything other than this serialization system?
If someone can assure me that the answers to all three of the above
are yes then it should be possible - otherwise not.  Given that it's
"easy to prove" these questions should be easy to answer in
a convincing way.
 Robert Ramey