Just an FYI: boost::serialization is not a SAX parser like libxml2, so it should be orders of magnitude faster. However, you will lose some flexibility if, for some reason, an error creeps into your output stream through human interaction. A SAX parser could move on and ignore the error, whereas boost::serialization is kind of all or nothing. That is, unless Matthias or Robert know something that I don't; they are the authors, while I'm just an avid user.

Cheers,
Tim

On Mon, Aug 4, 2008 at 10:31 AM, <Sebastian.Karlsson@mmorpgs.org> wrote:
1) The overview section never mentions performance as a goal, and for my use case it is very important. If I were to use the binary archive, how well would it perform compared to a hand-crafted, optimized serialization approach? I've seen in the examples that strings seem to be used to identify data. Won't this create a large overhead for both deserialization and storage?

Performance is a secondary goal that I have worked on, especially the
serialization of large dense arrays. This is now as fast as any hand-crafted
approach. What data structures are you interested in?
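To make "hand-crafted" concrete for large dense arrays: a hand-written binary format typically stores an element count followed by the array's contiguous bytes, copied in one bulk operation rather than one call per element. The sketch below is a hypothetical illustration only (write_dense and read_dense are made-up names, and this is not Boost.Serialization's internal implementation). It assumes the reader shares the writer's endianness and sizeof(double).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical hand-crafted format for a dense array of doubles:
// an 8-byte element count followed by the raw element bytes.
// Assumes writer and reader share endianness and sizeof(double).
std::string write_dense(const std::vector<double>& v) {
    std::uint64_t n = v.size();
    std::string out(sizeof n + n * sizeof(double), '\0');
    std::memcpy(&out[0], &n, sizeof n);  // length prefix
    if (n != 0)
        std::memcpy(&out[sizeof n], v.data(), n * sizeof(double));  // one bulk copy
    return out;
}

std::vector<double> read_dense(const std::string& in) {
    std::uint64_t n = 0;
    assert(in.size() >= sizeof n);
    std::memcpy(&n, in.data(), sizeof n);  // read the length prefix
    assert(in.size() == sizeof n + n * sizeof(double));
    std::vector<double> v(n);
    if (n != 0)
        std::memcpy(v.data(), in.data() + sizeof n, n * sizeof(double));  // one bulk copy back
    return v;
}
```

The cost of loading is one size read plus one memory copy, independent of the element values, which is why dense numeric arrays are the easiest case for a binary format to make fast.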

Matthias

I'm reading an XML file into a custom tree data structure, parsing the string representations into their correct types, which are stored as boost::any. I'm hoping that deserialization using boost::serialization will be considerably faster than using libxml2, which I currently use to parse the XML file. The node data in this structure looks pretty much like this:

struct DataCollection {
    std::vector< DataCollection > children; // All the children of this node
    std::string name; // The tag name in XML
    boost::any value; // The element's text content, e.g. "value" in <b>value</b>
    std::map< std::string, boost::any > attributes; // The attributes of the XML node
};

The values stored in boost::any will be fairly lightweight, so I would reckon that the majority of the data read will actually be std::string: the keys into the attributes as well as the node names. So I guess I'm dealing with a little bit of everything, hehe.
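Since most of the payload is std::string (tag names and attribute keys), it may help to see why a binary format loads strings cheaply: each string is stored as a length followed by its bytes, so reading it is one size read plus one copy, with no scanning for delimiters or escapes. The sketch below is hypothetical. It uses a simplified Node in which the value and attribute values are plain std::string (real boost::any values would each need a type-specific encoding), and it is a hand-written format, not what Boost.Serialization's binary archive actually emits.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified node: values are strings instead of boost::any.
struct Node {
    std::vector<Node> children;
    std::string name;
    std::string value;
    std::map<std::string, std::string> attributes;
};

// 32-bit counts in host byte order (assumes the same machine type on read).
void write_u32(std::ostream& os, std::uint32_t n) {
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
}

std::uint32_t read_u32(std::istream& is) {
    std::uint32_t n = 0;
    if (!is.read(reinterpret_cast<char*>(&n), sizeof n))
        throw std::runtime_error("truncated input");
    return n;
}

// Length-prefixed string: no delimiters to search for, no escaping.
void write_str(std::ostream& os, const std::string& s) {
    write_u32(os, static_cast<std::uint32_t>(s.size()));
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

std::string read_str(std::istream& is) {
    std::string s(read_u32(is), '\0');
    if (!s.empty() && !is.read(&s[0], static_cast<std::streamsize>(s.size())))
        throw std::runtime_error("truncated input");
    return s;
}

// Depth-first: node fields, then attribute pairs, then each child subtree.
void write_node(std::ostream& os, const Node& n) {
    write_str(os, n.name);
    write_str(os, n.value);
    write_u32(os, static_cast<std::uint32_t>(n.attributes.size()));
    for (const auto& kv : n.attributes) {
        write_str(os, kv.first);
        write_str(os, kv.second);
    }
    write_u32(os, static_cast<std::uint32_t>(n.children.size()));
    for (const Node& child : n.children)
        write_node(os, child);
}

Node read_node(std::istream& is) {
    Node n;
    n.name = read_str(is);
    n.value = read_str(is);
    std::uint32_t attr_count = read_u32(is);
    for (std::uint32_t i = 0; i < attr_count; ++i) {
        std::string key = read_str(is);
        n.attributes[key] = read_str(is);
    }
    std::uint32_t child_count = read_u32(is);
    n.children.reserve(child_count);
    for (std::uint32_t i = 0; i < child_count; ++i)
        n.children.push_back(read_node(is));
    return n;
}
```

Because every count precedes the data it describes, the reader never has to look ahead. That is the main structural difference from parsing XML text.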

Since I won't send this data over the network, and if I make a build for another system I can just ship different data files, I'm mostly interested in the speed and flexibility that boost::serialization offers. I'd be very interested in your changes, Matthias.

There are not many optimizations for XML files: most of the overhead is
in parsing the strings. If you are interested in performance, a binary
archive will always be faster than an XML one. Most of the
optimizations for binary archives are already in Boost 1.35.

I have a couple of questions:

1. Why are your attributes a std::map< std::string, boost::any > and
not a std::map< std::string, std::string >? How do you find out which
type to use?

2. Why is your value a boost::any? How do you know which type to use?

When I parse the XML file with libxml2, I have a list in which each type has registered a regex filter that is used to find the real type. Say you have, for example, <elem position="3 3 3">: that will match the vector3 filter and construct a boost::any holding that vector3. I have a pretty neat system running here where new types just need to register with FilterList. My DataCollection then has a Type& GetAttribute< Type >( const std::string& ), which basically wraps the any_cast and asserts that the typeids match. This way I get pretty decent type safety, and since the client knows what type to expect, it works out in the end.
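The regex-filter scheme described above could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual code from this thread: FilterList's methods, Vector3, RegisterDefaultFilters and the number pattern are illustrative names. It uses std::any (the C++17 standardization of boost::any, with the same any_cast semantics) so that it compiles without Boost. The first filter whose regex matches the whole attribute text constructs the typed value, and GetAttribute wraps any_cast and asserts that the stored type is the requested one.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical user type that registers a filter.
struct Vector3 { float x, y, z; };

// Each filter pairs a regex with a factory that builds the typed value.
class FilterList {
public:
    using Factory = std::function<std::any(const std::string&)>;

    void Register(const std::string& pattern, Factory make) {
        filters_.emplace_back(std::regex(pattern), std::move(make));
    }

    // Uses the first filter whose regex matches the whole string;
    // falls back to keeping the raw std::string.
    std::any Parse(const std::string& text) const {
        for (const auto& f : filters_)
            if (std::regex_match(text, f.first))
                return f.second(text);
        return text;
    }

private:
    std::vector<std::pair<std::regex, Factory>> filters_;
};

struct DataCollection {
    std::map<std::string, std::any> attributes;

    // Wraps any_cast; asserts the stored type matches the requested one.
    // (With NDEBUG, any_cast itself throws std::bad_any_cast on a mismatch.)
    template <typename Type>
    Type& GetAttribute(const std::string& key) {
        std::any& a = attributes.at(key);
        assert(a.type() == typeid(Type));
        return std::any_cast<Type&>(a);
    }
};

// Example filter for "x y z" triples such as position="3 3 3".
void RegisterDefaultFilters(FilterList& filters) {
    const std::string num = R"([-+]?[0-9]*\.?[0-9]+)";
    filters.Register(num + " " + num + " " + num, [](const std::string& s) {
        std::istringstream in(s);
        Vector3 v{};
        in >> v.x >> v.y >> v.z;
        return std::any(v);
    });
}
```

In this sketch the first matching regex wins, so registration order matters whenever two types' patterns overlap.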

I don't really know how boost::serialization works under the hood, but I was expecting a healthy speed-up because:
A) libxml2 needs to parse the string data to locate the start and end of each XML element, which I presume is pretty costly since it has to search through all of the string data.
B) libxml2 first parses the data into a string, which I then need to extract and match at runtime to construct the real type.
C) I'm hoping the binary archive will take up less space, resulting in less I/O. For example, I strip the XML formatting.

I'm also entertaining the thought of storing much more complex objects from my application, essentially using the binary archive as a cache. I haven't really explored that area much yet, though.

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users



--
Regards,
Timothy St. Clair
[timothysc@gmail.com]