Boost Users :

From: Sebastian.Karlsson_at_[hidden]
Date: 2008-08-04 11:31:19


>>>> 1) In the overview section performance is nowhere to be seen as a
>>>> goal, which for my use case is very important. If I were to use
>>>> the binary archive, how well would it perform in comparison to
>>>> a hand-crafted, optimized serialization approach? I've seen in
>>>> the examples that strings seem to be used to identify data;
>>>> won't this create a large overhead for both deserialization and
>>>> storage?
>>>
>>> Performance is a secondary goal that I have worked on, especially the
>>> serialization of large dense arrays. This is now as fast as any hand
>>> crafted approach. What data structures are you interested in?
>>>
>>> Matthias
>>
>> I'm reading an XML file into a custom tree data structure, parsing
>> the string representations into their correct types, stored as
>> boost::any. I'm hoping that deserialization using boost::serialize
>> will be considerably faster than using libxml2, which I use to parse
>> the XML file. The node data in this structure pretty much looks
>> like:
>>
>> // Naturally, all the children of this node
>> vector< DataCollection > children;
>> // This is the tag name in XML
>> std::string name;
>> // This is <b>value</b> in XML
>> boost::any value;
>> // Not entirely surprisingly, the attributes of the XML node
>> std::map< std::string, boost::any > attributes;
>>
>> The values stored in boost::any will be fairly lightweight, so I
>> would reckon that the majority of data read will actually be
>> std::string, for keys into the attributes as well as the name of the
>> node. So I guess I'm having a little bit of everything, hehe.
>>
>> Since I won't send this data over the network, and if I make a build
>> for another system I can just ship different data files, I'm more
>> interested in speed and the flexibility which boost::serialization
>> offers. I'd be very interested in your changes, Matthias.
>
> There are not many optimizations for XML files: most of the overhead is
> in parsing the strings. If you are interested in performance, a binary
> archive will always be faster than an XML one. Most of the
> optimizations for binary archives are already in Boost 1.35.
>
> I have a couple of questions:
>
> 1. why are your attributes a std::map< std::string, boost::any > and
> not a std::map< std::string, std::string > ? How do you find out which
> type to use?
>
> 2. why is your value a boost::any? How do you know the type to use?

When I parse the XML file with libxml2 I have a list where the
different types have registered a regex filter, which it uses to
find the real type. Let's say you have for example <elem position="3 3
3">; that will match the vector3 filter and construct a
boost::any holding that vector3. I have a pretty neat system running
here where new types just need to register at FilterList. My
DataCollection then has a Type& GetAttribute< Type >( const
std::string& ), which basically wraps the any_cast and asserts that
the typeids match. This way I get pretty decent type safety, and
since the client knows what type to expect, it works out in the end.
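
To make that concrete, here is a rough sketch of the idea, not my actual code. It uses std::any and std::regex (C++17) as stand-ins for boost::any and whatever regex library you prefer, so that it compiles without Boost. Vector3, Register, Parse, SetAttribute and MakeExampleFilters are names made up for this illustration; only FilterList, DataCollection and GetAttribute come from the description above, and their real interfaces may well differ.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-in for the vector3 type mentioned above.
struct Vector3 { double x, y, z; };

// One registered filter: a regex plus a function that builds the typed value.
struct Filter {
    std::regex pattern;
    std::function<std::any(const std::string&)> make;
};

// Sketch of a FilterList: the first filter whose regex matches the whole
// string wins; anything unmatched is kept as a plain std::string.
class FilterList {
public:
    void Register(const std::string& pattern,
                  std::function<std::any(const std::string&)> make) {
        filters_.push_back(Filter{std::regex(pattern), std::move(make)});
    }

    std::any Parse(const std::string& text) const {
        for (const Filter& f : filters_) {
            if (std::regex_match(text, f.pattern)) return f.make(text);
        }
        return std::any(text);
    }

private:
    std::vector<Filter> filters_;
};

class DataCollection {
public:
    void SetAttribute(const std::string& key, std::any value) {
        attributes_[key] = std::move(value);
    }

    // Wraps any_cast and asserts that the stored type is the requested one.
    // at() throws std::out_of_range for an unknown key; the reference cast
    // throws std::bad_any_cast on a mismatch when asserts are compiled out.
    template <typename Type>
    Type& GetAttribute(const std::string& key) {
        std::any& held = attributes_.at(key);
        assert(held.type() == typeid(Type));
        return std::any_cast<Type&>(held);
    }

private:
    std::map<std::string, std::any> attributes_;
};

// Builds a FilterList with one example filter: three whitespace-separated
// numbers are turned into a Vector3.
inline FilterList MakeExampleFilters() {
    FilterList filters;
    filters.Register(R"(\s*-?\d+(\.\d+)?(\s+-?\d+(\.\d+)?){2}\s*)",
                     [](const std::string& s) {
                         std::istringstream in(s);
                         Vector3 v{};
                         in >> v.x >> v.y >> v.z;
                         return std::any(v);
                     });
    return filters;
}
```

A caller would store filters.Parse("3 3 3") under "position" and later ask for GetAttribute< Vector3 >( "position" ). The regex match and the string-to-number conversion in this sketch are the per-attribute work that happens on every load.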

I don't really know how boost::serialize works under the hood, but I
was expecting to get a healthy speed-up due to:
A) libxml2 needs to parse the string data, locating the start/end of XML
elements, which I presume is pretty costly since it has to search
through all the string data.
B) When I use libxml2 it first parses data into a string, which I then
need to extract and match at runtime to construct the real type.
C) I'm hoping the binary archive will take up less space, resulting
in less I/O. I strip the XML formatting, for example.
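
To illustrate what I mean by A) and C), here is a hand-rolled sketch of a length-prefixed binary layout. This is explicitly not the format Boost.Serialization's binary archive uses, which I don't know; SimpleNode, WriteString, ReadString, WriteNode, ReadNode and RoundTrip are names invented for the example, and the node is simplified to a name plus string attributes. The point is only that the reader knows each string's length up front, so it never scans for delimiters.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical simplified node: just a tag name and string attributes.
struct SimpleNode {
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Writes a 32-bit length followed by the raw bytes.
// Native byte order is used, so the data is not portable between systems.
void WriteString(std::ostream& out, const std::string& s) {
    const std::uint32_t size = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads the length first, then exactly that many bytes: no searching for
// start/end markers is needed.
std::string ReadString(std::istream& in) {
    std::uint32_t size = 0;
    in.read(reinterpret_cast<char*>(&size), sizeof size);
    if (!in) throw std::runtime_error("truncated input");
    std::string s(size, '\0');
    in.read(&s[0], static_cast<std::streamsize>(size));
    if (!in) throw std::runtime_error("truncated input");
    return s;
}

void WriteNode(std::ostream& out, const SimpleNode& node) {
    WriteString(out, node.name);
    const std::uint32_t count =
        static_cast<std::uint32_t>(node.attributes.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const auto& kv : node.attributes) {
        WriteString(out, kv.first);
        WriteString(out, kv.second);
    }
}

SimpleNode ReadNode(std::istream& in) {
    SimpleNode node;
    node.name = ReadString(in);
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    if (!in) throw std::runtime_error("truncated input");
    for (std::uint32_t i = 0; i < count; ++i) {
        std::string key = ReadString(in);
        node.attributes[key] = ReadString(in);
    }
    return node;
}

// Writes a node to an in-memory binary stream and reads it back.
SimpleNode RoundTrip(const SimpleNode& node) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    WriteNode(buffer, node);
    return ReadNode(buffer);
}
```

Whether something like this actually ends up smaller or faster than the XML route for my data is something I would have to measure; each string costs four extra bytes for its length here, while the tags and quoting of XML go away.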

I'm also entertaining the thought of having much more complex objects
stored from my application, kind of using the binary archive as a
cache. I haven't really explored that area all that much yet though.

