
Boost Users :

From: Tim St. Clair (timothysc_at_[hidden])
Date: 2008-08-04 11:42:55


Just an FYI, boost::serialization is not a SAX parser like libxml2, so it
will be orders of magnitude faster. However, you will lose some flexibility
if for some reason there is an error in your output stream due to human
interaction. A SAX parser could move on and ignore the error, whereas it's
kind of all or nothing with boost::serialization. Unless Matthias or Robert
know something that I don't, as they are the authors, while I'm just an avid
user.

Cheers,
Tim

On Mon, Aug 4, 2008 at 10:31 AM, <Sebastian.Karlsson_at_[hidden]> wrote:

>>>>> 1) In the overview section performance is nowhere to be seen as a goal,
>>>>> which for my use case is very important. If I were to use the binary
>>>>> archive, how well would it perform in comparison to a hand crafted
>>>>> optimized serialization approach? I've seen in the examples that strings
>>>>> seem to be used to identify data, won't this create a large overhead for
>>>>> both deserialization and storage?
>>>>>
>>>>
>>>> Performance is a secondary goal that I have worked on, especially the
>>>> serialization of large dense arrays. This is now as fast as any hand
>>>> crafted approach. What data structures are you interested in?
>>>>
>>>> Matthias
>>>>
>>>
>>> I'm reading an XML file into a custom tree data structure, parsing the
>>> string representations into their correct types stored as boost::any. I'm
>>> hoping that deserialization using boost::serialization will be considerably
>>> faster than using libxml2, which I use to parse the XML file. The node data
>>> in this structure pretty much looks like:
>>>
>>> vector< DataCollection > children; // Naturally all the children of this node
>>> std::string name; // This is the tag name in xml
>>> boost::any value; // This is <b>value</b> in xml
>>> std::map< std::string, boost::any > attributes; // Not entirely surprising, the attributes of the xml node
>>>
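
For comparison, here is a minimal sketch of what a hand-crafted binary format
for a tree like this could look like. It is not how boost::serialization
stores data. Node, save, load and round_trip are hypothetical names, the
attributes are simplified to std::map< std::string, std::string > with no
boost::any values, and the counts are written in host byte order, so the
bytes are only readable on the same kind of machine.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified node: string-valued attributes, no boost::any.
struct Node {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<Node> children;
};

// Write a 32-bit count in host byte order (not portable across machines).
inline void write_u32(std::ostream& out, std::uint32_t n) {
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
}

inline std::uint32_t read_u32(std::istream& in) {
    std::uint32_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    return n;
}

// A string is stored as its length followed by its raw bytes,
// so the reader never has to search for a delimiter.
inline void write_string(std::ostream& out, const std::string& s) {
    write_u32(out, static_cast<std::uint32_t>(s.size()));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

inline std::string read_string(std::istream& in) {
    std::string s(read_u32(in), '\0');
    in.read(&s[0], static_cast<std::streamsize>(s.size()));
    return s;
}

// Depth-first: name, attribute count and pairs, child count and children.
inline void save(std::ostream& out, const Node& node) {
    write_string(out, node.name);
    write_u32(out, static_cast<std::uint32_t>(node.attributes.size()));
    for (const auto& kv : node.attributes) {
        write_string(out, kv.first);
        write_string(out, kv.second);
    }
    write_u32(out, static_cast<std::uint32_t>(node.children.size()));
    for (const Node& child : node.children) save(out, child);
}

// Reads fields back in exactly the order save() wrote them.
inline Node load(std::istream& in) {
    Node node;
    node.name = read_string(in);
    const std::uint32_t attribute_count = read_u32(in);
    for (std::uint32_t i = 0; i < attribute_count; ++i) {
        std::string key = read_string(in);
        std::string value = read_string(in);
        node.attributes.emplace(std::move(key), std::move(value));
    }
    const std::uint32_t child_count = read_u32(in);
    for (std::uint32_t i = 0; i < child_count; ++i) node.children.push_back(load(in));
    assert(!in.fail());  // a truncated or corrupted stream is not recoverable here
    return node;
}

// Convenience for trying it out: save to memory, then load again.
inline Node round_trip(const Node& node) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    save(buffer, node);
    return load(buffer);
}
```

Like the archive it imitates, this reader is all or nothing: one wrong
length field and everything after it is misread.
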
>>> The values stored in boost::any will be fairly lightweight, so I would
>>> reckon that the majority of data read will actually be std::string for keys
>>> into the attributes as well as the name of the node. So I guess I'm having
>>> a little bit of everything hehe.
>>>
>>> Since I won't send this data over the network, and if I make a build for
>>> another system I can just ship different data files, I'm more interested in
>>> speed and the flexibility which boost::serialization offers. I'd be very
>>> interested in your changes Matthias.
>>>
>>
>> There are not many optimizations for XML files: most of the overhead is
>> in parsing the strings. If you are interested in performance, a binary
>> archive will always be faster than an XML one. Most of the
>> optimizations for binary archives are already in Boost 1.35.
>>
>> I have a couple of questions:
>>
>> 1. why are your attributes a std::map< std::string, boost::any > and
>> not a std::map< std::string, std::string > ? How do you find out which
>> type to use?
>>
>> 2. why is your value a boost::any? How do you know the type to use?
>>
>
> When I parse the XML file with libxml2 I have a list where the different
> types have registered a regex filter which it will use to find the real
> type. Let's say you have for example <elem position="3 3 3">, then that will
> match the vector3 filter and construct a boost::any holding that vector3. I
> have a pretty neat system running here where I just need new types to
> register at FilterList. My DataCollection then has a Type& GetAttribute<
> Type >( const std::string& ), which basically wraps the any_cast and asserts
> that the typeids match. This way I get pretty decent type safety, and
> since the client knows what type to expect it works out in the end.
>
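
To make sure I follow the scheme, here is a rough sketch of how I picture it,
assuming C++17 and using std::any and std::regex in place of boost::any and
whatever regex library is really in use. FilterList is the name from the
message above; its interface here (Register, Parse), the Vector3 struct,
MakeDefaultFilters and the free function Get are all hypothetical.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>

struct Vector3 { float x, y, z; };  // hypothetical stand-in for the vector3 type

// Hypothetical FilterList: each entry pairs a pattern with a converter
// that builds the typed value from the regex match.
class FilterList {
public:
    using Converter = std::function<std::any(const std::smatch&)>;

    void Register(const std::string& pattern, Converter convert) {
        filters_.emplace_back(std::regex(pattern), std::move(convert));
    }

    // Tries the filters in registration order; the whole text must match.
    // Text that matches no filter is kept as a plain std::string.
    std::any Parse(const std::string& text) const {
        std::smatch match;
        for (const auto& filter : filters_) {
            if (std::regex_match(text, match, filter.first)) return filter.second(match);
        }
        return std::any(text);
    }

private:
    std::vector<std::pair<std::regex, Converter>> filters_;
};

// Registers one filter: three space-separated numbers become a Vector3.
inline FilterList MakeDefaultFilters() {
    FilterList filters;
    const std::string number = R"(([-+]?\d+(?:\.\d+)?))";
    filters.Register(number + " " + number + " " + number, [](const std::smatch& m) {
        return std::any(Vector3{std::stof(m[1].str()), std::stof(m[2].str()),
                                std::stof(m[3].str())});
    });
    return filters;
}

// Typed access in the spirit of GetAttribute< Type >: check the stored
// type first, then cast.
template <typename Type>
Type& Get(std::any& value) {
    assert(value.type() == typeid(Type));
    return std::any_cast<Type&>(value);
}
```

With this, Parse("3 3 3") yields an any holding a Vector3, and a string that
matches no filter stays a std::string. Every attribute pays for a regex match
at load time, which is the cost a binary archive would avoid.
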
> I don't really know how boost::serialization works under the hood, but I was
> expecting to get a healthy speedup due to:
> A) libxml2 needs to parse the string data, locating start/end of xml
> elements, which I'm presuming is pretty costly in searching through all the
> string data.
> B) When I use libxml2 it first parses data into a string, which I then need
> to extract and match at runtime to construct the real type.
> C) I'm hoping the binary archive will take up less memory, resulting in
> less I/O. I strip the xml formatting for example.
>
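
Points A to C can be seen in miniature with a single value. This is only a
sketch: Vector3, ParseText, ToBytes and FromBytes are hypothetical names, and
the raw byte copy is only valid when reader and writer use the same compiler
and machine. It is not the on-disk layout of a Boost binary archive.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

struct Vector3 { float x, y, z; };  // hypothetical stand-in for the vector3 type

// Text path: the characters have to be tokenised and converted
// into floats every time the file is loaded.
inline Vector3 ParseText(const std::string& text) {
    std::istringstream in(text);
    Vector3 v{};
    in >> v.x >> v.y >> v.z;
    assert(!in.fail());
    return v;
}

// Binary path: the in-memory representation is copied out as raw bytes.
inline std::string ToBytes(const Vector3& v) {
    std::string bytes(sizeof v, '\0');
    std::memcpy(&bytes[0], &v, sizeof v);
    return bytes;
}

// Loading is a size check plus a copy, with no parsing at all.
inline Vector3 FromBytes(const std::string& bytes) {
    assert(bytes.size() == sizeof(Vector3));
    Vector3 v{};
    std::memcpy(&v, bytes.data(), sizeof v);
    return v;
}
```

One caveat on C: on a typical platform where float is four bytes, this vector
takes 12 bytes in binary while the attribute text "3 3 3" is 5 characters, so
the size win depends on how much markup and how many long numbers the XML
file carries.
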
> I'm also entertaining the thought of having much more complex objects stored
> from my application, kind of using the binary archive as a cache. I haven't
> really explored that area all that much yet though.
>

-- 
Regards,
Timothy St. Clair
[timothysc_at_[hidden]]

