
Boost Users :

From: Terence Wilson (tez_at_[hidden])
Date: 2006-12-09 17:55:15


Robert,

XML is normally parsed using a DOM or SAX parser. DOM reads the whole file
into memory; SAX streams through the file and reports what it finds through
callbacks to the client application, much like a recursive descent parser.
By placing the data block at the start of the file I should be able to get
good performance from SAX or Spirit, since parsing can stop once that block
has been read. Both would be good choices; however, I want to write some
'reference' code using standard tools since my work will be part of an SDK.

Regards,

Terence

> -----Original Message-----
> From: boost-users-bounces_at_[hidden] [mailto:boost-users-
> bounces_at_[hidden]] On Behalf Of Robert Ramey
> Sent: Saturday, December 09, 2006 2:10 PM
> To: boost-users_at_[hidden]
> Subject: Re: [Boost-users] Boost Serialization make_binary & XML/ASCII
>
> Well, now I'm out of my depth. Some have commented that the spirit parser
> is slower than other xml parsers. I don't know. I would have hoped that
> since spirit does a lot of the heavy lifting at compile time, it would be
> pretty fast. I haven't seen too much data on this so I really don't know.
> Any parser has to scan every character in the file so it's not clear to me
> that a SAX parser or any other can be known a priori to be faster than any
> other one.
>
> My reason for using spirit was
>
> a) it was already part of boost
> b) it was - after some learning curve - a good fit with what I wanted to
> do.
> c) well documented.
> d) customizable - serialization only uses a portion of the full xml so it
> seemed the most efficient.
> e) all done at compile time so it wouldn't include dead code.
> f) portability to all compilers boost supports.
> g) By exercising a little care in code organization I was able to arrange
> things so that the module containing the parsing didn't depend on the rest
> of the program. So the long compile time is not an issue. It is in the
> library and is only recompiled when the grammar changes.
>
> It is the last feature that suggests that you can easily use this to do
> your own actions when parsing the xml produced by the serialization
> library.
>
> After some initial pain figuring out how to use it, I have to say I have
> been extremely pleased with this application of spirit. I never wanted to
> do xml serialization as I felt it was a pain in the neck and of relatively
> little utility in my view. I had anticipated a maintenance nightmare as
> more and more obscure corners of xml syntax were touched. I'm pleased to
> say this thing has been fantastic as far as I'm concerned. After the
> initial one-time pain, I haven't had to touch it since 2002, and this
> (through spirit 1.6x, still available) is still compatible with Borland
> 5.51. And all the hacks required to make this so portable are only
> compiled into the platforms that need them.
>
> This has been one of the most significant implementations in making the
> serialization library possible. (the other one would probably be mpl).
>
> So if this were my problem I would:
>
> a) Include the xml grammar and parser from the serialization library - add
> my own actions.
> b) finish my code. Really, I would expect this to be about 100 lines.
> c) If it's too slow - and if profiling suggests that the spirit parser is
> the bottleneck - then I would look at tweaking the grammar to speed up
> parsing or replacing the spirit parser with a faster one. This is my
> rule: "First make it work ASAP - then make it faster if necessary"
>
> But I am already somewhat familiar with spirit, so it might not be as
> interesting an option for you. But then you might be able to use the
> current parser unchanged. Of course this would bring the huge benefit
> that if the xml_archive parser is tweaked for some reason (there are a
> couple of issues with special characters), you would automatically
> inherit these changes and still be in sync.
>
> I made the choice to invest the effort to figure out spirit rather than
> write my 10,000th file parser. Of course that was my decision and may not
> be everyone's preference.
>
>
> Good Luck
>
> Terence Wilson wrote:
> > Robert,
> >
> > The utility I am writing needs to be able to extract a small portion
> > from a large XML file generated by your library. Since it is
> > performance sensitive I chose to use a SAX parser in order to avoid
> > reading the whole file. Would it be much work to do this with the
> > Spirit parser?
> >
> > As always, thanks for the super-fast response.
> >
> > Best regards,

