Boost Users :

From: Robert Ramey (ramey_at_[hidden])
Date: 2006-12-09 17:10:28


Well, now I'm out of my depth. Some have commented that the spirit parser
is slower than other xml parsers. I don't know. I would have hoped that
since spirit does a lot of the heavy lifting at compile time, it would be
pretty fast. I haven't seen much data on this so I really don't know.
Any parser has to scan every character in the file so it's not clear to me
that a SAX parser or any other can be known a priori to be faster than any
other one.

My reasons for using spirit were

a) it was already part of boost
b) it was - after some learning curve - a good fit with what I wanted to do.
c) well documented.
d) customizable - serialization only uses a portion of the full xml so it
seemed the most efficient.
e) all done at compile time so it wouldn't include dead code.
f) portability to all compilers boost supports.
g) By exercising a little care in code organization I was able to arrange
things so that the module containing the parsing didn't depend on the rest
of the program. So the long compile time is not an issue. It is in the
library and is only recompiled when the grammar changes.

It is this last feature that suggests you can easily use it to run your
own actions while parsing the xml produced by the serialization library.

After some initial pain figuring out how to use it, I have to say I have
been extremely pleased with this application of spirit. I never wanted to
do xml serialization as I felt it was a pain in the neck and of relatively
little utility in my view. I had anticipated a maintenance nightmare as
more and more obscure corners of xml syntax were touched. I'm pleased to
say this thing has been fantastic as far as I'm concerned. After the initial
one time pain - I haven't had to touch it since 2002 - and this (through
spirit 1.6x - still available) is still compatible with Borland 5.51. And
all the hacks required to make this so portable are only compiled into the
platforms that need them.

This has been one of the most significant implementations in making the
serialization library possible. (the other one would probably be mpl).

So if this were my problem I would:

a) Include the xml grammar and parser from the serialization library - add
my own actions.
b) finish my code. I would expect this to be about 100 lines.
c) If it's too slow - and if profiling suggests that the spirit parser is the
bottleneck - then I would look at tweaking the grammar to speed up parsing
or replacing the spirit parser with a faster one. This is my rule: "First
make it work ASAP - then make it faster if necessary"
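
As a rough illustration of step c), here is a minimal timing sketch that uses
only the standard library. The names parse_stand_in and mean_microseconds are
hypothetical and are not part of spirit or the serialization library; the
stand-in merely counts '<' characters and would be replaced by a call into
whatever parsing step is actually under suspicion. A real profiler would give
a more detailed picture than this.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ratio>
#include <string>

// Hypothetical stand-in for the real parsing step (for example, a call into
// the spirit-based grammar). It only counts '<' characters in the input.
std::size_t parse_stand_in(const std::string& xml) {
    std::size_t tags = 0;
    for (char c : xml) {
        if (c == '<') ++tags;
    }
    return tags;
}

// Runs `step` `repeats` times and returns the mean wall-clock time per run,
// in microseconds, measured with a monotonic clock.
template <typename Step>
double mean_microseconds(Step step, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        step();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        clock::now() - start;
    return elapsed.count() / repeats;
}
```

One way to use it is to wrap the parsing call in a lambda, for example
mean_microseconds([&] { sink += parse_stand_in(xml); }, 100), and compare the
result against the timing of the rest of the program before deciding whether
the parser is worth replacing.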

But I already am somewhat familiar with spirit so it might not be an
interesting option for you. But then you might be able to use the current
parser unchanged. Of course this would bring the huge benefit that if the
xml_archive parser is tweaked for some reason (there are a couple of issues
with special characters), you would automatically inherit these changes and
still be in sync.

I made the choice to invest the effort to figure out spirit rather than
write my 10,000th file parser. Of course that was my decision and may not be
everyone's preference.

Good Luck

Terence Wilson wrote:
> Robert,
>
> The utility I am writing needs to be able to extract a small portion
> from a large XML file generated by your library. Since it is
> performance sensitive I chose to use a SAX parser in order to avoid
> reading the whole file. Would it be much work to do this with the
> Spirit parser?
>
> As always, thanks for the super-fast response.
>
> Best regards,

