
Boost Users :

Subject: Re: [Boost-users] Fast XML Parser
From: Mike Marchywka (marchywka_at_[hidden])
Date: 2008-12-14 18:52:24


 
> Thanks for responding. I've never used XML before and have been itching to
> learn XML lately.

http://www.w3.org/TR/REC-xml/#NT-prolog
 
If you are really after speed, you might want to try writing your
own code generator, working from something as simple as the spec document.
It turns out you can grep and sed the spec quite well and get a decent skeleton.
There are of course plenty of existing code generators, and I'm hoping someone
with experience will comment.
I ended up with code suited to my immediate needs, with each state having its
own method. I had to fill in most of the bodies by hand, but the code
was pretty simple for what I needed.
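
To make the skeleton idea concrete, here is a rough sketch of such a generator in C++ rather than grep and sed. It is not the script I actually used. It assumes the productions have already been extracted one per line in the form "[22] prolog ::= ...". The names emit_skeleton and generate are made up for illustration, and only STATESIG, parse_api_type and the ds->enter/ds->exit calls come from the fragment in this post. Lettered productions such as [28a] are skipped by this simple pattern.

```cpp
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

// Turn one grammar line such as
//   [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
// into a commented function skeleton. The emitted layout is hypothetical;
// it only mirrors the shape of the hand-written state methods.
std::string emit_skeleton(const std::string& line)
{
    static const std::regex production(R"(^\s*\[(\d+)\]\s+(\w+)\s+::=\s+(.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, production))
        return "";  // not a production line, so emit nothing

    std::ostringstream out;
    out << "//[" << m[1] << "] " << m[2] << " ::= " << m[3] << "\n"
        << "parse_api_type state_" << m[2] << "(STATESIG)\n"
        << "{\n"
        << " ds->enter(" << m[1] << ");\n"
        << " // TODO: fill in the body by hand\n"
        << " ds->exit(" << m[1] << ");\n"
        << "}\n";
    return out.str();
}

// Read grammar lines from `in` and write one skeleton per production to `out`.
void generate(std::istream& in, std::ostream& out)
{
    std::string line;
    while (std::getline(in, line))
        out << emit_skeleton(line);
}
```

Calling generate(std::cin, std::cout) on a file of production lines would print one stub per production, leaving the bodies to be written by hand as described above.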
 
I ended up with a bunch of stuff like this, which presumably would inline fairly
well. I also created maps for the char classes etc., but you get the
idea.
 
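//[20] CData ::= (Char* - (Char* ']]>' Char*))
parse_api_type state_CData(STATESIG) // body not shown
//[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
parse_api_type state_prolog(STATESIG)
{
 ds->enter(22);
  state_XMLDecl(ps,ds); // XMLDecl? is optional, so the result is ignored //return false;
  while (state_Misc(ps,ds)); // Misc*
  // (doctypedecl Misc*)? matches at most once, hence "if" rather than "while"
  if (state_doctypedecl(ps,ds)) while (state_Misc(ps,ds));
 ds->exit(22);
 return true; // assumed: parse_api_type is bool-like, as the loop conditions suggest
}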
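
For anyone who wants something that compiles, here is a minimal sketch of the same pattern with a char class map, using two of the simpler productions. Everything in it is an assumption made for illustration: ParseState and Handler are hypothetical stand-ins for whatever STATESIG expands to, the state functions return plain bool, and the table covers only the ASCII subset of the XML character classes.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the ps/ds arguments: ps holds the input and a
// cursor, ds receives enter/exit notifications for each production.
struct ParseState {
    std::string text;
    std::size_t pos = 0;
};

struct Handler {
    int depth = 0;
    void enter(int /*production*/) { ++depth; }
    void exit(int /*production*/) { --depth; }
};

enum CharClass : unsigned char { kSpace = 1, kNameStart = 2, kNameChar = 4 };

// Build the 256-entry class map (ASCII subset only in this sketch).
std::array<unsigned char, 256> make_char_classes()
{
    std::array<unsigned char, 256> t{};
    for (char c : std::string(" \t\r\n"))
        t[static_cast<unsigned char>(c)] |= kSpace;
    for (int c = 'a'; c <= 'z'; ++c) t[c] |= kNameStart | kNameChar;
    for (int c = 'A'; c <= 'Z'; ++c) t[c] |= kNameStart | kNameChar;
    for (int c = '0'; c <= '9'; ++c) t[c] |= kNameChar;
    t[static_cast<unsigned char>('_')] |= kNameStart | kNameChar;
    t[static_cast<unsigned char>(':')] |= kNameStart | kNameChar;
    t[static_cast<unsigned char>('-')] |= kNameChar;
    t[static_cast<unsigned char>('.')] |= kNameChar;
    return t;
}

// One table lookup per character; small enough to inline well.
inline bool is_class(char c, CharClass cls)
{
    static const std::array<unsigned char, 256> table = make_char_classes();
    return (table[static_cast<unsigned char>(c)] & cls) != 0;
}

//[3] S ::= (#x20 | #x9 | #xD | #xA)+
bool state_S(ParseState* ps, Handler* ds)
{
    ds->enter(3);
    const std::size_t start = ps->pos;
    while (ps->pos < ps->text.size() && is_class(ps->text[ps->pos], kSpace))
        ++ps->pos;
    ds->exit(3);
    return ps->pos > start;  // "+" requires at least one character
}

//[5] Name ::= NameStartChar (NameChar)*
bool state_Name(ParseState* ps, Handler* ds)
{
    ds->enter(5);
    bool matched = false;
    if (ps->pos < ps->text.size() && is_class(ps->text[ps->pos], kNameStart)) {
        ++ps->pos;
        while (ps->pos < ps->text.size() && is_class(ps->text[ps->pos], kNameChar))
            ++ps->pos;
        matched = true;
    }
    ds->exit(5);
    return matched;  // the cursor is left untouched when nothing matched
}
```

Each state function advances the cursor only on a match and reports success as a bool, which is what lets a caller such as state_prolog write "while (state_Misc(ps,ds));" for a starred item.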
 
 
On the few test cases I ran, mostly from here,
 
http://www.sec.gov/Archives/edgar/xbrl.html
 
it seemed to perform quite well for what I was after.
 
Of course there are plenty of SOAP or RSS type examples of
things you can do with XML but I would
point to some others that may be of immediate specific interest.
As I wasn't doing much over Thanksgiving, I thought I would put
in a few comments in favor of computers to these folks,
 
http://www.ots.treas.gov/?p=OpenComment&Topic_id=c0316a9e-1e0b-8562-ebd0-1ae5298909e2
 
http://www.federalreserve.gov/generalinfo/FOIA/index.cfm?doc_id=OP-1338&doc_ver=1&ShowAll=Yes
 
(essentially the same tirade at both locations).
 
 
I summarized some existing computer facilities (NCBI has some XML options
and the FDA AERS is IIRC SGML) and made some suggestions for new XML databases.
And of course their comment window is still open if you have an agenda to promote too. LOL.
 
 
 

Mike Marchywka

> To: boost-users_at_[hidden]
> From: jeff_j_dunlap_at_[hidden]
> Date: Sun, 14 Dec 2008 15:20:50 -0600
> Subject: Re: [Boost-users] Fast XML Parser
>
>
> "Alan M. Carroll" wrote in message
> news:7.0.0.16.2.20081214143626.00ef62c0_at_network-geographics.com...
>> Let me start by saying that I am very happy with rapidXML. In fact, we
>> have converted most of our XML parsing from various other libraries to
>> rapidXML and have committed to a complete conversion over time (i.e.,
>> using rapidXML as our only XML parsing library, including replacing
>> Expat). We use XML almost exclusively as a serialization format and
>> rapidXML is excellent for that use case.
>>
>> *However*, I would not recommend rapidXML if you are going to do
>> non-trivial editing of in-place DOM trees. It is not, IMHO, well suited
>> for that. If you're going to do a lot of editing, parsing speed shouldn't
>> be your primary concern. You will want a much richer API as you go on and
>> rapidXML just doesn't provide that. You could build one on top of
>> rapidXML, but why bother when there's things just as good already out
>> there?
>>
>> That said, I have some wrapper code that makes rapidXML even nicer, if
>> you're interested, but it doesn't perform any edit, delete, or add
>> operations since my code base does not perform any of those.
>>
>
> Hi Alan,
>
> Thanks for responding. I've never used XML before and have been itching to
> learn XML lately.
>
>

