Boost Users :

From: abir basak (abirbasak_at_[hidden])
Date: 2007-02-13 00:47:02


Boris Kolpackov wrote:
> Hi Abir,
>
> abir basak <abirbasak_at_[hidden]> writes:
>
>> Now I am looking to use Spirit for parsing a specific XML file (a W3C
>> InkML file). So my intention is not to have a generic XML parser, but
>> rather a specific one (the format also embeds some BNF grammar). Has
>> anyone used Spirit for domain-specific XML parsing?
>
> Trust me you don't want to go this route. Parsing XML is a lot more
> than finding opening and closing tags. To implement a conforming XML
> parser you will need to handle namespaces, entity references, CDATA,
> etc. This is a lot harder to get right than most people think.
>
Yes, I know the full XML grammar is really hard to implement. I had a
tough time implementing it in ANTLR :(
Here my intention is not to use the full XML grammar, but to define a
subset of it and test how it performs, especially since I know what all
the tags are and which attributes they can contain. So generalized
validation is not needed either, as the grammar itself will validate the
file.
Moreover, the file is not purely XML; it also contains BNF-style grammar
(like SVG, or the one I gave as an example).
I will surely use a full-fledged XML parser if the situation demands it.
But right now I am in the mood to experiment with this particular subset
of XML (a W3C format known as InkML, or even a subset of InkML).
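
To make the kind of subset I mean concrete, here is a minimal sketch in plain C++ (not Spirit) that pulls the character data out of every <trace> element without building a DOM. The function name extract_traces is hypothetical, and the sketch assumes a controlled subset: no comments, no CDATA sections, no entity references, no '>' inside attribute values, and no nested <trace> elements. It is not a conforming XML parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the raw text content of each <trace> element.
// Assumes a restricted, controlled subset of XML (see the notes above).
std::vector<std::string> extract_traces(const std::string& doc)
{
    std::vector<std::string> payloads;
    const std::string open = "<trace";
    const std::string close = "</trace>";
    std::size_t pos = 0;

    while ((pos = doc.find(open, pos)) != std::string::npos) {
        const std::size_t after = pos + open.size();

        // Accept only "<trace>" or "<trace " + attributes; this skips longer
        // names such as <traceGroup> and the bare empty element <trace/>.
        const bool is_trace_tag =
            after < doc.size() &&
            (doc[after] == '>' || doc[after] == ' ' || doc[after] == '\t' ||
             doc[after] == '\n' || doc[after] == '\r');
        if (!is_trace_tag) {
            pos = after;
            continue;
        }

        const std::size_t tag_end = doc.find('>', after);
        if (tag_end == std::string::npos) break;   // truncated start tag

        // An empty element with attributes, e.g. <trace id="t1"/>, has no payload.
        if (doc[tag_end - 1] == '/') {
            pos = tag_end + 1;
            continue;
        }

        const std::size_t body = tag_end + 1;
        const std::size_t end = doc.find(close, body);
        if (end == std::string::npos) break;       // missing end tag

        payloads.push_back(doc.substr(body, end - body));
        pos = end + close.size();
    }
    return payloads;
}
```

Each returned string can then be handed to whatever parses the point data, so only the traces of interest need to be converted to numbers.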
> The only time it makes sense to have a domain specific XML parser is
> when you have control over all your XML instances and can make sure
> that only a subset of XML 1.0 is used. This is normally done for
> performance reasons.
>
Yes, the grammar of the file format is specific. Just as an XHTML or
MathML parser does not need to match arbitrary nodes, mine only has to
match the nodes this format defines.
>
>> I believe using spirit will make it faster.
>
> Highly unlikely since most of the XML parsers are hand-coded.
>
Not sure why! In my experience, the specific XML parsers I wrote in ANTLR
(the widely used language recognition tool) were always faster than the
generic one.
>> Also, I am interested in
>> parsing only a portion of the whole document at a time and generating
>> data from that portion only, rather than generating data for the whole
>> DOM (the files are large, typically 4-20 MB).
>> My XML file is something like:
>>
>> [...]
>>
>> Note that inside <trace> the content follows a BNF grammar (mostly
>> comma-separated float pairs).
>
> You can use a SAX2 parser (e.g., Expat or Xerces-C++) to handle XML and
> then use Spirit-based parser to handle the data.
>
>
At present I am using the Qt SAX parser, which is a good one.
The Spirit idea is just a thought specific to this particular task.
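
For the split suggested above (SAX for the XML, a small dedicated parser for the data), the data side could look roughly like the sketch below. It is hand-written standard C++ standing in for a Spirit grammar, not Spirit itself. The name parse_trace is hypothetical, and it assumes each point is exactly two whitespace-separated floats with points separated by commas, e.g. "10 0, 9 14, 8 28".

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses "x y, x y, ..." into (x, y) pairs.
// Returns false on malformed input; `out` then holds the points read so far.
bool parse_trace(const std::string& text,
                 std::vector<std::pair<double, double>>& out)
{
    const char* p = text.c_str();

    auto skip_ws = [&p]() {
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') ++p;
    };
    // std::strtod skips leading whitespace itself and reports, via `end`,
    // how far it got; end == p means no number was found.
    auto read_number = [&p](double& value) -> bool {
        char* end = nullptr;
        value = std::strtod(p, &end);
        if (end == p) return false;
        p = end;
        return true;
    };

    skip_ws();
    if (*p == '\0') return true;            // an empty trace is accepted

    for (;;) {
        double x = 0.0, y = 0.0;
        if (!read_number(x) || !read_number(y)) return false;
        out.emplace_back(x, y);

        skip_ws();
        if (*p == '\0') return true;        // end of payload
        if (*p != ',') return false;        // anything but a separator is an error
        ++p;                                // consume the comma
    }
}
```

With a SAX parser this would be called on the accumulated character data once the end of a <trace> element is reported.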
> hth,
> -boris
>
>
And thanks for the suggestions ...

-- 
Abir Basak, Member IEEE
Software Engineer, Read Ink Technologies
B. Tech, IIT Kharagpur
email: abir_at_[hidden]
homepage: www.abirbasak.com
