From: dietmar_kuehl_at_[hidden]
Date: 2001-09-26 08:40:05


--- In boost_at_y..., "Damien Fisher" <dfisher_at_u...> wrote:
> AFAICT (from the spec at
> http://www.w3.org/TR/2000/REC-xml-20001006#sec-internal-ent), the
> ability to switch between encodings for a given stream is vital for
> a conformant XML parser,

What is vital for a conforming XML parser is that it can read the
UTF-8 and UTF-16 encodings. Whether it does so by switching
encodings or by some other mechanism is not specified. Also, the
encoding is determined in the XML declaration, i.e. the very first
thing the XML parser reads. Even if you want to have the XML parser
decide on the encoding, you can integrate this logic into a special
code conversion facet which just delegates the requests to
appropriate internally used code conversion facets.
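
A minimal sketch of the decision such a delegating facet would have
to make, assuming it gets to look at the first few bytes of the
document. The names xml_encoding and guess_xml_encoding are made up
for illustration, and the byte patterns follow the non-normative
autodetection appendix of the XML recommendation. UCS-4 and other
encodings are ignored here, and a real parser would still have to
read the encoding declaration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, not taken from any library.
enum class xml_encoding { utf8, utf16_be, utf16_le };

// Guess the encoding from the first bytes of a document.
// Only UTF-8 and UTF-16 are considered in this sketch.
xml_encoding guess_xml_encoding(const std::string& head)
{
    // Returns the byte at position i, or 0x100 (matches nothing) past the end.
    auto at = [&head](std::size_t i) -> unsigned {
        return i < head.size() ? static_cast<unsigned char>(head[i]) : 0x100u;
    };
    const unsigned b0 = at(0), b1 = at(1), b2 = at(2), b3 = at(3);

    // Byte order marks.
    if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return xml_encoding::utf8;
    if (b0 == 0xFE && b1 == 0xFF) return xml_encoding::utf16_be;
    if (b0 == 0xFF && b1 == 0xFE) return xml_encoding::utf16_le;

    // No byte order mark: look for "<?" encoded as UTF-16.
    if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F)
        return xml_encoding::utf16_be;
    if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00)
        return xml_encoding::utf16_le;

    // Without a byte order mark or an encoding declaration,
    // the document has to be UTF-8.
    return xml_encoding::utf8;
}
```

The result would then select which internally held code conversion
facet the requests are forwarded to.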

> as it must support at least UTF-8 and UTF-16 encodings, and these
> can be changed at an entity-by-entity level.

No, they can't. The encoding changes at the document level. You may
have entity references to external entities using a different
encoding, but that is, again, a different document.

> never fast enough for my needs;

Apparently you are stuck with a lame implementation: there is no
need for streams to be slow, neither for "small" requests, e.g.
when processing characters elementwise, nor for "large" requests
processing whole streams, nor for any size in between.

> It also seems that the last paragraph in the above
> e-mail attempts to address this issue.

No. This paragraph addresses the issue of 'wchar_t' being a 16-bit
type on some platforms and how to cope with this.
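
For illustration, the usual consequence of a 16-bit character type is
that characters outside the Basic Multilingual Plane arrive as
surrogate pairs which have to be combined. The sketch below assumes
UTF-16 input and uses std::u16string rather than 'wchar_t' so that it
behaves the same on every platform. The function name is made up, and
this is one common way to cope, not necessarily the one discussed in
the quoted e-mail:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 code units into code points.
std::vector<std::uint32_t> utf16_to_code_points(const std::u16string& units)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const std::uint32_t u = units[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        if (high && i + 1 < units.size()) {
            const std::uint32_t low = units[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine a high and a low surrogate into one code point.
                out.push_back(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
                ++i;  // the low surrogate has been consumed as well
                continue;
            }
        }
        // BMP character, or an unpaired surrogate passed through unchanged.
        out.push_back(u);
    }
    return out;
}
```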

> So I would guess that such a library
> would be required before we could really expect to develop any
> useful XML parser.

I disagree, and actually I think that it makes sense even to provide
XML parsing and processing facilities on ASCII characters: the
processor would have two modes, a fast one for ASCII-only files
and a conforming, slower one using Unicode characters. The XML
specification requires that the latter works but doesn't object
to the former.
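
A sketch of how the mode could be selected, assuming the input is
available as bytes in an 8-bit encoding and that scanning it once is
acceptable. The names parse_mode, is_ascii_only and choose_mode are
made up for illustration; the mode could just as well be chosen
explicitly by the user:

```cpp
#include <string>

// Hypothetical names, not taken from any library.
enum class parse_mode { fast_ascii, unicode };

// True if every byte is in the 7-bit ASCII range.
bool is_ascii_only(const std::string& bytes)
{
    for (unsigned char c : bytes)
        if (c > 0x7F) return false;
    return true;
}

// Pick the fast path only when the whole input is plain ASCII.
parse_mode choose_mode(const std::string& bytes)
{
    return is_ascii_only(bytes) ? parse_mode::fast_ascii
                                : parse_mode::unicode;
}
```

Note that this check is not sufficient for UTF-16 input, where ASCII
characters are encoded with additional zero bytes.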

--
<mailto:dietmar_kuehl_at_[hidden]> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
