From: Damien Fisher (dfisher_at_[hidden])
Date: 2001-09-26 08:59:11


----- Original Message -----
From: <dietmar_kuehl_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Wednesday, September 26, 2001 11:40 PM
Subject: [boost] Re: Unicode in C++; Was: New file uploaded to boost

> > as it must support at least UTF-8 and UTF-16 encodings, and these
> > can be changed at an entity-by-entity level.
>
> No, they don't. They change on a document level. You may have
> entity references to external entities using a different encoding.
> But this is again, a different document.

Oops. I misread the XML spec on this one. I have never actually needed to
use non-ASCII characters in an XML document, so I was really flying blind
reading the spec :). I am used to using the word "entity" to refer to
what the specification defines as an "internal entity," and got a
little confused.

That makes it much easier.

>
> > never fast enough for my needs;
>
> Apparently you are stuck with a lame implementation: There is no
> need for streams to be slow. Neither for "small" requests, e.g.
> when processing characters elementwise, nor for "large" requests
> processing whole streams, or any size in between.

My choice in the matter is indeed dictated by the implementation, not by the
library, which I think looks really cool; there just has not been a reason
for me to read up on it. Sorry I didn't make that clear.

> > So I would guess that such a library
> > would be required before we could really expect to develop any
> > useful XML parser.
>
> I disagree and actually I think that it makes sense to even provide
> XML parsing and processing facilities on ASCII characters: The
> processor would have two modes, a fast one using ASCII only files
> and a conforming, slower one using Unicode characters. The XML
> specification requires that the latter works but doesn't object
> to the former one.

Ok, my usage of the word "useful" leaves a little to be desired. I meant
"useful" as in "generally useful", i.e. fully conformant. I apologize for
not being clear (again).

I personally would probably never use the Unicode version of the XML parser,
as all the XML I handle day-to-day is strictly ASCII (the joys of being an
English speaker in an English-speaking country, I suppose), so an ASCII-only
version would be great.
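
To make the two-mode idea concrete, here is a minimal sketch of how a
processor might pick between the fast ASCII path and the conforming Unicode
path. The names is_ascii_only, parse_mode and choose_mode are made up for
illustration and are not part of any existing library; it also assumes the
whole document is already in memory as bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when every byte is in the 7-bit ASCII range.
inline bool is_ascii_only(const std::string& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (static_cast<unsigned char>(bytes[i]) > 0x7F) {
            return false;
        }
    }
    return true;
}

// Hypothetical mode tag for the two processing paths discussed above.
enum parse_mode { ascii_fast, unicode_conforming };

// Pick the fast path only when the document is pure ASCII; anything else
// falls back to the slower, conforming path.
inline parse_mode choose_mode(const std::string& document) {
    return is_ascii_only(document) ? ascii_fast : unicode_conforming;
}
```

A real processor would probably also want to honour the encoding declaration
rather than rely on a byte scan alone, so treat this as a sketch only.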

Perhaps it would be a good idea to ignore Unicode until Boost finalizes its
solution on the matter. Until then, an ASCII version would definitely
suffice, and I don't imagine it being difficult to move over to Unicode
later. So the next question would be: do we start with W3C interfaces first
(DOM or SAX or both) or come up with a new interface, and build DOM/SAX on
top of that?
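
As a rough illustration of the second option, here is a sketch of a small
event-style interface with a tree builder layered on top of it. Everything
here (event_handler, node, tree_builder) is hypothetical and deliberately
much smaller than the real SAX or DOM interfaces; it only shows that a
tree representation can be assembled from low-level events.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical low-level event interface (SAX-like, but not the SAX API).
struct event_handler {
    virtual ~event_handler() {}
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void text(const std::string& data) = 0;
};

// Hypothetical tree node, stored flat with an index to its parent.
struct node {
    std::string name;
    std::string text;
    int parent;  // index into tree_builder::nodes, -1 for the root
};

// Builds a DOM-like tree purely from the events above.
class tree_builder : public event_handler {
public:
    tree_builder() : current_(-1) {}

    void start_element(const std::string& name) {
        node n;
        n.name = name;
        n.parent = current_;
        nodes.push_back(n);
        current_ = static_cast<int>(nodes.size()) - 1;
    }

    void end_element(const std::string&) {
        // Step back up to the parent of the element being closed.
        if (current_ >= 0) {
            current_ = nodes[current_].parent;
        }
    }

    void text(const std::string& data) {
        // Text outside any element is ignored in this sketch.
        if (current_ >= 0) {
            nodes[current_].text += data;
        }
    }

    std::vector<node> nodes;

private:
    int current_;  // index of the currently open element, -1 if none
};
```

A parser would then only need to know about event_handler, and the tree
view becomes one client among several.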

