From: Stefan Seefeld (seefeld_at_[hidden])
Date: 2005-11-07 13:24:20

Anthony Williams wrote:

> Assume I know the encoding and character type I wish to use as input. In order
> to specialize converter<> for my string type, I need to know what encoding and
> character type the library is using. If the encoding and character type are
> not specified in the API, but are instead open to the whims of the backend, I
> cannot write my conversion code.

Ah, I think I understand what you mean by 'character type'. Yes, you are right.
The code as I posted it to the vault is missing the bits that enable users
to write converters without knowing backend-specific details. However, some
'dom::char_trait' should be enough, right?

>>>I would suggest that the API accepts input in UTF-8, UTF-16 and UTF-32. The
>>>user then has to supply a conversion function from their encoding to one of
>>>these, and the library converts internally if the one they choose is not the
>>>"correct" one.
>>It already does. libxml2 provides conversion functions. I need to hook them
>>up into such an 'xml char trait'.
> I don't understand how your response ties in with my comment, so I'll try
> again.
> I was suggesting that we have overloads like:
> node::append_element(utf8_string_type);
> node::append_element(utf16_string_type);
> node::append_element(utf32_string_type);
> With two of them (but unspecified which two) converting to the correct
> internal encoding.

Oh, but that multiplies quite a chunk of the API by four!
Typically, a Unicode library provides converter functions, so what advantage
would such a rich interface have over asking the user to do the conversion
before calling into the XML library?

If the internal storage encoding is a compile-time constant that can be queried
from the proposed dom::char_trait, it should be simple for users to decide how
to write the converter, and in particular, how to pass strings in the most
efficient way.
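
A minimal sketch of that idea, to make it concrete. Everything here is
hypothetical: the namespace, the 'encoding' enumeration, the members of
'char_trait' and the 'from_utf32' helper are invented for illustration and
are not part of the code in the vault. It assumes a backend that stores
UTF-8 in plain 'char' (libxml2 stores UTF-8 internally), and a user whose
own strings are UTF-32.

```cpp
#include <string>

namespace dom_sketch {

// Hypothetical list of encodings a backend might use internally.
enum class encoding { utf8, utf16, utf32 };

// Hypothetical trait: publishes the backend's character type and its
// internal encoding as compile-time constants.
struct char_trait {
    typedef char char_type;
    static constexpr encoding internal_encoding = encoding::utf8;
};

typedef std::basic_string<char_trait::char_type> internal_string;

// User-side conversion from UTF-32, selected by the internal encoding.
// Only the UTF-8 case is written out; a backend with another internal
// encoding would fail to compile here instead of silently misbehaving.
template <encoding Internal> struct from_utf32;

template <> struct from_utf32<encoding::utf8> {
    // Encodes each code point as 1 to 4 UTF-8 bytes. No validation is
    // done (surrogates and values above U+10FFFF are not rejected).
    static std::string convert(const std::u32string& in) {
        std::string out;
        for (char32_t c : in) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else if (c < 0x10000) {
                out += static_cast<char>(0xE0 | (c >> 12));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (c >> 18));
                out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }
};

// What a user would call before handing a string to the XML library.
inline internal_string to_internal(const std::u32string& in) {
    return from_utf32<char_trait::internal_encoding>::convert(in);
}

} // namespace dom_sketch
```

With this shape the XML API itself takes only 'internal_string', and the
choice of conversion is made by the user at compile time from the trait.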


> Imagine, for example a web browser or XML editor. The XML comes in as a byte
> stream with an encoding tag such as a Charset-encoding field (if you're
> lucky). You then have to read this and convert it from whatever encoding is
> specified to the DOM library's internal encoding, do some processing and then
> output to the screen in the user's chosen encoding.


> If I specify the conversions to use directly on the input and output, then I
> can cleanly separate my application into three layers --- process input, and
> build DOM in internal encoding; process DOM as necessary; display result to
> user.
> If the string type and encoding is inherently part of the DOM types, this is
> not so simple.

I still don't understand what you have in mind: are you thinking of using
two separate Unicode libraries / string types for input and output? Again,
Unicode libraries should provide encoding conversion, if all you want is
to use distinct encodings.

I may not understand the details well enough, but asking for the API to
integrate the string conversions as you seem to be doing sounds exactly
like what you accused me of doing: premature optimization. ;-)

>>I'm not sure I understand your requirement ? Do you really want to plug in
>>multiple unicode libraries / string types ? Or do you want to use multiple
>>encodings ?
> Multiple encodings, generally. However, your converter<> template doesn't
> allow for that --- it only allows one encoding per string type.

Ah, well, the converter is not even half-finished, as in its current form
it is tied to the string type. It certainly needs some substantial design
work to be of any practical use.
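
One possible direction, sketched under assumptions: key the converter on an
encoding tag as well as on the string type, so the same string type can
carry more than one encoding. The tag types, the two-parameter 'converter'
and its 'in' member are hypothetical and are not the converter<> from the
vault; the internal encoding is assumed to be UTF-8 held in std::string.

```cpp
#include <string>

namespace converter_sketch {

// Hypothetical tags naming the encoding of the user's string.
struct utf8_tag {};
struct latin1_tag {};

// Primary template left undefined: an unsupported (string type, encoding)
// pair is a compile-time error.
template <typename String, typename Encoding> struct converter;

// std::string already holding UTF-8: passed through without validation.
template <> struct converter<std::string, utf8_tag> {
    static std::string in(const std::string& s) { return s; }
};

// std::string holding ISO-8859-1: every byte is the code point of the
// same value, so bytes >= 0x80 become two-byte UTF-8 sequences.
template <> struct converter<std::string, latin1_tag> {
    static std::string in(const std::string& s) {
        std::string out;
        for (unsigned char c : s) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }
};

} // namespace converter_sketch
```

A caller would then write, for example,
converter<std::string, latin1_tag>::in(text) on input and pick a different
tag for output, which covers the multiple-encodings case without adding
overloads to the DOM API.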

