From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2005-11-08 10:24:53


Stefan Seefeld <seefeld_at_[hidden]> writes:

>> If the encoding is only available as a compile-time constant, that won't help
>> me write a converter. I need it available as a software-writing-time constant
>> for that (i.e. specified in the documentation).
>>
>> If you don't want to fix the encoding in the docs, maybe we should require
>> that the user supply conversions to each of UTF-8, UTF-16 and UTF-32, and the
>> library will use whichever is most convenient.
>
> Exactly. Unicode libraries provide these conversion functions, and the user
> should be able to implement the boost::xml::dom::converter with these.

Ok. So boost::xml::dom::converter will have convert_to_utf8, convert_to_utf16
and convert_to_utf32 functions (and their inverses) which the user will have
to implement. I'm happy with that.
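For illustration, a user-supplied converter along these lines might look like the following. It assumes the application's own string type is std::u32string. The struct name u32_converter, the static-member layout and the exact signatures are assumptions, not the real boost::xml::dom::converter interface. The inverse functions assume well-formed input and do no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical converter for an application whose own string type is
// std::u32string (one code point per element). Names follow the message;
// the signatures are assumed, not taken from any Boost header.
struct u32_converter {
  static std::string convert_to_utf8(const std::u32string& s) {
    std::string out;
    for (char32_t c : s) {
      if (c < 0x80) {  // 1 byte
        out += static_cast<char>(c);
      } else if (c < 0x800) {  // 2 bytes
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
      } else if (c < 0x10000) {  // 3 bytes
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
      } else {  // 4 bytes
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
      }
    }
    return out;
  }

  static std::u16string convert_to_utf16(const std::u32string& s) {
    std::u16string out;
    for (char32_t c : s) {
      if (c < 0x10000) {
        out += static_cast<char16_t>(c);
      } else {  // encode as a surrogate pair
        char32_t v = c - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
      }
    }
    return out;
  }

  static std::u32string convert_to_utf32(const std::u32string& s) { return s; }

  // Inverses. Input is assumed to be well-formed.
  static std::u32string convert_from_utf8(const std::string& s) {
    static const unsigned char lead_mask[] = {0x7F, 0x1F, 0x0F, 0x07};
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
      unsigned char b = static_cast<unsigned char>(s[i]);
      int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
      char32_t c = b & lead_mask[extra];
      for (int k = 1; k <= extra; ++k)  // append 6 bits per continuation byte
        c = (c << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
      out += c;
      i += 1 + extra;
    }
    return out;
  }

  static std::u32string convert_from_utf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
      char32_t c = s[i];
      if (c >= 0xD800 && c < 0xDC00 && i + 1 < s.size()) {  // high surrogate
        c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
        ++i;
      }
      out += c;
    }
    return out;
  }

  static std::u32string convert_from_utf32(const std::u32string& s) { return s; }
};
```

A converter built on a real Unicode library would simply forward to that library's own routines rather than hand-coding the encodings.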

>> With the current design, the whole API is tied to a single external string
>> type, with a single converter function for converting to the internal string
>> type. This implies that if you wish to use different encodings, you need a
>> different external string type, and therefore you end up with different
>> template instantiations for different encodings, and my nice separate
>> application parts suddenly need to know what encodings are used for input and
>> output.
>
> Oh, now I see your point! You argue that multiple encodings will be tied to
> multiple C++ types, even if they are part of the same Unicode library.
> I'm not quite sure what to say. I suspect there are ways around this issue
> with a clever choice of the string type template argument for the library.
> But if not, let's fix that once it becomes a problem.
> I'd rather start simple and let the system evolve once we see users plug
> real Unicode libraries into it.
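To make the instantiation objection quoted above concrete, here is a hypothetical sketch of an API templated on the external string type. The class document and the helper root_name_length are illustrative names, not the API under discussion. Two parts of one application that use different encodings end up with unrelated types:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical: a DOM document templated on the user's external string type.
// Illustrative only; not the proposed Boost API.
template <typename ExternalString>
class document {
public:
  void set_root_name(const ExternalString& name) { root_ = name; }
  const ExternalString& root_name() const { return root_; }

private:
  ExternalString root_;  // stand-in for the library's internal storage
};

// Two application parts that use different encodings, hence different types.
using utf16_document = document<std::u16string>;
using utf32_document = document<std::u32string>;

static_assert(!std::is_same<utf16_document, utf32_document>::value,
              "each external string type is a separate instantiation");

// Code written against one instantiation cannot accept the other unless it,
// too, becomes a template on the string type.
inline std::size_t root_name_length(const utf16_document& d) {
  return d.root_name().size();
}
```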

I guess we disagree over what's simple:

    I see simple => API not tied to external string type; it's up to the user
    to do the conversions to/from the internal string type as and when they
    see fit.

    As I understand what you're saying, you see simple => user supplies data
    in their own external string type, and library calls back to the
    user-supplied converter to convert to/from the internal string type when
    needed.

If we start off with the internal string type being UTF-8 encoded strings, and
have the API accept/return internal strings, then we can discard the converter
stuff for now, get rid of all the template parameters specifying the external
string type, and focus on the API details.
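A minimal sketch of that starting point, assuming std::string holding UTF-8 as the internal string type. The namespace sketch, the alias ustring and the element class are hypothetical. The point is only that nothing is templated on an external string type and no converter is involved:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

namespace sketch {
// Assumed internal string type: UTF-8 encoded bytes in a std::string.
using ustring = std::string;

// The API accepts and returns ustring only; callers do any conversion
// to or from their own string types before and after each call.
class element {
public:
  explicit element(ustring name) : name_(std::move(name)) {}
  const ustring& name() const { return name_; }

  void set_attribute(const ustring& key, const ustring& value) {
    attrs_[key] = value;
  }

  // Returns nullptr when the attribute is absent.
  const ustring* get_attribute(const ustring& key) const {
    auto it = attrs_.find(key);
    return it == attrs_.end() ? nullptr : &it->second;
  }

private:
  ustring name_;
  std::map<ustring, ustring> attrs_;
};
}  // namespace sketch
```

A caller holding UTF-16 or Latin-1 data converts it to UTF-8 with whatever Unicode library it uses before calling set_attribute, and converts the result back after get_attribute.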

If we're going to allow choice of backends (and I'm not happy being tied to
libxml2), it would be nice to allow for this internal string type to have a
different encoding and character type (in order to avoid unnecessary
conversions), but we could leave that for now.
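One possible way to let each backend choose its internal string type later is a traits class keyed on the backend. Everything here is hypothetical: the tag types, backend_traits and element are illustrative names, and utf16_backend stands for an imagined backend with UTF-16 storage. The only fact relied on is that libxml2 stores text as UTF-8 (its xmlChar type is an unsigned char).

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical backend tags; neither is an existing Boost name.
struct libxml2_backend {};  // libxml2 keeps text as UTF-8 xmlChar bytes
struct utf16_backend {};    // an imagined backend with UTF-16 storage

// Each backend advertises its native internal string type, so text can be
// handed through without an extra conversion step.
template <typename Backend> struct backend_traits;

template <> struct backend_traits<libxml2_backend> {
  using string_type = std::string;  // UTF-8; a real binding would cast to xmlChar*
};

template <> struct backend_traits<utf16_backend> {
  using string_type = std::u16string;  // UTF-16 code units
};

template <typename Backend>
class element {
public:
  using string_type = typename backend_traits<Backend>::string_type;
  explicit element(string_type name) : name_(std::move(name)) {}
  const string_type& name() const { return name_; }

private:
  string_type name_;
};
```

This brings a template parameter back, but only one per backend rather than one per external encoding, so application code written against a given backend still sees a single string type.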

>> For axemill, I decided to provide a set of conversion templates for converting
>> between encodings.
>
> What Unicode libraries are you working with? As I said above, I'd expect these
> to provide all conversions, no matter whether that would generate a new C++
> type or not.

Personally I don't use a separate Unicode library; I write the functions I
need as and when I need them. With axemill, I again make no assumptions about
the Unicode library, but expect the user to provide appropriate
specializations of the Encode and Decode templates, using their Unicode
library of choice. The axemill API itself expects everything to be in the
internal string type; the Encode and Decode templates, and the convert to/from
functions are provided as a convenience to the user, to assist them in working
with a Unicode library of their choice.
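A rough sketch of the Encode/Decode arrangement described above. The actual axemill signatures are not given here, so the member name apply, the direction assigned to each template, the spelling of the convert_from/convert_to wrappers, the internal_string alias and the latin1_string example type are all assumptions. The internal string type is taken to be UTF-8 in a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed internal string type: UTF-8 in a std::string.
using internal_string = std::string;

// Hypothetical primary templates; the user specializes them per external type.
template <typename External> struct Decode;  // assumed: External -> internal
template <typename External> struct Encode;  // assumed: internal -> External

// Convenience wrappers; the API itself only ever sees internal_string.
template <typename External>
internal_string convert_from(const External& s) { return Decode<External>::apply(s); }

template <typename External>
External convert_to(const internal_string& s) { return Encode<External>::apply(s); }

// Example user type: Latin-1 (ISO-8859-1) bytes, wrapped so that it is a
// distinct type from the internal std::string.
struct latin1_string { std::string bytes; };

template <> struct Decode<latin1_string> {
  static internal_string apply(const latin1_string& s) {
    internal_string out;
    for (char ch : s.bytes) {
      unsigned char b = static_cast<unsigned char>(ch);
      if (b < 0x80) {
        out += static_cast<char>(b);
      } else {  // U+0080..U+00FF become two UTF-8 bytes
        out += static_cast<char>(0xC0 | (b >> 6));
        out += static_cast<char>(0x80 | (b & 0x3F));
      }
    }
    return out;
  }
};

template <> struct Encode<latin1_string> {
  static latin1_string apply(const internal_string& s) {
    latin1_string out;
    for (std::size_t i = 0; i < s.size();) {
      unsigned char b = static_cast<unsigned char>(s[i]);
      if (b < 0x80) {
        out.bytes += static_cast<char>(b);
        i += 1;
      } else if ((b & 0xE0) == 0xC0 && i + 1 < s.size()) {
        unsigned cp = ((b & 0x1Fu) << 6) |
                      (static_cast<unsigned char>(s[i + 1]) & 0x3Fu);
        out.bytes += cp <= 0xFF ? static_cast<char>(cp) : '?';
        i += 2;
      } else {
        // Three- or four-byte sequence (or a stray byte): outside Latin-1.
        out.bytes += '?';
        i += 1;
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80) ++i;
      }
    }
    return out;
  }
};
```

Because the API only deals in internal_string, supporting another external encoding means writing one more pair of specializations, without re-instantiating anything in the library.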

Anthony

-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
