From: James Porter (porterj_at_[hidden])
Date: 2007-10-17 21:42:50
Phil Endecott wrote:
> Yes, other people have suggested similar things. Even if it were true
> that most charset conversion occurred during I/O - and that's not been
> my experience in my own work - then I would still argue that charset
> conversion should be available for use in other contexts.
I don't mean to say that the recode function shouldn't exist, but that
it should exist only as a convenience function. The actual conversion
should be directly usable by I/O operations, so a string doesn't need to
be fully converted (and allocated) before output. For all their problems
(mostly runtime-specified conversion), the std::codecvt facets make it
fairly easy to handle partial conversion and shift states.
> I imagine that an I/O streams library or some sort of adapter layer
> compatible with these strings would be necessary.
I think this is key, and goes back to my argument that string conversion
should be seen as an I/O operation, and separate from the strings
themselves. Unless you merely want a raw byte array, you're
(conceptually) converting bytes into code points and then into some
internal storage container.
For ASCII this is trivial, since each byte is equivalent to a code point
and the storage container is just a char/byte, so you're back to where
you started. For Unicode, this is considerably more complicated (or we
wouldn't be discussing it!). The stream must at least be aware of the
external (file) encoding, in order to keep track of shift states. I
don't think we'd be able to delegate that responsibility to the strings
we'd be filling with data.
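To make the state-tracking point concrete, here is a minimal sketch of a decoder that can be fed input in arbitrary chunks, as a stream reading fixed-size buffers would. It is only an illustration: the Utf8Decoder class and its member names are hypothetical, not part of Boost or the standard library. It assumes UTF-8 input, and it simplifies validation. UTF-8 has no shift sequences in the strict sense, so the "state" here is just the unfinished multi-byte sequence, which plays the role std::mbstate_t plays for std::codecvt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical illustration, not a Boost or standard-library API.
// Decodes UTF-8 into UTF-32 code points, keeping the state of an
// unfinished multi-byte sequence between calls so that input may be
// supplied in arbitrary chunks (e.g. one read buffer at a time).
class Utf8Decoder {
public:
    // Decode `len` bytes and append each completed code point to `out`.
    // Returns false on malformed input. Validation is simplified:
    // overlong forms and surrogates are not rejected.
    bool feed(const char* data, std::size_t len, std::u32string& out) {
        assert(data != nullptr || len == 0);
        for (std::size_t i = 0; i < len; ++i) {
            const unsigned char b = static_cast<unsigned char>(data[i]);
            if (pending_ == 0) {
                // Start of a new sequence: the lead byte gives its length.
                if (b < 0x80)                { out.push_back(b); }
                else if ((b & 0xE0) == 0xC0) { partial_ = b & 0x1F; pending_ = 1; }
                else if ((b & 0xF0) == 0xE0) { partial_ = b & 0x0F; pending_ = 2; }
                else if ((b & 0xF8) == 0xF0) { partial_ = b & 0x07; pending_ = 3; }
                else return false;  // stray continuation or invalid lead byte
            } else {
                if ((b & 0xC0) != 0x80) return false;  // expected a continuation byte
                partial_ = (partial_ << 6) | (b & 0x3F);
                if (--pending_ == 0) out.push_back(partial_);
            }
        }
        return true;
    }

    // True when no multi-byte sequence is left half-read.
    bool at_boundary() const { return pending_ == 0; }

private:
    char32_t partial_ = 0;  // bits accumulated for the current code point
    int pending_ = 0;       // continuation bytes still expected
};
```

Feeding the three bytes of U+20AC (E2 82 AC) as one byte followed by two yields nothing after the first call and a single code point after the second. The decoder, not the destination string, remembers where the sequence left off.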
> Yes, this has some advantages. But using a map has the disadvantage
> that lookups are more expensive, compared to the array indexed by enum
> that I have; in my code, getting the char* name of a charset is a
> compile-time-constant operation. I'm not sure how much that matters in practice.
Another option would be, for every encoding, to create a class (for
compile-time tagging) and a global instance of that class (for run-time
tagging).
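Roughly, that could look like the sketch below. All names (charset, utf8_t, latin1_t, is_multibyte, same_charset) are made up for illustration and do not correspond to any proposed interface. It also assumes a C++11 compiler for constexpr; without it, the instances would be ordinary const globals.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names throughout; this is not an existing Boost interface.
// Common base so that an encoding can also be passed around at run time.
struct charset {
    constexpr explicit charset(const char* n) : name_(n) {}
    constexpr const char* name() const { return name_; }
private:
    const char* name_;
};

// One class per encoding: the type serves as the compile-time tag.
struct utf8_t   : charset { constexpr utf8_t()   : charset("UTF-8") {} };
struct latin1_t : charset { constexpr latin1_t() : charset("ISO-8859-1") {} };

// One global instance per class: the object serves as the run-time tag.
constexpr utf8_t   utf8{};
constexpr latin1_t latin1{};

// Compile-time use: overload resolution selects on the tag type.
inline bool is_multibyte(utf8_t)   { return true; }
inline bool is_multibyte(latin1_t) { return false; }

// Run-time use: any tag converts to a reference to the common base.
inline bool same_charset(const charset& a, const charset& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

With this layout, utf8.name() is a constant expression, so getting the name stays as cheap as the enum-indexed array, while a const charset& can still be chosen at run time.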
- James