From: James Porter (porterj_at_[hidden])
Date: 2007-09-27 17:13:41
On 9/27/07, Jeremy Maitin-Shepard <jbms_at_[hidden]> wrote:
>
> I think as others have said, in practice a fixed-width encoding really
> gains you very little or nothing at all. Needing random access to code
> points is, I think, an extremely rare operation.
I know, but it'd be easy to put together a fixed-width encoded basic_string,
and we could use that as a basis for building a code conversion framework,
at least as a proof-of-concept. Of course, that assumes that we'd be using
basic_string for fixed-width strings, which isn't necessarily the case.
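To make the proof-of-concept idea a bit more concrete, here is a rough sketch of one conversion step, decoding UTF-8 into a fixed-width basic_string. It assumes a compiler that provides char32_t and std::u32string, and the name utf8_to_utf32 is made up for illustration; it is not an existing Boost or standard function. Validation is deliberately minimal (overlong forms and surrogates are not rejected).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost or the standard library):
// decode UTF-8 bytes into a fixed-width UTF-32 basic_string.
// Only lead bytes, continuation bytes, and truncation are checked.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Once the text is in the fixed-width form, indexing by code point is plain operator[] on the result, which is the only thing the fixed-width representation really buys us.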
> UCS-2 is bogus and should not be used at all. Conceivably UCS-4 is
> legitimate but in practice not likely to be used by anyone. Still, it
> is probably important to support it.
Are there any situations where UCS-2 is actually needed (to interoperate
with deprecated libraries, for instance)? If not, then I agree that we can
drop it.
> I don't think the issues of a mutable UTF-8/UTF-16 representation are
> very different from the issues of a mutable UTF-32 representation. In
> practice, in handling non-ASCII text, all searching and replacement will
> be in terms of substrings (likely single or sequences of grapheme
> clusters).
I suppose it depends on how we allow UTF-8/UTF-16 strings to be modified.
Direct (mutable) character access through operator[] would be bad, since a
replacement code point may need a different number of code units than the
one it replaces, but substring replacement would be better. Depending on the
situation, it may be better to use a stringstream to compose a new string
from the old. I'd have to think about it some more.
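As a rough illustration of the stringstream approach, the sketch below replaces one code point in a UTF-8 std::string by composing a new string rather than writing through operator[]. The function names code_point_offset and replace_code_point are hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: byte offset of the n-th code point in valid UTF-8.
// Returns s.size() when n is past the last code point.
std::size_t code_point_offset(const std::string& s, std::size_t n) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (n == 0) return i;
            --n;
        }
    }
    return s.size();
}

// Hypothetical helper: build a new string in which the n-th code point of s
// is replaced by `replacement` (itself UTF-8, of any byte length).
// If n is out of range, s is returned unchanged.
std::string replace_code_point(const std::string& s, std::size_t n,
                               const std::string& replacement) {
    std::size_t begin = code_point_offset(s, n);
    if (begin == s.size()) return s;
    std::size_t end = code_point_offset(s, n + 1);

    // Compose the result from the old string instead of mutating it in place.
    std::ostringstream out;
    out << s.substr(0, begin) << replacement << s.substr(end);
    return out.str();
}
```

For example, replace_code_point("h\xC3\xA9llo", 1, "e") replaces a two-byte code point with a one-byte one, so the result is a byte shorter than the input. A mutable reference returned by operator[] has no way to express that change in length.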
- James