Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2009-07-20 17:28:08
On Sat, Jul 18, 2009 at 15:34, Cory Nelson<phrosty_at_[hidden]> wrote:
> On Fri, Jul 17, 2009 at 4:29 PM, Rogier van Dalen<rogiervd_at_[hidden]> wrote:
>> Though I'm not sure decoding this much UTF8-encoded data is often a
>> bottleneck in practice,
> UTF-8 is the primary bottleneck in XML decoding.  That's been my
> motivation thus far.
And is it necessary to decode large stretches of UTF-8 rather than
only the textual content? I imagine the performance characteristics
would be quite different when you decode only short amounts of text at
a time. But I've never actually done this comparison, so I'm happy to
take your word for it.
>> It now seems to me that a full Unicode library would be hard to get
>> accepted into Boost; it would be more feasible to get a UTF library
>> submitted, which is more along the lines of your library. (A Unicode
>> library could later be based on the same principles.)
>> Freestanding transcoding functions and codecvt facets are not the only
>> thing I believe a UTF library would need, though. I'd add to the list:
>> - iterator adaptors (input and output);
>> - range adaptors;
>> - a code point string;
>> - compile-time encoding (meta-programming);
>> - documentation.
> I agree, mostly.  I'm not sure if a special string (as opposed to
> basic_string<utf16_t>) would be worthwhile -- what would you do with
> it that didn't require a full Unicode library supporting it?
Good point. I am not able to come up with a use case, other than "use
it as the base of a grapheme string". From the tactical perspective of
getting something through a Boost review, though, it would help to
flesh out the design of a code point string before writing a grapheme
string in the same vein. I think. But I'm becoming less sure of it as
I write it.
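
For illustration only, the "iterator adaptors (input)" item from the
list quoted above could look roughly like the sketch below. It is
written in present-day C++ (C++11 or later). The names
utf8_decode_iterator and decode_utf8 are hypothetical and do not come
from Cory's library or from Boost. The sketch substitutes U+FFFD for
truncated or malformed sequences and does not reject overlong
encodings or surrogates, which a real library would have to do.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical input iterator adaptor: wraps an iterator over UTF-8
// code units and yields one code point per dereference.
template <typename It>
class utf8_decode_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_decode_iterator(It pos, It end) : pos_(pos), end_(end) {}

    // Decode the sequence starting at the current position.
    char32_t operator*() const {
        assert(pos_ != end_);  // dereferencing the end is a precondition violation
        It it = pos_;
        const unsigned char lead = static_cast<unsigned char>(*it);
        const std::size_t len = sequence_length(lead);
        if (len == 0) return 0xFFFD;  // invalid lead byte
        char32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
        for (std::size_t i = 1; i < len; ++i) {
            ++it;
            if (it == end_) return 0xFFFD;  // truncated sequence
            const unsigned char c = static_cast<unsigned char>(*it);
            if (!is_continuation(c)) return 0xFFFD;
            cp = (cp << 6) | (c & 0x3Fu);
        }
        return cp;
    }

    // Skip the lead byte, then at most len-1 continuation bytes.
    utf8_decode_iterator& operator++() {
        const std::size_t len = sequence_length(static_cast<unsigned char>(*pos_));
        ++pos_;
        for (std::size_t i = 1; i < len && pos_ != end_ &&
                                is_continuation(static_cast<unsigned char>(*pos_)); ++i) {
            ++pos_;
        }
        return *this;
    }

    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_decode_iterator& a, const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a, const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    static bool is_continuation(unsigned char c) { return (c & 0xC0u) == 0x80u; }

    // Number of code units implied by the lead byte; 0 if it is not a valid lead.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80u) return 1;
        if ((lead >> 5) == 0x06u) return 2;
        if ((lead >> 4) == 0x0Eu) return 3;
        if ((lead >> 3) == 0x1Eu) return 4;
        return 0;
    }

    It pos_;
    It end_;
};

// Convenience wrapper (also hypothetical): decode a whole UTF-8 string.
inline std::u32string decode_utf8(const std::string& s) {
    using iter = utf8_decode_iterator<std::string::const_iterator>;
    std::u32string out;
    for (iter it(s.begin(), s.end()), last(s.end(), s.end()); it != last; ++it) {
        out.push_back(*it);
    }
    return out;
}
```

Because the adaptor decodes lazily, a caller that only needs the
textual content of a document can wrap just those short ranges
instead of transcoding the whole buffer up front.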