Subject: Re: [boost] [unicode] Interest Check / Proof of Concept
From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2008-11-19 10:55:11


> Over the past few months, I've been tinkering with a Unicode string library.
> It's still *far* from finished, but it's far enough along that the overall
> structure is visible. I've seen a bunch of Unicode proposals for Boost come
> and go, so hopefully this one will address the most common needs people
> have.

I would love to see a Unicode support library added to Boost.
However, I question the usefulness of another string class, or in this
case another hierarchy of string classes. Interoperability with
std::string (and QString, and CString, and a thousand other
API-specific string classes) is always thorny. I'd much rather see an
iterators- and algorithms-based approach, along the lines of your
ct_string::iterator. Instead of doing this:

> baz.encode(bar,rt::utf8);

I'd rather be able to do something like this:

typedef std::basic_string<some_32bit_char_type> unicode_string;

unicode_string u_string = /*...*/;
std::string std_string = /*...*/;

typedef boost::recoding_iterator<boost::ucs4, boost::utf8> ucs4_to_utf8_iter;
std::copy(ucs4_to_utf8_iter(u_string.begin()),
          ucs4_to_utf8_iter(u_string.end()),
          std::back_inserter(std_string));

// or

typedef boost::recoding_iterator<boost::utf8, boost::ucs4> utf8_to_ucs4_iter;
std::copy(utf8_to_ucs4_iter(std_string.begin()),
          utf8_to_ucs4_iter(std_string.end()),
          std::back_inserter(u_string));

Having iterators that do the right thing, in terms of stepping over
code points or (possibly synthesized) characters as appropriate, in an
efficient manner, would provide a toolkit with which anyone could
write whatever custom Unicode-aware code they need.
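
To make the idea a bit more concrete, here is a rough sketch of what the
UTF-8 to UCS-4 direction might look like. Nothing here is an existing Boost
component: utf8_to_ucs4_iterator and decode_utf8 are hypothetical names made
up for illustration. The sketch assumes well-formed UTF-8 input (no
validation or error handling), an underlying iterator that can be copied and
re-read (at least a forward iterator), and a compiler that provides char32_t
and std::u32string.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adaptor: walks UTF-8 code units through an underlying
// iterator and yields one UCS-4 code point per step.
// Assumption: the input is well-formed UTF-8; nothing is validated.
template <typename Iter>
class utf8_to_ucs4_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char32_t* pointer;
    typedef char32_t reference;

    explicit utf8_to_ucs4_iterator(Iter it) : it_(it) {}

    // Decode the code point that starts at the current position.
    // Works on a copy, so the underlying iterator is not moved.
    char32_t operator*() const {
        Iter p = it_;
        unsigned char lead = static_cast<unsigned char>(*p);
        int n = sequence_length(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (n == 1) ? lead : (lead & (0x7F >> n));
        for (int i = 1; i < n; ++i) {
            ++p;
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Step over the whole multi-byte sequence, not a single code unit.
    utf8_to_ucs4_iterator& operator++() {
        std::advance(it_, sequence_length(static_cast<unsigned char>(*it_)));
        return *this;
    }

    utf8_to_ucs4_iterator operator++(int) {
        utf8_to_ucs4_iterator tmp(*this);
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_to_ucs4_iterator& a,
                           const utf8_to_ucs4_iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const utf8_to_ucs4_iterator& a,
                           const utf8_to_ucs4_iterator& b) {
        return !(a == b);
    }

private:
    // Number of code units in the sequence, derived from the lead byte.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;          // 0xxxxxxx
        if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
        return 4;                           // 11110xxx
    }

    Iter it_;
};

// Hypothetical convenience wrapper showing the std::copy usage pattern.
inline std::u32string decode_utf8(const std::string& s) {
    typedef utf8_to_ucs4_iterator<std::string::const_iterator> iter;
    std::u32string out;
    std::copy(iter(s.begin()), iter(s.end()), std::back_inserter(out));
    return out;
}
```

The point of the sketch is that the adaptor works on any container of code
units, so std::string, QString, or a raw buffer could all be decoded through
the same std::copy call without a new string class. A real version would
need to deal with malformed and truncated sequences, which this one ignores.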

Zach

